CN116187483A - Model training method, device, apparatus, medium and program product - Google Patents
- Publication number
- Publication number: CN116187483A (application CN202310093404.9A)
- Authority
- CN
- China
- Prior art keywords
- edge
- server
- training
- equipment
- deployed
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/20—Ensemble learning
Abstract
The present application relates to a model training method, apparatus, device, medium and program product. The method is applied to a federated learning system comprising an edge server and a plurality of edge devices in communication connection with the edge server. In the process of training a global model deployed in the edge server, the method first receives quality information sent by each edge device, which characterizes the degree to which gradient information of the local model deployed in that edge device contributes to global model training; it then determines target edge devices from the plurality of edge devices according to the quality information sent by each edge device, and instructs the target edge devices to upload the gradient information of their deployed local models; finally, it trains the global model based on the gradient information uploaded by the target edge devices. By adopting the method, the efficiency and accuracy of global model training can be improved, yielding higher performance.
Description
Technical Field
The present application relates to the field of artificial intelligence technology, and in particular, to a model training method, apparatus, device, medium, and program product.
Background
With the development of mobile internet technology, the explosive growth of intelligent applications and the massive data generated at the network edge cause computation-intensive tasks to incur large delay and bandwidth burdens in data transmission and model computation, along with privacy and security problems. Federated learning is a distributed machine learning technology that balances data privacy protection against data sharing and computation: on the premise that local individual or sample data need not be exchanged, distributed model training is performed among a plurality of data sources holding local data, and a global model based on virtually fused data is constructed by exchanging only model parameters or gradients of model parameters.
However, when the global model is trained in current federated learning, the required wireless resources grow as the number of edge devices increases, bringing huge communication delay and resource consumption, so performance is not high.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a model training method, apparatus, device, medium and program product with improved performance.
In a first aspect, the present application provides a model training method applied to an edge server in a federated learning system, where the federated learning system includes the edge server and a plurality of edge devices that establish a communication connection with the edge server, the method including:
in the process of training the global model deployed in the edge server, receiving quality information sent by each edge device, where the quality information is used to characterize the degree of contribution of gradient information of the local model deployed in that edge device to the global model training; determining target edge devices from the plurality of edge devices according to the quality information sent by each edge device, and instructing the target edge devices to upload gradient information of the local models they deploy; and training the global model based on the gradient information uploaded by the target edge devices.
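The server-side flow of the first aspect — receive quality reports, select targets, collect their gradients, update the global model — can be sketched as follows. This is a minimal illustrative sketch, not part of the claimed embodiments: the device interface (`report_quality`, `upload_gradient`) and the top-half selection rule are assumptions standing in for the drift-penalty scheduling described later.

```python
import numpy as np

def server_round(global_params, edge_devices, learning_rate=0.1):
    """One communication round of the server-side method (illustrative sketch).

    Each entry of edge_devices is assumed to expose:
      .report_quality()  -> float        (contribution degree of its gradient)
      .upload_gradient() -> np.ndarray   (local model gradient)
    """
    # Step 1: receive quality information from every edge device.
    qualities = [dev.report_quality() for dev in edge_devices]

    # Step 2: determine the target edge devices (here: top half by quality,
    # a stand-in for the drift-penalty selection).
    k = max(1, len(edge_devices) // 2)
    order = np.argsort(qualities)[::-1]
    targets = [edge_devices[i] for i in order[:k]]

    # Step 3: instruct the targets to upload gradients and aggregate them.
    gradients = [dev.upload_gradient() for dev in targets]
    aggregated = np.mean(gradients, axis=0)

    # Step 4: train the global model with the aggregated gradient.
    return global_params - learning_rate * aggregated
```

Only the selected devices transmit, which is the source of the communication savings the application targets.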
In one embodiment, determining a target edge device from the plurality of edge devices according to the quality information sent by each edge device includes: determining the target edge device from the plurality of edge devices according to the convergence rate of federated learning and the quality information sent by each edge device.
In one embodiment, determining a target edge device from the plurality of edge devices according to the convergence rate of federated learning and the quality information sent by each edge device includes: obtaining a drift penalty function, where the drift penalty function is constructed from the convergence rate of federated learning and the quality information sent by each edge device; obtaining a constraint condition constructed from the transmission energy information of each edge device; solving the drift penalty function under the constraint condition, with minimizing the value of the drift penalty function as the objective, to obtain device quantity information; and determining the target edge device from the plurality of edge devices according to the device quantity information.
In one embodiment, determining a target edge device from the plurality of edge devices according to the device quantity information includes: according to the quality information sent by each edge device, determining, from the plurality of edge devices, that number of edge devices meeting a preset quality requirement as the target edge devices.
In one embodiment, the method further includes: acquiring current global model parameters of the global model; and sending the current global model parameters to each edge device, where the current global model parameters are used by each edge device to train its respectively deployed local model based on those parameters, so as to obtain gradient information of the local model.
In a second aspect, the present application further provides a model training method applied to an edge device in a federated learning system, where the federated learning system includes a plurality of edge devices and an edge server in communication connection with each edge device, the method including: sending quality information to the edge server, where the quality information is used to characterize the degree of contribution of gradient information of the local model deployed in the edge device to training of the global model deployed in the edge server; and, if an indication sent by the edge server is received, sending gradient information of the local model deployed in the edge device to the edge server, where the gradient information is used by the edge server to train the global model deployed in the edge server, and the indication is sent by the edge server to the target edge devices it determines from the plurality of edge devices according to the quality information sent by each edge device.
In one embodiment, sending gradient information of the local model deployed in the edge device to the edge server for training the global model deployed in the edge server includes: determining whether the edge device received an indication sent by the edge server in the previous round; and acquiring gradient information of the local model deployed in the edge device according to the determination result, and sending the gradient information to the edge server.
In one embodiment, acquiring gradient information of the local model deployed in the edge device according to the determination result includes: if the edge device received the indication sent by the edge server in the previous round, taking the gradient of the local model deployed in the edge device computed in the current round as the gradient information.
In one embodiment, acquiring gradient information of the local model deployed in the edge device according to the determination result includes: if the edge device did not receive the indication sent by the edge server in the previous round, taking the sum of the gradient of the local model deployed in the edge device computed in the current round and the undelivered accumulated gradient as the gradient information.
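The accumulated-gradient behavior of these two embodiments can be sketched as a small state machine on the device side. This is an illustrative sketch only: the class and method names are mine, and the exact round bookkeeping is an assumption consistent with, but not quoted from, the application.

```python
import numpy as np

class LocalGradientBuffer:
    """Sketch of the cumulative residual feedback described above.

    If the device was selected in the previous round, it reports only the
    freshly computed gradient; otherwise it adds the gradient accumulated
    while it was unselected.
    """
    def __init__(self, dim):
        self.residual = np.zeros(dim)   # undelivered accumulated gradient

    def report(self, current_gradient, selected_last_round):
        # Gradient information for the current round.
        if selected_last_round:
            return current_gradient
        return current_gradient + self.residual

    def after_round(self, current_gradient, selected_this_round):
        # Update the locally saved residual once the round outcome is known.
        if selected_this_round:
            self.residual = np.zeros_like(self.residual)       # delivered
        else:
            self.residual = self.residual + current_gradient   # keep for later
```

Unselected rounds thus contribute to a later upload instead of being discarded, which is the data-utilization point of the mechanism.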
In one embodiment, the method further includes: receiving current global model parameters of the global model deployed in the edge server; and performing the current round of local training on the local model deployed in the edge device according to the current global model parameters, to obtain gradient information of the local model.
In a third aspect, the present application further provides a model training apparatus applied to an edge server in a federated learning system, where the federated learning system includes a plurality of edge devices and an edge server in communication connection with each edge device, the apparatus including:
a receiving module, configured to receive quality information sent by each edge device in the process of training the global model deployed in the edge server, where the quality information is used to characterize the degree of contribution of gradient information of the local model deployed in the edge device to the global model training;
a determining module, configured to determine target edge devices from the plurality of edge devices according to the quality information sent by each edge device, and to instruct the target edge devices to upload gradient information of the local models they deploy;
and a training module, configured to train the global model based on the gradient information uploaded by the target edge devices.
In a fourth aspect, the present application further provides a model training apparatus applied to an edge device in a federated learning system, where the federated learning system includes a plurality of edge devices and an edge server in communication connection with each edge device, the apparatus including:
a first sending module, configured to send quality information to the edge server, where the quality information is used to characterize the degree of contribution of gradient information of the local model deployed in the edge device to training of the global model deployed in the edge server;
and a second sending module, configured to, if an indication sent by the edge server is received, send gradient information of the local model deployed in the edge device to the edge server for training the global model deployed in the edge server, where the indication is determined by the edge server according to the quality information sent by each edge device.
In a fifth aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor that implements the model training method of any one of the first or second aspects above when executing the computer program.
In a sixth aspect, the present application also provides a computer-readable storage medium. A computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the model training method of any of the first or second aspects described above.
In a seventh aspect, the present application also provides a computer program product. The computer program product comprising a computer program which, when executed by a processor, implements the model training method of any of the first or second aspects described above.
The model training method, apparatus, device, medium and program product above are applied to a federated learning system comprising an edge server and a plurality of edge devices in communication connection with the edge server. In the process of training the global model deployed in the edge server, the edge server first receives quality information, sent by each edge device, that characterizes the degree of contribution of gradient information of the local model deployed in that edge device to global model training; it then determines target edge devices from the plurality of edge devices according to the quality information sent by each edge device, and instructs the target edge devices to upload gradient information of the local models they deploy; finally, it trains the global model based on the gradient information uploaded by the target edge devices. In this way, during training of the global model at the edge server, edge devices of appropriate number and quality are selected according to the quality information they send, and the global model is trained on the gradient information of the local models uploaded by the selected target edge devices, improving the efficiency and accuracy of global model training.
Drawings
FIG. 1 is a diagram of an application environment for a model training method in one embodiment;
FIG. 2 is a flow diagram of a model training method in one embodiment;
FIG. 3 is a flow chart of a model training method in another embodiment;
FIG. 4 is a flow chart of a model training method in another embodiment;
FIG. 5 is a flow chart of a model training method in another embodiment;
FIG. 6 is a flow chart of a model training method in another embodiment;
FIG. 7 is a flow chart of a model training method in another embodiment;
FIG. 8 is a flow chart of a model training method in another embodiment;
FIG. 9 is a schematic diagram of a model training method in another embodiment;
FIG. 10 is a block diagram of a model training device in another embodiment;
FIG. 11 is a block diagram of a model training device in another embodiment;
FIG. 12 is an internal block diagram of a computer device in one embodiment;
fig. 13 is an internal structural view of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
The model training method provided by the embodiments of the present application can be applied to the federated learning system shown in FIG. 1. The federated learning system includes an edge server 101 and a plurality of edge devices 102, with a communication connection established between the edge server 101 and the edge devices 102. The edge server 101 may be implemented as a stand-alone server or as a server cluster formed by a plurality of servers. The edge device 102 may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, internet-of-things devices, and portable wearable devices; the internet-of-things devices may be smart speakers, smart televisions, smart air conditioners, smart vehicle devices, and the like, and the portable wearable devices may be smart watches, smart bracelets, headsets, and the like.
The edge server 101 has a global model deployed thereon, and the edge device 102 has a local model and a local data set deployed thereon.
In one embodiment, as shown in FIG. 2, a model training method is provided. The method is described as applied to the edge server in FIG. 1 by way of illustration, and includes the following steps:
In step 201, the edge server receives quality information sent by each edge device during training of the global model deployed in the edge server.
The quality information is used to characterize the degree of contribution of gradient information of a local model deployed in the edge device to global model training. Federated learning is a distributed machine learning technology: on the premise that local individual or sample data need not be exchanged, distributed model training is performed among a plurality of data sources holding local data, and a global model based on virtually fused data is constructed by exchanging only model parameters or gradients of model parameters. That is, gradient information of the local models on the edge devices is required in the process of training the global model on the edge server. To reduce communication delay and resource overhead, only a suitable number of edge devices are selected to participate in aggregation and global model updating in each round of training; and since different edge devices contribute to differing degrees to improving the accuracy and convergence of the global model on the edge server, the edge server needs to receive the quality information sent by the edge devices as the selection criterion.
In step 202, the edge server determines target edge devices from the plurality of edge devices according to the quality information sent by each edge device, and instructs the target edge devices to upload gradient information of their deployed local models.
A target edge device is an edge device, selected from the plurality of edge devices and meeting the quality requirement, that participates in aggregation and global model updating; there may be multiple target edge devices. After the target edge devices are determined, the edge server instructs them to upload the gradient information of their local models for training the global model.
In step 203, the edge server trains the global model based on the gradient information uploaded by the target edge device.
Optionally, the federated learning system of the present application consists of one edge server and $N$ distributed single-antenna edge devices, where each edge device $n$ locally collects or generates a data set $D_n$. One global model update corresponds to one communication round, indexed by $t \in \{1, 2, \ldots, T\}$, where $T$ is the total number of communication rounds of global model training. $S_t$ denotes the set of edge devices selected in the $t$-th round, whose size is $k_t = \sum_{n=1}^{N} a_{n,t}$, where $a_{n,t} \in \{0,1\}$ is a binary variable indicating whether edge device $n$ is selected in round $t$: $a_{n,t}=1$ means selected and $a_{n,t}=0$ means not selected.
The gradient information uploaded by the edge devices is aggregated over the uplink by over-the-air computation, in which the summation is completed during transmission itself. In each round of training, the target edge devices transmit their local model gradient information in parallel over a shared common radio resource block, so that the aggregated signal is:

$$y_t = \sum_{n \in S_t} h_{n,t}\, x_{n,t} + z_t \tag{1}$$

where $x_{n,t} = \phi(g_{n,t})$ is the gradient signal sent by edge device $n$ in round $t$; $\phi(\cdot)$ denotes the preprocessing and normalization operations that ensure $x_{n,t}$ has zero mean and variance $P_{n,t}$; $g_{n,t}$ is the local model gradient information of edge device $n$ in the $t$-th round of training; $h_{n,t} \in \mathbb{C}$ is the quasi-static channel gain between edge device $n$ and the edge server in round $t$, with $\mathbb{C}$ denoting the set of complex numbers; and $z_t$ is additive white Gaussian noise with mean $0$ and variance $\sigma_z^2$.
The edge server receives the aggregated signal $y_t$, performs post-processing and de-normalization, and then updates the global model, completing the current round of training; these steps are repeated until the global model converges or a preset number of training rounds is reached. The edge server updates the global model according to:

$$w_t = w_{t-1} - \frac{\eta_t}{k_t}\,\phi^{-1}\!\left(\frac{y_t}{\sigma_t}\right) \tag{2}$$

where $w_{t-1}$ denotes the current global model parameters used for the $t$-th round of training; $\eta_t$ is the learning rate of the federated learning system; $\phi^{-1}(\cdot)$ denotes the post-processing and de-normalization operations performed at the edge server; and $\sigma_t$ is a power scaling factor, i.e., a denoising factor, which determines the signal-to-noise ratio (SNR) at the server receiver.
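The over-the-air aggregation and global update described above can be simulated numerically. This is a sketch under simplifying assumptions that are mine, not the application's: perfect channel inversion (so the channel gains drop out and $\phi$ reduces to the identity) and real-valued signals.

```python
import numpy as np

rng = np.random.default_rng(0)

def aircomp_update(w_prev, local_grads, noise_std=0.01, eta=0.1, sigma_t=1.0):
    """One global update via simulated over-the-air aggregation.

    Selected devices transmit simultaneously; the channel sums their
    (pre-equalized) gradient signals and adds Gaussian noise. The server
    applies the denoising factor, averages over the k selected devices,
    and takes a gradient step.
    """
    k = len(local_grads)
    # Superposition over the shared radio resource block, plus receiver noise:
    y = np.sum(local_grads, axis=0) + rng.normal(0.0, noise_std, size=w_prev.shape)
    # Post-processing: de-scale by sigma_t and average over the k devices.
    return w_prev - eta * (y / sigma_t) / k
```

Note that the effective per-coordinate noise after averaging scales like `noise_std / k`, which is why the number of selected devices matters for noise resistance in the convergence analysis below.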
the model training method is applied to a federal learning system, the federal learning system comprises an edge server and a plurality of edge devices which are in communication connection with the edge server, and in the process that the edge server trains a global model deployed in the edge server, firstly, quality information which is sent by each edge device and used for representing the contribution degree of gradient information of a local model deployed in the edge device to the global model training is received, then, target edge devices are determined from the plurality of edge devices according to the quality information sent by each edge device, and the target edge devices are instructed to upload the gradient information of the local model deployed by the target edge devices; and finally, training the global model based on gradient information uploaded by the target edge equipment. In this way, in the training process of the global model of the edge server, the edge equipment with proper quantity and equipment quality is selected according to the quality information sent by the edge equipment, and the global model is trained according to the gradient information of the local model uploaded by the selected target edge equipment.
As shown in FIG. 3, the step of determining a target edge device from the plurality of edge devices according to the convergence rate of federated learning and the quality information sent by each edge device includes:
In step 301, the edge server obtains a drift penalty term function.
The drift penalty term function is constructed from the convergence rate of federated learning and the quality information sent by each edge device.
In step 302, the edge server obtains constraints.
The constraint is constructed from the transmission energy information of each edge device. For example, the average transmission energy of edge device $n$ over the $T$ rounds of training is constrained as $\frac{1}{T}\sum_{t=1}^{T} a_{n,t} P_{n,t} \le \bar{E}_n$.
In step 303, under the constraint condition and with minimizing the value of the drift penalty function as the objective, the edge server solves the drift penalty function to obtain the device quantity information.
The specific calculation proceeds as follows. Given that $D_m \cap D_n = \varnothing$ when $m \neq n$, the whole data set is $D = \bigcup_{n=1}^{N} D_n$, and the global loss function measured over $D$ is:

$$f(w) = \sum_{n=1}^{N} \frac{|D_n|}{|D|}\, f_n(w) \tag{3}$$

where $f_n(w)$ is the local loss function of edge device $n$ defined below. The global loss function obtained by equation (3) measures the average fit of the global model to the whole data set, and the objective of federated learning is to find the optimal model parameters $w^*$ that minimize the global loss function or its expected value, namely:

$$w^* = \arg\min_{w} \; \mathbb{E}[f(w)] \tag{4}$$
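The relation stated above — the global loss as a dataset-size-weighted average of the local losses — can be checked numerically. The squared-error loss and all names below are illustrative choices of mine, not taken from the application.

```python
import numpy as np

def local_loss(w, samples):
    """Mean squared-error loss of a scalar model w on one device's data set D_n.

    samples: list of (s_i, q_i) pairs; l(w; s, q) = (w*s - q)^2.
    """
    return float(np.mean([(w * s - q) ** 2 for s, q in samples]))

def global_loss(w, datasets):
    """Global loss: size-weighted average of the local losses."""
    total = sum(len(d) for d in datasets)
    return sum(len(d) / total * local_loss(w, d) for d in datasets)
```

Because the local data sets are disjoint, the weighted average of local losses equals the plain mean loss over the pooled data set, which is what makes federated training a faithful proxy for centralized training.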
Applying a telescoping sum, the optimization problem of edge device selection can be expressed as:

$$\mathrm{P1:}\quad \min_{\{a_{n,t}\}} \; \sum_{t=1}^{T}\bigl(\mathbb{E}[f(w_t)] - \mathbb{E}[f(w_{t-1})]\bigr) \quad \text{s.t.}\;\; \frac{1}{T}\sum_{t=1}^{T} a_{n,t} P_{n,t} \le \bar{E}_n,\;\; a_{n,t} \in \{0,1\},\;\; \forall n, t \tag{5}$$

where $\mathbb{E}[\cdot]$ is taken over the randomness of the channel noise and of the data sampling in local stochastic gradient descent, and the first constraint bounds the average transmission energy of edge device $n$ over the $T$ rounds of training by $\bar{E}_n$. From the form of problem P1 it can be seen that $\mathbb{E}[f(w_t)] - \mathbb{E}[f(w_{t-1})]$ is difficult to express exactly. To address this, the present application uses convergence analysis to obtain a closed-form upper bound of $\mathbb{E}[f(w_t)] - \mathbb{E}[f(w_{t-1})]$ as an approximate replacement, jointly accounting for the transmission power, the channel noise, the upper bound of the average accumulated residual, and the stochastic gradient. Given the set of scheduled devices $S_t$ of the $t$-th round and the mini-batch size $B$, the single-round convergence rate of federated learning is given by:

$$\mathbb{E}[f(w_t)] - \mathbb{E}[f(w_{t-1})] \le -\frac{\eta_t}{2}\,\mathbb{E}\|g_t\|^2 + \frac{l\eta_t^2 m^2}{2} + \frac{l\eta_t^2 \delta^2}{2} + \frac{l\eta_t^2}{2}\left(\rho_t G^2 + \frac{\sigma_z^2}{\sigma_t^2 k_t^2}\right) \tag{6}$$

where $l$ is the smoothness constant, $\eta_t$ is the learning rate, $m$ is the estimated mean of the global gradient, $g_t$ is the gradient of the global model, $\rho_t$ denotes the average probability that a device is not selected in the $t$-th round, $G^2$ is the variance upper bound of the local stochastic gradient, and $\delta^2$ is the estimated variance of the global gradient.
From equation (6), the number of selected edge devices $k_t$ per round plays a critical role in resisting the influence of channel noise. An efficient edge device scheduling mechanism therefore requires optimizing the edge device selection strategy $a_{n,t}$ according to the channel noise level in each round. Observing equation (6) further, its first three terms are independent of the scheduling policy and can be treated as constants. Ignoring the constant factor $l\eta_t^2/2$, let:

$$U_t = \rho_t G^2 + \frac{\sigma_z^2}{\sigma_t^2 k_t^2} \tag{7}$$

so that the optimization problem P1 can be converted into P2 and solved:

$$\mathrm{P2:}\quad \min_{\{a_{n,t}\}} \; \sum_{t=1}^{T} U_t \quad \text{s.t.}\;\; \frac{1}{T}\sum_{t=1}^{T} a_{n,t} P_{n,t} \le \bar{E}_n,\;\; a_{n,t} \in \{0,1\},\;\; \forall n, t \tag{8}$$
As described above, the number of selected edge devices and their quality are the two key elements of the scheduling mechanism. The edge device scheduling mechanism of the present application therefore determines the number of selected edge devices so as to effectively reduce the interference of channel noise in communication, while ensuring that, under limited resources, the quality of all selected edge devices is as high as possible so that their training performance is fully exploited. Inspired by the drift-plus-penalty algorithm in Lyapunov optimization theory, the optimization problem of the edge device scheduling mechanism can be expressed as P3:

$$\mathrm{P3:}\quad \min_{\{a_{n,t}\}} \; \alpha\, U_t - \sum_{n=1}^{N} a_{n,t}\, I_{n,t} + \sum_{n=1}^{N} Q_n(t)\, a_{n,t}\, P_{n,t} \tag{9}$$

where $\alpha$ is an adjustable term balancing the convergence upper bound $U_t$ against the device quality $I_{n,t}$ and the Lyapunov factor $Q_n(t)$, the virtual queue associated with the energy constraint of device $n$.
The edge server obtains the edge device scheduling decision $a_{n,t}$ by solving problem P3: it iterates the possible number of selected edge devices from $k=1$ to $k=N$ and computes the corresponding drift-plus-penalty value $p_t(k)$, so that the $k^*$ corresponding to the minimum of $p_t(k)$ is the optimal number of selected edge devices.
In step 304, the edge server determines the target edge devices from the plurality of edge devices according to the device quantity information.
Optionally, according to the quality information sent by each edge device, the edge server determines, among the plurality of edge devices, the edge devices meeting a preset quality requirement as the target edge devices. Specifically, after determining the number of selected edge devices, the edge server ranks the quality information of all edge devices from high to low and selects the edge devices corresponding to the first $k^*$ entries as the target edge devices.
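The $k$-search and top-$k^*$ selection described in this embodiment can be sketched as follows. The concrete form of the penalty — a channel-noise term shrinking in $k$, a quality reward weighted by `alpha`, and a per-device energy cost — is my simplification of the drift-plus-penalty objective; the weights and names are illustrative.

```python
import numpy as np

def select_devices(qualities, energy_costs, noise_var=1.0, alpha=1.0):
    """Pick the number k* and the identities of the target edge devices.

    For each candidate k, p(k) trades off the noise term (decaying like
    1/k^2, cf. the convergence bound) against the summed quality of the k
    best devices, minus is offset by their energy costs.
    Returns the indices of the selected devices, best quality first.
    """
    order = np.argsort(qualities)[::-1]                  # best quality first
    q = np.asarray(qualities, dtype=float)[order]
    c = np.asarray(energy_costs, dtype=float)[order]
    best_k, best_p = 1, np.inf
    for k in range(1, len(q) + 1):
        p = noise_var / k**2 - alpha * q[:k].sum() + c[:k].sum()
        if p < best_p:
            best_k, best_p = k, p
    return order[:best_k].tolist()
```

Adding a low-quality device is worthwhile only while its quality gain plus noise reduction outweighs its energy cost, so the search typically stops short of selecting everyone.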
In one embodiment, when the local model on an edge device is trained to obtain the gradient of the local model, the global model parameters of the global model on the edge server are needed. As shown in FIG. 4, the method further includes:
In step 401, the edge server obtains the current global model parameters of the global model.
The current global model parameters are the model parameters of the global model in the current training round; the edge server obtains the current global model parameters of the global model at the start of each round of training.
In step 402, the edge server sends the current global model parameters to each edge device.
Optionally, the edge server broadcasts the current global model parameters to all edge devices. The current global model parameters are used by each edge device to train its respectively deployed local model, so as to obtain gradient information of the local model.
In one embodiment, as shown in FIG. 5, a model training method is provided. The method is described as applied to the edge device 102 in FIG. 1 by way of illustration, and includes the following steps:
In step 501, the edge device sends quality information to the edge server.
The quality information is used to characterize the degree of contribution of gradient information of the local model deployed in the edge device to training of the global model deployed in the edge server. Each edge device sends its quality information to the edge server over a control channel, and the edge server selects the edge devices participating in global model training according to the quality information of each edge device. Optionally, the quality information of each edge device may be computed by the edge device itself according to an algorithm, for example determined by a neural network model.
In step 502, if the edge device receives an indication sent by the edge server, it sends gradient information of the local model deployed in the edge device to the edge server.
The gradient information is used by the edge server to train the global model deployed in the edge server, and the indication is sent by the edge server to the target edge devices it determines from the plurality of edge devices according to the quality information sent by each edge device. Receiving the indication sent by the edge server means the edge device is a target edge device selected to participate in updating the global model; the target edge device then sends the gradient information of its local model to the edge server to participate in the training of the global model.
In one embodiment, after the edge server broadcasts the current global model parameters to all edge devices, as illustrated in fig. 6, the method further comprises:
in step 601, an edge device receives current global model parameters of a global model deployed in an edge server.
At the beginning of each round of training, as described in the above embodiments, the edge server broadcasts the current global model parameters $w_{t-1}$ to all edge devices for local model training.
Let $l(w; s_i, q_i)$ denote the loss function of the data sample $(s_i, q_i)$. The local loss function of edge device $n$ on its local dataset $D_n$ is then

$$F_n(w) = \frac{1}{|D_n|} \sum_{(s_i, q_i) \in D_n} l(w; s_i, q_i),$$

where $|D_n|$ represents the size of dataset $D_n$.
After receiving the global model parameters $w_{t-1}$, each edge device runs a mini-batch stochastic gradient descent step on its local model, drawing a mini-batch $\mathcal{B}_{n,t}$ of size $b$ from its local dataset, to compute the gradient of the local model, namely

$$g_{n,t} = \frac{1}{b} \sum_{(s_i, q_i) \in \mathcal{B}_{n,t}} \nabla l(w_{t-1}; s_i, q_i).$$
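As a concrete illustration, the per-device gradient step above can be sketched as follows. The single-parameter model and the squared loss $l(w; s, q) = \tfrac{1}{2}(ws - q)^2$ are hypothetical stand-ins, since the patent does not fix a particular model or loss function:

```python
import random

def local_gradient(w, dataset, batch_size=4, seed=0):
    """Mini-batch estimate of the local gradient g_{n,t} at the broadcast
    parameters w, for the hypothetical loss l(w; s, q) = 0.5*(w*s - q)^2."""
    rng = random.Random(seed)
    batch = rng.sample(dataset, min(batch_size, len(dataset)))
    # d/dw of 0.5*(w*s - q)^2 is (w*s - q) * s
    return sum((w * s - q) * s for (s, q) in batch) / len(batch)

# Edge device n receives w_{t-1} = 1.0 and computes its gradient estimate
# on a toy local dataset D_n where q = 2*s (so the minimiser is w = 2).
D_n = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
g = local_gradient(1.0, D_n)
print(g)  # -7.5: the negative sign pushes w upward, toward 2
```

A descent step $w \leftarrow w - \eta\, g$ with this gradient moves the local parameter toward the data-fitting optimum, which is the quantity each device reports (directly or via the residual mechanism below).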
in one embodiment of the present application, in order to more fully utilize the data set to improve the stability and accuracy of model training, a dynamic cumulative residual feedback mechanism is proposed that allows the edge device to transmit cumulative local gradients that were not sent in the past, while the unselected edge devices locally save the gradient information of the local model during the present round of training and wait for the next time to be selected. As shown in fig. 7, the step of sending gradient information of the local model deployed in the edge device to the edge server for training the global model deployed in the edge server includes:
In step 701, the edge device determines whether it received, in the previous round, an indication sent by the edge server.

When the edge device determines the gradient information of its local model for the current round, it first determines whether it received the indication sent by the edge server in the previous round, that is, whether it was selected in the previous round to send the gradient information of its local model.
Wherein, the determination result includes the following two cases:
In the first case, if it is determined that the edge device received the indication sent by the edge server in the previous round, the gradient of the local model deployed in the edge device computed in the current round is taken as the gradient information.

In the second case, if the edge device did not receive the indication sent by the edge server in the previous round, the sum of the gradient of the local model deployed in the edge device computed in the current round and the untransmitted accumulated gradient is taken as the gradient information.
For example, the gradient information $\tilde{g}_{n,t}$ of the local model of the selected edge device $n$ at the $t$-th round is defined as the combination of the gradient $g_{n,t}$ of the current communication round and the accumulated residual $r_{n,t}$. Specifically, $\tilde{g}_{n,t}$ and $r_{n,t}$ are given by

$$\tilde{g}_{n,t} = g_{n,t} + r_{n,t}, \qquad r_{n,t} = \begin{cases} 0, & n \in S_{t-1} \\ \zeta\, \tilde{g}_{n,t-1}, & n \notin S_{t-1} \end{cases}$$

where $0 \le \zeta \le 1$ reflects the importance of the accumulated residual for the current training, and $S_{t-1}$ denotes the set of edge devices selected in round $t-1$. That is, if edge device $n$ was selected in round $t-1$, its accumulated residual is 0 and the gradient of the local model computed in the current round is taken as the gradient information; if edge device $n$ was not selected in the previous round, its accumulated residual is $\zeta\, \tilde{g}_{n,t-1}$ and the sum of the gradient computed in the current round and the accumulated residual is taken as the gradient information.
In the above embodiment, the dynamic cumulative residual feedback mechanism allows an edge device to transmit the accumulated local gradients that were not transmitted in the past, while an unselected device locally stores the model gradient of the current round and waits for the next transmission, so that the data set can be utilized more comprehensively to improve the stability and accuracy of model training.
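Assuming the residual recursion takes the form described above (the full specification may differ in detail, e.g. in the damping applied), the mechanism can be sketched as:

```python
def gradient_info(g_t, r_t):
    """Combined gradient information for round t: g~_{n,t} = g_{n,t} + r_{n,t}."""
    return g_t + r_t

def next_residual(g_tilde, selected, zeta=0.5):
    """Residual carried into the next round: 0 if the device was selected
    (its gradient information was transmitted), otherwise the unsent
    combined gradient damped by zeta in [0, 1]."""
    return 0.0 if selected else zeta * g_tilde

# A device unselected in rounds 1-2 and selected in round 3: its round-3
# upload carries damped contributions of all three local gradients.
r = 0.0
for g, selected in [(1.0, False), (2.0, False), (4.0, True)]:
    g_tilde = gradient_info(g, r)
    r = next_residual(g_tilde, selected, zeta=0.5)
print(g_tilde)  # 4 + 0.5*(2 + 0.5*1) = 5.25
```

After the selected round the residual resets to 0, so no local gradient contribution is ever counted more than once at full weight.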
In the above embodiment, because different edge devices differ in the diversity of their local data sets, channel conditions, energy states and the like, their contributions to the global model differ. The quality information of an edge device is therefore determined by comprehensively considering the importance of the update information, the channel condition and the energy consumption level, and the selection of edge devices is determined by this quality information, thereby achieving high communication efficiency, high precision and high energy efficiency.
In an embodiment of the present application, please refer to fig. 8, which shows a flowchart of a model training method provided in an embodiment of the present application, the model training method includes the following steps:
in step 801, an edge server obtains current global model parameters of a global model.
In step 802, the edge server sends current global model parameters to each edge device.
In step 803, the edge device receives current global model parameters of the global model deployed in the edge server.
And step 804, the edge device performs local round training on the local model deployed in the edge device according to the current global model parameters to obtain gradient information of the local model.
In step 805, the edge device sends quality information to the edge server.
In step 806, the edge server determines a target edge device from the plurality of edge devices according to the quality information sent by each edge device, and instructs the target edge device to upload gradient information of the local model deployed by the target edge device.
In step 807, the edge server trains the global model based on gradient information uploaded by the target edge device.
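Steps 801-807 above can be sketched end to end as follows; the single-parameter model, the squared loss, and the use of gradient magnitude as the quality score are hypothetical simplifications, not the patent's prescribed choices:

```python
import random

def federated_round(w, devices, k, lr=0.1, seed=0):
    """One training round: broadcast w (steps 802-804), each device
    computes a local gradient and reports a quality score (805), the
    server selects the top-k devices (806), then averages their
    gradients and updates the global model (807)."""
    rng = random.Random(seed)
    reports = []
    for dataset in devices:
        s, q = rng.choice(dataset)           # one local sample
        g = (w * s - q) * s                  # toy squared-loss gradient
        reports.append((abs(g), g))          # (quality info, gradient)
    selected = sorted(reports, reverse=True)[:k]
    g_avg = sum(g for _, g in selected) / k
    return w - lr * g_avg

devices = [[(1.0, 2.0)], [(2.0, 4.0)], [(3.0, 6.0)]]
w1 = federated_round(0.0, devices, k=2)
print(w1)  # about 1.3: the two highest-quality gradients move w toward 2
```

Repeating `federated_round` over many rounds plays the role of the training loop, with only `k` of the `N` devices uploading per round.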
To help readers understand the technical scheme provided by the embodiments of the application, an application of the model training method is described below. Referring to fig. 9, the system includes an edge server and N edge devices; a global model is deployed on the edge server, and a local model and a local data set are deployed on each edge device. At the beginning of training, the edge server broadcasts the global model parameters to the N edge devices through a broadcast channel, and the N edge devices transmit their device quality to the edge server through a control channel. According to the convergence rate of federated learning and the device quality uploaded by the N edge devices, the edge server calculates and selects an appropriate number of edge devices with higher device quality to participate in aggregation and global model updating. For example, as shown in fig. 9, edge device 1 is not selected while edge device 2 and edge device N are selected; edge device 2 and edge device N transmit the gradient information of their local models to the edge server through a noisy channel based on over-the-air computation, and the global model is updated.
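The over-the-air computation step — selected devices transmitting simultaneously so that the channel itself sums their analog signals — can be sketched as below. The additive Gaussian noise model is a common assumption for such channels, not something this excerpt specifies:

```python
import random

def over_the_air_aggregate(gradients, noise_std=0.1, seed=0):
    """The server receives the superposition of the selected devices'
    transmitted gradients plus channel noise, modelled here as Gaussian."""
    rng = random.Random(seed)
    return sum(gradients) + rng.gauss(0.0, noise_std)

# With a noiseless channel the server recovers the exact sum; with
# noise_std > 0 it recovers a perturbed sum, as over the noisy channel
# of fig. 9.
print(over_the_air_aggregate([-18.0, -8.0], noise_std=0.0))  # -26.0
```

Because the channel delivers only the sum, quality-based selection matters: excluding low-quality devices keeps noisy or weak contributions out of the aggregate before it ever reaches the server.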
It should be understood that, although the steps in the flowcharts of the above embodiments are shown in sequence as indicated by the arrows, these steps are not necessarily executed in that order. Unless explicitly stated herein, the execution order of the steps is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in the above flowcharts may include multiple sub-steps or stages, which are not necessarily executed at the same time but may be executed at different moments, and their execution order is not necessarily sequential; they may be executed in turn or alternately with at least part of the other steps or sub-steps.
Based on the same inventive concept, the embodiment of the application also provides a model training device for realizing the model training method. The implementation of the solution provided by the device is similar to the implementation described in the above method, so the specific limitation in one or more embodiments of the model training device provided below may be referred to above for the limitation of the model training method, which is not repeated here.
In one embodiment, as shown in fig. 10, there is provided a model training apparatus 1000 applied to an edge server in a federal learning system, the federal learning system including the edge server and a plurality of edge devices having communication connections established with the edge server, including: a receiving module 1001, a determining module 1002 and a training module 1003, wherein:
the receiving module 1001 is configured to receive quality information sent by each edge device during training of a global model deployed in an edge server, where the quality information is used to characterize a contribution degree of gradient information of a local model deployed in the edge device to global model training;
the determining module 1002 is configured to determine a target edge device from a plurality of edge devices according to quality information sent by each edge device, and instruct the target edge device to upload gradient information of a local model deployed by the target edge device;
the training module 1003 is configured to train the global model based on gradient information uploaded by the target edge device.
In an optional embodiment of the present application, the determining module 1002 is specifically configured to determine the target edge device from a plurality of edge devices according to the federally learned convergence speed and quality information sent by each edge device.
In an optional embodiment of the present application, the determining module is specifically configured to obtain a drift penalty function, where the drift penalty function is constructed according to a convergence speed of federal learning and quality information sent by each edge device; obtaining constraint conditions which are constructed according to the transmission energy information of each edge device; based on constraint conditions, taking the minimum value of the drift penalty function as a target, and solving the drift penalty function to obtain equipment quantity information; and determining the target edge device from the plurality of edge devices according to the device quantity information.
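The drift penalty function itself is not given in this excerpt, so the objective below — a convergence-speed term that falls as aggregate device quality grows, plus an energy term that grows with the device count — is a hypothetical stand-in illustrating how the device number could be obtained by minimising such a penalty:

```python
def select_device_count(qualities, energy_penalty):
    """Choose the number k of participating devices by exhaustively
    minimising a hypothetical drift penalty
    f(k) = 1 / (sum of top-k qualities) + energy_penalty * k."""
    q = sorted(qualities, reverse=True)
    best_k, best_f, running = 1, float("inf"), 0.0
    for k in range(1, len(q) + 1):
        running += q[k - 1]
        f = 1.0 / running + energy_penalty * k
        if f < best_f:
            best_k, best_f = k, f
    return best_k

# With these qualities, a second device still pays for its energy cost,
# but a third does not, so the server selects k = 2 devices.
print(select_device_count([4.0, 3.0, 2.0, 1.0], energy_penalty=0.05))  # 2
```

Having solved for `k`, the server then takes the `k` devices whose reported quality is highest, which matches the "devices meeting a preset quality requirement" selection described next.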
In an optional embodiment of the present application, the determining module 1002 is specifically configured to determine, according to quality information sent by each edge device, the number of edge devices, which is the number of devices meeting a preset quality requirement, in the plurality of edge devices, as the target edge device.
In an optional embodiment of the present application, the apparatus further includes a parameter sending module, configured to obtain the current global model parameters of the global model, and to send the current global model parameters to each edge device, where the current global model parameters are used by each edge device to train its respectively deployed local model based on the current global model parameters so as to obtain the gradient information of the local model.
In one embodiment, as shown in fig. 11, there is provided a model training apparatus 1100 applied to an edge device in a federal learning system, the federal learning system including a plurality of edge devices and an edge server having a communication connection established with each edge device, the apparatus comprising: a first sending module 1101 and a second sending module 1102, wherein:
the first sending module 1101 is configured to send quality information to an edge server, where the quality information is used to characterize a contribution degree of gradient information of a local model deployed in an edge device to global model training deployed in the edge server;
the second sending module 1102 is configured to send gradient information of a local model deployed in an edge device to an edge server for training a global model deployed in the edge server if an instruction sent by the edge server is received, where the instruction is determined by the edge server according to quality information sent by each edge device.
In an optional embodiment of the present application, the second sending module 1102 is specifically configured to determine whether the edge device receives an indication sent by the edge server in the previous round; and acquiring gradient information of the local model deployed in the edge equipment according to the determination result, and sending the gradient information to the edge server.
In an optional embodiment of the present application, the second sending module 1102 is specifically configured to, if it is determined that the edge device received the indication sent by the edge server in the previous round, take the gradient of the local model deployed in the edge device computed in the current round as the gradient information.
In an optional embodiment of the present application, the second sending module 1102 is specifically configured to, if it is determined that the edge device in the previous round does not receive the indication sent by the edge server, take, as gradient information, a sum of a gradient of the local model deployed in the edge device obtained by the calculation in the current round and an undelivered cumulative gradient.
In an optional embodiment of the present application, the apparatus further includes a parameter receiving module, configured to receive the current global model parameters of the global model deployed in the edge server, and to perform the current round of local training on the local model deployed in the edge device according to the current global model parameters to obtain the gradient information of the local model.
The various modules in the model training apparatus described above may be implemented in whole or in part by software, hardware, and combinations thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a server, and the internal structure of which may be as shown in fig. 12. The computer device includes a processor, a memory, and a network interface connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is used to store data for the edge device. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a model training method.
In one embodiment, a computer device is provided, which may be a terminal, and the internal structure thereof may be as shown in fig. 13. The computer device includes a processor, a memory, a communication interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless mode can be realized through WIFI, a mobile cellular network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement a model training method. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, can also be keys, a track ball or a touch pad arranged on the shell of the computer equipment, and can also be an external keyboard, a touch pad or a mouse and the like.
It will be appreciated by those skilled in the art that the structures shown in fig. 12 and 13 are block diagrams of only some of the structures associated with the present application and are not intended to limit the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having stored therein a computer program, which when executed by the processor implements the model training method provided in the method embodiments described above.
In one embodiment, a computer readable storage medium is provided, on which a computer program is stored which, when executed by a processor, implements the model training method provided in the method embodiments described above.
In one embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the model training method provided in the method embodiments described above.
It should be noted that, user information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party.
Those skilled in the art will appreciate that all or part of the processes in the above method embodiments may be implemented by a computer program instructing relevant hardware; the computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric memory (Ferroelectric Random Access Memory, FRAM), phase change memory (Phase Change Memory, PCM), graphene memory, and the like. Volatile memory may include random access memory (Random Access Memory, RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take many forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM). The databases referred to in the embodiments provided herein may include at least one of relational and non-relational databases. Non-relational databases may include, but are not limited to, blockchain-based distributed databases and the like. The processors referred to in the embodiments provided herein may be, but are not limited to, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, data processing logic devices based on quantum computing, and the like.
The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The above embodiments represent only a few implementations of the present application, and although they are described in relatively specific detail, they should not be construed as limiting the scope of the present application. It should be noted that those of ordinary skill in the art may make various modifications and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. Accordingly, the scope of protection of the present application shall be subject to the appended claims.
Claims (12)
1. A model training method, applied to an edge server in a federal learning system, the federal learning system including the edge server and a plurality of edge devices having communication connections established with the edge server, the method comprising:
in the process of training the global model deployed in the edge server, receiving quality information sent by each edge device, wherein the quality information is used for representing the contribution degree of gradient information of a local model deployed in the edge device to the global model training;
Determining target edge equipment from the plurality of edge equipment according to the quality information sent by the edge equipment, and indicating the target edge equipment to upload gradient information of a local model deployed by the target edge equipment;
and training the global model based on the gradient information uploaded by the target edge equipment.
2. The method of claim 1, wherein the determining a target edge device from the plurality of edge devices based on the quality information sent by each of the edge devices comprises:
and determining the target edge equipment from the plurality of edge equipment according to the convergence speed of federal learning and the quality information sent by each edge equipment.
3. The method of claim 2, wherein the determining the target edge device from the plurality of edge devices based on the federally learned convergence rates and the quality information sent by each of the edge devices comprises:
obtaining a drift penalty function, wherein the drift penalty function is constructed according to the convergence speed of federal learning and quality information sent by each edge device;
obtaining constraint conditions, wherein the constraint conditions are constructed according to the transmission energy information of each edge device;
Based on the constraint condition, aiming at the minimum value of the drift penalty function, solving the drift penalty function to obtain equipment quantity information;
determining the target edge device from the plurality of edge devices according to the device number information;
the determining the target edge device from the plurality of edge devices according to the device number information comprises the following steps:
and determining the number of the edge devices meeting the preset quality requirement in the plurality of edge devices as the target edge device according to the quality information sent by each edge device.
4. The method according to claim 1, wherein the method further comprises:
acquiring current global model parameters of the global model;
and sending the current global model parameters to each edge device, wherein the current global model parameters are used for each edge device to train the local model deployed respectively based on the current global model parameters so as to obtain gradient information of the local model.
5. A model training method, applied to edge devices in a federal learning system, the federal learning system including a plurality of edge devices and an edge server having a communication connection established with each of the edge devices, the method comprising:
Transmitting quality information to the edge server, wherein the quality information is used for representing the contribution degree of gradient information of a local model deployed in the edge equipment to global model training deployed in the edge server;
and if an indication sent by the edge server is received, gradient information of a local model deployed in the edge device is sent to the edge server, wherein the gradient information is used for training a global model deployed in the edge server by the edge server according to the gradient information, and the indication is that the edge server determines a target edge device from the plurality of edge devices according to quality information sent by the edge devices and sends the target edge device.
6. The method of claim 5, wherein the sending gradient information of the local model deployed in the edge device to the edge server for training of the global model deployed in the edge server comprises:
determining whether the edge equipment receives an indication sent by the edge server in the last round;
according to a determination result, gradient information of a local model deployed in the edge equipment is obtained, and the gradient information is sent to the edge server;
According to the determination result, obtaining gradient information of the local model deployed in the edge device includes:
if the edge equipment receives the indication sent by the edge server in the previous round, taking the gradient of the local model deployed in the edge equipment obtained by the calculation in the current round as the gradient information;
according to the determination result, gradient information of the local model deployed in the edge device is obtained, and the method further comprises:
and if the edge equipment does not receive the indication sent by the edge server in the previous round, taking the sum of the gradient of the local model deployed in the edge equipment and the undelivered accumulated gradient obtained by the calculation in the current round as the gradient information.
7. The method of claim 6, wherein the method further comprises:
receiving current global model parameters of a global model deployed in the edge server;
and performing local round training on the local model deployed in the edge equipment according to the current global model parameters to obtain gradient information of the local model.
8. A model training apparatus for use with an edge server in a federal learning system, the federal learning system including the edge server and a plurality of edge devices having communication connections established with the edge server, the apparatus comprising:
The receiving module is used for receiving quality information sent by each edge device in the process of training the global model deployed in the edge server, wherein the quality information is used for representing the contribution degree of gradient information of the local model deployed in the edge device to the global model training;
the determining module is used for determining target edge equipment from the plurality of edge equipment according to the quality information sent by the edge equipment and indicating the target edge equipment to upload gradient information of a local model deployed by the target edge equipment;
and the training module is used for training the global model based on the gradient information uploaded by the target edge equipment.
9. A model training apparatus for use with edge devices in a federal learning system, the federal learning system including a plurality of edge devices and an edge server having a communication connection established with each of the edge devices, the apparatus comprising:
the first sending module is used for sending quality information to the edge server, wherein the quality information is used for representing the contribution degree of gradient information of a local model deployed in the edge equipment to global model training deployed in the edge server;
And the second sending module is used for sending the gradient information of the local model deployed in the edge equipment to the edge server for training the global model deployed in the edge server if the indication sent by the edge server is received, wherein the indication is determined by the edge server according to the quality information sent by each edge equipment.
10. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 4 or 5 to 7 when the computer program is executed.
11. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 4 or 5 to 7.
12. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 4 or 5 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310093404.9A CN116187483A (en) | 2023-02-10 | 2023-02-10 | Model training method, device, apparatus, medium and program product |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116187483A true CN116187483A (en) | 2023-05-30 |
Family
ID=86432208
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310093404.9A Pending CN116187483A (en) | 2023-02-10 | 2023-02-10 | Model training method, device, apparatus, medium and program product |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116187483A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116451593A (en) * | 2023-06-14 | 2023-07-18 | 北京邮电大学 | Reinforced federal learning dynamic sampling method and equipment based on data quality evaluation |
CN117010485A (en) * | 2023-10-08 | 2023-11-07 | 之江实验室 | Distributed model training system and gradient protocol method in edge scene |
CN117172338A (en) * | 2023-11-02 | 2023-12-05 | 数据空间研究院 | Contribution evaluation method in longitudinal federal learning scene |
CN117521783A (en) * | 2023-11-23 | 2024-02-06 | 北京天融信网络安全技术有限公司 | Federal machine learning method, apparatus, storage medium and processor |
Similar Documents
Publication | Title |
---|---|
Ding et al. | A multi-channel transmission schedule for remote state estimation under DoS attacks | |
CN116187483A (en) | Model training method, device, apparatus, medium and program product | |
CN113902021B (en) | Energy-efficient clustered federal edge learning strategy generation method and device | |
Liu et al. | Online computation offloading and resource scheduling in mobile-edge computing | |
Xia et al. | Federated-learning-based client scheduling for low-latency wireless communications | |
Lee et al. | Adaptive transmission scheduling in wireless networks for asynchronous federated learning | |
CN113469325A (en) | Layered federated learning method, computer equipment and storage medium for edge aggregation interval adaptive control | |
CN110856268B (en) | Dynamic multichannel access method for wireless network | |
Liu et al. | Fedpa: An adaptively partial model aggregation strategy in federated learning | |
CN114338628B (en) | Nested meta-learning method and system based on federated architecture | |
CN117349672B (en) | Model training method, device and equipment based on differential privacy federal learning | |
Zhang et al. | Vehicle selection and resource allocation for federated learning-assisted vehicular network | |
Liu et al. | Fine-grained offloading for multi-access edge computing with actor-critic federated learning | |
CN112235062A (en) | Federal learning method and system for resisting communication noise | |
Yan et al. | Deep reinforcement learning based offloading for mobile edge computing with general task graph | |
Albaseer et al. | Semi-supervised federated learning over heterogeneous wireless iot edge networks: Framework and algorithms | |
Liu et al. | FedAGL: A communication-efficient federated vehicular network | |
Yin et al. | Joint user scheduling and resource allocation for federated learning over wireless networks | |
Yu et al. | Deep reinforcement learning for wireless networks | |
Li et al. | Dynamic user-scheduling and power allocation for SWIPT aided federated learning: A deep learning approach | |
Yang et al. | Client selection for federated bayesian learning | |
Chen et al. | Enhanced hybrid hierarchical federated edge learning over heterogeneous networks | |
Yan et al. | Bayesian optimization for online management in dynamic mobile edge computing | |
CN115756873B (en) | Mobile edge computing and unloading method and platform based on federation reinforcement learning | |
CN116887205A (en) | Wireless federal segmentation learning algorithm for cooperative intelligence of Internet of things |
Legal Events
Code | Title |
---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||