Disclosure of Invention
Based on this, the application provides a distributed federated learning collaborative computing method for an intelligent factory, which ensures the security of the federated learning process and solves the problems of association between edge servers and participants, bandwidth resource allocation, and computing resource allocation of the participants by using deep reinforcement learning (DRL).
In order to achieve the above object, the present application provides a distributed federated learning collaborative computing method, which specifically includes the following steps: carrying out deep reinforcement learning model training; respectively deploying the trained deep reinforcement learning model to each edge server for federated learning; and ending the federated learning.
As above, the deep reinforcement learning model training specifically includes the following sub-steps: initializing network parameters and state information of the deep reinforcement learning model; each participant training a local model according to the network parameters and state information initialized for the deep reinforcement learning model; in response to completion of the simulated training of the local model, generating a bandwidth allocation policy and updating the AC network parameters in a single step at each time slot; in response to completion of the simulated transmission of the local model, generating an association policy and a computing resource allocation policy and updating the DQN network parameters; detecting whether the deep reinforcement learning model has converged or reached the maximum number of iterations; and if the model has neither converged nor reached the maximum number of iterations, starting the next iteration and carrying out the training of the local model again.
As above, wherein a metal surface defect detection model is used as the local model.
As above, the initialized state information specifically includes: the parameters and convergence accuracy of the Actor network, the Critic network and the DQN network, the position coordinates [x_k, y_k] of each participant, the initial mini-batch value, the CPU frequency f_k, the position coordinates [x_m, y_m] and maximum bandwidth B_m of each edge server, the slot length Δt and the maximum number of iterations I.
As above, the training process of the participant for the local model is as follows: the local data set D_k is divided into a plurality of small batches of size b, and the local weight is updated over each small batch b through the following formula to complete the training of the local model, the training process being represented as:

ω_k^i ← ω_k^i − η · ∇F_b(ω_k^i)

wherein η represents the learning rate, ∇F_b(ω_k^i) represents the gradient of the loss function over each small batch b, and ω_k^i represents the local model of participant k in the ith iteration.
As above, wherein after the simulated training of the local model, the method further comprises determining the time t_k^i required by participant k in the ith round of local training, which is specifically expressed as:

t_k^i = τ · c_k · b_k^i / f_k

wherein c_k denotes the number of CPU cycles required for participant k to train a single data sample, τ denotes the number of iterations for which the participant executes the MBGD algorithm, f_k represents the CPU cycle frequency at which participant k trains, and b_k^i indicates the mini-batch value of participant k in the ith round of local training.
As above, the current fast-scale state space is used as the input of the AC network so as to obtain a fast-scale action space, i.e. a bandwidth resource allocation policy. The fast-scale state space s(t) comprises the size of the local model that each participant has not yet finished transmitting and the transmission rate at which each participant uploads its model in each time slot, where t denotes a time slot and Δt denotes the slot length. The fast-scale action space A(t) is the bandwidth resource allocation policy, wherein B_{k,m}(t) indicates the bandwidth allocated by edge server m to participant k in each time slot.
In the above, in the process of uploading the trained local model parameters to the edge server according to the determined bandwidth resource allocation policy, the available uplink data transmission rate r_{k,m}^i between the ith-round participant k and the edge server m is expressed as:

wherein P_k represents the transmission power of participant k, N_0 represents the power spectral density of the additive white Gaussian noise, h_{k,m} denotes the channel gain between participant k and edge server m, and ψ_0 denotes the channel power gain at the reference distance.
The method further comprises determining the time t_{k,m}^i for the ith-round participant k to upload the local model parameters to the edge server m, which is specifically expressed as:

t_{k,m}^i = ξ / r_{k,m}^i

wherein ξ represents the size of the metal surface defect detection model (the local model), and r_{k,m}^i indicates the available uplink data transmission rate between the ith-round participant k and the edge server m.
A distributed federated learning collaborative computing system, comprising a deep reinforcement learning unit and a federated learning unit; the deep reinforcement learning unit is used for carrying out deep reinforcement learning model training; and the federated learning unit is used for performing federated learning according to the association policy and the computing and bandwidth resource allocation policies generated by the deep reinforcement learning model.
The application has the following beneficial effects:
(1) Aiming at a distributed federated learning framework, the distributed federated learning collaborative computing method and system provided by the embodiments break the dependence of traditional federated learning on a central server and effectively ensure privacy protection and security in the federated learning process.
(2) The distributed federated learning collaborative computing method and system provided by the embodiments achieve the design goal of minimizing the total time delay of federated learning from two angles, i.e. simultaneously reducing the total number of iteration rounds and the time consumed by each round, make full use of the computing and communication resources of each participant and edge server, and maximize the utility of federated learning.
(3) The distributed federated learning collaborative computing method and system provided by the embodiments take into account the influence of the amount of computation of each participant on the model accuracy, adjust the weight occupied by the local model of each participant in the global aggregation process, ensure the fairness of the aggregation process, and help accelerate model convergence.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The method and device solve the problem of minimizing the total time delay in a distributed federated learning system framework, namely minimizing the total time delay required for the global model to reach the target accuracy, with emphasis on the problems of association between edge servers and participants, bandwidth resource allocation, and computing resource allocation of the participants.
Scene assumption: the application uses the set K = {1, 2, …, K} to represent all the participants of federated learning, and the size of the data set of participant k is denoted D_k. For each sample d_n = {x_n, y_n} in the data set, x_n denotes the input vector and y_n denotes the output label corresponding to the vector x_n; [x_k, y_k] denotes the location coordinates of participant k. All small base stations serving as edge servers are represented by the set M = {1, 2, …, M}, and [x_m, y_m] denotes the location coordinates of edge server m. In addition, the iteration rounds of federated learning are represented by I = {1, 2, …, I}; a_{k,m}^i = 1 indicates that participant k establishes a communication connection with edge server m in the ith iteration, and a_{k,m}^i = 0 otherwise; b_k^i denotes the mini-batch value of participant k in the ith round of local training. All time slots of each iteration are denoted by T = {1, 2, …, T}, Δt denotes the slot length, and B_{k,m}(t) denotes the bandwidth allocated by edge server m to participant k in each time slot; ω_i denotes the global model of the ith round, and ω_k^i denotes the local model of participant k in the ith iteration.
The technical problem to be solved by the present application is how to minimize the total time delay of collaborative computation in the federated learning process, which is specifically expressed as follows:

wherein C1 indicates that each participant can only connect to one edge server; C2 indicates that each edge server is connected to at least one participant; C3 indicates that each edge server does not allocate bandwidth beyond its maximum bandwidth capacity; and C4 indicates that the mini-batch value of each participant in each round does not exceed the size of that participant's data set.
In the above, t_k^i represents the time required by participant k in the ith round of local training; B_{k,m}(t) represents the bandwidth allocated by edge server m to participant k in each time slot; a_{k,m}^i = 1 indicates that participant k establishes a communication connection with edge server m in the ith iteration, and a_{k,m}^i = 0 otherwise; D_k denotes the size of the data set of participant k; b_k^i indicates the mini-batch value of participant k in the ith round of local training; and B_m represents the maximum bandwidth of each edge server.
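For illustration only, the following Python sketch checks a candidate decision against constraints C1-C4 above; the function and variable names (satisfies_constraints, assoc, bandwidth, batch, D, B_max) are assumptions introduced for this sketch and are not part of the application.

```python
import numpy as np

def satisfies_constraints(assoc, bandwidth, batch, D, B_max):
    """Check a candidate decision against constraints C1-C4.

    assoc:     (K, M) 0/1 matrix, assoc[k, m] = 1 if participant k connects to edge server m
    bandwidth: (K, M) matrix, bandwidth allocated by server m to participant k in a slot
    batch:     length-K vector of mini-batch values
    D:         length-K vector of data set sizes
    B_max:     length-M vector of maximum bandwidth per edge server
    """
    c1 = np.all(assoc.sum(axis=1) == 1)                    # C1: each participant connects to exactly one server
    c2 = np.all(assoc.sum(axis=0) >= 1)                    # C2: each server serves at least one participant
    c3 = np.all((bandwidth * assoc).sum(axis=0) <= B_max)  # C3: allocated bandwidth within server capacity
    c4 = np.all(batch <= D)                                # C4: mini-batch value does not exceed data set size
    return bool(c1 and c2 and c3 and c4)

# toy example: 4 participants, 2 edge servers
assoc = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
bandwidth = np.array([[5.0, 0.0], [5.0, 0.0], [0.0, 4.0], [0.0, 6.0]])
batch = np.array([32, 64, 16, 32])
D = np.array([1000, 2000, 500, 800])
B_max = np.array([20.0, 20.0])
print(satisfies_constraints(assoc, bandwidth, batch, D, B_max))  # True
```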
The problem has dynamic constraints and a long-term goal, and the current state of the system depends only on the state and the actions taken in the previous iteration, so it satisfies the Markov property and can be expressed as a Markov Decision Process (MDP), i.e. MDP = {S, A, γ, R}, wherein S represents the state space, A represents the action space, γ represents the discount factor, and R represents the reward function. Meanwhile, the solution of the problem is converted into determining the optimal action selection corresponding to the current state in different states.
Further, the above problem can be translated into solving the association and bandwidth resource allocation problems between the edge servers and the participants and the computing resource allocation problem of the participants. In this problem, there are three decision variables, namely the association variable a_{k,m}^i, the mini-batch value b_k^i and the bandwidth allocation B_{k,m}(t), wherein a_{k,m}^i and b_k^i are discrete variables that only change between different aggregation rounds, while B_{k,m}(t) is a continuous variable that changes between time slots. Therefore, deep reinforcement learning with two time scales can be adopted: the aggregation round i is taken as the time interval of the slow time scale, and a DQN network is adopted on the slow time scale to generate the association policy and computing resource allocation policy in the current state; the slot length Δt is taken as the time interval of the fast time scale, and an Actor-Critic (AC) network, updated in a single step, is adopted on the fast time scale to generate the bandwidth resource allocation policy in the current state.
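The following Python skeleton sketches the two-timescale structure described above, with one DQN decision per aggregation round (slow scale) and one single-step Actor-Critic decision per time slot (fast scale); the agent and environment interfaces (dqn_agent, ac_agent, env and their methods) are illustrative placeholders, not interfaces defined by the application.

```python
def dual_timescale_training(dqn_agent, ac_agent, env, num_rounds, num_slots):
    """Two-timescale DRL loop: one DQN decision per aggregation round,
    one Actor-Critic (AC) decision per time slot within that round."""
    slow_state = env.reset()
    for i in range(num_rounds):                      # slow time scale: aggregation round i
        slow_action = dqn_agent.select(slow_state)   # association + mini-batch (computing resource) policy
        env.simulate_local_training(slow_action)
        fast_state = env.fast_state()
        for t in range(num_slots):                   # fast time scale: time slot of length delta_t
            fast_action = ac_agent.select(fast_state)        # per-slot bandwidth allocation
            next_fast_state, fast_reward, done = env.step_slot(fast_action)
            ac_agent.update_single_step(fast_state, fast_action,
                                        fast_reward, next_fast_state)  # single-step AC update
            fast_state = next_fast_state
            if done:                                 # all local models uploaded
                break
        next_slow_state, slow_reward = env.aggregate_and_evaluate()
        dqn_agent.update(slow_state, slow_action, slow_reward, next_slow_state)
        slow_state = next_slow_state
```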
Based on the above thought, the present application provides a flowchart of a distributed federated learning collaborative computing method as shown in fig. 1, which specifically includes the following steps.
Step S110: and carrying out deep reinforcement learning model training.
The deep reinforcement learning model is trained in advance in an off-line training, on-line execution manner. Training the deep reinforcement learning (DRL) model specifically means training the AC network and the DQN network. The DRL model training comprises the following sub-steps:
step S1101: and initializing the network parameters and the state information of the DRL model.
Specifically, the initialized state information includes: the parameters of the Actor network, the Critic network and the DQN network, an initialized association policy, the position coordinates [x_k, y_k] of each participant, the initial mini-batch value, the CPU frequency f_k, the position coordinates [x_m, y_m] and maximum bandwidth B_m of each edge server, the slot length Δt, the maximum number of iterations I, and the local model parameters used in the process of simulating federated learning.
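Purely as an illustration of the initialized state information listed above, the following sketch collects it into a Python dataclass; the class and field names are assumptions of this sketch, not notation from the application.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class FLInitConfig:
    """Illustrative container for the state information initialized in step S1101."""
    participant_coords: List[Tuple[float, float]]  # [x_k, y_k] for each participant
    initial_mini_batch: List[int]                  # initial mini-batch value per participant
    cpu_freq: List[float]                          # f_k, CPU cycle frequency per participant (Hz)
    server_coords: List[Tuple[float, float]]       # [x_m, y_m] for each edge server
    max_bandwidth: List[float]                     # B_m, maximum bandwidth per edge server (Hz)
    slot_length: float                             # delta t, time slot length (s)
    max_iterations: int                            # I, maximum number of aggregation rounds
    initial_association: List[int] = field(default_factory=list)  # initial association (server index per participant)

config = FLInitConfig(
    participant_coords=[(0.0, 0.0), (50.0, 20.0)],
    initial_mini_batch=[32, 32],
    cpu_freq=[2e9, 1.5e9],
    server_coords=[(25.0, 10.0)],
    max_bandwidth=[20e6],
    slot_length=0.1,
    max_iterations=200,
    initial_association=[0, 0],
)
```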
Step S1102: each participant performs training of its own local model.
A federated learning process is simulated according to the network parameters and the state information initialized in step S1101, i.e. each participant is simulated to train its local model according to the mini-batch value output by the DQN network. The purpose of simulating the federated learning process is to train the DRL model.
Preferably, each participant performs the training of its local model using the mini-batch gradient descent (MBGD) optimization method.
The local data set D_k is divided into a plurality of small batches of size b, and the local weight is updated over each small batch b through the following formula to complete the training of the local model, the training process being represented as:

ω_k^i ← ω_k^i − η · ∇F_b(ω_k^i)

wherein η represents the learning rate, ∇F_b(ω_k^i) represents the gradient of the loss function over each small batch b, and ω_k^i represents the local model of participant k in the ith iteration.
After the simulated training of the local model, the method further comprises determining the time t_k^i required by participant k in the ith round of local training, which is specifically expressed as:

t_k^i = τ · c_k · b_k^i / f_k

wherein c_k denotes the number of CPU cycles required for participant k to train a single data sample, τ denotes the number of iterations for which the participant executes the MBGD algorithm, f_k represents the CPU cycle frequency at which participant k trains, and b_k^i denotes the mini-batch value of participant k in the ith round of local training.
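A minimal NumPy sketch of the simulated local step follows: τ MBGD iterations are run, each over one mini-batch of size b, and the local training time is evaluated as τ·c_k·b_k^i / f_k as described above; the squared-error loss and all variable names are illustrative assumptions of this sketch.

```python
import numpy as np

def local_train(w, X, y, batch_size, lr, tau):
    """Run tau MBGD iterations, each on one randomly drawn mini-batch of size batch_size.
    A squared-error loss is used as an illustrative stand-in for the participant's local loss."""
    n = X.shape[0]
    rng = np.random.default_rng()
    for _ in range(tau):
        b = rng.choice(n, size=min(batch_size, n), replace=False)
        grad = X[b].T @ (X[b] @ w - y[b]) / len(b)   # gradient of the loss over mini-batch b
        w = w - lr * grad                            # w <- w - eta * grad  (local weight update)
    return w

def local_training_time(tau, c_k, batch_k, f_k):
    """Time for one round of local training: tau * c_k * b_k / f_k (seconds)."""
    return tau * c_k * batch_k / f_k

# toy example
rng = np.random.default_rng(0)
X, y = rng.normal(size=(256, 8)), rng.normal(size=256)
w = local_train(np.zeros(8), X, y, batch_size=32, lr=0.01, tau=3)
print(local_training_time(tau=3, c_k=2e4, batch_k=32, f_k=2e9))  # ~0.00096 s
```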
Step S1103: in response to completing the simulated local model training, a bandwidth allocation policy is generated and the local model transmission is simulated while updating the AC network parameters in a single step at each time slot.
Meanwhile, the AC network observes the fast-scale state s(t) of the current time slot, outputs a fast-scale action A(t), and updates the AC network parameters by means of the Bellman equation.
Specifically, the fast-scale state s(t) comprises, for each participant, the size of the local model that has not yet been transmitted, wherein ξ denotes the total size of the local model, and the transmission rate at which each participant uploads the local model in each time slot.
specifically, the available upstream data transmission rate between the ith round participant k and the edge server m is represented as:
wherein, P
kWhich represents the transmission power of the participant k,
representing the power spectral density of additive white gaussian noise,
indicating the channel gain, ψ, of the participant k and the edge server m
0Representing the channel power gain at the reference distance.
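The exact rate expression is the formula given above; as a rough illustration only, the following sketch assumes a Shannon-type rate over the allocated bandwidth with a simple distance-based channel gain, and also evaluates the upload time ξ / r; the specific way P_k, ψ_0, h_{k,m} and N_0 are combined here is an assumption of this sketch, not the application's formula.

```python
import math

def uplink_rate(bandwidth_hz, p_k, psi_0, distance_m, n0):
    """Illustrative Shannon-type uplink rate (bit/s).

    Assumes channel gain h = psi_0 / distance**2 (free-space-like path loss) and
    noise power N0 * bandwidth; both modelling choices are assumptions for this sketch."""
    h = psi_0 / distance_m ** 2
    snr = p_k * h / (n0 * bandwidth_hz)
    return bandwidth_hz * math.log2(1.0 + snr)

def upload_time(model_size_bits, rate_bps):
    """Time to upload a local model of size xi bits at the given rate: xi / r."""
    return model_size_bits / rate_bps

rate = uplink_rate(bandwidth_hz=1e6, p_k=0.1, psi_0=1e-4, distance_m=100.0, n0=1e-17)
print(rate, upload_time(model_size_bits=5e6, rate_bps=rate))
```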
The fast-scale action A(t) is the bandwidth resource allocation policy, wherein B_{k,m}(t) indicates the bandwidth allocated by edge server m to participant k in each time slot.
The fast-scale reward function R(t) is expressed as:

wherein μ(t) is a parameter for adjusting the reward function.
Discount factor γ: it is used to reduce the impact of future rewards on the current decision, so that rewards further in the future have a smaller effect. The cumulative reward obtained by selecting the fast-scale action A(t) in the fast-scale state s(t) may be defined as the discounted sum of the per-slot rewards, i.e. Σ_{t'≥t} γ^(t'−t) · R(t').
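For concreteness, the following sketch shows a generic single-step (TD(0)) Actor-Critic update of the kind used on the fast time scale, with a small Gaussian policy network and a value network; the network sizes, learning rates, state encoding and the way the per-slot reward R(t) is supplied are assumptions of this sketch rather than the application's design.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Gaussian policy over bandwidth fractions; outputs the mean of the allocation action."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.mean = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                  nn.Linear(64, action_dim), nn.Sigmoid())
        self.log_std = nn.Parameter(torch.zeros(action_dim))
    def dist(self, s):
        return torch.distributions.Normal(self.mean(s), self.log_std.exp())

class Critic(nn.Module):
    """State-value estimate V(s) used for the TD target."""
    def __init__(self, state_dim):
        super().__init__()
        self.v = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, 1))
    def forward(self, s):
        return self.v(s)

def ac_single_step_update(actor, critic, opt_a, opt_c, s, a, r, s_next, gamma=0.95):
    """Single-step (TD(0)) Actor-Critic update performed once per time slot."""
    with torch.no_grad():
        td_target = r + gamma * critic(s_next)          # Bellman target for the critic
    value = critic(s)
    critic_loss = (td_target - value).pow(2).mean()
    opt_c.zero_grad(); critic_loss.backward(); opt_c.step()

    advantage = (td_target - critic(s)).detach()        # TD error used as the advantage
    log_prob = actor.dist(s).log_prob(a).sum(dim=-1, keepdim=True)
    actor_loss = -(advantage * log_prob).mean()
    opt_a.zero_grad(); actor_loss.backward(); opt_a.step()

# minimal usage: state = [remaining model size, current rate] per participant, action = bandwidth fractions
K = 3
actor, critic = Actor(state_dim=2 * K, action_dim=K), Critic(state_dim=2 * K)
opt_a = torch.optim.Adam(actor.parameters(), lr=1e-3)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)
s = torch.rand(1, 2 * K); a = actor.dist(s).sample(); s_next = torch.rand(1, 2 * K)
ac_single_step_update(actor, critic, opt_a, opt_c, s, a, torch.tensor([[1.0]]), s_next)
```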
step S1104: and responding to the transmission of the simulated local model, simulating global model aggregation, generating a next round of association strategy and calculation resource allocation strategy, and updating the DQN network parameters.
Wherein the local model parameters of each participant are weighted through the following formula to obtain the global model parameters ω_i, and the accuracy of the global model is detected:

wherein α + β = 1 denotes the two parameters used for adjusting the weight ratio.
Since the association policy in step S1103 is initialized in advance, the association policy needs to be updated. Specifically, the current slow-scale state S is used as the input of the DQN network, the slow-scale action A is output, i.e. the association policy and the computing resource allocation policy, and the parameters of the DQN network are updated by means of the Bellman equation.
Wherein the slow-scale state is represented as S = [t_k, t_{k,m}], where t_k represents the time vector consumed by each participant in local training, and t_{k,m} represents the time vector consumed by each participant in uploading the model, whose element for participant k and edge server m is the time it takes participant k to upload the model to edge server m.
The slow-scale action is denoted as A = [a, b], wherein a represents the association vector, i.e. the updated association policy, and b represents the mini-batch vector used when each participant executes local model training, i.e. the computing resource allocation policy.
The slow-scale reward function R_i is expressed as:

wherein μ is a parameter for adjusting the reward function and acc_i indicates the accuracy of the ith-round global model.
The cumulative reward obtained by selecting the slow-scale action A in the slow-scale state S may be defined as the discounted sum of the per-round rewards, i.e. Σ_{i'≥i} γ^(i'−i) · R_{i'}.
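Similarly, the following sketch shows a generic Bellman-equation update of a DQN over a batch of slow-scale transitions; the state encoding, the discretization of the association and mini-batch actions, and all hyper-parameters are assumptions of this sketch, not the application's exact network.

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Q(s, a) over a discretized slow-scale action set (association + mini-batch choices)."""
    def __init__(self, state_dim, num_actions):
        super().__init__()
        self.q = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(), nn.Linear(128, num_actions))
    def forward(self, s):
        return self.q(s)

def dqn_update(q_net, target_net, optimizer, batch, gamma=0.9):
    """One Bellman-equation update of the DQN on a batch of slow-scale transitions."""
    s, a, r, s_next = batch                                          # tensors: (B, S), (B,), (B,), (B, S)
    with torch.no_grad():
        target = r + gamma * target_net(s_next).max(dim=1).values   # Bellman target
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)            # Q(s, a) for the taken actions
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()

# minimal usage with an assumed state = [local-training times, upload times] and 8 discrete actions
state_dim, num_actions, B = 6, 8, 4
q_net, target_net = QNet(state_dim, num_actions), QNet(state_dim, num_actions)
target_net.load_state_dict(q_net.state_dict())
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)
batch = (torch.rand(B, state_dim), torch.randint(num_actions, (B,)),
         torch.rand(B), torch.rand(B, state_dim))
print(dqn_update(q_net, target_net, opt, batch))
```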
step S1105: and detecting whether the DRL model converges or reaches the maximum iteration number.
If neither convergence nor the maximum number of iterations has been reached, the iteration count is incremented by 1, steps S1102-S1104 are repeated to start the next iteration, and the global model is taken as the local model of each participant to re-simulate the local model training.
In the next iteration, the association policy generated in the previous iteration and the mini-batch vector required for the next round of local model training are used; the AC network then generates a new bandwidth allocation policy from the fast-scale state space observed in the current time slot, and the DQN network generates a new association policy and computing resource allocation policy from the slow-scale state space. By analogy, the bandwidth resource allocation policy, the association policy and the computing resource allocation policy are continuously updated.
If convergence or the maximum iteration number is reached, training of the AC network and the DQN network is completed, that is, training of the DRL model is completed, and step S1106 is performed.
Step S1106: and sending each parameter of the trained DRL model to an edge server.
The edge server loads a DRL model, namely the trained AC network and DQN network, and is used for generating an association strategy and a bandwidth and computing resource allocation strategy in the current state, and completing the deployment of the DRL model.
Step S120: in response to the trained DRL model being deployed to each edge server, federated learning is performed.
Since the DRL model is to solve the problem of minimizing the federal learning delay, the DRL model is applied to the federal learning process in step S120 after the DRL model is trained in step S110.
Wherein step S120 specifically includes the following substeps:
step S1201: the local model is initialized.
Wherein a suitable metal surface defect detection model selected by a designated participant is used as the local model.
Specifically, the parameters of the metal surface defect detection model, the learning rate, the initial mini-batch value and the number of iterations of the metal surface defect detection model are broadcast to the other participants through an edge server, and each participant uses the metal surface defect detection model as its local model to complete the initialization of the local model.
Step S1202: in response to completion of the initialization of the local model, each participant performs local model training according to the computing resource allocation policy in the current state.
In this step, the calculation resource allocation policy in the current state is the calculation resource allocation policy output by the trained DQN network after step S110 is executed.
The local model is trained according to existing methods, which are not described herein.
Step S1203: and each participant uploads the local model parameters trained by the participant to the edge server respectively according to the association strategy and the bandwidth resource allocation strategy.
Specifically, the association policy and the bandwidth resource allocation policy at this time are the association policy and the bandwidth resource allocation policy output by the AC network and the DQN network after the step S110 is executed.
Step S1204: and carrying out global model aggregation on the local model uploaded by each participant, and sending the global model parameters and the calculation resource allocation strategy to each participant.
Specifically, the local models uploaded by all the participants are aggregated into a global model.
In the aggregation process, the edge server that temporarily serves as the central server is selected according to the position information of the edge servers, specifically according to the following formula:

wherein [x_m, y_m] denotes the position coordinates of each edge server, and the set M = {1, 2, …, M} denotes all the small base stations serving as edge servers.
Further, after the temporary central server is obtained according to the above formula, the temporary central server weights the local model parameters of each participant through the following formula to finally obtain the global model parameters ω_i:

wherein α + β = 1 denotes the two parameters used for adjusting the weight ratio.
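As a rough sketch of the aggregation step, the code below (i) selects the edge server closest to the centroid of the participants as the temporary central server and (ii) aggregates the local models with weights mixing a data-size term and a mini-batch (computation) term through α and β with α + β = 1; both the selection rule and the exact weighting form are assumptions of this sketch, since the application only states that selection uses the servers' position information and that the weights reflect each participant's amount of computation.

```python
import numpy as np

def pick_temporary_central_server(server_coords, participant_coords):
    """Assumed rule: choose the edge server closest to the centroid of all participants."""
    centroid = np.mean(participant_coords, axis=0)
    dists = np.linalg.norm(np.asarray(server_coords) - centroid, axis=1)
    return int(np.argmin(dists))

def aggregate_global_model(local_models, data_sizes, mini_batches, alpha=0.5, beta=0.5):
    """Assumed weighting: w_k = alpha * D_k / sum(D) + beta * b_k / sum(b), with alpha + beta = 1."""
    D = np.asarray(data_sizes, dtype=float)
    b = np.asarray(mini_batches, dtype=float)
    weights = alpha * D / D.sum() + beta * b / b.sum()
    return sum(w * m for w, m in zip(weights, local_models))

# toy example with 3 participants and 2 edge servers
local_models = [np.ones(4) * v for v in (1.0, 2.0, 3.0)]
print(pick_temporary_central_server([(0, 0), (10, 10)], [(9, 9), (11, 10), (10, 12)]))  # 1
print(aggregate_global_model(local_models, data_sizes=[100, 200, 300], mini_batches=[16, 32, 32]))
```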
At this time, the computing resource allocation policy sent to each participant is the computing resource allocation policy required for the next iteration after steps S1202 and S1203 have been executed. As the local model is trained in step S1202, the time vector t_k consumed by the local training of each participant changes; as each participant uploads its model in step S1203, the time vector t_{k,m} consumed by uploading also changes. Consequently, the current slow-scale state S = [t_k, t_{k,m}] changes, and the resulting slow-scale action A = [a, b] changes as well; that is, the mini-batch vector used in the next iteration changes, and this change in the mini-batch vector brings about a change in the computing resource allocation policy, i.e. the computing resource allocation policy used in the next iteration changes.
Step S1205: and judging whether the global model reaches the preset convergence precision or the maximum iteration number.
If the global model has reached neither the preset convergence accuracy nor the maximum number of iterations, the iteration count is incremented by 1 and step S1202 is re-executed, i.e. the local model is re-trained.
The local model is re-trained according to the global model and the computing resource allocation policy sent to each participant in step S1204.
Specifically, the global model received by each participant is used as the local model again, and the local model is retrained again according to the calculation resource allocation strategy sent to each participant in step S1204 and required by the next iteration. I.e. steps S1202-1204 are repeatedly performed.
If the global model reaches the preset convergence accuracy or reaches the maximum iteration number, ignoring the global model and the calculation resource allocation strategy sent to each participant in step S1204, and performing step S130 without performing the training of the local model.
Step S130: the federal learning process is ended.
As shown in fig. 2, the distributed federated learning collaborative computing system provided by the present application specifically includes: a deep reinforcement learning model training unit 210 and a federated learning unit 220.
The deep reinforcement learning model training unit 210 is configured to perform deep reinforcement learning model training.
The federated learning unit 220 is connected to the deep reinforcement learning model training unit 210 and is configured to perform federated learning according to the association policy and the computing and bandwidth resource allocation policies generated by the deep reinforcement learning model.
The above-mentioned embodiments are only specific embodiments of the present application, used to illustrate the technical solutions of the present application rather than to limit them, and the protection scope of the present application is not limited thereto. Although the present application is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications or easily conceived changes to the technical solutions described in the foregoing embodiments, or equivalent replacements of some of their technical features, can still be made within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application and are intended to be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.