Disclosure of Invention
Based on this, the application provides a distributed federated learning collaborative computing method for an intelligent factory, which ensures the security of the federated learning process and solves the problems of association between edge servers and participants, bandwidth resource allocation, and computing resource allocation of the participants by using deep reinforcement learning (DRL).
In order to achieve the above object, the present application provides a distributed federated learning collaborative computing method, which specifically includes the following steps: carrying out deep reinforcement learning model training; respectively deploying the trained deep reinforcement learning model to each edge server for federated learning; and ending the federated learning.
As above, the deep reinforcement learning model training specifically includes the following sub-steps: initializing network parameters and state information of the deep reinforcement learning model; each participant training a local model according to the network parameters and state information initialized for the deep reinforcement learning model; in response to completion of the simulated training of the local model, generating a bandwidth allocation policy and updating the AC network parameters in a single step at each time slot; in response to completion of the simulated transmission of the local model, generating an association policy and a computing resource allocation policy and updating the DQN network parameters; detecting whether the deep reinforcement learning model has converged or reached the maximum number of iterations; and if the model has neither converged nor reached the maximum number of iterations, starting the next iteration and carrying out the training of the local model again.
As above, wherein a metal surface defect detection model is used as the local model.
As above, the initialized state information specifically includes: the parameters and convergence accuracy of the Actor network, the Critic network and the DQN network, the position coordinates [x_k, y_k] of each participant, the initial mini-batch value, the CPU frequency f_k, the position coordinates [x_m, y_m] and maximum bandwidth B_m of each edge server, the slot length Δt and the maximum number of iterations I.
As above, the training process of the participant for the local model is as follows: the local data set D_k is divided into a plurality of small batches of size b, and the local weight is updated over each small batch b through the following formula to complete the training of the local model, the training process being represented as:

ω_k^i ← ω_k^i − η · ∇F_b(ω_k^i)

wherein η represents the learning rate, ∇F_b(ω_k^i) represents the gradient of the loss function over each small batch b, and ω_k^i represents the local model of participant k in the ith iteration.
As above, wherein after the simulated training of the local model, the method further comprises determining the time t_k^i required by participant k in the ith round of local training, which is specifically expressed as:

t_k^i = τ · c_k · b_k^i / f_k

wherein c_k denotes the number of CPU cycles required for participant k to train a single data sample, τ denotes the number of iterations for which the participant executes the MBGD algorithm, f_k represents the CPU cycle frequency at which participant k trains, and b_k^i indicates the mini-batch value of participant k in the ith round of local training.
As above, the current fast-scale state space is used as the input of the AC network so as to obtain a fast-scale action space, i.e. a bandwidth resource allocation policy. The fast-scale state space s(t) comprises the size of the local model that each participant has not yet finished transmitting and the transmission rate at which each participant uploads its model in each time slot, where t denotes a time slot and Δt denotes the slot length. The fast-scale action space A(t) is the bandwidth resource allocation policy, wherein B_{k,m}(t) indicates the bandwidth allocated by edge server m to participant k in each time slot.
In the above, in the process of uploading the trained local model parameters to the edge server according to the determined bandwidth resource allocation policy, the available uplink data transmission rate r_{k,m}^i between the ith-round participant k and the edge server m is expressed as:

wherein P_k represents the transmission power of participant k, N_0 represents the power spectral density of the additive white Gaussian noise, h_{k,m} denotes the channel gain between participant k and edge server m, and ψ_0 denotes the channel power gain at the reference distance.
The method further comprises determining the time t_{k,m}^i for the ith-round participant k to upload the local model parameters to the edge server m, which is specifically expressed as:

t_{k,m}^i = ξ / r_{k,m}^i

wherein ξ represents the size of the metal surface defect detection model (the local model), and r_{k,m}^i indicates the available uplink data transmission rate between the ith-round participant k and the edge server m.
A distributed federated learning collaborative computing system, comprising a deep reinforcement learning unit and a federated learning unit; the deep reinforcement learning unit is used for carrying out deep reinforcement learning model training; and the federated learning unit is used for performing federated learning according to the association policy and the computing and bandwidth resource allocation policies generated by the deep reinforcement learning model.
The application has the following beneficial effects:
(1) Aiming at a distributed federated learning framework, the distributed federated learning collaborative computing method and system provided by the embodiments break the dependence of traditional federated learning on a central server and effectively ensure privacy protection and security in the federated learning process.
(2) The distributed federated learning collaborative computing method and system provided by the embodiments achieve the design goal of minimizing the total time delay of federated learning from two angles, i.e. simultaneously reducing the total number of iteration rounds and the time consumed by each round, make full use of the computing and communication resources of each participant and edge server, and maximize the utility of federated learning.
(3) The distributed federated learning collaborative computing method and system provided by the embodiments take into account the influence of the amount of computation of each participant on the model accuracy, adjust the weight occupied by the local model of each participant in the global aggregation process, ensure the fairness of the aggregation process, and help accelerate model convergence.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The method and device solve the problem of minimizing the total time delay in a distributed federated learning system framework, namely minimizing the total time delay required for the global model to reach the target accuracy, with emphasis on the problems of association between edge servers and participants, bandwidth resource allocation, and computing resource allocation of the participants.
Scene assumption: the application uses the set K = {1, 2, …, K} to represent all the participants of federated learning, and the size of the data set of participant k is denoted D_k. For each sample d_n = {x_n, y_n} in the data set, x_n denotes the input vector and y_n denotes the output label corresponding to the vector x_n; [x_k, y_k] denotes the location coordinates of participant k. All small base stations serving as edge servers are represented by the set M = {1, 2, …, M}, and [x_m, y_m] denotes the location coordinates of edge server m. In addition, the iteration rounds of federated learning are represented by I = {1, 2, …, I}; a_{k,m}^i = 1 indicates that participant k establishes a communication connection with edge server m in the ith iteration, and a_{k,m}^i = 0 otherwise; b_k^i denotes the mini-batch value of participant k in the ith round of local training. All time slots of each iteration are denoted by T = {1, 2, …, T}, Δt denotes the slot length, and B_{k,m}(t) denotes the bandwidth allocated by edge server m to participant k in each time slot; ω_i denotes the global model of the ith round, and ω_k^i denotes the local model of participant k in the ith iteration.
The technical problem to be solved by the present application is how to minimize the total time delay of collaborative computation in the federated learning process, which is specifically expressed as follows:

wherein C1 indicates that each participant can only connect to one edge server; C2 indicates that each edge server is connected to at least one participant; C3 indicates that each edge server does not allocate bandwidth beyond its maximum bandwidth capacity; and C4 indicates that the mini-batch value of each participant in each round does not exceed the size of that participant's data set.
In the above, t_k^i represents the time required by participant k in the ith round of local training; B_{k,m}(t) represents the bandwidth allocated by edge server m to participant k in each time slot; a_{k,m}^i = 1 indicates that participant k establishes a communication connection with edge server m in the ith iteration, and a_{k,m}^i = 0 otherwise; D_k denotes the size of the data set of participant k; b_k^i indicates the mini-batch value of participant k in the ith round of local training; and B_m represents the maximum bandwidth of each edge server.
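For illustration only, the following Python sketch checks a candidate decision against constraints C1-C4 above; the function and variable names (satisfies_constraints, assoc, bandwidth, batch, D, B_max) are assumptions introduced for this sketch and are not part of the application.

```python
import numpy as np

def satisfies_constraints(assoc, bandwidth, batch, D, B_max):
    """Check a candidate decision against constraints C1-C4.

    assoc:     (K, M) 0/1 matrix, assoc[k, m] = 1 if participant k connects to edge server m
    bandwidth: (K, M) matrix, bandwidth allocated by server m to participant k in a slot
    batch:     length-K vector of mini-batch values
    D:         length-K vector of data set sizes
    B_max:     length-M vector of maximum bandwidth per edge server
    """
    c1 = np.all(assoc.sum(axis=1) == 1)                    # C1: each participant connects to exactly one server
    c2 = np.all(assoc.sum(axis=0) >= 1)                    # C2: each server serves at least one participant
    c3 = np.all((bandwidth * assoc).sum(axis=0) <= B_max)  # C3: allocated bandwidth within server capacity
    c4 = np.all(batch <= D)                                # C4: mini-batch value does not exceed data set size
    return bool(c1 and c2 and c3 and c4)

# toy example: 4 participants, 2 edge servers
assoc = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
bandwidth = np.array([[5.0, 0.0], [5.0, 0.0], [0.0, 4.0], [0.0, 6.0]])
batch = np.array([32, 64, 16, 32])
D = np.array([1000, 2000, 500, 800])
B_max = np.array([20.0, 20.0])
print(satisfies_constraints(assoc, bandwidth, batch, D, B_max))  # True
```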
The problem has dynamic constraints and a long-term goal, and the current state of the system depends only on the state and the actions taken in the previous iteration, so it satisfies the Markov property and can be expressed as a Markov Decision Process (MDP), i.e. MDP = {S, A, γ, R}, wherein S represents the state space, A represents the action space, γ represents the discount factor, and R represents the reward function. Meanwhile, the solution of the problem is converted into determining the optimal action selection corresponding to the current state in different states.
Further, the above problem can be translated into solving the association and bandwidth resource allocation problems between the edge servers and the participants and the computing resource allocation problem of the participants. In this problem, there are three decision variables, namely the association variable a_{k,m}^i, the mini-batch value b_k^i and the bandwidth allocation B_{k,m}(t), wherein a_{k,m}^i and b_k^i are discrete variables that only change between different aggregation rounds, while B_{k,m}(t) is a continuous variable that changes between time slots. Therefore, deep reinforcement learning with two time scales can be adopted: the aggregation round i is taken as the time interval of the slow time scale, and a DQN network is adopted on the slow time scale to generate the association policy and computing resource allocation policy in the current state; the slot length Δt is taken as the time interval of the fast time scale, and an Actor-Critic (AC) network, updated in a single step, is adopted on the fast time scale to generate the bandwidth resource allocation policy in the current state.
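The following Python skeleton sketches the two-timescale structure described above, with one DQN decision per aggregation round (slow scale) and one single-step Actor-Critic decision per time slot (fast scale); the agent and environment interfaces (dqn_agent, ac_agent, env and their methods) are illustrative placeholders, not interfaces defined by the application.

```python
def dual_timescale_training(dqn_agent, ac_agent, env, num_rounds, num_slots):
    """Two-timescale DRL loop: one DQN decision per aggregation round,
    one Actor-Critic (AC) decision per time slot within that round."""
    slow_state = env.reset()
    for i in range(num_rounds):                      # slow time scale: aggregation round i
        slow_action = dqn_agent.select(slow_state)   # association + mini-batch (computing resource) policy
        env.simulate_local_training(slow_action)
        fast_state = env.fast_state()
        for t in range(num_slots):                   # fast time scale: time slot of length delta_t
            fast_action = ac_agent.select(fast_state)        # per-slot bandwidth allocation
            next_fast_state, fast_reward, done = env.step_slot(fast_action)
            ac_agent.update_single_step(fast_state, fast_action,
                                        fast_reward, next_fast_state)  # single-step AC update
            fast_state = next_fast_state
            if done:                                 # all local models uploaded
                break
        next_slow_state, slow_reward = env.aggregate_and_evaluate()
        dqn_agent.update(slow_state, slow_action, slow_reward, next_slow_state)
        slow_state = next_slow_state
```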
Based on the above thought, the present application provides a flowchart of a distributed federated learning collaborative computing method as shown in fig. 1, which specifically includes the following steps.
Step S110: and carrying out deep reinforcement learning model training.
The deep reinforcement learning model is trained in advance in an off-line training, on-line execution manner. Training the deep reinforcement learning (DRL) model specifically means training the AC network and the DQN network. The DRL model training comprises the following sub-steps:
step S1101: and initializing the network parameters and the state information of the DRL model.
Specifically, the initialized state information includes: the parameters of the Actor network, the Critic network and the DQN network, an initialized association policy, the position coordinates [x_k, y_k] of each participant, the initial mini-batch value, the CPU frequency f_k, the position coordinates [x_m, y_m] and maximum bandwidth B_m of each edge server, the slot length Δt, the maximum number of iterations I, and the local model parameters used in the process of simulating federated learning.
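Purely as an illustration of the initialized state information listed above, the following sketch collects it into a Python dataclass; the class and field names are assumptions of this sketch, not notation from the application.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class FLInitConfig:
    """Illustrative container for the state information initialized in step S1101."""
    participant_coords: List[Tuple[float, float]]  # [x_k, y_k] for each participant
    initial_mini_batch: List[int]                  # initial mini-batch value per participant
    cpu_freq: List[float]                          # f_k, CPU cycle frequency per participant (Hz)
    server_coords: List[Tuple[float, float]]       # [x_m, y_m] for each edge server
    max_bandwidth: List[float]                     # B_m, maximum bandwidth per edge server (Hz)
    slot_length: float                             # delta t, time slot length (s)
    max_iterations: int                            # I, maximum number of aggregation rounds
    initial_association: List[int] = field(default_factory=list)  # initial association (server index per participant)

config = FLInitConfig(
    participant_coords=[(0.0, 0.0), (50.0, 20.0)],
    initial_mini_batch=[32, 32],
    cpu_freq=[2e9, 1.5e9],
    server_coords=[(25.0, 10.0)],
    max_bandwidth=[20e6],
    slot_length=0.1,
    max_iterations=200,
    initial_association=[0, 0],
)
```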
Step S1102: each participant performs training of its own local model.
A federated learning process is simulated according to the network parameters and the state information initialized in step S1101, i.e. each participant is simulated to train its local model according to the mini-batch value output by the DQN network. The purpose of simulating the federated learning process is to train the DRL model.
Preferably, each participant performs the training of its local model using the mini-batch gradient descent (MBGD) optimization method.
The local data set D_k is divided into a plurality of small batches of size b, and the local weight is updated over each small batch b through the following formula to complete the training of the local model, the training process being represented as:

ω_k^i ← ω_k^i − η · ∇F_b(ω_k^i)

wherein η represents the learning rate, ∇F_b(ω_k^i) represents the gradient of the loss function over each small batch b, and ω_k^i represents the local model of participant k in the ith iteration.
After the simulated training of the local model, the method further comprises determining the time t_k^i required by participant k in the ith round of local training, which is specifically expressed as:

t_k^i = τ · c_k · b_k^i / f_k

wherein c_k denotes the number of CPU cycles required for participant k to train a single data sample, τ denotes the number of iterations for which the participant executes the MBGD algorithm, f_k represents the CPU cycle frequency at which participant k trains, and b_k^i denotes the mini-batch value of participant k in the ith round of local training.
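A minimal NumPy sketch of the simulated local step follows: τ MBGD iterations are run, each over one mini-batch of size b, and the local training time is evaluated as τ·c_k·b_k^i / f_k as described above; the squared-error loss and all variable names are illustrative assumptions of this sketch.

```python
import numpy as np

def local_train(w, X, y, batch_size, lr, tau):
    """Run tau MBGD iterations, each on one randomly drawn mini-batch of size batch_size.
    A squared-error loss is used as an illustrative stand-in for the participant's local loss."""
    n = X.shape[0]
    rng = np.random.default_rng()
    for _ in range(tau):
        b = rng.choice(n, size=min(batch_size, n), replace=False)
        grad = X[b].T @ (X[b] @ w - y[b]) / len(b)   # gradient of the loss over mini-batch b
        w = w - lr * grad                            # w <- w - eta * grad  (local weight update)
    return w

def local_training_time(tau, c_k, batch_k, f_k):
    """Time for one round of local training: tau * c_k * b_k / f_k (seconds)."""
    return tau * c_k * batch_k / f_k

# toy example
rng = np.random.default_rng(0)
X, y = rng.normal(size=(256, 8)), rng.normal(size=256)
w = local_train(np.zeros(8), X, y, batch_size=32, lr=0.01, tau=3)
print(local_training_time(tau=3, c_k=2e4, batch_k=32, f_k=2e9))  # ~0.00096 s
```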
Step S1103: in response to completing the simulated local model training, a bandwidth allocation policy is generated and the local model transmission is simulated while updating the AC network parameters in a single step at each time slot.
Meanwhile, the AC network observes the fast-scale state s(t) of the current time slot, outputs a fast-scale action A(t), and updates the AC network parameters by means of the Bellman equation.
Specifically, the fast-scale state s(t) comprises, for each participant, the size of the local model that has not yet been transmitted, wherein ξ denotes the total size of the local model, and the transmission rate at which each participant uploads the local model in each time slot.
specifically, the available upstream data transmission rate between the ith round participant k and the edge server m is represented as:
wherein, P
kWhich represents the transmission power of the participant k,
representing the power spectral density of additive white gaussian noise,
indicating the channel gain, ψ, of the participant k and the edge server m
0Representing the channel power gain at the reference distance.
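The exact rate expression is the formula given above; as a rough illustration only, the following sketch assumes a Shannon-type rate over the allocated bandwidth with a simple distance-based channel gain, and also evaluates the upload time ξ / r; the specific way P_k, ψ_0, h_{k,m} and N_0 are combined here is an assumption of this sketch, not the application's formula.

```python
import math

def uplink_rate(bandwidth_hz, p_k, psi_0, distance_m, n0):
    """Illustrative Shannon-type uplink rate (bit/s).

    Assumes channel gain h = psi_0 / distance**2 (free-space-like path loss) and
    noise power N0 * bandwidth; both modelling choices are assumptions for this sketch."""
    h = psi_0 / distance_m ** 2
    snr = p_k * h / (n0 * bandwidth_hz)
    return bandwidth_hz * math.log2(1.0 + snr)

def upload_time(model_size_bits, rate_bps):
    """Time to upload a local model of size xi bits at the given rate: xi / r."""
    return model_size_bits / rate_bps

rate = uplink_rate(bandwidth_hz=1e6, p_k=0.1, psi_0=1e-4, distance_m=100.0, n0=1e-17)
print(rate, upload_time(model_size_bits=5e6, rate_bps=rate))
```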
The fast-scale action A(t) is the bandwidth resource allocation policy, wherein B_{k,m}(t) indicates the bandwidth allocated by edge server m to participant k in each time slot.
The fast-scale reward function R(t) is expressed as:

wherein μ(t) is a parameter for adjusting the reward function.
Discount factor γ: it is used to reduce the impact of future rewards on the current decision, so that rewards further in the future have a smaller effect. The cumulative reward obtained by selecting the fast-scale action A(t) in the fast-scale state s(t) may be defined as the discounted sum of the per-slot rewards, i.e. Σ_{t'≥t} γ^(t'−t) · R(t').
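For concreteness, the following sketch shows a generic single-step (TD(0)) Actor-Critic update of the kind used on the fast time scale, with a small Gaussian policy network and a value network; the network sizes, learning rates, state encoding and the way the per-slot reward R(t) is supplied are assumptions of this sketch rather than the application's design.

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Gaussian policy over bandwidth fractions; outputs the mean of the allocation action."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.mean = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                  nn.Linear(64, action_dim), nn.Sigmoid())
        self.log_std = nn.Parameter(torch.zeros(action_dim))
    def dist(self, s):
        return torch.distributions.Normal(self.mean(s), self.log_std.exp())

class Critic(nn.Module):
    """State-value estimate V(s) used for the TD target."""
    def __init__(self, state_dim):
        super().__init__()
        self.v = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, 1))
    def forward(self, s):
        return self.v(s)

def ac_single_step_update(actor, critic, opt_a, opt_c, s, a, r, s_next, gamma=0.95):
    """Single-step (TD(0)) Actor-Critic update performed once per time slot."""
    with torch.no_grad():
        td_target = r + gamma * critic(s_next)          # Bellman target for the critic
    value = critic(s)
    critic_loss = (td_target - value).pow(2).mean()
    opt_c.zero_grad(); critic_loss.backward(); opt_c.step()

    advantage = (td_target - critic(s)).detach()        # TD error used as the advantage
    log_prob = actor.dist(s).log_prob(a).sum(dim=-1, keepdim=True)
    actor_loss = -(advantage * log_prob).mean()
    opt_a.zero_grad(); actor_loss.backward(); opt_a.step()

# minimal usage: state = [remaining model size, current rate] per participant, action = bandwidth fractions
K = 3
actor, critic = Actor(state_dim=2 * K, action_dim=K), Critic(state_dim=2 * K)
opt_a = torch.optim.Adam(actor.parameters(), lr=1e-3)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)
s = torch.rand(1, 2 * K); a = actor.dist(s).sample(); s_next = torch.rand(1, 2 * K)
ac_single_step_update(actor, critic, opt_a, opt_c, s, a, torch.tensor([[1.0]]), s_next)
```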
step S1104: and responding to the transmission of the simulated local model, simulating global model aggregation, generating a next round of association strategy and calculation resource allocation strategy, and updating the DQN network parameters.
Wherein the local model parameters of each participant are weighted through the following formula to obtain the global model parameters ω_i, and the accuracy of the global model is detected:

wherein α + β = 1 denotes the two parameters used for adjusting the weight ratio.
Since the association policy in step S1103 is initialized in advance, the association policy needs to be updated. Specifically, the current slow-scale state S is used as the input of the DQN network, the slow-scale action A is output, i.e. the association policy and the computing resource allocation policy, and the parameters of the DQN network are updated by means of the Bellman equation.
Wherein the slow-scale state is represented as S = [t_k, t_{k,m}], where t_k represents the time vector consumed by each participant in local training, and t_{k,m} represents the time vector consumed by each participant in uploading the model, whose element for participant k and edge server m is the time it takes participant k to upload the model to edge server m.
The slow-scale action is denoted as A = [a, b], wherein a represents the association vector, i.e. the updated association policy, and b represents the mini-batch vector used when each participant executes local model training, i.e. the computing resource allocation policy.
The slow-scale reward function R_i is expressed as:

wherein μ is a parameter for adjusting the reward function and acc_i indicates the accuracy of the ith-round global model.
The cumulative reward obtained by selecting the slow-scale action A in the slow-scale state S may be defined as the discounted sum of the per-round rewards, i.e. Σ_{i'≥i} γ^(i'−i) · R_{i'}.
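Similarly, the following sketch shows a generic Bellman-equation update of a DQN over a batch of slow-scale transitions; the state encoding, the discretization of the association and mini-batch actions, and all hyper-parameters are assumptions of this sketch, not the application's exact network.

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Q(s, a) over a discretized slow-scale action set (association + mini-batch choices)."""
    def __init__(self, state_dim, num_actions):
        super().__init__()
        self.q = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(), nn.Linear(128, num_actions))
    def forward(self, s):
        return self.q(s)

def dqn_update(q_net, target_net, optimizer, batch, gamma=0.9):
    """One Bellman-equation update of the DQN on a batch of slow-scale transitions."""
    s, a, r, s_next = batch                                          # tensors: (B, S), (B,), (B,), (B, S)
    with torch.no_grad():
        target = r + gamma * target_net(s_next).max(dim=1).values   # Bellman target
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)            # Q(s, a) for the taken actions
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad(); loss.backward(); optimizer.step()
    return loss.item()

# minimal usage with an assumed state = [local-training times, upload times] and 8 discrete actions
state_dim, num_actions, B = 6, 8, 4
q_net, target_net = QNet(state_dim, num_actions), QNet(state_dim, num_actions)
target_net.load_state_dict(q_net.state_dict())
opt = torch.optim.Adam(q_net.parameters(), lr=1e-3)
batch = (torch.rand(B, state_dim), torch.randint(num_actions, (B,)),
         torch.rand(B), torch.rand(B, state_dim))
print(dqn_update(q_net, target_net, opt, batch))
```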
step S1105: and detecting whether the DRL model converges or reaches the maximum iteration number.
If neither convergence nor the maximum number of iterations has been reached, the iteration count is incremented by 1, steps S1102-S1104 are repeated to start the next iteration, and the global model is taken as the local model of each participant to re-simulate the local model training.
In the next iteration, the association policy generated in the previous iteration and the mini-batch vector required for the next round of local model training are used; the AC network then generates a new bandwidth allocation policy from the fast-scale state space observed in the current time slot, and the DQN network generates a new association policy and computing resource allocation policy from the slow-scale state space. By analogy, the bandwidth resource allocation policy, the association policy and the computing resource allocation policy are continuously updated.
If convergence or the maximum iteration number is reached, training of the AC network and the DQN network is completed, that is, training of the DRL model is completed, and step S1106 is performed.
Step S1106: and sending each parameter of the trained DRL model to an edge server.
The edge server loads a DRL model, namely the trained AC network and DQN network, and is used for generating an association strategy and a bandwidth and computing resource allocation strategy in the current state, and completing the deployment of the DRL model.
Step S120: in response to the trained DRL model being deployed to each edge server, federated learning is performed.
Since the DRL model is to solve the problem of minimizing the federal learning delay, the DRL model is applied to the federal learning process in step S120 after the DRL model is trained in step S110.
Wherein step S120 specifically includes the following substeps:
step S1201: the local model is initialized.
Wherein a suitable metal surface defect detection model selected by a designated participant is used as the local model.
Specifically, the parameters of the metal surface defect detection model, the learning rate, the initial mini-batch value and the number of iterations of the metal surface defect detection model are broadcast to the other participants through an edge server, and each participant uses the metal surface defect detection model as its local model to complete the initialization of the local model.
Step S1202: in response to completion of the initialization of the local model, each participant performs local model training according to the computing resource allocation policy in the current state.
In this step, the calculation resource allocation policy in the current state is the calculation resource allocation policy output by the trained DQN network after step S110 is executed.
The local model is trained according to existing methods, which are not described herein.
Step S1203: and each participant uploads the local model parameters trained by the participant to the edge server respectively according to the association strategy and the bandwidth resource allocation strategy.
Specifically, the association policy and the bandwidth resource allocation policy at this time are the association policy and the bandwidth resource allocation policy output by the AC network and the DQN network after the step S110 is executed.
Step S1204: and carrying out global model aggregation on the local model uploaded by each participant, and sending the global model parameters and the calculation resource allocation strategy to each participant.
Specifically, the local models uploaded by all the participants are aggregated into a global model.
In the aggregation process, the edge server that temporarily serves as the central server is selected according to the position information of the edge servers, specifically according to the following formula:

wherein [x_m, y_m] denotes the position coordinates of each edge server, and the set M = {1, 2, …, M} denotes all the small base stations serving as edge servers.
Further, after the temporary central server is obtained according to the above formula, the temporary central server weights the local model parameters of each participant through the following formula to finally obtain the global model parameters ω_i:

wherein α + β = 1 denotes the two parameters used for adjusting the weight ratio.
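As a rough sketch of the aggregation step, the code below (i) selects the edge server closest to the centroid of the participants as the temporary central server and (ii) aggregates the local models with weights mixing a data-size term and a mini-batch (computation) term through α and β with α + β = 1; both the selection rule and the exact weighting form are assumptions of this sketch, since the application only states that selection uses the servers' position information and that the weights reflect each participant's amount of computation.

```python
import numpy as np

def pick_temporary_central_server(server_coords, participant_coords):
    """Assumed rule: choose the edge server closest to the centroid of all participants."""
    centroid = np.mean(participant_coords, axis=0)
    dists = np.linalg.norm(np.asarray(server_coords) - centroid, axis=1)
    return int(np.argmin(dists))

def aggregate_global_model(local_models, data_sizes, mini_batches, alpha=0.5, beta=0.5):
    """Assumed weighting: w_k = alpha * D_k / sum(D) + beta * b_k / sum(b), with alpha + beta = 1."""
    D = np.asarray(data_sizes, dtype=float)
    b = np.asarray(mini_batches, dtype=float)
    weights = alpha * D / D.sum() + beta * b / b.sum()
    return sum(w * m for w, m in zip(weights, local_models))

# toy example with 3 participants and 2 edge servers
local_models = [np.ones(4) * v for v in (1.0, 2.0, 3.0)]
print(pick_temporary_central_server([(0, 0), (10, 10)], [(9, 9), (11, 10), (10, 12)]))  # 1
print(aggregate_global_model(local_models, data_sizes=[100, 200, 300], mini_batches=[16, 32, 32]))
```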
At this time, the computing resource allocation policy sent to each participant is the computing resource allocation policy required for the next iteration after steps S1202 and S1203 have been executed. As the local model is trained in step S1202, the time vector t_k consumed by the local training of each participant changes; as each participant uploads its model in step S1203, the time vector t_{k,m} consumed by uploading also changes. Consequently, the current slow-scale state S = [t_k, t_{k,m}] changes, and the resulting slow-scale action A = [a, b] changes as well; that is, the mini-batch vector used in the next iteration changes, and this change in the mini-batch vector brings about a change in the computing resource allocation policy, i.e. the computing resource allocation policy used in the next iteration changes.
Step S1205: and judging whether the global model reaches the preset convergence precision or the maximum iteration number.
If the global model has reached neither the preset convergence accuracy nor the maximum number of iterations, the iteration count is incremented by 1 and step S1202 is re-executed, i.e. the local model is re-trained.
The local model is re-trained according to the global model and the computing resource allocation policy sent to each participant in step S1204.
Specifically, the global model received by each participant is used as the local model again, and the local model is retrained again according to the calculation resource allocation strategy sent to each participant in step S1204 and required by the next iteration. I.e. steps S1202-1204 are repeatedly performed.
If the global model reaches the preset convergence accuracy or reaches the maximum iteration number, ignoring the global model and the calculation resource allocation strategy sent to each participant in step S1204, and performing step S130 without performing the training of the local model.
Step S130: the federal learning process is ended.
As shown in fig. 2, the distributed federated learning collaborative computing system provided by the present application specifically includes: a deep reinforcement learning model training unit 210 and a federated learning unit 220.
The deep reinforcement learning model training unit 210 is configured to perform deep reinforcement learning model training.
The federated learning unit 220 is connected to the deep reinforcement learning model training unit 210 and is configured to perform federated learning according to the association policy and the computing and bandwidth resource allocation policies generated by the deep reinforcement learning model.
The above-mentioned embodiments are only specific embodiments of the present application, used to illustrate the technical solutions of the present application rather than to limit them, and the protection scope of the present application is not limited thereto. Although the present application is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that modifications or easily conceived changes to the technical solutions described in the foregoing embodiments, or equivalent replacements of some of their technical features, can still be made within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application and are intended to be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.