CN117409294A - Cloud edge end cooperative distributed learning method and system based on self-adaptive communication frequency - Google Patents


Info

Publication number
CN117409294A
CN117409294A
Authority
CN
China
Prior art keywords
communication frequency
edge
global
image classification
client
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311408911.3A
Other languages
Chinese (zh)
Inventor
罗龙
张弛
陈栖栖
虞红芳
孙罡
Current Assignee
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN202311408911.3A priority Critical patent/CN117409294A/en
Publication of CN117409294A publication Critical patent/CN117409294A/en
Pending legal-status Critical Current

Classifications

    • G06V 10/82: Image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06N 3/098: Distributed learning, e.g. federated learning
    • G06V 10/764: Image or video recognition or understanding using classification, e.g. of video objects
    • G06V 10/95: Hardware or software architectures for image or video understanding structured as a network, e.g. client-server architectures
    • G06V 10/96: Management of image or video recognition tasks
    • H04W 28/09: Network traffic management; management of load balancing or load distribution
    • H04W 28/20: Negotiating bandwidth


Abstract

The invention discloses a cloud-edge-end collaborative distributed learning method and system based on adaptive communication frequency. The method comprises: receiving the local image classification models uploaded by all edge servers and performing a global aggregation update to obtain a global image classification model; collecting the Lipschitz constants and gradient estimation variances of all clients, uploaded by all edge servers during training, and estimating the optimal communication frequencies of the best-performing edge server and clients; adjusting the communication frequency of each edge server according to its computation and communication performance; sending the global image classification model, the communication frequency, and the optimal client communication frequency to each edge server i; and judging whether the communication resources exceed the limit: if so, ending the training of the global image classification model and issuing it to the clients and edge servers; otherwise, returning to the first step.

Description

Cloud edge end cooperative distributed learning method and system based on self-adaptive communication frequency
Technical Field
The invention relates to information sharing technology, and in particular to a cloud-edge-end collaborative distributed learning method and system based on adaptive communication frequency.
Background
In recent years, artificial intelligence and machine learning have developed rapidly, and technologies such as cloud computing and blockchain have advanced alongside the continuous growth of computing power; machine learning is one of the core technologies supporting the future intelligent society. Machine learning extracts useful information from massive training data through iterative training and outputs a model meeting the accuracy requirements of each specific application scenario. For example, a mobile device (client) may train an image classification/recognition model; because the samples in its local data set are limited, it can learn from the experience of models on other mobile devices through data sharing, improving the classification accuracy of its local image classification model.
As the number of mobile devices and Internet-of-Things devices connected to the Internet has proliferated, the network edge generates large amounts of data. Traditional cloud-centric model training on high-performance data center clusters faces extremely high communication costs: transferring massive amounts of training data from different mobile devices to a single cloud computing data center is slow and expensive. Existing cloud-edge-end collaborative distributed learning (CEC-CDL) shares model parameters rather than raw training data, protecting data privacy, and introduces two-stage synchronous aggregation to trade off training performance against communication efficiency.
To reduce data traffic while mobile devices improve model accuracy through training, several works propose introducing cloud-edge-end collaborative distributed learning with a two-stage aggregation mechanism of local and global aggregation, reducing communication time by reducing the number of communication rounds. In these schemes, all client nodes and edge server nodes are assigned the same fixed communication frequency, and local plus global aggregation is used to cut communication time and improve training efficiency.
Such communication frequency schemes simply assign the same fixed communication frequency to all client nodes and all edge server nodes; the assignment depends on experience or requires parameter tuning, so it is inefficient, its effectiveness cannot be guaranteed, and it cannot change as training progresses. These schemes also ignore the influence of system heterogeneity (in a cloud-edge-end collaborative distributed learning system, the computation and communication performance of the participating nodes, i.e. clients and edge servers, differ), so nodes with large performance gaps incur long synchronous waiting times, causing a serious straggler problem and slowing training.
To reduce the effects of system heterogeneity in cloud-edge-end collaborative distributed learning and mitigate the synchronization barrier, some works introduce adaptive communication frequency adjustment. The most advanced design of this kind sets the communication frequency of the slowest client and edge server to 1 and assigns the remaining clients and edge servers communication frequencies matched to their node performance, reducing the waiting time caused by the synchronization barrier and improving training efficiency in heterogeneous systems.
Compared with assigning the same fixed communication frequency to all clients and edge servers, this second prior-art scheme does not analyze the convergence of distributed collaborative learning under its communication frequency optimization: the communication frequencies of client and edge server nodes are adjusted empirically and adaptively, so convergence of model training cannot be guaranteed. The scheme is also sensitive to the communication bandwidth setting, and its performance varies greatly under different bandwidth settings, so performance cannot be guaranteed.
Disclosure of Invention
In view of the above deficiencies of the prior art, the cloud-edge-end collaborative distributed learning method and system based on adaptive communication frequency provided by the invention solve the problem of long training time in existing cloud-edge-end collaborative distributed learning caused by using fixed communication frequencies.
In order to achieve the aim of the invention, the invention adopts the following technical scheme:
In a first aspect, a cloud-edge-end collaborative distributed learning method based on adaptive communication frequency is provided. Applied to a cloud server, the method comprises the steps of:
S1, receiving the local image classification models uploaded by all edge servers and performing a global aggregation update to obtain a global image classification model;
S2, collecting the Lipschitz constants and gradient estimation variances of all clients, uploaded by all edge servers during training, and estimating the optimal communication frequencies of the best-performing edge server and clients;
S3, adjusting, according to the computation and communication performance of the edge servers, the communication frequency of each edge server other than the best-performing one:
wherein k_i^h is the communication frequency of edge server i in the h-th global training round; t_h is the completion time of the h-th global training round; t_i^{com,h} is the communication time for edge server i to transmit its local image classification model in the h-th global training round; t_{il}^{cmp,h} is the time for the best-performing client l under edge server i to complete its local iterative computation in the h-th global training round; and ⌊·⌋ denotes rounding down;
S4, sending the global image classification model, the communication frequency of the corresponding edge server, and the optimal client communication frequency to each edge server;
S5, judging whether the communication resources exceed the limit; if so, ending the training of the global image classification model and issuing it to the clients and edge servers; otherwise, returning to step S1.
The beneficial effects of the technical scheme are as follows: by using the method for cloud edge collaborative distributed learning training, communication frequency matched with calculation and communication performance can be distributed for each edge server, so that the completion time of each edge server is close, delay caused by a synchronous barrier is greatly relieved, and communication time is reduced.
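The adjustment rule in step S3 is given only in the patent's figures; assuming it assigns each edge server the largest frequency that fits its work into the round completion time (a plausible reading of the symbol definitions above, not the patent's verbatim formula), a minimal sketch:

```python
import math

def adjust_edge_frequency(t_h, t_comm_i, t_cmp_il):
    """Largest communication frequency for edge server i that still fits
    within the round completion time t_h, after subtracting its model
    transmission time t_comm_i; t_cmp_il is the local-iteration completion
    time of the best-performing client l under edge server i."""
    return max(1, math.floor((t_h - t_comm_i) / t_cmp_il))

# A slow edge server gets a low frequency and a fast one a high frequency,
# so all edge servers finish the round at roughly the same time.
fast = adjust_edge_frequency(t_h=100.0, t_comm_i=10.0, t_cmp_il=5.0)   # 18
slow = adjust_edge_frequency(t_h=100.0, t_comm_i=40.0, t_cmp_il=20.0)  # 3
```

The `max(1, ...)` guard reflects that every edge server must communicate at least once per round.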
Further, the method for estimating the optimal communication frequencies of the best-performing edge server and clients comprises the steps of:
S21, obtaining the search space formed by the ranges of the adopted communication frequency k1 and the intermediate variable u;
S22, at each search, randomly selecting the Lipschitz constants and gradient estimation variances of clients and averaging them, using the averaged variance as the gradient estimation variance σ²;
S23, searching the search space, according to the gradient estimation variance σ², for the k1 and u corresponding to the minimum of the approximate average gradient g(H, k1, u), whose calculation formula is:
wherein f(ω0) is the initial global loss function of the cloud server; ω0 is the initial model parameter; N is the total number of clients; σ² is the gradient estimation variance; L is the Lipschitz constant; and H is the total number of global training rounds;
S24, from the k1 corresponding to the minimum of g(H, k1, u), calculating the communication frequency k2 = u·k1;
S25, taking the k1 corresponding to the minimum of g(H, k1, u) as the optimal communication frequency of the best-performing edge server, and the communication frequency k2 as the optimal communication frequency of the best-performing client under each edge server.
The beneficial effects of the technical scheme are as follows: by using the method, the optimal communication frequency between the edge server with the best performance and the client can be estimated for the cloud edge collaborative distributed learning training, the model convergence can be ensured, the optimal communication frequency can be obtained by self-adaptively calculating according to the training process, and the training efficiency of the cloud edge collaborative distributed learning can be improved while the model quality is improved.
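Steps S21 to S25 amount to a grid search over (k1, u). A minimal sketch, with a hypothetical placeholder standing in for the bound g(H, k1, u) (the patent's exact expression appears only in its figures):

```python
def estimate_optimal_frequencies(g, H, k1_range, u_range):
    """Search (k1, u) minimizing g(H, k1, u); k1 is taken as the optimal
    frequency of the best-performing edge server and k2 = u * k1 as that of
    the best-performing client under each edge server (steps S23-S25)."""
    k1, u = min(((k1, u) for k1 in k1_range for u in u_range),
                key=lambda p: g(H, p[0], p[1]))
    return k1, u * k1

# Hypothetical placeholder bound: decreasing in k1*u at first (communication
# savings), then increasing (local models drift apart); minimized at k1*u = 10.
g = lambda H, k1, u: 1.0 / (k1 * u) + 0.01 * k1 * u
k1, k2 = estimate_optimal_frequencies(g, H=100,
                                      k1_range=range(1, 21),
                                      u_range=range(1, 11))
```

Because the search space is small (a grid of discrete frequencies), exhaustive search is cheap, matching the "extremely low computation cost" claim.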
Further, the calculation formulas of the completion time t_h and the time t_{il}^{cmp,h} are:
wherein η is the learning rate; c1 is a constant; k_{ij}^h is the communication frequency of client j under edge server i in the h-th global training round; A_i is the set of clients under edge server i; t_{ij}^{cmp} is the single-round local iteration computation time of each client; and u is the ratio of communication frequency k2 to communication frequency k1.
The beneficial effects of the technical scheme are as follows: with this time model, the round completion time and the completion time of the fastest client under edge server i can be obtained at extremely low computation cost, providing a reference for adaptively adjusting the communication frequencies of the edge servers and clients.
Further, the formula for performing the global aggregation update is:
wherein p_i^h is the global aggregation weight of the model of edge server i in the h-th global training round; M is the total number of edge servers; ω_i^h is the local image classification model of edge server i in the h-th global training round; and ω_h is the global image classification model of the cloud server in the h-th global training round.
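The aggregation formula itself appears only in the patent's figures; assuming the standard weighted sum ω_h = Σ_{i=1..M} p_i^h · ω_i^h that the symbol definitions suggest, a sketch:

```python
def global_aggregate(edge_models, weights):
    """omega_h = sum_i p_i^h * omega_i^h over the M edge servers, applied
    coordinate-wise to each model parameter; weights assumed to sum to 1."""
    n_params = len(edge_models[0])
    return [sum(p * model[k] for p, model in zip(weights, edge_models))
            for k in range(n_params)]

# Two edge servers, a 3-parameter model, weights e.g. proportional to the
# number of clients each edge server serves.
omega_h = global_aggregate([[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]], [0.25, 0.75])
# → [2.5, 3.5, 4.5]
```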
Further, the global image classification model is a CNN model or a ResNet9 model.
In a second aspect, a cloud-edge-end collaborative distributed learning method based on adaptive communication frequency is provided. Applied to an edge server, the method comprises the steps of:
A1, receiving the global image classification model issued by the cloud server, the communication frequency corresponding to the edge server, and the optimal communication frequency of the clients under the edge server;
A2, calculating, according to the computation and communication performance of the clients, the communication frequency of each client other than the best-performing one:
wherein k_{ij}^h is the communication frequency of client j under edge server i in the h-th global training round; t_{il}^{cmp,h} is the time for the best-performing client l under edge server i to complete its local iterative computation in the h-th global training round; t_{ij}^{com,h} is the communication time for client j under edge server i to transmit its local image classification model in the h-th global training round; and t_{ij}^{cmp,h} is the single local iteration computation time of client j under edge server i in the h-th global training round;
A3, sending the global image classification model and the communication frequency of the corresponding client to each client;
a4, receiving local image classification models uploaded by all clients, and counting Lipschitz constants and gradient estimation variances of training processes uploaded by all clients;
a5, carrying out aggregation updating on the local image classification models uploaded by all the clients to obtain local image classification models;
a6, judging whether the local aggregation times reach the preset aggregation times, if so, entering a step A7, otherwise, sending the local image classification model to each client, and returning to the step A4;
and A7, uploading a local image classification model and Lipschitz constants and gradient estimation variances uploaded by all clients according to communication frequencies issued by the cloud server.
The beneficial effects of the technical scheme are as follows: by using the method for cloud edge collaborative distributed learning training, communication frequencies matched with calculation and communication performances of each client under each edge server can be distributed, so that the completion time of each client is close, delay caused by a synchronous barrier is greatly relieved, and communication time is reduced.
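Assuming the per-client rule in A2 mirrors the edge-server rule, fitting each client's computation and upload into the best client's completion time (again a plausible reading of the symbol definitions, not the verbatim formula), a sketch:

```python
import math

def adjust_client_frequency(t_cmp_il, t_comm_ij, t_cmp_ij):
    """Largest frequency (number of local iterations between uploads) for
    client j under edge server i that fits within the local-iteration
    completion time t_cmp_il of the best client l, after subtracting j's
    model transmission time t_comm_ij; t_cmp_ij is j's single local
    iteration computation time."""
    return max(1, math.floor((t_cmp_il - t_comm_ij) / t_cmp_ij))

# A client with slower computation or communication is assigned fewer local
# iterations between uploads, so all clients reach local aggregation together.
freq = adjust_client_frequency(t_cmp_il=60.0, t_comm_ij=12.0, t_cmp_ij=4.0)  # 12
```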
In a third aspect, a cloud-edge-end collaborative distributed learning method based on adaptive communication frequency is provided. Applied to a client, the method comprises the steps of:
C1, receiving the global image classification model or local aggregation model issued by the edge server and the corresponding communication frequency, and performing iterative computation and update of the local image classification model using the local data set;
C2, estimating the Lipschitz constant and gradient estimation variance of the local image classification model training process:
wherein f_{ij} is the loss function of client j under edge server i; ω_{h−1} and ω_{ij}^{h−1} are, respectively, the global image classification model of the cloud server and the local image classification model of client j under edge server i in the (h−1)-th global training round; L_{ij} and σ_{ij}² are, respectively, the Lipschitz constant and gradient estimation variance of client j under edge server i; and ξ_{ij}^{h−1} is the image classification sample data in the local data set of client j under edge server i in the (h−1)-th global training round;
C3, uploading the local image classification model, Lipschitz constant, and gradient estimation variance according to the communication frequency issued by the edge server.
The beneficial effects of the technical scheme are as follows: by using the method, lipschitz constant and gradient estimation variance of the training process of the local image classification model can be obtained with extremely low calculation cost, and a basis is provided for self-adaptively adjusting the communication frequency of the client and the edge server for the cloud server in the next global training round.
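The estimators in C2 appear only in the patent's figures; a sketch using the standard finite-difference estimates the symbol definitions suggest (the Lipschitz constant from the gradient difference between the global and local models, the variance from the scatter of mini-batch gradients around their mean), with hypothetical helper names:

```python
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def estimate_lipschitz(grad_f, w_global, w_local):
    """L_ij ~ ||grad f(w_global) - grad f(w_local)|| / ||w_global - w_local||."""
    diff_g = [a - b for a, b in zip(grad_f(w_global), grad_f(w_local))]
    diff_w = [a - b for a, b in zip(w_global, w_local)]
    return norm(diff_g) / norm(diff_w) if norm(diff_w) > 0 else 0.0

def estimate_grad_variance(batch_grads):
    """sigma_ij^2 ~ mean squared deviation of mini-batch gradients from their mean."""
    n = len(batch_grads)
    mean = [sum(g[k] for g in batch_grads) / n for k in range(len(batch_grads[0]))]
    return sum(norm([a - b for a, b in zip(g, mean)]) ** 2
               for g in batch_grads) / n

# Sanity check: f(w) = ||w||^2 has gradient 2w, hence Lipschitz constant 2.
L = estimate_lipschitz(lambda w: [2 * x for x in w], [1.0, 0.0], [0.0, 1.0])
var = estimate_grad_variance([[1.0, 0.0], [0.0, 1.0]])
```

Both quantities are cheap by-products of the gradients the client already computes, consistent with the "extremely low calculation cost" claim.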
Further, the local image classification model update formula of the client is:
wherein ω_{ij}^{h−1} is the local image classification model of client j under edge server i in the (h−1)-th global training round.
The beneficial effects of the technical scheme are as follows: when updating the local image classification model, small batches of image classification samples are used to update the local model, which saves computation cost and reduces computation time, and the resulting local model is theoretically unbiased.
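Assuming the omitted update formula is standard mini-batch SGD, consistent with the "small batches" remark (the gradient function below is a hypothetical stand-in), a sketch:

```python
import random

def local_sgd_step(w, grad_on_batch, dataset, batch_size, lr):
    """One local update: w <- w - lr * g, where g is the gradient computed
    on a uniformly sampled mini-batch, an unbiased estimate of the
    full-batch gradient."""
    batch = random.sample(dataset, batch_size)
    g = grad_on_batch(w, batch)
    return [wk - lr * gk for wk, gk in zip(w, g)]

# Toy example: minimize the mean of (w - x)^2 over scalar samples x;
# the mini-batch gradient is the average of 2 * (w - x) over the batch.
data = [1.0, 2.0, 3.0]
grad = lambda w, batch: [sum(2 * (w[0] - x) for x in batch) / len(batch)]
w = local_sgd_step([10.0], grad, data, batch_size=2, lr=0.1)
```

Each step moves the parameter toward the data mean while touching only `batch_size` samples.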
In a fourth aspect, a cloud edge collaborative distributed learning system based on adaptive communication frequency optimization is provided, which includes a cloud server, a plurality of edge servers and a plurality of clients, wherein the cloud server communicates with the plurality of edge servers, and each edge server communicates with the plurality of clients.
The beneficial effects of the invention are as follows: the scheme explores and adjusts the communication frequencies of different clients to control local updates, and adjusts the communication frequencies of different edge servers to control local aggregation of the model. By quantifying the relation between communication frequency and training performance, the optimal communication frequency in each global training round is obtained by calculation, and communication frequencies are optimized for the performance of different clients and edge servers to mitigate the straggler problem, improving overall training performance.
The scheme makes full use of edge computation and local aggregation, adjusts the communication frequencies of client and edge server nodes on the premise of guaranteeing convergence, reduces communication overhead during training, and alleviates the communication bottleneck. Estimating the optimal communication frequencies of the best-performing edge server and clients helps the cloud-edge-end collaborative distributed learning system adaptively adjust communication frequencies as training progresses, accelerating model training iterations while improving model performance.
The scheme considers the influence of system heterogeneity on the cloud-edge-end collaborative distributed learning system, performs model aggregation updates in a weakly synchronous manner, and effectively mitigates the influence of the synchronization barrier on training efficiency. The effectiveness and efficiency of the scheme have been verified through experiments: compared with existing schemes, it improves model convergence accuracy by up to 16% and reduces training completion time by up to 4.7 times.
After a client adopts the distributed learning of this scheme, its local image classification model can learn faster from the strengths of the models on other clients, making up for the insufficient classification and prediction performance of a model constrained by a limited local data set and improving its prediction and classification accuracy.
Drawings
Fig. 1 is a flowchart of a cloud edge end collaborative distributed learning method based on adaptive communication frequency optimization applied to a cloud server.
Fig. 2 is a flowchart of a cloud edge end collaborative distributed learning method based on adaptive communication frequency optimization applied to an edge server.
Fig. 3 is a flowchart of the cloud-edge-end collaborative distributed learning method based on adaptive communication frequency optimization applied to a client.
Fig. 4 is an architecture diagram of a cloud-edge collaborative distributed learning system based on adaptive communication frequency optimization.
FIG. 5 shows the test accuracy of training a CNN model on the Fashion-MNIST dataset: (a) bandwidth b_ce = b_ec; (b) bandwidth b_ce = 10·b_ec.
FIG. 6 shows the test accuracy of training a ResNet9 model on the CIFAR-10 dataset: (a) bandwidth b_ce = b_ec; (b) bandwidth b_ce = 10·b_ec.
Detailed Description
The following description of embodiments of the invention is provided to facilitate understanding by those skilled in the art. It should be understood, however, that the invention is not limited to the scope of these embodiments; to those skilled in the art, all inventions making use of the inventive concept are protected within the spirit and scope of the invention as defined by the appended claims.
Referring to fig. 1, fig. 1 shows a flowchart of a cloud edge end collaborative distributed learning method based on adaptive communication frequency optimization applied to a cloud server, and the method S includes steps S1 to S5.
In step S1, the local image classification models uploaded by all edge servers are received, and a global aggregation update is performed to obtain the global image classification model; the formula for the global aggregation update is:
wherein p_i^h is the global aggregation weight of the model of edge server i in the h-th global training round; M is the total number of edge servers; ω_i^h is the local image classification model of edge server i in the h-th global training round; and ω_h is the global image classification model of the cloud server in the h-th global training round.
In step S2, the Lipschitz constants and gradient estimation variances of all clients, uploaded by all edge servers during training, are collected, and the optimal communication frequencies of the best-performing edge server and clients are estimated;
in one embodiment of the present invention, a method for estimating the best communication frequency of the best performing edge server and client comprises:
s21, obtaining the adopted communication frequency k 1 And a search space formed by the range of the intermediate variable u;
s22, randomly selecting Lipschitz constant and gradient estimation variance of a client to average and then using the average as gradient estimation variance sigma during each search 2
S23, estimating variance sigma according to gradient 2 Searching within the search space results in approximately equipartition of the gradient g (H, k 1 U) k corresponding to the smallest time 1 And u, approximately equally dividing the gradient g (H, k 1 The calculation formula of u) is:
wherein f (omega) 0 ) Global loss function for the initial cloud server; omega 0 Is an initial model parameter; n is the total number of clients; sigma (sigma) 2 Estimating variance for the gradient; l is Lipschitz constant; h is the global training total round;
s24, g (H, k) 1 U) k corresponding to the smallest time 1 Calculating the communication frequency k 2 =uk 1
S25, g (H, k) 1 U) k corresponding to the smallest time 1 As the optimal communication frequency of the edge server with optimal performance, the communication frequency k is adopted 2 As the best communication frequency for the best performing client under each edge server.
In step S3, the communication frequency of each edge server other than the best-performing one is adjusted according to the computation and communication performance of the edge servers:
wherein k_i^h is the communication frequency of edge server i in the h-th global training round; t_h is the completion time of the h-th global training round; t_i^{com,h} is the communication time for edge server i to transmit its local image classification model in the h-th global training round; t_{il}^{cmp,h} is the time for the best-performing client l under edge server i to complete its local iterative computation in the h-th global training round; and ⌊·⌋ denotes rounding down;
in step S4, the global image classification model, the communication frequency of the corresponding edge server and the optimal communication frequency of the client are sent to the edge server;
in step S5, it is determined whether the communication resource exceeds the limit, and if the global image classification model has converged, if any determination condition is satisfied, the global image classification model training is ended, and the global image classification model is issued to the client and the edge server, otherwise, step S1 is returned.
In practice, the scheme preferably calculates the completion time t_h and the time t_{il}^{cmp,h} by:
wherein η is the learning rate; c1 is a constant; k_{ij}^h is the communication frequency of client j under edge server i in the h-th global training round; A_i is the set of clients under edge server i; t_{ij}^{cmp} is the single-round local iteration computation time of each client; and u is the ratio of communication frequency k2 to communication frequency k1.
The global image classification model, the edge servers' local image classification models, and the clients' local image classification models of this scheme are CNN models or ResNet9 models.
Referring to fig. 2, fig. 2 shows a flowchart of a cloud edge collaborative distributed learning method based on adaptive communication frequency optimization applied to an edge server; as shown in fig. 2, the method a includes steps A1 to A7.
In step A1, the global image classification model issued by the cloud server, the communication frequency corresponding to the edge server, and the optimal communication frequency of the clients under the edge server are received;
In step A2, the communication frequency of each client other than the best-performing one is calculated according to the computation and communication performance of the clients:
wherein k_{ij}^h is the communication frequency of client j under edge server i in the h-th global training round; t_{il}^{cmp,h} is the time for the best-performing client l under edge server i to complete its local iterative computation in the h-th global training round; t_{ij}^{com,h} is the communication time for client j under edge server i to transmit its local image classification model in the h-th global training round; and t_{ij}^{cmp,h} is the single local iteration computation time of client j under edge server i in the h-th global training round;
in step A3, the global image classification model and the communication frequency of the corresponding client are sent to the client;
in step A4, receiving local image classification models uploaded by all clients, and counting Lipschitz constants and gradient estimation variances of training processes uploaded by all clients;
in step A5, the local image classification models uploaded by all clients are aggregated and updated to obtain the edge server's local image classification model, ω_i^h = Σ_{j=1}^{N_i} p_{ij}^h ω_{ij}^h, wherein ω_i^h is the local image classification model of edge server i in the h-th global training round; N_i is the number of clients under edge server i; p_{ij}^h is the local aggregation weight of the model of client j under edge server i in the h-th global training round; and ω_{ij}^h is the local image classification model of client j under edge server i in the h-th global training round.
In step A6, judging whether the local aggregation count reaches the preset aggregation count; if so, entering step A7, otherwise sending the local image classification model to each client and returning to step A4;
in step A7, uploading the local image classification model and Lipschitz constants and gradient estimation variances uploaded by all clients according to the communication frequency issued by the cloud server.
Referring to fig. 3, fig. 3 shows a flowchart of the cloud edge end collaborative distributed learning method based on adaptive communication frequency optimization applied to a client; as shown in fig. 3, method C includes steps C1 to C3.
In step C1, the global image classification model and local aggregation model issued by the edge server are received together with the corresponding communication frequency, and the local data set is used to iteratively compute and update the local image classification model;
in step C2, the Lipschitz constant and gradient estimation variance of the local image classification model training process are estimated:
wherein F_{ij} is the loss function of client j under edge server i; ω^{h-1} and ω_{ij}^{h-1} are, respectively, the cloud server's global image classification model and the local image classification model of client j under edge server i in the (h-1)-th global training round; L_{ij} and σ_{ij}² are, respectively, the Lipschitz constant and gradient estimation variance of client j under edge server i; and ξ_{ij}^{h-1} is the image classification sample data in the local data set of client j under edge server i in the (h-1)-th global training round;
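The estimation formulas appear as images in the source, so the sketch below uses common estimators consistent with the quantities above, not necessarily the patent's exact forms: a finite-difference Lipschitz estimate between the global and local models, and the mean squared deviation of per-sample gradients from the full gradient:

```python
import numpy as np

def estimate_lipschitz(grad_global, grad_local, w_global, w_local):
    # Common finite-difference estimate: L ≈ ||∇F(ω) − ∇F(ω')|| / ||ω − ω'||
    return (np.linalg.norm(grad_global - grad_local)
            / np.linalg.norm(w_global - w_local))

def estimate_grad_variance(sample_grads, full_grad):
    # σ² ≈ mean squared deviation of per-sample gradients from the full one
    return float(np.mean([np.linalg.norm(g - full_grad) ** 2
                          for g in sample_grads]))
```

Both quantities are cheap to compute during local training and are exactly what the client uploads alongside its model in step C3.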
in step C3, the local image classification model, lipschitz constant and gradient estimation variance are uploaded according to the communication frequency issued by the edge server.
In implementation, the preferred client-side local image classification model update formula of this scheme is as follows:
wherein ω_{ij}^{h-1} is the local image classification model of client j under edge server i in the (h-1)-th global training round.
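The update formula itself is an image in the source; a standard stochastic gradient descent step over the local data set is one plausible form, shown here purely for illustration:

```python
import numpy as np

def local_sgd_step(w, grad, eta=0.01):
    """One assumed local update: ω ← ω − η·∇F(ω; ξ),
    where eta is the learning rate and grad the minibatch gradient."""
    return w - eta * grad
```

A client would apply this step k_j^{i,h} times per local round before uploading its model.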
As shown in fig. 4, the scheme further provides a cloud edge end collaborative distributed learning system based on adaptive communication frequency optimization, comprising a cloud server, a plurality of edge servers and a plurality of clients, wherein the cloud server communicates with the plurality of edge servers, and each edge server communicates with a plurality of clients.
The image classification accuracy of the cloud edge end collaborative distributed learning method provided by this scheme is illustrated below with a specific example:
data set selection
The scheme performs performance tests with different models and real-world image classification data sets, specifically: (1) the Fashion-MNIST data set and a 2-layer CNN model with 0.58M parameters; (2) the CIFAR-10 data set and a ResNet9 model with 2.45M parameters.
Performance index
The performance test index is the image classification accuracy of the global model, i.e., the ratio of the number of test set images correctly classified by the global model to the total number of test set images.
Heterogeneity settings
To simulate computational heterogeneity, this scheme assumes that the single-iteration computation time of the CNN on Fashion-MNIST follows a uniform distribution U(0.5, 3), while that of ResNet9 on CIFAR-10 follows U(2, 6). To simulate network heterogeneity, the bandwidth b_ec between edge server and cloud server fluctuates between 0.5 Mbps and 5 Mbps, while the bandwidth b_ce between client and edge server uses two settings: b_ce = b_ec and b_ce = 10·b_ec.
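These experimental distributions can be sampled directly; the helper below is a sketch of the setup described above (the function name and interface are ours, not the patent's):

```python
import random

def sample_heterogeneity(model, b_ec_mbps=None, ratio=1):
    """Sample one client's compute time and the two link bandwidths.
    model: 'cnn' (Fashion-MNIST) or 'resnet9' (CIFAR-10)
    b_ec_mbps: edge-cloud bandwidth; drawn from U(0.5, 5) Mbps if None
    ratio: client-edge bandwidth multiplier (1 or 10 in the experiments)"""
    # per-iteration compute time: U(0.5, 3) s for the CNN, U(2, 6) s for ResNet9
    t_comp = random.uniform(0.5, 3) if model == 'cnn' else random.uniform(2, 6)
    b_ec = b_ec_mbps if b_ec_mbps is not None else random.uniform(0.5, 5)
    b_ce = ratio * b_ec  # b_ce = b_ec or b_ce = 10·b_ec
    return t_comp, b_ec, b_ce
```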
Comparison algorithms
The scheme compares the proposed method (CDlada) against three algorithms: HierFAVG, a widely used cloud edge end collaborative distributed learning method that assigns the same fixed communication frequency to every client and edge server; HFL, a classical cloud edge end collaborative distributed learning method that likewise assigns the same fixed communication frequency to every client and edge server; and RAF, the current state-of-the-art cloud edge end collaborative distributed learning method, which sets the communication frequency of the slowest client and edge server to 1 and adaptively adjusts the communication frequencies of the other clients and edge servers based on it. HierFAVG and HFL are fixed communication frequency algorithms, using the settings (k_1 = 6, k_2 = 10) and (k_1 = 5, k_2 = 50) respectively; RAF is a communication frequency adaptive adjustment algorithm for tree-based hierarchical training systems.
As shown in fig. 5 and 6, the figures respectively show the test accuracy of the different algorithms when training the CNN model on the Fashion-MNIST data set in the independent and identically distributed (IID) scenario and when training the ResNet9 model on the CIFAR-10 data set in the non-IID scenario.
The experimental results show that across multiple scenarios CDlada outperforms all comparison algorithms (HierFAVG, HFL and RAF), is not affected by the network bandwidth setting, improves model convergence accuracy by up to 16%, and reduces training completion time by up to 4.7×.
Under a model convergence guarantee, this scheme adaptively adjusts the communication frequencies of the clients and edge servers, allocating to each a communication frequency matched to its computation and communication capabilities. This reduces the delay caused by system heterogeneity and improves the training efficiency of cloud edge end collaborative distributed learning while guaranteeing model convergence.
Compared with HierFAVG and HFL, the method of this scheme considers the influence of system heterogeneity, allocates matched communication frequencies to clients and edge servers of different performance, alleviates the delay caused by the synchronization barrier, and improves training efficiency;
compared with RAF, the method of this scheme performs convergence analysis and adaptively adjusts the communication frequencies of the clients and edge servers under a convergence guarantee, thereby providing the method with a convergence guarantee.

Claims (9)

1. The cloud edge end collaborative distributed learning method based on the self-adaptive communication frequency is characterized by comprising the following steps of:
s1, receiving local image classification models uploaded by all edge servers, and performing global aggregation update to obtain global image classification models;
s2, calculating Lipschitz constants and gradient estimation variances of all clients uploaded by all edge servers in the training process, and estimating the optimal communication frequency of the edge server and the clients with optimal performance;
S3, adjusting the communication frequency of each non-best-performing edge server according to the edge servers' computation and communication performance:
wherein k_i^h is the communication frequency of edge server i in the h-th global training round; t^h is the completion time of the h-th global training round; t_i^{h,comm} is the communication time for edge server i to transmit its local image classification model in the h-th global training round; t_l^{i,h} is the time for the best-performing client l under edge server i to complete local iterative computation in the h-th global training round; and ⌊·⌋ denotes rounding down;
s4, transmitting the global image classification model, the communication frequency of the corresponding edge server and the optimal communication frequency of the client to the edge server;
s5, judging whether the communication resource exceeds the limit, if so, ending the training of the global image classification model, issuing the global image classification model to the client and the edge server, and otherwise, returning to the step S1.
2. The cloud edge end collaborative distributed learning method according to claim 1, characterized in that the method for estimating the optimal communication frequency of the best-performing edge server and client comprises the following steps:
S21, obtaining the search space formed by the ranges of the adopted communication frequency k_1 and the intermediate variable u;
S22, in each search, randomly selecting clients and averaging their Lipschitz constants and gradient estimation variances to obtain the gradient estimation variance σ²;
S23, according to the gradient estimation variance σ², searching within the search space for the k_1 and u corresponding to the smallest time of the approximate average gradient g(H, k_1, u), where the calculation formula of the approximate average gradient g(H, k_1, u) is:
wherein f(ω_0) is the initial global loss function of the cloud server; ω_0 is the initial model parameter; N is the total number of clients; σ² is the gradient estimation variance; L is the Lipschitz constant; and H is the total number of global training rounds;
S24, from the k_1 and u corresponding to the smallest g(H, k_1, u), calculating the communication frequency k_2 = u·k_1;
S25, taking the k_1 corresponding to the smallest g(H, k_1, u) as the optimal communication frequency of the best-performing edge server, and the communication frequency k_2 as the optimal communication frequency of the best-performing client under each edge server.
3. The cloud edge end collaborative distributed learning method according to claim 2, characterized in that the completion time t^h and the local computation completion time of the best-performing client are calculated from the following quantities: η, the learning rate; c_1, a constant; k_j^{i,h}, the communication frequency of client j under edge server i in the h-th global training round; A_i, the set of clients under edge server i; τ_j^{i,h}, the single-round local iteration computation time of each client; and u, the ratio of communication frequency k_2 to communication frequency k_1.
4. The cloud edge end collaborative distributed learning method according to claim 1, characterized in that the formula for the global aggregation update is ω^h = Σ_{i=1}^{M} p_i^h ω_i^h, wherein p_i^h is the global aggregation weight of edge server i's model in the h-th global training round; M is the total number of edge servers; ω_i^h is the local image classification model of edge server i in the h-th global training round; and ω^h is the cloud server's global image classification model in the h-th global training round.
5. The cloud edge end collaborative distributed learning method according to any one of claims 1-4, wherein: the global image classification model is a CNN model or a ResNet9 model.
6. The cloud edge end collaborative distributed learning method based on the self-adaptive communication frequency is characterized by comprising the following steps of:
A1, receiving the global image classification model issued by the cloud server, the communication frequency corresponding to the edge server, and the optimal client communication frequency issued to the edge server;
A2, calculating the communication frequency of each non-best-performing client according to the clients' computation and communication performance:
wherein k_j^{i,h} is the communication frequency of client j under edge server i in the h-th global training round; t_l^{i,h} is the time for the best-performing client l under edge server i to complete local iterative computation in the h-th global training round; t_j^{i,h,comm} is the communication time for client j under edge server i to transmit its local image classification model in the h-th global training round; and t_j^{i,h,comp} is the single local iteration computation time of client j under edge server i in the h-th global training round;
a3, transmitting the global image classification model and the communication frequency of the corresponding client to the client;
a4, receiving local image classification models uploaded by all clients, and counting Lipschitz constants and gradient estimation variances of training processes uploaded by all clients;
A5, aggregating and updating the local image classification models uploaded by all clients to obtain the edge server's local image classification model;
a6, judging whether the local aggregation times reach the preset aggregation times, if so, entering a step A7, otherwise, sending the local image classification model to each client, and returning to the step A4;
and A7, uploading a local image classification model and Lipschitz constants and gradient estimation variances uploaded by all clients according to communication frequencies issued by the cloud server.
7. The cloud edge end collaborative distributed learning method based on the self-adaptive communication frequency is characterized by comprising the following steps of:
C1, receiving the global image classification model and local aggregation model issued by the edge server together with the corresponding communication frequency, and using the local data set to iteratively compute and update the local image classification model;
and C2, estimating Lipschitz constant and gradient estimation variance of the local image classification model training process:
wherein F_{ij} is the loss function of client j under edge server i; ω^{h-1} and ω_{ij}^{h-1} are, respectively, the cloud server's global image classification model and the local image classification model of client j under edge server i in the (h-1)-th global training round; L_{ij} and σ_{ij}² are, respectively, the Lipschitz constant and gradient estimation variance of client j under edge server i; and ξ_{ij}^{h-1} is the image classification sample data in the local data set of client j under edge server i in the (h-1)-th global training round;
and C3, uploading a local image classification model, a Lipschitz constant and a gradient estimation variance according to the communication frequency issued by the edge server.
8. The cloud edge end collaborative distributed learning method based on adaptive communication frequency according to claim 7, characterized in that the client's local image classification model update formula is as follows:
wherein ω_{ij}^{h-1} is the local image classification model of client j under edge server i in the (h-1)-th global training round.
9. A cloud edge end collaborative distributed learning system based on adaptive communication frequency, characterized by comprising: a cloud server executing the cloud edge end collaborative distributed learning method according to any one of claims 1 to 5, a plurality of edge servers executing the cloud edge end collaborative distributed learning method according to claim 6, and a plurality of clients executing the cloud edge end collaborative distributed learning method according to claim 7 or 8; the cloud server communicates with the plurality of edge servers, and each edge server communicates with a plurality of clients.
CN202311408911.3A 2023-10-27 2023-10-27 Cloud edge end cooperative distributed learning method and system based on self-adaptive communication frequency Pending CN117409294A (en)

Publications (1)

Publication Number Publication Date
CN117409294A true CN117409294A (en) 2024-01-16


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination