CN113923605B - Distributed edge learning system and method for industrial internet - Google Patents

Distributed edge learning system and method for industrial internet Download PDF

Info

Publication number
CN113923605B
CN113923605B CN202111240693.8A CN202111240693A CN113923605B CN 113923605 B CN113923605 B CN 113923605B CN 202111240693 A CN202111240693 A CN 202111240693A CN 113923605 B CN113923605 B CN 113923605B
Authority
CN
China
Prior art keywords
model
local
computing device
local model
computing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111240693.8A
Other languages
Chinese (zh)
Other versions
CN113923605A (en
Inventor
江智慧
余官定
袁建涛
刘胜利
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202111240693.8A priority Critical patent/CN113923605B/en
Publication of CN113923605A publication Critical patent/CN113923605A/en
Application granted granted Critical
Publication of CN113923605B publication Critical patent/CN113923605B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W4/00Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/06Selective distribution of broadcast services, e.g. multimedia broadcast multicast service [MBMS]; Services to user groups; One-way selective calling services
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • G06F9/5072Grid computing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W28/00Network traffic management; Network resource management
    • H04W28/16Central resource management; Negotiation of resources or communication parameters, e.g. negotiating bandwidth or QoS [Quality of Service]
    • H04W28/18Negotiating wireless communication parameters
    • H04W28/22Negotiating communication rate
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W4/00Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/70Services for machine-to-machine communication [M2M] or machine type communication [MTC]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W72/00Local resource management
    • H04W72/04Wireless resource allocation
    • H04W72/044Wireless resource allocation based on the type of the allocated resource
    • H04W72/0453Resources in frequency domain, e.g. a carrier in FDMA
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00Reducing energy consumption in communication networks
    • Y02D30/70Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The invention discloses a distributed edge learning system and method facing to an industrial internet, which comprises a base station and computing equipment, wherein the two computing equipment transmit data through a D2D link; after training a local model according to local data, the computing equipment shares the local model with all other adjacent computing equipment through broadcasting according to the optimal broadcast data rate and the optimal bandwidth, estimates a global model according to the local model shared by other adjacent computing equipment, and uploads the optimal broadcast data rate and the relevant information of the local model to a base station; and the base station determines the model deviation between the estimated global model and the real global model of each computing device according to the local model related information, the optimal broadcast data rate and the device network information, defines the deviation reduction rate as the model deviation reduction amount of all the computing devices per increasing unit bandwidth, and performs bandwidth allocation by taking the deviation reduction rates of all the computing devices as the same target so as to determine the optimal bandwidth allocated to the computing devices.

Description

Distributed edge learning system and method for industrial internet
Technical Field
The invention relates to the field of artificial intelligence and communication, in particular to a distributed edge learning system and method for industrial internet.
Background
In recent years, machine learning has been considered as a promising driver to inject Artificial Intelligence (AI) into the network edge. As the data generated by massive distributed edge devices has increased explosively, traditional centralized learning algorithms have been replaced by distributed learning algorithms. It has resulted in the emergence of a new computing paradigm, edge learning, that can quickly access distributed data and utilize the computing resources of various edge devices.
In distributed Learning, Federal Learning (FL) and parameter-server training are two major frameworks. They employ a distributed random Gradient Descent (SGD) algorithm and are designed for different scenarios. Since data privacy is critical today, federal learning is widely studied. In the federal learning framework, devices do not need to transmit raw data to a data center. In order to utilize rich data local to the device, in federal learning, the device only needs to upload its local gradient or local model to the center periodically, and then the center aggregates them to obtain a high-quality global model. Unlike federal learning, the purpose of parameter-server training is to solve the large-scale learning problem, and data privacy is not a major concern. In this framework, the model is chunked by the server. Each device downloads a portion of the data from the server and trains only a specified portion of the model. The server then collects the patch models from the devices to update the global model.
Both of the above frameworks are centralized and both rely on a central node to collect model parameters or gradients. However, such an architecture may cause central node congestion. Therefore, the centralized framework is not always suitable in some scenarios, such as scenarios with high requirements on robustness and security of the system. Therefore, decentralized learning frameworks that rely on point-to-point communication replace the centralized framework in these scenarios. But considering the decentralized edge learning framework in wireless systems will encounter some problems. In this scenario, a Device-to-Device (D2D) link is used for data transmission. Due to random channel fading and noise, wireless D2D communication is always unreliable. It cannot be simply assumed that the communication between the devices is perfect and transmission errors of the D2D link need to be taken into account. Transmission errors may cause some devices to fail to receive the local models of all other devices, affecting the quality of the global model, and thus affecting the convergence rate of the model.
This therefore prompted the study of a decentralized edge learning framework based on unreliable D2D communication. How to improve the convergence rate of the framework and to propose a method to realize the framework is an urgent problem to be solved.
Disclosure of Invention
In view of the foregoing, it is an object of the present invention to provide a distributed edge learning system and method for industrial internet. The system and the method reduce the model deviation and improve the convergence rate of the model in a given time delay by jointly optimizing the broadcast data rate and the bandwidth allocation, and moreover, the computing equipment does not need to transmit local original data, so that the privacy and the safety of a user can be well protected.
In order to achieve the purpose, the invention provides the following technical scheme:
in a first aspect, an embodiment provides an industrial internet-oriented distributed edge learning system, which includes a base station and a plurality of computing devices, wherein data is transmitted between the computing devices through D2D links;
the computing device is used for local model training and estimating a global model, and comprises: after a local model is trained according to local data, the local model is shared with all other adjacent computing equipment through broadcasting according to the optimal broadcast data rate and the optimal bandwidth, a global model is estimated according to the local model shared by other adjacent computing equipment, and the optimal broadcast data rate and the relevant information of the local model are uploaded to a base station, wherein the optimal broadcast data rate is determined according to the size of the local model, the total time delay of each round of training and the local computation time delay;
the base station is used for network coordination and bandwidth allocation, and comprises: determining the model deviation between the estimated global model and the real global model of each computing device according to the relevant information of the local model, the optimal broadcast data rate and the device network information, defining the deviation reduction rate as the model deviation reduction amount of all the computing devices per increasing unit bandwidth, and performing bandwidth allocation by taking the deviation reduction rates of all the computing devices as the same target to determine the optimal bandwidth among the computing devices.
In one embodiment, when the computing device trains the local model from the local data, the model is updated according to a gradient descent method using the following formula:
Figure BDA0003319415240000031
wherein k represents a calculationDevice index,. l denotes the number of training rounds,. eta (l-1) The learning rate of the l-1 st round is expressed,
Figure BDA0003319415240000032
a local model representing the ith computing device update,
Figure BDA0003319415240000033
a local model representing the l-1 th round of the kth computing device,
Figure BDA0003319415240000034
representing the gradient vector of the k-th computing device.
In one embodiment, the optimal broadcast data rate determined according to the size of the local model, the total time delay of each round of training, and the local computation time delay is expressed as:
Figure BDA0003319415240000035
wherein the content of the first and second substances,
Figure BDA0003319415240000036
represents the optimal broadcast data rate, S represents the local model size, T represents the total delay of a given round of training,
Figure BDA0003319415240000037
and representing local calculation time delay, wherein the total time delay of each round of training comprises two parts, one part is the local calculation time delay, namely the time required by the calculation device to update the local model, and the other part is communication time delay, namely the time required by the calculation device to broadcast and share to the adjacent calculation devices.
In one embodiment, the global model is estimated from local models shared from neighboring other computing devices using the following formula:
Figure BDA0003319415240000041
wherein the content of the first and second substances,
Figure BDA0003319415240000042
a global model representing the ith computing device's round estimate, i and k being indices of the computing devices,
Figure BDA0003319415240000043
local model, α, representing the ith computing device update k,i Is a binary index, when the instantaneous channel capacity from the k-th computing device to the i-th computing device is greater than or equal to the optimal broadcast data rate, the local model of the k-th computing device can be successfully shared with the i-th computing device, and then alpha is k,i 1 is ═ 1; otherwise, α k,i K denotes the number of all computing devices adjacent to the ith computing device.
In one embodiment, the local model related information includes a modulus of the local model of the current round, a modulus of a difference between the local model of the current round and the local model of the previous round, a modulus of a gradient of the previous round, a local loss reduction amount, a size of the local model of the current round, a total time delay of each round of training, and a local computation time delay.
In one embodiment, the determining a model bias between the estimated global model and the true global model for each computing device as a function of the local model-related information, the optimal broadcast data rate, and the device network information is:
Figure BDA0003319415240000044
wherein the content of the first and second substances,
Figure BDA0003319415240000045
represents the optimal broadcast data rate for the k-th computing device, B k Representing the bandwidth, P, allocated to the kth computing device for broadcast k Representing the transmission power, σ, of the k-th computing device k,i Represents the variance of the k-th to i-th computing device link channel power gain, N 0 Representing noise power, K representing the number of computing devices, Delta k,i Is represented byModel bias of the i-th computing device due to unreliable transmission from the k-th computing device to the i-th computing device, A k Local model related information representing the kth computing device includes a modulus of the local model of the current round, a modulus of a difference between the local model of the current round and the local model of the previous round, a modulus of a gradient of the previous round, and a local loss reduction amount.
In one embodiment, when bandwidth allocation is performed with the target of making the deviation reduction rates of all the computing devices the same, the bandwidth allocation algorithm adopted is as follows:
(a) inputting information about the local model, the transmission power P k Variance σ of power gain of link channel k,i Noise power N 0
(b) Initializing lagrange parameter lambda (0) And the number of training rounds l is 0;
(c) using a gradient descent method to obtain values of the Lagrangian parameters of the next round, i.e.
Figure BDA0003319415240000051
Wherein eta is λ Denotes the step size, L 2 ({B k }, λ) represents the corresponding lagrange function;
(d) initializing bandwidth allocation
Figure BDA0003319415240000052
And the number of iterations j is 0;
(e) obtaining bandwidth allocation after a new iteration by using gradient descent method, i.e.
Figure BDA0003319415240000053
Wherein the content of the first and second substances,
Figure BDA0003319415240000054
denotes the step size, Δ k,i Representing a model bias for the ith computing device resulting from unreliable transmission between the kth computing device to the ith computing device;
(f) the number of iterations increases, i.e., j + 1;
(g) repeating the steps (e) to (f) until convergence;
(h) obtaining current optimal bandwidth allocation, wherein l is l + 1;
(i) repeating the steps (c) to (h) until convergence;
(j) and outputting the optimal bandwidth allocation.
In one embodiment, the base station acquires long time scale information of the computing device during network coordination, and performs channel estimation based on the long time scale information.
In a second aspect, an embodiment provides an industrial internet-oriented distributed edge learning method, where the distributed edge learning method employs the distributed edge learning system of the first aspect, and the distributed edge learning method includes the following steps:
step 1, a base station assists a computing device in network coordination;
step 2, the computing equipment feeds back the relevant information of the local model to the base station;
step 3, the base station performs bandwidth allocation based on the received local model related information;
step 4, the computing device updates the local model by using the local data set of the computing device;
and 5, sharing the local model to the adjacent computing equipment by the computing equipment based on the allocated optimal bandwidth and the allocated optimal broadcast data rate, and receiving the local model of the adjacent computing equipment to estimate the global model.
Compared with the prior art, the invention has the beneficial effects that at least:
compared with a central edge learning framework, the decentralized edge learning framework does not need a central node to collect a model and aggregate a global model, so that the congestion of the central node is avoided, and the method is suitable for scenes with high requirements on the robustness and the safety of a system; considering a decentralized edge learning framework based on D2D communication is more consistent with a real system than the hypothetical communication ideal; compared with direct data transmission, the transmission model can fully protect the privacy and the security of user data; the overall deviation of the estimated global model and the real global model can be reduced by jointly optimizing the broadcast data rate and the bandwidth allocation, and the accuracy and the convergence rate of the model are improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a schematic diagram of an industrial Internet-oriented distributed edge learning system provided by an embodiment;
FIG. 2 is a diagram illustrating an iterative process of computing and sharing by a computing device, according to an embodiment;
FIG. 3 is a flow diagram of bandwidth allocation provided by an embodiment;
fig. 4 is a flowchart of a distributed edge learning method according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the detailed description and specific examples, while indicating the scope of the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention.
Example 1
Fig. 1 is a schematic diagram of an industrial internet-oriented distributed edge learning system provided in embodiment 1, and as shown in fig. 1, the distributed edge learning system provided in the embodiment includes a base station and a plurality of computing devices, and is intended to cooperatively train a learning model. Each computing device may be a single antenna device in communication with a base station. To take advantage of the proximity of computing devices and reduce data traffic in the system, data is transmitted between two computing devices and the computing devices over D2D links.
The computing device may be a mobile terminal disposed at an edge terminal, communicating with the base station via mobile communication technology. Of course, the computing device may employ other wireless communication means. In the system, a plurality of computing devices adopt parallel computing to improve learning efficiency, namely, each computing device trains the whole model on local data, specifically comprising the training of the local model and the estimation of the global model.
As shown in fig. 2, each computing device contains two phases in each round of training, the first phase being a computing phase that updates the local model with a subset of local data. Updating the model according to the gradient descent method using the following formula:
Figure BDA0003319415240000071
where k denotes the computing device index, l denotes the number of training rounds, η (l-1) The learning rate of the l-1 st round is expressed,
Figure BDA0003319415240000072
a local model representing the ith computing device update,
Figure BDA0003319415240000073
a local model representing the l-1 th round of the kth computing device,
Figure BDA0003319415240000081
representing the gradient vector of the k-th computing device.
The second phase is a communication phase in which each computing device shares its local model with all neighboring other computing devices through broadcast. On the one hand, each computing device broadcasts its local model to all other computing devices, and on the other hand, each computing device receives a local model shared by the other computing devices for estimating a global model. In an embodiment, the global model is estimated from local models shared from other neighboring computing devices using the following formula:
Figure BDA0003319415240000082
wherein the content of the first and second substances,
Figure BDA0003319415240000083
indicating the ith computing deviceThe global model of the l rounds of estimation, i and k are both indices of the computing device,
Figure BDA0003319415240000084
local model, α, representing the ith computing device update k,i Is a binary index, when the instantaneous channel capacity from the k-th computing device to the i-th computing device is greater than or equal to the optimal broadcast data rate, the local model of the k-th computing device can be successfully shared with the i-th computing device, and then alpha is k,i 1 is ═ 1; otherwise, α k,i K denotes the number of all computing devices adjacent to the ith computing device.
And the two stages are iteratively executed until the model converges to obtain a final global model.
In the second phase, each computing device shares the local model with all neighboring other computing devices via broadcast according to the optimal broadcast data rate and optimal bandwidth. The optimal broadcast data rate is determined according to the size of the local model, the total time delay of each round of training and the local calculation time delay. Preferably, the optimum broadcast data rate may be determined using the following equation:
Figure BDA0003319415240000085
wherein the content of the first and second substances,
Figure BDA0003319415240000086
represents the optimal broadcast data rate, S represents the local model size, T represents the total delay of a given round of training,
Figure BDA0003319415240000087
and expressing local computation time delay, wherein the total time delay of each round of training comprises two parts, one part is the local computation time delay, namely the time required by the computing equipment to update the local model according to the self learning capability and the size of the local data set, and the other part is the communication time delay, namely the time required by the computing equipment to broadcast and share the local model to the adjacent computing equipment according to the size of the model and the broadcast data rate of the computing equipment.
After one round of training is carried out by each computing device, the relevant information of the local model comprising the modulus of the local model of the current round, the modulus of the difference between the local model of the current round and the local model of the previous round, the modulus of the gradient of the previous round, the local loss reduction amount, the size of the local model of the current round, the total time delay of each round of training and the local calculation time delay is uploaded to the base station, so that the base station can carry out bandwidth allocation. Because the optimal broadcast data rate is obtained by calculating the size of the local model of the current round, the total time delay of each round of training and the local calculation time delay, the relevant information of the local model containing the current round local model, the model of the difference between the current round local model and the previous round local model, the model of the previous round gradient, the local loss reduction and the optimal broadcast data rate can be uploaded to the base station.
In the embodiment, the base station is not a server for collecting the model parameters or the gradient parameters, and mainly performs network coordination of the device, that is, mainly performs channel estimation. Since D2D channel estimation is expensive, the instantaneous channel state information of all D2D links is unknown. The path loss of all links depends mainly on stable information such as position, and the change is slow and known. To reduce the signaling overhead of the D2D channel estimation, the base station obtains long time scale information, such as link distance between devices, instead of instantaneous channel state information for all D2D links, and performs channel estimation based on the long time scale information.
In this embodiment, the ultimate goal is to reduce the model bias caused by unreliable transmission by jointly optimizing the broadcast data rate and bandwidth allocation under the constraint of given bandwidth and delay, so as to improve the convergence rate of model training. Therefore, the base station is further configured to allocate a bandwidth, which specifically includes: determining the model deviation between the estimated global model and the real global model of each computing device according to the relevant information of the local model, the optimal broadcast data rate and the device network information, defining the deviation reduction rate as the model deviation reduction amount of all the computing devices per increasing unit bandwidth, and performing bandwidth allocation by taking the deviation reduction rates of all the computing devices as the same target to determine the optimal bandwidth among the computing devices.
In an embodiment, the absence of instantaneous channel state information results in unreliable transmission of data, and therefore the global model estimated locally by the computing devices may differ from the true global model, and to this end, the model bias between the estimated global model and the true global model for each computing device is determined, which may be determined by the following formula:
Figure BDA0003319415240000101
wherein the content of the first and second substances,
Figure BDA0003319415240000102
represents the optimal broadcast data rate for the k-th computing device, B k Representing the bandwidth, P, allocated to the kth computing device for broadcast k Representing the transmission power, σ, of the k-th computing device k,i Represents the variance of the k-th to i-th computing device link channel power gain, N 0 Representing noise power, K representing the number of computing devices, Delta k,i Representing the model deviation of the i-th computing device resulting from unreliable transmission from the k-th computing device to the i-th computing device, A k Local model related information representing the kth computing device includes a modulus of the local model of the current round, a modulus of a difference between the local model of the current round and the local model of the previous round, a modulus of a gradient of the previous round, and a local loss reduction amount.
In an embodiment, the bandwidth allocation is targeted such that the rate of droop reduction is the same for all computing devices, where the rate of droop reduction is defined as the amount of model droop reduction for all devices per unit bandwidth increase. As shown in fig. 3, the bandwidth allocation algorithm adopted is:
(a) inputting information about the local model, the transmission power P k Variance σ of power gain of link channel k,i Noise power N 0 Wherein, the local model related information includes a module of the local model of the current round, a module of the difference between the local model of the current round and the local model of the previous round, a module of the gradient of the previous round, a local loss reduction amount, the size S of the local model of the current round, and the size S of the local model of each roundTotal time delay T, local calculation time delay
Figure BDA0003319415240000103
(b) Initializing lagrange parameter lambda (0) And the number of training rounds l is 0;
(c) using a gradient descent method to obtain values of the Lagrangian parameters of the next round, i.e.
Figure BDA0003319415240000104
Wherein eta is λ Denotes the step size, L 2 ({B k }, λ) represents the corresponding lagrange function;
(d) initializing bandwidth allocation
Figure BDA0003319415240000111
And the number of iterations j is 0;
(e) obtaining bandwidth allocation after a new iteration by using gradient descent method, i.e.
Figure BDA0003319415240000112
Wherein the content of the first and second substances,
Figure BDA0003319415240000113
denotes the step size, Δ k,i Representing a model bias for the ith computing device resulting from unreliable transmission between the kth computing device to the ith computing device;
(f) the number of iterations increases, i.e., j + 1;
(g) repeating the steps (e) to (f) until convergence;
(h) obtaining current optimal bandwidth allocation, wherein l is l + 1;
(i) repeating the steps (c) to (h) until convergence;
(j) and outputting the optimal bandwidth allocation.
Compared with a central edge learning frame, the distributed edge learning system provided in embodiment 1 does not need a central node to collect models and aggregate global models by a decentralized edge learning frame, avoids congestion of the central node, and is suitable for scenes with high requirements on robustness and safety of the system; considering a decentralized edge learning framework based on D2D communication is more consistent with a real system than the hypothetical communication ideal; compared with direct data transmission, the transmission model can fully protect the privacy and the security of user data; the overall deviation of the estimated global model and the real global model can be reduced by jointly optimizing the broadcast data rate and the bandwidth allocation, and the accuracy and the convergence rate of the model are improved while the training time delay is ensured.
Example 2
Fig. 4 is a flowchart of a distributed edge learning method according to an embodiment. The distributed edge learning method adopts the distributed edge learning system provided by embodiment 1, and the distributed edge learning method comprises the following steps:
step 1, a base station assists a computing device in network coordination;
step 2, the computing equipment feeds back the relevant information of the local model to the base station;
step 3, the base station performs bandwidth allocation based on the received local model related information;
step 4, the computing device updates the local model by using the local data set of the computing device;
and 5, sharing the local model to the adjacent computing equipment by the computing equipment based on the allocated optimal bandwidth and the allocated optimal broadcast data rate, and receiving the local model of the adjacent computing equipment to estimate the global model.
According to the distributed edge learning method, the convergence rate of the model is improved within the specified learning time by adjusting the broadcast data rate and bandwidth allocation, and the computing equipment at the edge end does not need to transmit original data, so that the privacy and the safety of a user can be better protected.
In embodiments 1 and 2, the wireless communication method may be an existing mobile communication network, that is, an LTE (Long-term Evolution) or 5G network, or a WiFi network. The computing equipment can be a mobile terminal which can support model training, such as a modern smart phone, a tablet computer, a notebook computer, an automatic driving automobile and the like, is provided with a wireless communication system, and can be accessed to mainstream wireless communication networks such as a mobile communication network and WiFi.
The above-mentioned embodiments are intended to illustrate the technical solutions and advantages of the present invention, and it should be understood that the above-mentioned embodiments are only the most preferred embodiments of the present invention, and are not intended to limit the present invention, and any modifications, additions, equivalents, etc. made within the scope of the principles of the present invention should be included in the scope of the present invention.

Claims (3)

1. The distributed edge learning system facing the industrial Internet is characterized by comprising a base station and a plurality of computing devices, wherein data are transmitted between the computing devices through D2D links;
the computing device is used for local model training and estimating a global model, and comprises: after a local model is trained according to local data, the local model is shared with all other adjacent computing equipment through broadcasting according to the optimal broadcast data rate and the optimal bandwidth, a global model is estimated according to the local model shared by other adjacent computing equipment, and the optimal broadcast data rate and the relevant information of the local model are uploaded to a base station, wherein the optimal broadcast data rate is determined according to the size of the local model, the total time delay of each round of training and the local computation time delay;
the base station is used for network coordination and bandwidth allocation, and comprises: determining a model deviation between an estimated global model and a real global model of each computing device according to the relevant information of the local model, the optimal broadcast data rate and the device network information, defining a deviation reduction rate as the model deviation reduction amount of all computing devices per increasing unit bandwidth, and performing bandwidth allocation by taking the deviation reduction rates of all computing devices as the same target to determine the optimal bandwidth allocated to the computing devices;
when the computing equipment trains the local model according to the local data, the model is updated according to a gradient descent method by adopting the following formula:
Figure FDA0003637569470000011
where k denotes the computing device index, l denotes the number of training rounds, η (l-1) The learning rate of the l-1 st round is expressed,
Figure FDA0003637569470000012
a local model representing the ith computing device update,
Figure FDA0003637569470000013
a local model representing the l-1 th round of the kth computing device,
Figure FDA0003637569470000014
a gradient vector representing a kth computing device;
the optimal broadcast data rate determined according to the size of the local model, the total time delay of each round of training and the local calculation time delay is represented as follows:
Figure FDA0003637569470000021
wherein the content of the first and second substances,
Figure FDA0003637569470000022
represents the optimal broadcast data rate, S represents the local model size, T represents the total delay of a given round of training,
Figure FDA0003637569470000023
the method comprises the steps of representing local calculation time delay, wherein the total time delay of each round of training comprises two parts, one part is the local calculation time delay, and the time required by the calculation equipment for updating a local model is calculated; the other part is communication delay, and the computing equipment broadcasts and shares the time required by the adjacent computing equipment;
wherein the global model is estimated from local models shared from other neighboring computing devices using the following formula:
Figure FDA0003637569470000024
wherein the content of the first and second substances,
Figure FDA0003637569470000025
a global model representing the ith computing device's round estimate, i and k being indices of the computing devices,
Figure FDA0003637569470000026
local model, α, representing the ith computing device update k,i Is a binary index, when the instantaneous channel capacity from the k-th computing device to the i-th computing device is greater than or equal to the optimal broadcast data rate, the local model of the k-th computing device can be successfully shared with the i-th computing device, and then alpha is k,i 1 is ═ 1; otherwise, α k,i K represents the number of all computing devices adjacent to the ith computing device;
the relevant information of the local model comprises a module of a local model of the current round, a module of the difference between the local model of the current round and the local model of the previous round, a module of the gradient of the previous round, the local loss reduction, the size of the local model of the current round, the total time delay of each round of training and the local calculation time delay;
wherein the determining of the model bias between the estimated global model and the true global model for each computing device based on the local model related information, the optimal broadcast data rate, and the device network information is:
Figure FDA0003637569470000027
wherein the content of the first and second substances,
Figure FDA0003637569470000031
represents the optimal broadcast data rate for the k-th computing device, B k Representing the bandwidth, P, allocated to the kth computing device for broadcast k Representing the transmission power, σ, of the k-th computing device k,i Represents the variance of the k-th to i-th computing device link channel power gain, N 0 Representing the noise power, K representing the calculation deviceNumber of standby, Δ k,i Representing the model deviation of the i-th computing device resulting from unreliable transmission from the k-th computing device to the i-th computing device, A k Local model related information representing the kth computing device comprises a modulus of a local model of the current round, a modulus of a difference between the local model of the current round and the local model of the previous round, a modulus of a gradient of the previous round, and a local loss reduction amount;
wherein, when the bandwidth allocation is performed by taking the deviation reduction rates of all the computing devices as the same target, the adopted bandwidth allocation algorithm is as follows:
(a) inputting information about the local model, the transmission power P k Variance σ of power gain of link channel k,i Noise power N 0
(b) Initializing lagrange parameter lambda (0) And the number of training rounds l is 0;
(c) obtaining the value of the Lagrangian parameter of the next round by adopting a gradient descent method,
Figure FDA0003637569470000032
wherein eta is λ Denotes the step size, L 2 ({B k }, λ) represents the corresponding lagrange function;
(d) initializing bandwidth allocation
Figure FDA0003637569470000033
And the number of iterations j is 0;
(e) the bandwidth allocation after a new iteration is obtained by adopting a gradient descent method,
Figure FDA0003637569470000034
wherein eta is Bk Denotes the step size, Δ k,i Representing a model bias for the ith computing device resulting from unreliable transmission between the kth computing device to the ith computing device;
(f) the iteration number is increased, j equals j + 1;
(g) repeating the steps (e) to (f) until convergence;
(h) obtaining current optimal bandwidth allocation, wherein l is l + 1;
(i) repeating the steps (c) to (h) until convergence;
(j) and outputting the optimal bandwidth allocation.
2. The industrial internet-oriented distributed edge learning system of claim 1, wherein the base station obtains long time scale information of the computing device during network coordination, and performs channel estimation based on the long time scale information.
3. An industrial internet-oriented distributed edge learning method, which employs the distributed edge learning system of claim 1 or 2, and includes the steps of:
step 1, a base station assists a computing device in network coordination;
step 2, the computing equipment feeds back the relevant information of the local model to the base station;
step 3, the base station performs bandwidth allocation based on the received local model related information;
step 4, the computing device updates the local model by using the local data set of the computing device;
and 5, sharing the local model to the adjacent computing equipment by the computing equipment based on the allocated optimal bandwidth and the allocated optimal broadcast data rate, and receiving the local model of the adjacent computing equipment to estimate the global model.
CN202111240693.8A 2021-10-25 2021-10-25 Distributed edge learning system and method for industrial internet Active CN113923605B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111240693.8A CN113923605B (en) 2021-10-25 2021-10-25 Distributed edge learning system and method for industrial internet

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111240693.8A CN113923605B (en) 2021-10-25 2021-10-25 Distributed edge learning system and method for industrial internet

Publications (2)

Publication Number Publication Date
CN113923605A CN113923605A (en) 2022-01-11
CN113923605B true CN113923605B (en) 2022-08-09

Family

ID=79242656

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111240693.8A Active CN113923605B (en) 2021-10-25 2021-10-25 Distributed edge learning system and method for industrial internet

Country Status (1)

Country Link
CN (1) CN113923605B (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109862610B (en) * 2019-01-08 2020-07-10 华中科技大学 D2D user resource allocation method based on deep reinforcement learning DDPG algorithm
CN110267338B (en) * 2019-07-08 2020-05-22 西安电子科技大学 Joint resource allocation and power control method in D2D communication
CN111176929B (en) * 2019-11-29 2023-04-18 广东工业大学 Edge federal learning-oriented high-energy-efficiency calculation communication joint optimization method
CN111401552B (en) * 2020-03-11 2023-04-07 浙江大学 Federal learning method and system based on batch size adjustment and gradient compression rate adjustment
CN113139662B (en) * 2021-04-23 2023-07-14 深圳市大数据研究院 Global and local gradient processing method, device, equipment and medium for federal learning

Also Published As

Publication number Publication date
CN113923605A (en) 2022-01-11

Similar Documents

Publication Publication Date Title
Dhakal et al. Coded federated learning
Kamoun et al. Joint resource allocation and offloading strategies in cloud enabled cellular networks
WO2021123139A1 (en) Systems and methods for enhanced feedback for cascaded federated machine learning
CN109951869B (en) Internet of vehicles resource allocation method based on cloud and mist mixed calculation
CN110968426B (en) Edge cloud collaborative k-means clustering model optimization method based on online learning
US11831708B2 (en) Distributed computation offloading method based on computation-network collaboration in stochastic network
CN106231622B (en) A kind of content storage method based on buffer memory capacity limitation
CN110941675A (en) Wireless energy supply edge calculation delay optimization method based on deep learning
CN112291747A (en) Network congestion control method and device, electronic equipment and storage medium
Han et al. Cache placement optimization in mobile edge computing networks with unaware environment—an extended multi-armed bandit approach
CN114118748B (en) Service quality prediction method and device, electronic equipment and storage medium
CN115119233A (en) Clustered wireless communication method and system
CN113923605B (en) Distributed edge learning system and method for industrial internet
CN113993168A (en) Multi-agent reinforcement learning-based cooperative caching method in fog wireless access network
Cheng et al. Mobile big data based network intelligence
CN113747450A (en) Service deployment method and device in mobile network and electronic equipment
WO2011087935A2 (en) Method of controlling resource usage in communication systems
CN110392377B (en) 5G ultra-dense networking resource allocation method and device
CN109041009B (en) Internet of vehicles uplink power distribution method and device
Fu Optimization of Caching Update and Pricing Algorithm Based on Stochastic Geometry Theory in Video Service
US20230188430A1 (en) First network node and method performed therein for handling data in a communication network
CN112887937B (en) Terminal equipment video information cooperative uploading method based on roadside equipment assistance
TWI607665B (en) Method for assigning backhaul links in cooperative wireless network
Tian et al. Hierarchical federated learning with adaptive clustering on non-IID data
CN114157660A (en) Data transmission method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant