CN115767758A - Equipment scheduling method based on combination of channel and local model update - Google Patents
- Publication number
- CN115767758A CN202211422803.7A
- Authority
- CN
- China
- Prior art keywords
- local
- training
- round
- edge mobile
- gradient
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Landscapes
- Mobile Radio Communication Systems (AREA)
Abstract
The invention discloses a device scheduling method based on the combination of channel and local model update, relating to the fields of federated learning and user scheduling. Over-the-air computation is used during local model aggregation to improve communication efficiency, the aggregation error introduced by over-the-air computation is reduced by optimizing the receiver beamforming vector, and a device scheduling method that considers the channel and the local model update simultaneously is then provided.
Description
Technical Field
The invention relates to the field of wireless communication, in particular to a device scheduling method based on combination of channel and local model updating.
Background
In recent years, breakthroughs in machine learning, growth in computing power, and the explosive increase in data volume have made artificial intelligence applications such as autonomous driving and virtual reality increasingly practical. Conventional machine learning usually trains models centrally: raw data generated by smart mobile devices is sent directly to a centralized cloud data center. Transmitting such large amounts of data causes network congestion and therefore higher latency, and transmitting raw data also risks privacy leakage. Meanwhile, the computing power of edge devices such as base stations, mobile phones, and tablet computers keeps improving, which makes computation at the network edge possible; the federated learning training framework was proposed on this basis. In federated learning only the trained model or model gradient is transmitted, while privacy-sensitive data stays on the edge mobile device, so privacy is protected and communication resources are saved.
To further improve spectrum utilization during federated learning, over-the-air computation has been combined with federated learning. Over-the-air computation uses the superposition property of the wireless channel to perform gradient aggregation during federated learning training; because part of the computation is completed during transmission, latency is reduced.
User scheduling in federated learning is an active research topic. A large number of edge devices are usually connected to the parameter server, but, considering communication load and device energy consumption, only some of them interact with the parameter server in each training round. In federated learning based on over-the-air computation, scheduling generally considers two factors: the channel and the local model update. The conventional way to combine them is to first select devices according to channel gain, have the selected devices perform local training, and then schedule a smaller subset according to the local training results. As a result, some devices perform local training without being selected, which wastes their energy.
Disclosure of Invention
In view of this, the present invention provides a device scheduling method based on the combination of channel and local model update. It targets federated learning systems based on over-the-air computation and improves model training by optimizing the scheduling of edge mobile devices.
In order to achieve the purpose, the invention adopts the following technical scheme:
Step 1: constructing a federated learning system
In an edge intelligence scenario there are $K$ single-antenna edge mobile devices, denoted $\mathcal{K} = \{1, 2, \ldots, K\}$, and a parameter server with $M$ antennas. Each edge mobile device $k \in \mathcal{K}$ has a local data set $D_k$, and $|D_k|$ denotes the size of $D_k$. In federated learning the global model $\omega$ is trained jointly by the parameter server and the edge mobile devices. Training is an iterative process: each iteration is called a training round, and each round yields a new global model. $\omega_t$ denotes the global model obtained in the $t$-th training round, $\omega_0$ the initial global model before any federated training, and $\omega_{t-1}$ the global model obtained in the previous round. Together these form the federated learning system;
Step 2: the parameter server schedules the edge mobile devices that participate in training by combining two parameters, the channel and the local model update;
Step 3: the parameter server sends the global model $\omega_{t-1}$ obtained in the previous round to all scheduled edge mobile devices;
Step 4: the scheduled edge mobile devices perform local training with the stochastic gradient descent algorithm to obtain their respective local gradients;
Step 5: the scheduled edge mobile devices upload their local gradients to the parameter server, and the global model is updated to obtain $\omega_t$; the local gradients are uploaded using over-the-air computation, and the over-the-air computation is optimized;
Steps 2 to 5 are executed in a loop until the global model $\omega_t$ converges.
Further, step 2 specifically includes:
Channel parameter: in the $t$-th training round, the channel gain vector $h_{k,t}$ between the parameter server and each edge mobile device $k$ is different. Each edge mobile device sends a pilot sequence to the parameter server, and the parameter server estimates the channel gain vector $h_{k,t}$ between itself and each device from the received pilot sequence. The $l_2$ norm of the channel gain vector, $\|h_{k,t}\| = \sqrt{\sum_{i=1}^{M} |h_{k,t}^{i}|^{2}}$, represents the channel parameter between the parameter server and edge mobile device $k$, where $|h_{k,t}^{i}|$ is the absolute value of the $i$-th component of $h_{k,t}$; since the parameter server has $M$ antennas, $h_{k,t}$ has $M$ components;
Local model update parameter: in the $t$-th training round, edge mobile device $k$ performs local training to obtain a local gradient $g_{k,t}$, and the $l_2$ norm $\|g_{k,t}\|$ of the local gradient represents the local model update parameter of device $k$. The larger $\|g_{k,t}\|$ is, the more the local training result of device $k$ improves the federated learning training;
An edge mobile device $k$ does not perform local training before being scheduled, so the norm $\|g_{k,t}\|$ of its actual local training result $g_{k,t}$ is not available to guide scheduling; the $l_2$ norm $\|g_{k,t}\|$ of the local gradient must therefore be estimated;
The channel and the local model update are combined to determine the scheduling priority of each device, defined as:
$$I_{k,t} = c\,\|h_{k,t}\| + (1-c)\,\|g_{k,t}\| \tag{1}$$
where $I_{k,t}$ is the scheduling priority of edge mobile device $k$ in the $t$-th training round and $c \in [0,1]$ is a hyperparameter that controls the relative weight of the two scheduling parameters;
The scheduling priorities $I_{k,t}$ of all edge mobile devices are sorted in descending order. With the number of scheduled devices fixed to $N$, $0 < N < K$, the $N$ devices with the largest $I_{k,t}$ participate in the $t$-th round of federated learning training. The set of edge mobile devices scheduled in the $t$-th training round is:
$$S_t = \{\, k \in \mathcal{K} : I_{k,t} \text{ is among the } N \text{ largest values of } I_{1,t}, \ldots, I_{K,t} \,\} \tag{2}$$
Further, the $l_2$ norm $\|g_{k,t}\|$ of the local gradient obtained by local training is estimated as follows:
Because the local gradients computed by an edge mobile device in successive training rounds are strongly correlated in time, the gradient $l_2$ norm from the most recent round in which the device trained is used to estimate the gradient $l_2$ norm of the current round. Specifically, the estimated local gradient $l_2$ norm in the $t$-th training round is:
$$\|\hat{g}_{k,t}\| = \|g_{k,t_k}\| \tag{3}$$
where $t_k$ denotes the most recent federated learning training round in which device $k$ was scheduled;
All edge mobile devices are required to participate in the first training round and upload their local gradients, so that the parameter server can estimate the local gradient $l_2$ norms in subsequent rounds. When edge mobile device $k$ is scheduled in the $t$-th training round, the parameter server updates the stored norm for device $k$ with the local gradient $g_{k,t}$ obtained by local training.
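The priority rule of formula (1) and the selection of the $N$ highest-priority devices can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `schedule_devices`, the dictionary-based inputs, and the tie-breaking by device id are hypothetical choices, and the gradient norms passed in are assumed to be estimates already held by the parameter server.

```python
from typing import Dict, List


def schedule_devices(channel_norm: Dict[int, float],
                     grad_norm_estimate: Dict[int, float],
                     c: float, n: int) -> List[int]:
    """Return the n device ids with the largest priority c*||h|| + (1-c)*||g||."""
    if not 0.0 <= c <= 1.0:
        raise ValueError("c must lie in [0, 1]")
    priority = {k: c * channel_norm[k] + (1.0 - c) * grad_norm_estimate[k]
                for k in channel_norm}
    # Sort by descending priority; ties are broken by device id (an assumption).
    ranked = sorted(priority, key=lambda k: (-priority[k], k))
    return ranked[:n]


# Hypothetical channel norms and estimated gradient norms for four devices.
channel_norm = {1: 0.9, 2: 0.2, 3: 0.5, 4: 0.7}
grad_norm_estimate = {1: 0.1, 2: 1.5, 3: 0.8, 4: 0.3}
print(schedule_devices(channel_norm, grad_norm_estimate, c=0.5, n=2))  # [2, 3]
print(schedule_devices(channel_norm, grad_norm_estimate, c=1.0, n=2))  # [1, 4]
```

With `c=1.0` only the channel matters and the two strongest channels win; with `c=0.5` the devices with large estimated gradient norms take over, which is the trade-off the hyperparameter is meant to control.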
Further, step 4 specifically includes:
The scheduled edge mobile devices perform local training with the stochastic gradient descent algorithm: a loss function is constructed first, and the local gradient is then computed from it;
The global loss function of the system is $F(\omega_{t-1}) = \frac{1}{D} \sum_{k \in \mathcal{K}} |D_k|\, F_k(\omega_{t-1})$, where $D = \sum_{k \in \mathcal{K}} |D_k|$ is the total number of samples in the federated learning system and $F_k(\omega_{t-1})$ is the local loss function of the previous-round global model $\omega_{t-1}$ on edge mobile device $k$; the global loss function is thus a weighted average of the local loss functions of all edge mobile devices. The local loss function on edge mobile device $k$ is $F_k(\omega_{t-1}) = \frac{1}{|D_k|} \sum_{(x_k, y_k) \in D_k} f(x_k, y_k; \omega_{t-1})$, where $(x_k, y_k)$ is a data sample on device $k$ and $f(x_k, y_k; \omega_{t-1})$ is the loss for that single sample, which measures how well the global model $\omega_{t-1}$ fits the sample;
The local gradient obtained by edge mobile device $k$ through local training in the $t$-th training round is:
$$g_{k,t} = \frac{1}{L_b} \sum_{(x_k, y_k) \in L_{k,t}} \nabla f(x_k, y_k; \omega_{t-1}) \tag{4}$$
where $\nabla f(x_k, y_k; \omega_{t-1})$ is the gradient of the loss function, $L_{k,t} \subseteq D_k$ is a mini-batch of samples drawn at random from the data set $D_k$ in the $t$-th training round, $L_b = |L_{k,t}|$ is the number of samples in the mini-batch $L_{k,t}$, and $g_{k,t}$ is the local gradient obtained by local training.
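As an illustration of the local training step, the sketch below averages per-sample gradients over a randomly drawn mini-batch. The per-sample loss is not specified in the text, so a squared-error loss on a linear model, $f = \tfrac{1}{2}(\omega^{T}x - y)^2$, is assumed here purely for concreteness; the function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[Sequence[float], float]


def sample_gradient(w: Sequence[float], x: Sequence[float], y: float) -> List[float]:
    """Gradient of the assumed per-sample loss 0.5 * (w.x - y)^2 with respect to w."""
    residual = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [residual * xi for xi in x]


def local_gradient(w: Sequence[float], dataset: Sequence[Sample],
                   batch_size: int, rng: random.Random) -> List[float]:
    """Average the per-sample gradients over a randomly drawn mini-batch."""
    batch = rng.sample(list(dataset), batch_size)
    grad = [0.0] * len(w)
    for x, y in batch:
        for i, gi in enumerate(sample_gradient(w, x, y)):
            grad[i] += gi / batch_size
    return grad


# Hypothetical local data set on one device and a zero-initialized model.
dataset = [([1.0, 0.0], 1.0), ([0.0, 1.0], 2.0), ([1.0, 1.0], 3.0)]
g = local_gradient([0.0, 0.0], dataset, batch_size=2, rng=random.Random(0))
print(g)
```

Only the gradient is returned; the device does not update its own copy of the model, since in the described scheme the update is applied at the parameter server.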
Further, step 5 specifically includes:
The scheduled edge mobile devices upload their local gradients $g_{k,t}$ to the parameter server, which updates the global model according to:
$$\omega_t = \omega_{t-1} - \frac{\eta_t}{|S_t|} \sum_{k \in S_t} g_{k,t} \tag{5}$$
where $\eta_t$ is the learning rate in the $t$-th training round and $|S_t|$ is the number of devices in the scheduled device set;
The gradients are uploaded using over-the-air computation, which uses the superposition property of the channel to aggregate the local gradients while the local gradient signals are being transmitted; that is, the summation $\sum_{k \in S_t} g_{k,t}$ in formula (5) is completed during transmission. Over-the-air computation is implemented as follows:
In the $t$-th training round, the scheduled edge mobile devices upload their computed local gradients $g_{k,t}$ simultaneously, the transmitted local gradients are aggregated over the air, and the aggregate signal received by the parameter server is:
$$y_t = \sum_{k \in S_t} h_{k,t}\, p_{k,t}\, g_{k,t} + n \tag{6}$$
where $p_{k,t}$ is the transmitter scalar of device $k$ in the $t$-th training round and $n$ is an additive white Gaussian noise vector with zero mean and variance $\sigma^2$;
The parameter server applies a beamforming vector to the received aggregate signal. The processed aggregate signal $\hat{g}_t$ is given by formula (7), in which $m$ is the receiver beamforming vector and the superscript $H$ denotes the conjugate transpose; this processed signal is the local gradient aggregation completed by over-the-air computation;
The aggregation of local gradients by over-the-air computation is affected by channel fading and noise, so the over-the-air computation is optimized;
Under an ideal channel without fading or noise, the ideal aggregate signal is:
$$g_t = \sum_{k \in S_t} g_{k,t} \tag{8}$$
The error between the ideal aggregate signal and the actual aggregate signal is expressed as a mean square error:
$$\mathrm{MSE}(\hat{g}_t, g_t) = \mathbb{E}\left[\, \|\hat{g}_t - g_t\|^2 \,\right] \tag{9}$$
where $\mathbb{E}$ denotes mathematical expectation;
To reduce the influence of channel gain and noise on over-the-air computation and improve its performance, the transceiver is designed according to the mean square error minimization criterion, so that the mean square error between the ideal aggregate signal and the actual aggregate signal is minimized.
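The effect of the channel and of noise on the aggregate can be seen in a small simulation. The sketch below is not the patent's transceiver design: it assumes one scalar gradient entry per device, a received signal of the form "sum of channel times transmit scalar times symbol, plus noise", a receive combiner $m^{H}(\cdot)$, and a simple channel-inversion transmit scalar $p_k = (m^{H}h_k)^{*} / |m^{H}h_k|^{2}$ that ignores the power limit. All names are hypothetical.

```python
import random
from typing import List, Sequence

Vector = Sequence[complex]


def hermitian_inner(m: Vector, h: Vector) -> complex:
    """Return m^H h, i.e. the conjugate of m dotted with h."""
    return sum(mi.conjugate() * hi for mi, hi in zip(m, h))


def aircomp_aggregate(m: Vector, channels: Sequence[Vector], symbols: Sequence[float],
                      noise_std: float, rng: random.Random) -> complex:
    """Simulate one over-the-air sum of scalar symbols, received on len(m) antennas."""
    received: List[complex] = [0j] * len(m)
    for h, g in zip(channels, symbols):
        gain = hermitian_inner(m, h)
        p = gain.conjugate() / abs(gain) ** 2  # assumed channel-inversion scalar
        for i, hi in enumerate(h):
            received[i] += hi * p * g          # signals superpose on each antenna
    for i in range(len(received)):
        # Complex Gaussian noise; noise_std is the per-component standard deviation.
        received[i] += complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std))
    return hermitian_inner(m, received)        # receive beamforming m^H y


rng = random.Random(0)
num_antennas, symbols = 4, [1.0, -2.0, 0.5]
channels = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(num_antennas)]
            for _ in symbols]
m = [1 + 0j] * num_antennas
ideal = sum(symbols)
clean = aircomp_aggregate(m, channels, symbols, 0.0, rng)
noisy = aircomp_aggregate(m, channels, symbols, 0.1, rng)
print(abs(clean - ideal) ** 2, abs(noisy - ideal) ** 2)
```

Without noise the combiner output equals the ideal sum up to rounding, because each device's effective gain $m^{H}h_k p_k$ is 1. With noise the squared error is driven by $m^{H}n$, which is why the choice of $m$ matters.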
Further, designing the transceiver according to the mean square error minimization criterion, so as to minimize the mean square error between the ideal aggregate signal and the actual aggregate signal, specifically includes:
Designing the transceiver means determining the transmitter scalar $p_{k,t}$ and the receiver beamforming vector $m$;
The transmitter scalar is designed according to formula (10), where $\mu$ is the transmit power control factor, $|p_{k,t}|^2 \le P_0$, $P_0$ is the maximum transmit power, and $\|\cdot\|^2$ denotes the squared $l_2$ norm of the enclosed vector;
The transmit power control factor is designed according to formula (11);
Substituting formulas (10) and (11) into formula (7) simplifies the actual aggregate signal to formula (12);
The mean square error between the actual aggregate signal and the ideal aggregate signal is then further expressed as formula (13);
An optimization problem, formula (14), is constructed with the aim of minimizing the mean square error;
Introducing an auxiliary variable, the optimization problem of formula (14) is converted into the following form:
$$\min_{m} \|m\|^2 \quad \text{s.t.} \quad \|m^{H} h_{k,t}\|^2 \ge 1,\ \forall k \in S_t \tag{16}$$
Solving the optimization problem in formula (16) yields the receiver beamforming vector $m$.
Further, solving the optimization problem in formula (16) specifically includes:
An initial solution is obtained with semidefinite relaxation (SDR) and then refined with the successive convex approximation (SCA) algorithm. SDR obtains the initial solution as follows:
Let $A = m m^{H}$ and $A^{*} = \arg\min_{A} \operatorname{tr}(A)$, where $\operatorname{tr}(A)$ is the trace of the matrix $A$, $\lambda_1$ is the largest eigenvalue of $A^{*}$, and $u_1$ is the eigenvector corresponding to $\lambda_1$;
SCA refines the initial solution as follows:
In the optimization problem of formula (16), the non-convex constraint is $\|m^{H} h_{k,t}\|^2 \ge 1,\ \forall k \in S_t$. Introducing the auxiliary variable $c_{k,t} = [\operatorname{Re}(m^{H} h_{k,t}), \operatorname{Im}(m^{H} h_{k,t})]$, formula (16) is converted to:
$$\min_{m} \|m\|^2 \quad \text{s.t.} \quad \|c_{k,t}\|^2 \ge 1,\ \forall k \in S_t \tag{17}$$
The non-convex constraint in formula (17) is $\|c_{k,t}\|^2 \ge 1,\ \forall k \in S_t$. Using $\|c_{k,t}\|^2 \ge \|c_{k,t}^{(l)}\|^2 + 2 (c_{k,t}^{(l)})^{T} (c_{k,t} - c_{k,t}^{(l)}) \ge 1,\ \forall k \in S_t$, the non-convex constraint is convexly approximated by an iteratively relaxed linear constraint, where $c_{k,t}^{(l)}$ is the solution obtained after the $l$-th iteration;
Replacing the non-convex constraint in formula (17) with this convex constraint, formula (17) is rewritten as:
$$\min_{m} \|m\|^2 \quad \text{s.t.} \quad \|c_{k,t}^{(l)}\|^2 + 2 (c_{k,t}^{(l)})^{T} (c_{k,t} - c_{k,t}^{(l)}) \ge 1,\ \forall k \in S_t \tag{18}$$
Formula (18) is solved iteratively until the stopping criterion is met, where $\epsilon$ denotes the solution precision; the optimal solution obtained is the receiver beamforming vector $m$.
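The reason the linear constraint is a safe replacement for the non-convex one follows from a one-line expansion. The derivation below is added for clarity and uses only the auxiliary variable $c_{k,t}$ and the iterate $c_{k,t}^{(l)}$ defined above.

```latex
\begin{aligned}
0 \le \bigl\| c_{k,t} - c_{k,t}^{(l)} \bigr\|^2
  &= \|c_{k,t}\|^2 - 2\,(c_{k,t}^{(l)})^{T} c_{k,t} + \|c_{k,t}^{(l)}\|^2 \\
\Longrightarrow\quad
\|c_{k,t}\|^2
  &\ge 2\,(c_{k,t}^{(l)})^{T} c_{k,t} - \|c_{k,t}^{(l)}\|^2
   = \|c_{k,t}^{(l)}\|^2 + 2\,(c_{k,t}^{(l)})^{T}\bigl(c_{k,t} - c_{k,t}^{(l)}\bigr).
\end{aligned}
```

Any $c_{k,t}$ that makes the right-hand side at least 1 therefore also satisfies $\|c_{k,t}\|^2 \ge 1$, so each linearized problem searches inside the original feasible set. The bound is tight at $c_{k,t} = c_{k,t}^{(l)}$, so a previous iterate that is feasible for the original problem remains feasible for the next linearized problem.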
The beneficial effects of the invention are:
The invention provides a device scheduling method based on the combination of channel and local model update. When scheduling edge mobile devices in a federated learning system, the local model update parameter is estimated with a gradient estimation method before local training, the channel parameter and the local model update parameter are added with different weights, and the weighted sums are compared to schedule the edge mobile devices. The conventional scheme first selects devices according to channel gain, has the selected devices perform local training, and then schedules a smaller subset according to the local training results. By contrast, the proposed scheme selects the edge mobile devices to schedule in a single scheduling step, which avoids unnecessary local training and saves energy on the edge mobile devices.
Drawings
FIG. 1 is a schematic diagram of the federated learning system model of the present invention;
FIG. 2 is a flow chart of an embodiment of the present invention;
FIG. 3 is a flow chart of the specific scheduling scheme of the present invention.
Detailed Description
In order to better understand the purpose, structure and function of the present invention, the following describes the device scheduling method based on the combination of channel and local model update in detail with reference to the accompanying drawings.
The invention specifically comprises the following steps:
Step 1: constructing a federated learning system
Referring to FIG. 1, in an edge intelligence scenario there are $K$ single-antenna edge mobile devices, denoted $\mathcal{K} = \{1, 2, \ldots, K\}$, and a parameter server with $M$ antennas. Each edge mobile device $k \in \mathcal{K}$ has a local data set $D_k$, and $|D_k|$ denotes the size of $D_k$. In federated learning the global model $\omega$ is trained jointly by the parameter server and the edge mobile devices. Training is an iterative process: each iteration is called a training round, and each round yields a new global model. $\omega_t$ denotes the global model obtained in the $t$-th training round, $\omega_0$ the initial global model before any federated training, and $\omega_{t-1}$ the global model obtained in the previous round. Together these form the federated learning system;
Step 2: the parameter server schedules the edge mobile devices that participate in training by combining two parameters, the channel and the local model update
Channel parameter: in the $t$-th training round, the channel gain vector $h_{k,t}$ between the parameter server and each edge mobile device $k$ is different. Each edge mobile device sends a pilot sequence to the parameter server, and the parameter server estimates the channel gain vector $h_{k,t}$ between itself and each device from the received pilot sequence. The $l_2$ norm of the channel gain vector, $\|h_{k,t}\| = \sqrt{\sum_{i=1}^{M} |h_{k,t}^{i}|^{2}}$, represents the channel parameter between the parameter server and edge mobile device $k$, where $|h_{k,t}^{i}|$ is the absolute value of the $i$-th component of $h_{k,t}$; since the parameter server has $M$ antennas, $h_{k,t}$ has $M$ components;
Local model update parameter: in the $t$-th training round, edge mobile device $k$ performs local training to obtain a local gradient $g_{k,t}$, and the $l_2$ norm $\|g_{k,t}\|$ of the local gradient represents the local model update parameter of device $k$. The larger $\|g_{k,t}\|$ is, the more the local training result of device $k$ improves the federated learning training;
An edge mobile device $k$ does not perform local training before being scheduled, so the norm $\|g_{k,t}\|$ of its actual local training result $g_{k,t}$ is not available to guide scheduling; the $l_2$ norm $\|g_{k,t}\|$ of the local gradient must therefore be estimated;
The channel and the local model update are combined to determine the scheduling priority of each device, defined as:
$$I_{k,t} = c\,\|h_{k,t}\| + (1-c)\,\|g_{k,t}\| \tag{1}$$
where $I_{k,t}$ is the scheduling priority of edge mobile device $k$ in the $t$-th training round and $c \in [0,1]$ is a hyperparameter that controls the relative weight of the two scheduling parameters;
The scheduling priorities $I_{k,t}$ of all edge mobile devices are sorted in descending order. With the number of scheduled devices fixed to $N$, $0 < N < K$, the $N$ devices with the largest $I_{k,t}$ participate in the $t$-th round of federated learning training. The set of edge mobile devices scheduled in the $t$-th training round is:
$$S_t = \{\, k \in \mathcal{K} : I_{k,t} \text{ is among the } N \text{ largest values of } I_{1,t}, \ldots, I_{K,t} \,\} \tag{2}$$
The specific scheduling scheme of the edge mobile devices is shown in FIG. 3;
The $l_2$ norm $\|g_{k,t}\|$ of the local gradient obtained by local training is estimated as follows: because the local gradients computed by an edge mobile device in successive training rounds are strongly correlated in time, the gradient $l_2$ norm from the most recent round in which the device trained is used to estimate the gradient $l_2$ norm of the current round. Specifically, the estimated local gradient $l_2$ norm in the $t$-th training round is:
$$\|\hat{g}_{k,t}\| = \|g_{k,t_k}\| \tag{3}$$
where $t_k$ denotes the most recent federated learning training round in which device $k$ was scheduled;
All edge mobile devices are required to participate in the first training round and upload their first-round local gradients $g_{k,1}$, so that the parameter server can estimate the gradient $l_2$ norms in subsequent rounds. When edge mobile device $k$ is scheduled in the $t$-th training round, the parameter server updates the stored norm for device $k$ with the local gradient $g_{k,t}$ obtained by local training.
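The bookkeeping behind this estimate amounts to a per-device cache of the last observed gradient norm. The class below is a minimal sketch of that idea, assuming the parameter server can observe each scheduled device's gradient norm; the class and method names are hypothetical.

```python
import math
from typing import Dict, Sequence


def l2_norm(vector: Sequence[float]) -> float:
    """Euclidean norm of a real-valued vector."""
    return math.sqrt(sum(x * x for x in vector))


class GradNormEstimator:
    """Keeps, per device, the gradient l2 norm from the last round it was scheduled."""

    def __init__(self, first_round_gradients: Dict[int, Sequence[float]]) -> None:
        # Round 1: every device trains, so every device gets an initial entry.
        self._last_norm = {k: l2_norm(g) for k, g in first_round_gradients.items()}

    def estimate(self, device: int) -> float:
        """Estimated norm for the current round: the most recently stored value."""
        return self._last_norm[device]

    def update(self, device: int, gradient: Sequence[float]) -> None:
        """Refresh the stored norm after the device was scheduled and trained."""
        self._last_norm[device] = l2_norm(gradient)


estimator = GradNormEstimator({1: [3.0, 4.0], 2: [0.0, 1.0]})
estimator.update(2, [6.0, 8.0])  # device 2 was scheduled again in a later round
print(estimator.estimate(1), estimator.estimate(2))  # 5.0 10.0
```

A device that is never rescheduled keeps its first-round value indefinitely, so the estimate can become stale; the text relies on strong temporal correlation of the gradients to justify this.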
Step 3: the parameter server sends the global model $\omega_{t-1}$ obtained in the previous round to all scheduled edge mobile devices
The parameter server sends the global model $\omega_{t-1}$ obtained in the previous round to all scheduled edge mobile devices over the wireless channel. Because the parameter server has ample energy and bandwidth compared with the edge mobile devices, the delivery of $\omega_{t-1}$ is assumed to be error-free.
Step 4: the scheduled edge mobile devices perform local training with the stochastic gradient descent algorithm to obtain their respective local gradients
With the stochastic gradient descent algorithm, a loss function is constructed first, and the local gradient is then computed from it;
After the scheduled edge mobile devices receive the global model $\omega_{t-1}$ obtained in the previous training round, the global loss function of the whole federated learning system is constructed as $F(\omega_{t-1}) = \frac{1}{D} \sum_{k \in \mathcal{K}} |D_k|\, F_k(\omega_{t-1})$, where $D = \sum_{k \in \mathcal{K}} |D_k|$ is the total number of samples in the federated learning system and $F_k(\omega_{t-1})$ is the local loss function of the previous-round global model $\omega_{t-1}$ on edge mobile device $k$; the global loss function is thus a weighted average of the local loss functions of all edge mobile devices. The local loss function on edge mobile device $k$ is $F_k(\omega_{t-1}) = \frac{1}{|D_k|} \sum_{(x_k, y_k) \in D_k} f(x_k, y_k; \omega_{t-1})$, where $(x_k, y_k)$ is a data sample on device $k$ and $f(x_k, y_k; \omega_{t-1})$ is the loss for that single sample, which measures how well the global model $\omega_{t-1}$ fits the sample;
The local gradient obtained by edge mobile device $k$ through local training in the $t$-th training round is:
$$g_{k,t} = \frac{1}{L_b} \sum_{(x_k, y_k) \in L_{k,t}} \nabla f(x_k, y_k; \omega_{t-1}) \tag{4}$$
where $\nabla f(x_k, y_k; \omega_{t-1})$ is the gradient of the loss function, $L_{k,t} \subseteq D_k$ is a mini-batch of samples drawn at random from the data set $D_k$ in the $t$-th training round, $L_b = |L_{k,t}|$ is the number of samples in the mini-batch $L_{k,t}$, and $g_{k,t}$ is the local gradient obtained by local training.
Step 5: the scheduled edge mobile devices upload their local gradients to the parameter server, and the global model is updated to obtain $\omega_t$; the local gradients are uploaded using over-the-air computation, and the over-the-air computation is optimized;
The scheduled edge mobile devices upload their local gradients $g_{k,t}$ to the parameter server, which updates the global model according to:
$$\omega_t = \omega_{t-1} - \frac{\eta_t}{|S_t|} \sum_{k \in S_t} g_{k,t} \tag{5}$$
where $\eta_t$ is the learning rate in the $t$-th training round and $|S_t|$ is the number of devices in the scheduled device set;
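A sketch of the global update may help here. It assumes that formula (5) is the usual averaged gradient step, in which the parameter server subtracts the learning rate times the mean of the scheduled devices' gradients; the function name is hypothetical, and the aggregation is done error-free in software rather than over the air.

```python
from typing import List, Sequence


def update_global_model(w_prev: Sequence[float],
                        gradients: Sequence[Sequence[float]],
                        learning_rate: float) -> List[float]:
    """Return w_prev - learning_rate * (mean of the scheduled devices' gradients)."""
    num_scheduled = len(gradients)
    # Summation over the scheduled set: the part that over-the-air computation
    # would complete during transmission.
    aggregate = [sum(g[i] for g in gradients) for i in range(len(w_prev))]
    return [w - learning_rate * a / num_scheduled for w, a in zip(w_prev, aggregate)]


# Two scheduled devices with hypothetical gradients.
print(update_global_model([1.0, 2.0], [[1.0, 0.0], [3.0, 2.0]], learning_rate=0.5))
# [0.0, 1.5]
```

Splitting the step into "sum" and "scale" mirrors the text: only the sum depends on the wireless channel, while the scaling by the learning rate and the set size is applied at the parameter server.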
The gradients are uploaded using over-the-air computation, which uses the superposition property of the channel to aggregate the local gradients while the local gradient signals are being transmitted; that is, the summation $\sum_{k \in S_t} g_{k,t}$ in formula (5) is completed during transmission. Over-the-air computation is implemented as follows:
In the $t$-th training round, the scheduled edge mobile devices upload their computed local gradients $g_{k,t}$ simultaneously, the transmitted local gradients are aggregated over the air, and the aggregate signal received by the parameter server is:
$$y_t = \sum_{k \in S_t} h_{k,t}\, p_{k,t}\, g_{k,t} + n \tag{6}$$
where $p_{k,t}$ is the transmitter scalar of device $k$ in the $t$-th training round and $n$ is an additive white Gaussian noise vector with zero mean and variance $\sigma^2$;
The parameter server applies a beamforming vector to the received aggregate signal. The processed aggregate signal $\hat{g}_t$ is given by formula (7), in which $m$ is the receiver beamforming vector and the superscript $H$ denotes the conjugate transpose; this processed signal is the local gradient aggregation completed by over-the-air computation;
The aggregation of local gradients by over-the-air computation is affected by channel fading and noise, so the over-the-air computation is optimized;
Under an ideal channel without fading or noise, the ideal aggregate signal is:
$$g_t = \sum_{k \in S_t} g_{k,t} \tag{8}$$
The error between the ideal aggregate signal and the actual aggregate signal is expressed as a mean square error:
$$\mathrm{MSE}(\hat{g}_t, g_t) = \mathbb{E}\left[\, \|\hat{g}_t - g_t\|^2 \,\right] \tag{9}$$
where $\mathbb{E}$ denotes mathematical expectation;
To reduce the influence of channel gain and noise on over-the-air computation and improve its performance, the transceiver is designed according to the mean square error minimization criterion, so that the mean square error between the ideal aggregate signal and the actual aggregate signal is minimized. This specifically includes:
Designing the transceiver means determining the transmitter scalar $p_{k,t}$ and the receiver beamforming vector $m$;
The transmitter scalar is designed according to formula (10), where $\mu$ is the transmit power control factor, $|p_{k,t}|^2 \le P_0$, $P_0$ is the maximum transmit power, and $\|\cdot\|^2$ denotes the squared $l_2$ norm of the enclosed vector;
The transmit power control factor is designed according to formula (11);
Substituting formulas (10) and (11) into formula (7) simplifies the actual aggregate signal to formula (12);
The mean square error between the actual aggregate signal and the ideal aggregate signal is then further expressed as formula (13);
An optimization problem, formula (14), is constructed with the aim of minimizing the mean square error;
Introducing an auxiliary variable, the optimization problem of formula (14) is converted into the following form:
$$\min_{m} \|m\|^2 \quad \text{s.t.} \quad \|m^{H} h_{k,t}\|^2 \ge 1,\ \forall k \in S_t \tag{16}$$
Solving the optimization problem in formula (16) yields the receiver beamforming vector $m$, and specifically includes:
An initial solution is obtained with semidefinite relaxation (SDR) and then refined with the successive convex approximation (SCA) algorithm;
SDR obtains the initial solution as follows:
Let $A = m m^{H}$ and $A^{*} = \arg\min_{A} \operatorname{tr}(A)$, where $\operatorname{tr}(A)$ is the trace of the matrix $A$, $\lambda_1$ is the largest eigenvalue of $A^{*}$, and $u_1$ is the eigenvector corresponding to $\lambda_1$;
SCA refines the initial solution as follows:
In the optimization problem of formula (16), the non-convex constraint is $\|m^{H} h_{k,t}\|^2 \ge 1,\ \forall k \in S_t$. Introducing the auxiliary variable $c_{k,t} = [\operatorname{Re}(m^{H} h_{k,t}), \operatorname{Im}(m^{H} h_{k,t})]$, formula (16) is converted to:
$$\min_{m} \|m\|^2 \quad \text{s.t.} \quad \|c_{k,t}\|^2 \ge 1,\ \forall k \in S_t \tag{17}$$
The non-convex constraint in formula (17) is $\|c_{k,t}\|^2 \ge 1,\ \forall k \in S_t$. Using $\|c_{k,t}\|^2 \ge \|c_{k,t}^{(l)}\|^2 + 2 (c_{k,t}^{(l)})^{T} (c_{k,t} - c_{k,t}^{(l)}) \ge 1,\ \forall k \in S_t$, the non-convex constraint is convexly approximated by an iteratively relaxed linear constraint, where $c_{k,t}^{(l)}$ is the solution obtained after the $l$-th iteration;
Replacing the non-convex constraint in formula (17) with this convex constraint, formula (17) is rewritten as:
$$\min_{m} \|m\|^2 \quad \text{s.t.} \quad \|c_{k,t}^{(l)}\|^2 + 2 (c_{k,t}^{(l)})^{T} (c_{k,t} - c_{k,t}^{(l)}) \ge 1,\ \forall k \in S_t \tag{18}$$
Formula (18) is solved iteratively until the stopping criterion is met, where $\epsilon$ denotes the solution precision; the optimal solution obtained is the receiver beamforming vector $m$.
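The linear constraint used by SCA can be checked numerically. The short script below does not solve the beamforming problem; it only verifies, on random two-dimensional points standing in for the real and imaginary parts of $m^{H}h_{k,t}$, that the linearized expression never exceeds the squared norm it replaces. The function names are hypothetical.

```python
import random
from typing import Sequence


def squared_norm(c: Sequence[float]) -> float:
    """Squared Euclidean norm ||c||^2."""
    return sum(x * x for x in c)


def linearized_bound(c: Sequence[float], c_prev: Sequence[float]) -> float:
    """First-order expansion of ||c||^2 around c_prev: ||c_prev||^2 + 2 c_prev^T (c - c_prev)."""
    return squared_norm(c_prev) + 2.0 * sum(p * (x - p) for p, x in zip(c_prev, c))


rng = random.Random(0)
for _ in range(1000):
    c = [rng.uniform(-3.0, 3.0), rng.uniform(-3.0, 3.0)]
    c_prev = [rng.uniform(-3.0, 3.0), rng.uniform(-3.0, 3.0)]
    # The gap equals ||c - c_prev||^2, so it is non-negative up to rounding error.
    assert squared_norm(c) >= linearized_bound(c, c_prev) - 1e-9
print("linearization is a lower bound on all sampled points")
```

Because the bound holds everywhere, a point that satisfies the linear constraint of one SCA iteration automatically satisfies the original non-convex constraint.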
Steps 2 to 5 are executed in a loop until the global model $\omega_t$ converges.
The flow chart of the whole embodiment is shown in fig. 2.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the invention and is not intended to limit the invention, which has been described in detail with reference to the foregoing examples, but it will be apparent to those skilled in the art that modifications may be made to the above-described embodiment or that equivalents may be substituted for elements thereof. All modifications, equivalents and the like which come within the spirit and principle of the invention are intended to be included within the scope of the invention.
Claims (7)
1. A device scheduling method based on the combination of channel and local model update, the method comprising the following steps:
Step 1: constructing a federated learning system
In an edge intelligence scenario there are $K$ single-antenna edge mobile devices, denoted $\mathcal{K} = \{1, 2, \ldots, K\}$, and a parameter server with $M$ antennas. Each edge mobile device $k \in \mathcal{K}$ has a local data set $D_k$, and $|D_k|$ denotes the size of $D_k$. In federated learning the global model $\omega$ is trained jointly by the parameter server and the edge mobile devices. Training is an iterative process: each iteration is called a training round, and each round yields a new global model. $\omega_t$ denotes the global model obtained in the $t$-th training round, $\omega_0$ the initial global model before any federated training, and $\omega_{t-1}$ the global model obtained in the previous round. Together these form the federated learning system;
Step 2: the parameter server schedules the edge mobile devices that participate in training by combining two parameters, the channel and the local model update;
Step 3: the parameter server sends the global model $\omega_{t-1}$ obtained in the previous round to all scheduled edge mobile devices;
Step 4: the scheduled edge mobile devices perform local training with the stochastic gradient descent algorithm to obtain their respective local gradients;
Step 5: the scheduled edge mobile devices upload their local gradients to the parameter server, and the global model is updated to obtain $\omega_t$; the local gradients are uploaded using over-the-air computation, and the over-the-air computation is optimized;
Steps 2 to 5 are executed in a loop until the global model $\omega_t$ converges.
2. The method for scheduling devices based on combination of channel and local model update according to claim 1, wherein the step 2 specifically comprises:
and (3) channel parameters: in the t-th round of training, there are different channel gain vectors h between the parameter server and each edge mobile k k,t (ii) a Each edge mobile device sends a pilot frequency sequence to a parameter server, and the parameter server estimates a channel gain vector h between each edge mobile device and the parameter server according to the received pilot frequency sequence k,t (ii) a Using a channel gain vector h k,t L of 2 Norm ofRepresenting channel parameters between the parameter server and the edge mobile device k, whereinRepresentative vector h k,t Absolute value of ith component, since the parameter server has M antennas in common, the channel gain vector h k,t A total of M components;
local model update parameter: in the t-th round of training, edge mobile device k performs local training to obtain a local gradient $g_{k,t}$, and the $\ell_2$ norm $\|g_{k,t}\|$ of the local gradient represents the local model update parameter of edge mobile device k; the larger $\|g_{k,t}\|$ is, the more the local training result of device k improves the federated learning training;
edge mobile device k does not perform local training before it is scheduled, yet scheduling is guided by $\|g_{k,t}\|$, the $\ell_2$ norm of the local training result $g_{k,t}$; the $\ell_2$ norm $\|g_{k,t}\|$ of the local gradient obtained by local training therefore has to be estimated;
the channel and the local model update are combined to decide the scheduling priority of each device, which is defined as:
$I_{k,t} = c\,\|h_{k,t}\| + (1-c)\,\|g_{k,t}\|$ (1)
where $I_{k,t}$ is the scheduling priority of edge mobile device k in the t-th training round, and $c \in [0, 1]$ is a hyperparameter that controls the relative weight of the two scheduling parameters;
the scheduling priorities $I_{k,t}$ of all edge mobile devices k are sorted in descending order; with the number of scheduled devices fixed to N, 0 < N < K, the N devices with the largest $I_{k,t}$ are selected to participate in the t-th round of federated learning training; the set of edge mobile devices scheduled in the t-th training round is:
$S_t = \{k \in \mathcal{K} : I_{k,t} \text{ is among the } N \text{ largest scheduling priorities}\}$ (2)
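The priority rule in formula (1) and the top-N selection can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names `l2_norm` and `schedule`, the dictionary layout, and the numeric values are all assumptions made for the example.

```python
import math

def l2_norm(vec):
    """l2 norm of a real or complex vector given as a list."""
    return math.sqrt(sum(abs(v) ** 2 for v in vec))

def schedule(channels, grad_norm_estimates, c, n):
    """Return the ids of the n devices with the largest priority I_k.

    channels: dict device id -> channel gain vector h_k (list of complex)
    grad_norm_estimates: dict device id -> estimated ||g_k||
    c: weight in [0, 1] between the channel term and the model-update term
    """
    priority = {
        k: c * l2_norm(h) + (1 - c) * grad_norm_estimates[k]
        for k, h in channels.items()
    }
    # Sort device ids by priority, largest first, and keep the first n.
    ranked = sorted(priority, key=priority.get, reverse=True)
    return set(ranked[:n]), priority

# Hypothetical example: K = 3 devices, M = 2 antennas, N = 2 scheduled devices.
channels = {
    1: [3 + 4j, 0j],   # ||h|| = 5
    2: [1 + 0j, 0j],   # ||h|| = 1
    3: [0j, 2 + 0j],   # ||h|| = 2
}
grad_norms = {1: 0.5, 2: 6.0, 3: 1.0}
scheduled, priority = schedule(channels, grad_norms, c=0.5, n=2)
```

With c = 0.5, device 2 is scheduled despite its weak channel because its gradient norm is large, which is the trade-off the hyperparameter c is meant to control.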
3. The method for scheduling devices based on combination of channel and local model update according to claim 2, wherein estimating the $\ell_2$ norm $\|g_{k,t}\|$ of the local gradient obtained by local training specifically comprises:
since the local gradients calculated by an edge mobile device in successive training rounds are strongly correlated in time, the gradient $\ell_2$ norm from the device's most recent training round is used to estimate the gradient $\ell_2$ norm of the current training round; specifically, the local gradient $\ell_2$ norm estimated in the t-th training round is:
$\|\hat{g}_{k,t}\| = \|g_{k,t_k}\|$ (3)
where $t_k$ is the most recent federated learning training round in which device k was scheduled;
all edge mobile devices are required to participate in the first federated learning training round and upload their local gradients, so that the parameter server can estimate the gradient $\ell_2$ norms for subsequent training rounds; when edge mobile device k is scheduled in the t-th round of training, the parameter server uses the local gradient $g_{k,t}$ obtained from local training to update the stored gradient-norm estimate of device k.
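The bookkeeping behind this estimate can be sketched as a small cache on the parameter server. The class name `GradNormTracker` and its methods are hypothetical; the sketch only assumes that the server stores, per device, the norm of the gradient uploaded in the last round in which that device was scheduled.

```python
import math

class GradNormTracker:
    """Keeps, per device, the gradient norm from its last scheduled round."""

    def __init__(self):
        self.last_norm = {}    # device id -> norm of the last uploaded gradient
        self.last_round = {}   # device id -> round t_k of that upload

    def update(self, device_id, round_idx, gradient):
        """Record the gradient a scheduled device uploaded in round_idx."""
        self.last_norm[device_id] = math.sqrt(sum(x * x for x in gradient))
        self.last_round[device_id] = round_idx

    def estimate(self, device_id):
        """Estimate the current gradient norm by the norm from round t_k."""
        return self.last_norm[device_id]

tracker = GradNormTracker()
# Round 1: every device participates, so every device has an entry.
tracker.update(1, 1, [3.0, 4.0])
tracker.update(2, 1, [1.0, 0.0])
# Round 2: only device 2 is scheduled, so only its entry is refreshed.
tracker.update(2, 2, [0.0, 2.0])
```

Device 1 keeps its round-1 norm until it is scheduled again, which is why the first round must include all devices.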
4. The method for scheduling devices based on combination of channel and local model update according to claim 1, wherein the step 4 specifically includes:
the scheduled edge mobile devices perform local training with the stochastic gradient descent algorithm, which first constructs a loss function and then uses the loss function to calculate the local gradient;
the scheduled edge mobile device receives the global model $\omega_{t-1}$ obtained in the previous training round; the global loss function of the entire federated learning system is constructed as $F(\omega_{t-1}) = \frac{1}{|D|} \sum_{k \in \mathcal{K}} |D_k|\, F_k(\omega_{t-1})$, where $|D| = \sum_{k \in \mathcal{K}} |D_k|$ is the sample capacity of the entire federated learning system and $F_k(\omega_{t-1})$ is the local loss function of the global model $\omega_{t-1}$ on edge mobile device k, so that the global loss function is a weighted average of the local loss functions of all edge mobile devices k; the local loss function on edge mobile device k is $F_k(\omega_{t-1}) = \frac{1}{|D_k|} \sum_{(x_k, y_k) \in D_k} f(x_k, y_k; \omega_{t-1})$, where $(x_k, y_k)$ is a data sample on device k and $f(x_k, y_k; \omega_{t-1})$ is the loss function of the single sample $(x_k, y_k)$, which measures how well the global model $\omega_{t-1}$ fits the sample $(x_k, y_k)$;
the formula by which edge mobile device k obtains its local gradient through local training in the t-th round of training is constructed as:
$g_{k,t} = \frac{1}{L_b} \sum_{(x_k, y_k) \in L_{k,t}} \nabla f(x_k, y_k; \omega_{t-1})$ (4)
where $\nabla f(x_k, y_k; \omega_{t-1})$ is the gradient of the loss function, $L_{k,t} \subseteq D_k$ is a mini-batch sample set randomly selected from data set $D_k$ in the t-th training round, $L_b = |L_{k,t}|$ is the number of samples in the mini-batch $L_{k,t}$, and $g_{k,t}$ is the local gradient obtained by local training.
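The mini-batch gradient of formula (4) can be illustrated with a small sketch. The claim does not fix a model or a per-sample loss, so the linear model with squared loss $f(x, y; \omega) = (\omega \cdot x - y)^2$ used here, and the name `local_gradient`, are assumptions chosen only to make the example concrete.

```python
import random

def local_gradient(dataset, omega, batch_size, rng):
    """Mini-batch gradient of f(x, y; w) = (w . x - y)^2 on one device."""
    # Draw the mini-batch L_{k,t} at random, without replacement.
    batch = rng.sample(dataset, batch_size)
    grad = [0.0] * len(omega)
    for x, y in batch:
        residual = sum(w * xi for w, xi in zip(omega, x)) - y
        for j, xj in enumerate(x):
            grad[j] += 2.0 * residual * xj   # d/dw_j of (w . x - y)^2
    # Average over the L_b samples in the batch.
    return [gj / batch_size for gj in grad]

rng = random.Random(0)
dataset = [([1.0, 2.0], 5.0), ([2.0, 0.0], 2.0), ([0.0, 1.0], 2.0)]
# Using the whole data set as the batch makes the result deterministic.
g = local_gradient(dataset, [0.0, 0.0], batch_size=3, rng=rng)
```

In this toy data set the labels are generated by the weights [1, 2], so the gradient vanishes when the model equals those weights.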
5. The method for scheduling devices based on combination of channel and local model update according to claim 1, wherein the step 5 specifically includes:
the scheduled edge mobile devices upload the obtained local gradients $g_{k,t}$ to the parameter server, and the global model is updated; first, the formula by which the parameter server updates the global model is constructed:
$\omega_t = \omega_{t-1} - \frac{\eta_t}{|S_t|} \sum_{k \in S_t} g_{k,t}$ (5)
where $\eta_t$ is the learning rate in the t-th training round and $|S_t|$ is the number of devices in the scheduled device set $S_t$;
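A minimal sketch of the global update, assuming the standard averaged form in which the model moves against the mean of the scheduled devices' gradients, scaled by the learning rate. The function name `update_global_model` and the example values are hypothetical.

```python
def update_global_model(omega_prev, gradients, learning_rate):
    """Subtract learning_rate times the average of the local gradients."""
    num_devices = len(gradients)          # |S_t|
    dim = len(omega_prev)
    # Sum the local gradients coordinate by coordinate.
    summed = [sum(g[j] for g in gradients.values()) for j in range(dim)]
    return [w - learning_rate * s / num_devices
            for w, s in zip(omega_prev, summed)]

# Gradients keyed by the ids of the scheduled devices (here devices 1 and 3).
gradients = {1: [2.0, 0.0], 3: [4.0, -2.0]}
omega_t = update_global_model([1.0, 1.0], gradients, learning_rate=0.5)
```

Only the coordinate-wise sum depends on the individual devices, which is the part that over-the-air computation obtains directly from the channel.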
the gradients are uploaded by over-the-air computation, which exploits the superposition property of the wireless channel to aggregate the local gradients while the local gradient signals are being transmitted; that is, the summation $\sum_{k \in S_t} g_{k,t}$ in formula (5) is completed during transmission; the over-the-air computation is implemented as follows:
in the t-th round of training, the scheduled edge mobile devices upload their calculated local gradients $g_{k,t}$ simultaneously, and all transmitted local gradients are aggregated over the air; the aggregate signal received by the parameter server is:
$y_t = \sum_{k \in S_t} h_{k,t}\, p_{k,t}\, g_{k,t} + n$ (6)
where $p_{k,t}$ is the transmitter scalar of device k in the t-th round of training, and n is a white Gaussian noise vector with mean 0 and variance $\sigma^2$;
the parameter server applies a beamforming vector to the received aggregate signal, and the aggregate signal after processing by the parameter server is:
$\hat{y}_t = m^H y_t = m^H \sum_{k \in S_t} h_{k,t}\, p_{k,t}\, g_{k,t} + m^H n$ (7)
where m is the receiver beamforming vector, the superscript H denotes the conjugate transpose, and $\hat{y}_t$ is the local gradient aggregate completed by over-the-air computation;
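The superposition and receive beamforming described above can be sketched with plain complex arithmetic. This is an illustration under simplifying assumptions: each device sends a single real gradient entry per channel use, the noise is circularly symmetric complex Gaussian, and the names `aircomp_receive`, `beamform`, and `complex_gaussian_noise` are hypothetical.

```python
import random

def aircomp_receive(channels, tx_scalars, symbols, noise):
    """Superimposed signal at the M antennas: sum over k of h_k p_k g_k, plus n."""
    m_antennas = len(noise)
    return [
        sum(channels[k][i] * tx_scalars[k] * symbols[k] for k in symbols) + noise[i]
        for i in range(m_antennas)
    ]

def beamform(m, y):
    """Receiver processing m^H y (conjugate transpose of m times y)."""
    return sum(mi.conjugate() * yi for mi, yi in zip(m, y))

def complex_gaussian_noise(m_antennas, sigma, rng):
    """Zero-mean complex noise with total variance sigma^2 per antenna."""
    s = sigma / 2 ** 0.5
    return [complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(m_antennas)]

channels = {1: [1 + 1j, 0.5j], 2: [2 + 0j, 1 - 1j]}   # M = 2 antennas
symbols = {1: 0.3, 2: -1.2}          # one gradient entry per scheduled device
tx_scalars = {1: 1 + 0j, 2: 1 + 0j}
m = [1 + 0j, 0j]                     # this beamformer keeps only antenna 1
y_clean = aircomp_receive(channels, tx_scalars, symbols, [0j, 0j])
y_hat = beamform(m, y_clean)
```

Even without noise, `y_hat` differs from the plain sum 0.3 + (-1.2) because the channel gains distort each term, which is the aggregation error that the transceiver design is meant to reduce.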
the aggregation of local gradients by over-the-air computation is affected by channel fading and noise, so the over-the-air computation needs to be optimized;
under an ideal channel without channel fading or noise interference, the ideal aggregate signal is:
$\bar{y}_t = \sum_{k \in S_t} g_{k,t}$ (8)
the error between the ideal aggregate signal $\bar{y}_t$ and the actual aggregate signal $\hat{y}_t$ is measured by the mean square error:
$\mathrm{MSE}(\hat{y}_t, \bar{y}_t) = \mathbb{E}\left[\,|\hat{y}_t - \bar{y}_t|^2\,\right]$ (9)
where $\mathbb{E}$ denotes mathematical expectation;
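As a worked illustration of how this mean square error separates into a channel-misalignment part and a noise part, the expansion below assumes (these are assumptions, not statements from the claims) that the processed signal is $m^H\big(\sum_{k} h_{k,t} p_{k,t} g_{k,t} + n\big)$, that the ideal signal is $\sum_{k} g_{k,t}$, that the transmitted gradient symbols are zero-mean, mutually uncorrelated, and normalized to unit variance, and that the noise is independent of the symbols with $\mathbb{E}[n n^H] = \sigma^2 I$.

```latex
\begin{aligned}
\mathrm{MSE}
 &= \mathbb{E}\Big[\Big| m^H \sum_{k \in S_t} h_{k,t}\, p_{k,t}\, g_{k,t} + m^H n
      - \sum_{k \in S_t} g_{k,t} \Big|^2\Big] \\
 &= \mathbb{E}\Big[\Big| \sum_{k \in S_t} \big(m^H h_{k,t}\, p_{k,t} - 1\big) g_{k,t} \Big|^2\Big]
    + \mathbb{E}\big[\,|m^H n|^2\,\big] \\
 &= \sum_{k \in S_t} \big| m^H h_{k,t}\, p_{k,t} - 1 \big|^2 + \sigma^2 \|m\|^2 .
\end{aligned}
```

The cross term vanishes because the noise is zero-mean and independent of the symbols. Under these assumptions the first sum is removed when every device aligns its effective gain $m^H h_{k,t} p_{k,t}$ to a common value, which leaves the noise term scaled by the beamformer as the quantity to minimize.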
in order to reduce the effect of channel gain and noise during the over-the-air computation and improve the over-the-air computation performance, it is necessary to design the transceiver according to the mean square error minimization criterion, minimizing the mean square error between the ideal aggregate signal and the actual aggregate signal.
6. The method as claimed in claim 5, wherein the designing the transceiver according to the mean square error minimization criterion to minimize the mean square error between the ideal aggregate signal and the actual aggregate signal comprises:
designing the transceiver means determining the transmitter scalar $p_{k,t}$ and the receiver beamforming vector m;
the transmitter scalar is designed as:
where $\mu$ is the transmit power control factor, $|p_{k,t}|^2 \le P_0$, $P_0$ is the maximum transmit power, and $\|\cdot\|^2$ denotes the squared $\ell_2$ norm of the vector it encloses;
the transmit power control factor is designed as:
substituting formula (10) and formula (11) into formula (7), the actual aggregate signal simplifies to:
the mean square error between the actual aggregate signal and the ideal aggregate signal is further expressed as:
an optimization problem is constructed with the aim of minimizing the mean square error:
introducing an auxiliary variable, the optimization problem in formula (14) is converted into the following form:
the optimization problem in formula (16) is solved to obtain the receiver beamforming vector m.
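The formulas for the transmitter scalar and the power control factor are not reproduced in the text above, so the following sketch does not claim to match them. It assumes a commonly used channel-inversion design, in which each device pre-compensates its effective channel $m^H h_{k,t}$ so that all devices arrive with the same gain $\mu$, and $\mu$ is set by the weakest effective channel under the power limit $P_0$. The names `inner` and `design_transmitters` are hypothetical.

```python
def inner(m, h):
    """m^H h for complex vectors given as lists."""
    return sum(mi.conjugate() * hi for mi, hi in zip(m, h))

def design_transmitters(m, channels, p_max):
    """Assumed channel-inversion design; returns (mu, {k: p_k}).

    p_k = mu * conj(m^H h_k) / |m^H h_k|^2, so that (m^H h_k) * p_k = mu,
    with mu = sqrt(p_max) * min_k |m^H h_k| to keep |p_k|^2 <= p_max.
    """
    eff = {k: inner(m, h) for k, h in channels.items()}
    mu = p_max ** 0.5 * min(abs(e) for e in eff.values())
    scalars = {k: mu * e.conjugate() / abs(e) ** 2 for k, e in eff.items()}
    return mu, scalars

channels = {1: [1 + 1j, 0.5j], 2: [2 + 0j, 1 - 1j]}
m = [1 + 0j, 1 + 0j]
mu, p = design_transmitters(m, channels, p_max=1.0)
# Effective gain seen by the receiver for each device after beamforming.
aligned = {k: inner(m, channels[k]) * p[k] for k in channels}
```

Under this assumed design every entry of `aligned` equals `mu`, so dividing the beamformed signal by `mu` would leave the sum of the gradients plus a scaled noise term; the device with the weakest effective channel transmits at full power and determines `mu`.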
7. The method according to claim 6, wherein the solving of the optimization problem in equation (16) specifically includes:
an initial solution of the problem is obtained by semidefinite relaxation (SDR), and the initial solution is then refined by successive convex approximation (SCA);
the SDR method obtains the initial solution as follows:
let $A = m m^H$ and $A^* = \arg\min_A \operatorname{tr}(A)$, where $\operatorname{tr}(A)$ is the trace of matrix A, $\lambda_1$ is the largest eigenvalue of $A^*$, and $u_1$ is the eigenvector corresponding to $\lambda_1$;
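The text defines $\lambda_1$ and $u_1$ but does not state how they are combined. A standard rank-one extraction, assumed here, takes $m = \sqrt{\lambda_1}\, u_1$ as the initial beamformer. The sketch below shows only that extraction step by power iteration; solving the relaxed semidefinite program for $A^*$ requires a dedicated solver and is not shown. The function names are hypothetical.

```python
def mat_vec(a, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]

def dominant_eigenpair(a, iters=200):
    """Power iteration for a Hermitian positive semidefinite matrix.

    Assumes the all-ones start vector is not orthogonal to the dominant
    eigenvector; otherwise a different start vector is needed.
    """
    v = [1 + 0j] * len(a)
    for _ in range(iters):
        w = mat_vec(a, v)
        norm = sum(abs(x) ** 2 for x in w) ** 0.5
        v = [x / norm for x in w]
    # Rayleigh quotient v^H A v with a unit-norm v gives the eigenvalue.
    lam = sum(vi.conjugate() * wi for vi, wi in zip(v, mat_vec(a, v))).real
    return lam, v

def rank_one_initial_solution(a_star):
    """Assumed extraction: m = sqrt(lambda_1) * u_1."""
    lam, u = dominant_eigenpair(a_star)
    return [lam ** 0.5 * ui for ui in u]

# Example matrix that is exactly rank one: a_star = m m^H with m = [1+1j, 2].
a_star = [[2 + 0j, 2 + 2j], [2 - 2j, 4 + 0j]]
m0 = rank_one_initial_solution(a_star)
```

When $A^*$ is exactly rank one, the extracted vector reproduces it up to a phase factor; when it is not, the extracted vector is only an approximation, which is why a refinement step follows.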
the specific steps of optimizing the initial solution by the SCA method are as follows:
the optimization problem in formula (16) contains a non-convex constraint; introducing the auxiliary variable $c_{k,t} = [\operatorname{Re}(m^H h_{k,t}), \operatorname{Im}(m^H h_{k,t})]$, formula (16) is converted to:
formula (17) still contains a non-convex constraint, which is convexly approximated by iteratively relaxed linear constraints, where $c_{k,t}^{(l)}$ is the solution obtained after the l-th iteration;
replacing the non-convex constraint in formula (17) with the convex constraint, formula (17) is rewritten as:
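The formulas referenced in this claim are not reproduced in the text, so the following is only an illustration of the linearization idea, not a reconstruction of formula (17) or its rewritten form. It assumes the non-convex constraint has the form $\|c_{k,t}\|^2 \ge \gamma$ for some threshold $\gamma > 0$ (the symbol $\gamma$ is introduced here for illustration). Because $\|c\|^2$ is convex, its first-order expansion around the previous iterate $c_{k,t}^{(l)}$ is a global lower bound:

```latex
\begin{aligned}
\|c_{k,t}\|^2
  &= \big\|c_{k,t}^{(l)}\big\|^2
     + 2\,\big(c_{k,t}^{(l)}\big)^{T}\big(c_{k,t} - c_{k,t}^{(l)}\big)
     + \big\|c_{k,t} - c_{k,t}^{(l)}\big\|^2 \\
  &\ge \big\|c_{k,t}^{(l)}\big\|^2
     + 2\,\big(c_{k,t}^{(l)}\big)^{T}\big(c_{k,t} - c_{k,t}^{(l)}\big).
\end{aligned}
```

Requiring the right-hand side to be at least $\gamma$ is therefore a linear constraint in $c_{k,t}$ that implies the original one. Each SCA iteration solves the resulting convex problem and then re-linearizes around the new solution.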
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211422803.7A CN115767758A (en) | 2022-11-15 | 2022-11-15 | Equipment scheduling method based on combination of channel and local model update |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115767758A (en) | 2023-03-07 |
Family
ID=85370596
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211422803.7A Pending CN115767758A (en) | 2022-11-15 | 2022-11-15 | Equipment scheduling method based on combination of channel and local model update |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115767758A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116644802A (en) * | 2023-07-19 | 2023-08-25 | 支付宝(杭州)信息技术有限公司 | Model training method and device |
CN116781518A (en) * | 2023-08-23 | 2023-09-19 | 北京光函数科技有限公司 | Federated multi-armed bandit learning method and system |
CN116781518B (en) * | 2023-08-23 | 2023-10-24 | 北京光函数科技有限公司 | Federated multi-armed bandit learning method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||