WO2024108601A2 - Terminal selection method, model training method, apparatus and system - Google Patents

Terminal selection method, model training method, apparatus and system

Info

Publication number
WO2024108601A2
Authority
WO
WIPO (PCT)
Prior art keywords
terminal
model
model training
training
parameters
Prior art date
Application number
PCT/CN2022/134498
Other languages
English (en)
French (fr)
Inventor
孙宇泽 (Sun Yuze)
陈栋 (Chen Dong)
Original Assignee
北京小米移动软件有限公司 (Beijing Xiaomi Mobile Software Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京小米移动软件有限公司 (Beijing Xiaomi Mobile Software Co., Ltd.)
Priority to PCT/CN2022/134498
Publication of WO2024108601A2

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/16Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs

Definitions

  • the present disclosure relates to the field of mobile communication technology, and in particular to a terminal selection method, a model training method, a device and a system.
  • AI Artificial Intelligence
  • FL Federated Learning
  • the present disclosure proposes a terminal selection method, a model training method, an apparatus and a system, which comprehensively consider factors such as terminal energy consumption and data quality, so as to biasedly select the most suitable terminals in each round of global iteration and balance the energy consumption of edge terminals against the learning accuracy and convergence speed of FL.
  • a first aspect embodiment of the present disclosure provides a terminal selection method, which is executed by a network device, and the method includes: receiving model training parameters sent by at least one terminal, wherein the at least one terminal is a terminal selected by the network device from multiple terminals for performing a first model training on a global model sent by the network device, and the model training parameters are obtained after the at least one terminal performs the first model training on the global model; based on the model training parameters, determining data quality parameters and energy consumption parameters of the terminals participating in the second model training; and selecting the terminals participating in the second model training from multiple terminals according to the data quality parameters and the energy consumption parameters.
  • the model training parameters include the model weight of the first model training of at least one terminal
  • determining the data quality parameters of the terminals participating in the second model training includes: determining the aggregated model weight of the first model training based on the model weight and the local data size of at least one terminal; determining the weight divergence based on the difference between the model weight and the aggregated model weight, and using the weight divergence as the data quality parameter.
  • the model training parameters include the local data distribution value of the first model training of at least one terminal, and determining the data quality parameters of the terminals participating in the second model training also includes: weighted merging the weight divergence and the local data distribution value to determine the quality score value, and using the quality score value as the data quality parameter.
  • the model training parameters include local training energy consumption of at least one terminal for training a first model
  • determining the energy consumption parameters of the terminal participating in the second model training includes: determining the energy consumption parameters based on the local training energy consumption and the transmission energy consumption of at least one terminal uploading the model training parameters.
  • selecting a terminal to participate in the training of the second model from multiple terminals based on data quality parameters and energy consumption parameters includes: establishing an objective function based on the data quality parameters and the energy consumption parameters; establishing constraints, the constraints including at least one of time constraints, energy consumption constraints, bandwidth constraints, terminal processor cycle frequency constraints, transmission power constraints, and terminal selection decision constraints; solving the objective function under the constraints according to the deep deterministic policy gradient DDPG algorithm to select the terminal to participate in the training of the second model from multiple terminals.
  • the objective function is solved to select a terminal participating in the second model training from multiple terminals, including: representing the objective function with a Markov quadruple, wherein the Markov quadruple includes a state space, an action space, a reward function, and a transition probability; establishing a policy gradient and a loss function under a policy-evaluation network, and determining the optimal state-action pair to select a terminal participating in the second model training from multiple terminals.
  • the state space includes data quality parameters, uplink channel gain, channel bandwidth from the terminal to the network device, and the remaining energy of the terminal;
  • the action space includes terminal selection parameters and the transmission power of the selected terminal;
  • the reward function is the ratio of the sum of the data quality parameters of the terminals participating in the second model training to the energy consumption; and
  • the transition probability is the state transition probability.
  • the second aspect of the present disclosure provides a model training method, which is executed by a network device, and includes: creating an initial global model using public data; receiving resource information sent by multiple terminals; selecting an initial training terminal from the multiple terminals based on the resource information; sending the initial global model to the initial training terminal, and receiving a first model training parameter uploaded by the initial training terminal after training the initial global model; performing model aggregation to generate a global model for first model training; determining data quality parameters and energy consumption parameters of a first terminal participating in the first model training according to the first model training parameters, and selecting a first terminal from the plurality of terminals according to the data quality parameters and the energy consumption parameters.
  • the third aspect of the present disclosure provides a terminal selection device, which includes: a receiving module, which is used to receive model training parameters sent by at least one terminal, wherein the at least one terminal is a terminal selected by a network device from multiple terminals for performing a first model training on a global model issued by the network device, and the model training parameters are obtained after the at least one terminal performs the first model training on the global model; a determination module, which is used to determine data quality parameters and energy consumption parameters of terminals participating in the second model training based on the model training parameters; and a selection module, which is used to select terminals participating in the second model training from multiple terminals based on the data quality parameters and energy consumption parameters.
  • an embodiment of the present disclosure provides a model training device, which includes: a creation module for creating an initial global model using public data; a transceiver module for receiving resource information sent by multiple terminals; a selection module for selecting an initial training terminal from the multiple terminals based on the resource information; the transceiver module is also used to: send the initial global model to the initial training terminal, and receive the first model training parameters uploaded by the initial training terminal after training the initial global model; an aggregation module is used to perform model aggregation to generate a global model for first model training; the selection module is also used to: determine the data quality parameters and energy consumption parameters of the first terminal participating in the first model training according to the first model training parameters, and select the first terminal according to the data quality parameters and the energy consumption parameters.
  • the transceiver module is also used to: send the global model used for the first model training to the first terminal for training, and receive the second model training parameters uploaded by the first terminal after the global model training;
  • the aggregation module is also used to: perform model aggregation to generate a global model for the second model training;
  • the selection module is also used to: determine the data quality parameters and energy consumption parameters of the second terminal participating in the second model training according to the second model training parameters, determine the second terminal for performing the second model training from the multiple terminals according to the data quality parameters and the energy consumption parameters, and send the global model used for the second model training to the second terminal for training until the model accuracy converges.
  • the fifth aspect embodiment of the present disclosure provides a communication device, which includes: a transceiver; a memory; a processor, which is connected to the transceiver and the memory respectively, and is configured to control the wireless signal reception and transmission of the transceiver by executing computer executable instructions on the memory, and can implement any method of the first aspect embodiment or the second aspect embodiment mentioned above.
  • the sixth aspect embodiment of the present disclosure provides a computer storage medium, wherein the computer storage medium stores computer executable instructions; after the computer executable instructions are executed by a processor, the method described in the first aspect or second aspect embodiment of the present disclosure can be implemented.
  • the network device receives the model training parameters sent by at least one terminal, wherein at least one terminal is selected by the network device from multiple terminals for performing the first model training on the global model sent by the network device, and the model training parameters are obtained after at least one terminal performs the first model training on the global model; based on the model training parameters, the data quality parameters and energy consumption parameters of the terminals participating in the second model training are determined; based on the data quality parameters and energy consumption parameters, the terminals participating in the second model training are selected from multiple terminals.
  • the scheme of the present disclosure comprehensively considers factors such as terminal energy consumption and data quality, so as to biasedly select the most suitable terminal in each round of global iteration and balance the energy consumption of the edge terminal against the learning accuracy and convergence speed of FL.
  • FIG1 is a schematic diagram of a flow chart of a terminal selection method according to an embodiment of the present disclosure
  • FIG2 is a schematic diagram of a flow chart of a transmission configuration method according to an embodiment of the present disclosure
  • FIG3 is a flow chart of a model training method according to an embodiment of the present disclosure.
  • FIG4 is a schematic block diagram of a terminal selection device according to an embodiment of the present disclosure.
  • FIG5 is a schematic block diagram of a model training device according to an embodiment of the present disclosure.
  • FIG6 is a schematic diagram of the structure of a communication device according to an embodiment of the present disclosure.
  • FIG. 7 is a schematic diagram of the structure of a chip provided in an embodiment of the present disclosure.
  • the Internet of Everything enables all terminal devices to be analyzed and calculated as intelligent entities to form an intelligent network.
  • Mobile phones, laptops, sensors and other terminal devices will generate a large amount of heterogeneous data locally. If these data can be effectively used by the network, they will make positive contributions to intelligent analysis and resource optimization.
  • In the related art, user data was usually sent to a central server for computing and storage. This approach not only caused many security and privacy issues, but also incurred huge energy consumption and latency during transmission, which is contrary to the concept of 6G green communication. It also occupies a large bandwidth and hinders the normal operation of other task queues in the network.
  • With the popularization of mobile edge computing and federated learning technologies and the enhancement of the computing power of terminal devices, more computing tasks are placed on the edge server side or the user's local side for computing.
  • Such a network intelligent architecture brings more feasibility of orchestration and deployment.
  • the terminal uses its local data to train and update the machine learning (ML) model required by the server.
  • the terminal device sends the model update parameters instead of the original data to the server for aggregation, and the server then sends the aggregated model to the terminal, repeating this iteration for multiple rounds until the model converges.
  • ML machine learning
  • the present disclosure aims to solve the problem that there is no corresponding terminal (or client) selection solution in the related art, that is, how to dynamically select the terminals participating in each round according to the model update situation during the training process to balance the data distribution and reduce the impact of terminal non-iid data on the model aggregation effect.
  • the purpose of the present invention is to comprehensively consider the energy consumption and delay generated by each terminal in the local calculation and model transmission process during the FL training process, as well as the local data quality of the terminal, analyze and select the terminal set participating in each round of training and the corresponding power allocation scheme, so as to achieve flexible and efficient federated learning training, and minimize energy consumption while quickly converging the model.
  • the present disclosure proposes a terminal selection method, a model training method, a device and a system, which provide a comprehensive consideration of factors such as terminal energy consumption and data quality, so as to biasedly select the most suitable terminal in each round of global iteration to balance the energy consumption of edge terminals and the learning accuracy and convergence speed of FL.
  • the solution provided in the present disclosure can be used for the fifth generation mobile communication technology (Fifth Generation, 5G) and its subsequent communication technologies, such as the fifth generation mobile communication technology evolution (5G-advanced), the sixth generation mobile communication technology (Sixth Generation, 6G), etc., which are not limited in the present disclosure.
  • 5G fifth generation mobile communication technology
  • 6G sixth generation mobile communication technology
  • Fig. 1 shows a schematic flow chart of a terminal selection method according to an embodiment of the present disclosure. The method is executed by a network device, which may be a server. As shown in Fig. 1, the method includes the following steps.
  • S101 Receive model training parameters sent by at least one terminal.
  • the server can dynamically select the terminals participating in this round of training based on the conditions of the terminals it selected in the previous round, taking into account the energy consumption and data quality of each terminal. Therefore, in an embodiment of the present disclosure, at least one terminal is a terminal selected by a network device from multiple terminals for performing the first model training on the global model issued by the network device, wherein the first model training is the previous round of model training.
  • the model training parameters received by the server can be sent by the terminal selected by the server in the previous round of training.
  • the model training parameters are obtained after at least one terminal performs the first model training on the global model.
  • the model training parameters can be parameters such as data quality score, CPU frequency, channel gain, battery power, etc. As long as the parameters used for model training and iteration fall within the scope of the present disclosure, they are not limited in the present disclosure.
  • S102 Determine data quality parameters and energy consumption parameters of terminals participating in second model training based on model training parameters.
  • the data quality parameter can be used to judge the local data quality of the terminal
  • the energy consumption parameter can be used to judge the energy consumption of information transmission between the terminal and the server and the energy consumption of model training in the terminal. This embodiment does not limit the types and forms of the data quality parameters and energy consumption parameters.
  • the second model training is the current round of model training; it is a concept relative to the first model training, wherein "first" and "second" are used only as names for distinction and do not limit the present disclosure.
  • S103 Select a terminal to participate in the second model training from multiple terminals according to the data quality parameter and the energy consumption parameter.
  • the server can select a terminal suitable for participating in the second model training according to the determined data quality parameters and energy consumption parameters, so as to send the global model to the selected terminal in the subsequent model training process for the current round of training. After the current round of training is completed, the next round of training can be carried out until the model converges. In the next round of training, the server can again dynamically select the terminal participating in the next round of training according to the model training parameters fed back by the terminal until the final model converges.
  • the network device receives the model training parameters sent by at least one terminal, wherein at least one terminal is selected by the network device from multiple terminals for performing the first model training on the global model sent by the network device, and the model training parameters are obtained after at least one terminal performs the first model training on the global model; based on the model training parameters, the data quality parameters and energy consumption parameters of the terminals participating in the second model training are determined; based on the data quality parameters and energy consumption parameters, the terminals participating in the second model training are selected from multiple terminals.
  • the scheme disclosed in the present disclosure comprehensively considers factors such as terminal energy consumption and data quality, so as to biasedly select the most suitable terminal in each round of global iteration to balance the energy consumption of the edge terminal and the learning accuracy and convergence speed of the FL.
  • Fig. 2 shows a schematic flow chart of a transmission configuration method according to an embodiment of the present disclosure. Based on the embodiment shown in Fig. 1 , as shown in Fig. 2 , the method may include the following steps.
  • the server can directly request some or all terminals to upload their resource information to the server, such as data quality score, CPU frequency, channel gain, battery power, etc.
  • the server selects the terminals participating in the first round of training based on the resource information of each terminal.
  • the server selects the terminals participating in this round of training based on the model training parameters obtained after training the global model fed back by the terminals selected in the previous round.
  • S201 Receive model training parameters sent by at least one terminal. The other principles of step S201 are the same as those of step S101 and will not be described again.
  • the model training parameters include the model weights of the first model training of at least one terminal.
  • the process of determining the data quality parameters in step S102 shown in FIG. 1 is explained in detail below through steps S202-S204.
  • the data heterogeneity of the terminal is understood as category imbalance, in which each device does not follow a common data distribution, that is, the data distribution in the device is non-iid.
  • This data heterogeneity will greatly weaken the aggregation quality of the model, resulting in low classification accuracy.
  • this difference is referred to as the skewness of the data distribution. Since the test accuracy is determined by the training weights, and the weight update of each terminal has a certain correlation with its data distribution, the aggregated weights can reveal information about each terminal's data quality.
  • the entire FL system needs to provide feedback on the overall training process through the aggregated updates of the server, and infer information through the aggregated model weight parameter results, so as to more effectively select the terminals participating in the training in the future.
  • S202 Determine an aggregated model weight of the first model training according to a model weight of the first model training of at least one terminal and a local data size of at least one terminal.
  • each round of training goes through the process of: server aggregates model → model distribution → terminals train model → terminals report model training parameters → server selects terminals → server aggregates model → model distribution → ...
  • the server can determine the weight update direction of the model based on the aggregation results, and then calculate the aggregate model weight based on the model weight uploaded by each terminal in the previous round and the local data size, as shown in the following formula:
  • n represents the n-th terminal, where n ∈ [1, N], N is a positive integer, and t represents the t-th round of training.
  • D_n is the local data size of terminal n.
  • w^{Global}_t is the aggregated model weight (that is, it can be understood as the result of aggregating and averaging the updated weights after training with the data at each terminal).
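  • the aggregation formula itself is not reproduced in this text; a plausible reconstruction from the definitions above, assuming the data-size-weighted FedAvg averaging named later in this disclosure, is:

```latex
w^{Global}_{t} = \frac{\sum_{n=1}^{N} \alpha_{n,t}\, D_n\, w_{n,t}}{\sum_{n=1}^{N} \alpha_{n,t}\, D_n}
```

where w_{n,t} is the model weight uploaded by terminal n in round t and α_{n,t} is the selection indicator.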
  • the server determines the weight divergence (WD) according to the difference between the model weight and the aggregate model weight, and uses the weight divergence as a data quality parameter for terminal selection, as shown in the following formula:
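  • the weight-divergence formula is likewise omitted here; a minimal sketch consistent with the described difference between local and aggregated weights is:

```latex
WD_{n,t} = \left\lVert w_{n,t} - w^{Global}_{t} \right\rVert
```

where the choice of norm (and any normalization by the aggregated weight) is an assumption.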
  • the data quality parameter described in this disclosure can be the weight divergence: the larger the difference between the current aggregated model weight and a terminal's local weight, the larger the divergence. If the terminal has never participated in training before, its local model weight is set to 0. If it has participated in training many times before, the current data quality is judged by the weights of its most recent round of update, i.e., the weights of the most recent (previous) update are used.
  • the model training parameters include a local data distribution value of the first model training of at least one terminal.
  • the present disclosure adopts a parameter item to reduce the probability of terminals with limited contribution to the model being selected, and the parameter item is determined according to the weight divergence of the previous round of models and the local data distribution.
  • the present disclosure refers to this parameter item as the quality score value, which is used as the data quality parameter to measure the data quality of each terminal in the current round.
  • the terminal only needs to send the earth mover's distance (EMD) value of its local data distribution to the server in each round, which protects its local data while informing the server of its distribution.
  • after receiving the EMD value of each terminal, the server performs a weighted merge of it with the weight divergence from the previous round of training to obtain the quality score value.
  • P_const is a constant term used to prevent the score value from being negative.
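  • the exact weighted-merge expression is not reproduced in this text; one illustrative form, in which the mixing coefficient λ and the sign given to the EMD term are assumptions, is:

```latex
Q_{n,t} = \lambda\, WD_{n,t} + (1 - \lambda)\left(P_{\mathrm{const}} - EMD_{n}\right)
```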
  • model weight parameters of each terminal after the previous round of training are recorded in the server for update, so as to facilitate the comparison of the data quality scores of each terminal in the next round. This process is iterated multiple times during the entire training process.
  • the data quality parameter described in the present disclosure may be the quality score value; the higher the value, the more useful terminal n is for the current round of model update and convergence.
  • the present disclosure selects terminals by the weighted-merged quality score value, which can reduce the probability of the same terminals participating in training many times and give terminals with extreme data distributions an opportunity to participate.
  • the model training parameters include the local training energy consumption of the first model training of at least one terminal.
  • the process of determining the energy consumption parameter in step S102 shown in FIG. 1 is explained in detail below through step S205.
  • S205 Determine energy consumption parameters according to local training energy consumption and transmission energy consumption of at least one terminal uploading model training parameters.
  • α_{n,t} represents a binary terminal-selection indicator.
  • the energy consumption of each terminal includes two aspects: one is the energy consumption of model parameter transmission from the terminal to the edge server, and the other is the energy consumption of local model calculation in the terminal.
  • for the terminal n selected in round t, it incurs energy consumption due to local training and due to uploading local updates to the edge server through the wireless channel.
  • let c_n represent the number of CPU cycles for the n-th terminal to execute one data sample, which is known a priori. Assuming that all data samples have the same data size (number of bits), and letting D_n be the amount of data of terminal n, the number of CPU cycles required for terminal n to run one local epoch is c_n·D_n.
  • U_{t,n} represents the number of local training rounds of terminal n in the t-th round of global iteration.
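  • a commonly used model for the resulting local computation time and energy, where f_n (the CPU cycle frequency of terminal n) and κ (the effective capacitance coefficient) are symbols assumed here, is:

```latex
t^{cmp}_{n,t} = \frac{U_{t,n}\, c_n D_n}{f_n}, \qquad E^{cmp}_{n,t} = U_{t,n}\, \frac{\kappa}{2}\, c_n D_n f_n^{2}
```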
  • OFDMA Orthogonal Frequency Division Multiple Access
  • let b_{n,t} be the bandwidth allocation ratio of terminal n in the t-th round of global iteration, so that the subchannel bandwidth allocated to the n-th device is b_{n,t}·B, where B is the total bandwidth. The allocation over all terminals must satisfy Σ_n b_{n,t} ≤ 1. If terminal n is selected in the t-th round, the present disclosure requires that at least a minimum bandwidth ratio b_min be allocated to terminal n, i.e., b_{n,t} ≥ b_min. This is because actual systems cannot allocate arbitrarily small bandwidth to a single terminal due to limited resource block size.
  • the achievable transmission rate of terminal n is defined as:
  • N_0 is the power spectral density of the Gaussian white noise.
  • G_n is the channel gain between terminal n and the central server.
  • the data size of the model parameters w_n and gradients is s_n. Assuming that s_n is constant, the time for each terminal to transmit its local model update in the t-th round is:
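  • the omitted rate and time expressions can plausibly be reconstructed from the definitions above (with B denoting the total bandwidth and b_{n,t} the allocation ratio) as:

```latex
r_{n,t} = b_{n,t} B \log_2\!\left(1 + \frac{P_{n,t}\, G_n}{N_0\, b_{n,t} B}\right), \qquad t^{up}_{n,t} = \frac{s_n}{r_{n,t}}, \qquad E^{up}_{n,t} = P_{n,t}\, t^{up}_{n,t}
```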
  • step S103 in the embodiment shown in FIG. 1 is explained below through steps S206-S208.
  • S206 Establish an objective function according to the data quality parameters and the energy consumption parameters.
  • constraint conditions include at least one of a time constraint, an energy consumption constraint, a bandwidth constraint, a terminal processor cycle frequency constraint, a transmission power constraint, and a terminal selection decision constraint; plausible formalizations are sketched after this list.
  • the time constraint is that the time taken by the terminal to complete model training and uploading in each round of iteration must be less than or equal to the deadline T_max.
  • the energy consumption constraint is:
  • the bandwidth constraint is:
  • the terminal processor cycle frequency constraint is:
  • the transmission power constraint is:
  • the terminal selection decision constraints are:
  • the decision of terminal selection is a binary variable.
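  • the constraint formulas themselves are omitted in this text; plausible formalizations using the symbols introduced above (with E^{res}_{n,t}, f^{min}_n, f^{max}_n, and P^{max}_n as assumed bounds) are:

```latex
\begin{aligned}
&\text{time:} && \alpha_{n,t}\left(t^{cmp}_{n,t} + t^{up}_{n,t}\right) \le T_{\max} \\
&\text{energy:} && \alpha_{n,t}\left(E^{cmp}_{n,t} + E^{up}_{n,t}\right) \le E^{res}_{n,t} \\
&\text{bandwidth:} && \sum_{n=1}^{N} b_{n,t} \le 1, \quad b_{n,t} \ge \alpha_{n,t}\, b_{\min} \\
&\text{CPU frequency:} && f^{\min}_n \le f_n \le f^{\max}_n \\
&\text{power:} && 0 \le P_{n,t} \le P^{\max}_n \\
&\text{selection:} && \alpha_{n,t} \in \{0, 1\}
\end{aligned}
```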
  • this step includes: representing the objective function with a Markov quadruple, wherein the Markov quadruple includes a state space, an action space, a reward function, and a transition probability; establishing a policy gradient and a loss function under a policy-evaluation network, and determining the optimal state-action pair to select a terminal participating in the second model training from multiple terminals.
  • the state space includes data quality parameters, uplink channel gain, channel bandwidth from the terminal to the network device, and the remaining energy of the terminal;
  • the action space includes terminal selection parameters and the transmission power of the selected terminal;
  • the reward function is the ratio of the sum of data quality parameters of the terminals participating in the second model training to the energy consumption;
  • the transition probability is the state transition probability.
  • since P1 is a collaboration problem between multiple terminal devices, it is a mixed-integer nonlinear programming (MINLP) problem, and it is difficult to obtain its optimal solution in polynomial time. Therefore, an online optimization method, namely a deep reinforcement learning method, is used to solve it.
  • the action space consists of two factors, terminal selection and power allocation, so it contains both continuous actions and discrete actions.
  • the traditional value-based Q-learning algorithm cannot solve this problem well. Therefore, the present disclosure adopts the DDPG algorithm here, which is a deterministic strategy method that can effectively deal with continuous action decisions.
  • the present disclosure models the formulated problem as a Markov decision process and defines its state space and action space, and then designs a reward function representing the ratio of the data quality score to the energy consumption corresponding to the specific actions of each round.
  • An MDP can be represented by a 4-tuple (S, A, P, R), where S is the state space, A is the action space, P is the state transition probability, which represents the probability of transitioning from state S_t to the next state S_{t+1}, and R is the reward function.
  • State space S: according to problem P1, the state space of the MDP contains the data quality score, the channel gain of the uplink, the channel bandwidth from the terminal to the server, and the residual energy of the IoT device.
  • the network state S t in round t is defined as follows:
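  • the state definition itself is omitted here; a plausible reconstruction from the components listed above is:

```latex
S_t = \left\{\left(Q_{n,t-1},\; G_{n,t},\; b_{n,t} B,\; E^{res}_{n,t}\right) \;\middle|\; n = 1, \dots, N\right\}
```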
  • Action space A: the MDP action comprises the FL terminal selection α_{n,t} and the transmission power P_{n,t} of the selected devices, i.e., A_t = {(α_{n,t}, P_{n,t}) : n = 1, ..., N}, where α_{n,t} ∈ {0, 1}.
  • Reward function R: let R(S_t, A_t) denote the reward obtained by taking action A_t in state S_t. The reward is defined as the ratio of the sum of the data quality scores of the devices selected by FL in the current round to the energy consumption, that is:
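  • written out under the assumption that E^{cmp} and E^{up} from the energy model above make up the consumption term, the reward is:

```latex
R_t = \frac{\sum_{n=1}^{N} \alpha_{n,t}\, Q_{n,t}}{\sum_{n=1}^{N} \alpha_{n,t}\left(E^{cmp}_{n,t} + E^{up}_{n,t}\right)}
```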
  • the present disclosure uses the policy μ to evaluate the selected action, where μ is the mapping from each state to an action.
  • the goal of the present disclosure is to maximize the expected total reward, that is, the action-value function Q(S_t, A_t | θ^Q).
  • the present disclosure uses the DDPG algorithm to find the approximate optimal solution to problem P1.
  • the DDPG algorithm adopts an actor-critic network structure, and the server in FL is trained as the decision maker of this algorithm to optimize actions in a continuous action space, namely, the selection of terminal devices and power allocation, so that the system reward continues to increase.
  • DDPG uses a policy gradient method to map the network state to a specific action.
  • ⁇ ⁇ ) deterministically maps states to a specific continuous action, and the Critic network is used to approximate the actor-value function.
  • the DDPG algorithm uses an experience replay mechanism: at each training step, a replay buffer B stores the server's previous state, current action, reward value, and next state, and a mini-batch of data is sampled from the experience pool each time to train the actor-critic networks.
  • the result of each action contains two parts, as mentioned above, namely the terminal selection of FL and the transmission power of the selected device, i.e., A_t = {(α_{n,t}, P_{n,t}) : n = 1, ..., N}, where α_{n,t} ∈ {0, 1}.
  • the present disclosure converts the continuous value into a binary value by setting a threshold.
  • the goal is to find the optimal policy μ* that maximizes the expectation of the reward function.
  • the policy μ can be updated by taking the gradient of the expected return with respect to the network parameter θ^μ, and the actor's policy gradient can be calculated using the chain rule, as sketched below.
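  • a sketch of the standard DDPG deterministic policy gradient, which matches the chain-rule description above, is:

```latex
\nabla_{\theta^{\mu}} J \approx \mathbb{E}\left[\, \nabla_{a} Q\!\left(S, a \mid \theta^{Q}\right)\big|_{a = \mu(S \mid \theta^{\mu})}\; \nabla_{\theta^{\mu}} \mu\!\left(S \mid \theta^{\mu}\right) \right]
```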
  • the optimal action-value function can be approximated by the Critic network Q(S_t, A_t | θ^Q).
  • the Critic network minimizes the approximation error between the current target value and Q(S_t, A_t | θ^Q) via the loss function sketched below.
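  • in standard DDPG form, with γ as the discount factor and primed parameters denoting the target networks, the loss is:

```latex
L(\theta^{Q}) = \mathbb{E}\left[\left(y_t - Q(S_t, A_t \mid \theta^{Q})\right)^{2}\right], \qquad y_t = R_t + \gamma\, Q'\!\left(S_{t+1}, \mu'(S_{t+1} \mid \theta^{\mu'}) \mid \theta^{Q'}\right)
```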
  • a soft update method is used for updating.
  • the soft-update coefficient τ is introduced: the target network parameters θ^{Q'}, θ^{μ'} are updated as a weighted average with the corresponding current network parameters θ^Q, θ^μ and then assigned to the target networks. This makes the network learning process more stable and reduces over-estimation to a certain extent.
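  • the soft update described above takes the standard form:

```latex
\theta^{Q'} \leftarrow \tau\, \theta^{Q} + (1 - \tau)\, \theta^{Q'}, \qquad \theta^{\mu'} \leftarrow \tau\, \theta^{\mu} + (1 - \tau)\, \theta^{\mu'}
```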
  • exploration noise needs to be added to the action to give the agent exploration capability during the training phase.
  • the total delay of terminal n mainly includes the local model training time and the parameter upload time of terminal n. Since the downlink bandwidth is much larger than the uplink bandwidth, the time for the server to send the model to the terminal can be ignored. Let t^{cmp}_{n,t} and t^{up}_{n,t} denote the local model training time and result upload time of terminal n in round t, respectively.
  • the present disclosure uses an algorithm to select a terminal, which can be summarized as the following process.
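  • the referenced process is not reproduced in this text; the following is a minimal Python sketch of one plausible reading of the per-round server loop, where agent_act, the 0.5 threshold, and all variable names are illustrative assumptions rather than the patent's literal algorithm:

```python
import numpy as np

N = 10                              # number of candidate terminals
rng = np.random.default_rng(0)

def agent_act(state: np.ndarray) -> np.ndarray:
    """Stand-in for the DDPG actor: returns N continuous selection
    outputs in [0, 1] followed by N normalized transmit powers."""
    return rng.random(2 * N)

def run_round(state: np.ndarray, p_max: float = 0.2):
    action = agent_act(state)
    # Threshold the continuous selection outputs into the binary
    # decision alpha_{n,t}, as the disclosure describes.
    alpha = (action[:N] > 0.5).astype(int)
    power = action[N:] * p_max      # P_{n,t} in [0, P_max]
    return alpha, power

def reward(quality: np.ndarray, energy: np.ndarray, alpha: np.ndarray) -> float:
    # Ratio of the summed data-quality scores of the selected
    # terminals to their energy consumption (assumed reward form).
    return float((alpha * quality).sum() / max((alpha * energy).sum(), 1e-9))

# One illustrative round: the state stacks quality scores, channel
# gains, bandwidth shares, and residual energies for all terminals.
state = rng.random(4 * N)
alpha, power = run_round(state)
quality = rng.random(N)             # would come from WD + EMD merging
energy = 0.01 + power               # would come from the energy model
print("selected:", np.flatnonzero(alpha), "reward:", round(reward(quality, energy, alpha), 3))
```

In a full implementation, the transition (state, action, reward, next state) would be pushed to the replay buffer and the actor-critic networks trained on sampled mini-batches, as described above.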
  • Fig. 3 shows a flow chart of a model training method according to an embodiment of the present disclosure.
  • the method can be executed by a network device, specifically, the network device can be a server.
  • the method may include the following steps.
  • the server can apply public data to create an initial global model.
  • the public data can be understood as, for example, the CIFAR-10 dataset. Taking its 60,000 images as a public data set as an example, it can include a training set of 50,000 images and a test set of 10,000 images, which can be distributed among multiple terminals.
  • S302 Receive resource information sent by multiple terminals.
  • the server may initiate a resource request: the server requests each terminal to upload its resource information (data quality score, CPU frequency, channel gain, battery power, etc.) to the server.
  • resource information data quality score, CPU frequency, channel gain, battery power, etc.
  • S303 Select an initial training terminal from the multiple terminals based on the resource information.
  • the server can perform joint optimization terminal selection: based on the information provided by each terminal, the server selects several clients within the total bandwidth B to participate in the model training.
  • the initial training terminal is a terminal that performs model training for the first time after initialization.
  • S304 Send the initial global model to the initial training terminal, and receive the first model training parameters uploaded by the initial training terminal after training the initial global model.
  • the server sends the global model to the selected terminal. Subsequently, the selected terminal uses its local data to train the shared model and uploads the new model parameters to the server through the channel.
  • S305 Perform model aggregation to generate a global model for first model training.
  • the server uses the FedAvg method to aggregate the uploaded model parameters and generate a new model, for example as sketched below.
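  • as a concrete illustration of this aggregation step, here is a minimal FedAvg sketch, assuming each terminal reports its weights as a NumPy array together with its local data size:

```python
import numpy as np

def fedavg(weights: list[np.ndarray], data_sizes: list[int]) -> np.ndarray:
    """Data-size-weighted average of the uploaded model weights."""
    total = float(sum(data_sizes))
    return sum(w * (d / total) for w, d in zip(weights, data_sizes))

# Example: three terminals with different local dataset sizes.
w_global = fedavg([np.ones(4), 2 * np.ones(4), 3 * np.ones(4)], [100, 200, 700])
print(w_global)  # weighted toward the terminal holding 700 samples
```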
  • S306 Determine the data quality parameters and energy consumption parameters of the first terminal participating in the first model training according to the first model training parameters, determine the first terminal for performing the first model training from the multiple terminals according to the data quality parameters and the energy consumption parameters, and send the global model used for the first model training to the first terminal for training.
  • S307 Receive the second model training parameters uploaded by the first terminal after training the global model.
  • S308 Perform model aggregation to generate a global model for the second model training.
  • S309 Determine the data quality parameters and energy consumption parameters of the second terminal participating in the second model training according to the second model training parameters; determine the second terminal for performing the second model training from the multiple terminals according to the data quality parameters and the energy consumption parameters; and send the global model used for the second model training to the second terminal for training until the model accuracy converges.
  • the overall federated learning training process proceeds as described in steps S301-S309 above, iterating until the model accuracy converges.
  • the model training method provided by the present disclosure targets the situation where device data is heterogeneously distributed in the FL system by designing a strategy for terminal selection and resource allocation in a federated learning model.
  • the deep reinforcement learning algorithm DDPG is used to generate the actions of each round, including the selection of terminals and the power allocation of terminals. In this way, the FL system can improve model accuracy and accelerate convergence as much as possible within a limited time, while comprehensively considering multiple aspects so that the system meets the model accuracy requirements and minimizes energy consumption under the constraints of the terminal devices' remaining power and the transmission bandwidth, which is in line with the development concept of green communication.
  • the network device may include a hardware structure and a software module, and implement the functions in the form of a hardware structure, a software module, or a hardware structure plus a software module.
  • one of the above functions may be executed in the form of a hardware structure, a software module, or a hardware structure plus a software module.
  • the present disclosure also provides a terminal selection device. Since the terminal selection device provided in the embodiment of the present disclosure corresponds to the terminal selection methods provided in the above-mentioned embodiments, the implementation method of the terminal selection method is also applicable to the terminal selection device provided in this embodiment and will not be described in detail in this embodiment.
  • Figure 4 is a structural schematic diagram of a terminal selection device 400 provided in an embodiment of the present disclosure.
  • the device 400 may include: a receiving module 410, used to receive model training parameters sent by at least one terminal, wherein the at least one terminal is a terminal selected by a network device from multiple terminals for performing a first model training on a global model issued by the network device, and the model training parameters are obtained after at least one terminal performs a first model training on the global model; a determination module 420, used to determine data quality parameters and energy consumption parameters of terminals participating in the second model training based on the model training parameters; a selection module 430, used to select terminals participating in the second model training from multiple terminals based on the data quality parameters and energy consumption parameters.
  • the model training parameters include model weights of a first model training of at least one terminal
  • the determination module 420 is used to: determine the aggregated model weights of the first model training based on the model weights and the local data size of at least one terminal; determine the weight divergence based on the difference between the model weights and the aggregated model weights, and use the weight divergence as a data quality parameter.
  • the model training parameters include a local data distribution value of the first model training of at least one terminal
  • the determination module 420 is used to: weightedly combine the weight divergence and the local data distribution value to determine a quality score value, and the quality score value is used as a data quality parameter.
  • the model training parameters include local training energy consumption of the first model training of at least one terminal
  • the determination module 420 is used to determine the energy consumption parameters according to the local training energy consumption and the transmission energy consumption of uploading the model training parameters by at least one terminal.
  • the selection module 430 is used to: establish an objective function based on data quality parameters and energy consumption parameters; establish constraints, the constraints including at least one of time constraints, energy consumption constraints, bandwidth constraints, terminal processor cycle frequency constraints, transmission power constraints, and terminal selection decision constraints; solve the objective function under the constraints according to the deep deterministic policy gradient DDPG algorithm to select a terminal to participate in the second model training from multiple terminals.
  • the selection module 430 is used to: represent the objective function with a Markov quadruple, wherein the Markov quadruple includes a state space, an action space, a reward function, and a transition probability; establish a policy gradient and a loss function under a policy-evaluation network, determine the optimal state-action pair, and select a terminal from multiple terminals to participate in the second model training.
  • the state space includes data quality parameters, uplink channel gain, channel bandwidth from the terminal to the network device, and the remaining energy of the terminal;
  • the action space includes terminal selection parameters and the transmission power of the selected terminal;
  • the reward function is the ratio of the sum of data quality parameters of the terminals participating in the second model training to the energy consumption;
  • the transition probability is the state transition probability.
  • the network device receives the model training parameters sent by at least one terminal, wherein at least one terminal is selected by the network device from multiple terminals for performing the first model training on the global model sent by the network device, and the model training parameters are obtained after at least one terminal performs the first model training on the global model; based on the model training parameters, the data quality parameters and energy consumption parameters of the terminals participating in the second model training are determined; based on the data quality parameters and energy consumption parameters, the terminals participating in the second model training are selected from multiple terminals.
  • the scheme of the present disclosure comprehensively considers factors such as terminal energy consumption and data quality, so as to biasedly select the most suitable terminal in each round of global iteration to balance the energy consumption of the edge terminal and the learning accuracy and convergence speed of the FL.
  • the present disclosure also provides a model training device. Since the model training device provided in the embodiment of the present disclosure corresponds to the model training methods provided in the above-mentioned embodiments, the implementation of the model training method is also applicable to the model training device provided in this embodiment and will not be described in detail here.
  • FIG5 is a schematic diagram of the structure of a model training device 500 provided in an embodiment of the present disclosure.
  • the device 500 may include: a creation module 510, used to create an initial global model using public data; a transceiver module 520, used to receive resource information sent by multiple terminals; a selection module 530, used to select an initial training terminal from the multiple terminals based on the resource information; the transceiver module 520 is also used to: send the initial global model to the initial training terminal, and receive the first model training parameters uploaded by the initial training terminal after training the initial global model; an aggregation module 540, used to perform model aggregation to generate a global model for first model training; the selection module 530 is also used to: determine the data quality parameters and energy consumption parameters of the first terminal participating in the first model training according to the first model training parameters, and determine the first terminal to participate in the first model training according to the data quality parameters and the energy consumption parameters from the multiple terminals.
  • a strategy for terminal selection and resource allocation in a federated learning model is designed to specifically target the situation where the device data is heterogeneously distributed in the FL system.
  • the latest deep reinforcement learning algorithm DDPG is used to generate actions for each round, including the selection of the terminal and the power allocation of the terminal, so that the FL system can not only improve the model accuracy and accelerate convergence as much as possible within a limited time limit, but also comprehensively consider multiple aspects, so that the system can meet the model accuracy requirements while minimizing energy consumption under the constraints of the remaining power of the terminal device and the transmission bandwidth, which is in line with the development concept of green communication.
  • the present application provides a communication system, including: a network device and a terminal, wherein: the network device is configured to execute the terminal selection method or model training method shown in the embodiments of Figures 1-3 of the present disclosure.
  • FIG. 6 is a schematic diagram of the structure of a communication device 600 provided in an embodiment of the present application.
  • the communication device 600 can be a network device, or a user device, or a chip, a chip system, or a processor that supports the network device to implement the above method, or a chip, a chip system, or a processor that supports the user device to implement the above method.
  • the device can be used to implement the method described in the above method embodiment, and the details can be referred to the description in the above method embodiment.
  • the communication device 600 may include one or more processors 601.
  • the processor 601 may be a general-purpose processor or a dedicated processor, etc. For example, it may be a baseband processor or a central processing unit.
  • the baseband processor may be used to process the communication protocol and communication data
  • the central processing unit may be used to control the communication device (such as a base station, a baseband chip, a terminal device, a terminal device chip, a DU or a CU, etc.), execute a computer program, and process the data of the computer program.
  • the communication device 600 may further include one or more memories 602, on which a computer program 604 may be stored, and the processor 601 executes the computer program 604 so that the communication device 600 performs the method described in the above method embodiment.
  • data may also be stored in the memory 602.
  • the communication device 600 and the memory 602 may be provided separately or integrated together.
  • the communication device 600 may further include a transceiver 605 and an antenna 606.
  • the transceiver 605 may be referred to as a transceiver unit, a transceiver, or a transceiver circuit, etc., and is used to implement a transceiver function.
  • the transceiver 605 may include a receiver and a transmitter, the receiver may be referred to as a receiver or a receiving circuit, etc., and is used to implement a receiving function; the transmitter may be referred to as a transmitter or a transmitting circuit, etc., and is used to implement a transmitting function.
  • the communication device 600 may further include one or more interface circuits 607.
  • the interface circuit 607 is used to receive code instructions and transmit them to the processor 601.
  • the processor 601 executes the code instructions to enable the communication device 600 to execute the method described in the above method embodiment.
  • the processor 601 may include a transceiver for implementing the receiving and sending functions.
  • the transceiver may be a transceiver circuit, or an interface, or an interface circuit.
  • the transceiver circuit, interface, or interface circuit for implementing the receiving and sending functions may be separate or integrated.
  • the above-mentioned transceiver circuit, interface, or interface circuit may be used for reading and writing code/data, or the above-mentioned transceiver circuit, interface, or interface circuit may be used for transmitting or delivering signals.
  • the processor 601 may store a computer program 603, which runs on the processor 601 and enables the communication device 600 to perform the method described in the above method embodiment.
  • the computer program 603 may be fixed in the processor 601, in which case the processor 601 may be implemented by hardware.
  • the communication device 600 may include a circuit that can implement the functions of sending or receiving or communicating in the aforementioned method embodiment.
  • the processor and transceiver described in the present application can be implemented in an integrated circuit (IC), an analog IC, a radio frequency integrated circuit RFIC, a mixed signal IC, an application specific integrated circuit (ASIC), a printed circuit board (PCB), an electronic device, etc.
  • the processor and transceiver can also be manufactured using various IC process technologies, such as complementary metal oxide semiconductor (CMOS), N-type metal oxide semiconductor (NMetal-Oxide-Semiconductor, NMOS), P-type metal oxide semiconductor (Positive Channel Metal Oxide Semiconductor, PMOS), bipolar junction transistor (Bipolar Junction Transistor, BJT), bipolar CMOS (BiCMOS), silicon germanium (SiGe), gallium arsenide (GaAs), etc.
  • CMOS: complementary metal oxide semiconductor
  • NMOS: N-type metal oxide semiconductor
  • PMOS: P-type metal oxide semiconductor (Positive Channel Metal Oxide Semiconductor)
  • BJT: bipolar junction transistor
  • BiCMOS: bipolar CMOS
  • SiGe: silicon germanium
  • GaAs: gallium arsenide
  • the communication device described in the above embodiments may be a network device or a user device, but the scope of the communication device described in the present application is not limited thereto, and the structure of the communication device is not limited to that shown in FIG. 6.
  • the communication device may be an independent device or may be part of a larger device.
  • the communication device may be, for example: an IC set, which may also include a storage component for storing data and computer programs; an ASIC, such as a modem; or a chip or a chip system.
  • the chip shown in Figure 7 includes a processor 701 and an interface 702.
  • the number of processors 701 can be one or more, and the number of interfaces 702 can also be one or more.
  • the chip further includes a memory 703, and the memory 703 is used to store necessary computer programs and data.
  • the present application also provides a readable storage medium having instructions stored thereon, which implement the functions of any of the above method embodiments when executed by a computer.
  • the computer program product includes one or more computer programs.
  • the computer can be a general-purpose computer, a special-purpose computer, a computer network, or other programmable device.
  • the computer program can be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • the computer program can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, microwave).
  • the computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media.
  • Available media can be magnetic media (e.g., floppy disks, hard disks, tapes), optical media (e.g., high-density digital video discs (DVD)), or semiconductor media (e.g., solid-state drives (SSD)), etc.
  • magnetic media e.g., floppy disks, hard disks, tapes
  • optical media e.g., high-density digital video discs (DVD)
  • DVD digital video discs
  • semiconductor media e.g., solid-state drives (SSD)
  • "at least one" in the present application can also be described as one or more, and "a plurality" can be two, three, four, or more, which is not limited in the present application.
  • technical features are distinguished by "first", "second", "third", "A", "B", "C", "D", etc., and there is no order of precedence or magnitude among the technical features so described.
  • "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., disk, optical disc, memory, programmable logic device (PLD)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal.
  • "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
  • the systems and techniques described herein may be implemented in a computing system that includes back-end components (e.g., as a data server), or a computing system that includes middleware components (e.g., an application server), or a computing system that includes front-end components (e.g., a user computer with a graphical user interface or a web browser through which a user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end components, middleware components, or front-end components.
  • the components of the system may be interconnected by any form or medium of digital data communication (e.g., a communications network). Examples of communications networks include: a local area network (LAN), a wide area network (WAN), and the Internet.
  • a computer system may include terminals and servers.
  • the terminals and servers are generally remote from each other and usually interact through a communication network.
  • the relationship between the terminal and the server arises from computer programs that run on the respective computers and have a terminal-server relationship with each other.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

本公开实施例提供一种终端选择方法、模型训练方法、装置及系统,涉及通信技术领域,该方法通过网络设备接收至少一个终端发送的模型训练参数,其中,至少一个终端为网络设备从多个终端中选出用于对网络设备下发的全局模型进行第一模型训练的终端,模型训练参数为至少一个终端对全局模型进行第一模型训练后得到的;基于模型训练参数,确定参与第二模型训练的终端的数据质量参数以及能耗参数;根据数据质量参数以及能耗参数,从多个终端中选出参与第二模型训练的终端。本公开的方案综合考虑终端能耗和数据质量等因素,从而有偏向地选择每轮全局迭代中最适合的终端,以平衡边缘终端的能耗和FL的学习精度与收敛速度。

Description

终端选择方法、模型训练方法、装置及系统 技术领域
本公开涉及移动通信技术领域,特别涉及一种终端选择方法、模型训练方法、装置及系统。
背景技术
随着移动网络通信技术的不断演进,各应用场景对于网络通信效率的需求越来越高。人工智能(Artificial Intelligence,AI)技术在通信领域取得不断突破,为用户带来丰富的应用体验。在AI技术中,终端可以使用其本地数据来训练并更新服务器所需的联邦学习(Federated Learning,FL)模型,但针对每轮全局迭代中如何有偏向性地选择合适终端的问题,目前尚没有很好的解决方案。
发明内容
本公开提出了一种终端选择方法、模型训练方法、装置及系统,提供了一种综合考虑终端能耗和数据质量等因素的方案,从而有偏向地选择每轮全局迭代中最适合的终端,以平衡边缘终端的能耗和FL的学习精度与收敛速度。
本公开的第一方面实施例提供了一种终端选择方法,该方法由网络设备执行,该方法包括:接收至少一个终端发送的模型训练参数,其中,至少一个终端为网络设备从多个终端中选出用于对网络设备下发的全局模型进行第一模型训练的终端,模型训练参数为至少一个终端对全局模型进行第一模型训练后得到的;基于模型训练参数,确定参与第二模型训练的终端的数据质量参数以及能耗参数;根据数据质量参数以及能耗参数,从多个终端中选出参与第二模型训练的终端。
在本公开的一些实施例中,模型训练参数包括至少一个终端第一模型训练的模型权重,确定参与第二模型训练的终端的数据质量参数包括:根据模型权重以及至少一个终端的本地数据大小,确定第一模型训练的聚合模型权重;根据模型权重和聚合模型权重之间的差异,确定权重散度,将权重散度作为数据质量参数。
在本公开的一些实施例中,模型训练参数包括至少一个终端第一模型训练的本地数据分布值,确定参与第二模型训练的终端的数据质量参数还包括:对权重散度以及本地数据分布值进行加权合并,确定质量分数值,将质量分数值作为数据质量参数。
在本公开的一些实施例中,模型训练参数包括至少一个终端第一模型训练的本地训练能耗,确定参与第二模型训练的终端的能耗参数包括:根据本地训练能耗以及至少一个终端上传模型训练参数的传输能耗,确定能耗参数。
在本公开的一些实施例中,根据数据质量参数以及能耗参数,从多个终端中选出参与第二模型训练的终端包括:根据数据质量参数以及能耗参数,建立目标函数;建立约束条件,约束条件包括时间约束、能耗约束、带宽约束、终端处理器周期频率约束、传输功率约束以及终端选择决策约束中的至少一项;根据深度确定性策略梯度DDPG算法,在约束条件下对目标函数进行求解,以从多个终端中选出参与第二模型训练的终端。
在本公开的一些实施例中,根据DDPG算法,对目标函数进行求解,以从多个终端中选出参与第 二模型训练的终端包括:以马尔科夫四元组表示目标函数,其中,马尔科夫四元组包括状态空间、动作空间、奖励函数以及转移概率;在策略-评价网络下建立策略梯度和损失函数,确定最优状态-动作对,以从多个终端中选出参与第二模型训练的终端。
在本公开的一些实施例中,状态空间包括数据质量参数、上行链路的信道增益、终端到网络设备的信道带宽、以及终端的剩余能量;动作空间包括终端选择参数、以及被选择终端的传输功率;奖励函数为参与第二模型训练的终端的数据质量参数之和与能耗之比;转移概率为状态转移概率。
本公开第二方面实施例提供一种模型训练方法,该方法由网络设备执行,该方法包括:使用公共数据创建初始全局模型;接收多个终端发送的资源信息;基于所述资源信息,从所述多个终端中选择初始训练终端;将所述初始全局模型下发给所述初始训练终端,并接收所述初始训练终端对所述初始全局模型训练后上传的第一模型训练参数;执行模型聚合以生成用于第一模型训练的全局模型;根据所述第一模型训练参数,确定参与所述第一模型训练的第一终端的数据质量参数以及能耗参数,根据所述数据质量参数以及所述能耗参数,从所述多个终端中确定进行所述第一模型训练的第一终端,将用于所述第一模型训练的全局模型下发给所述第一终端以进行训练;接收所述第一终端对所述全局模型训练后上传的第二模型训练参数;执行模型聚合以生成用于第二模型训练的全局模型;根据所述第二模型训练参数,确定参与所述第二模型训练的第二终端的数据质量参数以及能耗参数,根据所述数据质量参数以及所述能耗参数,从所述多个终端中确定进行所述第二模型训练的第二终端,将用于所述第二模型训练的全局模型下发给所述第二终端以进行训练,直至模型精度收敛。
本公开第三方面实施例提供一种终端选择装置,该装置包括:接收模块,用于接收至少一个终端发送的模型训练参数,其中,至少一个终端为网络设备从多个终端中选出用于对网络设备下发的全局模型进行第一模型训练的终端,模型训练参数为至少一个终端对全局模型进行第一模型训练后得到的;确定模块,用于基于模型训练参数,确定参与第二模型训练的终端的数据质量参数以及能耗参数;选择模块,用于根据数据质量参数以及能耗参数,从多个终端中选出参与第二模型训练的终端。
本公开第四方面实施例提供一种模型训练装置,该装置包括:创建模块,用于使用公共数据创建初始全局模型;收发模块,用于接收多个终端发送的资源信息;选择模块,用于基于所述资源信息,从所述多个终端中选择初始训练终端;所述收发模块还用于:将所述初始全局模型下发给所述初始训练终端,并接收所述初始训练终端对所述初始全局模型训练后上传的第一模型训练参数;聚合模块,用于执行模型聚合以生成用于第一模型训练的全局模型;所述选择模块还用于:根据所述第一模型训练参数,确定参与所述第一模型训练的第一终端的数据质量参数以及能耗参数,根据所述数据质量参数以及所述能耗参数,从所述多个终端中确定进行所述第一模型训练的第一终端;所述收发模块还用于:将用于所述第一模型训练的全局模型下发给所述第一终端以进行训练,并接收所述第一终端对所述全局模型训练后上传的第二模型训练参数;所述聚合模块还用于:执行模型聚合以生成用于第二模型训练的全局模型;所述选择模块还用于:根据所述第二模型训练参数,确定参与所述第二模型训练的第二终端的数据质量参数以及能耗参数,根据所述数据质量参数以及所述能耗参数,从所述多个终端中确定进行所述第二模型训练的第二终端,将用于所述第二模型训练的全局模型下发给所述第二终端以进行训练,直至模型精度收敛。
本公开第五方面实施例提供一种通信设备,其中,包括:收发器;存储器;处理器,分别与收发器及存储器连接,配置为通过执行存储器上的计算机可执行指令,控制收发器的无线信号收发,并能够实现上述第一方面实施例或第二方面实施例中任一项所述的方法。
本公开第六方面实施例提供一种计算机存储介质,其中,计算机存储介质存储有计算机可执行指令;计算机可执行指令被处理器执行后,能够实现本公开第一方面或第二方面实施例所述的方法。
根据本公开的终端选择方法,网络设备接收至少一个终端发送的模型训练参数,其中,至少一个终端为网络设备从多个终端中选出用于对网络设备下发的全局模型进行第一模型训练的终端,模型训练参数为至少一个终端对全局模型进行第一模型训练后得到的;基于模型训练参数,确定参与第二模型训练的终端的数据质量参数以及能耗参数;根据数据质量参数以及能耗参数,从多个终端中选出参与第二模型训练的终端。本公开的方案综合考虑终端能耗和数据质量等因素,从而有偏向地选择每轮全局迭代中最适合的终端,以平衡边缘终端的能耗和FL的学习精度与收敛速度。
本公开附加的方面和优点将在下面的描述中部分给出,部分将从下面的描述中变得明显,或通过本公开的实践了解到。
附图说明
本公开上述的和/或附加的方面和优点从下面结合附图对实施例的描述中将变得明显和容易理解,其中:
图1为根据本公开实施例的一种终端选择方法的流程示意图;
图2为根据本公开实施例的一种传输配置方法的流程示意图;
图3为根据本公开实施例的一种模型训练方法的流程示意图;
图4为根据本公开实施例的一种终端选择装置的示意框图;
图5为根据本公开实施例的一种模型训练装置的示意框图;
图6为根据本公开实施例的一种通信设备的结构示意图;
图7为本公开实施例提供的一种芯片的结构示意图。
具体实施方式
下面详细描述本公开的实施例,实施例的示例在附图中示出,其中自始至终相同或类似的标号表示相同或类似的元件或具有相同或类似功能的元件。下面通过参考附图描述的实施例是示例性的,旨在用于解释本公开,而不能理解为对本公开的限制。
随着移动网络通信技术的不断演进,从第五代移动通信技术(Fifth Generation,5G)到第六代移动通信技术(Sixth Generation,6G),万物互联是典型趋势,AI将成为未来通信的核心技术之一,6G和AI的典型应用场景有超过80%的重叠,两者深度融合。此外,6G网络的规模覆盖将为AI提供无所不在的承载空间,解决AI技术落地缺乏载体和通道的巨大痛点,极大地促进了AI产业的发展和繁荣。
万物互联使得所有的终端设备都可以作为智能体进行分析和计算,形成智能网络。手机、笔记本电脑、传感器等各类终端设备在本地会产生大量的异构数据,这些数据如果能够被网络有效利用,将会对于智能分析、资源优化等产生积极贡献。在过去传统的以云服务器为中心的计算方式中,用户数据通常被统一发送到中央服务器进行计算和存储,这种方式不仅会产生诸多的安全隐私问题,而且在传输过程中会出现极大的能耗、时延等开销,这与6G绿色通信的理念相违背,同时会占用较大的带宽,妨碍网络中其他任务队列的正常工作。随着移动边缘计算和联邦学习技术的普及以及终端设备的计算能力增强,更多的计算任务被放在了边缘服务器侧或用户本地侧进行计算。这样的网络智能化架构带来了更多的编排部署的可行性。
在联邦学习中,终端使用其本地数据来训练并更新服务器所需的机器学习(Machine Leaning,ML)模型。终端设备将模型更新参数而不是原始数据发送到服务器进行聚合,服务器再将聚合后的模型下发给终端,如此重复的迭代多轮直到模型收敛。
然而,由于相关设备上的数据分布有偏差,FL在非独立同分布(Non-IID)场景中固有地会导致低分类准确度问题。尽管已经提出了各种方法来解决这个问题,如个性化本地损失函数、模型蒸馏等,但缺乏对于终端数据分布、局部模型贡献度以及通信开销的综合考量,这些方面对于确定全局模型聚合的质量极为重要。不同的训练数据分布使得终端的局部更新相互偏离,因此聚合的全局模型通常是有偏差的。与此同时,计算能力有限或信道条件差的落后设备会显著减慢模型聚合速度。由于带宽资源稀缺以及终端设备的能量预算有限,让所有终端设备都参与每一轮的训练是不切实际的,每次训练只需要部分客户参与。因此,相较于随机选择每轮参与训练的终端的方法,如何综合考虑终端的能耗、功率分配和客户数据质量等因素,从而有偏向地选择每轮全局迭代中最适合的客户,以平衡边缘终端的能耗和FL的学习精度与收敛速度,是十分重要的。
因此,本公开旨在解决相关技术中缺乏相应的终端(或言之,客户端)选择方案的问题,即如何在训练过程中根据模型更新情况动态地选择每轮参与的终端,以平衡数据分布,减少终端non-iid数据对于模型聚合效果的影响。
本发明目的在于综合考虑FL训练过程中各个终端在本地计算和模型传输过程中产生的能量开销和时延情况,以及终端本地的数据质量,分析选择出每轮参与训练的终端集合以及相应的功率分配方案,从而实现灵活高效的联邦学习训练,在模型快速收敛的同时也能最大限度地减少能耗开销。
为此,本公开提出了一种终端选择方法、模型训练方法、装置及系统,提供了一种综合考虑终端能耗和数据质量等因素,从而有偏向地选择每轮全局迭代中最适合的终端,以平衡边缘终端的能耗和FL的学习精度与收敛速度。
可以理解的是,本公开提供的方案可以用于第五代移动通信技术(Fifth Generation,5G)及其后续通信技术,诸如第五代移动通信技术演进(5G-advanced)、第六代移动通信技术(Sixth Generation,6G)等,在本公开中不予限制。
下面结合附图对本申请所提供的方案进行详细介绍。
图1示出了根据本公开实施例的一种终端选择方法的流程示意图。该方法由网络设备执行,网络设备可以是服务器。如图1所示,该方法包括以下步骤。
S101,接收至少一个终端发送的模型训练参数。
应当理解的是,本公开所述的方案针对FL过程中的多轮模型训练和迭代过程,对于每一轮训练过程,服务器都可以动态地根据上一轮服务器所选的终端情况,综合考虑各终端能耗以及数据质量,来选择本轮参与训练的终端。因此,本公开的实施例中,至少一个终端为网络设备从多个终端中选出用于对网络设备下发的全局模型进行第一模型训练的终端,其中,第一模型训练为上一轮模型训练。换言之,针对当前轮次的训练过程,服务器接收的模型训练参数可以为上一轮训练中服务器选出的终端发送的。
本公开的实施例中,模型训练参数为至少一个终端对全局模型进行第一模型训练后得到的。该模型训练参数可以是数据质量分数、CPU频率、信道增益、电池电量等参数,凡用于模型训练和迭代的参数均落入本公开的范围内,在本公开中不予限制。
S102,基于模型训练参数,确定参与第二模型训练的终端的数据质量参数以及能耗参数。
在本公开的实施例中,数据质量参数能够用于评判终端本地的数据质量,能耗参数能够用于评判终端与服务器之间的信息传输能耗以及终端本地进行模型训练的能耗。本实施例对数据质量参数和能耗参数的种类及形式不作限定。
在本公开的实施例中,第二模型训练为本轮模型训练,即当前轮次的模型训练,与第一模型训练为相对概念,其中“第一”、“第二”仅作为名称区分,不代表对本公开的限制。
S103,根据数据质量参数以及能耗参数,从多个终端中选出参与第二模型训练的终端。
本公开中,服务器可以根据确定的数据质量参数以及能耗参数,选出适合参与第二模型训练的终端,从而在后续的模型训练过程中将全局模型下发给选出的终端以进行本轮训练。当本轮训练完成后,直至模型收敛前,可进行下一轮训练,在下一轮训练中,服务器可再次动态地根据终端反馈的模型训练参数,选择参与下一轮训练的终端,直至最后模型收敛。
综上,根据本公开实施例提供的终端选择方法,网络设备接收至少一个终端发送的模型训练参数,其中,至少一个终端为网络设备从多个终端中选出用于对网络设备下发的全局模型进行第一模型训练的终端,模型训练参数为至少一个终端对全局模型进行第一模型训练后得到的;基于模型训练参数,确定参与第二模型训练的终端的数据质量参数以及能耗参数;根据数据质量参数以及能耗参数,从多个终端中选出参与第二模型训练的终端。本公开的方案综合考虑终端能耗和数据质量等因素,从而有偏向地选择每轮全局迭代中最适合的终端,以平衡边缘终端的能耗和FL的学习精度与收敛速度。
图2示出了根据本公开实施例的一种传输配置方法的流程示意图。基于图1所示实施例,如图2所示,该方法可以包括以下步骤。
S201,接收至少一个终端发送的模型训练参数。
应当理解,本公开针对迭代过程中的各轮训练,针对服务器首次建模的过程(或言之,初始化),服务器可直接请求部分或所有终端将其资源信息上传给服务器,例如数据质量分数、CPU频率、信道增益、电池电量等,服务器根据各个终端的资源信息选择参与第一轮训练的终端。针对第二轮及其后续训练过程,服务器根据上一轮选择的终端反馈的对全局模型进行训练后得到的模型训练参数选择参与本轮训练的终端。
上述步骤S201的其他原理与步骤S101的原理相同,在此不再赘述。
在本公开的实施例中,模型训练参数包括至少一个终端第一模型训练的模型权重。下面通过步骤S202-204对图1所示的步骤S102中确定数据质量参数的过程进行具体解释。
首先,假设每个终端的数据大小相同,将终端的数据异构性理解为类别不平衡,其中各个设备不遵循共同的数据分布,即设备中的数据分布是non-iid的。而这种数据异构性会极大地减弱模型的聚合质量,导致分类准确度低,本公开中,将这种差异称之为数据分布的偏度。由于测试精度由训练的权重决定,而每个终端的权重更新和其数据分布之间有一定的相关性,因此需要尽可能地在每轮中选择具有均匀数据分布的设备参与训练;但同时为了保证公平性,确保每个终端都有机会参与到训练,更需要关注整体数据的标签分布以及聚合平均后模型的权重情况,并及时做出调整。
因此,整个FL系统需要通过服务器的聚合更新来对整体训练过程进行反馈,通过聚合后的模型权重参数结果来推断信息。从而更有效地选择未来参与训练的终端。
S202,根据至少一个终端第一模型训练的模型权重以及至少一个终端的本地数据大小,确定第一模型训练的聚合模型权重。
应理解,每一轮训练都会经过服务器聚合模型→下发模型→终端训练模型→终端上报模型训练参数→服务器选择终端→服务器聚合模型→下发模型……的过程。在每一轮模型聚合之后,服务器可以根据聚合结果判断模型的权重更新方向,然后根据每一个终端上一轮上传的模型权重和本地数据大小计算聚合模型权重,如下式所示:
$$w_{\text{Global}}^{t}=\frac{\sum_{n=1}^{N} D_{n}\, w_{n}^{t}}{\sum_{n=1}^{N} D_{n}}$$
其中,n表示第n个终端,n∈[1,N],N为正整数,t表示第t轮训练;$w_{n}^{t}$为终端n上报的第一模型训练的模型权重(即,可以理解为终端n上轮训练更新后的权重值),$D_{n}$为终端n的本地数据大小,$w_{\text{Global}}^{t}$为聚合模型权重(即,可以理解为用各个终端处的数据训练更新后的权重再进行聚合平均后的结果)。
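为便于理解上式的聚合过程,下面给出一个基于numpy的最小示意(并非本专利的官方实现):其中 weights 表示各终端上传的模型权重 $w_{n}^{t}$,data_sizes 表示各终端的本地数据大小 $D_{n}$,变量名均为示例性假设。

```python
import numpy as np

def aggregate(weights, data_sizes):
    """按本地数据大小加权平均各终端上传的模型权重,得到聚合权重 w_Global。"""
    total = sum(data_sizes)
    return sum((d / total) * w for w, d in zip(weights, data_sizes))

# 两个终端的示例:本地数据量分别为100和300
weights = [np.array([0.2, 0.5]), np.array([0.4, 0.1])]
data_sizes = [100, 300]
w_global = aggregate(weights, data_sizes)  # -> array([0.35, 0.2])
```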
S203,根据模型权重和聚合模型权重之间的差异,确定权重散度。
本公开中,服务器根据模型权重和聚合模型权重之间的差异,确定权重散度(Weight Divergence,WD),将权重散度作为数据质量参数以用于终端选择,如下式所示:
$$WD_{n}^{t}=\left\|w_{n}^{t}-w_{\text{Global}}^{t}\right\|$$
其中,$WD_{n}^{t}$表示终端n第t轮训练的权重散度。$WD_{n}^{t}$一定程度上能够体现出模型应该修正的方向,从而便于服务器选择参与当前轮次训练的终端,因此本公开中所述的数据质量参数可以是权重散度。具体地,若当前聚合模型权重和某一终端的本地权重差别较大,即$WD_{n}^{t}$较大,那么在下一轮中该终端有更大的概率被选中。如果某一终端之前从未参与过训练,则本地模型权重设置为0;如果之前多次参与训练,则当前数据质量的判断取最近一轮更新的权重。本公开可以用$w_{n}^{\tau_{n}}$(其中$\tau_{n}$表示终端n最近一次参与训练的轮次)来表示最近一轮(上一轮)更新的权重。
通过这种交互,能够实现服务器和终端之间的反馈,充分考虑各终端的数据质量来选择每轮参与训练的终端,从而更快的实现模型收敛。
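权重散度的计算可以用如下最小示意说明(numpy实现;范数取L2形式,属示例性假设):

```python
import numpy as np

def weight_divergence(w_n, w_global):
    """终端本地权重与聚合权重之差的范数,即权重散度 WD。"""
    return float(np.linalg.norm(np.asarray(w_n) - np.asarray(w_global)))

w_global = np.array([0.35, 0.2])
wd = weight_divergence([0.2, 0.5], w_global)  # WD越大,该终端下一轮被选中的概率越高
```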
在上述实施例的基础上,作为一种可选或优选的实施例,在本公开中,模型训练参数包括至少一个终端第一模型训练的本地数据分布值。
S204,对权重散度以及本地数据分布值进行加权合并,确定质量分数值。
为了提高所有终端被选择的概率的公平性,并且更多地利用具有特殊数据分布的终端来参与训练,本公开采取一个参数项来降低对模型贡献度有限的终端的被选择概率,根据上一轮模型的权重散度和本地数据分布来确定该参数项。本公开将该参数项称为质量分数值,以$PD_{n,t}$表示,将质量分数值作为数据质量参数,用于衡量当前轮次各个终端的数据质量。
可以理解的是,在每个终端的样本数量一样的情况下,每一轮参与的终端的数据样本标签分布越均匀,训练结果的损失函数越小。或言之,当终端具有相似的数据时,它们的梯度也相似。因此$PD_{n,t}$应该与终端n的本地数据分布值(EMD值)成正比关系,其中,EMD可通过下式表示:
$$EMD_{n}=\sum_{i=1}^{C}\left|p_{n,y=i}-p_{y=i}\right|$$
其中,C表示标签类型的种类数量,$p_{n,y=i}$表示终端n的第i种数据类型的数量占比,$p_{y=i}$表示第i种类型数据的数量的全局比例。此外,出于客户的隐私安全角度考虑,服务器无法直接获取用户本地的数据分布情况,因此本公开中终端只需要每轮将本地数据分布的EMD值发送给服务器以保护其本地数据,同时又能告知服务器其分布情况。
综合上述各个因素,服务器在收到每个终端的EMD数值之后,将其与上一轮训练的权重散度$WD_{n}^{\tau_{n}}$进行加权合并,得到质量分数值$PD_{n,t}$:
$$PD_{n,t}=WD_{n}^{\tau_{n}}+\rho\,EMD_{n}+P_{\text{const}}$$
其中,$P_{\text{const}}$为常数项,用来防止分数值为负,$\rho$为比例因子,用来平衡各项之间的单位差异。
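质量分数的计算可以用如下最小示意说明(numpy实现;其中 rho 与 p_const 的取值,以及加权合并的具体形式均按上式的重构给出,属示例性假设):

```python
import numpy as np

def emd(local_dist, global_dist):
    """终端本地标签分布与全局分布之间的差异,即EMD值。"""
    return float(np.abs(np.asarray(local_dist) - np.asarray(global_dist)).sum())

def quality_score(wd_last, local_dist, global_dist, rho=1.0, p_const=0.1):
    """将上一轮权重散度与本地数据分布值加权合并,得到质量分数 PD。"""
    return wd_last + rho * emd(local_dist, global_dist) + p_const

global_dist = [0.25, 0.25, 0.25, 0.25]                       # 4类标签的全局占比
pd = quality_score(0.3, [0.7, 0.1, 0.1, 0.1], global_dist)   # 分布越偏斜,EMD越大,分数越高
```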
假设各个终端的本地数据标签分布是不变的,每次模型聚合完成后,下一次本地训练开始前,各个终端的上一轮训练后的模型权重参数被记录到服务器中以进行更新,便于下一轮各个终端的数据质量分数的比较。这一过程在整个训练过程中迭代多次。
换言之,本公开所述的数据质量参数可以是质量分数值,$PD_{n,t}$越高说明终端n对于当前轮的模型更新和收敛越有用。
因此,本公开通过加权合并得到的质量分数值选择终端,能够降低多次参与训练的终端的概率,给拥有极端数据分布的终端以参与机会。
在本公开的实施例中,模型训练参数包括至少一个终端第一模型训练的本地训练能耗。下面通过步骤S205对图1所示的步骤S102中确定能耗参数的过程进行具体解释。
S205,根据本地训练能耗以及至少一个终端上传模型训练参数的传输能耗,确定能耗参数。
为了最大化整体FL过程中终端的质量分数和能量消耗之间的比值,本公开通过$E_{g}$表示第t轮训练中所有参与训练的终端的总能量损耗:
$$E_{g}=\sum_{n=1}^{N}a_{n,t}\,E_{n,t}^{\text{total}}$$
其中$E_{n,t}^{\text{total}}$为终端n在第t轮的总能耗,其构成在下文给出。
其中,$a_{n,t}$表示一个二进制指标:当终端n在第t轮被边缘服务器选择,则$a_{n,t}=1$;否则,$a_{n,t}=0$。
由于终端通常具有有限的能量预算,同时带宽也有限,因此每一轮FL中只有部分终端参与训练。每个终端的能耗包含两方面:一方面是从终端到边缘服务器的模型参数传输能耗,另一方面是终端本地模型计算的能耗。
1)模型计算能量消耗
对于第t轮中选定的终端n,由于本地训练和通过无线信道将本地更新上传到边缘服务器,它会产生能量消耗。对于每个终端n,令$E_{n,t}^{\text{cmp}}$表示其在第t轮中的本地训练能耗,这取决于其计算架构、硬件和数据集。用$c_{n}$表示第n个终端执行一个数据样本的CPU周期数,这是先验已知的。假设所有的数据样本都有相同的数据大小(即比特数),设$D_{n}$为终端n的数据量,则终端n运行一个本地epoch所需的CPU周期数为$c_{n}D_{n}$。$U_{t,n}$表示第t轮全局迭代中终端n的本地训练轮数。用$f_{n}$表示终端n的CPU周期频率,$\beta_{n}$为终端n的计算芯片的有效电容系数。那么终端n在第t轮全局迭代中的本地训练能耗公式如下:
$$E_{n,t}^{\text{cmp}}=U_{t,n}\,\beta_{n}\,c_{n}D_{n}\,f_{n}^{2}$$
2)模型参数传输能耗
在本地训练完成后,采用正交频分多址(OFDMA)方案用于本地模型上传,总带宽为B;在第t轮中,系统带宽被划分为数目等于所选择终端数目的子信道。OFDMA是一种基于OFDM实现的多址方案:在OFDMA中,不同的用户被分配了不同的子载波,因此多个用户可以同时传输他们的数据。令$b_{n,t}$为终端n在第t轮全局迭代中的带宽分配比率,则第n个设备被分配的子信道带宽为$b_{n,t}B$。$\mathbf{b}_{t}=\{b_{n,t},n=1,\dots,N\}$为整体终端的带宽分配,必须满足
$$\sum_{n=1}^{N}a_{n,t}\,b_{n,t}\le 1.$$
如果在第t轮中选择了终端n,那么本公开要求至少为终端n分配最小带宽比率$b_{\min}$,即$b_{n,t}\ge a_{n,t}\,b_{\min}$。
这是因为实际系统会由于资源块大小有限无法为单个终端分配任意小的带宽。
根据香农定理,终端n的可达传输速率定义为:
$$r_{n,t}=b_{n,t}B\log_{2}\!\left(1+\frac{P_{n,t}\,G_{n}}{N_{0}\,b_{n,t}B}\right)$$
其中,B是带宽,$N_{0}$是高斯白噪声的功率谱密度,$P_{n,t}$是终端n在第t轮的传输功率,$G_{n}$是终端n与中央服务器之间的信道增益。
模型参数$w_{n}$和梯度$\nabla w_{n}$的数据大小为$s_{n}$。假设$s_{n}$的大小是恒定的,那么,每个终端在第t轮传输本地模型更新的时间为:
$$T_{n,t}^{\text{com}}=\frac{s_{n}}{r_{n,t}}$$
那么在第t轮模型上传过程中终端n的能耗为:
$$E_{n,t}^{\text{com}}=P_{n,t}\,T_{n,t}^{\text{com}}=\frac{P_{n,t}\,s_{n}}{r_{n,t}}$$
那么终端n的总能耗为:
$$E_{n,t}^{\text{total}}=E_{n,t}^{\text{cmp}}+E_{n,t}^{\text{com}}$$
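上述能耗模型可以用如下最小示意实现(纯Python;所有参数取值仅为示例性假设,不代表真实硬件):

```python
import math

def local_energy(U, beta, c, D, f):
    """本地训练能耗 E_cmp = U * beta * c * D * f^2。"""
    return U * beta * c * D * f ** 2

def uplink_rate(b, B, P, G, N0):
    """香农可达速率 r = b*B*log2(1 + P*G/(N0*b*B))。"""
    return b * B * math.log2(1 + P * G / (N0 * b * B))

def total_energy(U, beta, c, D, f, s, b, B, P, G, N0):
    """总能耗 E_total = E_cmp + E_com,其中 E_com = P * s / r。"""
    r = uplink_rate(b, B, P, G, N0)
    return local_energy(U, beta, c, D, f) + P * s / r

E = total_energy(U=5, beta=1e-28, c=20, D=500, f=1e9,
                 s=1e6, b=0.2, B=20e6, P=0.1, G=1e-7, N0=1e-17)
```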
下面通过步骤S206-S208对图1所示实施例中步骤S103的具体过程进行解释。
S206,根据数据质量参数以及能耗参数,建立目标函数。
本公开中,对联合任务调度和资源分配优化问题进行建模,以P1表示目标函数:
$$\text{P1:}\quad\max_{\{a_{n,t},\,P_{n,t},\,b_{n,t},\,f_{n}\}}\ \sum_{t=1}^{T}\frac{\sum_{n=1}^{N}a_{n,t}\,PD_{n,t}}{\sum_{n=1}^{N}a_{n,t}\,E_{n,t}^{\text{total}}}$$
其中,当迭代至第T轮时,模型收敛。
S207,建立约束条件,约束条件包括时间约束、能耗约束、带宽约束、终端处理器周期频率约束、传输功率约束以及终端选择决策约束中的至少一项。
本公开时间约束为:
$$a_{n,t}\left(T_{n,t}^{\text{cmp}}+T_{n,t}^{\text{com}}\right)\le T_{\max},\quad\forall n,t$$
即,要求每轮迭代中最后完成模型训练和上传的终端所耗时间需要小于或等于截止时间$T_{\max}$。
能耗约束为:
$$a_{n,t}\,E_{n,t}^{\text{total}}\le E_{n,t},\quad\forall n,t$$
即,保证每轮参与训练的终端n消耗的能量要小于其剩余电池电量$E_{n,t}$。
带宽约束为:
$$\sum_{n=1}^{N}a_{n,t}\,b_{n,t}\le 1,\quad\forall t$$
即,保证所选终端的总带宽应在带宽容量内。
终端处理器周期频率约束为:
$$f_{n}^{\min}\le f_{n}\le f_{n}^{\max},\quad\forall n$$
即,满足各终端的CPU周期频率范围。
传输功率约束为:
$$0\le P_{n,t}\le P_{n}^{\max},\quad\forall n,t$$
即,满足各终端在第t轮的传输功率的大小范围。
终端选择决策约束为:
$$a_{n,t}\in\{0,1\},\quad\forall n,t$$
即终端选择的决策为二进制变量:当终端n在第t轮被边缘服务器选择,则$a_{n,t}=1$;否则,$a_{n,t}=0$。
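下面给出一个对候选终端选择与功率分配方案进行约束检查的最小示意(纯Python;函数与变量命名均为示例性假设):

```python
def feasible(a, E_total, E_rem, b, T_cmp, T_com, P, P_max, T_max):
    """逐项检查时间、能耗、带宽、传输功率以及选择决策约束。"""
    N = len(a)
    if any(x not in (0, 1) for x in a):                  # 选择决策须为二进制
        return False
    if sum(a[n] * b[n] for n in range(N)) > 1:           # 带宽约束
        return False
    for n in range(N):
        if not a[n]:
            continue
        if T_cmp[n] + T_com[n] > T_max:                  # 时间(截止期)约束
            return False
        if E_total[n] > E_rem[n]:                        # 能耗约束
            return False
        if not (0 <= P[n] <= P_max[n]):                  # 传输功率约束
            return False
    return True
```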
S208,根据深度确定性策略梯度DDPG算法,在约束条件下对目标函数进行求解,以从多个终端中选出参与第二模型训练的终端。
具体地,该步骤包括:以马尔科夫四元组表示目标函数,其中,马尔科夫四元组包括状态空间、动作空间、奖励函数以及转移概率;在策略-评价网络下建立策略梯度和损失函数,确定最优状态-动作对,以从多个终端中选出参与第二模型训练的终端。
其中,状态空间包括数据质量参数、上行链路的信道增益、终端到网络设备的信道带宽、以及终端的剩余能量;动作空间包括终端选择参数、以及被选择终端的传输功率;奖励函数为参与第二模型训练的终端的数据质量参数之和与能耗之比;转移概率为状态转移概率。
下面具体对这一算法进行具体解释。
由于P1是多个终端设备之间的协作问题,是一个MINLP问题(混合整数非线性规划问题),难以在多项式时间复杂度内获得其最优解。因此,利用一种在线优化方法,即深度强化学习方法来解决。由于动作空间由两个因素组成,一是终端选择,二是功率分配,因此既包含连续动作,又包含离散动作,传统的基于值的Q-learning算法不能够很好的解决该问题。因此这里本公开采用DDPG算法,这是一种确定性策略方法,能够有效的应对连续性动作决策。具体来说,本公开将公式化问题建模为马尔可夫决策过程,并定义其状态空间和动作空间。然后设计奖励函数来代表每一轮采用特定动作后所对应的数据质量分数和能耗的比值。
如上考虑的终端选择和资源分配问题可以表述为一个MDP(马尔可夫决策过程)。一个MDP可以用4元组(S,A,P,R)表示,其中S是状态空间,A是动作空间,P是状态转移概率,表示从状态$S_{t}$转移到下一个状态$S_{t+1}$的概率,R是奖励函数。
状态空间S:根据问题P1,MDP的状态空间包含数据质量分数,上行链路的信道增益,终端到服务器的信道带宽和IoT设备的剩余能量。第t轮的网络状态S t定义如下:
$$S_{t}=\{(PD_{n,t},\,G_{n,t},\,b_{n,t},\,E_{n,t}),\ n=1,\dots,N\}\tag{18}$$
动作空间A:MDP的动作为FL的终端选择$a_{n,t}$,以及被选择设备的传输功率$P_{n,t}$。其中$A\in\{(a_{n,t},P_{n,t}),\ n=1,\dots,N\}$,$a_{n,t}\in\{0,1\}$并且$0\le P_{n,t}\le P_{n}^{\max}$。
奖励函数R:令$R(S_{t+1}\mid S_{t},A_{t})$表示在状态$S_{t}$下采用动作$A_{t}\in A$收到的及时奖励。奖励定义为FL在当前轮所选设备的数据质量分数之和与能耗之比,即:
$$R(S_{t+1}\mid S_{t},A_{t})=\frac{\sum_{n=1}^{N}a_{n,t}\,PD_{n,t}}{\sum_{n=1}^{N}a_{n,t}\,E_{n,t}^{\text{total}}}$$
表示从当前状态$S_{t}$通过动作$A_{t}$转移到下一个状态$S_{t+1}$所获得的奖励。
然后,本公开用评估策略μ来评估选择的动作,其中μ是从各个状态到动作的映射。本公开的目标在于最大化预期的总奖励,即动作价值函数$Q(S_{t},A_{t}\mid\mu)$:
$$Q(S_{t},A_{t}\mid\mu)=\mathbb{E}\left[\sum_{k=0}^{\infty}\gamma^{k}R(S_{t+k+1}\mid S_{t+k},A_{t+k})\right]$$
其中γ∈[0,1]是未来状态的折扣因子。根据贝尔曼公式,一个状态-动作对$(S_{t},A_{t})$和后续的状态-动作对$(S_{t'},A_{t'})$的最优动作-值函数可以表示为:
$$Q^{*}(S_{t},A_{t})=R(S_{t+1}\mid S_{t},A_{t})+\gamma\max_{A_{t'}}Q^{*}(S_{t'},A_{t'})$$
最优动作$A_{t}^{*}$可以表示为:
$$A_{t}^{*}=\arg\max_{A_{t}}Q^{*}(S_{t},A_{t})$$
其中$A_{t}^{*}$提供当前状态下最大的FL目标比率。
具体地,本公开使用DDPG算法来寻找问题P1的近似最优解。DDPG算法采用actor-critic网络结构,FL中的服务器作为此算法的决策者被训练在连续动作空间中优化动作,即终端设备的选择和功率分配,使得系统的奖励不断增大。
DDPG使用策略梯度方法来将网络的state映射为一个特定的action。其中Actor网络$\mu(s\mid\theta^{\mu})$确定性地将状态映射到一个特定的连续动作,Critic网络被用来逼近action-value函数。DDPG算法在每个训练步骤采用经验回放机制,用一个寄存器B存储服务器采取的先前动作、当前状态、当前动作、奖励值以及下一个状态的信息。每次从经验池中采样一个mini-batch的数据来训练actor-critic网络。每次的动作结果包含两部分,如上所述,即为FL的终端选择情况,以及被选择设备的传输功率:$A\in\{(a_{n,t},P_{n,t}),\ n=1,\dots,N\}$,其中$a_{n,t}\in\{0,1\}$并且$0\le P_{n,t}\le P_{n}^{\max}$。
在得到每次的动作结果后,由于从策略网络输出的终端选择动作是连续值$\hat{a}_{n,t}$,而具体的终端选择策略为二进制数值0或1,本公开通过设置阈值δ将连续值转化为二进制数值:
$$a_{n,t}=\begin{cases}1,&\hat{a}_{n,t}\ge\delta\\0,&\hat{a}_{n,t}<\delta\end{cases}$$
对于连续终端选择和功率分配,目标在于找到最优策略$\pi_{\theta}$来最大化奖励函数的期望:
$$J(\theta^{\mu})=\mathbb{E}\left[R(S_{t+1}\mid S_{t},A_{t})\,\middle|\,A_{t}=\mu(S_{t}\mid\theta^{\mu})\right]$$
其中策略μ可以通过取期望回报相对于网络参数θ的梯度来进行更新,actor的策略梯度可以通过链式法则来进行计算:
$$\nabla_{\theta^{\mu}}J\approx\frac{1}{M}\sum_{i=1}^{M}\nabla_{A}Q(S,A\mid\theta^{Q})\big|_{S=S_{i},A=\mu(S_{i})}\,\nabla_{\theta^{\mu}}\mu(S\mid\theta^{\mu})\big|_{S=S_{i}}$$
这是预期收益从起始分布J出发、相对于actor网络参数$\theta^{\mu}$的梯度,并在采样的mini-batch(大小为M)上取平均。
另外,最优的动作值函数能够通过Critic网络$Q(S_{t},A_{t}\mid\theta^{Q})$来近似。为了更新$Q(S_{t},A_{t}\mid\theta^{Q})$,Critic网络通过调整参数$\theta^{Q}$来最小化当前目标值与$Q(S_{t},A_{t}\mid\theta^{Q})$之间的逼近误差,采用MSE来衡量这种误差:
$$L(\theta^{Q})=\frac{1}{M}\sum_{i=1}^{M}\left(y_{i}-Q(S_{i},A_{i}\mid\theta^{Q})\right)^{2},\quad y_{i}=R_{i}+\gamma\,Q'\!\left(S_{i+1},\mu'(S_{i+1}\mid\theta^{\mu'})\,\middle|\,\theta^{Q'}\right)$$
对于target网络,采用软更新的方式进行更新。引入学习率τ,将旧的目标网络参数$\theta^{Q'}$、$\theta^{\mu'}$和新的对应网络参数$\theta^{Q}$、$\theta^{\mu}$做加权平均,然后赋值给目标网络,这样保证了网络学习过程更加平稳,也一定程度上减少了过估计:
$$\theta^{Q'}\leftarrow\tau\theta^{Q}+(1-\tau)\theta^{Q'},\qquad\theta^{\mu'}\leftarrow\tau\theta^{\mu}+(1-\tau)\theta^{\mu'}\tag{26}$$
另外在训练Actor网络输出的动作阶段,需要在动作上加上噪声ε,以使得在训练阶段让智能体具备探索能力。
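为便于理解上述actor-critic的更新步骤(策略梯度、MSE损失、软更新与探索噪声),下面给出一个基于PyTorch的最小示意草图,并非本专利的官方实现;网络结构、维度与超参数均为示例性假设:

```python
import copy
import torch
import torch.nn as nn

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                         nn.Linear(128, 128), nn.ReLU(),
                         nn.Linear(128, out_dim))

N = 10                                   # 终端数(示例)
state_dim, action_dim = 4 * N, 2 * N     # 状态含4项/终端;动作含选择值与功率/终端
gamma, tau = 0.99, 0.005

actor, critic = mlp(state_dim, action_dim), mlp(state_dim + action_dim, 1)
actor_t, critic_t = copy.deepcopy(actor), copy.deepcopy(critic)   # target网络
opt_a = torch.optim.Adam(actor.parameters(), lr=1e-4)
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-3)

def act(s, noise_std=0.1):
    """Actor输出经sigmoid归一化的连续动作,训练阶段叠加探索噪声ε。"""
    a = torch.sigmoid(actor(s))
    return (a + noise_std * torch.randn_like(a)).clamp(0.0, 1.0)

def ddpg_update(s, a, r, s2):
    """对一个mini-batch执行一次Critic/Actor更新与target软更新。"""
    with torch.no_grad():                                 # 目标值 y_i
        a2 = torch.sigmoid(actor_t(s2))
        y = r + gamma * critic_t(torch.cat([s2, a2], dim=1))
    q = critic(torch.cat([s, a], dim=1))
    critic_loss = nn.functional.mse_loss(q, y)            # MSE逼近误差
    opt_c.zero_grad(); critic_loss.backward(); opt_c.step()

    a_pred = torch.sigmoid(actor(s))                      # 链式法则策略梯度的等价写法
    actor_loss = -critic(torch.cat([s, a_pred], dim=1)).mean()
    opt_a.zero_grad(); actor_loss.backward(); opt_a.step()

    for net, net_t in ((critic, critic_t), (actor, actor_t)):  # 软更新,式(26)
        for p, p_t in zip(net.parameters(), net_t.parameters()):
            p_t.data.mul_(1 - tau).add_(tau * p.data)
```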
应当理解,每一轮全局迭代中,终端n总的时延主要包括终端n的本地模型训练时间和参数结果上传时间,由于下行链路带宽远大于上行链路带宽,因此服务器下发模型给终端的时间可以忽略不计。令$T_{n,t}^{\text{cmp}}$和$T_{n,t}^{\text{com}}$分别表示终端n在第t轮中的模型本地训练时间和结果上传时间。$T_{n,t}^{\text{cmp}}$计算公式为:
$$T_{n,t}^{\text{cmp}}=\frac{U_{t,n}\,c_{n}D_{n}}{f_{n}}$$
又如上所述,$T_{n,t}^{\text{com}}=s_{n}/r_{n,t}$。由于每个终端都是并行本地计算,一旦计算完成就开始传输模型参数,因此在第t轮FL中耗时遵循落后者效应,即:
$$T_{t}=\max_{n}\ a_{n,t}\left(T_{n,t}^{\text{cmp}}+T_{n,t}^{\text{com}}\right)$$
因此,本公开使用一种算法来选择终端,该算法可以总结为如下过程。
(原文此处以图表形式给出该终端选择算法的伪代码,概括了基于DDPG的每轮终端选择与功率分配流程,图示细节无法从文本中恢复。)
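结合上文的阈值转换与动作定义,每轮的终端选择与功率分配决策可以用如下最小示意表达(numpy实现;actor输出的排布方式、阈值delta与功率上限P_max均为示例性假设):

```python
import numpy as np

def decide(actor_output, N, P_max, delta=0.5):
    """将actor的连续输出转为二进制选择决策 a_{n,t} 与被选终端的传输功率 P_{n,t}。"""
    raw_sel, raw_pow = actor_output[:N], actor_output[N:2 * N]
    a = (raw_sel >= delta).astype(int)    # 按阈值将连续选择值转为0/1
    P = a * raw_pow * P_max               # 未被选择的终端功率为0
    return a, P

a, P = decide(np.random.rand(20), N=10, P_max=0.2)
```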
图3示出了根据本公开实施例的一种模型训练方法的流程示意图。该方法可以由网络设备执行,具体地,网络设备可以是服务器。
如图3所示,该方法可以包括以下步骤。
S301,使用公共数据创建初始全局模型。
本公开中,服务器可以使用公共数据创建初始全局模型。公共数据可以理解为CIFAR-10等公开数据集。以CIFAR-10的6万张图像为公共数据集为例,其中可以包括5万张图像的训练集和1万张图像的测试集,其可以在多个终端之间进行分配。
S302,接收多个终端发送的资源信息。
本公开中,服务器可以发起资源请求:服务器请求各个终端将其资源信息(数据质量分数、CPU频率、信道增益、电池电量等)上传给服务器。
S303,基于所述资源信息,从所述多个终端中选择初始训练终端。
即,服务器可以进行联合优化终端选择:基于各个终端提供的信息,服务器在总带宽B范围内选择若干客户参与到模型训练中。
应当理解,初始训练终端为初始化后第一次进行模型训练的终端。
S304,将所述初始全局模型下发给所述初始训练终端,并接收所述初始训练终端对所述初始全局模型训练后上传的第一模型训练参数。
当完成选择后,服务器将全局模型下发给选中的终端。随后,被选择的终端使用其本地数据来训练共享的模型并将新的模型参数通过信道上传给服务器。
S305,执行模型聚合以生成用于第一模型训练的全局模型。
服务器使用FedAvg的方法聚合上传的模型参数,进而生成新的模型。
S306,根据所述第一模型训练参数,确定参与所述第一模型训练的第一终端的数据质量参数以及能耗参数,根据所述数据质量参数以及所述能耗参数,从所述多个终端中确定进行所述第一模型训练的第一终端,将用于所述第一模型训练的全局模型下发给所述第一终端以进行训练。
S307,接收所述第一终端对所述全局模型训练后上传的第二模型训练参数。
S308,执行模型聚合以生成用于第二模型训练的全局模型;
S309,根据所述第二模型训练参数,确定参与所述第二模型训练的第二终端的数据质量参数以及能耗参数,根据所述数据质量参数以及所述能耗参数,从所述多个终端中确定进行所述第二模型训练的第二终端,将用于所述第二模型训练的全局模型下发给所述第二终端以进行训练,直至模型精度收敛。
上述步骤重复直至模型收敛。
因此,基于上述终端的选择算法,整体联邦学习训练流程如下:
(原文此处以图表形式给出整体联邦学习训练流程的伪代码,图示细节无法从文本中恢复。)
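基于上述S301至S309,整体训练流程可以用如下示意草图概括(伪接口;server、terminal的各方法名均为示例性假设,并非可直接运行的完整系统):

```python
def federated_training(server, terminals, max_rounds, target_acc):
    global_model = server.init_model_on_public_data()        # S301:公共数据创建初始全局模型
    info = {t.id: t.report_resources() for t in terminals}   # S302:终端上报资源信息
    selected = server.select_initial(info)                   # S303:选择初始训练终端
    for rnd in range(max_rounds):
        updates = [t.local_train(global_model) for t in selected]  # 下发模型并本地训练
        global_model = server.aggregate_fedavg(updates)            # 模型聚合
        if server.evaluate(global_model) >= target_acc:            # 精度收敛则停止
            break
        pd = server.quality_scores(updates)        # 数据质量参数(权重散度/质量分数)
        energy = server.energy_estimates(updates)  # 能耗参数(本地训练+传输)
        selected = server.select_by_ddpg(pd, energy)   # DDPG选择下一轮参与终端
    return global_model
```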
综上,本公开提供的模型训练方法通过设计一种联邦学习模型中终端选择与资源分配的策略,专门针对FL系统中设备数据分布异构的情况。为减少规划的时间复杂度,使用最新的深度强化学习算法DDPG来生成每轮的动作,其中包括终端的选择情况以及终端的功率分配,使得FL系统不仅能够在有限的时间期限内尽可能地提高模型精度、加速收敛,而且能够综合多方面考虑,使得系统能够在终端设备的剩余电量约束和传输带宽的约束下满足模型精度要求的同时最小化能耗,符合绿色通信的发展理念。
上述本申请提供的实施例中,对本申请实施例提供的方法进行了介绍。为了实现上述本申请实施例提供的方法中的各功能,网络设备可以包括硬件结构、软件模块,以硬件结构、软件模块、或硬件结构加软件模块的形式来实现上述各功能。上述各功能中的某个功能可以以硬件结构、软件模块、或者硬件结构加软件模块的方式来执行。
与上述几种实施例提供的终端选择方法相对应,本公开还提供一种终端选择装置,由于本公开实施例提供的终端选择装置与上述几种实施例提供的终端选择方法相对应,因此终端选择方法的实施方式也适用于本实施例提供的终端选择装置,在本实施例中不再详细描述。
图4为本公开实施例提供的一种终端选择装置400的结构示意图,如图4所示,该装置400可以包括:接收模块410,用于接收至少一个终端发送的模型训练参数,其中,至少一个终端为网络设备从多个终端中选出用于对网络设备下发的全局模型进行第一模型训练的终端,模型训练参数为至少一个终端 对全局模型进行第一模型训练后得到的;确定模块420,用于基于模型训练参数,确定参与第二模型训练的终端的数据质量参数以及能耗参数;选择模块430,用于根据数据质量参数以及能耗参数,从多个终端中选出参与第二模型训练的终端。
在一些实施例中,模型训练参数包括至少一个终端第一模型训练的模型权重,确定模块420用于:根据模型权重以及至少一个终端的本地数据大小,确定第一模型训练的聚合模型权重;根据模型权重和聚合模型权重之间的差异,确定权重散度,权重散度作为数据质量参数。
在一些实施例中,模型训练参数包括至少一个终端第一模型训练的本地数据分布值,确定模块420用于:对权重散度以及本地数据分布值进行加权合并,确定质量分数值,质量分数值作为数据质量参数。
在一些实施例中,模型训练参数包括至少一个终端第一模型训练的本地训练能耗,确定模块420用于:根据本地训练能耗以及至少一个终端上传模型训练参数的传输能耗,确定能耗参数。
在一些实施例中,选择模块430用于:根据数据质量参数以及能耗参数,建立目标函数;建立约束条件,约束条件包括时间约束、能耗约束、带宽约束、终端处理器周期频率约束、传输功率约束以及终端选择决策约束中的至少一项;根据深度确定性策略梯度DDPG算法,在约束条件下对目标函数进行求解,以从多个终端中选出参与第二模型训练的终端。
在一些实施例中,选择模块430用于:以马尔科夫四元组表示目标函数,其中,马尔科夫四元组包括状态空间、动作空间、奖励函数以及转移概率;在策略-评价网络下建立策略梯度和损失函数,确定最优状态-动作对,以从多个终端中选出参与第二模型训练的终端。
在一些实施例中,状态空间包括数据质量参数、上行链路的信道增益、终端到网络设备的信道带宽、以及终端的剩余能量;动作空间包括终端选择参数、以及被选择终端的传输功率;奖励函数为参与第二模型训练的终端的数据质量参数之和与能耗之比;转移概率为状态转移概率。
综上,根据本公开实施例提供的终端选择装置,网络设备接收至少一个终端发送的模型训练参数,其中,至少一个终端为网络设备从多个终端中选出用于对网络设备下发的全局模型进行第一模型训练的终端,模型训练参数为至少一个终端对全局模型进行第一模型训练后得到的;基于模型训练参数,确定参与第二模型训练的终端的数据质量参数以及能耗参数;根据数据质量参数以及能耗参数,从多个终端中选出参与第二模型训练的终端。本公开的方案综合考虑终端能耗和数据质量等因素,从而有偏向地选择每轮全局迭代中最适合的终端,以平衡边缘终端的能耗和FL的学习精度与收敛速度。
与上述几种实施例提供的模型训练方法相对应,本公开还提供一种模型训练装置,由于本公开实施例提供的模型训练装置与上述几种实施例提供的模型训练方法相对应,因此模型训练方法的实施方式也适用于本实施例提供的模型训练装置,在本实施例中不再详细描述。
图5为本公开实施例提供的一种模型训练装置500的结构示意图,如图5所示,该装置500可以包括:创建模块510,用于使用公共数据创建初始全局模型;收发模块520,用于接收多个终端发送的资源信息;选择模块530,用于基于所述资源信息,从所述多个终端中选择初始训练终端;所述收发模块520还用于:将所述初始全局模型下发给所述初始训练终端,并接收所述初始训练终端对所述初始全局模型训练后上传的第一模型训练参数;聚合模块540,用于执行模型聚合以生成用于第一模型训练的全局模型;所述选择模块530还用于:根据所述第一模型训练参数,确定参与所述第一模型训练的第一终端的数据质量参数以及能耗参数,根据所述数据质量参数以及所述能耗参数,从所述多个终端中确定进 行所述第一模型训练的第一终端;所述收发模块520还用于:将用于所述第一模型训练的全局模型下发给所述第一终端以进行训练,并接收所述第一终端对所述全局模型训练后上传的第二模型训练参数;所述聚合模块540还用于:执行模型聚合以生成用于第二模型训练的全局模型;所述选择模块530还用于:根据所述第二模型训练参数,确定参与所述第二模型训练的第二终端的数据质量参数以及能耗参数,根据所述数据质量参数以及所述能耗参数,从所述多个终端中确定进行所述第二模型训练的第二终端,将用于所述第二模型训练的全局模型下发给所述第二终端以进行训练,直至模型精度收敛。
综上,根据本公开实施例提供的模型训练装置,通过设计一种联邦学习模型中终端选择与资源分配的策略,专门针对FL系统中设备数据分布异构的情况。为减少规划的时间复杂度,使用最新的深度强化学习算法DDPG来生成每轮的动作,其中包括终端的选择情况以及终端的功率分配,使得FL系统不仅能够在有限的时间期限内尽可能地提高模型精度、加速收敛,而且能够综合多方面考虑,使得系统能够在终端设备的剩余电量约束和传输带宽的约束下满足模型精度要求的同时最小化能耗,符合绿色通信的发展理念。
本申请提供一种通信系统,包括:网络设备和终端,其中:网络设备被配置为执行本公开基于图1-3实施例中所示终端选择方法或模型训练方法。
请参见图6,图6是本申请实施例提供的一种通信装置600的结构示意图。通信装置600可以是网络设备,也可以是用户设备,也可以是支持网络设备实现上述方法的芯片、芯片系统、或处理器等,还可以是支持用户设备实现上述方法的芯片、芯片系统、或处理器等。该装置可用于实现上述方法实施例中描述的方法,具体可以参见上述方法实施例中的说明。
通信装置600可以包括一个或多个处理器601。处理器601可以是通用处理器或者专用处理器等。例如可以是基带处理器或中央处理器。基带处理器可以用于对通信协议以及通信数据进行处理,中央处理器可以用于对通信装置(如,基站、基带芯片,终端设备、终端设备芯片,DU或CU等)进行控制,执行计算机程序,处理计算机程序的数据。
可选的,通信装置600中还可以包括一个或多个存储器602,其上可以存有计算机程序604,处理器601执行计算机程序604,以使得通信装置600执行上述方法实施例中描述的方法。可选的,存储器602中还可以存储有数据。通信装置600和存储器602可以单独设置,也可以集成在一起。
可选的,通信装置600还可以包括收发器605、天线606。收发器605可以称为收发单元、收发机、或收发电路等,用于实现收发功能。收发器605可以包括接收器和发送器,接收器可以称为接收机或接收电路等,用于实现接收功能;发送器可以称为发送机或发送电路等,用于实现发送功能。
可选的,通信装置600中还可以包括一个或多个接口电路607。接口电路607用于接收代码指令并传输至处理器601。处理器601运行代码指令以使通信装置600执行上述方法实施例中描述的方法。
在一种实现方式中,处理器601中可以包括用于实现接收和发送功能的收发器。例如该收发器可以是收发电路,或者是接口,或者是接口电路。用于实现接收和发送功能的收发电路、接口或接口电路可以是分开的,也可以集成在一起。上述收发电路、接口或接口电路可以用于代码/数据的读写,或者,上述收发电路、接口或接口电路可以用于信号的传输或传递。
在一种实现方式中,处理器601可以存有计算机程序603,计算机程序603在处理器601上运行,可使得通信装置600执行上述方法实施例中描述的方法。计算机程序603可能固化在处理器601中,该种情况下,处理器601可能由硬件实现。
在一种实现方式中,通信装置600可以包括电路,该电路可以实现前述方法实施例中发送或接收或者通信的功能。本申请中描述的处理器和收发器可实现在集成电路(Integrated Circuit,IC)、模拟IC、射频集成电路RFIC、混合信号IC、专用集成电路(Application Specific Integrated Circuit,ASIC)、印刷电路板(Printed Circuit Board,PCB)、电子设备等上。该处理器和收发器也可以用各种IC工艺技术来制造,例如互补金属氧化物半导体(Complementary Metal Oxide Semiconductor,CMOS)、N型金属氧化物半导体(N-Metal-Oxide-Semiconductor,NMOS)、P型金属氧化物半导体(Positive Channel Metal Oxide Semiconductor,PMOS)、双极结型晶体管(Bipolar Junction Transistor,BJT)、双极CMOS(BiCMOS)、硅锗(SiGe)、砷化镓(GaAs)等。
以上实施例描述中的通信装置可以是网络设备或者用户设备,但本申请中描述的通信装置的范围并不限于此,而且通信装置的结构可以不受图6的限制。通信装置可以是独立的设备或者可以是较大设备的一部分。例如该通信装置可以是:
(1)独立的集成电路IC,或芯片,或,芯片系统或子系统;
(2)具有一个或多个IC的集合,可选的,该IC集合也可以包括用于存储数据,计算机程序的存储部件;
(3)ASIC,例如调制解调器(Modem);
(4)可嵌入在其他设备内的模块;
(5)接收机、终端设备、智能终端设备、蜂窝电话、无线设备、手持机、移动单元、车载设备、网络设备、云设备、人工智能设备等等;
(6)其他等等。
对于通信装置可以是芯片或芯片系统的情况,可参见图7所示的芯片的结构示意图。图7所示的芯片包括处理器701和接口702。其中,处理器701的数量可以是一个或多个,接口702的数量可以是多个。
可选的,芯片还包括存储器703,存储器703用于存储必要的计算机程序和数据。
本领域技术人员还可以了解到本申请实施例列出的各种说明性逻辑块(illustrative logical block)和步骤(step)可以通过电子硬件、电脑软件,或两者的结合进行实现。这样的功能是通过硬件还是软件来实现取决于特定的应用和整个系统的设计要求。本领域技术人员可以针对每种特定的应用使用各种方法来实现所描述的功能,但这种实现不应被理解为超出本申请实施例保护的范围。
本申请还提供一种可读存储介质,其上存储有指令,该指令被计算机执行时实现上述任一方法实施例的功能。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。计算机程序产品包括一个或多个计算机程序。在计算机上加载和执行计算机程序时,全部或部分地产生按照本申请实施例的流程或功能。计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。计算机程序可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,计算机程序可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(Digital Subscriber Line,DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。计算机可读存储介质可以是计算机能够存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。可用介质可以是磁性介质(例如,软盘、硬盘、磁带)、光介质(例如,高密度数字视频光盘(Digital Video Disc,DVD))、或者半导体介质(例如,固态硬盘(Solid State Disk,SSD))等。
本领域普通技术人员可以理解:本申请中涉及的第一、第二等各种数字编号仅为描述方便进行的区分,并不用来限制本申请实施例的范围,也不表示先后顺序。
本申请中的至少一个还可以描述为一个或多个,多个可以是两个、三个、四个或者更多个,本申请不做限制。在本申请实施例中,对于一种技术特征,通过“第一”、“第二”、“第三”、“A”、“B”、“C”和“D”等区分该种技术特征中的技术特征,该“第一”、“第二”、“第三”、“A”、“B”、“C”和“D”描述的技术特征间无先后顺序或者大小顺序。
如本文使用的,术语“机器可读介质”和“计算机可读介质”指的是用于将机器指令和/或数据提供给可编程处理器的任何计算机程序产品、设备、和/或装置(例如,磁盘、光盘、存储器、可编程逻辑装置(PLD)),包括,接收作为机器可读信号的机器指令的机器可读介质。术语“机器可读信号”指的是用于将机器指令和/或数据提供给可编程处理器的任何信号。
可以将此处描述的系统和技术实施在包括后台部件的计算系统(例如,作为数据服务器)、或者包括中间件部件的计算系统(例如,应用服务器)、或者包括前端部件的计算系统(例如,具有图形用户界面或者网络浏览器的用户计算机,用户可以通过该图形用户界面或者该网络浏览器来与此处描述的系统和技术的实施方式交互)、或者包括这种后台部件、中间件部件、或者前端部件的任何组合的计算系统中。可以通过任何形式或者介质的数字数据通信(例如,通信网络)来将系统的部件相互连接。通信网络的示例包括:局域网(LAN)、广域网(WAN)和互联网。
计算机系统可以包括终端和服务器。终端和服务器一般远离彼此并且通常通过通信网络进行交互。通过在相应的计算机上运行并且彼此具有终端-服务器关系的计算机程序来产生终端和服务器的关系。
应该理解,可以使用上面所示的各种形式的流程,重新排序、增加或删除步骤。例如,本公开中记载的各步骤可以并行地执行也可以顺序地执行也可以不同的次序执行,只要能够实现本公开公开的技术方案所期望的结果,本文在此不进行限制。
此外,应该理解,本申请的各种实施例可以单独实施,也可以在方案允许的情况下与其他实施例组合实施。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
以上,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以权利要求的保护范围为准。

Claims (12)

  1. 一种终端选择方法,其特征在于,所述方法由网络设备执行,所述方法包括:
    接收至少一个终端发送的模型训练参数,其中,所述至少一个终端为所述网络设备从多个终端中选出用于对所述网络设备下发的全局模型进行第一模型训练的终端,所述模型训练参数为所述至少一个终端对所述全局模型进行第一模型训练后得到的;
    基于所述模型训练参数,确定参与第二模型训练的终端的数据质量参数以及能耗参数;
    根据所述数据质量参数以及所述能耗参数,从所述多个终端中选出参与第二模型训练的终端。
  2. 根据权利要求1所述的方法,其特征在于,所述模型训练参数包括所述至少一个终端进行第一模型训练的模型权重,所述确定参与第二模型训练的终端的数据质量参数包括:
    根据所述模型权重以及所述至少一个终端的本地数据大小,确定第一模型训练的聚合模型权重;
    根据所述模型权重和所述聚合模型权重之间的差异,确定权重散度,将所述权重散度作为所述数据质量参数。
  3. 根据权利要求2所述的方法,其特征在于,所述模型训练参数包括所述至少一个终端进行第一模型训练的本地数据分布值,所述确定参与第二模型训练的终端的数据质量参数还包括:
    对所述权重散度以及所述本地数据分布值进行加权合并,确定质量分数值,将所述质量分数值作为所述数据质量参数。
  4. 根据权利要求1至3中任一项所述的方法,其特征在于,所述模型训练参数包括所述至少一个终端进行第一模型训练的本地训练能耗,所述确定参与第二模型训练的终端的能耗参数包括:
    根据所述本地训练能耗以及所述至少一个终端上传的所述模型训练参数的传输能耗,确定所述能耗参数。
  5. 根据权利要求1至4中任一项所述的方法,其特征在于,所述根据所述数据质量参数以及所述能耗参数,从所述多个终端中选出参与第二模型训练的终端包括:
    根据所述数据质量参数以及所述能耗参数,建立目标函数;
    建立约束条件,所述约束条件包括时间约束、能耗约束、带宽约束、终端处理器周期频率约束、传输功率约束以及终端选择决策约束中的至少一项;
    根据深度确定性策略梯度DDPG算法,在所述约束条件下对所述目标函数进行求解,以从所述多个终端中选出参与第二模型训练的终端。
  6. 根据权利要求5所述的方法,其特征在于,所述根据DDPG算法,在所述约束条件下对所述目标函数进行求解,以从所述多个终端中选出参与第二模型训练的终端包括:
    以马尔科夫四元组表示所述目标函数,其中,所述马尔科夫四元组包括状态空间、动作空间、奖励函数以及转移概率;
    在策略-评价网络下建立策略梯度和损失函数,确定最优状态-动作对,以从所述多个终端中选出参与第二模型训练的终端。
  7. 根据权利要求6所述的方法,其特征在于,所述状态空间包括所述数据质量参数、上行链路的信道增益、终端到所述网络设备的信道带宽、以及终端的剩余能量;所述动作空间包括终端选择参数、以及被选择终端的传输功率;所述奖励函数为参与第二模型训练的终端的数据质量参数之和与能耗之比;所述转移概率为状态转移概率。
  8. 一种模型训练方法,其特征在于,所述方法由网络设备执行,所述方法包括:
    使用公共数据创建初始全局模型;
    接收多个终端发送的资源信息;
    基于所述资源信息,从所述多个终端中选择初始训练终端;
    将所述初始全局模型下发给所述初始训练终端,并接收所述初始训练终端对所述初始全局模型训练后上传的第一模型训练参数;
    执行模型聚合以生成用于第一模型训练的全局模型;
    根据所述第一模型训练参数,确定参与所述第一模型训练的第一终端的数据质量参数以及能耗参数,根据所述数据质量参数以及所述能耗参数,从所述多个终端中确定进行所述第一模型训练的第一终端,将用于所述第一模型训练的全局模型下发给所述第一终端以进行训练;
    接收所述第一终端对所述全局模型训练后上传的第二模型训练参数;
    执行模型聚合以生成用于第二模型训练的全局模型;
    根据所述第二模型训练参数,确定参与所述第二模型训练的第二终端的数据质量参数以及能耗参数,根据所述数据质量参数以及所述能耗参数,从所述多个终端中确定进行所述第二模型训练的第二终端, 将用于所述第二模型训练的全局模型下发给所述第二终端以进行训练,直至模型精度收敛。
  9. 一种终端选择装置,其特征在于,所述装置包括:
    接收模块,用于接收至少一个终端发送的模型训练参数,其中,所述至少一个终端为所述网络设备从多个终端中选出用于对所述网络设备下发的全局模型进行第一模型训练的终端,所述模型训练参数为所述至少一个终端对所述全局模型进行第一模型训练后得到的;
    确定模块,用于基于所述模型训练参数,确定参与第二模型训练的终端的数据质量参数以及能耗参数;
    选择模块,用于根据所述数据质量参数以及所述能耗参数,从所述多个终端中选出参与第二模型训练的终端。
  10. 一种模型训练装置,其特征在于,所述装置包括:
    创建模块,用于使用公共数据创建初始全局模型;
    收发模块,用于接收多个终端发送的资源信息;
    选择模块,用于基于所述资源信息,从所述多个终端中选择初始训练终端;
    所述收发模块还用于:将所述初始全局模型下发给所述初始训练终端,并接收所述初始训练终端对所述初始全局模型训练后上传的第一模型训练参数;
    聚合模块,用于执行模型聚合以生成用于第一模型训练的全局模型;
    所述选择模块还用于:根据所述第一模型训练参数,确定参与所述第一模型训练的第一终端的数据质量参数以及能耗参数,根据所述数据质量参数以及所述能耗参数,从所述多个终端中确定进行所述第一模型训练的第一终端;
    所述收发模块还用于:将用于所述第一模型训练的全局模型下发给所述第一终端以进行训练,并接收所述第一终端对所述全局模型训练后上传的第二模型训练参数;
    所述聚合模块还用于:执行模型聚合以生成用于第二模型训练的全局模型;
    所述选择模块还用于:根据所述第二模型训练参数,确定参与所述第二模型训练的第二终端的数据质量参数以及能耗参数,根据所述数据质量参数以及所述能耗参数,从所述多个终端中确定进行所述第二模型训练的第二终端,将用于所述第二模型训练的全局模型下发给所述第二终端以进行训练,直至模型精度收敛。
  11. 一种通信设备,其中,包括:收发器;存储器;处理器,分别与所述收发器及所述存储器连接,配置为通过执行所述存储器上的计算机可执行指令,控制所述收发器的无线信号收发,并能够实现权利要求1-8中任一项所述的方法。
  12. 一种计算机存储介质,其中,所述计算机存储介质存储有计算机可执行指令;所述计算机可执行指令被处理器执行后,能够实现权利要求1-8中任一项所述的方法。
PCT/CN2022/134498 2022-11-25 2022-11-25 终端选择方法、模型训练方法、装置及系统 WO2024108601A2 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/134498 WO2024108601A2 (zh) 2022-11-25 2022-11-25 终端选择方法、模型训练方法、装置及系统

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/134498 WO2024108601A2 (zh) 2022-11-25 2022-11-25 终端选择方法、模型训练方法、装置及系统

Publications (1)

Publication Number Publication Date
WO2024108601A2 true WO2024108601A2 (zh) 2024-05-30

Family

ID=91195051

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/134498 WO2024108601A2 (zh) 2022-11-25 2022-11-25 终端选择方法、模型训练方法、装置及系统

Country Status (1)

Country Link
WO (1) WO2024108601A2 (zh)

Similar Documents

Publication Publication Date Title
US11483374B2 (en) Simultaneous optimization of multiple TCP parameters to improve download outcomes for network-based mobile applications
CN113242568B (zh) 一种不确定网络环境中的任务卸载和资源分配方法
CN110928654B (zh) 一种边缘计算系统中分布式的在线任务卸载调度方法
CN112105062B (zh) 时敏条件下移动边缘计算网络能耗最小化策略方法
WO2023087442A1 (zh) 数字孪生网络低时延高可靠传输方法、装置、设备及介质
CN112395090B (zh) 一种移动边缘计算中服务放置的智能混合优化方法
CN111182509B (zh) 一种基于上下文感知学习的泛在电力物联网接入方法
Tang et al. Dependent task offloading for multiple jobs in edge computing
Tang et al. Multi-dimensional auction mechanisms for crowdsourced mobile video streaming
Chua et al. Resource allocation for mobile metaverse with the Internet of Vehicles over 6G wireless communications: A deep reinforcement learning approach
WO2023175335A1 (en) A time-triggered federated learning algorithm
Yin et al. Joint user scheduling and resource allocation for federated learning over wireless networks
US11632713B2 (en) Network capability exposure method and device thereof
Tao et al. Drl-driven digital twin function virtualization for adaptive service response in 6g networks
CN113778675A (zh) 一种基于面向区块链网络的计算任务分配系统及方法
CN112231009B (zh) 一种能量捕获网络模型任务计算卸载决策与调度方法
WO2024108601A2 (zh) 终端选择方法、模型训练方法、装置及系统
WO2024011376A1 (zh) 人工智能ai网络功能服务的任务调度方法及装置
CN117202264A (zh) Mec环境中面向5g网络切片的计算卸载方法
CN116384513A (zh) 云边端协同学习系统及方法
CN116560832A (zh) 面向联邦学习的资源分配方法以及相关设备
CN114615705B (zh) 一种基于5g网络下单用户资源分配策略方法
TWI792784B (zh) 基於聯邦強化學習的邊緣計算卸載優化方法及通信系統
WO2024138398A1 (zh) 模型训练方法及装置
He et al. Confect: Computation offloading for tasks with hard/soft deadlines in edge computing