WO2024026583A1 - Communication method and communication apparatus - Google Patents

Communication method and communication apparatus

Info

Publication number
WO2024026583A1
Authority
WO
WIPO (PCT)
Prior art keywords
terminal device
base station
distribution
importance weight
data
Prior art date
Application number
PCT/CN2022/109277
Other languages
English (en)
French (fr)
Inventor
朱哲祺
樊平毅
彭程晖
王飞
宛烁
卢嘉勋
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
清华大学 (Tsinghua University)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.) and 清华大学 (Tsinghua University)
Priority to PCT/CN2022/109277
Publication of WO2024026583A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 30/00 Computer-aided design [CAD]
    • G06F 30/20 Design optimisation, verification or simulation
    • G06F 30/27 Design optimisation, verification or simulation using machine learning, e.g. artificial intelligence, neural networks, support vector machines [SVM] or training a model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Definitions

  • the embodiments of the present application relate to the field of communication, and in particular, to a communication method and a communication device.
  • federated learning is generally regarded as a type of distributed system with high communication efficiency due to its concise model training paradigm and parameter transfer rules.
  • the core idea of federated learning is to conduct distributed model training among multiple data sources holding local data and, without exchanging local individual or sample data, to build a global model over the virtually fused data only by exchanging model parameters or intermediate results.
  • non-independent and identically distributed local user data is considered to be a key challenge faced by federated learning.
  • the non-IID characteristics of local data at the distribution level are mainly manifested in two aspects. First, the categories of the training samples within a terminal device's local data are imbalanced.
  • Second, the distribution of local data categories among distributed users (terminal devices) is inconsistent.
  • the former affects the training effect of the local model: an unbalanced sample category distribution easily leads to overfitting and poor generalization performance on the test set.
  • the latter causes the local models of each distributed user to show different training trends and tendencies, resulting in poor accuracy and convergence of the global model after federation integration.
  • Embodiments of the present application provide a communication method and a communication device.
  • By introducing importance-weight sampling of each category of sample data of the terminal device, together with an interaction mechanism between the base station and the terminal device, the method addresses problems such as poor model accuracy and slow convergence caused by the non-independent and identically distributed (non-IID) local data of the terminal devices participating in federated training.
  • the importance weight, derived from the relevant theory, accounts not only for the uneven distribution of samples across categories but also for the gradient Lipschitz coefficient that affects model convergence, so that the pattern features of samples of every category are learned by the local model while the deviation between local models remains controllable.
  • According to a first aspect, a communication method is provided.
  • the base station determines that it will calculate a first importance weight, where the first importance weight is the importance weight of each category of data samples among multiple categories of data samples corresponding to the terminal device;
  • the base station sends first indication information to the terminal device, where the first indication information is used to instruct the terminal device to send a first parameter, and the first parameter is used by the base station to calculate the first importance weight;
  • the base station receives the first parameter sent by the terminal device;
  • the base station calculates the first importance weight according to the first parameter; and the base station sends the first importance weight to the terminal device.
  • the communication method provided by the embodiments of this application, by introducing importance-weight sampling of each category of sample data of the terminal device and an interaction mechanism between the base station and the terminal device, addresses problems such as poor model accuracy and slow convergence caused by the non-IID local data of the terminal devices participating in federated training.
  • the importance weight, derived from the relevant theory, accounts not only for the uneven distribution of samples across categories but also for the gradient Lipschitz coefficient that affects model convergence, so that the pattern features of samples of every category are learned by the local model while the deviation between local models remains controllable.
  • the update of the importance weight is deployed on the base station, which can fully utilize the computing power advantages of the base station and liberate the computing power of the terminal equipment.
  • the first parameter includes the distribution of the local data of the terminal device.
  • the first parameter includes the distribution of local data of the terminal device and the gradient corresponding to the local data.
  • the base station determines the distribution of global data, and the distribution of the global data is the distribution of local data corresponding to multiple terminal devices associated with the base station during federated training;
  • the base station calculates the first importance weight based on the first parameter, including:
  • the base station calculates the first importance weight according to the first parameter and the distribution of the global data.
  • the first importance weight satisfies a condition, given by the formula in the description, in which p_{N,n} represents the distribution of the global data, the distribution of the global data being the distribution of local data corresponding to the multiple terminal devices associated with the base station during federated training; p_n is the distribution of the local data corresponding to the terminal device; N represents the number of the multiple terminal devices; n indicates that the terminal device is the n-th terminal device; α_{N,n} is the gradient Lipschitz deviation of each category; and Γ_N is the smallest non-negative value in the corresponding candidate set.
  • According to a second aspect, a communication method is provided, including: a terminal device receives first indication information sent by a base station, the first indication information being used to instruct the terminal device to send a first parameter, and the first parameter being used by the base station to calculate a first importance weight; the terminal device sends the first parameter to the base station according to the first indication information; the terminal device receives the first importance weight sent by the base station; and the terminal device trains the next local model according to the first importance weight.
  • the communication method provided by the embodiments of this application, by introducing importance-weight sampling of each category of sample data of the terminal device and an interaction mechanism between the base station and the terminal device, addresses problems such as poor model accuracy and slow convergence caused by the non-IID local data of the terminal devices participating in federated training.
  • the importance weight, derived from the relevant theory, accounts not only for the uneven distribution of samples across categories but also for the gradient Lipschitz coefficient that affects model convergence, so that the pattern features of samples of every category are learned by the local model while the deviation between local models remains controllable.
  • the update of the importance weight is deployed on the base station, which can fully utilize the computing power advantages of the base station and liberate the computing power of the terminal equipment.
  • the first parameter includes the distribution of the local data of the terminal device.
  • the first parameter includes the distribution of local data of the terminal device and the gradient corresponding to the local data.
  • the first importance weight satisfies a condition, given by the formula in the description, in which p_{N,n} represents the distribution of the global data, the distribution of the global data being the distribution of local data corresponding to the multiple terminal devices associated with the base station during federated training; p_n is the distribution of the local data corresponding to the terminal device; N represents the number of the multiple terminal devices; n indicates that the terminal device is the n-th terminal device; α_{N,n} is the gradient Lipschitz deviation of each category; and Γ_N is the smallest non-negative value in the corresponding candidate set.
  • According to a third aspect, a communication method is provided, including: the base station sends second indication information to a terminal device, the second indication information instructing the terminal device to send a second parameter, where the second parameter is used to calculate a second importance weight and the second importance weight is the importance weight of each category of data samples among multiple categories of data samples corresponding to the terminal device; the base station receives the second parameter sent by the terminal device; the base station determines that the second importance weight is to be calculated by the terminal device; and the base station sends third indication information to the terminal device, the third indication information instructing the terminal device to calculate the second importance weight.
  • with the communication method provided by the embodiments of this application, terminal devices that have spare computing power, or for which the timeliness of the federated update is not critical, can complete the update of the importance weight locally, which relieves the pressure on system scheduling and further improves overall efficiency.
  • the second parameter includes the distribution of the local data of the terminal device.
  • the second parameter includes the distribution of local data of the terminal device and the gradient corresponding to the local data.
  • the base station determines the distribution of global data, the distribution of the global data being the distribution of local data corresponding to multiple terminal devices associated with the base station during federated training; and the base station sends the distribution of the global data to the terminal device.
  • the second importance weight satisfies a condition, given by the formula in the description, in which p_{N,n} represents the distribution of the global data, the distribution of the global data being the distribution of local data corresponding to the multiple terminal devices associated with the base station during federated training; p_n is the distribution of the local data corresponding to the terminal device; N represents the number of the multiple terminal devices; n indicates that the terminal device is the n-th terminal device; α_{N,n} is the gradient Lipschitz deviation of each category; and Γ_N is the smallest non-negative value in the corresponding candidate set.
  • According to a fourth aspect, a communication method is provided, including: a terminal device receives second indication information sent by a base station, the second indication information instructing the terminal device to send a second parameter to the base station, where the second parameter is used to calculate a second importance weight and the second importance weight is the importance weight of each category of data samples among multiple categories of data samples corresponding to the terminal device; the terminal device sends the second parameter to the base station according to the second indication information; the terminal device receives third indication information sent by the base station, the third indication information instructing the terminal device to calculate the second importance weight; the terminal device calculates the second importance weight according to the second parameter; and the terminal device trains the next local model according to the second importance weight.
  • with the communication method provided by the embodiments of this application, terminal devices that have spare computing power, or for which the timeliness of the federated update is not critical, can complete the update of the importance weight locally, which relieves the pressure on system scheduling and further improves overall efficiency.
  • the second parameter includes the distribution of local data of the terminal device.
  • the second parameter includes the distribution of local data of the terminal device and the gradient corresponding to the local data.
  • the base station determines the distribution of global data, the distribution of the global data being the distribution of local data corresponding to multiple terminal devices associated with the base station during federated training; and the base station sends the distribution of the global data to the terminal device.
  • the second importance weight satisfies a condition, given by the formula in the description, in which p_{N,n} represents the distribution of the global data, the distribution of the global data being the distribution of local data corresponding to the multiple terminal devices associated with the base station during federated training; p_n is the distribution of the local data corresponding to the terminal device; N represents the number of the multiple terminal devices; n indicates that the terminal device is the n-th terminal device; α_{N,n} is the gradient Lipschitz deviation of each category; and Γ_N is the smallest non-negative value in the corresponding candidate set.
  • According to a fifth aspect, a method for calculating importance weights is provided, including: a first device calculates, according to the distribution of the local data of a terminal device, the gradient Lipschitz coefficient of each category of data samples among multiple categories of data samples of the terminal device; the first device calculates the importance weight of each category of data samples based on the gradient Lipschitz coefficient of each category of data samples among the multiple categories of data samples; and the importance weights of the multiple categories of data samples are used by the terminal device to train the next local model.
  • the first device is a base station.
  • the first device is a terminal device.
  • the importance weight satisfies a condition, given by the formula in the description, in which p_{N,n} represents the distribution of the global data, the distribution of the global data being the distribution of local data corresponding to the multiple terminal devices associated with the base station during federated training; p_n is the distribution of the local data corresponding to the terminal device; N represents the number of the multiple terminal devices; n indicates that the terminal device is the n-th terminal device; α_{N,n} is the gradient Lipschitz deviation of each category; and Γ_N is the smallest non-negative value in the corresponding candidate set.
  • According to a sixth aspect, a communication device is provided, including units for executing the steps of the communication method in the above-mentioned first or third aspect and its implementations.
  • the communication device is a communication chip, which may include an input circuit or interface for sending information or data, and an output circuit or interface for receiving information or data.
  • the communication device is a communication device (e.g., a base station, etc.), and the communication chip may include a transmitter for sending information, and a receiver for receiving information or data.
  • According to a seventh aspect, a communication device is provided, including units for executing the steps of the communication method in the above-mentioned second or fourth aspect and its implementations.
  • the communication device is a communication chip, which may include an input circuit or interface for sending information or data, and an output circuit or interface for receiving information or data.
  • the communication device is a communication device (e.g., a terminal device, etc.), and the communication chip may include a transmitter for sending information, and a receiver for receiving information or data.
  • According to an eighth aspect, a communication device is provided, including a processor and a memory.
  • the memory is used to store a computer program.
  • the processor is used to call and run the computer program from the memory, so that the communication device executes the communication methods in the above-mentioned first to fourth aspects and their respective implementations.
  • optionally, there are one or more processors and one or more memories.
  • the memory may be integrated with the processor, or the memory may be provided separately from the processor.
  • the communication device also includes a transmitter (transmitter) and a receiver (receiver).
  • According to a ninth aspect, a computer program product is provided, including a computer program (which may also be called code, or instructions).
  • when the computer program is run, it causes the computer to execute the communication methods in the above-mentioned first to fourth aspects and their respective implementations.
  • According to a tenth aspect, a computer-readable medium is provided, which stores a computer program (which may also be called code, or instructions) that, when run on a computer, causes the computer to execute the communication methods in the above-mentioned first to fourth aspects and their respective implementations.
  • According to an eleventh aspect, a chip system is provided, including a memory and a processor.
  • the memory is used to store a computer program.
  • the processor is used to call and run the computer program from the memory, so that a communication device equipped with the chip system executes the communication methods in the above-mentioned first to fourth aspects and their respective implementations.
  • the chip system may include an input circuit or interface for sending information or data, and an output circuit or interface for receiving information or data.
  • Figure 1 is a schematic diagram of the traditional federated learning framework.
  • Figure 2 is a schematic diagram of the system architecture provided by the embodiment of the present application.
  • Figure 3 is an example of a communication method provided by an embodiment of the present application.
  • Figure 4 shows a method for the base station to calculate the first importance weight.
  • Figure 5 is another example of the communication method provided by the embodiment of the present application.
  • Figure 6 is another example of the communication method provided by the embodiment of the present application.
  • Figure 7 is a schematic diagram of the process for the base station to calculate the importance weight of the terminal equipment.
  • Figure 8 is another example of the communication method provided by the embodiment of the present application.
  • FIG. 9 is an example of a schematic diagram of the effect of the communication method provided by the embodiment of the present application.
  • Figure 10 is another schematic diagram of the effect of the communication method provided by the embodiment of the present application.
  • Figure 11 is an example of a communication device provided by an embodiment of the present application.
  • Figure 12 is another example of a communication device provided by an embodiment of the present application.
  • GSM: Global System for Mobile Communications
  • CDMA: Code Division Multiple Access
  • WCDMA: Wideband Code Division Multiple Access
  • GPRS: General Packet Radio Service
  • LTE: Long Term Evolution
  • FDD: Frequency Division Duplex
  • TDD: Time Division Duplex
  • UMTS: Universal Mobile Telecommunications System
  • WiMAX: Worldwide Interoperability for Microwave Access
  • the terminal equipment in the embodiments of this application may refer to user equipment, an access terminal, a subscriber unit, a subscriber station, a mobile station, a remote station, a remote terminal, a mobile device, a user terminal, a terminal, a wireless communication device, a user agent, or a user apparatus.
  • the terminal device may also be a cellular phone, a cordless phone, a Session Initiation Protocol (SIP) phone, a Wireless Local Loop (WLL) station, a Personal Digital Assistant (PDA), a handheld device with wireless communication functions, a computing device or another processing device connected to a wireless modem, a vehicle-mounted device, a wearable device, a terminal device in a 5G network, or a terminal device in an evolved Public Land Mobile Network (PLMN), etc.; the embodiments of the present application are not limited thereto.
  • SIP: Session Initiation Protocol
  • WLL: Wireless Local Loop
  • PDA: Personal Digital Assistant
  • in the introduction of the key terms, a distributed user can be a terminal device and the central server can be a base station.
  • the local model is a model trained by each distributed user based on local data.
  • the global model is a model obtained by the central server after integrating multiple local models for delivery.
  • model aggregation refers to the integration, in a federated update round, of the model parameters uploaded by the distributed users and received by the central server, which yields the global model parameters.
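  • As an illustration only, the embodiments do not prescribe a particular aggregation rule; the sketch below shows a common FedAvg-style, sample-count-weighted average that such a central server could apply. The function and argument names are assumptions for illustration, not part of the patent.

```python
import numpy as np

def aggregate(local_params, sample_counts):
    """Parameter-level model aggregation (FedAvg-style weighted average).

    local_params  : list of 1-D numpy arrays, one flattened parameter vector
                    per distributed user uploaded in this federated round.
    sample_counts : list of ints, number of local training samples per user,
                    used here as aggregation weights.
    Returns the flattened global parameter vector sent back to every user.
    """
    weights = np.asarray(sample_counts, dtype=float)
    weights /= weights.sum()
    stacked = np.stack(local_params)            # shape: (num_users, num_params)
    return (weights[:, None] * stacked).sum(axis=0)
```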
  • federated learning is generally regarded as a type of distributed system with high communication efficiency due to its concise model training paradigm and parameter transfer rules.
  • the federated learning paradigm is one in which communication and computation are deeply coupled: on the one hand, the communication network serves distributed computation and provides the basic function of information transfer; on the other hand, intelligent algorithms can also empower the communication network to optimize system performance.
  • Figure 1 shows the structure of a regional federation, in which the central server can be deployed on a regional base station to provide centralized services such as federation updates and overall system scheduling for distributed user terminal devices in the region.
  • since model aggregation in common federated learning algorithms is integration at the parameter level, the training effect of the local models has an important impact on the performance and convergence speed of the global model.
  • deep learning models, typified by neural networks, are data-driven algorithms, and their training data largely determines the final training effect. Therefore, in a federated learning scenario, the local data indirectly affects the global model, which in turn affects the communication and computing efficiency of the system.
  • Non-IID local user data is considered to be a key challenge faced by federated learning.
  • model distillation can modify the loss function of the distributed users' local model training so that the output of the local model is similar to that of the global model, reducing the differences between local models.
  • however, the model distillation solution requires distributed users to additionally compute the target model output locally, which increases the computational load on the distributed users' local side.
  • for example, the base station can mitigate the non-IID problem to a certain extent through local data integration.
  • each distributed user can choose local data with better quality during training, which helps improve the performance of the local model.
  • in a federated learning method based on dynamic sampling by the gradients of the local training samples, a sample importance weight is defined by a formula in which l(z_{k,i}; θ_t) is the loss of the neural network model θ_t of that training period on the sample z_{k,i}, such as the cross-entropy in classification problems or the mean squared error in regression problems.
  • This method operates at the beginning of each distributed user's local training. After each round of federation update, each participating distributed user uses the downloaded global model to calculate the importance weight of each sample in the local data according to the above formula. In the local training cycle, data samples are sampled according to importance weights and used to train and update the local model.
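  • The exact per-sample weighting formula of this prior method appears only as an image in the source text. As a rough, non-authoritative sketch of the idea it describes (weights driven by the gradient norm under the downloaded global model), one could compute per-sample weights as follows; the model, loss function, and normalization are assumptions for illustration.

```python
import torch

def per_sample_importance(model, loss_fn, samples, labels):
    """Sketch: gradient-norm-based per-sample importance weights.

    For each local sample, run backpropagation under the downloaded global
    model and take the norm of the resulting parameter gradient as an
    unnormalized importance weight; the weights are then normalized so they
    can be used as sampling probabilities.
    """
    weights = []
    for x, y in zip(samples, labels):
        model.zero_grad()
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        loss.backward()
        grad_norm = torch.sqrt(sum((p.grad ** 2).sum()
                                   for p in model.parameters()
                                   if p.grad is not None))
        weights.append(grad_norm.item())
    total = sum(weights)
    return [w / total for w in weights]
```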
  • the importance weight in this method is defined from an intuitive perspective: under the global model updated according to this importance weight, samples whose gradients have a larger norm in backpropagation are given higher importance weights, without considering the performance and convergence speed of the resampled global model. Although this method shows some performance improvement in actual experiments, it lacks theoretical support.
  • the calculation of the importance weights in this method relies on the local sample data of each distributed user, requiring every distributed user to download the updated global model and then perform backpropagation calculations sample by sample.
  • in some practical federated learning systems, especially lightweight mobile scenarios, the computing resources of distributed users are limited, so performing this computation on the distributed user side consumes considerable computing time, thereby reducing the overall efficiency of the system.
  • the federated learning architecture corresponding to this method lacks interaction between distributed users and the central server, and fails to fully utilize the global information or the computing power support provided by the central server.
  • Figure 2 is a schematic diagram of the system architecture provided by the embodiment of the present application.
  • Figure 2 shows a multi-user collaborative federated learning system.
  • the system includes a central server (for example, a regional base station) and N (N > 1) distributed users participating in training; the training data is divided into M (M > 1) categories according to labels, the distribution of the local data of each distributed user is {p_{N,n}}, and the distribution of all the data is {p_n}.
  • the federated learning system shown in Figure 2 mainly functions in three stages.
  • first, within a federated update round, in addition to aggregating the collected local models, the central server uses the local model of each distributed user and the new global model obtained in the current federation round to assist in estimating the gradient Lipschitz coefficient of each category of samples and to calculate the importance weights;
  • second, the distributed users download the new global model and receive the new importance weights at the same time;
  • third, the distributed users sample training data samples according to the importance weights when training their local models.
  • Figure 3 shows an example of a communication method provided by an embodiment of the present application.
  • the base station can correspond to the central server
  • the terminal device can correspond to a certain distributed user.
  • the method includes:
  • S310 The base station determines to calculate the first importance weight.
  • the first importance weight is the importance weight of each category of data samples in multiple categories of data samples corresponding to the terminal device.
  • the base station can calculate the first importance weight of the terminal device, and can also calculate the first importance weight of other terminal devices in the federated learning system related to the base station.
  • the base station can calculate the first importance weight of all terminal devices in the federated learning system related to the base station.
  • the base station can calculate the first importance weight of some terminal devices in the federated learning system related to the base station. For example, the base station can calculate the first importance weights of some terminal devices by itself, and the base station can instruct other terminal devices to calculate their own first importance weights.
  • the base station may determine, based on one or more factors such as the distribution of the global data at the base station, system scheduling, communication link quality, the computing power of the base station, and the computing power of the terminal device, whether the base station calculates the first importance weight of the terminal device.
  • the distribution of global data is the distribution of local data corresponding to multiple terminal devices related to the base station during federated training.
  • the base station collects the distribution of local data corresponding to all terminal devices related to the base station during federated training, integrates it, and obtains the distribution of global data.
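  • The embodiments do not spell out how the local distributions are integrated; a minimal sketch of one plausible aggregation, assuming each terminal device reports a per-class sample-count histogram, is shown below (the helper name and the histogram format are assumptions).

```python
import numpy as np

def global_distribution(local_histograms):
    """Integrate per-class sample-count histograms reported by the terminals.

    local_histograms : list of length-M arrays, where local_histograms[n][m]
                       is the number of class-m samples held by terminal n.
    Returns the empirical class distribution of the pooled ("global") data.
    """
    totals = np.sum(np.stack(local_histograms), axis=0).astype(float)
    return totals / totals.sum()
```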
  • S320: the base station sends first indication information to the terminal device.
  • the first indication information is used to instruct the terminal device to send the first parameter.
  • the first parameter is used by the base station to calculate the first importance weight.
  • the terminal device receives the first indication information.
  • the first parameter includes the distribution of local data of the terminal device.
  • the first parameter also includes the gradient corresponding to the local data of the terminal device.
  • the gradient is calculated by the terminal device based on local data.
  • S330: the terminal device sends the first parameter to the base station according to the first indication information.
  • the base station receives the first parameter sent by the terminal device.
  • S340 The base station calculates the first importance weight according to the first parameter.
  • the base station determines the distribution of global data based on the distribution of local data of multiple terminal devices associated with the base station when performing federated training.
  • the base station calculates the first importance weight based on the first parameter and the distribution of the global data.
  • Figure 4 shows a method for the base station to calculate the first importance weight. As shown in Figure 4, the method includes:
  • the base station calculates the gradient Lipschitz coefficient of each category of data samples in multiple categories of data samples in the terminal device according to the distribution of local data of the terminal device.
  • the base station calculates the importance weight of each category of data samples in the multiple categories of data samples based on the gradient Lipschitz coefficient of each category of data samples in the multiple categories of data samples.
  • the first importance weight satisfies a condition, given by the formula in the description, in which p_{N,n} represents the distribution of the global data, the distribution of the global data being the distribution of local data corresponding to the multiple terminal devices associated with the base station during federated training; p_n is the distribution of the local data corresponding to the terminal device; N represents the number of the multiple terminal devices; n indicates that the terminal device is the n-th terminal device; α_{N,n} is the gradient Lipschitz deviation of each category; and Γ_N is the smallest non-negative value in the corresponding candidate set.
  • S350 The base station sends the first importance weight to the terminal device.
  • the terminal device receives the first importance weight sent by the base station.
  • S360 The terminal device trains the next local model according to the first importance weight.
  • the communication method provided by the embodiments of this application, by introducing importance-weight sampling of each category of sample data of the terminal device and an interaction mechanism between the base station and the terminal device, addresses problems such as poor model accuracy and slow convergence caused by the non-IID local data of the terminal devices participating in federated training.
  • the importance weight, derived from the relevant theory, accounts not only for the uneven distribution of samples across categories but also for the gradient Lipschitz coefficient that affects model convergence, so that the pattern features of samples of every category are learned by the local model while the deviation between local models remains controllable.
  • the update of the importance weight is deployed on the base station, which can fully utilize the computing power advantages of the base station and liberate the computing power of the terminal equipment.
  • Figure 5 shows another example of the communication method provided by the embodiment of the present application. As shown in Figure 5, the method includes:
  • the base station sends second instruction information to the terminal device.
  • the second instruction information instructs the terminal device to send second parameters.
  • the second parameters are used to calculate the second importance weight.
  • the terminal device receives the second indication information sent by the base station.
  • the second importance weight is the importance weight of each category of data samples among multiple categories of data samples corresponding to the terminal device.
  • the second parameter includes the distribution of local data of the terminal device.
  • the second parameter also includes the gradient corresponding to the local data of the terminal device.
  • the gradient is calculated by the terminal device based on local data.
  • S520 The terminal device sends the second parameter to the base station according to the second indication information.
  • the base station receives the second parameter sent by the terminal device.
  • the base station determines the distribution of global data according to the second parameter.
  • for the distribution of the global data, reference can be made to the description of the distribution of global data in the method corresponding to Figure 3, which will not be repeated here.
  • S530 The base station determines that the terminal device calculates the second importance weight.
  • S540 The base station sends third instruction information to the terminal device, where the third instruction information instructs the terminal device to calculate the second importance weight.
  • the terminal device receives the third indication information sent by the base station.
  • S550 The terminal device calculates the second importance weight according to the second parameter.
  • the method may also include: the terminal device receives the distribution of global data sent by the base station, and calculates the second importance weight according to the second parameter and the distribution of the global data.
  • the second importance weight satisfies a condition, given by the formula in the description, in which p_{N,n} represents the distribution of the global data, the distribution of the global data being the distribution of local data corresponding to the multiple terminal devices associated with the base station during federated training; p_n is the distribution of the local data corresponding to the terminal device; N represents the number of the multiple terminal devices; n indicates that the terminal device is the n-th terminal device; α_{N,n} is the gradient Lipschitz deviation of each category; and Γ_N is the smallest non-negative value in the corresponding candidate set.
  • S560 The terminal device trains the next local model according to the second importance weight.
  • with the communication method provided by the embodiments of this application, terminal devices that have spare computing power, or for which the timeliness of the federated update is not critical, can complete the update of the importance weights locally, which relieves the pressure on system scheduling and further improves overall efficiency.
  • Figure 6 shows another example of the communication method provided by the embodiment of the present application. As shown in Figure 6, the method includes:
  • the initialization includes presetting the system parameters, initializing the local model of the terminal device, initializing the importance weights of each category corresponding to the terminal device, and configuring the training hyperparameters.
  • S620 The base station performs terminal-cloud interaction presets for the current global cycle.
  • the central server can be a base station.
  • Each terminal device among the plurality of terminal devices respectively samples the local training data according to the importance weight and samples a preset number of samples for updating the local model.
  • the process for the n-th terminal device to train a local model based on the importance weights can be as follows (a code sketch is given after the steps):
  • Step 6-1 The nth terminal device samples each type of data sample according to the importance weight to form a data batch for training the local model;
  • Step 6-2 The n-th terminal device sequentially performs backpropagation on the sampled data batch and updates the local model
  • Step 6-3 The nth terminal device checks whether the local training round is completed. If it is completed, it ends. If it is not completed, it goes to step 6-1 to continue the training process.
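  • A minimal sketch of this per-class, importance-weighted local training loop follows. The batch size, optimizer, number of local steps, and the use of the class weights directly as sampling probabilities are assumptions; the embodiments only state that samples are drawn according to the per-class importance weights.

```python
import random
import torch

def train_local_model(model, loss_fn, optimizer, data_by_class, class_weights,
                      batch_size=32, local_steps=50):
    """Local training with per-class importance-weighted sampling (steps 6-1 to 6-3).

    data_by_class : dict mapping class index -> list of (x, y) tensor pairs.
    class_weights : dict mapping class index -> importance weight received
                    from the base station, used as per-class sampling weights.
    """
    classes = list(class_weights.keys())
    probs = [class_weights[c] for c in classes]
    for _ in range(local_steps):                # step 6-3: repeat until the local round ends
        # Step 6-1: sample a data batch, drawing each sample's class by its importance weight.
        drawn = random.choices(classes, weights=probs, k=batch_size)
        batch = [random.choice(data_by_class[c]) for c in drawn]
        x = torch.stack([b[0] for b in batch])
        y = torch.stack([b[1] for b in batch])
        # Step 6-2: backpropagate on the sampled batch and update the local model.
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()
    return model
```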
  • the base station performs model aggregation according to the local models of each terminal device and the preset federation update rules to obtain this global model.
  • the base station estimates, on a preset data set, the gradient Lipschitz coefficient of the data samples of each category of each terminal device with respect to the corresponding local model of that terminal device, based on the current local model of each terminal device and the current global model.
  • the preset data set may be a preset data set that is independent of each terminal device, or may be a data set intercepted from the data of each terminal device.
  • the base station may estimate the gradient Lipschitz coefficient of each category of data samples among the multiple categories of data samples of the n-th terminal device on the current local model of the n-th terminal device.
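  • The estimation formula itself is not given in the text (it appears only as a formula image). Purely as an illustrative assumption consistent with the key-term definition of the gradient Lipschitz coefficient, one could estimate a per-class coefficient on the preset data set by comparing gradients of the terminal's current local model and the current global model, as sketched below; this is not the patent's exact procedure.

```python
import torch

def flat_grad(model, loss):
    """Flattened parameter gradient of `loss` with respect to `model`'s parameters."""
    grads = torch.autograd.grad(loss, list(model.parameters()))
    return torch.cat([g.reshape(-1) for g in grads])

def estimate_class_lipschitz(local_model, global_model, loss_fn, preset_set):
    """Hypothetical per-class gradient Lipschitz coefficient estimate.

    preset_set : iterable of (x, y, class_idx). For each sample, the ratio of
    the gradient difference between the local and global models to the
    parameter distance is computed, and the per-class maximum is kept.
    """
    theta_l = torch.cat([p.reshape(-1) for p in local_model.parameters()])
    theta_g = torch.cat([p.reshape(-1) for p in global_model.parameters()])
    param_dist = torch.norm(theta_l - theta_g).clamp_min(1e-12)
    lipschitz = {}
    for x, y, cls in preset_set:
        g_l = flat_grad(local_model, loss_fn(local_model(x.unsqueeze(0)), y.unsqueeze(0)))
        g_g = flat_grad(global_model, loss_fn(global_model(x.unsqueeze(0)), y.unsqueeze(0)))
        ratio = (torch.norm(g_l - g_g) / param_dist).item()
        lipschitz[cls] = max(lipschitz.get(cls, 0.0), ratio)
    return lipschitz
```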
  • S670: the base station calculates the importance weights according to the gradient Lipschitz coefficients.
  • the base station updates the importance weights this time based on the gradient Lipschitz coefficients.
  • Figure 7 is a schematic diagram of a process for the base station to calculate the importance weight of the n-th terminal device. As shown in Figure 7, the process can be:
  • Step 710: the base station initializes the candidate parameter set to be empty and sets the category index to 0;
  • Step 720: the base station calculates the gradient Lipschitz coefficient deviation of the current category according to the corresponding formula; if the deviation is less than 0, the corresponding candidate value is added to the candidate parameter set and the category index is increased by one; otherwise, the category index is directly increased by one;
  • Step 730: the base station determines whether all categories have been traversed; if so, it proceeds to step 740, and if not, it jumps to step 720;
  • Step 740: the base station selects the minimum non-negative value in the candidate parameter set as the optimal parameter Γ_N, and then obtains the importance weight of each category of data samples of the n-th terminal device.
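  • The per-category deviation formula and the rule that maps Γ_N to the final weights are given only as formula images in the source; the sketch below therefore only mirrors the control flow of steps 710-740, with those two formulas left as caller-supplied placeholder functions (an assumption, not the patent's actual expressions).

```python
def compute_importance_weights(alpha, candidate_fn, weight_fn):
    """Control-flow sketch of Figure 7 (steps 710-740) for the n-th terminal device.

    alpha        : dict class_idx -> gradient Lipschitz deviation alpha_{N,n} of that class.
    candidate_fn : placeholder for the (image-only) formula mapping a negative
                   deviation to a candidate value for Gamma_N.
    weight_fn    : placeholder for the (image-only) formula mapping
                   (class_idx, Gamma_N) to that class's importance weight.
    """
    candidates = []                                   # step 710: empty candidate set, class index 0
    for cls in sorted(alpha):                         # steps 720-730: traverse every category
        if alpha[cls] < 0:
            candidates.append(candidate_fn(cls, alpha[cls]))
    non_negative = [c for c in candidates if c >= 0]
    gamma_n = min(non_negative) if non_negative else 0.0   # step 740 (fallback value is an assumption)
    return {cls: weight_fn(cls, gamma_n) for cls in alpha}
```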
  • the base station sends the current global model and the corresponding importance weight to each terminal device among the multiple terminal devices.
  • the base station checks whether the global training round is completed. If it is completed, it ends the model training. If it is not completed, it jumps to S620 for loop.
  • the method provided by the embodiments of this application, by introducing importance-weight sampling of each category of sample data of the terminal device and an interaction mechanism between the base station and the terminal device, solves problems such as poor model accuracy and slow convergence caused by the non-independent and identically distributed local data of the terminal devices.
  • the importance weight, derived from the relevant theory, accounts not only for the uneven distribution of samples across categories but also for the gradient Lipschitz coefficient that affects model convergence, so that the pattern features of samples of every category are learned by the local model while the deviation between local models remains controllable.
  • the update of the importance weight is deployed on the base station, which can fully utilize the computing power advantages of the base station and liberate the computing power of the terminal equipment.
  • Figure 8 shows another example of the communication method provided by the embodiment of the present application. As shown in Figure 8, the method includes:
  • the initialization includes the preset of system parameters, the initialization of the local model of the terminal device, the initialization of the importance weights of each category corresponding to the terminal device, and the configuration of training hyperparameters.
  • S820 The base station performs terminal-cloud interaction presets for the current global cycle.
  • Each terminal device among the plurality of terminal devices respectively samples the local training data according to the importance weight and samples a preset number of samples for updating the local model.
  • the process of training the local model based on the importance weight by the nth terminal device can refer to the description in step S630, which will not be described again here.
  • the base station performs model aggregation according to the local models of each terminal device and the preset federation update rules to obtain this global model.
  • the base station sends instruction information to the terminal device, instructing the terminal device to calculate the importance weight.
  • S870 The terminal device obtains this global model and auxiliary parameters from the base station.
  • S880 The terminal device calculates the importance weight.
  • for the calculation of the importance weight, reference can be made to the description in S670, with the base station replaced by the terminal device.
  • the base station checks whether the global training round is completed. If it is completed, it ends the model training. If it is not completed, it jumps to S820 for loop.
  • with the communication method provided by the embodiments of this application, distributed users that have spare computing power, or for which the timeliness of the federated update is not critical, can update the importance weight locally, reducing the pressure on system scheduling and further improving overall efficiency.
  • Figure 9 is an example of a schematic diagram of the effect of the communication method provided by the embodiment of the present application. As shown in Figure 9, which plots model accuracy against communication rounds, under the configured non-IID data distribution the average classification accuracy of the global model on the standard test set reaches 64% after 25 rounds of federated updates, only about 6% lower than that of a model trained centrally using all of the non-IID data.
  • FIG. 10 is an example of a schematic diagram illustrating the effects of the communication method provided by the embodiment of the present application.
  • Figure 10 considers data sampling efficiency under partial participation of local data in training, that is, in each local training iteration only a certain proportion of the data is sampled to update the local model. As the sampling ratio decreases, the classification accuracy of the global model of the federated learning algorithm designed in the embodiments of this application degrades less, and its ability to cope with insufficient local device computing resources, computing time, or samples is also significantly better than that of the random sampling method.
  • FIG. 11 is a schematic diagram of an example of a communication device provided by an embodiment of the present application. As shown in FIG. 11 , the communication device 1100 includes a transceiver unit 1110 and a processing unit 1120 .
  • the communication device 1100 can be used to implement functions involving terminal devices in any of the above methods.
  • the communication device 1100 may correspond to a terminal device.
  • the communication device 1100 may be a terminal device, and performs the steps performed by the terminal device in the above method embodiment.
  • the transceiver unit 1110 can be used to support the communication device 1100 to communicate, for example, perform the sending and/or receiving actions performed by the terminal device in the above method embodiment.
  • the processing unit 1120 can be used to support the communication device 1100 in performing the processing actions in the above method embodiments, for example, executing the processing actions performed by the terminal device in the above method embodiments.
  • the communication device may also include a storage unit 1130 (not shown in Figure 11) for storing program codes and data of the communication device.
  • the communication device 1100 can be used to implement functions involving the base station in any of the above methods.
  • the communication device 1100 may correspond to a base station.
  • the communication device 1100 may be a base station, and performs the steps performed by the base station in the above method embodiments.
  • the transceiver unit 1110 can be used to support the communication device 1100 to communicate, for example, perform the sending and/or receiving actions performed by the base station in the above method embodiment.
  • the processing unit 1120 can be used to support the communication device 1100 in performing the processing actions in the above method embodiments, for example, performing the processing actions performed by the base station in the above method embodiments.
  • the communication device may also include a storage unit 1130 (not shown in Figure 11) for storing program codes and data of the communication device.
  • FIG. 12 is a schematic diagram of another example of a communication device provided by an embodiment of the present application.
  • the device 1200 includes: a transceiver 1210, a processor 1220 and a memory 1230.
  • the memory 1230 is used to store instructions.
  • the processor 1220 is coupled to the memory 1230 and is used to execute instructions stored in the memory to execute the method provided by the above embodiments of the present application.
  • the transceiver 1210 in the device 1200 may correspond to the transceiver unit 1110 in the device 1100
  • the processor 1220 in the communication device 1200 may correspond to the processing unit 1120 in the communication device 1100 .
  • the memory 1230 and the processor 1220 can be combined into one processing device, and the processor 1220 is used to execute the program code stored in the memory 1230 to implement the above functions.
  • the memory 1230 may also be integrated in the processor 1220 or independent of the processor 1220.
  • the disclosed systems, devices and methods can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • multiple units or components may be combined or can be integrated into another system, or some features can be ignored, or not implemented.
  • the coupling or direct coupling or communication connection between each other shown or discussed may be through some interfaces, and the indirect coupling or communication connection of the devices or units may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or they may be distributed to multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application can be integrated into one processing unit, each unit can exist physically alone, or two or more units can be integrated into one unit.
  • the functions are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium.
  • the technical solution of the present application, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied in the form of a software product.
  • the computer software product is stored in a storage medium and includes several instructions used to cause a computer device (which can be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application.
  • the aforementioned storage media include: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or other media that can store program code.

Abstract

Embodiments of the present application provide a communication method and apparatus. The method includes: a base station determines that it will calculate a first importance weight, the first importance weight being the importance weight of each category of data samples among multiple categories of data samples corresponding to a terminal device; the base station sends first indication information to the terminal device, the first indication information being used to instruct the terminal device to send a first parameter, and the first parameter being used by the base station to calculate the first importance weight; the base station receives the first parameter sent by the terminal device; the base station calculates the first importance weight according to the first parameter; and the base station sends the first importance weight to the terminal device. The method addresses problems such as poor model accuracy and slow convergence caused by the non-independent and identically distributed local data of the terminal devices participating in federated training.

Description

Communication method and communication apparatus
Technical Field
Embodiments of the present application relate to the field of communication, and in particular to a communication method and a communication apparatus.
Background
As an emerging distributed learning framework, federated learning is generally regarded as a class of distributed systems with high communication efficiency owing to its concise model training paradigm and parameter transfer rules. The core idea of federated learning is to conduct distributed model training among multiple data sources holding local data and, without exchanging local individual or sample data, to build a global model over the virtually fused data only by exchanging model parameters or intermediate results.
However, in the related theoretical and experimental research as well as in engineering practice, non-independent and identically distributed (non-IID) local user data is considered a key challenge faced by federated learning. The non-IID characteristics of local data at the distribution level are mainly manifested in two aspects. First, the categories of the training samples within the local data are imbalanced. Second, the distributions of local data categories are inconsistent across distributed users (terminal devices). The former affects the training effect of the local model: an imbalanced sample category distribution easily leads to problems such as overfitting and poor generalization on the test set. The latter causes the local models of the distributed users to exhibit different training trends and tendencies, resulting in poor accuracy and convergence of the global model after federated integration.
Summary
Embodiments of the present application provide a communication method and a communication apparatus. By introducing importance-weight sampling of each category of sample data of the terminal device, together with an interaction mechanism between the base station and the terminal device, the method addresses problems such as poor model accuracy and slow convergence caused by the non-IID local data of the terminal devices participating in federated training. The importance weight derived from the relevant theory accounts not only for the uneven distribution of samples across categories, but also for the gradient Lipschitz coefficient that affects model convergence, so that the pattern features of samples of every category are learned by the local model while the deviation between local models remains controllable.
According to a first aspect, a communication method is provided: a base station determines that it will calculate a first importance weight, the first importance weight being the importance weight of each category of data samples among multiple categories of data samples corresponding to a terminal device; the base station sends first indication information to the terminal device, the first indication information being used to instruct the terminal device to send a first parameter, and the first parameter being used by the base station to calculate the first importance weight; the base station receives the first parameter sent by the terminal device; the base station calculates the first importance weight according to the first parameter; and the base station sends the first importance weight to the terminal device.
The communication method provided by the embodiments of this application, by introducing importance-weight sampling of each category of sample data of the terminal device and an interaction mechanism between the base station and the terminal device, addresses problems such as poor model accuracy and slow convergence caused by the non-IID local data of the terminal devices participating in federated training. The importance weight derived from the relevant theory accounts not only for the uneven distribution of samples across categories, but also for the gradient Lipschitz coefficient that affects model convergence, so that the pattern features of samples of every category are learned by the local model while the deviation between local models remains controllable. In addition, the update of the importance weight is deployed on the base station, which can fully utilize the computing power advantage of the base station and free up the computing power of the terminal device.
With reference to the first aspect, in some implementations of the first aspect, the first parameter includes the distribution of the local data of the terminal device.
With reference to the first aspect, in some implementations of the first aspect, the first parameter includes the distribution of the local data of the terminal device and the gradient corresponding to the local data.
With reference to the first aspect, in some implementations of the first aspect, the base station determines the distribution of global data, the distribution of the global data being the distribution of the local data corresponding to multiple terminal devices associated with the base station during federated training;
the base station calculating the first importance weight according to the first parameter includes:
the base station calculates the first importance weight according to the first parameter and the distribution of the global data.
With reference to the first aspect, in some implementations of the first aspect, the first importance weight satisfies the following condition:
Figure PCTCN2022109277-appb-000001
where p_{N,n} represents the distribution of the global data, the distribution of the global data being the distribution of the local data corresponding to the multiple terminal devices associated with the base station during federated training; p_n is the distribution of the local data corresponding to the terminal device; N represents the number of the multiple terminal devices; n indicates that the terminal device is the n-th terminal device; α_{N,n} is the gradient Lipschitz deviation of each category; and Γ_N is the smallest non-negative value in the expression shown in Figure PCTCN2022109277-appb-000002.
According to a second aspect, a communication method is provided, including: a terminal device receives first indication information sent by a base station, the first indication information being used to instruct the terminal device to send a first parameter, and the first parameter being used by the base station to calculate a first importance weight; the terminal device sends the first parameter to the base station according to the first indication information; the terminal device receives the first importance weight sent by the base station; and the terminal device trains the next local model according to the first importance weight.
The communication method provided by the embodiments of this application, by introducing importance-weight sampling of each category of sample data of the terminal device and an interaction mechanism between the base station and the terminal device, addresses problems such as poor model accuracy and slow convergence caused by the non-IID local data of the terminal devices participating in federated training. The importance weight derived from the relevant theory accounts not only for the uneven distribution of samples across categories, but also for the gradient Lipschitz coefficient that affects model convergence, so that the pattern features of samples of every category are learned by the local model while the deviation between local models remains controllable. In addition, the update of the importance weight is deployed on the base station, which can fully utilize the computing power advantage of the base station and free up the computing power of the terminal device.
With reference to the second aspect, in some implementations of the second aspect, the first parameter includes the distribution of the local data of the terminal device.
With reference to the second aspect, in some implementations of the second aspect, the first parameter includes the distribution of the local data of the terminal device and the gradient corresponding to the local data.
With reference to the second aspect, in some implementations of the second aspect, the first importance weight satisfies the following condition:
Figure PCTCN2022109277-appb-000003
where p_{N,n} represents the distribution of the global data, the distribution of the global data being the distribution of the local data corresponding to the multiple terminal devices associated with the base station during federated training; p_n is the distribution of the local data corresponding to the terminal device; N represents the number of the multiple terminal devices; n indicates that the terminal device is the n-th terminal device; α_{N,n} is the gradient Lipschitz deviation of each category; and Γ_N is the smallest non-negative value in the expression shown in Figure PCTCN2022109277-appb-000004.
According to a third aspect, a communication method is provided, including: a base station sends second indication information to a terminal device, the second indication information instructing the terminal device to send a second parameter, where the second parameter is used to calculate a second importance weight and the second importance weight is the importance weight of each category of data samples among multiple categories of data samples corresponding to the terminal device; the base station receives the second parameter sent by the terminal device; the base station determines that the second importance weight is to be calculated by the terminal device; and the base station sends third indication information to the terminal device, the third indication information instructing the terminal device to calculate the second importance weight.
With the communication method provided by the embodiments of this application, terminal devices that have spare computing power, or for which the timeliness of the federated update is not critical, can complete the update of the importance weight locally, which relieves the pressure on system scheduling and further improves overall efficiency.
With reference to the third aspect, in some implementations of the third aspect, the second parameter includes the distribution of the local data of the terminal device.
With reference to the third aspect, in some implementations of the third aspect, the second parameter includes the distribution of the local data of the terminal device and the gradient corresponding to the local data.
With reference to the third aspect, in some implementations of the third aspect, the base station determines the distribution of global data, the distribution of the global data being the distribution of the local data corresponding to multiple terminal devices associated with the base station during federated training; and the base station sends the distribution of the global data to the terminal device.
With reference to the third aspect, in some implementations of the third aspect, the second importance weight satisfies the following condition:
Figure PCTCN2022109277-appb-000005
where p_{N,n} represents the distribution of the global data, the distribution of the global data being the distribution of the local data corresponding to the multiple terminal devices associated with the base station during federated training; p_n is the distribution of the local data corresponding to the terminal device; N represents the number of the multiple terminal devices; n indicates that the terminal device is the n-th terminal device; α_{N,n} is the gradient Lipschitz deviation of each category; and Γ_N is the smallest non-negative value in the expression shown in Figure PCTCN2022109277-appb-000006.
According to a fourth aspect, a communication method is provided, including: a terminal device receives second indication information sent by a base station, the second indication information instructing the terminal device to send a second parameter to the base station, where the second parameter is used to calculate a second importance weight and the second importance weight is the importance weight of each category of data samples among multiple categories of data samples corresponding to the terminal device; the terminal device sends the second parameter to the base station according to the second indication information; the terminal device receives third indication information sent by the base station, the third indication information instructing the terminal device to calculate the second importance weight; the terminal device calculates the second importance weight according to the second parameter; and the terminal device trains the next local model according to the second importance weight.
With the communication method provided by the embodiments of this application, terminal devices that have spare computing power, or for which the timeliness of the federated update is not critical, can complete the update of the importance weight locally, which relieves the pressure on system scheduling and further improves overall efficiency.
With reference to the fourth aspect, in some implementations of the fourth aspect, the second parameter includes the distribution of the local data of the terminal device.
With reference to the fourth aspect, in some implementations of the fourth aspect, the second parameter includes the distribution of the local data of the terminal device and the gradient corresponding to the local data.
With reference to the fourth aspect, in some implementations of the fourth aspect, the base station determines the distribution of global data, the distribution of the global data being the distribution of the local data corresponding to multiple terminal devices associated with the base station during federated training; and the base station sends the distribution of the global data to the terminal device.
With reference to the fourth aspect, in some implementations of the fourth aspect, the second importance weight satisfies the following condition:
Figure PCTCN2022109277-appb-000007
where p_{N,n} represents the distribution of the global data, the distribution of the global data being the distribution of the local data corresponding to the multiple terminal devices associated with the base station during federated training; p_n is the distribution of the local data corresponding to the terminal device; N represents the number of the multiple terminal devices; n indicates that the terminal device is the n-th terminal device; α_{N,n} is the gradient Lipschitz deviation of each category; and Γ_N is the smallest non-negative value in the expression shown in Figure PCTCN2022109277-appb-000008.
According to a fifth aspect, a method for calculating importance weights is provided, including: a first device calculates, according to the distribution of the local data of a terminal device, the gradient Lipschitz coefficient of each category of data samples among multiple categories of data samples of the terminal device; the first device calculates the importance weight of each category of data samples among the multiple categories of data samples according to the gradient Lipschitz coefficient of each category of data samples, and the importance weights of the multiple categories of data samples are used by the terminal device to train the next local model.
With reference to the fifth aspect, in some implementations of the fifth aspect, the first device is a base station.
With reference to the fifth aspect, in some implementations of the fifth aspect, the first device is a terminal device.
With reference to the fifth aspect, in some implementations of the fifth aspect, the importance weight satisfies the following condition:
Figure PCTCN2022109277-appb-000009
where p_{N,n} represents the distribution of the global data, the distribution of the global data being the distribution of the local data corresponding to the multiple terminal devices associated with the base station during federated training; p_n is the distribution of the local data corresponding to the terminal device; N represents the number of the multiple terminal devices; n indicates that the terminal device is the n-th terminal device; α_{N,n} is the gradient Lipschitz deviation of each category; and Γ_N is the smallest non-negative value in the expression shown in Figure PCTCN2022109277-appb-000010.
According to a sixth aspect, a communication apparatus is provided, including units for executing the steps of the communication method in the above first aspect or third aspect and their implementations.
In one design, the communication apparatus is a communication chip, and the communication chip may include an input circuit or interface for sending information or data, and an output circuit or interface for receiving information or data.
In another design, the communication apparatus is a communication device (for example, a base station, etc.), and the communication chip may include a transmitter for sending information, and a receiver for receiving information or data.
According to a seventh aspect, a communication apparatus is provided, including units for executing the steps of the communication method in the above second aspect or fourth aspect and their implementations.
In one design, the communication apparatus is a communication chip, and the communication chip may include an input circuit or interface for sending information or data, and an output circuit or interface for receiving information or data.
In another design, the communication apparatus is a communication device (for example, a terminal device, etc.), and the communication chip may include a transmitter for sending information, and a receiver for receiving information or data.
According to an eighth aspect, a communication device is provided, including a processor and a memory, the memory being used to store a computer program and the processor being used to call and run the computer program from the memory, so that the communication device executes the communication methods in the above first to fourth aspects and their respective implementations.
Optionally, there are one or more processors and one or more memories.
Optionally, the memory may be integrated with the processor, or the memory may be provided separately from the processor.
Optionally, the communication device further includes a transmitter and a receiver.
According to a ninth aspect, a computer program product is provided, the computer program product including a computer program (which may also be called code, or instructions) that, when run, causes a computer to execute the communication methods in the above first to fourth aspects and their respective implementations.
According to a tenth aspect, a computer-readable medium is provided, the computer-readable medium storing a computer program (which may also be called code, or instructions) that, when run on a computer, causes the computer to execute the communication methods in the above first to fourth aspects and their respective implementations.
According to an eleventh aspect, a chip system is provided, including a memory and a processor, the memory being used to store a computer program and the processor being used to call and run the computer program from the memory, so that a communication device equipped with the chip system executes the communication methods in the above first to fourth aspects and their respective implementations.
The chip system may include an input circuit or interface for sending information or data, and an output circuit or interface for receiving information or data.
Brief Description of the Drawings
Figure 1 is a schematic diagram of the traditional federated learning framework.
Figure 2 is a schematic diagram of the system architecture provided by an embodiment of the present application.
Figure 3 is an example of the communication method provided by an embodiment of the present application.
Figure 4 shows a method for the base station to calculate the first importance weight.
Figure 5 is another example of the communication method provided by an embodiment of the present application.
Figure 6 is another example of the communication method provided by an embodiment of the present application.
Figure 7 is a schematic diagram of the process by which the base station calculates the importance weight of the terminal device.
Figure 8 is another example of the communication method provided by an embodiment of the present application.
Figure 9 is an example of a schematic diagram of the effect of the communication method provided by an embodiment of the present application.
Figure 10 is another example of a schematic diagram of the effect of the communication method provided by an embodiment of the present application.
Figure 11 is an example of the communication apparatus provided by an embodiment of the present application.
Figure 12 is another example of the communication apparatus provided by an embodiment of the present application.
Detailed Description
The embodiments of the technical solutions of the present application will be described in detail below with reference to the accompanying drawings. The following embodiments are merely intended to illustrate the technical solutions of the present application more clearly; they therefore serve only as examples and shall not be used to limit the scope of protection of the present application.
Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by a person skilled in the art to which the present application belongs. The terms used herein are only for the purpose of describing specific embodiments and are not intended to limit the present application. The terms "including" and "having", and any variations thereof, in the specification, the claims, and the above description of the drawings are intended to cover non-exclusive inclusion.
In the description of the embodiments of the present application, "multiple" means two or more, unless otherwise explicitly and specifically defined. Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application. The appearances of this phrase in various places in the specification do not necessarily all refer to the same embodiment, nor do they refer to separate or alternative embodiments that are mutually exclusive of other embodiments. It is understood, explicitly and implicitly, by those skilled in the art that the embodiments described herein can be combined with other embodiments.
In the description of the embodiments of the present application, the term "and/or" merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may represent three cases: A exists alone, both A and B exist, and B exists alone. In addition, the character "/" herein generally indicates an "or" relationship between the associated objects before and after it.
The technical solutions of the embodiments of the present application can be applied to various communication systems, for example: the Global System for Mobile Communications (GSM), the Code Division Multiple Access (CDMA) system, the Wideband Code Division Multiple Access (WCDMA) system, the General Packet Radio Service (GPRS), the Long Term Evolution (LTE) system, LTE Frequency Division Duplex (FDD), LTE Time Division Duplex (TDD), the Universal Mobile Telecommunications System (UMTS), the Worldwide Interoperability for Microwave Access (WiMAX) communication system, the fifth generation (5G) system, or New Radio (NR), etc.
The terminal device in the embodiments of this application may refer to user equipment, an access terminal, a subscriber unit, a subscriber station, a mobile station, a remote station, a remote terminal, a mobile device, a user terminal, a terminal, a wireless communication device, a user agent, or a user apparatus. The terminal device may also be a cellular phone, a cordless phone, a Session Initiation Protocol (SIP) phone, a Wireless Local Loop (WLL) station, a Personal Digital Assistant (PDA), a handheld device with wireless communication functions, a computing device or another processing device connected to a wireless modem, a vehicle-mounted device, a wearable device, a terminal device in a 5G network, or a terminal device in an evolved Public Land Mobile Network (PLMN), etc.; the embodiments of the present application are not limited thereto.
下面将对本申请涉及的关键术语进行介绍。
需要说明的是,在关键术语的介绍中,分布式用户可以为终端设备,中心服务器可以为基站。
一.本地模型:
本地模型是各个分布式用户根据本地数据训练得到的模型。
二.全局模型:
全局模型是中心服务器整合多个本地模型后得到的用于下发的模型。
三.模型聚合:
模型聚合是指在联邦更新轮次,中心服务器将收到的分布式用户上传的模型参数进行整合、得到全局模型参数的过程。
四.梯度的模:
梯度的模是模型训练时反向传播得到的参数梯度的范数。
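作为"梯度的模"的一个简单数值示例,以下给出一个以 numpy 实现的示意性草图:对单个样本的逻辑回归损失求参数梯度并取其范数。其中的模型、样本均为示例性假设,并非本申请限定的实现。

```python
# "梯度的模"的最小化数值示例(示例性假设):对单个样本的逻辑回归损失
# 求参数梯度(此处直接写出解析梯度,等价于一次反向传播),再取其范数。
import numpy as np

theta = np.array([0.2, -0.1, 0.4])        # 模型参数
x, y = np.array([1.0, 2.0, -1.0]), 1      # 一个训练样本 z = (x, y)

p = 1.0 / (1.0 + np.exp(-(x @ theta)))    # 前向:预测概率
grad = (p - y) * x                        # 参数梯度
print(np.linalg.norm(grad))               # 梯度的模:参数梯度的范数
```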
五、本地模型的梯度利普希茨系数:
对于本地模型θ_k、用全部数据进行集中式训练得到的模型θ_c,以及用户k上的i类的任意训练样本z_{k,i},使得‖∇l(z_{k,i};θ_k)−∇l(z_{k,i};θ_c)‖≤L_{k,i}·‖θ_k−θ_c‖成立的最小系数L_{k,i},其中l(·)为深度学习任务的损失函数。
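按照上述定义,给定本地模型与集中式训练模型这两组固定参数,L_{k,i}即为两者在同一样本上的梯度差范数与参数差范数之比。以下给出一个最小化的计算草图,其中以 numpy 实现的逻辑回归损失代替实际深度学习模型,仅为示例性假设。

```python
# 估计单个样本的梯度利普希茨系数的示意草图(示例性假设:逻辑回归损失)。
import numpy as np

def grad_logistic_loss(theta, x, y):
    """单样本逻辑回归损失 l(z; theta) 的参数梯度,z = (x, y),y ∈ {0, 1}。"""
    p = 1.0 / (1.0 + np.exp(-(x @ theta)))
    return (p - y) * x

def lipschitz_coeff(theta_local, theta_central, x, y, eps=1e-12):
    # 按前文定义:||∇l(z;θ_k) − ∇l(z;θ_c)|| / ||θ_k − θ_c||
    grad_diff = np.linalg.norm(grad_logistic_loss(theta_local, x, y)
                               - grad_logistic_loss(theta_central, x, y))
    param_diff = np.linalg.norm(theta_local - theta_central)
    return grad_diff / (param_diff + eps)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta_k, theta_c = rng.normal(size=4), rng.normal(size=4)
    x, y = rng.normal(size=4), 1
    print(lipschitz_coeff(theta_k, theta_c, x, y))
```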
下面将对本申请涉及的联邦学习模式进行介绍。
联邦学习作为一种新兴的分布式学习框架,以其简明的模型训练范式和参数传递规则,被普遍认为是一类高通信效率的分布式系统。
同时,联邦学习模式是通信-计算深度耦合的模式:一方面通信网络服务于分布式计算,提供信息传递的基础功能;另一方面,智能算法也可以为通信网络赋能,实现系统性能的优化。
因此,以联邦学习为代表的分布式网算系统也被认为是未来通信网络的一个核心功能。传统的联邦学习基本框架如图1所示,可以总结为本地分布式训练和中心周期性联邦整合两个基本步骤:
(1)分布式用户根据本地的数据训练本地模型;
(2)在联邦更新轮次,分布式用户将本地模型参数上传至中心服务器,中心服务器进行模型聚合后将新的全局模型再传回各个分布式用户,用于之后的本地训练。
图1中所示为一个区域联邦的结构,其中的中心服务器可以部署在区域基站上,为该区域内的分布式用户终端设备提供联邦更新、系统整体调度等中心化服务。
由于常见的联邦学习算法中的模型聚合都是参数层面的整合,本地模型的训练效果对全局模型的性能表现和收敛速度都有重要的影响。而以神经网络为代表的深度学习模型作为数据驱动的算法,其训练数据很大程度上决定了最终训练效果。因此,在联邦学习场景中,本地数据间接影响全局模型,进而影响系统的通信和计算效率。非独立同分布的本地用户数据被认为是联邦学习所面临的关键挑战,下面将介绍几种对抗本地数据的非独立同分布特性的方法。
一.数据增强。
从数据增强的角度对抗本地数据的非独立同分布特性。例如,可以由中心服务器将一个较小的增强数据集分发给各分布式用户共享,以此来对抗本地数据的非独立同分布特性。
但是,数据增强的解决方案需要进行额外的数据传输,增加了通信代价,同时也可能存在数据隐私等限制。
二.模型层面。
从模型层面对抗本地数据的非独立同分布特性。例如,模型蒸馏,可以修改分布式用户本地模型训练的损失函数,使得本地模型与全局模型的输出相似,降低本地模型的差异度。
但是,模型蒸馏的解决方案需要分布式用户在本地额外计算目标模型输出,增加了分布式用户本地端的运算量。
三.对分布式用户或数据的选择。
在联邦学习框架下加入对分布式用户或者数据的选择模块。
在分布式用户方面,通过选择模型表现较好的分布式用户参与模型聚合,可以减少较差的本地模型给全局模型带来的偏差。例如,基站可以通过局部的数据整合,在一定程度上缓解非独立同分布带来的问题。
在数据方面,各分布式用户在训练时可以选择质量较好的本地数据,有助于提高本地模型的表现。
四.对数据样本的重采样。
从分布式用户本地训练样本采样角度出发的技术有基于本地训练样本梯度动态采样的联邦学习方法,该方法通过定义给出一种样本重要性权重,在第t个训练周期,对于第k个用户和其本地数据中的训练样本z_{k,i},其重要性权重的计算方法如下:
[公式 PCTCN2022109277-appb-000014]
其中[公式 PCTCN2022109277-appb-000015]是该次反向传播中神经网络模型第L层的输入和输出,l(z_{k,i};θ_t)为该训练周期神经网络模型θ_t对样本z_{k,i}的损失函数,如分类问题中的交叉熵或回归问题中的均方误差。
该方法在每个分布式用户本地训练开始时进行操作,每轮联邦更新后,每个参与的分布式用户用下载的全局模型根据上述公式计算本地数据中各样本的重要性权重,在之后的本地训练周期中,按照重要性权重对数据样本进行采样,用于训练更新本地模型。
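为说明这类按样本重要性重采样的流程,以下给出一个最小化的示意草图。由于上式的具体定义在原文中以图像形式给出,此处仅以"单样本梯度范数归一化后作为采样概率"来示意该类方法,模型与权重定义均为示例性假设,并非原方法的准确复现。

```python
# 基于样本梯度动态采样的示意草图(示例性假设:以单样本梯度范数作为重要性的代理量)。
import numpy as np

def grad_norm(theta, x, y):
    # 单样本逻辑回归损失的参数梯度范数
    p = 1.0 / (1.0 + np.exp(-(x @ theta)))
    return np.linalg.norm((p - y) * x)

def resample_by_importance(theta, xs, ys, batch_size, rng):
    norms = np.array([grad_norm(theta, x, y) for x, y in zip(xs, ys)])
    probs = norms / norms.sum()                      # 归一化得到每个样本的采样概率
    return rng.choice(len(xs), size=batch_size, replace=True, p=probs)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    xs = rng.normal(size=(32, 4))
    ys = rng.integers(0, 2, size=32)
    theta = rng.normal(size=4)
    print(resample_by_importance(theta, xs, ys, batch_size=8, rng=rng))
```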
但是,该基于本地训练样本梯度动态采样的联邦学习方法存在较多不足之处。
第一,该方法中的重要性权重是从直观的角度定义的,在根据该重要性权重更新后的全局模型下,反向传播中梯度的模较大的样本被赋予了更高的重要性权重,而没有考虑重采样后全局模型的性能表现和收敛速度。虽然该方法在实际实验中有一定的性能提升,但是缺乏理论层面的支持。
第二,该方法中对重要性权重的计算依赖于每一组分布式用户的本地样本数据,需要每个分布式用户下载更新后的全局模型后逐样本进行反向传播计算,而在一些实际的联邦学习系统中,尤其是轻量级的移动场景中,分布式用户的计算资源有限,在分布式用户端进行计算需要耗费较多的运算时间,进而降低系统的整体效率。
第三,该方法所对应的联邦学习架构中,缺乏分布式用户与中心服务器之间的交互,没能将全局信息或中心服务器所能提供的算力支持充分利用。
图2是本申请实施例提供的系统架构的示意图。如图2所示为多用户协作的联邦学习系统,有一个中心服务器(例如,区域基站)和N(N>1)个分布式用户参与训练,训练数据按照标签划分为M(M>1)个类别,每个分布式用户的本地数据的分布为{p_{N,n}},全体数据的分布为{p_n}。
如图2所示的联邦学习系统,主要作用在三个阶段,第一,在联邦更新轮次内,中心服务器除了将收集到的本地模型进行聚合,还根据各分布式用户本地模型和本次联邦轮次得到的新的全局模型辅助估计各类别样本的梯度利普希茨系数,并计算重要性权重;第二,分布式用户下载新的全局模型的同时接收新的重要性权重;第三,分布式用户在进行本地模型的训练时根据重要性权重采样训练数据样本。
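为直观展示上述三个阶段在一个联邦更新轮次内的先后关系,以下给出一个高度简化的流程草图。其中 aggregate、estimate_lipschitz_per_class、compute_weights、local_train 的具体实现均为示例性占位(真实公式以前后文及原文附图为准),此处只用于说明基站与终端之间的交互顺序。

```python
# 图2所示系统在联邦更新轮次内三个阶段的最小化流程草图(各函数均为示例性占位实现)。
import numpy as np

def aggregate(local_models):
    # 阶段一(聚合):此处以简单参数平均示意,实际以预设的联邦更新规则为准
    return np.mean(local_models, axis=0)

def estimate_lipschitz_per_class(local_model, global_model, num_classes):
    # 占位估计:以本地模型与全局模型的参数差异粗略代表各类别的梯度利普希茨系数
    return np.abs(local_model - global_model).mean() * np.ones(num_classes)

def compute_weights(lipschitz, p_global, p_local):
    # 占位组合:由系数与全局/本地分布得到各类别重要性权重并归一化
    w = p_global / np.maximum(p_local, 1e-12) + lipschitz
    return w / w.sum()

def local_train(global_model, class_weights, rng):
    # 阶段二、三:终端下载全局模型与重要性权重,按权重采样本地数据训练;
    # 采样训练的具体草图见后文 S630 处,此处仅以随机扰动代替训练后的本地模型
    return global_model + 0.1 * rng.normal(size=global_model.shape)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    num_clients, num_classes = 3, 4
    p_local = rng.dirichlet(np.ones(num_classes), size=num_clients)  # 各终端的本地类别分布
    p_global = p_local.mean(axis=0)                                  # 全局数据的分布(示意)
    models = [rng.normal(size=5) for _ in range(num_clients)]
    for _ in range(2):                                               # 两个联邦更新轮次
        global_model = aggregate(models)
        weights = [compute_weights(
            estimate_lipschitz_per_class(m, global_model, num_classes),
            p_global, p_local[n]) for n, m in enumerate(models)]
        models = [local_train(global_model, weights[n], rng) for n in range(num_clients)]
    print(global_model)
```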
图3所示是本申请实施例提供的通信方法的一例。该方法中,基站可以对应中心服务器,终端设备可以对应某一个分布式用户。如图3所示,该方法包括:
S310,基站确定要计算第一重要性权重。
具体的,该第一重要性权重为终端设备对应的多个类别的数据样本中的每个类别的数据样本的重要性权重。
需要说明的是,基站可以计算该终端设备的第一重要性权重,也可以计算与该基站相关的联邦学习系统中其他终端设备的第一重要性权重。
可选的,基站可以计算与该基站相关的联邦学习系统中所有终端设备的第一重要性权重。
可选的,基站可以计算与该基站相关的联邦学习系统中部分终端设备的第一重要性权重。例如,基站可以自己计算部分终端设备的第一重要性权重,基站可以指示其他终端设备计算自身的第一重要性权重。
可选的,基站可以根据基站的全局数据的分布、系统调度、通信链路质量、基站的算力、终端设备的算力等因素中的一个或多个,确定是否由基站计算终端设备的第一重要性权重。
其中,全局数据的分布为进行联邦训练时与基站相关的多个终端设备对应的本地数据的分布。换一种说法,基站收集进行联邦训练时与基站相关的所有终端设备对应的本地数据的分布,进行整合,得到全局数据的分布。
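以下给出基站整合各终端设备上报的本地数据分布、得到全局数据分布的一个最小化示意草图。其中假设每个终端设备上报其各类别的本地分布与样本总数,整合方式采用按样本量加权求和后归一化,该方式仅为示例性假设。

```python
# 基站由各终端的本地类别分布整合得到全局数据分布的示意草图(示例性假设)。
import numpy as np

def aggregate_distribution(local_dists, sample_counts):
    """local_dists: 形状为 (N, M) 的各终端本地类别分布;sample_counts: 各终端本地样本数。"""
    local_dists = np.asarray(local_dists, dtype=float)
    counts = np.asarray(sample_counts, dtype=float)
    weighted = (local_dists * counts[:, None]).sum(axis=0)  # 按样本量加权得到各类别样本总量
    return weighted / weighted.sum()                         # 归一化为全局类别分布

if __name__ == "__main__":
    p_local = [[0.8, 0.1, 0.1], [0.2, 0.5, 0.3]]             # N=2 个终端,M=3 个类别
    print(aggregate_distribution(p_local, [200, 800]))
```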
S320,基站向终端设备发送第一指示信息,该第一指示信息用于指示终端设备发送第一参数,该第一参数用于基站计算第一重要性权重。
对应的,终端设备接收第一指示信息。
可选的,该第一参数包括终端设备的本地数据的分布。
可选的,该第一参数还包括终端设备的本地数据对应的梯度。该梯度为终端设备根据本地数据计算得出的。
S330,终端设备根据第一指示信息向基站发送第一参数。
对应的,基站接收终端设备发送的第一参数。
S340,基站根据第一参数计算第一重要性权重。
可选的,基站根据进行联邦训练时与基站关联的多个终端设备的本地数据的分布确定全局数据的分布。
可选的,基站根据第一参数和全局数据的分布计算第一重要性权重。
图4为基站计算第一重要性权重的方法。如图4所示,该方法包括:
S410,基站根据终端设备的本地数据的分布计算终端设备中多个类别的数据样本中每个类别的数据样本的梯度利普希茨系数。
S420,基站根据多个类别数据样本中的每个类别的数据样本的梯度利普希茨系数计算多个类别数据样本中的每个类别的数据样本的重要性权重。
需要说明的是,在图5所示的通信方法中,终端设备计算第二重要性权重的步骤与图4示出的步骤相类似,只需将基站替换为终端设备即可。下面的描述中将不再赘述。
在一种实施例中,该第一重要性权重满足以下条件:
[公式 PCTCN2022109277-appb-000016]
其中,所述p_{N,n}表示全局数据的分布,所述全局数据的分布为进行联邦训练时与基站关联的多个终端设备对应的本地数据的分布,所述p_n为所述终端设备对应的本地数据的分布,所述N表示所述多个终端设备的个数,所述n表示所述终端设备为第n个终端设备,所述α_{N,n}为每个类别的梯度利普希茨偏差,Γ_N为[公式 PCTCN2022109277-appb-000017]中的最小非负值。
其中,[公式 PCTCN2022109277-appb-000018]为每个类别的梯度利普希茨偏差,所述[公式 PCTCN2022109277-appb-000019][公式 PCTCN2022109277-appb-000020],所述[公式 PCTCN2022109277-appb-000021]表示全局模型,所述[公式 PCTCN2022109277-appb-000022]表示本地模型。
S350,基站向终端设备发送第一重要性权重。
对应的,终端设备接收基站发送的第一重要性权重。
S360,终端设备根据第一重要性权重训练下一次的本地模型。
本申请实施例提供的通信方法,通过引入终端设备各类别的样本数据的重要性权重采样,以及基站与终端设备的交互机制,解决了参与联邦训练的各终端设备本地数据非独立同分布带来的模型精度较差、收敛较慢等问题。基于相关理论推导得到的重要性权重不仅考虑了不同类别样本的分布不均衡,还将影响模型收敛性的梯度利普希茨系数考虑在内,使得不同类别样本的模式特征被本地模型学习的同时也保证了本地模型间的偏离可控。此外,重要性权重的更新部署于基站,能够充分利用基站的算力优势,解放终端设备的算力。
图5所示是本申请实施例提供的通信方法的另一例。如图5所示,该方法包括:
S510,基站向终端设备发送第二指示信息,该第二指示信息指示终端设备发送第二参数,该第二参数用于计算第二重要性权重。
对应的,终端设备接收基站发送的第二指示信息。
具体的,第二重要性权重为终端设备对应的多个类别的数据样本中的每个类别的数据样本的重要性权重。
可选的,该第二参数包括终端设备的本地数据的分布。
可选的,该第二参数还包括终端设备的本地数据对应的梯度。该梯度为终端设备根据本地数据计算得出的。
S520,终端设备根据第二指示信息向基站发送第二参数。
对应的,基站接收终端设备发送的第二参数。
可选的,基站根据第二参数确定全局数据的分布,该全局数据的分布可以参考图3对应的方法中全局数据的分布的描述,在此不再赘述。
S530,基站确定由终端设备计算第二重要性权重。
S540,基站向终端设备发送第三指示信息,该第三指示信息指示由终端设备计算第二重要性权重。
对应的,终端设备接收基站发送的第三指示信息。
S550,终端设备根据第二参数计算第二重要性权重。
可选的,该方法还可以包括:终端设备接收基站发送的全局数据的分布,根据第二参数和全局数据的分布计算第二重要性权重。
在某些实施例中,第二重要性权重满足以下条件:
[公式 PCTCN2022109277-appb-000023]
其中,所述p_{N,n}表示全局数据的分布,所述全局数据的分布为进行联邦训练时与基站关联的多个终端设备对应的本地数据的分布,所述p_n为所述终端设备对应的本地数据的分布,所述N表示所述多个终端设备的个数,所述n表示所述终端设备为第n个终端设备,所述α_{N,n}为每个类别的梯度利普希茨偏差,Γ_N为[公式 PCTCN2022109277-appb-000024]中的最小非负值。
S560,终端设备根据第二重要性权重训练下一次的本地模型。
本申请实施例提供的通信方法,对算力有剩余或者对于联邦更新时效性不强的终端设备,其可以在本地完成重要性权重的更新,减轻了系统调度的压力,进一步提高整体效率。
图6所示是本申请实施例提供的通信方法的另一例。如图6所示,该方法包括:
S610,系统进行初始化。
具体的,该初始化包括系统参数的预设,终端设备的本地模型的初始化,终端设备对应的各类别重要性权重的初始化,训练超参数的配置。
S620,基站为当前全局周期进行端云交互预设。
可选的,该中心服务器可以为基站。
S630,多个终端设备中的各终端设备分别对本地训练数据按照重要性权重采样预设数量的样本用于更新本地模型。
例如,第n个终端设备基于重要性权重对本地模型进行训练的流程可以为:
步骤6-1,第n个终端设备按照重要性权重对每类数据样本进行采样,形成用于训练本地模型的数据批次batch;
步骤6-2,第n个终端设备依次对采样得到的数据batch进行反向传播,更新本地模型;
步骤6-3,第n个终端设备检查是否完成本地训练轮次,若完成则结束,若未完成则转至步骤6-1,继续该训练的流程(该流程的一个示意性代码草图如下)。
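以下为步骤6-1至6-3的一个最小化示意草图:第n个终端设备按各类别重要性权重采样数据批次并更新本地模型。其中模型与损失以 numpy 逻辑回归代替,重要性权重视为已由基站下发,均为示例性假设。

```python
# 按类别重要性权重采样并训练本地模型的示意草图(示例性假设:二分类逻辑回归)。
import numpy as np

def sample_batch(xs, ys, class_weights, batch_size, rng):
    # 步骤6-1:同类样本共享其类别权重,归一化后作为采样概率
    probs = class_weights[ys]
    probs = probs / probs.sum()
    idx = rng.choice(len(xs), size=batch_size, replace=True, p=probs)
    return xs[idx], ys[idx]

def local_train(theta, xs, ys, class_weights, epochs, batch_size, lr, rng):
    for _ in range(epochs):                          # 步骤6-3:循环直至完成本地训练轮次
        bx, by = sample_batch(xs, ys, class_weights, batch_size, rng)
        p = 1.0 / (1.0 + np.exp(-(bx @ theta)))
        grad = bx.T @ (p - by) / len(by)             # 步骤6-2:对采样批次求梯度并更新本地模型
        theta = theta - lr * grad
    return theta

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    xs = rng.normal(size=(64, 4))
    ys = rng.integers(0, 2, size=64)
    w = np.array([0.7, 0.3])                         # 两个类别的重要性权重(示例)
    print(local_train(np.zeros(4), xs, ys, w, epochs=5, batch_size=16, lr=0.1, rng=rng))
```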
S640,多个终端设备中的各终端设备分别完成本地训练轮次后,将本次的本地模型上传到基站。
S650,基站根据各终端设备的本地模型,按照预设的联邦更新规则进行模型聚合,得到本次的全局模型。
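以下给出 S650 中"预设的联邦更新规则"的一个最小化示意草图。此处假设该规则为按本地样本量加权的参数平均(FedAvg 风格),这只是示例性假设,并非本申请限定的聚合方式。

```python
# 按样本量加权的模型聚合示意草图(FedAvg 风格,示例性假设)。
import numpy as np

def aggregate(local_models, sample_counts):
    """local_models: 各终端设备上传的本地模型参数字典列表;sample_counts: 各终端的本地样本数。"""
    total = float(sum(sample_counts))
    global_model = {}
    for name in local_models[0]:
        # 对同名参数按样本量加权求和,得到本次的全局模型参数
        global_model[name] = sum((cnt / total) * m[name]
                                 for m, cnt in zip(local_models, sample_counts))
    return global_model

if __name__ == "__main__":
    m1 = {"w": np.array([1.0, 2.0]), "b": np.array([0.5])}
    m2 = {"w": np.array([3.0, 4.0]), "b": np.array([1.5])}
    print(aggregate([m1, m2], [100, 300]))   # 预期 w ≈ [2.5, 3.5], b ≈ [1.25]
```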
S660,基站根据各终端设备本次的本地模型和本次的全局模型,在一个预设的数据集上,估计各终端设备的各个类别的数据样本在对应的终端设备本次本地模型上的梯度利普希茨系数。
其中,预设的数据集可以为预先设定的与各终端设备无关的数据集,也可以为在各终端设备的数据中截取的一段数据集。
例如,基站可以估计第n个终端设备的多个类别的数据样本中的每个类别的数据样本在第n个终端设备本次本地模型上的梯度利普希茨系数。
S670,基站根据梯度利普希茨系数计算重要性权重。
或者说,基站根据梯度利普希茨系数更新本次重要性权重。
例如,如图7所示为基站计算第n个终端设备的重要性权重的流程的示意图。如图7所示,该流程可以为:
步骤710,基站初始化候选参数集为空,将类别标号置0;
步骤720,基站按照公式计算当前类别的梯度利普希茨系数偏差[公式 PCTCN2022109277-appb-000025],若偏差[公式 PCTCN2022109277-appb-000026]小于0,则将[公式 PCTCN2022109277-appb-000027]加入候选参数集,并将类别标号加一,否则直接将类别标号加一;
步骤730,基站判断是否遍历所有类别,若是则进行步骤740,若否则跳转至步骤720;
步骤740,基站选择参数候选集中的最小非负值为最优参数Γ_N,进而得到第n个终端设备中各类别数据样本的重要性权重[公式 PCTCN2022109277-appb-000028](该流程的一个示意性代码草图如下)。
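以下为图7所示流程的一个最小化示意草图:遍历各类别计算偏差、构造候选参数集,取其中的最小非负值作为最优参数Γ_N,再据此更新各类别的重要性权重。其中 deviation() 与 weight() 对应的具体公式在原文中以图像形式给出,此处的实现只是占位的示例性假设,用于说明控制流程。

```python
# Γ_N 的候选集选择与各类别重要性权重更新的示意草图(deviation/weight 为占位实现)。
import numpy as np

def deviation(cls, lipschitz, p_global, p_local):
    # 示例性占位:以该类别利普希茨系数与本地/全局分布之差的乘积代表"偏差"
    return lipschitz[cls] * (p_local[cls] - p_global[cls])

def weight(cls, gamma, lipschitz, p_global, p_local):
    # 示例性占位:由最优参数 gamma、利普希茨系数与分布信息组合出该类别的权重
    w = p_global[cls] / max(p_local[cls], 1e-12) + gamma * lipschitz[cls]
    return max(w, 0.0)

def update_weights(lipschitz, p_global, p_local):
    candidates = []                                   # 步骤710:初始化候选参数集为空
    for cls in range(len(p_global)):                  # 步骤720~730:遍历所有类别
        if deviation(cls, lipschitz, p_global, p_local) < 0:
            candidates.append(lipschitz[cls])         # 偏差小于0时将候选参数加入集合(占位)
    nonneg = [c for c in candidates if c >= 0]
    gamma = min(nonneg) if nonneg else 0.0            # 步骤740:取候选集中的最小非负值为 Γ_N
    w = np.array([weight(c, gamma, lipschitz, p_global, p_local)
                  for c in range(len(p_global))])
    return w / w.sum()                                # 归一化后作为各类别的重要性权重

if __name__ == "__main__":
    print(update_weights(np.array([1.0, 2.0, 0.5]),
                         np.array([0.3, 0.3, 0.4]),
                         np.array([0.6, 0.1, 0.3])))
```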
S680,基站将本次全局模型和对应的重要性权重发送给多个终端设备中的每个终端设备。
S690,基站检查是否完成全局训练轮次,若完成则结束模型训练,若未完成则跳转至S620进行循环。
本申请实施例提供的方法,通过引入终端设备各类别的样本数据的重要性权重采样,以及基站与终端设备的交互机制,解决了各终端设备本地数据非独立同分布带来的模型精度较差、收敛较慢等问题。基于相关理论推导得到的重要性权重不仅考虑了不同类别样本的分布不均衡,还将影响模型收敛性的梯度利普希茨系数考虑在内,使得不同类别样本的模式特征被本地模型学习的同时也保证了本地模型间的偏离可控。此外,重要性权重的更新部署于基站,能够充分利用基站的算力优势,解放终端设备的算力。
图8所示是本申请实施例提供的通信方法的另一例。如图8所示,该方法包括:
S810,系统进行初始化。
具体的,该初始化包括系统参数的预设,终端设备的本地模型的初始化,终端设备对应的各类别重要性权重的初始化,训练超参数的配置。
S820,基站为当前全局周期进行端云交互预设。
S830,多个终端设备中的各终端设备分别对本地训练数据按照重要性权重采样预设数量的样本用于更新本地模型。
例如,第n个终端设备基于重要性权重对本地模型进行训练的流程可以参照步骤S630中的描述,在此不再赘述。
S840,多个终端设备中的各终端设备分别完成本地训练轮次后,将本次的本地模型上传到基站。
S850,基站根据各终端设备的本地模型,按照预设的联邦更新规则进行模型聚合,得到本次的全局模型。
S860,基站向终端设备发送指示信息,指示终端设备进行重要性权重的计算。
需要说明的是,该步骤是可选的。
S870,终端设备从基站获取本次的全局模型和辅助参数。
S880,终端设备进行重要性权重的计算。
需要说明的是,重要性权重的计算可以参考S670中的描述,只需将基站替换为终端设备即可。
S890,基站检查是否完成全局训练轮次,若完成则结束模型训练,若未完成则跳转至S820进行循环。
本申请实施例提供的通信方法,对算力有剩余或者对于联邦更新时效性不强的分布式用户,其可以在本地完成重要性权重的更新,减轻了系统调度的压力,进一步提高整体效率。
图9为本申请实施例提供的通信方法的效果的示意图的一例。如图9所示为模型准确率随通信轮次变化的对比曲线,在所设置的非独立同分布数据分布下,经25轮联邦更新,全局模型在标准测试集上的分类准确率平均为64%,仅比利用全部非独立同分布数据进行集中式训练得到的模型低约6%。
图10是本申请实施例提供的通信方法的效果的示意图的另一例。如图10所示,针对数据采样效率,考虑本地数据部分参与训练的情况,即在每轮本地训练迭代中,仅有一定比例的数据量被采样用于更新本地模型。随着采样比例的降低,本申请实施例所设计的联邦学习算法的全局模型分类准确率下降较小,应对本地设备计算资源、运算时间受限或样本不足等情况的能力也明显强于随机采样方法。
图11是本申请实施例提供的通信装置的一例的示意图。如图11所示,该通信装置1100包括收发单元1110和处理单元1120。
在某些实施例中,该通信装置1100可以用于实现上述任意方法中涉及终端设备的功能。例如,该通信装置1100可以与终端设备相对应。
该通信装置1100可以为终端设备,并执行上述方法实施例中由终端设备执行的步骤。收发单元1110可以用于支持通信装置1100进行通信,例如执行上述方法实施例中由终端设备执行的发送和/或接收的动作,处理单元1120可以用于支持通信装置1100执行上述方法实施例中的处理动作,例如执行上述方法实施例中由终端设备执行的处理动作。
可选的,该通信装置还可以包括存储单元1130(图11中未示出),用于存储该通信装置的程序代码和数据。
在某些实施例中,该通信装置1100可以用于实现上述任意方法中涉及基站的功能。例如,该通信装置1100可以与基站相对应。
该通信装置1100可以为基站,并执行上述方法实施例中由基站执行的步骤。收发单元1110可以用于支持通信装置1100进行通信,例如执行上述方法实施例中由基站执行的发送和/或接收的动作,处理单元1120可以用于支持通信装置1100执行上述方法实施例中的处理动作,例如执行上述方法实施例中由基站执行的处理动作。
可选的,该通信装置还可以包括存储单元1130(图11中未示出),用于存储该通信装置的程序代码和数据。
图12是本申请实施例提供的通信装置的另一例的示意图。如图12所示,该装置1200包括:收发器1210、处理器1220和存储器1230。该存储器1230,用于存储指令。该处理器1220与存储器1230耦合,用于执行存储器中存储的指令,以执行上述本申请实施例提供的方法。
具体的,该装置1200中的收发器1210可以对应于装置1100中的收发单元1110,该通信装置1200中的处理器1220可以对应于通信装置1100中的处理单元1120。
应理解,上述存储器1230和处理器1220可以合成一个处理装置,处理器1220用于执行存储器1230中存储的程序代码来实现上述功能。具体实现时,该存储器1230也可以集成在处理器1220中,或者独立于处理器1220。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。

Claims (27)

  1. 一种模型训练的方法,其特征在于,包括:
    第一设备根据终端设备的本地数据的分布计算所述终端设备中多个类别的数据样本中的每个类别的数据样本的梯度利普希茨系数;
    所述第一设备根据所述多个类别数据样本中的每个类别的数据样本的梯度利普希茨系数计算所述多个类别数据样本中的每个类别的数据样本的重要性权重,所述多个类别的数据样本的重要性权重用于所述终端设备训练下一次的本地模型。
  2. 根据权利要求1所述的方法,其特征在于,所述重要性权重满足以下条件:
    [公式 PCTCN2022109277-appb-100001]
    其中,所述p_{N,n}表示全局数据的分布,所述全局数据的分布为进行联邦训练时与基站关联的多个终端设备对应的本地数据的分布,所述p_n为所述终端设备对应的本地数据的分布,所述N表示所述多个终端设备的个数,所述n表示所述终端设备为第n个终端设备,所述[公式 PCTCN2022109277-appb-100002]为每个类别的梯度利普希茨偏差,所述[公式 PCTCN2022109277-appb-100003],所述[公式 PCTCN2022109277-appb-100004]表示全局模型,所述[公式 PCTCN2022109277-appb-100005]表示本地模型,所述Γ_N为[公式 PCTCN2022109277-appb-100006]中的最小非负值。
  3. 根据权利要求1所述的方法,其特征在于,所述第一设备为基站。
  4. 根据权利要求1所述的方法,其特征在于,所述第一设备为所述终端设备。
  5. 一种通信方法,其特征在于,包括:
    基站确定要计算第一重要性权重,所述第一重要性权重为终端设备对应的多个类别的数据样本中的每个类别的数据样本的重要性权重;
    所述基站向所述终端设备发送第一指示信息,所述第一指示信息用于指示所述终端设备发送第一参数,所述第一参数用于所述基站计算所述第一重要性权重;
    所述基站接收所述终端设备发送的所述第一参数;
    所述基站根据所述第一参数计算所述第一重要性权重;
    所述基站向所述终端设备发送所述第一重要性权重。
  6. 根据权利要求5所述的方法,其特征在于,所述第一参数包括所述终端设备的本地数据的分布。
  7. 根据权利要求6所述的方法,其特征在于,所述第一参数还包括所述终端设备的本地数据对应的梯度。
  8. 根据权利要求5至7中任一项所述的方法,其特征在于,所述方法还包括:
    所述基站确定全局数据的分布,所述全局数据的分布为进行联邦训练时与所述基站关联的多个终端设备对应的本地数据的分布;
    所述基站根据所述第一参数计算所述第一重要性权重,包括:
    所述基站根据所述第一参数和所述全局数据的分布计算所述第一重要性权重。
  9. 根据权利要求5至8中任一项所述的方法,其特征在于,所述第一重要性权重满足以下条件:
    [公式 PCTCN2022109277-appb-100007]
    其中,所述p_{N,n}表示全局数据的分布,所述全局数据的分布为进行联邦训练时与所述基站关联的多个终端设备对应的本地数据的分布,所述p_n为所述终端设备对应的本地数据的分布,所述N表示所述多个终端设备的个数,所述n表示所述终端设备为第n个终端设备,所述α_{N,n}为每个类别的梯度利普希茨偏差,所述Γ_N为[公式 PCTCN2022109277-appb-100008]中的最小非负值。
  10. 一种通信方法,其特征在于,包括:
    终端设备接收基站发送的第一指示信息,所述第一指示信息用于指示所述终端设备发送第一参数,所述第一参数用于所述基站计算第一重要性权重;
    所述终端设备根据所述第一指示信息向所述基站发送所述第一参数;
    所述终端设备接收所述基站发送的所述第一重要性权重;
    所述终端设备根据所述第一重要性权重训练下一次的本地模型。
  11. 根据权利要求10所述的方法,其特征在于,所述第一参数包括所述终端设备的本地数据的分布。
  12. 根据权利要求11所述的方法,其特征在于,所述第一参数还包括所述终端设备的本地数据对应的梯度。
  13. 根据权利要求10至12中任一项所述的方法,其特征在于,所述第一重要性权重满足以下条件:
    [公式 PCTCN2022109277-appb-100009]
    其中,所述p_{N,n}表示全局数据的分布,所述全局数据的分布为进行联邦训练时与所述基站关联的多个终端设备对应的本地数据的分布,所述p_n为所述终端设备对应的本地数据的分布,所述N表示所述多个终端设备的个数,所述n表示所述终端设备为第n个终端设备,所述α_{N,n}为每个类别的梯度利普希茨偏差,所述Γ_N为[公式 PCTCN2022109277-appb-100010]中的最小非负值。
  14. 一种通信方法,其特征在于,包括:
    基站向终端设备发送第二指示信息,所述第二指示信息指示所述终端设备发送第二参数,所述第二参数用于计算第二重要性权重,所述第二重要性权重为所述终端设备对应的多个类别的数据样本中的每个类别的数据样本的重要性权重;
    所述基站接收所述终端设备发送的所述第二参数;
    所述基站确定由所述终端设备计算所述第二重要性权重;
    所述基站向所述终端设备发送第三指示信息,所述第三指示信息指示由所述终端设备计算所述第二重要性权重。
  15. 根据权利要求14所述的方法,其特征在于,所述第二参数包括所述终端设备的本地数据的分布。
  16. 根据权利要求15所述的方法,其特征在于,所述第二参数还包括所述终端设备的本地数据对应的梯度。
  17. 根据权利要求14至16中任一项所述的方法,其特征在于,所述方法还包括:
    所述基站确定全局数据的分布,所述全局数据的分布为进行联邦训练时与所述基站关联的多个终端设备对应的本地数据的分布;
    所述基站向所述终端设备发送所述全局数据的分布。
  18. 根据权利要求14至17中任一项所述的方法,其特征在于,所述第二重要性权重满足以下条件:
    [公式 PCTCN2022109277-appb-100011]
    其中,所述p_{N,n}表示全局数据的分布,所述全局数据的分布为进行联邦训练时与所述基站关联的多个终端设备对应的本地数据的分布,所述p_n为所述终端设备对应的本地数据的分布,所述N表示所述多个终端设备的个数,所述n表示所述终端设备为第n个终端设备,所述α_{N,n}为每个类别的梯度利普希茨偏差,所述Γ_N为[公式 PCTCN2022109277-appb-100012]中的最小非负值。
  19. 一种通信方法,其特征在于,包括:
    终端设备接收基站发送的第二指示信息,所述第二指示信息指示所述终端设备向所述基站发送第二参数,所述第二参数用于计算第二重要性权重,所述第二重要性权重为所述终端设备对应的多个类别的数据样本中的每个类别的数据样本的重要性权重;
    所述终端设备根据所述第二指示信息向所述基站发送第二参数;
    所述终端设备接收所述基站发送的第三指示信息,所述第三指示信息指示由所述终端设备计算所述第二重要性权重;
    所述终端设备根据所述第二参数计算所述第二重要性权重;
    所述终端设备根据所述第二重要性权重训练下一次的本地模型。
  20. 根据权利要求19所述的方法,其特征在于,所述第二参数包括所述终端设备的本地数据的分布。
  21. 根据权利要求20所述的方法,其特征在于,所述第二参数还包括所述终端设备的本地数据对应的梯度。
  22. 根据权利要求19至21任一项所述的方法,其特征在于,所述方法还包括:
    所述终端设备接收所述基站发送的全局数据的分布,所述全局数据的分布为进行联邦训练时与所述基站关联的多个终端设备对应的本地数据的分布;
    所述终端设备根据所述第二参数计算所述第二重要性权重,包括:
    所述终端设备根据所述第二参数和所述全局数据的分布计算所述第二重要性权重。
  23. 根据权利要求19至22中任一项所述的方法,其特征在于,所述第二重要性权重满足以下条件:
    [公式 PCTCN2022109277-appb-100013]
    其中,所述p_{N,n}表示全局数据的分布,所述全局数据的分布为进行联邦训练时与所述基站关联的多个终端设备对应的本地数据的分布,所述p_n为所述终端设备对应的本地数据的分布,所述N表示所述多个终端设备的个数,所述n表示所述终端设备为第n个终端设备,所述α_{N,n}为每个类别的梯度利普希茨偏差,所述Γ_N为[公式 PCTCN2022109277-appb-100014]中的最小非负值。
  24. 一种通信设备,其特征在于,包括:
    存储器,用于存储程序指令和数据;
    处理器,用于与所述存储器耦合,执行所述存储器中的指令,以实现如权利要求5至9中任一项所述的方法,或以实现如权利要求14至18中任一项所述的方法。
  25. 一种通信设备,其特征在于,包括:
    存储器,用于存储程序指令和数据;
    处理器,用于与所述存储器耦合,执行所述存储器中的指令,以实现如权利要求10至13中任一项所述的方法,或以实现如权利要求19至23中任一项所述的方法。
  26. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质存储有计算机指令,当所述计算机指令在计算机上运行时,使得所述计算机执行如权利要求5至13中任一项所述的方法,或执行如权利要求14至23中任一项所述的方法。
  27. 一种芯片,其特征在于,包括处理器和存储器,所述存储器用于存储计算机程序,所述处理器用于调用并运行所述存储器中存储的计算机程序,以执行如权利要求5至13中任一项所述的方法,或执行如权利要求14至23中任一项所述的方法。