WO2024017001A1 - Model training method and communication device - Google Patents

Model training method and communication device

Info

Publication number
WO2024017001A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
parameters
child node
node device
training
Prior art date
Application number
PCT/CN2023/104254
Other languages
English (en)
French (fr)
Inventor
陈宏智
刘礼福
孙琰
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2024017001A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network

Definitions

  • the present application relates to the field of communications, and in particular to a model training method and a communications device.
  • when each sub-node device downloads a unified model from the central node device for local training, the differing capabilities of the sub-node devices mean that overall training efficiency is limited by the sub-node device with the weakest capability, so model training efficiency is low.
  • This application provides a model training method and communication device to solve the problem of low efficiency of existing model training using federated learning.
  • this application provides a model training method, which is applied to the central node device.
  • the method includes:
  • Receive capability information from multiple child node devices (the child node devices participating in federated learning of the model), where the capability information characterizes each child node device's ability to train the model; send, to each of the multiple child node devices, first indication information indicating the first model parameters to be updated when that child node device trains the model (the first model parameters are determined according to the child node device's capability information), together with the model; receive the updated first model parameters from the multiple child node devices; fuse the global model parameters (comprising the updated first model parameters and the second model parameters, i.e., the parameters in the model other than the first model parameters) using a preset fusion algorithm to obtain fusion parameters; and send the fusion parameters to the multiple child node devices.
  • first, the multiple child node devices send capability information to the central node device; the central node device then sends first indication information and the model to each of the multiple child node devices based on the capability information; next, each child node device trains the model based on its training set and the model parameters to be updated to obtain the updated first model parameters; the central node device uses a preset fusion algorithm to fuse the global model parameters (which include the first model parameters) of the multiple child node devices to obtain the fusion parameters; and the multiple child node devices then train the model based on the fusion parameters. This allows a sub-node device to freeze some parameters during model training so that they do not participate in training, ensuring that the model training progress is not held back by sub-node devices with low computing power and making model training more efficient.
  • the method also includes:
  • the model training duration of the multiple child node devices is first predicted based on the capability information, and the first model parameters of each child node device are then determined based on the model training duration, so that the determined first model parameters to be trained match the corresponding child node devices, ensuring model training progress.
  • the first model parameter of the child node device is determined based on the model training duration, including:
  • when the model training duration does not meet the preset duration condition, some parameters of the model are selected and determined as the first model parameters;
  • when the model training duration meets the preset duration condition, all parameters of the model are selected and determined as the first model parameters.
  • selecting some parameters of the model and determining them as the first model parameters includes:
  • Some parameters of the model are randomly selected and determined as the first model parameters.
  • some parameters of the model are randomly selected and determined as the first model parameters, so that the determined first model parameters to be trained can match the corresponding child node devices, ensuring the progress of model training.
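  • As an illustrative sketch (not from the patent text), such random selection of a parameter subset could look like the following; the helper name and the fraction policy are assumptions:

```python
import random

def select_first_model_params(param_names, fraction):
    # Randomly pick a fraction of the model's parameters as the
    # "first model parameters" a weaker child node must update;
    # the remaining (second) parameters stay frozen.
    k = max(1, int(len(param_names) * fraction))
    return set(random.sample(param_names, k))

# e.g. a low-capability device trains only 40% of the parameters
to_update = select_first_model_params(
    ["layer1.w", "layer1.b", "layer2.w", "layer2.b", "out.w", "out.b"], 0.4)
```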
  • the model includes a first network layer (the model application scenario indicates the first network layer) and a second network layer.
  • the first network layer and the second network layer are different; selecting some parameters of the model and determining them as the first model parameters includes:
  • Parameters of the first network layer and some parameters of the second network layer are selected and determined as first model parameters.
  • the parameters of the first network layer and some parameters of the second network layer are selected in a targeted manner and determined as the first model parameters, taking into account the requirements of the model application scenario on the network layer and ensuring model training performance.
  • the method before selecting and determining the parameters of the preset first network layer in the first network layer as the first model parameters, the method further includes:
  • the first network layer required to build the model is determined based on the model application scenario.
  • determining the first network layer required to build the model based on the model application scenario takes into account the requirements of the model application scenario on the network layer, ensuring model training performance.
  • the capability information is also used to characterize the type of model that the child node device can train.
  • the method also includes:
  • the child node device whose capability information includes a model type of the model is determined as a target device for sending the first indication information and the model.
  • the child node device whose capability information includes the model type of the model is determined as the target device for sending the first indication information and the model, ensuring that all child node devices participating in the training can train the model and ensuring the model training progress.
  • the global model parameters of multiple sub-node devices are fused through a preset fusion algorithm to obtain the fusion parameters, including:
  • the weight of the second model parameter is set to a first preset weight and the weight of the first model parameter is set to a second preset weight, and weighted averaging is performed to obtain the fusion parameters;
  • or, for model parameters with the same function across different sub-node devices, the weight of the second model parameter and the weight of the first model parameter are both set to the second preset weight, and weighted averaging is performed to obtain the fusion parameters.
  • in one design, the preset fusion algorithm is such that the second model parameters cannot affect the fusion parameters; the fusion parameters are computed so that the un-updated second model parameters cannot affect model training, ensuring model training performance.
  • a preset fusion algorithm in which the second model parameters can affect the fusion parameters is also designed for computing the fusion parameters; the fusion parameters so determined can still participate in model training, ensuring the progress of model training.
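  • A minimal sketch of such a preset fusion algorithm, assuming parameters are stacked as NumPy arrays (the function and weight names are illustrative; taking the first preset weight as 0 is an assumption that realizes the first design, in which un-updated second model parameters drop out of the average):

```python
import numpy as np

def fuse_parameters(params, updated, w_frozen=0.0, w_updated=1.0):
    # params:  (K, G) array, row k = the G model parameters reported
    #          by / held for child node k
    # updated: (K, G) boolean array, True where node k updated the
    #          parameter (i.e. it was a "first model parameter")
    w = np.where(updated, w_updated, w_frozen)   # per-node weights
    denom = w.sum(axis=0)
    denom = np.where(denom == 0, 1.0, denom)     # guard: all-frozen column
    return (w * params).sum(axis=0) / denom      # (G,) fused parameters
```

  • With w_frozen = 0 the frozen copies are excluded entirely; setting both weights equal reduces to plain averaging, matching the second design in which the second model parameters still influence the fusion parameters.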
  • this application provides a method for selecting beam information, which is applied to sub-node devices and includes:
  • receiving beam information, inputting the beam information into the model (trained according to the method of the first aspect), and outputting the target beam information.
  • the target beam is selected through the model, and the efficiency of selecting the target beam is high.
  • this application provides a model training method, which is applied to sub-node devices, including:
  • send capability information characterizing the child node device's ability to train the model to the central node device; receive, from the central node device, first indication information indicating the first model parameters of the child node device to be updated when training the model, together with the model, where the first model parameters to be updated are determined based on the capability information of the child node device; train the model based on a training set including the beam information collected by the child node device and the model parameters to be updated to obtain the updated first model parameters; send the updated first model parameters to the central node device; and receive the fusion parameters from the central node device.
  • this application provides a communication device, which may be a chip or system-on-chip of a central node device, including:
  • a receiving module, configured to receive capability information from multiple child node devices (the child node devices participating in federated learning of the model), the capability information characterizing each child node device's model training capability; a sending module, configured to send, to each of the multiple child node devices, first indication information indicating the first model parameters that the child node device is to update when training the model, together with the model, where the first model parameters to be updated are determined based on the capability information of the child node device; the receiving module is further configured to receive updated first model parameters from the multiple child node devices; a processing module, configured to fuse the global model parameters of the multiple child node devices through a preset fusion algorithm to obtain fusion parameters, the global model parameters including the updated first model parameters and the second model parameters in the model other than the first model parameters; and the sending module is further configured to send the fusion parameters to the multiple child node devices.
  • the processing module is specifically used for:
  • when the model training duration does not meet the preset duration condition, select some parameters of the model and determine them as the first model parameters; when the model training duration meets the preset duration condition, select all parameters of the model and determine them as the first model parameters.
  • the processing module is specifically used for:
  • Some parameters of the model are randomly selected and determined as the first model parameters.
  • the model includes a first network layer and a second network layer; the model application scenario indicates the first network layer, and the first network layer and the second network layer are different; the processing module is specifically configured to select the parameters of the first network layer and some parameters of the second network layer and determine them as the first model parameters.
  • processing module is also used to:
  • determine, according to the model application scenario, the first network layer required to build the model.
  • the capability information is also used to characterize the model type that the sub-node device can train, and the processing module is also used to:
  • the child node device whose capability information includes a model type of the model is determined as a target device for sending the first indication information and the model.
  • the processing module is specifically used for:
  • set the weight of the second model parameter to a first preset weight and the weight of the first model parameter to a second preset weight, and perform weighted averaging to obtain the fusion parameters; or, for model parameters with the same function across different sub-node devices, set both the weight of the second model parameter and the weight of the first model parameter to the second preset weight, and perform weighted averaging to obtain the fusion parameters.
  • this application provides a communication device, which may be a chip or system-on-chip of a sub-node device, including:
  • the receiving module is used to receive the beam information; the processing module is used to input the beam information into the model and output the target beam information; the model is obtained according to the training of the communication device of the fourth aspect.
  • this application provides a communication device, which may be a chip or system-on-chip of a sub-node device, including:
  • a sending module, configured to send capability information characterizing the child node device's model training capability to the central node device; a receiving module, configured to receive, from the central node device, first indication information indicating the first model parameters of the child node device to be updated when training the model, together with the model, where the first model parameters to be updated are determined based on the capability information of the child node device; a processing module, configured to train the model based on a training set (including the beam information collected by the child node device) and the model parameters to be updated, obtaining the updated first model parameters; the sending module is further configured to send the updated first model parameters to the central node device; and the receiving module is further configured to receive the fusion parameters from the central node device.
  • the present application provides a communication device.
  • the communication device includes a processor and a transceiver.
  • the processor and the transceiver are used to support the communication device in executing the method of the first aspect, the second aspect, or the third aspect.
  • the communication device may further include a memory storing computer instructions; when the processor runs the computer instructions, the method of the first aspect, the second aspect, or the third aspect is executed.
  • the present application provides a computer-readable storage medium.
  • the computer-readable storage medium stores computer instructions. When the computer instructions are executed, the method of the first aspect, the second aspect, or the third aspect is executed.
  • the present application provides a computer program product containing instructions that, when run on a computer, enable the computer to execute the method of the first aspect, the second aspect, or the third aspect.
  • the present application provides a chip, which includes a processor and a transceiver.
  • the processor and the transceiver are used to support a communication device to perform the method of the first aspect, the second aspect, or the third aspect.
  • Figure 1 is a schematic diagram of a neuron structure provided by an embodiment of the present application.
  • Figure 2 is a schematic structural diagram of a DNN neural network provided by an embodiment of the present application.
  • Figure 3 is a schematic diagram of the system architecture of a classic federated learning provided by the embodiment of this application;
  • Figure 4-a is a schematic diagram of a federated learning process provided by the embodiment of this application.
  • Figure 4-b is a schematic diagram of another federated learning process provided by the embodiment of this application.
  • Figure 5 is a schematic diagram of another federated learning process provided by an embodiment of this application.
  • Figure 6 is a schematic diagram of a differentiated network parameter update principle provided by an embodiment of the present application.
  • Figure 7 is a schematic diagram of a communication system provided by an embodiment of the present application.
  • Figure 8 is a schematic flow chart of a model training method provided by an embodiment of the present application.
  • Figure 9 is a schematic diagram of a model network structure provided by an embodiment of the present application.
  • Figure 10 is a schematic flowchart of another model training method provided by an embodiment of the present application.
  • Figure 11 is a structural diagram of a communication device provided by an embodiment of the present application.
  • Figure 12 is a structural diagram of another communication device provided by an embodiment of the present application.
  • Figure 13 is a structural diagram of another communication device provided by an embodiment of the present application.
  • Figure 14 is a structural diagram of a communication system provided by an embodiment of the present application.
  • “at least one of the following” or similar expressions refers to any combination of these items, including any combination of a single item or a plurality of items.
  • for example, at least one of a, b, or c can mean: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c can each be single or multiple.
  • words such as “first” and “second” are used to distinguish identical or similar items having essentially the same functions and effects; they do not limit the number or the execution order.
  • words such as “exemplary” or “for example” are used to represent examples, illustrations or explanations. Any embodiment or design described as “exemplary” or “such as” in the embodiments of the application is not to be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as “exemplary” or “such as” is intended to present related concepts in a concrete manner that is easier to understand.
  • Machine learning: learning models or rules from raw data. There are many machine learning methods, such as neural networks, decision trees, and support vector machines.
  • Neural network: refers to an artificial neural network, a mathematical model that imitates the behavioral characteristics of animal neural networks and performs distributed parallel information processing. It is a particular form of AI model.
  • Data set: the data used for model training, validation, and testing in machine learning. The quantity and quality of the data affect the effectiveness of machine learning.
  • Model training: selecting an appropriate loss function and applying an optimization algorithm to train the model parameters so that the loss function value is minimized.
  • Loss function: used to measure the difference between the predicted value of the model and the true value.
  • Model testing: after training, test data is applied to evaluate model performance.
  • Model application: applying the trained model to solve practical problems.
  • AI can make better decisions than humans in increasingly complex scenarios. It has opened up new horizons for intelligent network construction, brought unprecedented opportunities to network development, and provides efficient, accelerated solutions to the many difficulties and challenges encountered in the transformation of telecommunications networks.
  • Machine Learning is an important technical way to realize artificial intelligence.
  • Machine learning can be divided into supervised learning, unsupervised learning, and reinforcement learning.
  • Supervised learning uses machine learning algorithms to learn the mapping relationship between sample values and sample labels based on the collected sample values and sample labels, and uses a machine learning model to express the learned mapping relationship.
  • the process of training a machine learning model is the process of learning this mapping relationship.
  • for example, in a signal detection task, the received signal containing noise is the sample, and the real constellation point corresponding to the signal is the label.
  • Machine learning hopes to learn the mapping relationship between the sample and the label through training, that is, the machine learning model learns a signal detector.
  • model parameters are optimized by calculating the error between the model's predicted value and the true label.
  • the learned mapping can be applied to predict the sample label for each new sample.
  • the mapping relationships learned by supervised learning can include linear mapping and nonlinear mapping. According to the type of labels, learning tasks can be divided into classification tasks and regression tasks.
  • Unsupervised learning only relies on the collected sample values and applies algorithms to discover the inherent patterns of the samples.
  • model parameters are optimized by calculating the error between the model's predicted value and the sample itself.
  • Self-supervised learning can be used for signal compression and decompression recovery applications.
  • Common algorithms include autoencoders and generative adversarial networks.
  • Reinforcement learning is different from supervised learning in that it is a type of algorithm that learns strategies to solve problems by interacting with the environment. Different from supervised and unsupervised learning, reinforcement learning problems do not have clear "correct" action label data.
  • the algorithm needs to interact with the environment, obtain the reward signal fed back by the environment, and then adjust the decision-making action to obtain a larger reward signal value.
  • for example, a reinforcement learning model may adjust the downlink transmit power of each user based on the total system throughput rate fed back by the wireless network, aiming to obtain a higher system throughput rate.
  • the goal of reinforcement learning is also to learn the mapping relationship between the state of the environment and the preferred decision action. But because the label of the "correct action" cannot be obtained in advance, the network cannot be optimized by calculating the error between the action and the "correct action”. Training in reinforcement learning is achieved through iterative interaction with the environment.
  • Deep Neural Network is a specific implementation form of machine learning.
  • DNN generally uses supervised learning or unsupervised learning strategies to optimize model parameters.
  • neural networks can theoretically approximate any continuous function, which makes the neural network capable of learning any mapping.
  • Traditional communication systems require rich expert knowledge to design communication modules, whereas deep-learning communication systems based on DNN can automatically discover implicit pattern structures from large data sets, establish mapping relationships between data, and achieve performance superior to traditional modeling methods.
  • the structure of a neuron is shown in Figure 1. A DNN generally has a multi-layer structure: each layer of a DNN can contain multiple neurons; the input layer processes the values it receives and passes them to the intermediate hidden layers; the hidden layers in turn pass their computation results to the final output layer, which produces the final output of the DNN.
  • the neural network structure of the DNN is shown in Figure 2.
  • DNN generally has more than one hidden layer, and hidden layers often directly affect the ability to extract information and fit functions. Increasing the number of hidden layers of DNN or expanding the width of each layer can improve the function fitting ability of DNN.
  • the weighted value in each neuron is the parameter of the DNN network model.
  • the model parameters are optimized through the training process, so that the DNN network has the ability to extract data features and express mapping relationships.
  • DNN can be divided into Feedforward Neural Network (FNN), Convolutional Neural Networks (CNN) and Recurrent Neural Network (RNN).
  • FNN Feedforward Neural Network
  • CNN Convolutional Neural Networks
  • RNN Recurrent Neural Network
  • Figure 2 shows an FNN network, which is characterized by complete connections between neurons in adjacent layers. This makes FNN usually require a large amount of storage space and leads to high computational complexity.
  • CNN is a neural network specially designed to process data with a grid-like structure. For example, time-series data (one-dimensional discrete sampling along the time axis) and image data (two-dimensional discrete sampling) can both be considered data with a grid-like structure.
  • CNN does not apply all the input information at once to perform operations, but uses a fixed-size window to intercept part of the information for convolution operations, which greatly reduces the amount of calculation of model parameters.
  • each window can use different convolution kernel operations, which allows CNN to better extract the features of the input data.
  • RNN is a type of DNN network that uses feedback time series information. Its input includes the new input value at the current moment and its output value at the previous moment. RNN is suitable for obtaining sequence features that are correlated in time, and is especially suitable for applications such as speech recognition and channel coding and decoding.
  • the above-mentioned FNN, CNN, and RNN are common neural network structures, and these network structures are all constructed based on neurons.
  • each neuron performs a weighted summation operation on its input values, and the weighted-summation result is passed through a nonlinear function to generate the output. The weights of the weighted-summation operations and the nonlinear functions of the neurons are called the parameters of the neural network; the parameters of all neurons in a neural network constitute the parameters of this neural network.
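  • As a concrete illustration of the neuron just described (a sketch, with NumPy standing in for any numerical library; names are illustrative):

```python
import numpy as np

def neuron(x, w, b, activation=np.tanh):
    # weighted summation of the inputs followed by a nonlinear
    # activation; w and b are this neuron's parameters
    return activation(np.dot(w, x) + b)

y = neuron(np.array([0.5, -1.2, 3.0]), np.array([0.1, 0.4, -0.2]), 0.05)
```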
  • DNN can be trained using Federated Learning (FL).
  • Federated Learning is a machine learning framework. Its original intention is to effectively help multiple organizations perform data usage and machine learning modeling while meeting the requirements of user privacy protection and data security. Within the federated learning framework, what is transferred between nodes is not the data itself but the intermediate results obtained during training, such as model parameters or gradients.
  • Figure 3 is a schematic diagram of the system architecture of classic federated learning.
  • the system architecture for classic federated learning can include a central node 301 and multiple distribution nodes (distribution node 302a to distribution node 302c). Distribution nodes 302a to 302c are all communicatively connected with the central node 301.
  • the distribution nodes 302a to 302c may each belong to a different institution or company, and each includes a distributed data set about the application environment of that distribution node.
  • Federated learning is a distributed machine learning paradigm. It can effectively solve the problem of data silos, allowing participants to jointly build models without sharing data, technically breaking data silos and achieving AI collaboration. According to the distribution of data among the participating parties, federated learning can be divided into three categories: horizontal federated learning, vertical federated learning, and federated transfer learning.
  • Horizontal federated learning: when the user features of two data sets overlap heavily but the users overlap little, the data sets are split horizontally (along the user dimension), and the portion of the data with the same user features but different users is extracted for training.
  • Vertical federated learning: when the users of two data sets overlap heavily but the user features overlap little, the data sets are split vertically (along the feature dimension), and the portion of the data with the same users but different user features is extracted for training.
  • Federated transfer learning means that when there is little overlap between users and user features in the two data sets, the data is not segmented, but transfer learning can be applied to overcome the situation of insufficient data or labels.
  • on the base station side, when scanning beams for codebook-based synchronization signal and physical broadcast channel (PBCH) blocks, i.e., synchronization signal blocks (SSB), or for channel state information reference signals (CSI-RS), the channels between the base station and users at different locations are different.
  • the user terminal (referred to as the user) needs to measure parameters of the received SSB or CSI-RS beams, for example the layer-1 reference signal received power (L1-RSRP), take the beam corresponding to the maximum RSRP value as the target beam, and feed it back to the base station.
  • L1-RSRP layer-1 reference signal received power
  • the user terminal applies AI/ML to train a model for beam selection, for example taking (some or all of) multiple received SSB/CSI-RS signals, or (some or all of) their received strengths (RSRP), as input to infer the target beam ID, which is fed back to the base station.
  • Each user can collect its own received beam/channel information and the corresponding target beam ID as samples for training the above AI/ML model (i.e., local samples). However, the number of samples each user can collect is limited, and the target beam IDs in the training samples cover only a subset of the actual SSB/CSI-RS codebook, so models trained by a user using only local data have limited performance.
  • federated learning can be applied.
  • users correspond to child nodes and network devices correspond to central nodes.
  • the central node delivers a global model to each user participating in federated learning.
  • Each user uses local data to train the global model, obtains a local model, and sends the parameter information of the local model, such as gradients or weights (encrypted), to the network device; the network device performs model aggregation (MA) to update the global model.
  • MA model aggregation
  • when a federated learning architecture is used to apply AI to wireless networks, training data, such as label information, must first be collected.
  • some user-local data may involve user privacy, for example user equipment (UE) data in beam management and positioning scenarios.
  • UE User Equipment
  • in such cases, horizontal federated learning is suitable.
  • the main process of horizontal federated learning is as follows:
  • Step 1 The central node (such as a base station or AI network element) delivers the global model to sub-nodes (such as UE).
  • sub-nodes such as UE
  • Step 2 Each child node applies local data to train the model, and uploads the gradients, weights or local models (after encryption) to the central node.
  • Step 3 The central node aggregates the gradients or local models of each child node to update the global model parameters.
  • the federated learning process takes three child nodes (users 1, 2, and 3) as an example.
  • the central node sends the initial model to the child nodes (i.e., the downloaded global model M 0 in the figure).
  • Users 1, 2 and 3 train (global) models based on local data 1, 2, and 3 respectively to obtain models M 1 , M 2 , and M 3 , and send the models to the central node.
  • the central node fuses the received models of the child nodes to obtain an updated global model, for example using the fusion algorithm FedAvg (federated averaging), that is, M_0 = Σ_k (n_k / n) · M_k, where n_k is the number of local samples of child node k and n = Σ_k n_k. After obtaining the new global model, the central node sends the updated M_0 to each child node again.
  • FedAvg Federated averaging
  • the child nodes conduct a new round of local training based on the received global model, and then send the trained model to the central node. Iterate like this until the model converges. Users can upload the trained local model to the central node, or only upload the network parameters of the trained local model, such as gradients or weights.
  • the central node can restore each local model or directly fuse the network parameters.
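  • For illustration, the FedAvg fusion step above could be implemented as follows (a minimal sketch; weighting by local sample counts n_k as in the standard FedAvg algorithm, names illustrative):

```python
import numpy as np

def fedavg(local_models, sample_counts):
    # average the child-node models, each weighted by the size of
    # that node's local data set (classic FedAvg fusion)
    n = np.asarray(sample_counts, dtype=float)
    weights = n / n.sum()
    return sum(w * m for w, m in zip(weights, local_models))

# M0 = fedavg([M1, M2, M3], [n1, n2, n3])
```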
  • computing power can be understood as the computing power or computing speed of the terminal device.
  • An indicator that can measure the computing power of a terminal device is the number of floating-point operations per second (FLOPS). Under the federated learning architecture, child nodes (i.e., users participating in a given round of federated learning) must take on the task of training the network. In general, different models have different complexity, and a commonly used indicator of model complexity is the number of floating-point operations (FLOPs).
  • the differing complexity of specific models is reflected in: different network types, such as convolutional networks (CNN), fully connected networks (FC), and recurrent neural networks (RNN); different network depths/layer counts for the same network type, such as multi-layer CNN or FC; and different numbers of neurons, such as the number of hidden-layer neurons in an FC network. Due to differences in computing power, different terminal devices differ in their ability to train the same model, and the same terminal device requires different amounts of computation to train different models. The following three situations may occur:
  • Case 3 The central node performs differentiated model updates for different user terminals. Specifically, the base station broadcasts multiple global models (grouped according to the computing power reported by the child nodes) or, optionally, also broadcasts a classification neural network. Each child node determines, according to preset conditions, which group in the global models or classification network it belongs to, updates the local model corresponding to that group and reports the parameters, at the same time reporting the index of its updated model. The base station determines, based on the index, which global model the parameters reported by a child node correspond to, updates that global model, and then broadcasts the multiple global models again in the next round of federated learning. This is equivalent to an architecture running multiple (parallel) federated learning processes.
  • the central node is responsible for unified management of multiple global models.
  • Each model corresponds to a computing power group. Although this can solve the problem that a unified model cannot be trained due to differences in the computing power of user equipment, it has the following shortcomings: multiple models need to be exchanged (broadcast) over the air interface, which increases air interface overhead; and grouping and delivering multiple models reduces the total amount of data used to train any single model, making it difficult to guarantee model performance.
  • the embodiment of this application therefore proposes a federated learning architecture in which the central node designs and issues differentiated AI/ML network parameter update or freeze signaling based on the computing power reported by each child node, instructing each child node to train either part or all of the unified downloaded model. Compared with original federated learning, a child-node computing-power feedback procedure is added, together with the above procedure in which the central node issues differentiated AI/ML network parameter update or freeze indications.
  • Figure 6 shows a schematic diagram of update/freeze instructions based on the AI/ML network layer. All users (sub-nodes) participating in federated learning will download the AI/ML network in Figure 6, which is an AI/ML model with 4 network layers. In traditional federated learning, each child node needs to use local data to train a complete 4-layer network layer (all network parameters), obtain the model, and feed it back to the central node. In the embodiment of this application, based on the difference in computing power, the central node will instruct users with insufficient computing power to be responsible for updating some parameters in the network, while users with sufficient computing power will still update all parameters.
  • user A will only update the parameters of the network layer corresponding to the neurons in columns 1, 4, and 6, and user B will update the parameters of the network layer corresponding to the neurons in columns 1, 2, 4, and 6.
  • User C will update the parameters of all network layers corresponding to the neurons in columns 1, 2, 3, 4, 5, and 6.
  • the central node will be responsible for sending the differentiated update instructions to each corresponding child node.
  • parameter updates can also be instructed in units of neurons.
  • the instruction can indicate the network layer that the child node needs to freeze, or it can indicate the network layer that the child node needs to update.
  • Each sub-node user uses local data as a training set combined with the update/freeze instructions mentioned above to train the model.
  • the gradient/weight/network (or only the part responsible for updating) is uploaded to the central node.
  • the central node fuses the gradients/weights/networks uploaded by multiple child nodes. For example, if the FedAvg algorithm is still used for fusion, and parameter update instructions are used in network layer units, users A, B, and C correspond to models 1, 2, and 3.
  • network layer g is a frozen layer for user A and an update layer for user B and user C.
  • the central node will not consider the parameters fed back by user A when updating the parameters of network layer g. That is, the fusion process can be expressed mathematically as M_{0,g} = (Σ_k α_{k,g} · M_{k,g}) / (Σ_k α_{k,g}), where M_{0,g} represents the fused parameters of the g-th layer; M_{1,g}, M_{2,g}, M_{3,g} represent the g-th network layer of each user's model; and α_{k,g} ∈ {0,1} is the update indication corresponding to the g-th network layer of the k-th child node's model, with α_{k,g} = 1 meaning that the model updated the g-th layer parameters. In the example above, M_{0,g} = (M_{2,g} + M_{3,g}) / 2.
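  • A sketch of this layer-wise masked fusion, directly following the formula above (function and variable names are illustrative; at least one node is assumed to have updated the layer):

```python
import numpy as np

def fuse_layer(layer_params, alpha):
    # layer_params: list of K arrays M_{k,g} for the g-th layer
    # alpha:        K flags, alpha[k] = 1 iff child node k updated
    #               (did not freeze) this layer
    a = np.asarray(alpha, dtype=float)
    return sum(ak * mk for ak, mk in zip(a, layer_params)) / a.sum()

# User A froze layer g, users B and C updated it:
# M0_g = fuse_layer([M1_g, M2_g, M3_g], alpha=[0, 1, 1])  # (M2_g + M3_g) / 2
```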
  • the communication method provided by the embodiments of this application can be applied to various communication systems, such as the long term evolution (LTE) system, the fifth generation (5G) mobile communication system, the wireless fidelity (WiFi) system, future communication systems, or systems integrating multiple communication systems; the embodiments of this application impose no limitation.
  • LTE long term evolution
  • 5G fifth generation
  • WiFi wireless fidelity
  • future communication system or a system integrating multiple communication systems, etc.
  • 5G can also be called new radio (NR).
  • the communication method provided by the embodiments of the present application can be applied to various communication scenarios.
  • it can be applied to one or more of the following communication scenarios: enhanced mobile broadband (eMBB), ultra-reliable low-latency communication (URLLC), machine type communication (MTC), massive machine type communications (mMTC), device to device (D2D), vehicle to everything (V2X), vehicle to vehicle (V2V), the Internet of things (IoT), etc.
  • eMBB enhanced mobile broadband
  • URLLC ultra-reliable low-latency communication
  • MTC machine type communication
  • mMTC massive machine type communications
  • D2D device to device
  • V2X vehicle to everything
  • V2V vehicle to vehicle
  • IoT Internet of things
  • FIG. 7 is a schematic diagram of a communication system provided by an embodiment of the present application.
  • the communication system may include network equipment and multiple terminals.
  • the network device corresponds to the central node in the federated learning architecture, and the terminal corresponds to the sub-node in the federated learning architecture.
  • the network device delivers a global model to each terminal participating in federated learning.
  • Each user uses local data to train the global model, obtains a local model, and sends the parameter information of the local model, such as gradients or weights (encrypted), to the network devices; the network devices perform model fusion (model aggregation, MA) to update the global model, and then send the updated global model to each device.
  • each user then continues to update its local model, outputs the parameter information of the updated local model, and sends it to the network device; multiple iterations are performed in this way until the model converges.
  • MA model aggregation
  • Figure 7 is only an exemplary framework diagram, and the number of network devices and terminals included in Figure 7 is not limited. In addition to the functional nodes shown in Figure 7, other nodes may also be included, such as core network equipment, gateway equipment, application servers, etc., without limitation. Network devices communicate with core network devices through wired or wireless methods, such as next generation (NG) interfaces.
  • NG next generation
  • the terminal device involved in the embodiments of this application may also be called a terminal, and may be a device with wireless transceiver functions.
  • Terminals can be deployed on land, including indoors, outdoors, handheld, and/or vehicle-mounted; they can also be deployed on water (such as ships, etc.); they can also be deployed in the air (such as aircraft, balloons, satellites, etc.).
  • the terminal device may be a user device.
  • Terminal devices include handheld devices, vehicle-mounted devices, wearable devices or computing devices with wireless communication capabilities.
  • the terminal device may be a mobile phone, a tablet computer, or a computer with wireless transceiver function.
  • the terminal device can also be a virtual reality (VR) terminal device, an augmented reality (AR) terminal device, a wireless terminal in industrial control, a wireless terminal in autonomous driving, a wireless terminal in telemedicine, or a smart device.
  • VR virtual reality
  • AR augmented reality
  • the device used to implement the function of the terminal device may be a terminal device; it may also be a device that can support the terminal device to implement the function, such as a chip system.
  • the device can be installed in a terminal device or used in conjunction with the terminal device.
  • the chip system may be composed of chips, or may include chips and other discrete devices.
  • the device used to implement the functions of the terminal device is a terminal device. Taking the terminal device as a UE as an example, the technical solution provided by the embodiments of the present application is described.
  • the network equipment involved in the embodiments of this application includes a base station (Base Station, BS), which may be a device deployed in a radio access network (radio access network, RAN) and capable of communicating with terminal equipment.
  • BS Base Station
  • RAN radio access network
  • the wireless access network may also be referred to as an access network for short.
  • This network device may also be called an access network device.
  • Base stations may come in many forms, such as macro base stations, micro base stations, relay stations, or access points.
  • the base station involved in the embodiment of this application can be a base station in a 5G system, a base station in a Long Term Evolution (LTE) system, or a base station in other systems, without limitation.
  • LTE Long Term Evolution
  • the base station in the 5G system can also be called the Transmission Reception Point (TRP) or the Next Generation Node B (Next Generation Node B, gNB).
  • the base station in the embodiment of the present application may be an integrated base station, or may be a base station including a centralized unit (Centralized Unit, CU) and a distributed unit (Distributed Unit, DU).
  • a base station including CU and DU may also be called a base station in which CU and DU are separated, for example, the base station includes gNB-CU and gNB-DU.
  • CU can also be separated into CU Control Plane (CU-CP) and CU User Plane (CU User Plane, CU-UP).
  • for example, the base station includes gNB-CU-CP, gNB-CU-UP, and gNB-DU.
  • the device used to implement the function of the network device may be a network device; it may also be a device that can support the network device to implement the function, such as a chip system.
  • the device can be installed in a network device or used in conjunction with a network device.
  • in the following, the device for realizing the functions of the network device is a network device, and the network device is taken to be a base station as an example to describe the technical solution provided by the embodiments of the present application.
  • AI network elements or modules may also be introduced into the network. If an AI network element is introduced, it corresponds to an independent network element; if an AI module is introduced, it can be located inside a certain network element, and the corresponding network element can be a terminal device or a network device, etc.
  • the following takes a base station as the device for realizing the functions of the network equipment and a terminal as the device for realizing the functions of the terminal equipment as an example to describe the methods provided by the embodiments of the present application.
  • Figure 8 shows a schematic flowchart of the model training method provided by the embodiment of the present application. As shown in Figure 8, the method may include the following steps:
  • S801 Multiple sub-node devices send capability information to the central node device.
  • the central node device receives capability information from multiple sub-node devices.
  • the capability information of multiple child node devices is used to characterize the capabilities of multiple child node devices to train the model; the multiple child node devices are multiple child node devices that participate in federated learning of the model.
  • for ease of description, the information sent by a child node device to the central node device is called uplink reporting, and the information sent by the central node device to a child node device is called downlink delivery.
  • the central node configures the downlink resources used by the central node to deliver the first indication information and the model to the child nodes.
  • the downlink resource may be a control channel resource, such as a physical downlink control channel (PDCCH) resource, or a data channel resource, such as a physical downlink shared channel (PDSCH) resource. Specifically, it includes parameters such as the frequency-domain resource block number and starting position, sub-band number, sub-band bandwidth, frequency hopping parameters, and the modulation and coding scheme (MCS).
  • PDCCH physical downlink control channel
  • PDSCH physical downlink shared channel
  • Models can be delivered by the central node through broadcast or multicast.
  • the model can be delivered by broadcast; owing to the characteristics of broadcast, child nodes not participating in federated learning can also receive the broadcast information;
  • when the sub-nodes are base stations, the central node can likewise broadcast the model to each sub-node base station; similarly, base stations not participating in federated learning can also receive the broadcast information; multicast can also be used for the sub-nodes participating in federated learning.
  • in multicast mode, the sub-nodes associated with the same central node form a group, have the same group number, and are configured with the same downlink resources; child nodes that do not participate in the federated learning will not receive the multicast information.
  • the central node can also configure uplink resources for the child nodes to report the updated first model parameters.
  • Alternatively, a separate federated learning management node may configure, for the central node and sub-nodes, the uplink resources used by the sub-nodes to report the updated first model parameters and necessary signaling.
  • uplink resources can be control channel resources, such as physical uplink control channel (PUCCH) resources, or data channel resources, such as physical uplink shared channel (PUSCH) resources.
  • PUCCH physical uplink control channel
  • PUSCH physical uplink shared channel
  • the sub-node needs to send capability information to the central node device in the access network.
  • Capability information must at least include the size of the memory space of the child node device available for storing AI/ML models, the computing power information of the child node (the computing power for running AI/ML models, such as the FLOPS performance mentioned above; battery level is also one of the factors affecting the computing power of a terminal device), and information related to the amount of local data collected (which can help the central node estimate the time required for model training).
  • Other optional capability information may include: whether the child node device supports running AI/ML models, and the supported AI/ML model types (such as CNN, RNN, fully connected network, random forest, etc.).
  • only when a child node supports the AI/ML model will the central node allow it to participate in federated learning and send the model to it.
  • Other optional capability information may include hardware information of the child node, for example its antenna configuration (number of antennas, polarization direction, etc.), number of radio-frequency channels, and sensor types and parameters (position sensor/GPS, motion sensor, etc.). Since the model is trained within the federated learning framework using the child node's local data, the child node does not need to report information related to the actually collected beam information or information involving privacy.
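  • Gathered together, the capability report could be modeled as below; this structure and every field name are assumptions for illustration, not signaling defined by the patent:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class CapabilityReport:
    supports_ai_ml: bool        # whether AI/ML models can run at all
    model_types: List[str]      # e.g. ["CNN", "RNN", "FC"]
    memory_bytes: int           # space available to store the model
    flops_per_second: float     # computing performance (FLOPS)
    battery_level: float        # 0.0-1.0; affects usable compute
    local_sample_count: int     # helps predict training duration
```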
  • S802 The central node device sends the first indication information and the model to each of the multiple child node devices.
  • the multiple child node devices receive the first indication information and the model from the central node device.
  • the first indication information is used to indicate the first model parameters to be updated when multiple child node devices train models, and the first model parameters to be updated for the multiple child node devices are determined based on the capability information of the multiple child node devices.
  • the model that is sent includes the structure of the model and the parameters of the model.
  • the structure of the model includes, for example, the number of convolutional layers in a convolutional neural network (CNN), the number of channels and the convolution kernel size of each convolutional layer, the number of fully connected layers and neurons, the number of layers in a recurrent neural network (RNN), and the computation method of each state in each layer structure.
  • the first indication information can be sent through signaling.
  • one of the following three delivery methods can be selected: 1. send to the designated sub-node through unicast; 2. if the same first indication information applies to different sub-nodes, send the same first indication information to a group of child nodes through multicast; 3. send through broadcast.
  • each signaling message contains the child node index corresponding to the signaling; after receiving the broadcast signal, each child node obtains the signaling corresponding to its own index.
  • the central node can also issue information related to the model training and reporting of the child nodes, such as the number of iterations required per round, the learning rate, the loss function, the batch size, the reported parameter type (model parameters or gradients), and other information.
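  • The first indication information plus this training configuration could be sketched as one message as follows; all field names are illustrative assumptions:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class UpdateIndication:
    node_index: int             # child node the signaling targets
    update_layers: List[int]    # layers to update (the complement
                                # is the freeze list)
    iterations_per_round: int
    learning_rate: float
    batch_size: int
    report_type: str            # "parameters" or "gradients"
```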
  • model compression methods include but are not limited to model pruning, model distillation, model quantization, etc.
  • S803 Multiple child node devices train the model based on the training set and the model parameters to be updated, and obtain updated first model parameters.
  • the training set includes beam information collected by multiple sub-node devices.
  • the sub-nodes in federated learning can be base stations or user terminal equipment
  • the central node can be an independent federated learning management node or a base station that functions as a central node.
  • the global model to be trained is an AI/ML model that takes the estimated channel measurement value or the received signal itself as input and uses the preferred beam index (ID) as the output.
  • the sub-node is responsible for collecting the training set as model input during the data collection phase.
  • the training set can include: channel measurement values or received signals, and labels used for training the model (i.e., the preferred beam ID).
  • To obtain this label, the base station can send all possible beams (codebook-based SSB or CSI-RS beams) to the UE one by one, and the UE selects the beam index with optimal performance as the label (“preferred” can refer to the beam with the largest L1-RSRP or SNR measurement among all SSB/CSI-RS beams).
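  • Label generation then reduces to an argmax over the measured beams (a sketch; names are illustrative):

```python
import numpy as np

def pick_target_beam(l1_rsrp):
    # after sweeping all codebook beams, the beam with the largest
    # measured L1-RSRP (or SNR) becomes the training label
    return int(np.argmax(l1_rsrp))

# label = pick_target_beam(rsrp_over_64_beams)
```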
  • the loss function used in training is related to the application scenario and the type of model used.
  • CE Cross-Entropy
  • BCE Binary Cross-Entropy
  • MSE Mean Squared Error
  • MAE Mean Absolute Error
  • the sub-nodes can select the loss function according to the application scenario, or according to the instructions issued by the central node.
  • the child node can use the received global model as the initial model for this round for training.
  • the loss function is mainly related to the local data set.
  • the MSE loss function of the k-th child node is Loss_k = (1 / n_k) · Σ_{l=1}^{n_k} (y_{k,l} − ŷ_{k,l})², where n_k is the number of samples, y_{k,l} is the output for the l-th sample, and ŷ_{k,l} is the label of the l-th sample.
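  • A minimal local-training sketch, assuming a PyTorch model and the MSE objective above; only the indicated first model parameters receive gradients, the rest are frozen (function and argument names are illustrative assumptions):

```python
import torch

def local_train(model, loader, update_names, lr=1e-3, epochs=1):
    # freeze every parameter not named in the first-model-parameter set
    for name, p in model.named_parameters():
        p.requires_grad_(name in update_names)
    opt = torch.optim.SGD(
        [p for p in model.parameters() if p.requires_grad], lr=lr)
    loss_fn = torch.nn.MSELoss()
    for _ in range(epochs):
        for x, y in loader:                 # local data only
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    # report only the updated first model parameters
    return {n: p.detach().clone()
            for n, p in model.named_parameters() if n in update_names}
```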
  • S804 Multiple child node devices send updated first model parameters to the central node device.
  • the central node device receives updated first model parameters from multiple child node devices.
  • the child node reports the updated first model parameters through the uplink resources described in step S801.
  • Each child node may report all parameters of the beam training model, or may only report the first model parameters participating in the update.
  • all parameters can be combined and sent as a vector M_k = [m_{k,1}, m_{k,2}, ..., m_{k,G}], where m_{k,g} represents the g-th parameter of the k-th child node; alternatively, the set of differences between the current model parameters and the previous round's model parameters, i.e., the gradient of each parameter, can be reported: ΔM_k = [Δm_{k,1}, Δm_{k,2}, ..., Δm_{k,G}].
  • the order in which the updated first model parameters are reported is consistent with the order in which the first model parameters are delivered.
  • This reporting method is compatible with both network-layer-granularity and single-parameter-granularity parameter reporting, so there is no need to define different reporting modes for the two cases.
  • the sub-node reports the first model parameter of the sub-node in the local model through the uplink resources configured in S801. For example, assume that the model has a total of G parameters, among which G′ parameters are the first model parameters of the k-th child node.
  • the child node can also report auxiliary information, such as the number of samples n k of the child node's local data set, model type indication information, etc.
  • the central node device fuses global model parameters of multiple sub-node devices through a preset fusion algorithm to obtain fusion parameters.
  • the global model parameters include updated first model parameters and second model parameters in the model other than the first model parameters.
  • the preset fusion algorithm can fuse the global model parameters of multiple sub-node devices to obtain the fusion parameters; for example, the fusion algorithm can be chosen as taking the average, the median, or the mode.
  • the central node device sends the fusion parameters to multiple sub-node devices, and accordingly, the multiple sub-node devices receive the fusion parameters.
  • multiple child node devices train the model according to the fusion parameters until the model trained according to the fusion parameters converges; otherwise, return to S802, adjust the first indication information, and perform S802 to S807 again until the model converges. Whether the model converges can be judged by the loss function.
  • for the loss function, refer to the corresponding description in S803; for example, the cross-entropy function (Cross-Entropy, CE) can be used. A simple loss-based convergence check is sketched below.
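A minimal, hypothetical sketch of the convergence judgment; the embodiment only states that convergence can be judged by the loss function, so the `eps` threshold and `patience` values are illustrative assumptions:

```python
def has_converged(loss_history, eps=1e-4, patience=3):
    """Judge convergence from the loss function: the round-to-round loss
    change stays below eps for `patience` consecutive rounds."""
    if len(loss_history) <= patience:
        return False
    recent = loss_history[-(patience + 1):]
    return all(abs(recent[i + 1] - recent[i]) < eps for i in range(patience))

print(has_converged([0.9, 0.5, 0.30001, 0.30000, 0.29999, 0.29998]))  # True
```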
  • first, multiple child node devices send capability information to the central node device; the central node device then sends, based on the capability information, the first indication information and the model to each of the multiple child node devices; next, the multiple child node devices train the model based on the training set and the model parameters to be updated to obtain the updated first model parameters, and the central node device fuses, through a preset fusion algorithm, the global model parameters of the multiple child node devices, which include the first model parameters, to obtain the fusion parameters; the multiple child node devices then train the model based on the fusion parameters. This allows a child node device to freeze some parameters during model training so that they do not participate in training, which ensures that the model training progress is not held back by child node devices with low computing power, making model training more efficient.
  • the method further includes:
  • the central node device predicts the model training time of multiple child node devices based on the capability information.
  • the central node device needs to predict the model training time of multiple child node devices based on the capability information to determine whether the child node devices can train the model.
  • the concept of a time window can be used: within a given time window in which the child nodes are required to feed back local model parameters/gradients (for example, a window of length $T_0$ starting at time $t$ after the model is broadcast and ending at $t+T_0$, $t \geq 0$), child node k must be able to complete the training of the model.
  • the time $T_k$ used by child node device k to train the model can be expressed as: $T_k=\frac{\mathrm{FLOPS}}{\mathrm{FLOPs}_k}$, where FLOPS refers to the complexity of the model (the total number of floating-point operations required, determined by the model type) and $\mathrm{FLOPs}_k$ represents the computing-power performance of the k-th sub-node device (determined by its capability information). With this formula, the model training time of multiple sub-node devices can be predicted, as sketched below.
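A minimal sketch of the training-time prediction and time-window check, assuming the ratio form of the formula above; the per-round iteration factor, window length, and numbers are illustrative assumptions:

```python
def predict_training_time(model_flops: float, device_flops_per_s: float,
                          iterations: int = 1) -> float:
    """Predict T_k: total floating-point operations of the model (set by the
    model type) divided by the device's compute capability (taken from the
    reported capability information), times the local iterations per round."""
    return iterations * model_flops / device_flops_per_s

# Example: a 2 GFLOP model on a 1 GFLOPS device, 10 local iterations per round
T_k = predict_training_time(2e9, 1e9, iterations=10)
T_0 = 15.0  # feedback time window (seconds), configured by the central node
print(T_k, "fits window" if T_k <= T_0 else "train partial parameters only")
```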
  • the central node device determines the first model parameters of the child node devices according to the model training duration.
  • the first model parameter of the child node device can be determined based on the model training time.
  • the embodiment of this application adopts a strategy of training some model parameters instead of all model parameters for model training for child node devices with insufficient capabilities.
  • the central node needs to determine the first model parameters of the child node device based on the predicted model training duration; that is, the model parameters that participate in training are called update parameters, and the remaining model parameters that do not participate in training are called frozen parameters.
  • when selection is at network-layer granularity, the index g above represents all parameters of the g-th network layer of the model.
  • the model training duration of multiple sub-node devices is first predicted based on the capability information, and the first model parameters of each sub-node device are then determined based on its model training duration, so that the determined first model parameters to be trained match the corresponding sub-node device, ensuring model training progress.
  • S808 The central node device determines the first model parameters of the child node devices according to the model training duration, which may include: when the model training duration does not meet the preset duration condition, selecting some parameters of the model as the first model parameters; when the model training duration meets the preset duration condition, selecting all parameters of the model as the first model parameters.
  • the two strategies, determining partial parameters as the first model parameters and determining all parameters of the model as the first model parameters, ensure model training progress while ensuring model performance.
  • S8081 Select some parameters of the model to determine as the first model parameters, which may include:
  • Some parameters of the model are randomly selected and determined as the first model parameters.
  • randomly selecting some parameters of the model as the first model parameters corresponds to parameter selection at single-parameter granularity.
  • some parameters of the model are randomly selected and determined as the first model parameters, so that the determined first model parameters to be trained can be matched with the corresponding child node devices to ensure the progress of model training.
  • the model includes a first network layer and a second network layer; the model application scenario indicates the first network layer, and the first network layer and the second network layer are different. S8081 (selecting some parameters of the model and determining them as the first model parameters) may include:
  • selecting the parameters of the first network layer and some parameters of the second network layer and determining them as the first model parameters.
  • selecting the first model parameters by network layer corresponds to parameter selection at network-layer granularity.
  • taking the first network layer as a fully connected layer as an example: for a network with multiple fully connected layers, if each sub-node user updates the parameters corresponding to only some neurons in each layer in a scattered fashion, the final network training effect may be difficult to guarantee. Therefore, in this embodiment of the present application, all parameters of a certain first network layer are selected as first model parameters to ensure model performance.
  • when selecting parameters of the first network layer, the central node device can determine the first network layer corresponding to each child node device based on that device's capability information; when doing so, the union of the first network layers corresponding to all child node devices should cover all first network layers, ensuring the accuracy of model training.
  • the second network layer (for example, a non-fully-connected layer, or a fully connected layer that the sub-node device indicates does not need to be updated) may be selected in a random manner. Since the channel conditions in the same cell are relatively fixed, the finer-granularity fully connected layer trained by some users can also be used in the model application stage by another group of users with poorer computing power, reducing computing-power overhead.
  • the proportion of parameters occupied by some network layers may be much higher than that of other types of network layers.
  • the AI/ML model consists of convolutional layers (Conv1, Conv2) with feature-extraction functions connected in series with fully connected layers (FC1 with 16 neurons, FC2 with 64 neurons) responsible for the classification function.
  • the parameters of the fully connected layers in this model account for a high proportion of all parameters of the model.
  • the same task can also have different classification granularities. For example, in the AI-assisted beam management task, if the base station side initiates SSB wide-beam scanning based on a 16-codeword codebook, the computing power of some user equipment supports predicting the optimal beam ID among the 16 wide beams, while the computing power of other user equipment supports directly predicting the optimal beam ID among 64 narrow beams; therefore, for these two types of users, at least two fully connected layers with different numbers of neurons (16 neurons and 64 neurons) are required to complete the above tasks (an illustrative structure sketch follows below).
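A minimal PyTorch sketch of such a model, assuming a shared convolutional trunk (Conv1, Conv2) and classification heads at two granularities (16 and 64 neurons); the channel sizes and input shape are assumptions, not taken from the embodiment:

```python
import torch
import torch.nn as nn

class BeamModel(nn.Module):
    """Shared feature-extraction trunk plus per-granularity classification heads."""
    def __init__(self):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1), nn.ReLU(),   # Conv1
            nn.Conv2d(8, 16, kernel_size=3, padding=1), nn.ReLU(),  # Conv2
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.fc1_16 = nn.Linear(16, 16)   # head for 16-wide-beam prediction
        self.fc2_64 = nn.Linear(16, 64)   # head for 64-narrow-beam prediction

    def forward(self, x, head: str = "64"):
        feat = self.trunk(x)
        return self.fc1_16(feat) if head == "16" else self.fc2_64(feat)

logits = BeamModel()(torch.randn(2, 1, 8, 8), head="16")
print(logits.shape)  # torch.Size([2, 16])
```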
  • the model parameters can be expressed as the union of two sets: $\Theta=\Theta_{\mathrm{nFC}}\cup\Theta_{\mathrm{FC}}$.
  • $\Theta_{\mathrm{nFC}}$ represents the parameter set corresponding to the neural network parameters before all fully connected layers; any element of the set represents the parameter set of the non-fully-connected layers in the k-th child node, with dimension $G$, $G\in\mathbb{Z}^{+}$ (where $\mathbb{Z}^{+}$ denotes the positive integers). The value of G depends on whether parameter selection corresponds to each network parameter or to each network layer.
  • $\Theta_{\mathrm{FC}}$ represents the set of neural network parameters of all fully connected layers; any element of the set represents the set of update/freeze information of the fully connected layers in the k-th child node, with dimension $L$, where the value of L corresponds only to the number of fully connected layers.
  • the first model parameters corresponding to the k-th sub-node can then be expressed as: $\Theta_k=\mathrm{Random}\big(\Theta_{\mathrm{nFC}}\big)\cup\Theta_{\mathrm{FC}}^{(k)}$, where Random(·) represents a partial selection function and $\Theta_{\mathrm{FC}}^{(k)}$ denotes the fully connected layer assigned to the k-th sub-node (a selection sketch follows below).
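A minimal sketch of this selection, assuming Random() draws a fixed fraction of the non-fully-connected parameters and the node's assigned fully connected layer is updated in full; all identifiers and the keep ratio are illustrative:

```python
import random

def select_first_model_params(non_fc_params: list, fc_layers: dict,
                              assigned_fc: str, keep_ratio: float) -> set:
    """First model parameters of child node k: a Random() partial selection of
    the non-fully-connected parameters, plus all parameters of the fully
    connected layer assigned to this node; everything else stays frozen."""
    chosen = set(random.sample(non_fc_params, int(keep_ratio * len(non_fc_params))))
    chosen.update(fc_layers[assigned_fc])   # whole assigned FC layer updates
    return chosen

non_fc = [f"conv.w{i}" for i in range(10)]
fc = {"fc16": ["fc16.w", "fc16.b"], "fc64": ["fc64.w", "fc64.b"]}
update_set = select_first_model_params(non_fc, fc, assigned_fc="fc64", keep_ratio=0.5)
print(sorted(update_set))
```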
  • the fully connected layers correspond to beam ID labels of three resolutions {16, 64, 128}.
  • for example, if the k-th child node corresponds to the 64-beam prediction task, its loss function is the cross-entropy (CE Loss) between the label and the output of the 64-neuron fully connected layer after the softmax operation, rather than the cross-entropy determined from the final output of the network (the output after the 128-neuron fully connected layer); a computation sketch follows below.
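A minimal NumPy sketch of taking the cross-entropy at the 64-neuron head rather than at the network's final 128-neuron output; the logits here are random placeholders:

```python
import numpy as np

def head_ce_loss(head_logits: np.ndarray, label: int) -> float:
    """Cross-entropy taken at the 64-neuron head: softmax over that head's
    output, then CE against the 64-beam label (the 128-neuron final output
    is not used for this node's loss)."""
    z = head_logits - head_logits.max()          # numerical stability
    log_probs = z - np.log(np.exp(z).sum())
    return float(-log_probs[label])

logits_64 = np.random.randn(64)   # output of the 64-neuron fully connected layer
print(head_ce_loss(logits_64, label=37))
```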
  • the parameters of the first network layer and some parameters of the second network layer are selected in a targeted manner and determined as the first model parameters, which takes into account the requirements of the model application scenario on the network layer and ensures model training performance.
  • the method may further include:
  • the central node device determines the first network layer required to build the model according to the model application scenario.
  • that is, the model application scenario may also require the model to include a specific network layer.
  • for example, suppose the central node device has 64 antennas, and different sub-node devices can respectively complete: the task of predicting the optimal beam among 16 wide beams based on a 16-DFT codebook; the task of directly predicting the optimal beam ID among 64 narrow beams based on the 16-DFT wide-beam codebook; and higher-precision non-codebook-based steering-vector beam ID prediction (such as a preset 128-beam or higher-precision prediction). Then the final fully connected layers of the distributed global model need to include fully connected layers with 16, 64, and 128/256 neurons; that is, the model application scenario indicates fully connected layers of 16, 64, and 128/256 neurons, and these fully connected layers are the first network layers required to build the model.
  • At least one fully connected layer required to build the model is determined as the preset fully connected layer corresponding to the child node device, taking into account the requirements of the model application scenario for the network layer and ensuring model training performance.
  • the capability information is also used to characterize the type of model that the child node device can train.
  • the method further includes:
  • the central node device determines the child node device whose capability information includes a model type of the model as a target device for sending the first indication information and the model.
  • determining the child node devices whose capability information includes the model type of the model as the target devices for sending the first indication information and the model ensures that all child node devices participating in the training can train the model, guaranteeing model training progress.
  • S805 The central node device fuses global model parameters of multiple child node devices through a preset fusion algorithm to obtain fusion parameters, which may include:
  • for model parameters with the same function across different sub-node devices, the central node device sets the weight of the second model parameters to the first preset weight and the weight of the first model parameters to the second preset weight, then performs a weighted average to obtain the fusion parameters.
  • the function of the first preset weight is to reduce the influence of the second model parameter on the fusion parameter as much as possible, and the function of the second preset weight is to make the parameter assigned the weight influence the fusion parameter.
  • the first preset weight may be selected as 0, and the second preset weight may be selected as 1.
  • the calculation process of the g-th fusion parameter can be expressed as: $m_{0,g}=\frac{1}{U}\sum_{k=1}^{K}\alpha_{k,g}\,m_{k,g}$, where U represents the number of first model parameters (i.e., the number of terms whose weight is nonzero), and $\alpha_{k,g}\in\{0,1\}$ serves the function of the first preset weight and the second preset weight (a fusion sketch follows below).
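A minimal NumPy sketch of this masked fusion, under the assumption that the divisor counts, per parameter, the child nodes that actually updated it:

```python
import numpy as np

def fuse_masked(params: np.ndarray, alpha: np.ndarray) -> np.ndarray:
    """Variant 1: frozen (second) model parameters get weight 0, updated (first)
    model parameters get weight 1, so only updated copies enter the average.
    params, alpha have shape (K, G): K child nodes, G parameters each."""
    updates = (alpha * params).sum(axis=0)
    counts = np.maximum(alpha.sum(axis=0), 1)  # updates per parameter; avoid 0-division
    return updates / counts

params = np.array([[1.0, 2.0, 3.0],
                   [2.0, 0.0, 5.0]])
alpha = np.array([[1, 1, 0],    # node 0 updated parameters 0 and 1
                  [1, 0, 1]])   # node 1 updated parameters 0 and 2
print(fuse_masked(params, alpha))  # [1.5, 2.0, 5.0]
```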
  • a preset fusion algorithm is designed so that the second model parameters cannot affect the fusion parameters, and the fusion parameters are calculated so that the unupdated second model parameters cannot affect the model training, thus ensuring the model training performance.
  • S805 The central node device fuses the global model parameters of multiple child node devices through a preset fusion algorithm to obtain the fusion parameters, which may also include:
  • for model parameters of different sub-node devices with the same function, the central node device sets both the weight of the second model parameters and the weight of the first model parameters to the second preset weight, performs a weighted average, and obtains the fusion parameters.
  • the embodiment of the present application sets the weights of both the first model parameters and the second model parameters to the second preset weight, so that the first model parameters and the second model parameters all affect the fusion parameters.
  • the second preset weight value can be set to 1.
  • the calculation process of the g-th fusion parameter can be expressed as: $m_{0,g}=\frac{1}{K}\sum_{k=1}^{K}m_{k,g}$, where K is the number of participating child node devices.
  • the above describes the model training method provided by the embodiments of the present application from the perspective of exchanging model parameters between the central node device and the sub-node devices. If the central node device and the sub-node devices exchange model gradients instead, a new round of model parameters can be obtained by adding the fused model gradients to the previous round's parameters; the fusion is computed in the same way as for model parameters and can be expressed as: $\Delta M_{0,g}=\frac{1}{K}\sum_{k=1}^{K}\Delta M_{k,g}$, where $\Delta M_{0,g}$ represents the fused model gradient (corresponding to the fusion parameters) and $\Delta M_{k,g}$ represents the model gradient of the k-th child node (corresponding to its model parameters).
  • the fusion parameters can still be calculated through the above formula, and model training can then be carried out (a gradient-fusion sketch follows below).
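A minimal NumPy sketch of the gradient-exchange variant, assuming frozen parameters report a zero gradient and the fused gradient is added to the previous round's parameters (the additive application is the natural reading of "adding the model gradients"):

```python
import numpy as np

def fuse_gradients(grads: np.ndarray, prev_params: np.ndarray) -> np.ndarray:
    """Average the reported gradients Delta M_k over all K nodes (frozen
    parameters report zero) and apply the fused gradient Delta M_0 to the
    previous round's parameters to obtain the next fusion parameters."""
    delta_m0 = grads.mean(axis=0)   # fused model gradient per parameter
    return prev_params + delta_m0

grads = np.array([[0.2, 0.0],      # node 0 froze parameter 1
                  [0.0, -0.4]])    # node 1 froze parameter 0
print(fuse_gradients(grads, prev_params=np.array([1.0, 1.0])))  # [1.1, 0.8]
```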
  • a preset fusion algorithm whose second model parameters can affect the fusion parameters is designed, the fusion parameters are calculated, and the determined fusion parameters can participate in model training, ensuring the progress of model training.
  • the above describes the training method of the model provided by the embodiment of the present application from the perspective of model training. After the model is trained based on the above method, the trained model can be used for beam selection.
  • the following describes the method provided by the embodiment of the present application from the perspective of model application.
  • the execution subject of this method may be a sub-node device. As shown in Figure 10, this method may include:
  • the sub-node device can receive beam information from the network device.
  • the model is trained according to the model training method introduced in the above embodiment.
  • the target beam is selected through the model; since the training efficiency of the model is high, the efficiency of selecting the target beam is also high (an application sketch follows below).
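A minimal sketch of the model-application step, with a random stand-in for the trained model; the shapes and the argmax decision rule are illustrative assumptions:

```python
import numpy as np

def select_target_beam(model_fn, beam_info: np.ndarray) -> int:
    """Model application: feed the received beam information (e.g. channel
    measurements) into the trained model and take the beam ID with the
    highest score as the target beam."""
    scores = model_fn(beam_info)
    return int(np.argmax(scores))

# Illustrative stand-in for the trained model (maps features to 64 beam scores)
dummy_model = lambda x: x @ np.random.randn(x.shape[-1], 64)
print(select_target_beam(dummy_model, np.random.randn(8)))
```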
  • in order to implement the above functions, each node, such as a child node device and a central node device, includes a corresponding hardware structure and/or software module for performing each function.
  • the methods of the embodiments of the present application can be implemented in the form of hardware, software, or a combination of hardware and computer software. Whether a function is performed by hardware or computer software driving the hardware depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functionality using different methods for each specific application, but such implementations should not be considered beyond the scope of this application.
  • the embodiments of this application can divide the sub-node equipment and the central node equipment into functional modules according to the above method examples.
  • each functional module can be divided corresponding to each function, or two or more functions can be integrated into one processing module.
  • the above integrated modules can be implemented in the form of hardware or software function modules. It should be noted that the division of modules in the embodiment of the present application is schematic and is only a logical function division. In actual implementation, there may be other division methods.
  • each network element shown in this application may adopt the composition structure shown in Figure 11 or include the components shown in Figure 11.
  • Figure 11 is a schematic structural diagram of a communication device 1100 provided by an embodiment of the present application.
  • the communication device 1100 can be a chip or a system-on-chip in the sub-node device.
  • the communication device 1100 may be a chip or a system-on-chip in the central node device.
  • the communication device 1100 may include a processor 1101, a communication line 1102 and a transceiver 1103.
  • the processor 1101, the memory 1104 and the transceiver 1103 may be connected through a communication line 1102.
  • the processor 1101 may include one or more CPUs, such as CPU0 and CPU1 in FIG. 11 .
  • the communication device 1100 includes multiple processors.
  • the processor 1101 in Figure 11 it may also include a processor 1107.
  • the processor 1101 can be a central processing unit (CPU), a general-purpose processor, a network processor (NP), a digital signal processor (DSP), a microprocessor, a microcontroller, a programmable logic device (PLD), or any combination thereof.
  • the processor 1101 may also be other devices with processing functions, such as circuits, devices or software modules.
  • the communication line 1102 is used to transmit information between various components included in the communication device 1100 .
  • Transceiver 1103 used to communicate with other devices or other communication networks.
  • the other communication network may be Ethernet, wireless access network (radio access network, RAN), wireless local area networks (wireless local area networks, WLAN), etc.
  • the transceiver 1103 may be an interface circuit, a pin, a radio frequency module, a transceiver, or any device capable of communication.
  • the communication device 1100 may also include a memory 1104.
  • Memory 1104 used to store instructions.
  • the instructions may be computer programs.
  • the memory 1104 can be a read-only memory (ROM) or other type of static storage device that can store static information and/or instructions, a random access memory (RAM) or other type of dynamic storage device that can store information and/or instructions, an electrically erasable programmable read-only memory (EEPROM), a compact disc read-only memory (CD-ROM) or other optical disc storage (including compressed optical discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media, or other magnetic storage devices.
  • the memory 1104 may exist independently of the processor 1101, or may be integrated with the processor 1101.
  • the memory 1104 can be used to store instructions or program codes or some data.
  • the memory 1104 may be located within the communication device 1100 or outside the communication device 1100, without limitation.
  • the communication device 1100 also includes an output device 1105 and an input device 1106.
  • the input device 1106 is a device such as a keyboard, a mouse, a microphone, or a joystick
  • the output device 1105 is a device such as a display screen, a speaker, or the like.
  • the communication device 1100 may be a desktop computer, a portable computer, a network server, a mobile phone, a tablet computer, a wireless terminal, an embedded device, a chip system, or a device with a similar structure as shown in FIG. 11 .
  • the composition structure shown in FIG. 11 does not constitute a limitation of the communication device.
  • the communication device may include more or fewer components than shown in the figure, some components may be combined, or a different component arrangement may be used.
  • the chip system may be composed of chips, or may include chips and other discrete devices.
  • FIG 12 shows a structural diagram of a communication device 120, which is applied to central node equipment.
  • Each module in the device shown in Figure 12 has the function of realizing the corresponding steps in Figure 8 and can achieve its corresponding technical effects.
  • the functions described can be implemented by hardware, or can be implemented by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the above functions.
  • the communication device may be a chip or a system on a chip in the central node device.
  • the communication device includes:
  • the receiving module 121 is used to receive, from multiple child node devices (the child node devices participating in the federated learning of the model), capability information used to characterize each child node device's ability to train the model; the sending module 122 is used to send, to each of the multiple child node devices, the model and first indication information used to indicate the first model parameters to be updated when the child node device trains the model, where the first model parameters to be updated are determined based on the capability information of the child node device; the receiving module 121 is also configured to receive the updated first model parameters from the multiple child node devices; the processing module 123 is configured to fuse the global model parameters of the multiple child node devices through a preset fusion algorithm to obtain the fusion parameters, where the global model parameters include the updated first model parameters and the second model parameters in the model other than the first model parameters; and the sending module 122 is used to send the fusion parameters to the multiple child node devices.
  • the processing module 123 is specifically used to: predict the model training durations of the multiple child node devices based on the capability information, and determine the first model parameters of each child node device based on its model training duration.
  • the processing module 123 is specifically used to:
  • when the model training duration does not meet the preset duration condition, select some parameters of the model and determine them as the first model parameters; when the model training duration meets the preset duration condition, select all parameters of the model and determine them as the first model parameters.
  • the processing module 123 is specifically used to:
  • Some parameters of the model are randomly selected and determined as the first model parameters.
  • the model includes a first network layer and a second network layer; the model application scenario indicates the first network layer, and the first network layer and the second network layer are different; the processing module 123 is specifically used to select the first network layer. Parameters of a network layer and part of parameters of a second network layer are determined as first model parameters.
  • processing module 123 is also used to:
  • the first network layer required to build the model is determined according to the model application scenario.
  • the capability information is also used to characterize the model type that the child node device can train, and the processing module 123 is also used to:
  • the child node device whose capability information includes a model type of the model is determined as a target device for sending the first indication information and the model.
  • the processing module 123 is specifically used to:
  • for model parameters with the same function across different sub-node devices, the weight of the second model parameters is set to the first preset weight and the weight of the first model parameters is set to the second preset weight, and a weighted average is performed to obtain the fusion parameters; or, for model parameters with the same function across different sub-node devices, the weight of the second model parameters and the weight of the first model parameters are both set to the second preset weight, and a weighted average is performed to obtain the fusion parameters.
  • multiple sub-node devices send capability information to the central node device; the central node device then sends, based on the capability information, the first indication information and the model to each of the multiple child node devices; next, the multiple child node devices train the model based on the training set and the model parameters to be updated to obtain the updated first model parameters, and the central node device fuses, through a preset fusion algorithm, the global model parameters of the multiple child node devices, which include the first model parameters, to obtain the fusion parameters; the multiple child node devices then train the model based on the fusion parameters. This allows a sub-node device to freeze some parameters during model training so that they do not participate in training, which ensures that the model training progress is not held back by child node devices with low computing power, making model training more efficient.
  • first predict the model training durations of the multiple child node devices based on the capability information and then determine the first model parameters of each child node device based on its model training duration, so that the determined first model parameters to be trained match the corresponding child node devices, ensuring model training progress.
  • some parameters of the model are randomly selected and determined as the first model parameters, so that the determined first model parameters to be trained can match the corresponding child node devices to ensure the progress of model training.
  • the parameters of the first network layer and some parameters of the second network layer are selected in a targeted manner and determined as the first model parameters, taking into account the requirements of the model application scenario on the network layer and ensuring model training performance.
  • the first network layer required to build the model is determined according to the model application scenario, taking into account the requirements of the application scenario on the network layer and ensuring model training performance.
  • the sub-node device whose capability information includes the model type of the model is determined as the target device for sending the first indication information and the model, ensuring that all sub-node devices participating in the training can train the model and guaranteeing model training progress.
  • a preset fusion algorithm is designed so that the second model parameters cannot affect the fusion parameters, and the fusion parameters are calculated so that the unupdated second model parameters cannot affect the model training, thus ensuring the model training performance.
  • a preset fusion algorithm whose second model parameters can affect the fusion parameters is also designed to calculate the fusion parameters. The determined fusion parameters can participate in model training, ensuring the progress of model training.
  • FIG 13 shows a structural diagram of a communication device 130, which is applied to sub-node devices.
  • Each module in the device shown in Figure 13 has the function of realizing the corresponding steps in Figure 8 and can achieve its corresponding technical effects.
  • the functions described can be implemented by hardware, or can be implemented by hardware executing corresponding software.
  • the hardware or software includes one or more modules corresponding to the above functions.
  • the communication device may be a chip or a system-on-chip in the sub-node device.
  • the communication device includes:
  • the sending module 131 is configured to send capability information representing the training model capability of the child node device to the central node device.
  • the receiving module 132 is configured to receive the first indication information and the model from the central node device for indicating the first model parameters to be updated when the child node device trains the model.
  • the first model parameters of the child node device to be updated are determined based on the capability information of the child node device.
  • the processing module 133 is used to train the model based on the training set (including the beam information collected by the child node device) and the model parameters to be updated, and obtain the updated first model parameters.
  • the sending module 131 is configured to send the updated first model parameters to the central node device.
  • the receiving module 132 is configured to receive fusion parameters calculated from the central node device according to the updated first model parameters.
  • the processing module 133 is used to train the model according to the fusion parameters.
  • the receiving module 132 is used to receive beam information.
  • the processing module 133 is used to input beam information into the model and output target beam information.
  • first, multiple child node devices send capability information to the central node device; the central node device then sends, based on the capability information, the first indication information and the model to each of the multiple child node devices; next, the multiple child node devices train the model based on the training set and the model parameters to be updated to obtain the updated first model parameters, and the central node device fuses, through a preset fusion algorithm, the global model parameters of the multiple child node devices, which include the first model parameters, to obtain the fusion parameters; the multiple child node devices then train the model based on the fusion parameters. This allows a sub-node device to freeze some parameters during model training so that they do not participate in training, which ensures that the model training progress is not held back by child node devices with low computing power, making model training more efficient.
  • first predict the model training durations of the multiple child node devices based on the capability information and then determine the first model parameters of each child node device based on its model training duration, so that the determined first model parameters to be trained match the corresponding child node devices, ensuring model training progress.
  • some parameters of the model are randomly selected and determined as the first model parameters, so that the determined first model parameters to be trained can match the corresponding child node devices to ensure the progress of model training.
  • the parameters of the first network layer and some parameters of the second network layer are selected in a targeted manner and determined as the first model parameters, taking into account the requirements of the model application scenario on the network layer and ensuring model training performance.
  • the first network layer required to build the model is determined according to the model application scenario, taking into account the requirements of the application scenario on the network layer and ensuring model training performance.
  • the sub-node device whose capability information includes the model type of the model is determined as the target device for sending the first indication information and the model, ensuring that all sub-node devices participating in the training can train the model and guaranteeing model training progress.
  • a preset fusion algorithm is designed so that the second model parameters cannot affect the fusion parameters, and the fusion parameters are calculated so that the unupdated second model parameters cannot affect the model training, thus ensuring the model training performance.
  • a preset fusion algorithm whose second model parameters can affect the fusion parameters is also designed to calculate the fusion parameters. The determined fusion parameters can participate in model training, ensuring the progress of model training.
  • FIG 14 is a structural diagram of a communication system provided by an embodiment of the present application.
  • the communication system is a communication system corresponding to a model training scenario.
  • the communication system may include: a central node device 140 and a sub-node device 141.
  • the central node device 140 may have the function of the above-mentioned communication device 120
  • the sub-node device 141 may have the function of the above-mentioned communication device 130.
  • the central node device 140 performs the following steps:
  • receive, from multiple child node devices (the child node devices participating in the federated learning of the model), capability information used to characterize each child node device's ability to train the model; send, to each of the multiple child node devices, the model and first indication information indicating the first model parameters to be updated when the child node device trains the model (determined according to the capability information of the child node device); receive the updated first model parameters from the multiple child node devices; fuse, through a preset fusion algorithm, the global model parameters of the multiple child node devices (including the updated first model parameters and the second model parameters in the model other than the first model parameters) to obtain the fusion parameters; and send the fusion parameters to the multiple child node devices.
  • first, multiple child node devices send capability information to the central node device; the central node device then sends, based on the capability information, the first indication information and the model to each of the multiple child node devices; next, the multiple child node devices train the model based on the training set and the model parameters to be updated to obtain the updated first model parameters, and the central node device fuses, through a preset fusion algorithm, the global model parameters of the multiple child node devices, which include the first model parameters, to obtain the fusion parameters; the multiple child node devices then train the model based on the fusion parameters. This allows a sub-node device to freeze some parameters during model training so that they do not participate in training, which ensures that the model training progress is not held back by child node devices with low computing power, making model training more efficient.
  • the central node device 140 also performs the following steps:
  • the model training durations of the multiple sub-node devices are first predicted based on the capability information, and the first model parameters of each sub-node device are then determined based on its model training duration, so that the determined first model parameters to be trained match the corresponding sub-node device, ensuring model training progress.
  • the first model parameter of the child node device is determined based on the model training duration, including:
  • when the model training duration does not meet the preset duration condition, some parameters of the model are selected and determined as the first model parameters; when the model training duration meets the preset duration condition, all parameters of the model are selected and determined as the first model parameters.
  • some parameters of the selected model are determined as the first model parameters, including:
  • Some parameters of the model are randomly selected and determined as the first model parameters.
  • some parameters of the model are randomly selected and determined as the first model parameters, so that the determined first model parameters to be trained can be matched with the corresponding child node devices to ensure the progress of model training.
  • the model includes a first network layer (the model application scenario indicates the first network layer) and a second network layer.
  • the first network layer and the second network layer are different; selecting some parameters of the model and determining them as the first model parameters includes:
  • Parameters of the first network layer and some parameters of the second network layer are selected and determined as first model parameters.
  • the parameters of the first network layer and some parameters of the second network layer are selected as the first model parameters in a targeted manner, taking into account the requirements of the model application scenario for the network layer and ensuring the model training performance.
  • before selecting the parameters of the first network layer and determining them as the first model parameters, the central node device 140 further performs the following steps:
  • the first network layer required to build the model is determined according to the model application scenario.
  • determining the first network layer according to the model application scenario takes into account the requirements of the application scenario on the network layer, ensuring model training performance.
  • the capability information is also used to characterize the model type that the child node device can train, and the central node device 140 also performs the following steps:
  • the child node device whose capability information includes a model type of the model is determined as a target device for sending the first indication information and the model.
  • the child node device whose capability information includes the model type of the model is determined as the target device for sending the first indication information and the model, ensuring that all child node devices participating in the training can train the model and guaranteeing model training progress.
  • the global model parameters of multiple sub-node devices are fused through a preset fusion algorithm to obtain the fusion parameters, including:
  • for model parameters with the same function across different sub-node devices, the weight of the second model parameters is set to the first preset weight and the weight of the first model parameters is set to the second preset weight, and a weighted average is performed to obtain the fusion parameters; or, the weight of the second model parameters and the weight of the first model parameters are both set to the second preset weight, and a weighted average is performed to obtain the fusion parameters.
  • a preset fusion algorithm is designed so that the second model parameters cannot affect the fusion parameters, and the fusion parameters are calculated so that the unupdated second model parameters cannot affect the model training, thus ensuring the model training performance.
  • a preset fusion algorithm whose second model parameters can affect the fusion parameters is also designed to calculate the fusion parameters. The determined fusion parameters can participate in model training, ensuring the progress of model training.
  • the child node device 141 performs the following steps:
  • send, to the central node device, capability information characterizing the child node device's ability to train the model; receive, from the central node device, the model and first indication information indicating the first model parameters to be updated when the child node device trains the model, where the first model parameters to be updated are determined based on the capability information of the child node device; train the model based on the training set (including the beam information collected by the child node device) and the model parameters to be updated, to obtain the updated first model parameters; send the updated first model parameters to the central node device; receive the fusion parameters calculated by the central node device according to the updated first model parameters; and train the model according to the fusion parameters.
  • the target beam is selected through a model, and the efficiency of selecting the target beam is relatively high.
  • An embodiment of the present application also provides a computer-readable storage medium. All or part of the processes in the above method embodiments can be completed by instructing relevant hardware through a computer program.
  • the program can be stored in the above computer-readable storage medium. When executed, the program can include the processes of the above method embodiments.
  • the computer-readable storage medium may be an internal storage unit of the terminal device of any of the aforementioned embodiments (including the data sending end and/or the data receiving end), such as a hard disk or memory of the terminal device.
  • the above computer-readable storage medium may also be an external storage device of the above terminal device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the terminal device.
  • the computer-readable storage medium may also include both an internal storage unit of the terminal device and an external storage device.
  • the above computer-readable storage medium is used to store the above computer program and other programs and data required by the above terminal device.
  • the above-mentioned computer-readable storage media can also be used to temporarily store data that has been output or is to be output.
  • An embodiment of the present application also provides a computer instruction. All or part of the processes in the above method embodiments can be completed by computer instructions to instruct related hardware (such as computers, processors, network equipment, terminals, etc.).
  • the program can be stored in the above-mentioned computer-readable storage medium.
  • An embodiment of the present application also provides a chip system.
  • the chip system may be composed of chips, or may include chips and other discrete devices without limitation.
  • the chip system includes a processor and a transceiver. All or part of the processes in the above method embodiments can be completed by the chip system.
  • the chip system can be used to implement the functions performed by the central node device in the above method embodiments, or, Implement the functions performed by the sub-node device in the above method embodiment.
  • the above chip system also includes a memory, the memory is used to save program instructions and/or data.
  • the processor executes the program instructions stored in the memory to enable the chip system to perform the functions performed by the central node device in the above method embodiments, or to perform the functions performed by the sub-node devices in the above method embodiments.
  • the processor may be a general-purpose processor, a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute each method, step, and logical block diagram disclosed in the embodiments of this application.
  • a general-purpose processor may be a microprocessor or any conventional processor, etc. The steps of the methods disclosed in conjunction with the embodiments of the present application can be directly implemented by a hardware processor for execution, or can be executed by a combination of hardware and software modules in the processor.
  • the memory may be a non-volatile memory, such as a hard disk drive (HDD) or a solid-state drive (SSD), or it may be a volatile memory, such as a random-access memory (RAM).
  • the memory may also be, but is not limited to, any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
  • the memory in the embodiment of the present application can also be a circuit or any other device capable of realizing a storage function, used to store instructions and/or data.
  • "at least one (item)" refers to one or more; "multiple" refers to two or more; "at least two (items)" refers to two or more.
  • "and/or" is used to describe the relationship between associated objects, indicating that three relationships can exist; for example, A and/or B can mean: only A exists, both A and B exist, or only B exists, where A and B can be singular or plural.
  • the character “/" generally indicates that the related objects are in an "or” relationship.
  • "at least one of the following" or similar expressions refers to any combination of these items, including any combination of a single item or multiple items.
  • for example, at least one of a, b, or c can mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where a, b, and c can each be single or multiple.
  • "B corresponding to A" means that B is associated with A.
  • B can be determined based on A.
  • determining B based on A does not mean determining B solely based on A, but may also determine B based on A and/or other information.
  • connection appearing in the embodiments of this application refers to various connection methods such as direct connection or indirect connection to realize communication between devices, and the embodiments of this application do not limit this in any way.
  • "transmission" in the embodiments of this application refers to bidirectional transmission, including the actions of sending and/or receiving; that is, it includes the sending of data, the receiving of data, or both.
  • the data transmission here includes uplink and/or downlink data transmission; data may include channels and/or signals, where uplink data transmission means uplink channel and/or uplink signal transmission, and downlink data transmission means downlink channel and/or downlink signal transmission.
  • the "network” and “system” appearing in the embodiments of this application express the same concept, and the communication system is the communication network.
  • the disclosed devices and methods can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of modules or units is only a logical function division.
  • there may be other division methods; for example, multiple units or components may be combined or integrated into another device, or some features may be omitted or not implemented.
  • the coupling or direct coupling or communication connection between each other shown or discussed may be through some interfaces, and the indirect coupling or communication connection of the devices or units may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated.
  • the components shown as units may be one physical unit or multiple physical units; that is, they may be located in one place or distributed across multiple different places. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application can be integrated into one processing unit, each unit can exist physically alone, or two or more units can be integrated into one unit.
  • the above integrated units can be implemented in the form of hardware or software functional units. If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a readable storage medium.
  • the technical solutions of the embodiments of the present application, in essence, or the part contributing to the existing technology, or all or part of the technical solutions, can be embodied in the form of a software product; the software product is stored in a storage medium and includes a number of instructions to cause a device (such as a microcontroller or a chip) or a processor to execute all or part of the steps of the methods described in the various embodiments of this application.
  • the aforementioned storage media include: U disk, mobile hard disk, ROM, RAM, magnetic disk or optical disk and other media that can store program codes.


Abstract

This application discloses a model training method and a communication device, relating to the field of communications. The model training method includes: receiving, from multiple child node devices, capability information used to characterize each child node device's ability to train a model; sending, to each of the multiple child node devices, the model and first indication information indicating the first model parameters to be updated when the child node device trains the model; receiving the updated first model parameters from the multiple child node devices; fusing the global model parameters of the multiple child node devices through a preset fusion algorithm to obtain fusion parameters; and sending the fusion parameters to the multiple child node devices. This solves the problem of low training efficiency of existing model training that adopts a federated learning architecture.

Description

Model training method and communication device
This application claims priority to the Chinese patent application with application number 202210872876.X, entitled "Model training method and communication device", filed with the State Intellectual Property Office on July 21, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of communications, and in particular to a model training method and a communication device.
Background
Under the traditional federated learning architecture, since each child node device downloads a unified model from the central node device for local training, and considering that the capabilities of the child node devices vary, the efficiency of training the model is limited by the child node device with the worst capability, so model training efficiency is low.
Summary
This application provides a model training method and a communication device to solve the problem of low training efficiency of existing model training that adopts federated learning.
To achieve the above purpose, this application adopts the following technical solutions:
In a first aspect, this application provides a model training method, applied to a central node device. The method includes:
receiving, from multiple child node devices (the child node devices participating in the federated learning of the model), capability information used to characterize each child node device's ability to train the model; sending, to each of the multiple child node devices, the model and first indication information indicating the first model parameters to be updated when the child node device trains the model (determined according to the capability information of the child node device); receiving the updated first model parameters from the multiple child node devices; fusing, through a preset fusion algorithm, the global model parameters of the multiple child node devices (including the updated first model parameters and the second model parameters in the model other than the first model parameters) to obtain fusion parameters; and sending the fusion parameters to the multiple child node devices.
In the first aspect, first, multiple child node devices send capability information to the central node device; the central node device then sends the first indication information and the model to each of the multiple child node devices based on the capability information; next, the multiple child node devices train the model based on the training set and the model parameters to be updated to obtain the updated first model parameters, and the central node device fuses, through a preset fusion algorithm, the global model parameters of the multiple child node devices, which include the first model parameters, to obtain the fusion parameters; the multiple child node devices then train the model based on the fusion parameters. This allows a child node device to freeze some parameters during model training so that they do not participate in training, which ensures that the model training progress is not held back by child node devices with low computing power, making model training more efficient.
In a possible implementation, the method further includes:
predicting the model training durations of the multiple child node devices respectively based on the capability information, and determining the first model parameters of each child node device based on its model training duration.
In this implementation, the model training durations of the multiple child node devices are first predicted based on the capability information, and the first model parameters of each child node device are then determined based on the model training duration, so that the determined first model parameters to be trained match the corresponding child node device, ensuring model training progress.
In a possible implementation, determining the first model parameters of a child node device based on the model training duration includes:
when the model training duration does not meet the preset duration condition, selecting some parameters of the model and determining them as the first model parameters; when the model training duration meets the preset duration condition, selecting all parameters of the model and determining them as the first model parameters.
In this implementation, according to the different comparison results between the model training duration and the preset duration condition, the two strategies of determining some parameters of the model as the first model parameters and determining all parameters of the model as the first model parameters are adopted accordingly, ensuring model performance while ensuring model training progress.
In a possible implementation, selecting some parameters of the model and determining them as the first model parameters includes:
randomly selecting some parameters of the model and determining them as the first model parameters.
In this implementation, some parameters of the model are randomly selected and determined as the first model parameters, so that the determined first model parameters to be trained match the corresponding child node device, ensuring model training progress.
In a possible implementation, the model includes a first network layer (indicated by the model application scenario) and a second network layer, and the first network layer and the second network layer are different; selecting some parameters of the model and determining them as the first model parameters includes:
selecting the parameters of the first network layer and some parameters of the second network layer and determining them as the first model parameters.
In this implementation, the parameters of the first network layer and some parameters of the second network layer are selected in a targeted manner and determined as the first model parameters, which takes into account the requirements of the model application scenario on the network layer and ensures model training performance.
In a possible implementation, before selecting the parameters of the first network layer and determining them as the first model parameters, the method further includes:
determining the first network layer required to build the model according to the model application scenario.
In this implementation, the first network layer required to build the model is determined according to the model application scenario, which takes into account the requirements of the model application scenario on the network layer and ensures model training performance.
In a possible implementation, the capability information is also used to characterize the model types that a child node device can train, and the method further includes:
determining a child node device whose capability information includes the model type of the model as a target device for sending the first indication information and the model.
In this implementation, a child node device whose capability information includes the model type of the model is determined as a target device for sending the first indication information and the model, which ensures that all child node devices participating in the training can train the model, guaranteeing model training progress.
In a possible implementation, fusing the global model parameters of the multiple child node devices through a preset fusion algorithm to obtain the fusion parameters includes:
for model parameters with the same function across different child node devices, setting the weight of the second model parameters to the first preset weight and the weight of the first model parameters to the second preset weight, and performing a weighted average to obtain the fusion parameters; or, for model parameters with the same function across different child node devices, setting both the weight of the second model parameters and the weight of the first model parameters to the second preset weight, and performing a weighted average to obtain the fusion parameters.
In this implementation, a preset fusion algorithm in which the second model parameters cannot affect the fusion parameters is designed for calculating the fusion parameters, so that the non-updated second model parameters cannot affect model training, ensuring model training performance. A preset fusion algorithm in which the second model parameters can affect the fusion parameters is also designed for calculating the fusion parameters, and the determined fusion parameters can participate in model training, ensuring model training progress.
In a second aspect, this application provides a method for selecting beam information, applied to a child node device, including:
receiving beam information, inputting the beam information into a model (trained according to the method of the first aspect), and outputting target beam information.
In the second aspect, the target beam is selected through the model, and the efficiency of selecting the target beam is high.
In a third aspect, this application provides a model training method, applied to a child node device, including:
sending, to the central node device, capability information used to characterize the child node device's ability to train the model; receiving, from the central node device, the model and first indication information used to indicate the first model parameters to be updated when the child node device trains the model, where the first model parameters to be updated are determined according to the capability information of the child node device; training the model based on the training set, which includes the beam information collected by the child node device, and the model parameters to be updated, to obtain the updated first model parameters; sending the updated first model parameters to the central node device; receiving, from the central node device, the fusion parameters calculated according to the updated first model parameters; and training the model according to the fusion parameters.
第四方面,本申请提供一种通信装置,该通信装置可以为中心节点设备的芯片或者片上系统,包括:
接收模块,用于接收来自多个子节点设备(参与模型的联邦学习的子节点设备)用于表征子节点设备训练模型能力的能力信息;发送模块,用于向多个子节点设备中的每个子节点设备发送用于指示子节点设备训练模型时待更新的第一模型参数的第一指示信息和模型,该子节点设备的待更新的第一模型参数根据子节点设备的能力信息确定;接收模块,还用于接收来自多个子节点设备的更新后的第一模型参数;处理模块,用于通过预设融合算法对多个子节点设备的全局模型参数进行融合,得到融合参数;该全局模型参数包括更新后的第一模型参数以及模型中第一模型参数之外的第二模型参数;发送模块,用于向多个子节点设备发送融合参数。
在一种可能的实现方式中,处理模块,具体用于:
根据能力信息分别预测多个子节点设备的模型训练时长;根据模型训练时长确定子节点设备的第一模型参数。
在一种可能的实现方式中,处理模块,具体用于:
在模型训练时长不符合预设时长条件的情况下,选取模型的部分参数确定为第一模型参数;在模型训练时长符合预设时长条件的情况下,选取模型的全部参数确定为第一模型参数。
在一种可能的实现方式中,处理模块,具体用于:
随机选取模型的部分参数确定为第一模型参数。
在一种可能的实现方式中,模型包括第一网络层和第二网络层;模型应用场景指示第一网络层,第一网络层和第二网络层不同;处理模块,具体用于选取第一网络层的参数以及第二网络层的部分参数确定为第一模型参数。
在一种可能的实现方式中,处理模块还用于:
在选取第一网络层中预设第一网络层的参数确定为第一模型参数之前,根据模型应用场景确定构建模型所需的第一网络层。
在一种可能的实现方式中,能力信息还用于表征子节点设备能够训练的模型类型,处理模块还用于:
将能力信息包括模型的模型类型的子节点设备确定为发送第一指示信息和模型的目标设备。
在一种可能的实现方式中,处理模块,具体用于:
对于不同子节点设备的相同功能的模型参数,将第二模型参数的权重设为第一预设权值,第一模型参数的权重设为第二预设权值,进行加权求均值,得到融合参数;或者,对于不同子节点设备的相同功能的模型参数,将第二模型参数的权重和第一模型参数的权重均设为第二预设权值,进行加权求均值,得到融合参数。
第五方面,本申请提供一种通信装置,该通信装置可以为子节点设备的芯片或者片上系统,包括:
接收模块,用于接收波束信息;处理模块,用于将波束信息输入模型,输出目标波束信息;该模型根据第四方面的通信装置训练得到。
第六方面,本申请提供一种通信装置,该通信装置可以为子节点设备的芯片或者片上系统,包括:
发送模块,用于向中心节点设备发送用于表征子节点设备训练模型能力的能力信息;接收模块,用于接收来自中心节点设备的用于指示子节点设备训练模型时待更新的第一模型参数的第一指示信息和模型,该子节点设备的待更新的第一模型参数根据子节点设备的能力信息确定;处理模块,用于根据训练集(包括子节点设备采集到的波束信息)和待更新模型参数对模型进行训练,得到更新后的第一模型参数;发送模块,用于向中心节点设备发送更新后的第一模型参数;接收模块,用于接收来自中心节点设备根据更新后的第一模型参数计算得到的融合参数,处理模块,用于根据融合参数训练模型。
第七方面,本申请提供一种通信装置,通信装置包括处理器和收发器,处理器和收发器用于支持通信装置执行第一方面或者第二方面或者第三方面的方法。进一步的,该通信装置还可以包括存储器,该存储器存储有计算机指令,当处理器运行该计算机指令时,可以执行第一方面或者第二方面或者第三方面的方法。
第八方面,本申请提供一种计算机可读存储介质,计算机可读存储介质存储计算机指令,当计算机指令运行时,第一方面或者第二方面或者第三方面的方法被执行。
第九方面,本申请提供一种包含指令的计算机程序产品,当其在计算机上运行时,使得计算机可以执行上述第一方面或者第二方面或者第三方面的方法。
第十方面,本申请提供一种芯片,该芯片包括处理器和收发器,处理器和收发器用于支持通信装置执行第一方面或者第二方面或者第三方面的方法。
其中,本申请中第四方面至第十方面描述的有益效果,可以对应参考第一方面至第三方面的有益效果分析,此处不再赘述。
附图说明
图1为本申请实施例提供的一种神经元结构示意图;
图2为本申请实施例提供的一种DNN的神经网络结构示意图;
图3为本申请实施例提供的一种经典联邦学习的系统架构示意图;
图4-a为本申请实施例提供的一种联邦学习流程示意图;
图4-b为本申请实施例提供的另一种联邦学习流程示意图;
图5为本申请实施例提供的另一种联邦学习流程示意图;
图6为本申请实施例提供的一种差异化网络参数更新原理示意图;
图7为本申请实施例提供的一种通信系统的示意图;
图8为本申请实施例提供的一种模型的训练方法的流程示意图;
图9为本申请实施例提供的一种模型网络结构示意图;
图10为本申请实施例提供的另一种模型的训练方法的流程示意图;
图11为本申请实施例提供的一种通信装置的结构图;
图12为本申请实施例提供的另一种通信装置的结构图;
图13为本申请实施例提供的另一种通信装置的结构图;
图14为本申请实施例提供的一种通信系统的结构图。
具体实施方式
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行描述。其中,在本申请的描述中,除非另有说明,“/”表示前后关联的对象是一种“或”的关系,例如,A/B可以表示A或B;本申请中的“和/或”仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况,其中A,B可以是单数或者复数。并且,在本申请的描述中,除非另有说明,“多个”是指两个或多于两个。“以下至少一项(个)”或其类似表达,是指的这些项中的任意组合,包括单项(个)或复数项(个)的任意组合。例如,a,b,或c中的至少一项(个),可以表示:a,b,c,a-b,a-c,b-c,或a-b-c,其中a,b,c可以是单个,也可以是多个。另外,为了便于清楚描述本申请实施例的技术方案,在本申请的实施例中,采用了“第一”、“第二”等字样对功能和作用基本相同的相同项或相似项进行区分。本领域技术人员可以理解“第一”、“第二”等字样并不对数量和执行次序进行限定,并且“第一”、“第二”等字样也并不限定一定不同。同时,在本申请实施例中,“示例性的”或者“例如”等词用于表示作例子、例证或说明。本申请实施例中被描述为“示例性的”或者“例如”的任何实施例或设计方案不应被解释为比其它实施例或设计方案更优选或更具优势。确切而言,使用“示例性的”或者“例如”等词旨在以具体方式呈现相关概念,便于理解。
此外,本申请实施例描述的网络架构以及业务场景是为了更加清楚的说明本申请实施例的技术方案,并不构成对于本申请实施例提供的技术方案的限定,本领域普通技术人员可知,随着网络架构的演变和新业务场景的出现,本申请实施例提供的技术方案对于类似的技术问题,同样适用。
在介绍本申请实施例之前,对本申请实施例涉及的一些名词进行解释。
人工智能:让机器具有人类的智能,应用计算机的软硬件来模拟人类某些智能行为,包括机器学习和很多其他方法。
机器学习:从原始数据中学习模型或规则,存在很多种不同的机器学习方法,如神经网络、决策树、支持向量机等。
AI模型:这里指将一定维度的输入映射到一定维度的输出的函数模型,其模型参数通过机器学习训练得到。例如,f(x)=ax^2+b就是一个二次函数模型,它可以看做一个AI模型,a和b对应模型的参数,可以通过机器学习训练得到。
神经网络:这里指人工神经网络,它是一种模仿动物神经网络行为特征,进行分布式并行信息处理的数学模型,是AI模型的一种特殊形式。
数据集:机器学习中用于模型训练、验证和测试的数据,数据的数量和质量将影响到机器学习的效果。
模型训练:通过选择合适的损失函数,应用优化算法对模型参数进行训练,使得损失函数值最小化。
损失函数:用于衡量模型的预测值和真实值之间的差别。
模型测试:训练后应用测试数据评估模型性能。
模型应用:应用训练好的模型去解决实际问题。
构建智能化社会适应万物互联的新一代信息基础设施,保障信息基础设施的安全,对于促进信息技术与实体经济融合、拓展数字经济空间具有重要意义。目前,全球已经掀起了人工智能(Artificial Intelligence,AI)应用的浪潮。在第五代移动通信系统(5th Generation Mobile Communication Technology,5G)网络中,多种业务模式并存,应用环境复杂多变。传统静态配置、人工维护、服务能力单一的方式已经无法满足5G的需求。在5G以及未来通信系统中引入AI,可以为网络、计算、应用等信息基础设施提供基于数据的感知、预测和管控能力,促进网络、计算、应用等基础设施的融合与协同。AI在越来越多的复杂场景下可以做出比人类更优的决策,无疑让网络智能化建设开拓了新的视野,给网络的发展带来了前所未有的新机遇,也为电信网络重构转型过程中遇到的众多困难和挑战提供了高效的加速解决路径。
机器学习(Machine Learning,ML)是实现人工智能的一种重要技术途径。机器学习可以分为监督学习、无监督学习、强化学习。
监督学习依据已采集到的样本值和样本标签,应用机器学习算法学习样本值到样本标签的映射关系,并用机器学习模型来表达学到的映射关系。训练机器学习模型的过程就是学习这种映射关系的过程。如信号检测中,含噪声的接收信号即为样本,该信号对应的真实星座点即为标签,机器学习期望通过训练学到样本与标签之间的映射关系,即,使机器学习模型学到一种信号检测器。在训练时,通过计算模型的预测值与真实标签的误差来优化模型参数。一旦映射关系学习完成,就可以应用学到映射来预测每一个新样本的样本标签。监督学习学到的映射关系可以包括线性映射、非线性映射。根据标签的类型可将学习的任务分为分类任务和回归任务。
无监督学习仅依据采集到的样本值,应用算法自行发掘样本的内在模式。无监督学习中有一类算法将样本自身作为监督信号,即模型学习从样本到样本的映射关系,称为自监督学习。训练时,通过计算模型的预测值与样本本身之间的误差来优化模型参数。自监督学习可用于信号压缩及解压恢复的应用,常见的算法包括自编码器和对抗生成型网络等。
强化学习不同于监督学习,是一类通过与环境进行交互来学习解决问题的策略的算法。与监督、无监督学习不同,强化学习问题并没有明确的“正确的”动作标签数据,算法需要与环境进行交互,获取环境反馈的奖励信号,进而调整决策动作以获得更大的奖励信号数值。如下行功率控制中,强化学习模型根据无线网络反馈的系统总吞吐率,调整各个用户的下行发送功率,进而期望获得更高的系统吞吐率。强化学习的目标也是学习环境状态与优选决策动作之间的映射关系。但因为无法事先获得“正确动作”的标签,所以不能通过计算动作与“正确动作”之间的误差来优化网络。强化学习的训练是通过与环境的迭代交互而实现的。
深度神经网络(Deep Neural Network,DNN)是机器学习的一种具体实现形式,DNN一般使用监督学习或无监督学习策略来优化模型参数。根据通用近似定理,神经网络理论上可以逼近任意连续函数,从而使得神经网络具备学习任意映射的能力。传统通信系统需要借助丰富的专家知识来设计通信模块,而基于DNN的深度学习通信系统可以从大量的数据集中自动发现隐含的模式结构,建立数据之间的映射关系,获得优于传统建模方法的性能。
DNN的思想来源于大脑组织的神经元结构。每个神经元都对其输入值做加权求和运算,并将加权求和结果通过一个非线性函数产生输出,如图1所示的神经元结构所示。具体的,假设神经元的输入为x=[x0,…,xn],与输入对应的权值为w=[w0,…,wn],加权求和的偏置为b,非线性函数的形式可以多样化,以最大值函数max{0,x}为例,对应的一个神经元执行的效果可以是y=max{0,w·x+b}。DNN一般具有多层结构,DNN的每一层都可包含多个神经元,输入层将接收到的数值经过神经元处理后,传递给中间的隐藏层。类似的,隐藏层再将计算结果传递给最后的输出层,产生DNN的最后输出,DNN的神经网络结构如图2所示。
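为便于理解上述神经元的加权求和与非线性运算,下面给出一个示意性的Python片段(仅为帮助理解的草图,并非本申请实施例的组成部分,其中的函数名与数值均为假设):

    import numpy as np

    def neuron(x, w, b):
        # 对输入做加权求和并加上偏置,再通过非线性函数max{0,x}(即ReLU)产生输出
        return np.maximum(0.0, np.dot(w, x) + b)

    x = np.array([1.0, -2.0, 0.5])   # 神经元的输入
    w = np.array([0.3, 0.1, -0.4])   # 与输入对应的权值
    b = 0.05                         # 加权求和的偏置
    print(neuron(x, w, b))           # 即 max{0, w·x + b}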
DNN一般具有多于一个的隐藏层,隐藏层往往直接影响提取信息和拟合函数的能力。增加DNN的隐藏层数或扩大每一层的宽度都可以提高DNN的函数拟合能力。每个神经元中加权值即为DNN网络模型的参数。模型参数通过训练过程得到优化,从而使得DNN网络具备提取数据特征、表达映射关系的能力。
根据网络的构建方式,DNN可分为前馈神经网络(Feedforward Neural Network,FNN)、卷积神经网络(Convolutional Neural Networks,CNN)和递归神经网络(Recurrent Neural Network,RNN)。图2所示即为一种FNN网络,其特点为相邻层的神经元之间两两完全相连,这使得FNN通常需要大量的存储空间,导致较高的计算复杂度。
CNN是一种专门用来处理具有类似网格结构的数据的神经网络。例如,时间序列数据(时间轴离散采样)和图像数据(二维离散采样)都可以认为是类似网格结构的数据。CNN并不一次性应用全部的输入信息做运算,而是采用一个固定大小的窗截取部分信息做卷积运算,这就大大降低了模型参数的计算量。另外根据窗截取的信息类型的不同(如同一副图中的人和物为不同类型信息),每个窗可以采用不同的卷积核运算,这使得CNN能更好的提取输入数据的特征。
RNN是一类应用反馈时间序列信息的DNN网络。它的输入包括当前时刻的新的输入值和自身在前一时刻的输出值。RNN适合获取在时间上具有相关性的序列特征,特别适用于语音识别、信道编译码等应用。
上述FNN、CNN、RNN为常见的神经网络结构,这些网络结构都是以神经元为基础而构造的。上述已经介绍,每个神经元都对其输入值做加权求和运算,并将加权求和结果通过一个非线性函数产生输出,则我们将神经网络中神经元加权求和运算的权值以及非线性函数称作神经网络的参数。以max{0,x}为非线性函数的神经元为例,执行上述操作的神经元的参数为权值w=[w0,…,wn]、加权求和的偏置b,以及非线性函数max{0,x}。一个神经网络所有神经元的参数构成这个神经网络的参数。
DNN可以应用联邦学习(Federated Learning,FL)进行训练,联邦学习是一个机器学习框架,其初衷是为了能有效帮助多个机构在满足用户隐私保护、数据安全的要求下,进行数据使用和机器学习建模,在联邦学习框架内,节点之间传递的不是数据本身,而是训练中得到的中间结果,例如模型参数或者梯度等。如图3所示,图3是经典联邦学习的系统架构示意图。如图3所示,用于经典联邦学习的系统架构可以包括中心节点301和多个分布节点(分布节点302a至分布节点302c)。分布节点302a至分布节点302c均与中心节点301通信连接,分布节点302a至分布节点302c各自属于不同的机构或公司。分布节点302a至分布节点302c中的每个分布节点均包括关于该分布节点的应用环境的分布数据集。联邦学习作为分布式的机器学习范式,可以有效解决数据孤岛问题,让参与方在不共享数据的基础上联合建模,能从技术上打破数据孤岛,实现AI协作。根据参与各方数据源分布的情况不同,联邦学习可以被分为三类:横向联邦学习、纵向联邦学习、联邦迁移学习。
横向联邦学习指的是在两个数据集的用户特征重叠较多而用户重叠较少的情况下,把数据集按照横向(即用户维度)切分,并取出双方用户特征相同而用户不完全相同的那部分数据进行训练。纵向联邦学习指的是在两个数据集的用户重叠较多而用户特征重叠较少的情况下,把数据集按照纵向(即特征维度)切分,并取出双方用户相同而用户特征不完全相同的那部分数据进行训练。联邦迁移学习指的是在两个数据集的用户与用户特征重叠都较少的情况下,不对数据进行切分,而可以应用迁移学习来克服数据或标签不足的情况。
以高频波束管理问题为例,基站侧在进行基于码本的同步信号和物理广播信道(Physical broadcast channel,PBCH)块,即同步信号块(Synchronization Signal Block,SSB)或者信道状态信息参考信号(Channel state information-reference signal,CSI-RS)的波束扫描时,不同位置的用户和基站间的信道不同,用户终端(简称用户)需要对接收到的SSB或者CSI-RS波束的参数进行测量,例如测量物理层参考信号接收功率(L1-reference signal receiving power,L1-RSRP),并将最大RSRP值所对应的波束作为目标波束,并反馈给基站。
用户终端应用AI/ML可以训练一个模型用于选择波束,例如,应用接收到的复数个SSB/CSI-RS信号(中的部分或全部)作为输入,或者接收的复数个SSB/CSI-RS信号的强度(RSRP)(中的部分或全部)作为输入,来推断出目标波束ID,并反馈给基站。每个用户可以收集自己的接收波束/信道信息以及对应的目标波束ID作为训练上述AI/ML模型的样本(即本地样本),但每个用户能够收集到的样本数量有限,训练样本中目标波束ID只是实际SSB/CSI-RS码本的一个子集,致使用户仅应用本地数据训练得到的模型性能会有局限。
如果用户将本地数据发送给服务器,服务器汇总各个用户的数据进行模型训练,虽然可以提升模型的性能,但会存在泄露用户的隐私信息的风险,例如通过信道能推测用户当前位置等信息。为了解决这一问题,可以应用联邦学习。在联邦学习框架中,用户对应子节点,网络设备对应中心节点。中心节点下发一个全局模型给每个参加联邦学习的用户,每个用户应用本地数据训练该全局模型,得到本地模型,将本地模型的参数信息,例如梯度,权值等(加密)发送给网络设备,网络设备进行模型融合(model aggregation,MA)更新全局模型,然后将更新后的全局模型发送给各个用户,用户继续更新本地模型,输出更新后的本地模型的参数信息并发送给中心节点,如此进行多次迭代,直到模型收敛。
常规的联邦学习构架将AI运用到无线网络中,首先要进行训练数据的收集,比如标签信息,然而某些用户本地的数据可能涉及用户隐私问题,例如在波束管理问题以及定位场景中,用户设备(User Equipment,UE)可以收集到包含用户信道,目标波束ID,地理位置等数据。由于单个UE收集到的数据可能不具备普遍性(比如上文提到的用户由于其地理位置,可能反馈的目标波束ID仅为整个码本的子集),无法遍历真实数据的所有可能的分布,并且单个UE收到的数据量可能也比较少,导致仅依靠单独一个UE的数据难以训练出性能足够好的神经网络,而如果所有UE都将原始数据发送给中心节点,由中心节点进行统一训练,不仅会造成极大的空口开销,还可能涉及隐私泄露的问题。
在上述无线通信场景中,需要使不同UE对同一类型的数据进行分布式训练,适合使用横向联邦学习。横向联邦学习的主要流程如下:
步骤1.中心节点(例如基站或AI网元)将全局模型下发给子节点(例如UE)。
步骤2.每个子节点应用本地数据训练模型,将梯度、权值或本地模型(加密后)上传给中心节点,中心节点聚合各子节点的梯度或本地模型更新全局模型参数。
重复上述两个步骤直至模型收敛。
如图4-a所示联邦学习流程,以3个子节点(用户1,2,3)为例,首先中心节点向子节点发送初始模型(即图中的下载全局模型M0),用户1、2、3分别根据本地数据1、2、3训练(全局)模型,得到模型M1、M2、M3,并将模型发送给中心节点,中心节点对接收到的各个子节点的模型进行模型融合得到更新的全局模型,例如,使用融合算法FedAvg(Federated averaging),即M0=Σ_{k=1}^{3}(nk/n)·Mk,其中nk为第k个子节点的本地样本数,n为各子节点样本数之和。得到新的全局模型后,中心节点将该更新后的M0再次发送给各个子节点,子节点根据收到的全局模型进行新的一轮本地训练,再将训练后的模型发送给中心节点,如此进行迭代,直到模型收敛。用户可以将训练好的本地模型上传至中心节点,也可以仅上传训练得到的本地模型的网络参数,比如梯度或者权值,中心节点可以恢复出各个本地模型或者直接在网络参数上进行融合。
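为便于理解上述FedAvg融合流程,下面给出一个按各子节点样本数加权求均值的示意性Python草图(并非本申请实施例的组成部分,其中的函数名与数值均为示意性假设):

    import numpy as np

    def fedavg(client_params, client_sample_counts):
        # client_params: 各子节点上传的模型参数(这里以参数向量表示)
        # client_sample_counts: 各子节点的本地样本数nk,作为加权权重
        total = sum(client_sample_counts)
        fused = np.zeros_like(client_params[0])
        for params, n_k in zip(client_params, client_sample_counts):
            fused += (n_k / total) * params
        return fused

    M1 = np.array([0.2, 0.4])
    M2 = np.array([0.1, 0.3])
    M3 = np.array([0.3, 0.5])
    M0 = fedavg([M1, M2, M3], [100, 200, 100])   # 融合后的全局模型参数
    print(M0)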
基于传统联邦学习构架,由于各用户要从中心节点下载统一的模型进行本地训练,考虑到参与训练的用户设备能力各异(能力差异可以指代同一用户设备在不同时刻由于电量或者运行其他程序导致的算力波动),算力可以理解为终端设备的计算能力或者计算速度,一种可以衡量终端设备算力的指标即每秒浮点数运算次数(FLOPS,floating point operations per second)。由于在联邦学习的构架下,子节点(即参与某一轮联邦学习的用户)需要承担训练网络的职责。而通常情况下,不同模型的复杂度不同,一种常用的衡量模型复杂度的指标即浮点数运算数(FLOPs,floating point operations)。具体模型的复杂度不同可以体现在:网络类型不同,比如卷积网络CNN,全连接网络FC,循环神经网络RNN;相同网络类型具备不同的网络深度/层数,比如多层CNN或者FC;神经元数量不同,比如FC网络中隐藏层的神经元数量多少。可见,不同终端设备由于算力差异,训练相同模型的能力不同,同一终端设备训练不同模型所需要的算力不同。可能会出现如下3种情况:
情况1:所有参与训练的用户通过能力上报使中心节点获知其算力(训练能力,内存等)信息,中心节点下发所有人都能训练的模型,如图4-b所示。这种情况下,训练模型复杂度受制于算力较弱的用户。中心节点根据所有用户上报的算力调整全局模型,使得上一情况中算力较弱的用户1也能训练该模型,但可能造成全局模型的性能受制于算力最弱的用户。
情况2:下载统一模型,部分用户无法训练,该部分用户最后使用全局模型可能有损失,如图5所示。用户1下载M0后由于算力限制无法训练该全局模型,导致其本地数据的特征无法被全局模型学习,最终可能导致使用全局模型预测用户1的波束时性能下降。
情况3:中心节点对不同用户终端进行差异化模型更新。具体的,基站广播多个全局模型(依据子节点上报的算力分组),或者可选的,同时广播一个分类神经网络。子节点根据预设条件判断自己属于全局模型或者分类神经网络中的哪个分组,然后更新该分组对应的本地模型并上报参数,同时上报自己更新的模型的索引。基站根据索引确定子节点上报参数对应的全局模型,并更新全局模型,然后在第二轮联邦学习中再次广播该多个全局模型。等效于实现多个(并行的)联邦学习的构架,由中心节点负责统一管理多个全局模型,每个模型对应一组算力群体,虽然能解决用户设备算力差异化导致无法训练统一模型的问题,但是具有如下缺点:需要在空口交互(广播)多个模型,增大了空口开销;训练单个模型的数据量变少,分组下发多个模型导致训练单个模型的总数据量下降,模型性能不易保证。
综上可见,现有采用联邦学习框架下的模型的训练效率较低。
为了解决上述技术问题,本申请实施例提供一种模型的训练方法,下面结合说明书附图,对本申请实施例提供的方法进行描述。
本申请实施例提出一种联邦学习构架下支持中心节点根据各子节点上报的算力情况设计并下发差异化的AI/ML网络参数更新或者冻结信令来指示子节点训练部分或者整个统一下载的模型的方法。在原有的联邦学习中增加子节点算力反馈流程,以及上文所述的中心节点进行差异化的AI/ML网络参数更新或者冻结指示的流程。
具体的差异化网络参数更新或者冻结原理如图6所示,这里展示了一个以AI/ML网络层为单位的更新/冻结指示示意图。所有参与联邦学习的用户(子节点)都将下载图6中的AI/ML网络,该网络是一个拥有4层网络层的AI/ML模型。传统的联邦学习中,每个子节点都需要利用本地数据训练完整的4层网络层(所有网络参数),得到模型,并反馈给中心节点。而本申请实施例中,基于算力的不同,中心节点会指示算力不足的用户负责更新网络中的部分参数,算力足够的用户则仍会更新所有参数。如图6中右侧所示,用户A只会更新1、4、6列神经元对应的网络层的参数,用户B会更新1、2、4、6列神经元对应的网络层的参数,而用户C则会更新所有的1、2、3、4、5、6列神经元对应的网络层的参数。中心节点负责将该差异化更新指令发送给各个对应的子节点。此外,除了以网络层为单位指示参数更新,也可以以神经元为单位进行参数更新指示。指令可以指示子节点需要冻结的网络层,也可以指示子节点需要更新的网络层。
各个子节点用户利用本地数据作为训练集结合上文所述的更新/冻结指示训练模型,训练完成后(加密)上传梯度/权值/网络(也可只上传负责更新的部分)给中心节点。中心节点对多个子节点上传的梯度/权值/网络进行融合。示例性的,若仍以FedAvg算法融合,采用以网络层为单位的参数更新指示,用户A,B,C对应模型1,2,3,则网络层g对于用户A是冻结层,对于用户B和用户C是更新层,模型融合时,中心节点更新网络层g的参数时将不考虑用户A反馈的参数,即融合过程可以用数学公式表达为:
M0,g=(β1,g·M1,g+β2,g·M2,g+β3,g·M3,g)/(β1,g+β2,g+β3,g)=(M2,g+M3,g)/2
其中,M0,g代表融合后的模型参数,M1,g,M2,g,M3,g代表各用户的模型的第g层网络,βk,g∈{0,1}反映第k个子节点模型的第g层网络所对应的更新指令,即βk,g=1表示该模型对第g层参数进行了更新,βk,g=0表示该模型对第g层参数进行了冻结;分母为2则是因为只有2个用户负责更新此层网络的参数。完成融合后中心节点再次下发更新。
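下面给出上述带更新/冻结指示的模型融合的一个示意性Python草图(仅为帮助理解的示例,并非本申请实施例的组成部分;其中对分母取下限1以避免除零,属于示意性处理):

    import numpy as np

    def fuse_with_freeze_mask(client_params, masks):
        # client_params: K x G 矩阵,第k行为第k个子节点上报的G个参数
        # masks: K x G 的0/1矩阵,masks[k][g]=1表示第k个子节点更新了第g个参数
        P = np.asarray(client_params, dtype=float)
        B = np.asarray(masks, dtype=float)
        update_counts = B.sum(axis=0)            # 每个参数被多少个子节点更新
        return (B * P).sum(axis=0) / np.maximum(update_counts, 1)

    # 3个子节点、4个参数;用户A冻结了第3、4个参数
    params = [[1.0, 2.0, 0.0, 0.0], [1.2, 2.2, 3.0, 4.0], [0.8, 1.8, 3.2, 4.2]]
    beta = [[1, 1, 0, 0], [1, 1, 1, 1], [1, 1, 1, 1]]
    print(fuse_with_freeze_mask(params, beta))   # [1.  2.  3.1 4.1]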
本申请实施例提供的通信方法可以应用于各种通信系统,例如:长期演进(long term evolution,LTE)系统、第五代(5th generation,5G)移动通信系统、无线保真(wireless fidelity,WiFi)系统、未来的通信系统、或者多种通信系统融合的系统等,本申请实施例不做限定。其中,5G还可以称为新无线(new radio,NR)。
本申请实施例提供的通信方法可以应用于各种通信场景,例如可以应用于以下通信场景中的一种或多种:增强移动宽带(enhanced mobile broadband,eMBB)、超可靠低时延通信(ultra reliable low latency communication,URLLC)、机器类型通信(machine type communication,MTC)、大规模机器类型通信(massive machine type communications,mMTC)、设备到设备(device to device,D2D)、车辆外联(vehicle to everything,V2X)、车辆到车辆(vehicle to vehicle,V2V)、和物联网(internet of things,IoT)等。
下面以图7所示通信系统为例,对本申请实施例提供的通信方法进行描述。
图7是本申请实施例提供的一种通信系统的示意图,如图7所示,该通信系统可以包括网络设备和多个终端。网络设备对应联邦学习架构中的中心节点,终端对应联邦学习架构中的子节点。网络设备下发一个全局模型给每个参加联邦学习的终端,每个用户应用本地数据训练该全局模型,得到本地模型,将本地模型的参数信息,例如梯度,权值等(加密)发送给网络设备,网络设备进行模型融合(model aggregation,MA)更新全局模型,然后将更新后的全局模型发送给各个用户,用户继续更新本地模型,输出更新后的本地模型的参数信息并发送给网络设备,如此进行多次迭代,直到模型收敛。
需要说明的是,图7仅为示例性框架图,图7中包括的网络设备的数量、终端的数量不受限制。除图7所示功能节点外,还可以包括其他节点,如:核心网设备、网关设备、应用服务器等等,不予限制。网络设备通过有线或无线的方式与核心网设备相互通信,如通过下一代(next generation,NG)接口相互通信。
本申请实施例涉及到的终端设备还可以称为终端,可以是一种具有无线收发功能的设备。终端可以被部署在陆地上,包括室内、室外、手持、和/或车载;也可以被部署在水面上(如轮船等);还可以被部署在空中(例如飞机、气球和卫星上等)。终端设备可以是用户设备。终端设备包括具有无线通信功能的手持式设备、车载设备、可穿戴设备或计算设备。示例性地,终端设备可以是手机、平板电脑或带无线收发功能的电脑。终端设备还可以是虚拟现实(Virtual Reality,VR)终端设备、增强现实(Augmented Reality,AR)终端设备、工业控制中的无线终端、无人驾驶中的无线终端、远程医疗中的无线终端、智能电网中的无线终端、智慧城市中的无线终端、和/或智慧家庭中的无线终端等等。
本申请实施例中,用于实现终端设备的功能的装置可以是终端设备;也可以是能够支持终端设备实现该功能的装置,例如芯片系统。该装置可以被安装在终端设备中或者和终端设备匹配使用。本申请实施例中,芯片系统可以由芯片构成,也可以包括芯片和其他分立器件。下述实施例中,用于实现终端设备的功能的装置是终端设备,以终端设备是UE为例,描述本申请实施例提供的技术方案。
本申请实施例涉及到的网络设备包括基站(Base Station,BS),可以是一种部署在无线接入网(radio access network,RAN)中能够和终端设备进行通信的设备。可选地,无线接入网还可以简称为接入网。该网络设备还可以称为接入网设备。基站可能有多种形式,比如宏基站、微基站、中继站或接入点等。本申请实施例涉及到的基站可以是5G系统中的基站、长期演进(Long Term Evolution,LTE)系统中的基站或其它系统中的基站,不做限制。其中,5G系统中的基站还可以称为发送接收点(Transmission Reception Point,TRP)或下一代节点B(Next Generation Node B,gNB)。本申请实施例中的基站可以是一体化基站,或者可以是包括集中式单元(Centralized Unit,CU)和分布式单元(Distributed Unit,DU)的基站。包括CU和DU的基站还可以称为CU和DU分离的基站,如该基站包括gNB-CU和gNB-DU。其中,CU还可以分离为CU控制面(CU Control Plane,CU-CP)和CU用户面(CU User Plane,CU-UP),如该基站包括gNB-CU-CP、gNB-CU-UP和gNB-DU。
本申请实施例中,用于实现网络设备的功能的装置可以是网络设备;也可以是能够支持网络设备实现该功能的装置,例如芯片系统。该装置可以被安装在网络设备中或者和网络设备匹配使用。下述实施例中,以用于实现网络设备的功能的装置是网络设备,以网络设备是基站为例,描述本申请实施例提供的技术方案。
为了在无线网络中支持机器学习功能,网络中还可能引入专门的AI网元或模块。如果引入AI网元,则对应一个独立的网元;如果引入AI模块,则可以位于某个网元内部,对应的网元可以是终端设备或者网络设备等。
在本申请实施例提供的技术方案中,下面以用于实现网络设备的功能的装置是基站为例、实现终端设备的功能的装置是终端为例,结合波束选择场景,描述本申请实施例提供的方法。
图8示出了本申请实施例提供的模型的训练方法的流程示意图。如图8所示,该方法可以包括以下步骤:
S801,多个子节点设备向中心节点设备发送能力信息,相应的,中心节点设备接收来自多个子节点设备的能力信息。
其中,多个子节点设备的能力信息用于表征多个子节点设备训练模型的能力;多个子节点设备为参与模型的联邦学习的多个子节点设备。
多个子节点设备与中心节点设备通信连接。本申请实施例中,子节点设备向中心节点设备发送信息称为上发,中心节点设备向子节点设备发送信息称为下发。中心节点为子节点配置中心节点用于下发第一指示信息和模型的下行资源。例如,在波束选择场景中,下行资源可以是控制信道资源,如物理下行控制信道(Physical downlink control channel,PDCCH)资源。也可以是数据信道资源,如物理下行共享信道(Physical downlink shared channel,PDSCH)资源。具体的,包括频域资源块编号以及起始位置、子带号、子带带宽、跳频参数、调制编码方案(Modulation and coding scheme,MCS)等参数。
模型可以采用广播或者组播的方式由中心节点进行下发。例如,在中心节点为基站,子节点为UE的单小区联邦学习构架中,可以采用广播的方式下发模型,由于广播的特性,未参与到联邦学习中的子节点也可以接收到该广播信息;在以具有联邦学习管理功能的基站作为中心节点,其他基站作为子节点的多小区联邦学习构架中也可以由中心节点采用广播的方式下发模型至各个子节点基站,同样的,其他未参与到联邦学习中的子节点也可以接收到广播的信息;也可以针对参与到联邦学习中的子节点采用组播方式,同一个中心节点所关联的子节点作为一组,拥有同样的组号,配置相同的下行资源。组播模式下,不参与该联邦学习的子节点则不会接收到该组播信息。
中心节点还可以为子节点配置子节点上报更新后的第一模型参数的上行资源。也可以由另外的联邦学习管理节点为中心节点和子节点配置子节点用于更新后的第一模型参数以及必要信令上报的上行资源。与下行资源配置类似的,上行资源可以是控制信道资源,如物理上行控制信道(Physical uplink control channel,PUCCH)资源,也可以是数据信道资源,如物理上行共享信道(Physical uplink shared channel,PUSCH)资源。
本申请实施例中需要子节点在接入网络时向中心节点设备发送能力信息。能力信息至少需要包括子节点设备可用于存储AI/ML模型的内存空间大小,子节点的算力信息(运行AI/ML模型的计算能力,比如上文提到的FLOPS计算性能、电量情况,当前电量也是影响终端设备运算能力的因素之一)以及收集到的本地数据量相关信息(可以帮助中心节点推测模型训练所需时长)。其他可选的能力信息可以包括:该子节点设备是否支持运行AI/ML模型,支持的AI/ML模型类型(如CNN、RNN、全连接、随机森林模型等)。只有子节点支持AI/ML模型时,中心节点才会允许其参与联邦学习,并向其发送模型。其他可选的能力信息可以包括:子节点的硬件信息,例如,子节点的天线配置(天线个数、极化方向等)、射频通道数、传感器类型(位置传感器/GPS、运动传感器等)和参数等。由于是在联邦学习框架内训练模型,子节点运用子节点的本地数据进行训练,所以子节点可以不需要上报实际收集到的波束信息相关的或者涉及隐私的信息。
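作为参考,下面用一个示意性的Python数据结构勾勒能力信息可能包含的字段(字段名与取值均为示意性假设,并非本申请实施例定义的信令格式):

    from dataclasses import dataclass
    from typing import List, Optional

    @dataclass
    class CapabilityReport:
        memory_bytes: int             # 可用于存储AI/ML模型的内存空间大小
        flops_per_second: float       # 算力性能(FLOPS)
        battery_level: float          # 当前电量(0~1),影响实际可用算力
        local_sample_count: int       # 本地数据量,帮助中心节点推测训练时长
        supports_ai_ml: bool = True   # 是否支持运行AI/ML模型
        model_types: Optional[List[str]] = None   # 支持的模型类型,如["CNN","RNN"]

    report = CapabilityReport(memory_bytes=64 << 20, flops_per_second=2e9,
                              battery_level=0.8, local_sample_count=500,
                              model_types=["CNN", "FC"])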
S802,中心节点设备向多个子节点设备中的每个子节点设备发送第一指示信息和模型,相应的,多个子节点设备接收来自中心节点设备的第一指示信息和模型。
其中,第一指示信息用于指示多个子节点设备训练模型时待更新的第一模型参数,多个子节点设备的待更新的第一模型参数根据多个子节点设备的能力信息确定。
模型的发送包括模型的结构和模型的参数。模型的结构如卷积神经网络(Convolutional Neural Networks,CNN)中卷积层的个数、每个卷积层的通道数和卷积核的大小,全连接层的个数以及神经元多少,递归神经网络(Recurrent Neural Network,RNN)中的层数(layer)、每层结构中各个状态的计算方式等。
第一指示信息可以通过信令的方式下发,具体可以选取如下三种方式:1,通过单播方式发给指定的子节点;2,如果不同子节点中存在相同的第一指示信息,可以通过组播方式将相同的第一指示信息发送给一组子节点;3,通过广播方式下发,在每条信令中包含该信令对应的子节点索引(index),各子节点收到该广播信号后使用自己索引对应的信令。
此外,中心节点还可以下发与子节点的模型训练和上报的相关信息,如每轮子节点所需迭代次数、学习率、损失函数、批尺寸、上报参数类型(模型参数或梯度)等信息。为了减小下发模型的空口开销,可以对模型进行压缩后再下发,模型压缩的方法包括但不限于模型剪枝、模型蒸馏、模型量化等。
S803,多个子节点设备根据训练集和待更新模型参数对模型进行训练,得到更新后的第一模型参数。
其中,训练集包括多个子节点设备采集到的波束信息。在波束管理场景中,联邦学习中的子节点可以是基站也可以是用户终端设备,而中心节点可以是独立的联邦学习管理节点或担任中心节点功能的基站。假设所要训练的全局模型是实现以估计的信道测量值或者接收信号本身为输入,以优选波束索引(ID)作为输出的AI/ML模型。那么子节点在收集数据阶段就负责收集训练集作为模型输入,该训练集可以包括:信道测量值或者接收信号以及训练模型所用的标签(即优选波束ID)。该标签可以由基站将全部可能的波束(基于码本的SSB或者CSI-RS波束)逐个发送给UE,由UE选择出性能优选的波束索引得到(优选可以指所有SSB/CSI-RS波束中具有最大的L1-RSRP或者SNR测量值的波束)。
在对模型进行训练时,如果该子节点设备存在冻结参数,可以通过在相应的AI/ML网络优化器中对冻结参数做无需梯度(requires_grad=False)处理,以使冻结参数不参与训练。
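以PyTorch为例,下面给出对冻结参数做requires_grad=False处理的一个示意性草图(仅为示例,网络结构与待更新层的选取均为假设):

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))

    # 假设第一指示信息指示本子节点只更新最后一层(其余参数冻结)
    update_layers = {"2"}   # nn.Sequential中各子模块的名字为"0"、"1"、"2"
    for name, p in model.named_parameters():
        p.requires_grad = name.split(".")[0] in update_layers

    # 优化器只接收需要更新的第一模型参数,冻结参数不参与训练
    optimizer = torch.optim.SGD(
        (p for p in model.parameters() if p.requires_grad), lr=0.01)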
训练所采用的损失函数与应用场景和采用的模型类型有关,如在本申请实施例对应的波束管理的分类问题(预测优选波束ID)中可以采用交叉熵函数(Cross-Entropy,CE),接收机性能/数据解调性能问题可以采用二元交叉熵(Binary Cross-Entropy,BCE),在信道估计等回归问题中可以采用均方误差(Mean Squared Error,MSE)、平均绝对误差(Mean Absolute Error,MAE)等,子节点可以根据应用场景选取损失函数,也可以根据中心节点下发的指示选取。具体的,子节点可以将收到的全局模型作为这一轮的初始模型进行训练,损失函数主要与本地数据集有关,例如第k个子节点的MSE损失函数为Lk=(1/nk)·Σ_{l=1}^{nk}(yk,l−ŷk,l)^2,其中,nk为样本数,yk,l为第l个样本的输出,ŷk,l为第l个样本的标签。
S804,多个子节点设备向中心节点设备发送更新后的第一模型参数,相应的,中心节点设备接收来自多个子节点设备的更新后的第一模型参数。
其中,子节点通过步骤S801中说明的上行资源进行更新后的第一模型参数的上报。各子节点可以上报波束训练模型的所有参数,也可以仅上报参与更新的第一模型参数。对于上报波束训练模型的所有参数的情况,示例性的,假设本地模型共有G个参数,第k个子节点上报Mk=[mk,1,mk,2,…,mk,G],其中mk,g表示第k个子节点的第g个参数,可以将所有参数组合起来作为一个向量发送;或者上报当前模型参数和上一轮模型参数之间的差值集合,即各个参数的梯度ΔMk=[Δmk,1,Δmk,2,…,Δmk,G]。上报更新的第一模型参数的顺序和下发第一模型参数时的顺序保持一致。这种上报方式可以同时适配网络层粒度参数上报或是单个参数粒度参数上报,无需为两种参数模式设置不同上报模式。
考虑到节约空口开销,对于仅上报参与更新的第一模型参数的情况,子节点通过S801中配置的上行资源进行本地模型中该子节点的第一模型参数上报。示例性的,假设模型共有G个参数,其中有G′个参数为第k个子节点的第一模型参数。那么,第k个子节点上报本地模型Mk=[mk,1,mk,2,…,mk,G′],其中mk,g表示第k个子节点的第g个参数,可以将所有参数组合起来作为一个向量发送;(或者上报当前模型参数和上一轮模型参数之间的差值集合,即各个参数的梯度ΔMk=[Δmk,1,Δmk,2,…,Δmk,G′])。
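下面给出仅上报参与更新的第一模型参数时一种可能的打包方式的示意性Python草图(仅为示例,上报顺序与下发第一模型参数时的顺序保持一致):

    import numpy as np

    def pack_updated_params(local_params, beta_k):
        # 仅保留beta_k中标记为1(更新)的参数,组合成向量后上报
        return np.array([m for m, b in zip(local_params, beta_k) if b == 1])

    local = [0.11, 0.52, 0.93, 0.34]   # 本地模型的G个参数
    beta_k = [1, 0, 1, 0]              # 第k个子节点的更新/冻结指示
    print(pack_updated_params(local, beta_k))   # [0.11 0.93]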
此外,子节点还可以上报辅助信息,如子节点本地数据集的样本数nk、模型类型指示信息等。
S805,中心节点设备通过预设融合算法对多个子节点设备的全局模型参数进行融合,得到融合参数。
其中,全局模型参数包括更新后的第一模型参数以及模型中第一模型参数之外的第二模型参数。预设融合算法能够对多个子节点设备的全局模型参数进行融合,得到融合参数,示例性的,预设融合算法可以选为求均值、取中位数、取众数等。
S806,中心节点设备向多个子节点设备发送融合参数,相应的,多个子节点设备接收融合参数。
S807,多个子节点设备根据融合参数训练模型。
其中,多个子节点设备根据融合参数训练模型,直至根据融合参数训练的模型收敛,否则返回S802调整第一指示信息后再执行S802至S807,直至根据融合参数训练的模型收敛。模型是否收敛可以通过损失函数判断,损失函数可以参考S803中的相应说明,采用交叉熵函数(Cross-Entropy,CE)。
本申请实施例中,首先,多个子节点设备向中心节点设备发送能力信息,中心节点设备再根据能力信息向多个子节点设备中的每个子节点设备发送第一指示信息和模型,然后,多个子节点设备根据训练集和待更新模型参数对模型进行训练,得到更新后的第一模型参数,并且,中心节点设备通过预设融合算法对多个子节点设备的包括了第一模型参数的全局模型参数进行融合,得到融合参数,然后多个子节点设备根据融合参数训练模型。使得子节点设备在模型训练时,部分参数冻结,不参与训练,进而使得在训练模型时,不会因为低算力的子节点设备影响模型训练进度,训练模型的效率较高。
在一种实施例中,该方法还包括:
S808,中心节点设备根据能力信息分别预测多个子节点设备的模型训练时长。
其中,中心节点设备需要根据能力信息分别预测多个子节点设备的模型训练时长,以确定子节点设备是否能够训练模型。可以利用时间窗的概念,即要求在给定的需要子节点反馈本地模型参数/梯度的时间窗范围内(可以是广播模型后的t时刻到t+T0的长度为T0的时间窗,t≥0),子节点k能够完成模型的训练,假设子节点设备k训练模型所用时间Tk可以表达为:
Tk=f(αk)·FLOPs/FLOPSk
其中,αk包括所有可能影响训练时长的超参数(第1轮训练可以由中心节点预配置,第1轮之后则可能由子节点提供),y=f(·)代表将超参数映射到等效于训练y次该模型,FLOPs指代模型的复杂度,由模型的模型类型确定,FLOPSk表示第k个子节点设备的算力性能(由能力信息确定)。通过上述算式能够预测出多个子节点设备的模型训练时长。
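下面给出按上述算式预测训练时长的一个示意性Python草图(函数名与数值均为示意性假设):

    def predict_training_time(equiv_epochs, model_flops, device_flops_per_s):
        # Tk = f(αk)·FLOPs/FLOPSk 的一种示意实现:
        # equiv_epochs: 超参数映射得到的等效训练次数 y=f(αk)
        # model_flops: 模型单次训练的计算量(FLOPs,由模型复杂度决定)
        # device_flops_per_s: 子节点设备的算力性能(FLOPS,由能力信息确定)
        return equiv_epochs * model_flops / device_flops_per_s

    T_k = predict_training_time(50, 2e8, 1e9)
    print(T_k)   # 10.0,可与时间窗T0比较以判断能否按时完成训练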
S809,中心节点设备根据模型训练时长确定子节点设备的第一模型参数。
参考S808说明中的算式,当预测出子节点设备的模型训练时长后,即可根据模型训练时长确定子节点设备的第一模型参数。当存在Tk≥T0,k=0,…,K,即预测出的子节点设备k的模型训练时长超出了预设的时间窗范围T0,则表明子节点设备k无法训练模型,或者,表明子节点设备k可以训练模型,但是训练时长超出了时间窗范围T0。
为了使所有子节点都能够训练该模型,且能在给定时间窗内完成训练任务,本申请实施例针对能力不足的子节点设备,采用训练部分模型参数而非全部模型参数的策略进行模型训练。相应的,中心节点就需要根据上报的模型训练时长确定子节点设备的第一模型参数,即参与训练的模型参数,称为更新参数,剩下的不参与训练的模型参数称为冻结参数。
示例性的,如果以单个参数为粒度实施第一模型参数的选取,那么假设模型M0一共有G个参数,M0,g用来表示第g个参数。则第k个子节点模型对应的所有参数集合可以表示为Mk={Mk,1,Mk,2,…,Mk,G},其中Mk,g表示该参数集合的第g个参数。一种可能的表示参数是否参与训练的集合可以表达为:
β={β1^T,β2^T,…,βK^T}
其中,任意元素βk是一个1行G列的向量,其包括的元素为{βk,1,βk,2,…,βk,G},其中任意元素βk,g∈{0,1}表示第k个子节点对应的模型的第g个参数是否需要更新/冻结,这里假设βk,g=0表示该参数对于该子节点为冻结参数,βk,g=1表示该参数对于该子节点为更新参数。(·)^T表示转置操作。类似的,假如冻结/更新以网络的各个层为单位进行,那么上文中的索引g则代表模型的第g层网络的全部参数,βk,g∈{0,1}表示第k个子节点对应的模型的第g层网络的全部参数是否需要更新/冻结。
具体的,如何划分冻结参数和更新参数,这里以层为单位的冻结/更新模型为例:首先,冻结更多的变量可以节省更多的算力,但代价是模型性能的下降;同时,如果更新层在所有参与联邦学习的子节点间分配不均,也会影响模型性能。一种可能的划分准则是:根据终端上报的能力,从少到多逐渐增加需要冻结的层数,直到满足终端的内存限制以及能在指定时间窗内完成本地模型训练以及上传(参见下文的示意代码)。同时,尽量提高参与一轮联邦学习的终端数量,增加终端差异化,减少更新层在子节点间分配不均的概率。划分准则这一部分主要是基站内部的实现行为,更好的优化/准则设计算法能更有效地提高模型性能。
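下面给出该划分准则的一个示意性Python草图(仅为示例;其中将冻结层的训练开销近似记为零、从靠近输出的一侧开始冻结,均属于简化假设):

    def choose_frozen_layers(layer_flops, device_flops_per_s, t0, equiv_epochs):
        # 从少到多逐渐增加冻结层数,直到预测训练时长落入时间窗T0之内
        num_layers = len(layer_flops)
        for frozen in range(num_layers + 1):
            active_flops = sum(layer_flops[: num_layers - frozen])
            t = equiv_epochs * active_flops / device_flops_per_s
            if t <= t0:
                return frozen   # 需要冻结的层数
        return num_layers

    print(choose_frozen_layers([1e8, 2e8, 4e8, 1e8], 1e9, t0=20, equiv_epochs=50))   # 2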
本申请实施例中,首先根据能力信息分别预测多个子节点设备的模型训练时长,然后根据模型训练时长确定子节点设备的第一模型参数,使得确定出的待训练的第一模型参数能够与相应的子节点设备匹配,保证模型训练进度。
在一种实施例中,S809:中心节点设备根据模型训练时长确定子节点设备的第一模型参数,可以包括:
S8091,在模型训练时长不符合预设时长条件的情况下,选取模型的部分参数确定为第一模型参数。
S8092,在模型训练时长符合预设时长条件的情况下,选取模型的全部参数确定为第一模型参数。
本申请实施例中,根据模型训练时长与预设时长条件不同的比对结果,相应采取选取模型的部分参数确定为第一模型参数和选取模型的全部参数确定为第一模型参数两种策略,在保证模型训练进度的同时保证了模型性能。
在一种实施例中,S8091:选取模型的部分参数确定为第一模型参数,可以包括:
随机选取模型的部分参数确定为第一模型参数。
其中,随机选取模型的部分参数确定为第一模型参数对应单个参数粒度的参数选取,可以参考S809中,对以单个参数为粒度实施第一模型参数的选取的说明,不再赘述。
本申请实施例中,随机选取模型的部分参数确定为第一模型参数,使得确定出的待训练的第一模型参数能够与相应的子节点设备匹配,保证模型训练进度。
在一种实施例中,模型包括第一网络层和第二网络层;模型应用场景指示第一网络层,第一网络层和第二网络层不同;S8091:选取模型的部分参数确定为第一模型参数,可以包括:
选取第一网络层的参数以及第二网络层的部分参数确定为第一模型参数。
其中,本申请实施例分网络层选取第一模型参数对应网络层为粒度的参数选取。以第一网络层为全连接层为例,考虑到对于拥有多层全连接层的网络,如果每个子节点用户分散的更新每一层中的部分神经元对应的参数,最终网络的训练效果也可能难以保证。因此,本申请实施例选取某第一网络层的全部参数确定为第一模型参数,以保证模型性能。
考虑到第一网络层可能会包括多个网络层,在选取第一网络层的参数时,中心节点设备可以根据子节点设备的能力信息确定该子节点设备对应的第一网络层,在确定多个子节点设备对应的第一网络层时,遵循所有的子节点设备对应的第一网络层的集合覆盖全部的第一网络层,以保证模型训练精度。
此外,第二网络层(例如,非全连接层或者子节点设备指示的不需要更新的全连接层)可以采用随机选取的方式选取出。由于同一小区内信道条件相对固定,所以由一部分用户训练的负责更细颗粒的全连接层也可被另一部分算力较差的用户在实施阶段利用,以降低算力开销。
此外,实际上某些网络层所占用的参数比重可能远高于其他类型的网络层,示例性的,如图9所示,AI/ML模型由具有特征提取功能的卷积层(Conv1、Conv2)串联上承担分类功能的全连接层(FC1 16Neural、FC2 64Neural)构成,该模型中全连接层的参数占该模型全部参数很高的比重。而且相同的任务也可以存在不同的分类颗粒度,比如人工智能辅助AI-assisted波束管理任务中,如果基站侧发起基于16码字码本的SSB宽波束扫描,某些用户设备的算力支持预测该16个宽波束中最优波束ID,而另一些用户设备的算力可以支持直接预测64个窄波束中的最优波束ID,因此针对这两类用户至少需要两个不同神经元的全连接层(16神经元和64神经元)来完成上述任务。
相应的,以网络层为粒度的参数选取场景下,模型参数的更新/冻结指示可以表达为:
β={βnFC,βFC}
其中βnFC表示所有全连接层之前的神经网络参数对应的参数集合,集合中任一元素βk,nFC代表第k个子节点中非全连接层的参数集合,维度为G∈Z+,Z+表示正整数,G的值取决于参数选取时是对应每个网络参数还是对应每个网络层。而βFC表示所有全连接层的神经网络参数集合,集合中任一元素βk,FC代表第k个子节点中全连接层的更新/冻结信息的集合,维度为L∈Z+,L的值仅对应全连接层的数量。对于子节点k应更新的全连接层,一种可能的矩阵表达式如下:
βk,FC=[1,1,0]^T
即表示一共有3层全连接,而第k个子节点将会负责更新前两个全连接层的所有网络参数。相应的,第k个子节点对应的第一模型参数可以表示为:
βk={Random(βk,nFC),βk,FC}
其中,Random()表示部分选取函数。
参照图9,在计算损失函数时,不同的子节点使用对应的第一网络层的输出进行损失函数的计算。仍以波束管理任务为例,全连接层对应{16,64,128}三个分辨率的波束ID标签,假设第k个子节点对应的是64波束预测任务,那么其损失函数则是经过64神经元全连接层后的输出进行softmax操作后与标签之间的交叉熵(CE Loss),而不是根据网络最终输出确定的交叉熵(经过128神经元的全连接层后的输出与标签间的交叉熵)。同理应用于16以及128波束预测任务对应的子节点。
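下面给出上述按子节点对应分辨率的全连接层分别计算交叉熵损失的一个示意性PyTorch草图(网络结构与维度均为示意性假设,并非图9所示模型的实现;cross_entropy内部已包含softmax操作):

    import torch
    import torch.nn as nn

    # 示意:共享特征提取 + {16, 64, 128}三个分辨率的全连接输出头
    feature = nn.Sequential(nn.Linear(32, 64), nn.ReLU())
    heads = nn.ModuleDict({"16": nn.Linear(64, 16),
                           "64": nn.Linear(64, 64),
                           "128": nn.Linear(64, 128)})

    def local_loss(x, label, resolution):
        # 第k个子节点只使用自己任务分辨率对应的全连接层输出计算CE Loss
        logits = heads[str(resolution)](feature(x))
        return nn.functional.cross_entropy(logits, label)

    x = torch.randn(4, 32)
    label = torch.randint(0, 64, (4,))
    print(local_loss(x, label, 64))   # 对应64波束预测任务的子节点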
本申请实施例中,有针对性的选取第一网络层的参数以及第二网络层的部分参数确定为第一模型参数,考虑到了模型应用场景对网络层的要求,保证模型训练性能。
在一种实施例中,在选取第一网络层中预设第一网络层的参数确定为第一模型参数之前,该方法还可以包括:
S810,中心节点设备根据模型应用场景确定构建模型所需的第一网络层。
其中,基于模型应用场景的不同,相应的,也要求模型需要具备特定的网络层。示例性的,在波束管理任务中,假设高频毫米波下行系统,中心节点设备拥有64天线,不同的子节点设备可以完成包括基于16DFT码本的16宽波束中最优波束预测任务,基于16DFT宽波束码本直接预测64窄波束中最优波束ID的预测任务,以及预测更高精度的非码本的基于steering vector的波束ID预测(比如预先设定好的128或者更高精度的非正交码本),那么下发的全局模型中最后的全连接层需要包括拥有16、64、128/256神经元的全连接层,即模型应用场景指示16、64、128/256神经元的全连接层,16、64、128/256神经元的全连接层即构建模型所需的第一网络层。
本申请实施例中,将至少一个构建模型所需的全连接层确定为子节点设备对应的预设全连接层,考虑到了模型应用场景对网络层的要求,保证模型训练性能。
在一种实施例中,能力信息还用于表征子节点设备能够训练的模型类型,该方法还包括:
中心节点设备将能力信息包括模型的模型类型的子节点设备确定为发送第一指示信息和模型的目标设备。
本申请实施例中,将能力信息包括模型的模型类型的子节点设备确定为发送第一指示信息和模型的目标设备,确保了所有参与训练的子节点设备都能训练模型,保证了模型训练进度。
在一种实施例中,S805:中心节点设备通过预设融合算法对多个子节点设备的全局模型参数进行融合,得到融合参数,可以包括:
S8051,对于不同子节点设备的相同功能的模型参数,中心节点设备将第二模型参数的权重设为第一预设权值,第一模型参数的权重设为第二预设权值,进行加权求均值,得到融合参数。
其中,第一预设权值的功能是尽可能地减小第二模型参数对融合参数的影响,第二预设权值的功能是使得被赋予权值的参数影响融合参数,示例性的,第一预设权值可以选取为0,而第二预设权值选取为1。相应的,第g个融合参数的计算过程可以用算式表达为:
M0,g=(1/U)·Σ_{k=1}^{K} βk,g·Mk,g
其中U∈Z+,U≤K,U代表将第g个参数作为第一模型参数进行了更新的子节点的数量,即U=Σ_{k=1}^{K} βk,g。参照S809中的说明,βk,g=0表示该参数对于该子节点为冻结参数,βk,g=1表示该参数对于该子节点为更新参数,βk,g同时起到了第一预设权值和第二预设权值的功能。
本申请实施例中,设计了第二模型参数无法影响融合参数的预设融合算法,进行融合参数的计算,使得未更新的第二模型参数无法影响模型训练,保证了模型训练性能。
在一种实施例中,S805:中心节点设备通过预设融合算法对多个子节点设备的全局模型参数进行融合,得到融合参数,还可以包括:
S8052,对于不同子节点设备的相同功能的模型参数,中心节点设备将第二模型参数的权重和第一模型参数的权重均设为第二预设权值,进行加权求均值,得到融合参数。
其中,参考S8051对第二预设权值的说明,本申请实施例将第一模型参数和第二模型参数的权重均设置为第二预设权值,使第一模型参数和第二模型参数均影响融合参数。示例性的,第二预设权值可以设为1。相应的,第g个融合参数的计算过程可以用算式表达为:
M0,g=(1/K)·Σ_{k=1}^{K} Mk,g
上面从中心节点设备和子节点设备交互模型参数的角度说明了本申请实施例提供的模型的训练方法,如果中心节点设备和子节点设备交互的是模型梯度,则可以通过将模型梯度与上一轮模型参数相加的方法计算出融合参数,用算式可以表示为:
M0,g=M′0,g+ΔM0,g
其中,M′0,g表示上一轮的第g个全局模型参数;ΔM0,g表示融合模型梯度,对应融合参数,可以按照与上文相同的预设融合算法由各子节点上报的ΔMk,g融合得到;ΔMk,g表示模型梯度,对应模型参数。通过上述算式仍旧能够计算出融合参数,进而实现模型的训练。
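下面给出交互模型梯度时计算融合参数的一个示意性Python草图(仅为示例,其中的掩码融合方式与上文的预设融合算法对应,数值均为假设):

    import numpy as np

    def fuse_gradients(prev_global, client_grads, masks):
        # 先按更新/冻结掩码融合各子节点梯度,再加到上一轮全局参数上
        G = np.asarray(client_grads, dtype=float)
        B = np.asarray(masks, dtype=float)
        fused_delta = (B * G).sum(axis=0) / np.maximum(B.sum(axis=0), 1)
        return np.asarray(prev_global, dtype=float) + fused_delta

    prev = [1.0, 2.0]                       # 上一轮全局模型参数
    grads = [[0.1, 0.0], [0.3, -0.2]]       # 两个子节点上报的梯度
    beta = [[1, 0], [1, 1]]                 # 更新/冻结掩码
    print(fuse_gradients(prev, grads, beta))   # [1.2 1.8]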
本申请实施例中,设计了第二模型参数能够影响融合参数的预设融合算法,进行融合参数的计算,确定出的融合参数能够参与模型训练,保证了模型训练进度。
上面从模型训练角度介绍了本申请实施例提供的模型的训练方法,在基于上述方法训练出模型后,可以应用训练出的模型进行波束选择,下面从模型应用的角度说明本申请实施例提供的选择波束信息的方法,该方法的执行主体可以是子节点设备,如图10所示,该方法可以包括:
S101,接收波束信息。
其中,子节点设备能够接收到来自网络设备的波束信息。
S102,将波束信息输入模型,输出目标波束信息。
其中,模型根据上述实施例介绍的模型的训练方法训练得到。
本申请实施例中,通过模型选择目标波束,而该模型训练效率较高,进而选择目标波束的效率较高。
上述主要从各个节点之间交互的角度对本申请实施例提供的方案进行了介绍。可以理解的是,各个节点,例如子节点设备和中心节点设备为了实现上述功能,其包含了执行各个功能相应的硬件结构和/或软件模块。本领域技术人员应该很容易意识到,结合本文中所公开的实施例描述的各示例的算法步骤,本申请实施例的方法能够以硬件、软件、或硬件和计算机软件的结合形式来实现。某个功能究竟以硬件还是计算机软件驱动硬件的方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用使用不同方法来实现所描述的功能,但这种实现不应认为超出本申请的范围。
本申请实施例可以根据上述方法示例对子节点设备和中心节点设备进行功能模块的划分,例如,可以对应各个功能划分各个功能模块,也可以将两个或两个以上的功能集成在一个处理模块中。上述集成的模块既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。需要说明的是,本申请实施例中对模块的划分是示意性的,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式。
在具体实现时,本申请所示各网元,如:子节点设备和中心节点设备可采用图11所示的组成结构或者包括图11所示的部件。图11为本申请实施例提供的一种通信装置1100的结构示意图,当该通信装置1100具有本申请实施例所述的子节点设备的功能时,该通信装置1100可以为子节点设备中的芯片或片上系统。当通信装置1100具有本申请实施例所述的中心节点设备的功能时,通信装置1100可以为中心节点设备中的芯片或片上系统。
如图11所示,该通信装置1100可以包括处理器1101,通信线路1102以及收发器1103。其中,处理器1101,存储器1104以及收发器1103之间可以通过通信线路1102连接。在一种示例中,处理器1101可以包括一个或多个CPU,例如图11中的CPU0和CPU1。
作为一种可选的实现方式,通信装置1100包括多个处理器,例如,除图11中的处理器1101之外,还可以包括处理器1107。
其中,处理器1101可以是中央处理器(central processing unit,CPU)、通用处理器、网络处理器(network processor,NP)、数字信号处理器(digital signal processing,DSP)、微处理器、微控制器、可编程逻辑器件(programmable logic device,PLD)或它们的任意组合。处理器1101还可以是其它具有处理功能的装置,如电路、器件或软件模块等。
通信线路1102,用于在通信装置1100所包括的各部件之间传送信息。
收发器1103,用于与其他设备或其它通信网络进行通信。该其它通信网络可以为以太网,无线接入网(radio access network,RAN),无线局域网(wireless local area networks,WLAN)等。收发器1103可以是接口电路、管脚、射频模块、收发器或者任何能够实现通信的装置。
进一步的,该通信装置1100还可以包括存储器1104。存储器1104,用于存储指令。其中,指令可以是计算机程序。
其中,存储器1104可以是只读存储器(read-only memory,ROM)或可存储静态信息和/或指令的其他类型的静态存储设备,也可以是随机存取存储器(random access memory,RAM)或者可存储信息和/或指令的其他类型的动态存储设备,还可以是电可擦可编程只读存储器(electrically erasable programmable read-only memory,EEPROM)、只读光盘(compact disc read-only memory,CD-ROM)或其他光盘存储、光碟存储、磁盘存储介质或其他磁存储设备,光碟存储包括压缩光碟、激光碟、光碟、数字通用光碟、或蓝光光碟等。
需要说明的是,存储器1104可以独立于处理器1101存在,也可以和处理器1101集成在一起。存储器1104可以用于存储指令或者程序代码或者一些数据等。存储器1104可以位于通信装置1100内,也可以位于通信装置1100外,不予限制。处理器1101执行存储器1104中存储的指令时,可以实现本申请实施例提供的方法。
作为一种可选的实现方式,通信装置1100还包括输出设备1105和输入设备1106。示例性地,输入设备1106是键盘、鼠标、麦克风或操作杆等设备,输出设备1105是显示屏、扬声器(speaker)等设备。
需要说明的是,通信装置1100可以是台式机、便携式电脑、网络服务器、移动手机、平板电脑、无线终端、嵌入式设备、芯片系统或有图11中类似结构的设备。此外,图11中示出的组成结构并不构成对该通信装置的限定,除图11所示部件之外,该通信装置可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件布置。
本申请实施例中,芯片系统可以由芯片构成,也可以包括芯片和其他分立器件。
图12示出了一种通信装置120的结构图,该通信装置应用于中心节点设备。图12所示装置中各模块具有实现图8中对应步骤的功能,并能达到其相应技术效果。各模块执行步骤相应的有益效果可以参考图8对应步骤的说明,不再赘述。所述功能可以通过硬件实现,也可以通过硬件执行相应的软件实现。所述硬件或软件包括一个或多个上述功能相应的模块。该通信装置可以为中心节点设备中的芯片或者片上系统。如:该通信装置包括:
接收模块121,用于接收来自多个子节点设备(参与模型的联邦学习的子节点设备)用于表征子节点设备训练模型能力的能力信息;发送模块122,用于向多个子节点设备中的每个子节点设备发送用于指示子节点设备训练模型时待更新的第一模型参数的第一指示信息和模型,该子节点设备的待更新的第一模型参数根据子节点设备的能力信息确定;接收模块121,还用于接收来自多个子节点设备的更新后的第一模型参数;处理模块123,用于通过预设融合算法对多个子节点设备的全局模型参数进行融合,得到融合参数;该全局模型参数包括更新后的第一模型参数以及模型中第一模型参数之外的第二模型参数;发送模块122,用于向多个子节点设备发送融合参数。
在一种可能的实现方式中,处理模块123,具体用于:
根据能力信息分别预测多个子节点设备的模型训练时长;根据模型训练时长确定子节点设备的第一模型参数。
在一种可能的实现方式中,处理模块123,具体用于:
在模型训练时长不符合预设时长条件的情况下,选取模型的部分参数确定为第一模型参数;在模型训练时长符合预设时长条件的情况下,选取模型的全部参数确定为第一模型参数。
在一种可能的实现方式中,处理模块123,具体用于:
随机选取模型的部分参数确定为第一模型参数。
在一种可能的实现方式中,模型包括第一网络层和第二网络层;模型应用场景指示第一网络层,第一网络层和第二网络层不同;处理模块123,具体用于选取第一网络层的参数以及第二网络层的部分参数确定为第一模型参数。
在一种可能的实现方式中,处理模块123还用于:
在选取第一网络层中预设第一网络层的参数确定为第一模型参数之前,根据模型应用场景确定构建模型所需的第一网络层。
在一种可能的实现方式中,能力信息还用于表征子节点设备能够训练的模型类型,处理模块123还用于:
将能力信息包括模型的模型类型的子节点设备确定为发送第一指示信息和模型的目标设备。
在一种可能的实现方式中,处理模块123,具体用于:
对于不同子节点设备的相同功能的模型参数,将第二模型参数的权重设为第一预设权值,第一模型参数的权重设为第二预设权值,进行加权求均值,得到融合参数;或者,对于不同子节点设备的相同功能的模型参数,将第二模型参数的权重和第一模型参数的权重均设为第二预设权值,进行加权求均值,得到融合参数。
本申请实施例中,首先,多个子节点设备向中心节点设备发送能力信息,中心节点设备再根据能力信息向多个子节点设备中的每个子节点设备发送第一指示信息和模型,然后,多个子节点设备根据训练集和待更新模型参数对模型进行训练,得到更新后的第一模型参数,并且,中心节点设备通过预设融合算法对多个子节点设备的包括了第一模型参数的全局模型参数进行融合,得到融合参数,然后多个子节点设备根据融合参数训练模型。使得子节点设备在模型训练时,部分参数冻结,不参与训练,进而使得在训练模型时,不会因为低算力的子节点设备影响模型训练进度,训练模型的效率较高。
进一步地,首先根据能力信息分别预测多个子节点设备的模型训练时长,然后根据模型训练时长确定子节点设备的第一模型参数,使得确定出的待训练的第一模型参数能够与相应的子节点设备匹配,保证模型训练进度。
进一步地,根据模型训练时长与预设时长条件不同的比对结果,相应采取选取模型的部分参数确定为第一模型参数和选取模型的全部参数确定为第一模型参数两种策略,在保证模型训练进度的同时保证了模型性能。
进一步地,随机选取模型的部分参数确定为第一模型参数,使得确定出的待训练的第一模型参数能够与相应的子节点设备匹配,保证模型训练进度。
进一步地,有针对性的选取第一网络层的参数以及第二网络层的部分参数确定为第一模型参数,考虑到了模型应用场景对网络层的要求,保证模型训练性能。
进一步地,根据模型应用场景确定构建模型所需的第一网络层,考虑到了模型应用场景对网络层的要求,保证模型训练性能。
进一步地,将能力信息包括模型的模型类型的子节点设备确定为发送第一指示信息和模型的目标设备,确保了所有参与训练的子节点设备都能训练模型,保证了模型训练进度。
进一步地,设计了第二模型参数无法影响融合参数的预设融合算法,进行融合参数的计算,使得未更新的第二模型参数无法影响模型训练,保证了模型训练性能。还设计了第二模型参数能够影响融合参数的预设融合算法,进行融合参数的计算,确定出的融合参数能够参与模型训练,保证了模型训练进度。
图13示出了一种通信装置130的结构图,该通信装置应用于子节点设备。图13所示装置中各模块具有实现图8中对应步骤的功能,并能达到其相应技术效果。各模块执行步骤相应的有益效果可以参考图8对应步骤的说明,不再赘述。所述功能可以通过硬件实现,也可以通过硬件执行相应的软件实现。所述硬件或软件包括一个或多个上述功能相应的模块。该通信装置可以为子节点设备中的芯片或者片上系统。如:该通信装置包括:
发送模块131,用于向中心节点设备发送用于表征子节点设备训练模型能力的能力信息。
接收模块132,用于接收来自中心节点设备的用于指示子节点设备训练模型时待更新的第一模型参数的第一指示信息和模型,该子节点设备的待更新的第一模型参数根据子节点设备的能力信息确定。
处理模块133,用于根据训练集(包括子节点设备采集到的波束信息)和待更新模型参数对模型进行训练,得到更新后的第一模型参数。
发送模块131,用于向中心节点设备发送更新后的第一模型参数。
接收模块132,用于接收来自中心节点设备根据更新后的第一模型参数计算得到的融合参数。
处理模块133,用于根据融合参数训练模型。
接收模块132,用于接收波束信息。
处理模块133,用于将波束信息输入模型,输出目标波束信息。
本申请实施例中,首先,多个子节点设备向中心节点设备发送能力信息,中心节点设备再根据能力信息向多个子节点设备中的每个子节点设备发送第一指示信息和模型,然后,多个子节点设备根据训练集和待更新模型参数对模型进行训练,得到更新后的第一模型参数,并且,中心节点设备通过预设融合算法对多个子节点设备的包括了第一模型参数的全局模型参数进行融合,得到融合参数,然后多个子节点设备根据融合参数训练模型。使得子节点设备在模型训练时,部分参数冻结,不参与训练,进而使得在训练模型时,不会因为低算力的子节点设备影响模型训练进度,训练模型的效率较高。
进一步地,首先根据能力信息分别预测多个子节点设备的模型训练时长,然后根据模型训练时长确定子节点设备的第一模型参数,使得确定出的待训练的第一模型参数能够与相应的子节点设备匹配,保证模型训练进度。
进一步地,根据模型训练时长与预设时长条件不同的比对结果,相应采取选取模型的部分参数确定为第一模型参数和选取模型的全部参数确定为第一模型参数两种策略,在保证模型训练进度的同时保证了模型性能。
进一步地,随机选取模型的部分参数确定为第一模型参数,使得确定出的待训练的第一模型参数能够与相应的子节点设备匹配,保证模型训练进度。
进一步地,有针对性的选取第一网络层的参数以及第二网络层的部分参数确定为第一模型参数,考虑到了模型应用场景对网络层的要求,保证模型训练性能。
进一步地,根据模型应用场景确定构建模型所需的第一网络层,考虑到了模型应用场景对网络层的要求,保证模型训练性能。
进一步地,将能力信息包括模型的模型类型的子节点设备确定为发送第一指示信息和模型的目标设备,确保了所有参与训练的子节点设备都能训练模型,保证了模型训练进度。
进一步地,设计了第二模型参数无法影响融合参数的预设融合算法,进行融合参数的计算,使得未更新的第二模型参数无法影响模型训练,保证了模型训练性能。还设计了第二模型参数能够影响融合参数的预设融合算法,进行融合参数的计算,确定出的融合参数能够参与模型训练,保证了模型训练进度。
图14为本申请实施例提供的一种通信系统的结构图,该通信系统为模型训练场景相应的通信系统,如图14所示,该通信系统可以包括:中心节点设备140和子节点设备141。其中,中心节点设备140可以具有上述通信装置120的功能,子节点设备141可以具有上述通信装置130的功能。
具体的,中心节点设备140执行如下步骤:
接收来自多个子节点设备(参与模型的联邦学习的子节点设备)的用于表征子节点设备训练模型能力的能力信息,向多个子节点设备中的每个子节点设备发送指示子节点设备训练模型时待更新的第一模型参数(根据子节点设备的能力信息确定)的第一指示信息和模型,接收来自多个子节点设备的更新后的第一模型参数,通过预设融合算法对多个子节点设备的全局模型参数(包括更新后的第一模型参数以及模型中第一模型参数之外的第二模型参数)进行融合以得到融合参数,向多个子节点设备发送融合参数。
本申请实施例中,首先,多个子节点设备向中心节点设备发送能力信息,中心节点设备再根据能力信息向多个子节点设备中的每个子节点设备发送第一指示信息和模型,然后,多个子节点设备根据训练集和待更新模型参数对模型进行训练,得到更新后的第一模型参数,并且,中心节点设备通过预设融合算法对多个子节点设备的包括了第一模型参数的全局模型参数进行融合,得到融合参数,然后多个子节点设备根据融合参数训练模型。使得子节点设备在模型训练时,部分参数冻结,不参与训练,进而使得在训练模型时,不会因为低算力的子节点设备影响模型训练进度,训练模型的效率较高。
在一种可能的实现方式中,中心节点设备140还执行如下步骤:
根据能力信息分别预测多个子节点设备的模型训练时长,根据模型训练时长确定子节点设备的第一模型参数。
本申请实施例中,首先根据能力信息分别预测多个子节点设备的模型训练时长,然后根据模型训练时长确定子节点设备的第一模型参数,使得确定出的待训练的第一模型参数能够与相应的子节点设备匹配,保证模型训练进度。
在一种可能的实现方式中,根据模型训练时长确定子节点设备的第一模型参数,包括:
在模型训练时长不符合预设时长条件的情况下,选取模型的部分参数确定为第一模型参数,在模型训练时长符合预设时长条件的情况下,选取模型的全部参数确定为第一模型参数。
本申请实施例中,根据模型训练时长与预设时长条件不同的比对结果,相应采取选取模型的部分参数确定为第一模型参数和选取模型的全部参数确定为第一模型参数两种策略,在保证模型训练进度的同时保证了模型性能。
在一种可能的实现方式中,选取模型的部分参数确定为第一模型参数,包括:
随机选取模型的部分参数确定为第一模型参数。
本申请实施例中,随机选取模型的部分参数确定为第一模型参数,使得确定出的待训练的第一模型参数能够与相应的子节点设备匹配,保证模型训练进度。
在一种可能的实现方式中,模型包括第一网络层(模型应用场景指示第一网络层)和第二网络层,该第一网络层和第二网络层不同;选取模型的部分参数确定为第一模型参数,包括:
选取第一网络层的参数以及第二网络层的部分参数确定为第一模型参数。
本申请实施例中,有针对性的选取第一网络层的参数以及第二网络层的部分参数确定为第一模型参数,考虑到了模型应用场景对网络层的要求,保证模型训练性能。
在一种可能的实现方式中,在选取第一网络层中预设第一网络层的参数确定为第一模型参数之前,中心节点设备140还执行如下步骤:
根据模型应用场景确定构建模型所需的第一网络层。
本申请实施例中,根据模型应用场景确定构建模型所需的第一网络层,考虑到了模型应用场景对网络层的要求,保证模型训练性能。
在一种可能的实现方式中,能力信息还用于表征子节点设备能够训练的模型类型,中心节点设备140还执行如下步骤:
将能力信息包括模型的模型类型的子节点设备确定为发送第一指示信息和模型的目标设备。
本申请实施例中,将能力信息包括模型的模型类型的子节点设备确定为发送第一指示信息和模型的目标设备,确保了所有参与训练的子节点设备都能训练模型,保证了模型训练进度。
在一种可能的实现方式中,通过预设融合算法对多个子节点设备的全局模型参数进行融合,得到融合参数,包括:
对于不同子节点设备的相同功能的模型参数,将第二模型参数的权重设为第一预设权值,第一模型参数的权重设为第二预设权值,进行加权求均值以得到融合参数;或者,对于不同子节点设备的相同功能的模型参数,将第二模型参数的权重和第一模型参数的权重均设为第二预设权值,进行加权求均值以得到融合参数。
本申请实施例中,设计了第二模型参数无法影响融合参数的预设融合算法,进行融合参数的计算,使得未更新的第二模型参数无法影响模型训练,保证了模型训练性能。还设计了第二模型参数能够影响融合参数的预设融合算法,进行融合参数的计算,确定出的融合参数能够参与模型训练,保证了模型训练进度。
子节点设备141执行如下步骤:
向中心节点设备发送用于表征子节点设备训练模型能力的能力信息,接收来自中心节点设备的用于指示子节点设备训练模型时待更新的第一模型参数的第一指示信息和模型,该子节点设备的待更新的第一模型参数根据子节点设备的能力信息确定;根据包括子节点设备采集到的波束信息的训练集和待更新模型参数对模型进行训练,得到更新后的第一模型参数;向中心节点设备发送更新后的第一模型参数,接收来自所述中心节点设备的根据所述更新后的第一模型参数计算得到的融合参数,根据融合参数训练所述模型;接收波束信息,将波束信息输入模型,输出目标波束信息。
本申请实施例中,通过模型选择目标波束,选择目标波束的效率较高。
本申请实施例还提供了一种计算机可读存储介质。上述方法实施例中的全部或者部分流程可以由计算机程序来指令相关的硬件完成,该程序可存储于上述计算机可读存储介质中,该程序在执行时,可包括如上述各方法实施例的流程。计算机可读存储介质可以是前述任一实施例的终端装置(如:包括数据发送端和/或数据接收端的装置)的内部存储单元,例如终端装置的硬盘或内存。上述计算机可读存储介质也可以是上述终端装置的外部存储设备,例如上述终端装置上配备的插接式硬盘,智能存储卡(smart media card,SMC),安全数字(secure digital,SD)卡,闪存卡(flash card)等。进一步地,上述计算机可读存储介质还可以既包括上述终端装置的内部存储单元也包括外部存储设备。上述计算机可读存储介质用于存储上述计算机程序以及上述终端装置所需的其他程序和数据。上述计算机可读存储介质还可以用于暂时地存储已经输出或者将要输出的数据。
本申请实施例还提供了一种计算机指令。上述方法实施例中的全部或者部分流程可以由计算机指令来指令相关的硬件(如计算机、处理器、网络设备、和终端等)完成。该程序可被存储于上述计算机可读存储介质中。
本申请实施例还提供了一种芯片系统。该芯片系统可以由芯片构成,也可以包含芯片和其他分立器件,不予限制。该芯片系统包括处理器以及收发器,上述方法实施例中的全部或者部分流程可以由该芯片系统完成,如该芯片系统可以用于实现上述方法实施例中中心节点设备所执行的功能,或者,实现上述方法实施例中子节点设备所执行的功能。
在一种可能的设计中,上述芯片系统还包括存储器,所述存储器,用于保存程序指令和/或数据,当该芯片系统运行时,该处理器执行该存储器存储的该程序指令,以使该芯片系统执行上述方法实施例中中心节点设备所执行的功能或者执行上述方法实施例中子节点设备所执行的功能。
在本申请实施例中,处理器可以是通用处理器、数字信号处理器、专用集成电路、现场可编程门阵列或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件,可以实现或者执行本申请实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者任何常规的处理器等。结合本申请实施例所公开的方法的步骤可以直接体现为硬件处理器执行完成,或者用处理器中的硬件及软件模块组合执行完成。
在本申请实施例中,存储器可以是非易失性存储器,比如硬盘(hard disk drive,HDD)或固态硬盘(solid-state drive,SSD)等,还可以是易失性存储器(volatile memory),例如随机存取存储器(random-access memory,RAM)。存储器是能够用于携带或存储具有指令或数据结构形式的期望的程序代码并能够由计算机存取的任何其他介质,但不限于此。本申请实施例中的存储器还可以是电路或者其它任意能够实现存储功能的装置,用于存储指令和/或数据。
需要说明的是,本申请的说明书、权利要求书及附图中的术语“第一”和“第二”等是用于区别不同对象,而不是用于描述特定顺序。此外,术语“包括”和“具有”以及它们任何变形,意图在于覆盖不排他的包含。例如包含了一系列步骤或单元的过程、方法、系统、产品或设备没有限定于已列出的步骤或单元,而是可选地还包括没有列出的步骤或单元,或可选地还包括对于这些过程、方法、产品或设备固有的其它步骤或单元。
应当理解,在本申请实施例中,“至少一个(项)”是指一个或者多个,“多个”是指两个或两个以上,“至少两个(项)”是指两个或三个及三个以上,“和/或”,用于描述关联对象的关联关系,表示可以存在三种关系,例如,“A和/或B”可以表示:只存在A,只存在B以及同时存在A和B三种情况,其中A,B可以是单数或者复数。字符“/”一般表示前后关联对象是一种“或”的关系。“以下至少一项(个)”或其类似表达,是指这些项中的任意组合,包括单项(个)或复数项(个)的任意组合。例如,a,b或c中的至少一项(个),可以表示:a,b,c,“a和b”,“a和c”,“b和c”,或“a和b和c”,其中a,b,c可以是单个,也可以是多个。应理解,在本申请实施例中,“与A对应的B”表示B与A相关联。例如,可以根据A可以确定B。还应理解,根据A确定B并不意味着仅仅根据A确定B,还可以根据A和/或其它信息确定B。此外,本申请实施例中出现的“连接”是指直接连接或者间接连接等各种连接方式,以实现设备间的通信,本申请实施例对此不做任何限定。
本申请实施例中出现的“传输”(transmit/transmission)如无特别说明,是指双向传输,包含发送和/或接收的动作。具体地,本申请实施例中的“传输”包含数据的发送,数据的接收,或者数据的 发送和数据的接收。或者说,这里的数据传输包括上行和/或下行数据传输。数据可以包括信道和/或信号,上行数据传输即上行信道和/或上行信号传输,下行数据传输即下行信道和/或下行信号传输。本申请实施例中出现的“网络”与“系统”表达的是同一概念,通信系统即为通信网络。
通过以上的实施方式的描述,所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,仅以上述各功能模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能模块完成,即将装置的内部结构划分成不同的功能模块,以完成以上描述的全部或者部分功能。
在本申请所提供的几个实施例中,应该理解到,所揭露的装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述模块或单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个装置,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或也可以不是物理上分开的,作为单元显示的部件可以是一个物理单元或多个物理单元,即可以位于一个地方,或者也可以分布到多个不同地方。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。所述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个可读取存储介质中。基于这样的理解,本申请实施例的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该软件产品存储在一个存储介质中,包括若干指令用以使得一个设备,如:可以是单片机,芯片等,或处理器(processor)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、ROM、RAM、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何在本申请揭露的技术范围内的变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。

Claims (22)

  1. 一种模型的训练方法,其特征在于,所述方法应用于中心节点设备,包括:
    接收来自多个子节点设备的能力信息,所述子节点设备的能力信息用于表征所述子节点设备训练所述模型的能力;所述多个子节点设备为参与所述模型的联邦学习的节点设备;
    向所述多个子节点设备中的每个子节点设备发送第一指示信息和所述模型,所述第一指示信息用于指示所述子节点设备训练所述模型时待更新的第一模型参数,所述子节点设备的待更新的第一模型参数根据所述子节点设备的能力信息确定;
    接收来自所述多个子节点设备的更新后的所述第一模型参数;
    通过预设融合算法对所述多个子节点设备的全局模型参数进行融合,得到融合参数;所述全局模型参数包括所述更新后的所述第一模型参数以及所述模型中所述第一模型参数之外的第二模型参数;
    向所述多个子节点设备发送所述融合参数。
  2. 根据权利要求1所述的方法,其特征在于,所述方法还包括:
    根据所述能力信息分别预测所述多个子节点设备的模型训练时长;
    根据所述模型训练时长确定所述子节点设备的所述第一模型参数。
  3. 根据权利要求2所述的方法,其特征在于,所述根据所述模型训练时长确定所述子节点设备的所述第一模型参数,包括:
    在所述模型训练时长不符合预设时长条件的情况下,选取所述模型的部分参数确定为所述第一模型参数;
    在所述模型训练时长符合预设时长条件的情况下,选取所述模型的全部参数确定为所述第一模型参数。
  4. 根据权利要求3所述的方法,其特征在于,所述选取所述模型的部分参数确定为所述第一模型参数,包括:
    随机选取所述模型的部分参数确定为所述第一模型参数。
  5. 根据权利要求3所述的方法,其特征在于,所述模型包括第一网络层和第二网络层;模型应用场景指示所述第一网络层,所述第一网络层和第二网络层不同;
    所述选取所述模型的部分参数确定为所述第一模型参数,包括:
    选取所述第一网络层的参数以及所述第二网络层的部分参数确定为所述第一模型参数。
  6. 根据权利要求5所述的方法,其特征在于,在选取所述第一网络层中预设第一网络层的参数确定为所述第一模型参数之前,所述方法还包括:
    根据模型应用场景确定构建所述模型所需的第一网络层。
  7. 根据权利要求1-6任一项所述的方法,其特征在于,所述能力信息还用于表征所述子节点设备能够训练的模型类型,所述方法还包括:
    将所述能力信息包括所述模型的模型类型的所述子节点设备确定为发送第一指示信息和所述模型的目标设备。
  8. 根据权利要求1-7任一项所述的方法,其特征在于,所述通过预设融合算法对所述多个子节点设备的全局模型参数进行融合,得到融合参数,包括:
    对于不同子节点设备的相同功能的模型参数,将所述第二模型参数的权重设为第一预设权值,所述第一模型参数的权重设为第二预设权值,进行加权求均值,得到所述融合参数;
    或者,对于不同子节点设备的相同功能的模型参数,将所述第二模型参数的权重和所述第一模型参数的权重均设为所述第二预设权值,进行加权求均值,得到所述融合参数。
  9. 一种选择波束信息的方法,其特征在于,所述方法应用于子节点设备,包括:
    接收波束信息;
    将所述波束信息输入模型,输出目标波束信息;其中所述模型根据权利要求1-8任一项所述的方法训练得到。
  10. 一种模型的训练方法,其特征在于,所述方法应用于子节点设备,包括:
向中心节点设备发送能力信息,所述子节点设备的能力信息用于表征所述子节点设备训练模型的能力;
    接收来自所述中心节点设备的第一指示信息和所述模型,所述第一指示信息用于指示所述子节点设备训练所述模型时待更新的第一模型参数,所述子节点设备的待更新的第一模型参数根据所述子节点设备的能力信息确定;
    根据训练集和所述待更新模型参数对所述模型进行训练,得到更新后的所述第一模型参数,所述训练集包括所述子节点设备采集到的波束信息;
    向所述中心节点设备发送所述更新后的第一模型参数;
    接收来自所述中心节点设备的融合参数,所述融合参数根据所述更新后的第一模型参数计算得到;
    根据所述融合参数训练所述模型。
  11. 一种模型的训练装置,其特征在于,所述装置应用于中心节点设备,包括:
    接收模块,用于接收来自多个子节点设备的能力信息,所述子节点设备的能力信息用于表征所述子节点设备训练所述模型的能力;所述多个子节点设备为参与所述模型的联邦学习的节点设备;
    发送模块,用于向所述多个子节点设备中的每个子节点设备发送第一指示信息和所述模型,所述第一指示信息用于指示所述子节点设备训练所述模型时待更新的第一模型参数,所述子节点设备的待更新的第一模型参数根据所述子节点设备的能力信息确定;
    所述接收模块,还用于接收来自所述多个子节点设备的更新后的所述第一模型参数;
    处理模块,用于通过预设融合算法对所述多个子节点设备的全局模型参数进行融合,得到融合参数;所述全局模型参数包括所述更新后的所述第一模型参数以及所述模型中所述第一模型参数之外的第二模型参数;
所述发送模块,还用于向所述多个子节点设备发送所述融合参数。
  12. 根据权利要求11所述的装置,其特征在于,所述处理模块,具体用于:
    根据所述能力信息分别预测所述多个子节点设备的模型训练时长;
    根据所述模型训练时长确定所述子节点设备的所述第一模型参数。
  13. 根据权利要求12所述的装置,其特征在于,所述处理模块,具体用于:
    在所述模型训练时长不符合预设时长条件的情况下,选取所述模型的部分参数确定为所述第一模型参数;
    在所述模型训练时长符合预设时长条件的情况下,选取所述模型的全部参数确定为所述第一模型参数。
  14. 根据权利要求13所述的装置,其特征在于,所述处理模块,具体用于:
    随机选取所述模型的部分参数确定为所述第一模型参数。
  15. 根据权利要求13所述的装置,其特征在于,所述模型包括第一网络层和第二网络层;模型应用场景指示所述第一网络层,所述第一网络层和第二网络层不同;
    所述处理模块,具体用于:
    选取所述第一网络层的参数以及所述第二网络层的部分参数确定为所述第一模型参数。
  16. 根据权利要求15所述的装置,其特征在于,所述处理模块还用于:
    在选取所述第一网络层中预设第一网络层的参数确定为所述第一模型参数之前,根据模型应用场景确定构建所述模型所需的第一网络层。
  17. 根据权利要求11-16任一项所述的装置,其特征在于,所述能力信息还用于表征所述子节点设备能够训练的模型类型,所述处理模块还用于:
    将所述能力信息包括所述模型的模型类型的所述子节点设备确定为发送第一指示信息和所述模型的目标设备。
  18. 根据权利要求11-17任一项所述的装置,其特征在于,所述处理模块,具体用于:
    对于不同子节点设备的相同功能的模型参数,将所述第二模型参数的权重设为第一预设权值,所述第一模型参数的权重设为第二预设权值,进行加权求均值,得到所述融合参数;
或者,对于不同子节点设备的相同功能的模型参数,将所述第二模型参数的权重和所述第一模型参数的权重均设为所述第二预设权值,进行加权求均值,得到所述融合参数。
  19. 一种通信装置,其特征在于,所述装置应用于子节点设备,包括:
    接收模块,用于接收波束信息;
    处理模块,用于将所述波束信息输入模型,输出目标波束信息;其中所述模型根据权利要求11-18任一项所述装置训练得到。
  20. 一种通信装置,其特征在于,所述装置应用于子节点设备,包括:
    发送模块,用于向中心节点设备发送能力信息,所述子节点设备的能力信息用于表征所述子节点设备训练模型的能力;
    接收模块,用于接收来自所述中心节点设备的第一指示信息和所述模型,所述第一指示信息用于指示所述子节点设备训练所述模型时待更新的第一模型参数,所述子节点设备的待更新的第一模型参数根据所述子节点设备的能力信息确定;
    处理模块,用于根据训练集和所述待更新模型参数对所述模型进行训练,得到更新后的所述第一模型参数,所述训练集包括所述子节点设备采集到的波束信息;
    发送模块,用于向所述中心节点设备发送所述更新后的第一模型参数;
    接收模块,用于接收来自所述中心节点设备的融合参数,所述融合参数根据所述更新后的第一模型参数计算得到;
    处理模块,用于根据所述融合参数训练所述模型。
  21. 一种通信装置,其特征在于,所述通信装置包括处理器和收发器,所述处理器和所述收发器用于支持所述通信装置执行如权利要求1-10任一项所述的方法。
  22. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质存储计算机指令,当所述计算机指令运行时,执行如权利要求1-10任一项所述的方法。
PCT/CN2023/104254 2022-07-21 2023-06-29 模型的训练方法及通信装置 WO2024017001A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210872876.X 2022-07-21
CN202210872876.XA CN117474116A (zh) 2022-07-21 2022-07-21 模型的训练方法及通信装置

Publications (1)

Publication Number Publication Date
WO2024017001A1 true WO2024017001A1 (zh) 2024-01-25

Family

ID=89617044

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/104254 WO2024017001A1 (zh) 2022-07-21 2023-06-29 模型的训练方法及通信装置

Country Status (2)

Country Link
CN (1) CN117474116A (zh)
WO (1) WO2024017001A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117709486B (zh) * 2024-02-05 2024-04-19 清华大学 一种面向协作学习的动态聚合方法及装置

Citations (5)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111310932A (zh) * 2020-02-10 2020-06-19 深圳前海微众银行股份有限公司 横向联邦学习系统优化方法、装置、设备及可读存储介质
US20210406782A1 (en) * 2020-06-30 2021-12-30 TieSet, Inc. System and method for decentralized federated learning
CN114444708A (zh) * 2020-10-31 2022-05-06 华为技术有限公司 获取模型的方法、装置、设备、系统及可读存储介质
CN112818207A (zh) * 2021-02-26 2021-05-18 深圳前海微众银行股份有限公司 网络结构搜索方法、装置、设备、存储介质及程序产品
CN114386570A (zh) * 2021-12-21 2022-04-22 中山大学 一种基于多分支神经网络模型的异构联邦学习训练方法

Also Published As

Publication number Publication date
CN117474116A (zh) 2024-01-30

Similar Documents

Publication Publication Date Title
Wang et al. Transfer learning promotes 6G wireless communications: Recent advances and future challenges
Chen et al. Distributed learning in wireless networks: Recent progress and future challenges
Xu et al. Edge learning for B5G networks with distributed signal processing: Semantic communication, edge computing, and wireless sensing
Brik et al. Deep learning for B5G open radio access network: Evolution, survey, case studies, and challenges
Nguyen et al. Enabling AI in future wireless networks: A data life cycle perspective
Maksymyuk et al. Deep learning based massive MIMO beamforming for 5G mobile network
Hu et al. Distributed machine learning for wireless communication networks: Techniques, architectures, and applications
He et al. An overview on the application of graph neural networks in wireless networks
Nassef et al. A survey: Distributed Machine Learning for 5G and beyond
WO2024017001A1 (zh) 模型的训练方法及通信装置
Nguyen et al. DRL‐based intelligent resource allocation for diverse QoS in 5G and toward 6G vehicular networks: a comprehensive survey
US20230262489A1 (en) Apparatuses and methods for collaborative learning
Liu et al. Distributed intelligence in wireless networks
Boulogeorgos et al. Machine learning: A catalyst for THz wireless networks
Koudouridis et al. An architecture and performance evaluation framework for artificial intelligence solutions in beyond 5G radio access networks
Nguyen et al. Wireless AI: Enabling an AI-governed data life cycle
CN116419257A (zh) 一种通信方法及装置
Gures et al. A comparative study of machine learning-based load balancing in high-speed train system
Alhammadi et al. Artificial Intelligence in 6G Wireless Networks: Opportunities, Applications, and Challenges
WO2023098860A1 (zh) 通信方法和通信装置
Lin et al. Heuristic-learning-based network architecture for device-to-device user access control
JP7365481B2 (ja) 無線アクセスネットワークにおける電力節約
Liu et al. Joint radio map construction and dissemination in MEC networks: a deep reinforcement learning approach
Rajiv et al. Massive MIMO based beamforming by optical multi-hop communication with energy efficiency for smart grid IoT 5G application
Zhang et al. Map2Schedule: An End-to-End Link Scheduling Method for Urban V2V Communications

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23842066

Country of ref document: EP

Kind code of ref document: A1