WO2023116787A1 - Training method and apparatus for an intelligent model - Google Patents

Training method and apparatus for an intelligent model

Info

Publication number
WO2023116787A1
WO2023116787A1 · PCT/CN2022/140797 · CN2022140797W
Authority
WO
WIPO (PCT)
Prior art keywords
information
target
gradient information
gradient
feature
Application number
PCT/CN2022/140797
Other languages
English (en)
French (fr)
Inventor
马梦瑶
刘坚能
苏立群
Original Assignee
Huawei Technologies Co., Ltd.
Application filed by Huawei Technologies Co., Ltd.
Priority to EP22910099.5A (published as EP4435675A1)
Publication of WO2023116787A1
Priority to US18/750,688 (published as US20240346329A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0495 Quantised networks; Sparse networks; Compressed networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation using electronic means
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G06N3/098 Distributed learning, e.g. federated learning
    • G06N20/00 Machine learning

Definitions

  • The present application relates to the field of communications, and more specifically, to a method and an apparatus for training an intelligent model.
  • In federated learning (FL), the server provides model parameters to multiple devices; each device trains the intelligent model on its own data set and feeds the gradient information of the loss function back to the server, and the server updates the model parameters based on the gradient information fed back by the devices.
  • In conventional federated learning, the models on the devices participating in the training are identical to the server's model, and all participating devices use the same type of training data.
  • For example, multiple image acquisition devices can each train the model with the image data they collect. This improves the diversity of the training data, but it does not consider the feature diversity of the inference target. There is currently no effective solution for achieving feature-diverse model training in federated learning to improve model performance.
  • In view of this, the present application provides a training method and apparatus for an intelligent model that realize distributed federated learning over different features and can improve the performance of the trained model.
  • In a first aspect, a training method for an intelligent model is provided. The central node and multiple participating node groups jointly perform the training of the intelligent model; the intelligent model is composed of multiple feature models corresponding to multiple features of the inference target, and the participating nodes in one participating node group train one feature model. The training method is executed by a first participating node, which trains a first feature model, in the multiple participating node groups, and includes: receiving first information from the central node, where the first information is used to indicate an inter-feature constraint variable, and the inter-feature constraint variable is used to represent a constraint relationship between different features; obtaining first gradient information by using a gradient inference model according to the inter-feature constraint variable, the model parameters of the first feature model, and first sample data, where the first gradient information is the gradient information corresponding to the inter-feature constraint variable; and sending the first gradient information to the central node.
  • Based on the above solution, different types of participating nodes compute the gradient information corresponding to the inter-feature constraint variable as the model parameters are updated during model training, and feed it back to the central node. The central node updates the inter-feature constraint variable, which represents the relationship between features, based on the gradient information fed back by the participating nodes corresponding to the different feature models, thereby decoupling the features of the model: different types of participating nodes can train different feature models based on the inter-feature constraint variable and local feature data, while the central node updates the inter-feature constraint variable based on the gradients fed back by those nodes. This achieves both training-data diversity and feature diversity in federated learning without transmitting the original data; that is, it avoids leakage of the original data and can improve the performance of the trained model.
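  • As a minimal illustrative sketch only (not the exact algorithm of this application), the following Python fragment mimics one such round: the central node broadcasts the inter-feature constraint variable, each participating node computes the gradient of a local loss with respect to that variable on its own feature data, and the central node aggregates the fed-back gradients to update the variable. The quadratic local loss and all names (local_constraint_grad, central_update) are hypothetical stand-ins.

    import numpy as np

    def local_constraint_grad(theta, w_m, x_m, eps=1e-6):
        # Hypothetical local objective of feature model m on local feature
        # data x_m; it stands in for the node's gradient inference model.
        def loss(t):
            return 0.5 * np.sum((x_m @ w_m - t) ** 2)
        # Numerical gradient of the local loss w.r.t. the constraint variable.
        g = np.zeros_like(theta)
        for j in range(theta.size):
            d = np.zeros_like(theta)
            d[j] = eps
            g[j] = (loss(theta + d) - loss(theta - d)) / (2 * eps)
        return g

    def central_update(theta, grads, eta=0.1):
        # Central node: aggregate the fed-back gradients and take one
        # gradient step on the inter-feature constraint variable.
        return theta - eta * np.mean(grads, axis=0)

    rng = np.random.default_rng(0)
    theta = np.zeros(4)  # inter-feature constraint variable
    nodes = [(rng.standard_normal((4, 4)), rng.standard_normal((8, 4)))
             for _ in range(3)]  # (feature-model params, local feature data)
    for _ in range(10):  # training rounds
        grads = [local_constraint_grad(theta, w, x) for w, x in nodes]
        theta = central_update(theta, grads)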
  • In a possible implementation manner, the method further includes: receiving a first identifier set from the central node, where the first identifier set includes the identifiers of the sample data selected by the central node for inferring the inter-feature constraint variable. Obtaining the first gradient information by using the gradient inference model then includes: determining that the sample data set of the first participating node includes first sample data corresponding to a first identifier, where the first identifier belongs to the first identifier set; and obtaining, by using the gradient inference model, the first gradient information, that is, the gradient information corresponding to the inter-feature constraint variable.
  • Based on the above solution, the central node selects, through the first identifier set, part of the sample data for inferring the gradient information of the inter-feature constraint variable. The participating nodes that store the selected sample data infer this gradient information based on the current model parameters of the feature model and the sample data, and feed it back to the central node. Resource overhead and implementation complexity can thus be reduced.
  • In a possible implementation manner, sending the first gradient information to the central node includes: sending quantized first target gradient information to the central node, where the first target gradient information includes the first gradient information, or the first target gradient information includes the first gradient information and first residual gradient information, the first residual gradient information being used to indicate the residual amount of the gradient information corresponding to the inter-feature constraint variable that has not been sent to the central node.
  • Based on the above solution, a participating node can send to the central node both the gradient information obtained in the current round of training and the residual amount of gradient information that was not transmitted to the central node before this round, enabling the central node to obtain the residual gradient information and improving the efficiency of model training.
  • In a possible implementation manner, the method further includes: obtaining second residual gradient information based on the first target gradient information and the quantized first target gradient information, where the second residual gradient information is the residual amount of the first target gradient information that was not sent to the central node.
  • In a possible implementation manner, the method further includes: determining a first threshold according to first quantization noise information and channel resource information, where the first quantization noise information is used to characterize the loss introduced by quantized encoding and decoding of the first target gradient information. Sending the quantized first target gradient information to the central node includes: determining that a metric value of the first target gradient information is greater than the first threshold, and sending the quantized first target gradient information to the central node.
  • In a possible implementation manner, the method further includes: if the metric value of the first target gradient information is less than or equal to the first threshold, determining not to send the quantized first target gradient information.
  • Based on the above solution, a participating node determines, from the quantization noise information and the channel resource information, a threshold for judging whether to send the quantized target gradient information. This method takes the quantization codec loss of the target information into account when deciding whether to send it to the central node, realizes adaptive scheduling by the participating node according to the channel environment, and can improve the reliability of signal transmission and the utilization of channel resources.
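  • A minimal sketch of such a scheduling decision follows, assuming the metric is the L2 norm of the target gradient and assuming a simple, hypothetical combination of quantization noise and channel cost into the threshold (this exact formula is not specified by the application):

    import numpy as np

    def should_send(target_grad, noise_var, channel_cost):
        # Illustrative rule: transmit only when the metric of the target
        # gradient (its L2 norm) exceeds a threshold that grows with the
        # quantization codec loss and with the channel resource cost.
        threshold = np.sqrt(noise_var) + channel_cost  # hypothetical combination
        return np.linalg.norm(target_grad) > threshold

    g = np.array([0.3, -1.2, 0.7])
    if should_send(g, noise_var=0.04, channel_cost=0.5):
        print("send quantized target gradient to the central node")
    else:
        print("withhold; accumulate as residual gradient information")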
  • In a possible implementation manner, the method further includes: if the metric value of the first target gradient information is less than the first threshold, determining third residual gradient information, where the third residual gradient information is the first target gradient information.
  • In a possible implementation manner, the method further includes: obtaining the first quantization noise information according to the channel resource information, communication cost information, and the first target gradient information, where the communication cost information is used to indicate the communication cost weight of a communication resource, and the communication resource includes transmission power and/or transmission bandwidth.
  • Based on the above solution, a participating node can obtain the first quantization noise information from the channel resource information, the communication cost information, and the first target gradient information, so that adaptive scheduling can be realized based on the first quantization noise information, improving the reliability of signal transmission and the utilization of channel resources.
  • In a possible implementation manner, determining the first threshold according to the first quantization noise information and the channel resource information includes: determining the transmission bandwidth and/or the transmission power according to the first quantization noise information, the communication cost information, the channel resource information, and the first target gradient information, where the communication cost information is used to indicate the communication cost weight of the communication resource, and the communication resource includes the transmission power and/or the transmission bandwidth; and determining the first threshold according to the first quantization noise information and the communication resource.
  • the method further includes: receiving second information from the central node, where the second information is used to indicate the communication cost information.
  • In a possible implementation manner, the method further includes: training the first feature model according to the inter-feature constraint variable and model training data to obtain second gradient information, and sending the second gradient information to the central node.
  • Based on the above solution, feature decoupling of the model is realized through the inter-feature constraint variable, so that different types of participating nodes can train different feature models based on the inter-feature constraint variable and local feature data. This realizes both training-data diversity in federated learning and model training over different features, thereby improving the performance of the trained model.
  • In a possible implementation manner, sending the second gradient information to the central node includes: sending quantized second target gradient information to the central node, where the second target gradient information includes the second gradient information, or the second target gradient information includes the second gradient information and fourth residual gradient information, the fourth residual gradient information being used to indicate the residual amount of the gradient information that has not been sent to the central node.
  • In a possible implementation manner, the method further includes: obtaining fifth residual gradient information based on the second target gradient information and the quantized second target gradient information, where the fifth residual gradient information is used to characterize the residual amount of the second target gradient information that was not sent to the central node.
  • In a possible implementation manner, the method further includes: determining a second threshold according to second quantization noise information and channel resource information, where the second quantization noise information is used to characterize the loss introduced by quantized encoding and decoding of the second target gradient information. Sending the quantized second target gradient information to the central node includes: determining that a metric value of the second target gradient information is greater than the second threshold, and sending the quantized second target gradient information to the central node.
  • In a possible implementation manner, the method further includes: if the metric value of the second target gradient information is less than or equal to the second threshold, determining not to send the quantized second target gradient information.
  • In a possible implementation manner, the method further includes: if the metric value of the second target gradient information is less than the second threshold, determining sixth residual gradient information, where the sixth residual gradient information is the second target gradient information.
  • In a possible implementation manner, the method further includes: obtaining the second quantization noise information according to the channel resource information, communication cost information, and the second target gradient information, where the communication cost information is used to indicate the communication cost weight of a communication resource, and the communication resource includes transmission power and/or transmission bandwidth.
  • In a possible implementation manner, determining the second threshold according to the second quantization noise information and the channel resource information includes: determining the transmission bandwidth and/or the transmission power according to the second quantization noise information, the communication cost information, the channel resource information, and the second target gradient information, where the communication cost information is used to indicate the communication cost weight of the communication resource, and the communication resource includes the transmission power and/or the transmission bandwidth; and determining the second threshold according to the second quantization noise information and the communication resource.
  • In a possible implementation manner, the method further includes: receiving third information from the central node, where the third information is used to indicate updated parameters of the first feature model, and updating the parameters of the first feature model according to the third information.
  • In a second aspect, a training method for an intelligent model is provided. The central node and multiple participating node groups jointly perform the training of the intelligent model; the intelligent model is composed of multiple feature models corresponding to multiple features of the inference target, and the participating nodes in one participating node group train one feature model. The training method is executed by the central node and includes: determining an inter-feature constraint variable, where the inter-feature constraint variable is used to represent a constraint relationship between different features; and sending first information including the inter-feature constraint variable to the participating nodes in the multiple participating node groups.
  • In a possible implementation manner, the method further includes: receiving at least one piece of second target gradient information from participating nodes in a first participating node group, where the multiple participating node groups include the first participating node group; determining updated model parameters of a first feature model according to the at least one piece of second target gradient information, where the first feature model is the feature model trained by the participating nodes in the first participating node group; and sending the updated model parameters to the first participating node group.
  • That the central node receives at least one piece of second target gradient information from participating nodes in the first participating node group specifically means that the central node receives at least one piece of quantized second target gradient information from those participating nodes and decodes it to obtain the second target gradient information. The second target gradient information obtained by decoding at the central node may differ from the second target gradient information before quantized encoding at the participating node, owing to the loss of quantized encoding and decoding.
  • In a possible implementation manner, the method further includes: sending a first identifier set to the participating nodes in the multiple participating node groups, where the first identifier set includes the identifiers of the sample data selected by the central node for inferring the inter-feature constraint variable.
  • In a possible implementation manner, the method further includes: receiving multiple pieces of first target gradient information from participating nodes in the multiple participating node groups, where the first target gradient information is the gradient information corresponding to the inter-feature constraint variable obtained by inference at the participating nodes; and determining the inter-feature constraint variable includes: determining the inter-feature constraint variable according to the multiple pieces of first target gradient information.
  • In a possible implementation manner, the method further includes: sending second information to the participating nodes in the multiple participating node groups, where the second information is used to indicate communication cost information, the communication cost information is used to indicate the communication cost weight of a communication resource, and the communication resource includes transmission power and/or transmission bandwidth.
  • In a third aspect, a communication method is provided, including: determining a threshold according to quantization noise information and channel resource information, where the quantization noise information is used to characterize the loss introduced by quantized encoding and decoding of target information; sending the quantized target information when a metric value of the target information is greater than the threshold; and not sending the quantized target information when the metric value of the target information is less than or equal to the threshold.
  • Based on the above solution, a participating node decides whether to send the target information to the central node based on the target information to be transmitted and the communication cost information broadcast by the central node, taking the quantization codec loss of the target information into account. This realizes autonomous adaptive scheduling by the participating node according to the channel environment, and can improve the reliability of target signal transmission and the utilization of channel resources.
  • In a possible implementation manner, the target information includes the gradient information obtained in the N-th round of model training and first target residual information, where the first target residual information is the residual amount of gradient information that remained unsent before that gradient information was obtained.
  • In a possible implementation manner, the method further includes: when the metric value of the target information is greater than the threshold, obtaining second target residual information based on the target information and the quantized target information, where the second target residual information is the unsent residual amount in the target information.
  • In a possible implementation manner, the method further includes: if the metric value of the target information is less than or equal to the threshold, determining third target residual information, where the third target residual information is the target information.
  • In a possible implementation manner, the method further includes: obtaining the quantization noise information according to the channel resource information, communication cost information, and the target information, where the communication cost information is used to indicate the communication cost weight of a communication resource, and the communication resource includes transmission power and/or transmission bandwidth.
  • In a possible implementation manner, determining the threshold according to the quantization noise information and the channel resource information includes: determining the transmission bandwidth and/or the transmission power according to the quantization noise information, the communication cost information, the channel resource information, and the target information, where the communication cost information is used to indicate the communication cost weight of the communication resource, and the communication resource includes transmission power and/or transmission bandwidth; and determining the threshold according to the quantization noise information and the communication resource.
  • the method further includes: receiving second information, where the second information is used to indicate the communication cost information.
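  • The following Python sketch puts the pieces of this communication method together under stated assumptions: a uniform scalar quantizer stands in for the quantized codec, the target information is composed as the new gradient plus a learning-rate-scaled residual, and the threshold is taken as given. The quantizer and all function names are illustrative, not taken from this publication.

    import numpy as np

    def quantize(x, step=0.25):
        # Uniform scalar quantizer standing in for the quantized codec.
        return step * np.round(x / step)

    def transmit_round(grad, residual, eta_prev, eta, threshold):
        # Target information = new gradient + (eta_prev / eta) * residual.
        target = grad + (eta_prev / eta) * residual
        if np.linalg.norm(target) > threshold:
            q = quantize(target)
            return q, target - q  # sent payload; new residual = codec loss
        return None, target       # nothing sent; whole target becomes residual

    residual = np.zeros(3)
    for t in range(5):
        grad = np.array([0.9, -0.4, 0.2])
        sent, residual = transmit_round(grad, residual,
                                        eta_prev=0.10, eta=0.08, threshold=1.0)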
  • In another aspect, an intelligent model training apparatus is provided, including: a transceiver unit, configured to receive first information from a central node, where the first information is used to indicate an inter-feature constraint variable, and the inter-feature constraint variable is used to represent a constraint relationship between different features; and a processing unit, configured to obtain first gradient information by using a gradient inference model according to the inter-feature constraint variable, model parameters of a first feature model, and first sample data, where the first gradient information is the gradient information corresponding to the inter-feature constraint variable. The transceiver unit is further configured to send the first gradient information to the central node.
  • In another aspect, an intelligent model training apparatus is provided, including: a processing unit, configured to determine an inter-feature constraint variable, where the inter-feature constraint variable is used to represent a constraint relationship between different features; and a transceiver unit, configured to send first information including the inter-feature constraint variable to participating nodes in multiple participating node groups.
  • In another aspect, a communication apparatus is provided, including: a processing unit, configured to determine a threshold according to quantization noise information and channel resource information, where the quantization noise information is used to characterize the loss introduced by quantized encoding and decoding of target information; and a transceiver unit, configured to send the quantized target information when a metric value of the target information is greater than the threshold, and not to send the quantized target information when the metric value of the target information is less than or equal to the threshold.
  • In another aspect, a communication device is provided, including a processor. The processor may implement the method in the first aspect and any possible implementation manner of the first aspect, or the method in the second aspect and any possible implementation manner of the second aspect, or the method in the third aspect and any possible implementation manner of the third aspect.
  • In a possible implementation, the communication device further includes a memory. The processor is coupled to the memory and can be used to execute the instructions in the memory, so as to implement the method in any one of the above three aspects and their possible implementation manners.
  • In a possible implementation, the communication device further includes a communication interface, and the processor is coupled to the communication interface. The communication interface may be a transceiver, a pin, a circuit, a bus, a module, or another type of communication interface, without limitation.
  • In one implementation, the communication apparatus is a communication device, and the communication interface may be a transceiver or an input/output interface.
  • In another implementation, the communication apparatus is a chip configured in a communication device, and the communication interface may be an input/output interface.
  • In an implementation, the processor may be a logic circuit, the transceiver may be a transceiver circuit, and the input/output interface may be an input/output circuit.
  • a processor including: an input circuit, an output circuit, and a processing circuit.
  • the processing circuit is configured to receive a signal through the input circuit and transmit a signal through the output circuit, so that the processor executes the method in the first aspect and any possible implementation manner of the first aspect.
  • In a specific implementation process, the above-mentioned processor may be one or more chips, the input circuit may be an input pin, the output circuit may be an output pin, and the processing circuit may be a transistor, a gate circuit, a flip-flop, various logic circuits, or the like.
  • The input signal received by the input circuit may be received and input by, for example but not limited to, a receiver, and the output signal of the output circuit may be, for example but not limited to, output to a transmitter and transmitted by the transmitter. The input circuit and the output circuit may be the same circuit, serving as the input circuit and the output circuit at different times. The embodiments of the present application do not limit the specific implementations of the processor and the various circuits.
  • In another aspect, a computer program product is provided. The computer program product includes a computer program (also called code, or instructions) that, when executed, causes a computer to perform the method in the first aspect and any possible implementation manner of the first aspect, or the method in the second aspect and any possible implementation manner of the second aspect, or the method in the third aspect and any possible implementation manner of the third aspect.
  • In another aspect, a computer-readable storage medium is provided. The storage medium stores a computer program (also called code, or instructions) that, when run on a computer, causes the computer to perform the method in the first aspect and any possible implementation manner of the first aspect, or the method in the second aspect and any possible implementation manner of the second aspect, or the method in the third aspect and any possible implementation manner of the third aspect.
  • a communication system including the aforementioned multiple participating nodes and at least one central node.
  • FIG. 1 is a schematic diagram of a communication system applicable to an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of the intelligent model training method provided by an embodiment of the present application.
  • FIG. 3 is another schematic flowchart of the intelligent model training method provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of multiple participating nodes sharing transmission resources according to an embodiment of the present application.
  • FIG. 5 is a schematic block diagram of an example of a communication apparatus of the present application.
  • FIG. 6 is a schematic configuration diagram of an example of a communication device of the present application.
  • In the embodiments of the present application, words such as "exemplary" or "for example" are used to represent examples, illustrations, or explanations; any embodiment or design described as "exemplary" or "for example" should not be interpreted as more preferred or advantageous than other embodiments or design solutions. Rather, such words are intended to present related concepts in a specific manner for ease of understanding.
  • "At least one (kind)" can also be described as one (kind) or multiple (kinds), and "multiple (kinds)" can be two, three, four, or more (kinds); this application does not limit this.
  • The technical solutions of the embodiments of the present application can be applied to various communication systems, for example: a long term evolution (LTE) system, an LTE frequency division duplex (FDD) system, an LTE time division duplex (TDD) system, a fifth generation (5G) communication system, a future communication system (such as a sixth generation (6G) communication system), or a system integrating multiple communication systems; the embodiments of the present application impose no limitation. 5G may also be called new radio (NR).
  • FIG. 1 is a schematic diagram of a communication system applicable to an embodiment of the present application.
  • As shown in FIG. 1, the communication system applicable to the embodiments of the present application may include at least one central node and multiple participating node groups. The central node and the multiple participating node groups perform federated learning of the intelligent model; the intelligent model is composed of multiple feature models corresponding to multiple features of the inference target, and the participating nodes in one participating node group train one feature model.
  • The training samples (also called training data or sample data) used by the participating nodes in the same participating node group to train their feature model are different; in other words, the training samples of participating nodes in the same group belong to different sample spaces. The sample features of the training samples used by participating nodes in different participating node groups are different; in other words, the training samples of participating nodes in different groups belong to different feature spaces.
  • Based on the intelligent model training method provided by the embodiments of the present application, the central node and the participating nodes can decouple the intelligent model into multiple feature models, with different participating node groups training different feature models on the corresponding feature sample data. The central node then aggregates the gradient information fed back by the participating nodes after training and updates the parameters of the intelligent model. Model training with feature diversity is thus realized, which can improve the performance of the model.
  • the central node provided in the embodiment of the present application may be a network device, for example, a server, a base station, and the like.
  • the central node may be a device deployed in the radio access network and capable of directly or indirectly communicating with participating nodes.
  • the participating node provided in the embodiment of the present application may be a device with a transceiver function, such as a terminal or a terminal device.
  • the participating node may be a sensor or a device with a data collection function.
  • Participating nodes can be deployed on land (indoors, outdoors, handheld, and/or vehicle-mounted), on water (such as on ships), or in the air (such as on aircraft, balloons, and satellites).
  • A participating node may be a user equipment (UE); the UE includes a handheld device, a vehicle-mounted device, a wearable device, or a computing device with a wireless communication function. For example, the UE may be a mobile phone, a tablet computer, or a computer with a wireless transceiver function.
  • The terminal device can also be a virtual reality (VR) terminal device, an augmented reality (AR) terminal device, a wireless terminal in industrial control, a wireless terminal in unmanned driving, a wireless terminal in telemedicine, a wireless terminal in a smart grid, a wireless terminal in a smart city, and/or a wireless terminal in a smart home, etc.
  • The technical solutions provided by the embodiments of the present application can be used in various scenarios, such as smart retail, smart home, video surveillance, the Internet of vehicles (for example, automatic driving and unmanned driving), and industrial wireless sensor networks (IWSN), but the present application is not limited thereto.
  • the technical solution provided by this application can be applied to smart homes to provide personalized services for customers based on customer needs.
  • the central node can be a base station or a server, and the participating nodes can be client devices installed in each home.
  • A client device only provides the server, through the router, with gradient information synthesized from model training on local data, which allows training results to be shared with the server while protecting the privacy of customer data.
  • The server aggregates the synthesized gradient information provided by multiple client devices, determines the updated model parameters, notifies each client device, and continues the training of the intelligent model. After the model training is completed, the client device applies the trained model to provide personalized services for the customer.
  • the technical solutions provided by this application can be applied to industrial wireless sensor networks to realize industrial intelligence.
  • For example, the central node can be a server, and the participating nodes can be multiple sensors in a factory (for example, mobile intelligent robots). The server aggregates the gradient information fed back by the sensors, determines the updated model parameters, notifies each sensor, and continues the training of the intelligent model. After model training is completed, a sensor applies the trained model to perform factory tasks; for example, a sensor that is a mobile intelligent robot can obtain its moving route from the trained model and complete factory handling tasks, express sorting tasks, and the like.
  • A neural network, as an important branch of artificial intelligence (AI), is a network structure that imitates the behavioral characteristics of animal neural networks to process information.
  • the structure of the neural network is composed of a large number of nodes (or neurons) connected to each other.
  • the neural network is based on a specific computing model, and achieves the purpose of processing information by learning and training the input information.
  • a neural network consists of an input layer, a hidden layer and an output layer.
  • the input layer is responsible for receiving input signals
  • the output layer is responsible for outputting the calculation results of the neural network
  • the hidden layer is responsible for complex functions such as feature expression.
  • the function of the hidden layer is characterized by a weight matrix and a corresponding activation function.
  • a deep neural network is generally a multi-layer structure. Increasing the depth and width of the neural network can improve its expressive ability and provide more powerful information extraction and abstract modeling capabilities for complex systems.
  • The depth of a neural network can be expressed as the number of layers of the neural network, and the width of a layer can be expressed as the number of neurons the layer contains.
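  • As a minimal illustration of these terms (a generic sketch, not a model from this application), the following Python fragment builds a network with an input layer, one hidden layer characterized by a weight matrix and an activation function, and an output layer:

    import numpy as np

    def relu(z):
        return np.maximum(z, 0.0)

    # A small network: 4 inputs, one hidden layer of width 8, 2 outputs.
    # Depth corresponds to the number of layers; width to neurons per layer.
    rng = np.random.default_rng(0)
    W1, b1 = rng.standard_normal((4, 8)), np.zeros(8)  # hidden-layer weights
    W2, b2 = rng.standard_normal((8, 2)), np.zeros(2)  # output-layer weights

    def forward(x):
        h = relu(x @ W1 + b1)  # hidden layer: weight matrix + activation
        return h @ W2 + b2     # output layer: the network's calculation result

    y = forward(rng.standard_normal(4))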
  • A DNN can be constructed in a variety of ways, including but not limited to a recurrent neural network (RNN), a convolutional neural network (CNN), and a fully connected neural network.
  • Training refers to the process of processing a model (also called a model to be trained), during which the model learns to perform a specific task by optimizing the parameters in the model, such as weights.
  • The embodiments of the present application are applicable to, but not limited to, one or more of the following training methods: supervised learning, unsupervised learning, reinforcement learning, and transfer learning.
  • Supervised learning uses a set of correctly labeled training samples for training, where correctly labeled means that each sample has an expected output value.
  • Unsupervised learning refers to a method that automatically classifies or clusters the input data when no pre-labeled training samples are given.
  • Inference refers to performing data processing using a trained model (a trained model may be called an inference model).
  • the actual data is input into the inference model for processing, and the corresponding inference results are obtained.
  • Reasoning can also be called prediction or decision-making, and the result of reasoning can also be called prediction result or decision result.
  • Federated learning is a distributed AI training method that places the training process of an AI algorithm on multiple devices instead of aggregating it on one server, which can solve the problems of time consumption and massive communication overhead caused by collecting data in centralized AI training.
  • The central node sends the AI model to multiple participating nodes, and the participating nodes train the AI model based on their own data and report their trained AI models to the central node in the form of gradients.
  • the central node aggregates the gradient information fed back by multiple participating nodes to obtain the parameters of the new AI model.
  • the central node can send the updated parameters of the AI model to multiple participating nodes, and the participating nodes perform training on the AI model again.
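  • A minimal sketch of this central step, assuming a simple weighted-averaging aggregation rule (the application does not prescribe the aggregation formula) and hypothetical names throughout:

    import numpy as np

    def aggregate_and_update(params, node_grads, weights, eta=0.05):
        # Central node: weighted aggregation of the gradients reported by
        # the participating nodes, then one update of the AI model params.
        w = np.asarray(weights, dtype=float)
        w = w / w.sum()  # e.g. weight each node by its data volume
        agg = sum(wi * g for wi, g in zip(w, node_grads))
        return params - eta * agg

    rng = np.random.default_rng(1)
    params = np.zeros(5)
    reports = [rng.standard_normal(5) for _ in range(4)]  # from 4 nodes
    params = aggregate_and_update(params, reports, weights=[10, 20, 5, 15])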
  • the participating nodes selected by the central node may be the same or different, which is not limited in this application.
  • A network in which the models on the multiple participating nodes are the same as the server's model, and the devices participating in model training use the same type of training data, can be called a homogeneous network.
  • For example, multiple image acquisition devices can each train the model with the image data they collect. This method can increase the diversity of the training data, but it does not consider the feature diversity of the inference target.
  • the classification between cats and dogs can be more accurate when based on both images and audio of animals.
  • For example, cameras, positioning systems, and inertial measurement units (IMUs) can collect data of different categories (features) to estimate the position of a vehicle or distinguish traffic conditions within the road network, which can improve learning performance.
  • an autoencoder combined with a classification neural network is generally used for feature extraction and classification of audio signals, while a convolutional neural network is generally used when processing image data.
  • This application considers forming a heterogeneous network for federated learning from different types of participating nodes and a central node, where different types of participating nodes train the sub-models of the intelligent model that correspond to different features of the inference target; this can improve the performance of the trained intelligent model. However, there are association relationships (or coupling relationships) between different features. Therefore, this application proposes that the central node provide the different types of participating nodes with inter-feature constraint variables that represent the association relationships between features, so as to decouple the features of the model and let different types of participating nodes train different feature models based on the inter-feature constraint variables and local feature data. As the model parameters are updated during training, the different types of participating nodes compute the gradient of the inter-feature constraint variable and feed it back to the central node, so that the central node can update the inter-feature constraint variable based on the gradients fed back by the different types of participating nodes. This realizes both training-data diversity in federated learning and model training over different features, thereby improving the performance of the trained model.
  • FIG. 2 is a schematic flowchart of the intelligent model training method provided by an embodiment of the present application.
  • the central node and multiple participating node groups jointly execute the training of the intelligent model.
  • the intelligent model is composed of multiple feature models corresponding to various features of the inference target.
  • The participating nodes of one participating node group can collect the training data corresponding to one feature and train the feature model corresponding to that feature based on this training data. The intelligent model training method shown in FIG. 2 is executed by a first participating node in the multiple participating node groups.
  • the first participating node belongs to the first participating node group, and the model trained by the participating nodes in the first participating node group is the first feature model.
  • the intelligent model jointly trained by the central node and the multiple participating node groups includes M feature models, the M feature models are respectively trained by the M participating node groups, and a participating node in a participating node group trains a feature model.
  • For example, the first participating node group is the m-th type of participating node group among the M participating node groups, that is, the participating node group corresponding to feature model m; in other words, the first feature model is feature model m, also called the m-th type feature model.
  • the first participating node may be the kth participating node in the first participating node group. That is, the first participating node can be called the kth participating node in the mth type participating node group.
  • The method shown in FIG. 2 includes but is not limited to the following steps:
  • the central node sends first information to a first participating node, where the first information includes an inter-feature constraint variable, and the inter-feature constraint variable is used to represent a constraint relationship between different features.
  • the first participating node receives the first information from the central node, and determines inter-feature constraint variables based on the first information.
  • the first information is broadcast information.
  • the participating nodes in each participating node group can receive the first information, and determine inter-feature constraint variables based on the first information.
  • The first participating node obtains the first gradient information by using the gradient inference model according to the inter-feature constraint variable, the model parameters of the first feature model, and the first sample data, where the first gradient information is the gradient information corresponding to the inter-feature constraint variable.
  • the central node sends third information to the participating nodes in the first participating node group, where the third information is used to indicate the updated model parameters of the first feature model.
  • The updated model parameters are obtained by the central node based on the post-training model gradient information fed back by the participating nodes in the first participating node group.
  • the first participating node updates the parameters of the first feature model based on the model parameter information to obtain the first feature model after parameter updating.
  • the first participating node may use the model training method provided in FIG. 3 to train the first feature model with updated parameters. For details, reference may be made to the description of the embodiment shown in FIG. 3 below.
  • Based on the above scheme, the participating nodes can use the gradient inference model to infer the gradient information corresponding to the inter-feature constraint variable based on the inter-feature constraint variable, the model parameters of the feature model, and local sample data, so that the central node can obtain the gradient information corresponding to the inter-feature constraint variable fed back by the participating nodes in one or more participating node groups and update the inter-feature constraint variable accordingly.
  • The model parameters of the first feature model may be the updated model parameters indicated by the third information most recently received from the central node.
  • In one implementation, each participating node participating in the model training receives the inter-feature constraint variable and infers the corresponding gradient information, and the central node updates the inter-feature constraint variable based on the gradient information fed back by each participating node. In another implementation, only some of the participating nodes participating in the model training receive the inter-feature constraint variable and infer the corresponding gradient information, and the central node updates the inter-feature constraint variable based on the gradient information fed back by those participating nodes.
  • the first participating node may determine whether to infer gradient information corresponding to inter-feature constraint variables based on inter-feature constraint variables, model parameters of the first feature model, and local sample data in the following manner.
  • Method 1: The central node triggers some or all of the participating nodes in the multiple participating node groups to infer (or compute) the gradient information of the inter-feature constraint variable based on the inter-feature constraint variable, the model parameters, and the participating nodes' local sample data.
  • The central node can select all or part of the participating nodes participating in the model training to infer the gradient information corresponding to the inter-feature constraint variable. Since the feature models trained by the participating nodes in the same participating node group are the same, the central node can select one or more participating nodes in each participating node group to perform the inference. The present application is not limited thereto; the central node may instead, based on the relationships between different features, select one or more participating nodes in only some of the participating node groups.
  • For example, the central node may send a first identifier set, which includes the identifiers of the sample data selected by the central node for inferring the inter-feature constraint variable. After receiving the first identifier set, the first participating node determines whether to infer the gradient information corresponding to the inter-feature constraint variable based on whether its sample data set contains sample data corresponding to an identifier in the first identifier set. If it does (for example, first sample data corresponding to a first identifier in the set), the first participating node infers the gradient information corresponding to the inter-feature constraint variable based on the inter-feature constraint variable, the model parameters of the first feature model, and the first sample data; otherwise, it does not perform the inference.
  • Alternatively, the central node may send inference indication information to the participating nodes, instructing some or all of them to infer the gradient information corresponding to the inter-feature constraint variable. For example, the central node may send the inference indication information only to the participating nodes that need to perform the inference, and the participating nodes that receive it perform the inference. Or the central node may broadcast inference indication information containing the identifiers of one or more participating nodes, and the participating nodes whose identifiers are contained in it infer the gradient information corresponding to the inter-feature constraint variable.
  • Method 2: The central node and the participating nodes are configured with the same sample data selector, based on which both sides can determine the participating nodes that will infer the gradient information corresponding to the inter-feature constraint variable.
  • For example, the sample data selector may generate at least one sample data identifier, and the sample data corresponding to these identifiers is used in the current round of inference of the gradient information corresponding to the inter-feature constraint variable. If the sample data set of the first participating node includes sample data corresponding to one of these identifiers (such as first sample data corresponding to a first identifier), the first participating node performs the inference; otherwise, it does not. Other participating nodes determine in the same way whether to perform the inference, as illustrated by the sketch below.
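  • A minimal sketch of such a shared sample data selector, assuming it is realized as a pseudo-random generator seeded identically at the central node and at every participating node (one possible realization; the selector's internals are not specified by the application):

    import random

    def selected_ids(round_t, universe_size, k, seed=2022):
        # The central node and every participating node use the same seed,
        # so all sides derive the identical identifier set I_t for round t.
        rng = random.Random(seed * 1000003 + round_t)
        return set(rng.sample(range(universe_size), k))

    local_sample_ids = {3, 17, 42}  # identifiers held by this node
    I_t = selected_ids(round_t=7, universe_size=100, k=10)
    if local_sample_ids & I_t:
        print("infer gradient of the inter-feature constraint variable")
    else:
        print("skip inference this round")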
  • If the first participating node determines to perform the inference, it infers the gradient information corresponding to the inter-feature constraint variable according to the inter-feature constraint variable, the model parameters of the first feature model, and the first sample data.
  • For example, let $I_t$ denote the set of sample data identifiers used in the current round (the $t$-th inference) to infer the gradient information of the inter-feature constraint variable, and let $i$ be the identifier of the first sample data of the first participating node. If $i \in I_t$, the first participating node applies the gradient inference model to the inter-feature constraint variable $\theta_i^t$ corresponding to the first sample data (sample data $i$), the model parameters $w_m^t$ of the first feature model (feature model $m$), and the first sample data, and obtains the first gradient information $g_{m,i}^t$, that is, the gradient information corresponding to the inter-feature constraint variable in the $t$-th inference. Here $w_m^t$ is the updated model parameter most recently obtained by the first participating node from the central node, $b_t$ is the bias parameter of the $t$-th training, and $u_i^t$ is the auxiliary variable corresponding to the $i$-th training data in the $t$-th training; both $b_t$ and $u_i^t$ are received from the central node. The gradient inference model is a function used to calculate the gradient information corresponding to the model parameters.
  • After the first participating node obtains the gradient information corresponding to the inter-feature constraint variable, it may send quantized first target gradient information to the central node. In one case, the first target gradient information is the above-mentioned first gradient information: after obtaining the first gradient information $g_{m,i}^t$, the first participating node quantizes and encodes it and sends the quantized first gradient information to the central node, so that the central node can obtain the updated inter-feature constraint variable $\theta^{t+1}$ based on the gradient information corresponding to the inter-feature constraint variable fed back by the participating nodes.
  • In another case, the first target gradient information includes the first gradient information and first residual gradient information, where the first residual gradient information is used to indicate the residual amount of the gradient information corresponding to the inter-feature constraint variable that the first participating node has not sent to the central node.
  • The first target gradient information can be expressed as [formula omitted in source], where β_t = τ_{t-1}/τ_t; τ_t is the update step size of the model parameters in the t-th model training, i.e., the learning rate of the t-th training; τ_{t-1} is the update step size in the (t-1)-th training, i.e., the learning rate of the (t-1)-th training; and the remaining term is the first residual gradient information, i.e., the residual amount, not yet sent to the central node before the t-th inference, of the gradient information corresponding to the inter-feature constraint variables.
  • If the first participating node sends the quantized first target gradient information to the central node, it can update the residual gradient information: based on the first target gradient information and the quantized first target gradient information, the first participating node obtains the second residual gradient information, which is the residual amount of gradient information not sent to the central node before the first participating node infers the gradient information corresponding to the inter-feature constraint variables in the (t+1)-th inference.
  • The second residual gradient information represents the residual amount of the first target gradient information that was not sent to the central node, and it serves as the residual amount of the constraint-variable gradient information not sent to the central node before the (t+1)-th model training. That is, once the first participating node has sent the quantized first target gradient information, the residual amount of the gradient information is updated to the portion of the first target gradient information that was not delivered to the central node because of quantization and encoding.
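  • As an illustration of the residual-feedback bookkeeping described above, the following is a minimal sketch, not the patent's implementation: `uniform_quantize` is a stand-in for the unspecified quantization codec, and `beta` follows the weighting β_t = τ_{t-1}/τ_t given above.

```python
import numpy as np

def uniform_quantize(x: np.ndarray, step: float = 0.05) -> np.ndarray:
    """Stand-in for the (unspecified) quantization encoder/decoder pair."""
    return np.round(x / step) * step

def upload_round(grad: np.ndarray, residual: np.ndarray,
                 tau_prev: float, tau_cur: float):
    """One upload round with residual (error) feedback.

    grad     -- gradient information obtained in the current round
    residual -- gradient residual not yet delivered to the central node
    Returns the quantized signal to transmit and the updated residual.
    """
    beta = tau_prev / tau_cur            # beta_t = tau_{t-1} / tau_t
    target = grad + beta * residual      # target gradient information
    sent = uniform_quantize(target)      # quantize and encode, then transmit
    return sent, target - sent           # new residual: the untransmitted part
```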
  • In one implementation, the first participating node may determine whether to send the quantized target gradient information to the central node based on a scheduling policy. In this case, the sending of the quantized first target gradient information described above is based on the scheduling policy: after determining, according to the policy, to send the quantized target gradient information, the first participating node sends the quantized first target gradient information to the central node.
  • Alternatively, the first participating node determines, based on the scheduling policy, not to send the quantized target gradient information to the central node. The first participating node then determines third residual gradient information, where the third residual gradient information is the first target gradient information; it is the residual amount of gradient information not sent to the central node before the first participating node obtains gradient information in the (t+1)-th round. When the first participating node decides not to send the first target gradient information, this residual amount includes both the first gradient information obtained in the t-th inference and the residual amount, not yet sent to the central node, of the constraint-variable gradient information inferred before the t-th inference.
  • The scheduling policy may be notified by the central node to the first participating node. For example, the central node sends indication information A to the first participating node.
  • The indication information A may indicate that the first participating node, after inferring the constraint-variable gradient information for the t-th time, sends the obtained gradient information to the central node; the first participating node then sends the quantized first target gradient information to the central node after the t-th inference and computes the residual amount of gradient information remaining after that transmission, i.e., the second residual gradient information.
  • Alternatively, the indication information A may indicate that the first participating node does not send the obtained gradient information to the central node after the t-th inference; the first participating node then does not send gradient information after the t-th inference yields the first gradient information, and instead adds the first gradient information to the residual gradient information to obtain the third residual gradient information.
  • Alternatively, the scheduling policy may be determined by the first participating node based on quantization noise information, channel resource information, and the first target gradient information. The specific way in which the first participating node determines the scheduling policy from the quantization noise information, channel resource information, and target gradient information (the first target gradient information in this example) is described in detail in Embodiment 2.
  • If the sample data set of the first participating node contains no sample data corresponding to the identifiers in the sample-data identifier set I_t, the first participating node does not infer the gradient information of the inter-feature constraint variables; in that case the first participating node updates the residual gradient information accordingly [formula omitted in source].
  • The central node receives target gradient information corresponding to the inter-feature constraint variables from multiple participating nodes. For example, the first participating node may send the inferred, quantized first target gradient information to the central node; the central node receives the quantized first target gradient information from the first participating node and obtains the first target gradient information after quantization decoding. Based on the received target gradient information, the central node updates the inter-feature constraint variable corresponding to each sample data [formula omitted in source], where N_b is the number of sample data.
  • The central node can select N_b sample data each time. The participating nodes that store the selected sample data infer the gradient information of the inter-feature constraint variables based on the model parameters of the current feature model and the sample data, and feed it back to the central node. Compared with having every participating node feed back constraint-variable gradient information, this reduces resource overhead and implementation complexity.
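  • A short sketch of this selection step follows (all names hypothetical): with a shared seed, the central node and the participating nodes can agree on the N_b selected identifiers, and each node infers gradients only for identifiers it actually stores.

```python
import random

def select_sample_ids(all_ids, n_b: int, round_seed: int) -> set:
    """Draw N_b sample identifiers for this round; a shared seed keeps
    the central node and participating nodes consistent."""
    rng = random.Random(round_seed)
    return set(rng.sample(sorted(all_ids), n_b))

def local_constraint_gradients(local_samples: dict, selected: set, infer):
    """Infer constraint-variable gradients only for stored, selected samples."""
    return {i: infer(local_samples[i]) for i in selected if i in local_samples}
```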
  • The central node also updates the bias parameter b^{t+1} based on the inter-feature constraint variables computed in each round [formula omitted in source], where r(·) is a block-separable regularization function.
  • The central node also updates the auxiliary variables based on the inter-feature constraint variables obtained in each round of updating [formula omitted in source], where l_i(·) is the sampling loss function for the i-th data sample.
  • After the central node obtains the updated inter-feature constraint variables, it sends the updated bias parameter b^{t+1}, the auxiliary variables, and the inter-feature constraint variables to the participating nodes, so that the participating nodes train the feature models based on the bias parameter, the auxiliary variables, and the inter-feature constraint variables.
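  • The exact update formulas above appear only as figures in the source, so the following sketch is an assumption about their general shape: the central node averages the fed-back constraint-variable gradients and takes a gradient step of size eta on the constraint variable before broadcasting it.

```python
import numpy as np

def update_constraint_variable(lam: np.ndarray,
                               fed_back_grads: list,
                               eta: float) -> np.ndarray:
    """Aggregate the decoded constraint-variable gradients reported by the
    participating nodes and step the inter-feature constraint variable."""
    g = np.mean(fed_back_grads, axis=0)
    # b^{t+1} and the auxiliary variables would then be recomputed from the
    # updated variable (their formulas are figures in the source).
    return lam - eta * g
```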
  • According to the above scheme, as the model parameters are updated during training, participating nodes of different types compute the gradient information corresponding to the inter-feature constraint variables and feed it back to the central node. The central node updates the inter-feature constraint variables, which characterize the relationships between features, based on the constraint-variable gradient information inferred by the participating nodes corresponding to the different feature models, thereby decoupling the features of the model. Different types of participating nodes can thus train different feature models based on the inter-feature constraint variables and local feature data, and the central node can update the constraint variables based on the constraint-variable gradients fed back by the different types of participating nodes. This achieves both training-data diversity and feature diversity in federated learning without transmitting the raw data; that is, leakage of the raw data is avoided while the performance of the trained model can be improved.
  • FIG. 3 is another schematic flowchart of the intelligent model training method provided by the embodiment of the present application. The method includes but is not limited to the following steps:
  • the central node sends first information to a first participating node, where the first information includes an inter-feature constraint variable, and the inter-feature constraint variable is used to represent a constraint relationship between different features.
  • the first participating node receives the first information from the central node, and determines inter-feature constraint variables based on the first information.
  • the first participating node trains the first feature model based on the inter-feature constraint variables and model training data to obtain second gradient information.
  • After the first participating node obtains the inter-feature constraint variables, it performs the t-th training of the first feature model based on the constraint variables and the model training data.
  • The central node also sends third information to the participating nodes in the first participating node group, where the third information indicates the updated model parameters of the first feature model; the updated model parameters are obtained by the central node from the model gradient information fed back by the participating nodes in the first group after the (t-1)-th model training.
  • The first participating node updates the parameters of the first feature model based on this model parameter information to obtain the updated first feature model.
  • The first participating node then performs the t-th model training, training the parameter-updated first feature model based on the inter-feature constraint variables and the model training data; after the t-th training of the first feature model, the second gradient information is obtained.
  • The second gradient information may be written as [formula omitted in source], denoting the gradient information obtained by the k-th participating node (i.e., the first participating node) in the m-th participating node group after the t-th training. It can be expressed as [formula omitted in source], where the index-value set is the collection of index values of the training data selected by the first participating node in the t-th training; the model parameters are the updated parameters obtained by the first participating node from the central node; the inter-feature constraint variable, the bias parameter b^t of the t-th training, and the auxiliary variable corresponding to the i-th training data in the t-th training are obtained by the first participating node from the central node; and the remaining term is a function used to compute the gradient information corresponding to the model parameters.
  • The second gradient information is used to update the model parameters of the first feature model. Specifically, the first participating node feeds the gradient information back to the central node, and the central node determines the updated model parameters based on the gradient information fed back by the participating nodes in the first participating node group.
  • For each index value in the index-value set of the training data, the first participating node obtains one piece of gradient information by training the first feature model on the corresponding training data, and accumulates the gradient information obtained from each training sample. Since the first feature model is one of the M feature models of the intelligent model, dividing the accumulated gradient information by M yields the second gradient information of the first participating node's t-th training of the first feature model, but the present application is not limited thereto.
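  • The accumulation just described can be sketched as follows (the per-sample training step `grad_of_sample` is hypothetical):

```python
def second_gradient(index_set, samples, grad_of_sample, num_models: int):
    """Accumulate per-sample gradients over the selected index set, then
    divide by M, the number of feature models in the intelligent model."""
    acc = None
    for i in index_set:
        g = grad_of_sample(samples[i])      # gradient after training on sample i
        acc = g if acc is None else acc + g
    return acc / num_models                 # divide the accumulated gradient by M
```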
  • After the first participating node performs the t-th training of the first feature model, it may send quantized second target gradient information to the central node; the quantized second target gradient information may be written as [formula omitted in source].
  • In one implementation, the second target gradient information is the second gradient information described above: the first participating node performs the t-th model training on the first feature model to obtain the second gradient information, quantizes and encodes it to obtain the quantized second gradient information, and sends the quantized second gradient information to the central node, so that the central node obtains the updated parameters of the first feature model based on the gradient information fed back by the participating nodes in the first participating node group after the t-th training.
  • In another implementation, the second target gradient information includes the second gradient information and fourth residual gradient information, where the fourth residual gradient information represents the residual amount of gradient information that the first participating node did not send to the central node before obtaining the second gradient information.
  • The second target gradient information can be expressed as [formula omitted in source], where β_t = τ_{t-1}/τ_t; τ_t is the update step size of the model parameters in the t-th model training, i.e., the learning rate of the t-th training; τ_{t-1} is the update step size in the (t-1)-th training, i.e., the learning rate of the (t-1)-th training; and the remaining term is the fourth residual gradient information, i.e., the residual amount of gradient information, from the training rounds before the t-th model training, that was not sent to the central node.
  • After the first participating node sends the quantized second target gradient information to the central node, it can update the residual gradient information: based on the second target gradient information and the quantized second target gradient information, the first participating node obtains fifth residual gradient information, which is the residual amount of gradient information not sent to the central node before the first participating node obtains gradient information in the (t+1)-th model training.
  • The fifth residual gradient information represents the residual amount of the second target gradient information that was not sent to the central node, and it serves as the residual amount of unsent gradient information from the training rounds before the (t+1)-th model training. That is, once the first participating node has sent the quantized second target gradient information, the residual amount of the gradient information is the portion of the second target gradient information that was not delivered to the central node because of quantization and encoding.
  • In one implementation, the first participating node may determine whether to send the quantized target gradient information to the central node based on a scheduling policy. In this case, the sending of the quantized second target gradient information described above is based on the scheduling policy: after determining, according to the policy, to send the quantized target gradient information, the first participating node sends the quantized second target gradient information to the central node.
  • Alternatively, the first participating node determines, based on the scheduling policy, not to send the quantized target gradient information to the central node. The first participating node then determines sixth residual gradient information, where the sixth residual gradient information is the second target gradient information; it is the residual amount of gradient information not sent to the central node before the first participating node obtains gradient information in the (t+1)-th model training. When the first participating node decides not to send the second target gradient information, this residual amount includes both the second gradient information obtained in the t-th model training and the residual amount not sent to the central node before the t-th training produced the second gradient information.
  • The scheduling policy may be notified by the central node to the first participating node. For example, the central node sends indication information A to the first participating node.
  • The indication information A may indicate that the first participating node sends the gradient information obtained after training to the central node after the t-th model training; the first participating node then sends the quantized second target gradient information after the t-th training and computes the residual amount remaining after that transmission, i.e., the fifth residual gradient information.
  • Alternatively, the indication information A may indicate that the first participating node does not send the gradient information obtained after training to the central node after the t-th model training; the first participating node then does not send the second gradient information after the t-th training, and instead accumulates the second gradient information into the residual gradient information to obtain the sixth residual gradient information.
  • the scheduling strategy may be determined by the first participating node based on quantization noise information, channel resource information, and second target gradient information.
  • the specific implementation manner in which the first participating node determines the scheduling strategy based on the quantization noise information, channel resource information, and target gradient information is described in detail in Embodiment 2.
  • The central node receives the quantized target gradient information sent by the participating nodes in the first participating node group after the t-th model training, and determines the updated model parameters of the first feature model from it [formula omitted in source], where N_b is the number of data samples.
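  • A minimal sketch of the parameter update at the central node, assuming a plain averaged gradient step (the source gives the update rule only as a figure):

```python
import numpy as np

def update_feature_model(w: np.ndarray, decoded_grads: list,
                         learning_rate: float) -> np.ndarray:
    """Average the decoded second target gradients reported by the first
    participating node group and update the first feature model's parameters;
    the result is broadcast to the group as the third information."""
    return w - learning_rate * np.mean(decoded_grads, axis=0)
```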
  • Embodiment 2: an embodiment of this application provides a method for a participating node to determine the scheduling policy based on channel resource information and the signal to be transmitted.
  • In one implementation, the signal to be transmitted may be the above second target gradient information, in which case the scheduling policy is used by the first participating node to determine whether to send the second target gradient information to the central node.
  • In another implementation, the signal to be transmitted may be the above first target gradient information, in which case the scheduling policy is used by the first participating node to determine whether to send the first target gradient information to the central node.
  • However, the present application is not limited thereto, and the scheduling policy may also be used to determine whether to transmit other signals.
  • The channel resource information includes channel state information and/or transmission time information: the channel state information is the state information h_k of the channel between the first participating node and the central node, and the transmission time information is the duration T_0 for which the first participating node occupies channel resources to transmit gradient information.
  • The first participating node may determine quantization noise information based on the channel resource information and the target information, where the quantization noise information is used to characterize the quantization encoding/decoding loss of the target information. The target information may be the above first target gradient information, the above second target gradient information, or other information, which is not limited in this application.
  • Specifically, the first participating node quantizes and encodes the target information with its quantization coding module to obtain the quantized target information, and sends the quantized target information to the central node through its transceiver module. After propagating through the channel, the quantized target information is received by the central node through the central node's transceiver module; the central node quantization-decodes the received signal with its quantization decoder to recover the target information. The loss of the recovered target information relative to the original target information is the quantization noise.
  • In one implementation, the first participating node may quantize, encode and then decode the target information itself to obtain the quantization noise information, i.e., the difference between the target information and the signal obtained by quantizing, encoding and decoding it. In another implementation, the first participating node may estimate, based on the channel resource information and the target information, the loss of the quantized target information after channel transmission and quantization decoding at the central node, to obtain the quantization noise information.
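  • The first of these two options, measuring the quantization noise locally, can be written directly; its covariance matrix is what the threshold formulas below operate on.

```python
import numpy as np

def quantization_noise(target: np.ndarray, quantize, decode=lambda y: y):
    """Quantization noise: the target information minus its image under
    quantization encoding followed by decoding."""
    return target - decode(quantize(target))

def noise_covariance(noise_samples: list) -> np.ndarray:
    """Empirical covariance matrix of the quantization noise."""
    return np.cov(np.stack(noise_samples), rowvar=False)
```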
  • The channel resource information may include channel state information and/or channel occupation time information (i.e., transmission time information).
  • In one implementation, the first participating node obtains the quantization noise information of the target information from the channel resource information, communication cost information, and the target information. The communication cost information indicates the communication cost weight of communication resources, where the communication resources may include transmission power and/or transmission bandwidth.
  • The central node may send second information to the first participating node, where the second information is used to indicate the communication cost information; the first participating node receives the second information from the central node and determines the communication cost information from it. For example, the communication cost information may indicate the cost weight γ_p of transmission power and the cost weight γ_B of transmission bandwidth.
  • Having obtained the communication cost information through the second information from the central node, the first participating node may determine a parameter q_k based on the transmission power cost weight γ_p and the channel resource information, namely the channel state information h_k; the parameter q_k satisfies [formula omitted in source]. The first participating node can also solve a further formula [omitted in source] to obtain the required parameters.
  • From these parameters and the target information, the first participating node obtains the quantization noise information, which is the covariance matrix of the quantization noise of the target information, where V_k is the covariance matrix of the target information.
  • The first participating node can determine the transmission bandwidth from the quantization noise information, communication cost information, channel resource information and target information; the transmission bandwidth satisfies [formula omitted in source], where I is the identity matrix, det(A) denotes the determinant of matrix A, and log(x) denotes the logarithm of x.
  • The first participating node can likewise determine the transmission power from the quantization noise information, communication cost information, channel resource information and target information; the transmission power satisfies [formula omitted in source].
  • After obtaining the transmission bandwidth and the transmission power, the first participating node determines the threshold value from the transmission bandwidth, the transmission power, the quantization noise information of the target information, and the communication cost; the threshold satisfies [formula omitted in source], where tr(A) denotes the trace of matrix A, i.e., the sum of the elements on the main diagonal (from upper left to lower right) of an n×n matrix A.
  • The first participating node can compare the metric value of the target information with the threshold value to determine whether to send the quantized target information to the central node.
  • The metric value of the target information may be the norm of the target information: if the target information is a vector, its l_2 norm; if the target information is a matrix, its Frobenius norm.
  • When the metric value of the target information is greater than the threshold value, the first participating node sends the quantized target information to the central node, i.e., the first participating node is in an active state; otherwise, when the metric value of the target information is less than or equal to the threshold value, the first participating node does not send the target information to the central node, i.e., the first participating node is in an inactive state.
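  • Putting the decision rule together (the threshold computation is abstracted away here, since its formula is a figure in the source):

```python
import numpy as np

def metric(target: np.ndarray) -> float:
    """l2 norm for a vector; Frobenius norm for a matrix
    (numpy's default norm is Frobenius for 2-D input)."""
    return float(np.linalg.norm(target))

def schedule_and_send(target: np.ndarray, threshold: float,
                      transmit, quantize) -> np.ndarray:
    """Send the quantized target information only when its metric exceeds
    the threshold; return the residual kept for the next round."""
    if metric(target) > threshold:          # active state
        sent = quantize(target)
        transmit(sent)
        return target - sent                # e.g. second/fifth residual info
    return target                           # inactive: whole target is residual
```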
  • In this way, a participating node decides whether to send the target information to the central node based on the target information to be transmitted and the communication cost information broadcast by the central node, taking the quantization encoding/decoding loss of the target information into account. This realizes channel-adaptive autonomous scheduling by the participating nodes, which can improve the reliability of target-signal transmission and the utilization of channel resources.
  • the participating nodes may determine whether to send quantized target information to the central node based on the scheduling policy provided in the second embodiment.
  • For example, the first participating node determines the first threshold value from the first quantization noise information and the channel resource information, where the first quantization noise information characterizes the quantization encoding/decoding loss of the first target gradient information; as described above, the first participating node can obtain the first quantization noise information based on the parameter q_k and the first target gradient information. If the metric value of the first target gradient information is greater than the first threshold value, the first participating node sends the quantized first target gradient information to the central node; if the metric value is less than or equal to the first threshold value, the first participating node does not send the quantized first target gradient information to the central node.
  • Similarly, the first participating node determines the second threshold value from the second quantization noise information and the channel resource information, where the second quantization noise information characterizes the quantization encoding/decoding loss of the second target gradient information, and the second quantization noise information is obtained in the same way. If the metric value of the second target gradient information is greater than the second threshold value, the first participating node sends the quantized second target gradient information to the central node; if the metric value is less than or equal to the second threshold value, the first participating node does not send the quantized second target gradient information to the central node.
  • the involved nodes may perform some or all of the steps or operations related to the nodes. These steps or operations are only examples, and the present application may also perform other operations or modifications of various operations. In addition, various steps may be performed in a different order than presented herein, and it is possible that not all operations described herein are performed.
  • each network element may include a hardware structure and/or a software module, and implement the above functions in the form of a hardware structure, a software module, or a hardware structure plus a software module. Whether one of the above-mentioned functions is executed in the form of a hardware structure, a software module, or a hardware structure plus a software module depends on the specific application and design constraints of the technical solution.
  • Fig. 5 is a schematic block diagram of an intelligent model training device provided by an embodiment of the present application.
  • the training device 500 of the smart model may include a processing unit 510 and a transceiver unit 520.
  • The intelligent model training device 500 may correspond to the participating node in the above method embodiments, to a chip configured (or used) in the participating node, or to another device, module, circuit or unit capable of implementing the method performed by the participating node.
  • The intelligent model training device 500 may correspond to the participating node in the methods of the embodiments of this application, and may include the units for performing the parts of the methods shown in FIG. 2 and FIG. 3 that are performed by the participating node; moreover, each unit of the intelligent model training device 500 and the other operations and/or functions described above are respectively intended to realize the corresponding processes of the methods shown in FIG. 2 and FIG. 3.
  • The transceiver unit 520 is configured to receive the first information from the central node, where the first information is used to indicate the inter-feature constraint variable, and the inter-feature constraint variable is used to characterize the constraint relationships between different features.
  • The processing unit 510 is configured to obtain the first gradient information from the inter-feature constraint variable, the model parameters of the first feature model, and the first sample data by using the gradient inference model, where the first gradient information is the gradient information corresponding to the inter-feature constraint variable.
  • The transceiver unit 520 is further configured to send the first gradient information to the central node.
  • In one implementation, the processing unit 510 is further configured to determine a threshold value from quantization noise information and channel resource information, where the quantization noise information is used to characterize the quantization encoding/decoding loss of target information.
  • The transceiver unit 520 is further configured to send the quantized target information when the metric value of the target information is greater than the threshold value, and not to send the quantized target information when the metric value of the target information is less than or equal to the threshold value.
  • the transceiver unit 520 in the smart model training device 500 may be an input/output interface or circuit of the chip
  • the processing unit 510 in the intelligent model training device 500 may be a logic circuit in a chip.
  • Alternatively, the intelligent model training device 500 may correspond to the central node in the above method embodiments, for example a chip configured (or used) in the central node, or another device, module, circuit or unit capable of implementing the method performed by the central node.
  • The intelligent model training device 500 may correspond to the central node in the methods shown in FIG. 2 and FIG. 3, and may include the units for performing the parts of those methods that are performed by the central node; moreover, each unit of the intelligent model training device 500 and the other operations and/or functions described above are respectively intended to realize the corresponding processes of the methods shown in FIG. 2 and FIG. 3.
  • The processing unit 510 is configured to determine the inter-feature constraint variables, where the inter-feature constraint variables are used to characterize the constraint relationships between the different features; the transceiver unit 520 is configured to send the first information to the participating nodes in the multiple participating node groups, where the first information includes the inter-feature constraint variables.
  • In one implementation, the processing unit 510 is further configured to determine a threshold value from quantization noise information and channel resource information, where the quantization noise information is used to characterize the quantization encoding/decoding loss of target information.
  • The transceiver unit 520 is further configured to send the quantized target information when the metric value of the target information is greater than the threshold value, and not to send the quantized target information when the metric value of the target information is less than or equal to the threshold value.
  • the transceiver unit 520 in the smart model training device 500 may be an input/output interface or circuit of the chip
  • the processing unit 510 in the intelligent model training device 500 may be a logic circuit in a chip.
  • The intelligent model training device 500 may also include a storage unit 530, which may be used to store instructions or data; the processing unit 510 may execute the instructions or process the data stored in the storage unit, so that the intelligent model training device implements the corresponding operations.
  • the transceiver unit 520 in the training device 500 of the intelligent model can be realized through a communication interface (such as a transceiver or an input/output interface), for example, it can correspond to the transceiver 610 in the communication device 600 shown in FIG. 6 .
  • the processing unit 510 in the intelligent model training apparatus 500 may be implemented by at least one processor, for example, may correspond to the processor 620 in the communication device 600 shown in FIG. 6 .
  • the processing unit 510 in the intelligent model training device 500 can also be realized by at least one logic circuit.
  • the storage unit 530 in the smart model training apparatus 500 may correspond to the memory in the communication device 600 shown in FIG. 6 .
  • FIG. 6 is a schematic structural diagram of a communication device 600 provided by an embodiment of this application.
  • the communication device 600 may correspond to the participating node in the foregoing method embodiment.
  • the participating node 600 includes a processor 620 and a transceiver 610 .
  • the participating node 600 also includes a memory.
  • the processor 620, the transceiver 610, and the memory may communicate with each other through an internal connection path, and transmit control and/or data signals.
  • the memory is used to store computer programs, and the processor 620 is used to execute the computer programs in the memory to control the transceiver 610 to send and receive signals.
  • the communication device 600 shown in FIG. 6 can implement the processes involving participating nodes in the method embodiments shown in FIG. 2 and FIG. 3 .
  • the operations and/or functions of the various modules in the participating node 600 are to implement the corresponding processes in the foregoing method embodiments.
  • the communication device 600 may correspond to the central node in the foregoing method embodiments.
  • the central node 600 includes a processor 620 and a transceiver 610 .
  • the central node 600 also includes a memory.
  • the processor 620, the transceiver 610, and the memory may communicate with each other through an internal connection path, and transmit control and/or data signals.
  • the memory is used to store computer programs, and the processor 620 is used to execute the computer programs in the memory to control the transceiver 610 to send and receive signals.
  • the communication device 600 shown in FIG. 6 can implement the processes involving the central node in the method embodiments shown in FIGS. 2 and 3 .
  • the operations and/or functions of the various modules in the central node 600 are respectively for realizing the corresponding processes in the foregoing method embodiments.
  • the processor 620 and the memory may be combined into a processing device, and the processor 620 is configured to execute the program codes stored in the memory to realize the above functions.
  • the memory may also be integrated in the processor 620, or be independent of the processor 620.
  • the processor 620 may correspond to the processing unit in FIG. 5 .
  • the above-mentioned transceiver 610 may correspond to the transceiver unit in FIG. 5 .
  • The transceiver 610 may include a receiver (also called a receiving circuit) and a transmitter (also called a transmitting circuit), where the receiver is used to receive signals and the transmitter is used to transmit signals.
  • the communication device 600 shown in FIG. 6 can implement the process involving the terminal device in the method embodiments shown in FIG. 2 and FIG. 3 .
  • the operations and/or functions of the various modules in the terminal device 600 are respectively for implementing the corresponding processes in the foregoing method embodiments.
  • the embodiment of the present application also provides a processing device, including a processor and a (communication) interface; the processor is configured to execute the method in any one of the above method embodiments.
  • the above processing device may be one or more chips.
  • For example, the processing device may be a field programmable gate array (FPGA), an application-specific integrated circuit (ASIC), or a system on chip (SoC); it may also be a central processing unit (CPU), a network processor (NP), a digital signal processing circuit (DSP), or a micro controller unit (MCU); or it may be a programmable logic device (PLD) or another integrated chip.
  • According to the methods provided by the embodiments of this application, the present application also provides a computer program product, the computer program product including computer program code which, when executed by one or more processors, causes the device including the processor to execute the methods in the embodiments shown in FIG. 2 and FIG. 3.
  • the technical solutions provided by the embodiments of the present application may be fully or partially implemented by software, hardware, firmware or any combination thereof.
  • When implemented using software, the solutions may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on the computer, all or part of the processes or functions according to the embodiments of the present invention will be generated.
  • the computer may be a general computer, a dedicated computer, a computer network, a network device, a terminal device, a core network device, a machine learning device or other programmable devices.
  • The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, server or data center to another website, computer, server or data center by wired (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (such as infrared, radio, or microwave) means.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center integrated with one or more available media.
  • the available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a digital video disc (digital video disc, DVD)), or a semiconductor medium.
  • According to the methods provided by the embodiments of this application, the present application also provides a computer-readable storage medium storing program code which, when run by one or more processors, causes the device including the processor to execute the method in the embodiments shown in FIG. 2 and FIG. 3.
  • the present application further provides a system, which includes the foregoing one or more first devices.
  • the system may further include the aforementioned one or more second devices.
  • the first device may be a network device or a terminal device
  • the second device may be a device that communicates with the first device through a wireless link.
  • the disclosed systems, devices and methods may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • The division of the units is only a logical functional division; in actual implementation there may be other divisions, for example multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Neurology (AREA)
  • Mobile Radio Communication Systems (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

This application provides an intelligent model training method and apparatus. In the method, a central node and multiple participating node groups jointly perform the training of an intelligent model, where the intelligent model is composed of multiple feature models corresponding to multiple features of an inference target, and the participating nodes in one participating node group train one feature model. The training method is performed by one participating node and includes: receiving, from the central node, first information used to indicate an inter-feature constraint variable, the inter-feature constraint variable being used to characterize the constraint relationships between the different features; obtaining, from the inter-feature constraint variable, the model parameters of a first feature model and first sample data, by using a gradient inference model, the gradient information corresponding to the inter-feature constraint variable; and sending it to the central node. This realizes distributed training based on different features in federated learning and can improve the performance of the trained model.

Description

Intelligent model training method and apparatus
This application claims priority to Chinese Patent Application No. 202111582987.9, entitled "Intelligent model training method and apparatus" and filed with the China National Intellectual Property Administration on December 22, 2021, which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of communications, and more specifically, to an intelligent model training method and apparatus.
Background
Artificial intelligence (AI) is a very important class of applications in future wireless communication networks (such as the Internet of things). Federated learning (FL) is a distributed intelligent-model training method in which a server provides model parameters to multiple devices, each device performs intelligent-model training on its own data set and feeds the gradient information of the loss function back to the server, and the server updates the model parameters based on the gradient information fed back by the multiple devices.
The models on the multiple devices participating in model training are the same as the model on the server, and the devices participating in training use training data of the same type; for example, in training an image recognition model, multiple image-acquisition devices can train the model with the image data they each collect. This approach improves the diversity of the training data but does not consider the feature diversity of the inference target. There is currently no effective solution for achieving feature-diverse model training in federated learning so as to improve model performance.
Summary
This application provides an intelligent model training method and apparatus, which realize distributed training based on different features in federated learning and can improve the performance of the trained model.
According to a first aspect, an intelligent model training method is provided. A central node and multiple participating node groups jointly perform the training of an intelligent model, the intelligent model is composed of multiple feature models corresponding to multiple features of an inference target, and the participating nodes in one participating node group train one feature model. The training method is performed by a first participating node that trains a first feature model in the multiple participating node groups, and includes: receiving first information from the central node, the first information being used to indicate an inter-feature constraint variable, the inter-feature constraint variable being used to characterize the constraint relationships between the different features; obtaining first gradient information from the inter-feature constraint variable, the model parameters of the first feature model, and first sample data by using a gradient inference model, the first gradient information being the gradient information corresponding to the inter-feature constraint variable; and sending the first gradient information to the central node.
According to the above solution, participating nodes of different types compute the gradient information corresponding to the inter-feature constraint variable as the model parameters are updated during training and feed it back to the central node. The central node updates the inter-feature constraint variable, which characterizes the relationships between features, based on the constraint-variable gradient information inferred by the participating nodes corresponding to the different feature models, thereby decoupling the features of the model, so that participating nodes of different types can train different feature models based on the inter-feature constraint variable and local feature data, and the central node can update the inter-feature constraint variable based on the constraint-variable gradients fed back by the different types of participating nodes. This achieves both training-data diversity and feature diversity in federated learning without transmitting the raw data; that is, leakage of the raw data is avoided while the performance of the trained model can be improved.
With reference to the first aspect, in some implementations of the first aspect, the method further includes: receiving a first identifier set from the central node, the first identifier set including the identifiers of the sample data selected by the central node for the inter-feature constraint variable. Obtaining the first gradient information from the inter-feature constraint variable, the model parameters of the first feature model and the first sample data by using the gradient inference model includes: determining that the sample data set of the first participating node includes the first sample data corresponding to a first identifier, the first identifier belonging to the first identifier set; and obtaining the first gradient information, i.e., the gradient information corresponding to the inter-feature constraint variable, from the inter-feature constraint variable, the model parameters of the first feature model and the first sample data by using the gradient inference model.
According to the above solution, the central node selects, through the first identifier set, part of the sample data for inferring the gradient information of the inter-feature constraint variable. The participating nodes that store the selected sample data infer the constraint-variable gradient information based on the model parameters of the current feature model and the sample data and feed it back to the central node. Compared with having every participating node feed back constraint-variable gradient information, this reduces resource overhead and implementation complexity.
With reference to the first aspect, in some implementations of the first aspect, sending the first gradient information to the central node includes: sending quantized first target gradient information to the central node, where the first target gradient information includes the first gradient information, or the first target gradient information includes the first gradient information and first residual gradient information, the first residual gradient information being used to characterize the residual amount of constraint-variable gradient information that was not sent to the central node before the first gradient information was obtained.
According to the above solution, after one round of model training a participating node can send to the central node both the gradient information obtained in that round and the residual amount of gradient information not yet transmitted to the central node before that round, so that the central node can obtain the residual gradient information, improving the efficiency of model training.
With reference to the first aspect, in some implementations of the first aspect, the method further includes: obtaining second residual gradient information based on the first target gradient information and the quantized first target gradient information, the second residual gradient information being the residual amount of the first target gradient information that was not sent to the central node.
With reference to the first aspect, in some implementations of the first aspect, the method further includes: determining a first threshold value from first quantization noise information and channel resource information, where the first quantization noise information is used to characterize the quantization encoding/decoding loss of the first target gradient information. Sending the quantized first target gradient information to the central node includes: determining that the metric value of the first target gradient information is greater than the first threshold value, and sending the quantized first target gradient information to the central node.
With reference to the first aspect, in some implementations of the first aspect, the method further includes: if the metric value of the first target gradient information is less than or equal to the first threshold value, determining not to send the quantized first target gradient information to the central node.
According to the above solution, the participating node determines, from the quantization noise information and the channel resource information, the threshold against which the metric value is compared when deciding whether to send the quantized target gradient information. This approach takes the quantization encoding/decoding loss of the target information into account in deciding whether to send it to the central node, realizing channel-adaptive scheduling by the participating node, which can improve the reliability of signal transmission and the utilization of channel resources.
With reference to the first aspect, in some implementations of the first aspect, the method further includes: if the metric value of the first target gradient information is less than the first threshold value, determining third residual gradient information, the third residual gradient information being the first target gradient information.
With reference to the first aspect, in some implementations of the first aspect, the method further includes: obtaining the first quantization noise information from channel resource information, communication cost information and the first target gradient information, the communication cost information being used to indicate the communication cost weight of communication resources, the communication resources including transmission power and/or transmission bandwidth.
According to the above solution, the participating node can obtain the first quantization noise information from the channel resource information, the communication cost information and the first target gradient information, and can thus realize adaptive scheduling based on the first quantization noise information, improving the reliability of signal transmission and the utilization of channel resources.
With reference to the first aspect, in some implementations of the first aspect, determining the first threshold value from the first quantization noise information and the channel resource information includes: determining transmission bandwidth and/or transmission power from the first quantization noise information, communication cost information, the channel resource information and the first target gradient information, the communication cost information indicating the communication cost weight of the communication resources, the communication resources including transmission power and/or transmission bandwidth; and determining the first threshold value from the first quantization noise information and the communication resources.
With reference to the first aspect, in some implementations of the first aspect, the method further includes: receiving second information from the central node, the second information being used to indicate the communication cost information.
With reference to the first aspect, in some implementations of the first aspect, the method further includes: training the first feature model based on the inter-feature constraint variable and model training data to obtain second gradient information, and sending the second gradient information to the central node.
According to the above solution, the inter-feature constraint variable decouples the features of the model, so that different types of participating nodes can train different feature models based on the inter-feature constraint variable and local feature data. This achieves both training-data diversity in federated learning and model training oriented to different features, thereby improving the performance of the trained model.
With reference to the first aspect, in some implementations of the first aspect, sending the second gradient information to the central node includes: sending quantized second target gradient information to the central node, where the second target gradient information includes the second gradient information, or the second target gradient information includes the second gradient information and fourth residual gradient information, the fourth residual gradient information being used to characterize the residual amount of gradient information not sent to the central node before the second gradient information was obtained.
With reference to the first aspect, in some implementations of the first aspect, the method further includes: obtaining fifth residual gradient information based on the second target gradient information and the quantized second target gradient information, the fifth residual gradient information being used to characterize the residual amount of the second target gradient information not sent to the central node.
With reference to the first aspect, in some implementations of the first aspect, the method further includes: determining a second threshold value from second quantization noise information and channel resource information, where the second quantization noise information is used to characterize the quantization encoding/decoding loss of the second target gradient information. Sending the quantized second target gradient information to the central node includes: determining that the metric value of the second target gradient information is greater than the second threshold value, and sending the quantized second target gradient information to the central node.
With reference to the first aspect, in some implementations of the first aspect, the method further includes: if the metric value of the second target gradient information is less than or equal to the second threshold value, determining not to send the quantized second target gradient information to the central node.
With reference to the first aspect, in some implementations of the first aspect, the method further includes: if the metric value of the second target gradient information is less than the second threshold value, determining sixth residual gradient information, the sixth residual gradient information being the second target gradient information.
With reference to the first aspect, in some implementations of the first aspect, the method further includes: obtaining the second quantization noise information from channel resource information, communication cost information and the second target gradient information, the communication cost information indicating the communication cost weight of communication resources, the communication resources including transmission power and/or transmission bandwidth.
With reference to the first aspect, in some implementations of the first aspect, determining the second threshold value from the second quantization noise information and the channel resource information includes: determining transmission bandwidth and/or transmission power from the second quantization noise information, communication cost information, the channel resource information and the second target gradient information; and determining the second threshold value from the second quantization noise information and the communication resources.
With reference to the first aspect, in some implementations of the first aspect, the method further includes: receiving third information from the central node, the third information indicating the updated parameters of the first feature model, and updating the parameters of the first feature model according to the third information.
According to a second aspect, an intelligent model training method is provided. A central node and multiple participating node groups jointly perform the training of an intelligent model, the intelligent model is composed of multiple feature models corresponding to multiple features of an inference target, and the participating nodes in one participating node group train one feature model. The training method is performed by the central node and includes: determining an inter-feature constraint variable used to characterize the constraint relationships between the different features; and sending first information to the participating nodes in the multiple participating node groups, the first information including the inter-feature constraint variable.
With reference to the second aspect, in some implementations of the second aspect, the method further includes: receiving at least one piece of second target gradient information from participating nodes in a first participating node group, the multiple participating node groups including the first participating node group; determining, from the at least one piece of second target gradient information, the updated model parameters of a first feature model, the first feature model being the feature model trained by the participating nodes in the first participating node group; and sending the updated model parameters to the first participating node group.
Optionally, receiving the at least one piece of second target gradient information from the participating nodes in the first participating node group may specifically be: the central node receives at least one piece of quantized second target gradient information from the participating nodes in the first participating node group and obtains the second target gradient information after quantization decoding. It should be understood that, based on the description in the detailed embodiments herein, the second target gradient information obtained by the central node through quantization decoding may differ from the second target gradient information before quantization encoding at the participating node by a loss caused by quantization encoding/decoding.
With reference to the second aspect, in some implementations of the second aspect, the method further includes: sending a first identifier set to the participating nodes in the multiple participating node groups, the first identifier set including the identifiers of the sample data selected by the central node for the inter-feature constraint variable.
With reference to the second aspect, in some implementations of the second aspect, the method further includes: receiving multiple pieces of first target gradient information from participating nodes in multiple participating node groups, the first target gradient information being the gradient information corresponding to the inter-feature constraint variable inferred by the participating nodes; and determining the inter-feature constraint variable includes: determining the inter-feature constraint variable from the multiple pieces of first target gradient information.
With reference to the second aspect, in some implementations of the second aspect, the method further includes: sending second information to the participating nodes in the multiple participating node groups, the second information indicating communication cost information, the communication cost information indicating the communication cost weight of communication resources, the communication resources including transmission power and/or transmission bandwidth.
According to a third aspect, a communication method is provided, including: determining a threshold value from quantization noise information and channel resource information, where the quantization noise information is used to characterize the quantization encoding/decoding loss of target information; sending the quantized target information when the metric value of the target information is greater than the threshold value; and not sending the quantized target information when the metric value of the target information is less than or equal to the threshold value.
According to the above solution, the participating node decides whether to send the target information to the central node based on the target information to be transmitted and the communication cost information broadcast by the central node, taking the quantization encoding/decoding loss of the target information into account. This realizes channel-adaptive autonomous scheduling by the participating nodes, which can improve the reliability of target-signal transmission and the utilization of channel resources.
With reference to the third aspect, in some implementations of the third aspect, the target information includes the gradient information obtained in the N-th model training and first target residual information, the first target residual information being the residual amount of gradient information not sent before that gradient information was obtained.
With reference to the third aspect, in some implementations of the third aspect, the method further includes: when the metric value of the target information is greater than the threshold value, obtaining second target residual information based on the target information and the quantized target information, the second target residual information being the unsent residual amount of the target information.
With reference to the third aspect, in some implementations of the third aspect, the method further includes: if the metric value of the target information is less than or equal to the threshold value, determining third target residual information, the third target residual information being the target information.
With reference to the third aspect, in some implementations of the third aspect, the method further includes: obtaining the quantization noise information from channel resource information, communication cost information and the target information, the communication cost information indicating the communication cost weight of communication resources, the communication resources including transmission power and/or transmission bandwidth.
With reference to the third aspect, in some implementations of the third aspect, determining the threshold value from the quantization noise information and the channel resource information includes: determining transmission bandwidth and/or transmission power from the quantization noise information, communication cost information, the channel resource information and the target information; and determining the threshold value from the quantization noise information and the communication resources.
With reference to the third aspect, in some implementations of the third aspect, the method further includes: receiving second information, the second information being used to indicate the communication cost information.
According to a fourth aspect, an intelligent model training apparatus is provided, including: a transceiver unit configured to receive first information from a central node, the first information indicating an inter-feature constraint variable used to characterize the constraint relationships between different features; and a processing unit configured to obtain first gradient information from the inter-feature constraint variable, the model parameters of the first feature model, and first sample data by using a gradient inference model, the first gradient information being the gradient information corresponding to the inter-feature constraint variable; the transceiver unit is further configured to send the first gradient information to the central node.
According to a fifth aspect, an intelligent model training apparatus is provided, including: a processing unit configured to determine an inter-feature constraint variable used to characterize the constraint relationships between the different features; and a transceiver unit configured to send first information to the participating nodes in multiple participating node groups, the first information including the inter-feature constraint variable.
According to a sixth aspect, a communication apparatus is provided, including: a processing unit configured to determine a threshold value from quantization noise information and channel resource information, where the quantization noise information is used to characterize the quantization encoding/decoding loss of target information; and a transceiver unit configured to send the quantized target information when the metric value of the target information is greater than the threshold value, and not to send the quantized target information when the metric value is less than or equal to the threshold value.
According to a seventh aspect, a communication apparatus is provided, including a processor. The processor can implement the method in any possible implementation of the first aspect, the method in any possible implementation of the second aspect, or the method in any possible implementation of the third aspect.
Optionally, the communication apparatus further includes a memory coupled to the processor, and the processor can be configured to execute the instructions in the memory to implement the method in any possible implementation of the first, second, or third aspect.
Optionally, the communication apparatus further includes a communication interface coupled to the processor. In the embodiments of this application, the communication interface may be a transceiver, a pin, a circuit, a bus, a module, or another type of communication interface, without limitation.
In one implementation, the communication apparatus is a communication device; in that case, the communication interface may be a transceiver or an input/output interface.
In another implementation, the communication apparatus is a chip configured in a communication device; in that case, the communication interface may be an input/output interface and the processor may be a logic circuit.
Optionally, the transceiver may be a transceiver circuit, and the input/output interface may be an input/output circuit.
According to an eighth aspect, a processor is provided, including an input circuit, an output circuit, and a processing circuit. The processing circuit is configured to receive a signal through the input circuit and transmit a signal through the output circuit, so that the processor performs the method in any possible implementation of the first aspect.
In specific implementation, the processor may be one or more chips; the input circuit may be an input pin; the output circuit may be an output pin; and the processing circuit may be transistors, gate circuits, flip-flops, and various logic circuits. The input signal received by the input circuit may, for example but without limitation, be received and input by a receiver; the signal output by the output circuit may, for example but without limitation, be output to and transmitted by a transmitter; and the input circuit and the output circuit may be the same circuit, used as the input circuit and the output circuit at different times. The embodiments of this application do not limit the specific implementation of the processor and the circuits.
According to a ninth aspect, a computer program product is provided, the computer program product including a computer program (which may also be called code or instructions) which, when run, causes a computer to perform the method in any possible implementation of the first aspect, the method in any possible implementation of the second aspect, or the method in any possible implementation of the third aspect.
According to a tenth aspect, a computer-readable storage medium is provided, storing a computer program (which may also be called code or instructions) which, when run on a computer, causes the computer to perform the method in any possible implementation of the first aspect, the method in any possible implementation of the second aspect, or the method in any possible implementation of the third aspect.
According to an eleventh aspect, a communication system is provided, including the aforementioned multiple participating nodes and at least one central node.
For the technical effects achievable by any of the second through eleventh aspects and any possible implementation thereof, refer to the description of the technical effects of the corresponding implementations of the first aspect, which is not repeated here.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of a communication system to which embodiments of this application are applicable;
FIG. 2 is a schematic flowchart of an intelligent model training method provided by an embodiment of this application;
FIG. 3 is another schematic flowchart of the intelligent model training method provided by an embodiment of this application;
FIG. 4 is a schematic diagram of multiple participating nodes sharing transmission resources according to an embodiment of this application;
FIG. 5 is a schematic block diagram of an example of a communication apparatus of this application;
FIG. 6 is a schematic structural diagram of an example of a communication device of this application.
Detailed Description
In the embodiments of this application, "/" may indicate an "or" relationship between the associated objects; for example, A/B may mean A or B. "And/or" may be used to describe three possible relationships between associated objects; for example, "A and/or B" may mean: A alone, both A and B, or B alone, where A and B may be singular or plural. To facilitate description of the technical solutions of the embodiments of this application, words such as "first" and "second" may be used to distinguish technical features with identical or similar functions; these words do not limit quantity or execution order, and "first" and "second" are not necessarily different. Words such as "exemplary" or "for example" are used to present examples or illustrations; any embodiment or design described as "exemplary" or "for example" should not be interpreted as preferred or more advantageous than other embodiments or designs. Such words are intended to present related concepts in a concrete way for ease of understanding.
In the embodiments of this application, "at least one (kind)" may also be described as "one (kind) or more (kinds)", and "more (kinds)" may be two, three, four or more, which is not limited in this application.
The technical solutions of this application are described below with reference to the accompanying drawings.
The technical solutions of the embodiments of this application can be applied to various communication systems, for example: a long term evolution (LTE) system, an LTE frequency division duplex (FDD) system, LTE time division duplex (TDD), a fifth generation (5G) communication system, a future communication system (such as a sixth generation (6G) communication system), or a system converging multiple communication systems, which is not limited in the embodiments of this application. 5G may also be called new radio (NR).
FIG. 1 is a schematic diagram of a communication system to which embodiments of this application are applicable.
As shown in FIG. 1, a communication system to which the embodiments of this application are applicable may include at least one central node and multiple participating node groups. The central node and the multiple participating node groups perform federated learning of an intelligent model, the intelligent model is composed of multiple feature models corresponding to multiple features of an inference target, and the participating nodes in one participating node group train one feature model. Participating nodes in the same group use different training samples (also called training data or sample data) for the feature model, i.e., their training samples belong to different sample spaces; participating nodes in different groups use training samples with different sample features, i.e., their training samples belong to different feature spaces. Based on the intelligent model training method provided by the embodiments of this application, the central node and the participating nodes can decouple the intelligent model into multiple feature models according to different features; different participating node groups train different feature models based on the corresponding feature sample data, and the central node aggregates the gradient information fed back by the participating nodes after training and updates the parameters of the intelligent model. This realizes feature-diverse model training and can improve model performance.
The central node provided by the embodiments of this application may be a network device, for example a server or a base station; the central node may be a device deployed in the radio access network that can communicate with the participating nodes directly or indirectly.
The participating node provided by the embodiments of this application may be a device with transceiver functions, such as a terminal or terminal device; for example, a participating node may be a sensor or a device with data-collection capability. Participating nodes may be deployed on land (indoor, outdoor, handheld, and/or vehicle-mounted), on the water surface (such as on ships), or in the air (for example on aircraft, balloons, and satellites). A participating node may be user equipment (UE); UE includes handheld devices, vehicle-mounted devices, wearable devices, or computing devices with wireless communication capability. For example, the UE may be a mobile phone, a tablet computer, or a computer with wireless transceiver functions. A terminal device may also be a virtual reality (VR) terminal device, an augmented reality (AR) terminal device, a wireless terminal in industrial control, in driverless driving, in telemedicine, in a smart grid, in a smart city, and/or in a smart home, and so on.
The technical solutions provided by the embodiments of this application can be used in many scenarios, for example smart retail, smart home, video surveillance, vehicle networks (such as autonomous driving and driverless driving), and industrial wireless sensor networks (IWSN). However, this application is not limited thereto.
In one implementation, the technical solution provided by this application can be applied to smart homes to provide customers with personalized services based on their needs. The central node may be a base station or server, and the participating nodes may be client devices installed in individual homes. Based on the technical solution provided by this application, a client device only provides the server, via a router, with synthesized gradient information obtained after model training on local data, so that it can share training-result information with the server while protecting customer data privacy. The server obtains aggregated gradient information from the synthesized gradient information provided by multiple client devices, determines the updated model parameters, notifies each client device, and continues the training of the intelligent model. After training is complete, the client devices apply the trained model to provide customers with personalized services.
In another implementation, the technical solution provided by this application can be applied to industrial wireless sensor networks to realize industrial intelligence. The central node may be a server, and the participating nodes may be multiple sensors in a factory (for example, mobile intelligent robots). After performing model training on local data, a sensor sends synthesized gradient information to the server; the server obtains aggregated gradient information based on the synthesized gradient information provided by the sensors, determines the updated model parameters, notifies each sensor, and continues the training of the intelligent model. After training is complete, the sensors apply the trained model to perform factory tasks; for example, a sensor that is a mobile intelligent robot can obtain a movement route based on the trained model to complete factory transport tasks, parcel-sorting tasks, and so on.
To better understand the embodiments of this application, the terms used herein are briefly explained below.
1. Artificial intelligence (AI)
Artificial intelligence gives machines the ability to learn and accumulate experience, so that they can solve problems that humans solve through experience, such as natural-language understanding, image recognition, and/or playing chess.
2. Neural network (NN): as an important branch of artificial intelligence, a neural network is a network structure that imitates the behavioral characteristics of animal neural networks for information processing. The structure of a neural network consists of a large number of nodes (or neurons) connected to one another; based on a specific operational model, it processes information by learning and training on input information. A neural network comprises an input layer, hidden layers, and an output layer: the input layer receives the input signal, the output layer outputs the computation result, and the hidden layers are responsible for complex functions such as feature representation. The function of a hidden layer is characterized by a weight matrix and the corresponding activation function.
A deep neural network (DNN) generally has a multi-layer structure. Increasing the depth and width of a neural network improves its expressive power and provides more powerful information-extraction and abstract-modeling capabilities for complex systems. The depth of a neural network can be expressed as its number of layers; for one layer, its width can be expressed as the number of neurons the layer contains.
A DNN can be constructed in many ways, including but not limited to recurrent neural networks (RNN), convolutional neural networks (CNN), and fully connected neural networks.
3. Training or learning
Training refers to the processing of a model (also called a training model). During this processing, parameters of the model, such as weight values, are optimized so that the model learns to perform a specific task. The embodiments of this application are applicable to, but not limited to, one or more of the following training methods: supervised learning, unsupervised learning, reinforcement learning, transfer learning, and so on. Supervised learning trains with a set of correctly labeled training samples, where "correctly labeled" means that each sample has an expected output value. Unlike supervised learning, unsupervised learning refers to methods that automatically classify or cluster input data without pre-labeled training samples.
4. Inference
Inference refers to performing data processing with a trained model (which may be called an inference model). Actual data is input into the inference model for processing to obtain the corresponding inference result. Inference may also be called prediction or decision, and the inference result may also be called the prediction result or decision result.
5. Traditional federated learning
Federated learning is a distributed AI training method that places the training of an AI algorithm on multiple devices instead of aggregating it on one server, which solves the time consumption and heavy communication overhead caused by data collection in centralized AI training; since device data does not have to be sent to the server, it also reduces privacy and security concerns. The specific process is as follows: the central node sends the AI model to multiple participating nodes; the participating nodes train the AI model on their own data and report their trained AI models to the central node in the form of gradients; the central node aggregates the gradient information fed back by the multiple participating nodes to obtain the new AI model parameters. The central node may send the updated parameters of the AI model to multiple participating nodes, which then train the AI model again. The participating nodes selected by the central node in different rounds of federated learning may be the same or different, which is not limited in this application.
In traditional federated learning, the models on the multiple participating nodes are the same as the model on the server, and the devices participating in training use the same type of training data; this may be called a homogeneous network. For example, in training an image recognition model, multiple image-acquisition devices can train the model with the image data they each collect. This approach improves training-data diversity but does not consider the feature diversity of the inference target. For example, when classifying animals based on both images and audio, the classification between cats and dogs can be more accurate. As another example, in vehicle networks, cameras, positioning systems, and inertial measurement units (IMU) are used to collect data of different categories (features) to estimate vehicle positions or distinguish traffic conditions within a road network, which can improve learning performance. Moreover, different models have different training effects for different feature data: an autoencoder combined with a classification neural network is generally used for feature extraction and classification of audio signals, while convolutional neural networks are generally used for image data. This application considers federated learning in a heterogeneous network formed by different types of participating nodes and a central node, where different types of participating nodes separately train the sub-models corresponding to different features of the inference target in the intelligent model, which can improve the performance of the trained intelligent model. However, different features are related to (or coupled with) one another. To allow different participating node groups to independently train different sub-models, this application proposes that the central node provide the different types of participating nodes with an inter-feature constraint variable that characterizes the relationships between features, thereby decoupling the features of the model, so that different types of participating nodes can train different feature models based on the inter-feature constraint variable and local feature data. As the model parameters are updated during training, the different types of participating nodes compute the gradient of the inter-feature constraint variable and feed it back to the central node, so that the central node can update the inter-feature constraint variable based on the constraint-variable gradients fed back by the different types of participating nodes. This achieves both training-data diversity in federated learning and model training oriented to different features, thereby improving the performance of the trained model.
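For contrast with the feature-decoupled scheme proposed here, one round of the traditional federated-learning loop described above can be sketched as follows; this is a minimal illustration with hypothetical names, assuming gradient averaging at the central node.

```python
import numpy as np

def federated_round(global_params: np.ndarray, node_datasets: list,
                    local_gradient, lr: float) -> np.ndarray:
    """Central node broadcasts parameters; each participating node trains on
    its own data and reports a gradient; the central node aggregates."""
    grads = [local_gradient(global_params, data) for data in node_datasets]
    return global_params - lr * np.mean(grads, axis=0)
```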
下面结合附图对本申请实施例提供的智能模型的训练方法进行说明。
图2是本申请实施例提供的智能模型的训练方法的一个示意性流程图。中心节点与多个参与节点组联合执行智能模型的训练,智能模型由推理目标的多种特征对应的多个特征模型组成,一个参与节点组的参与节点可以采集一种特征对应的训练数据,基于该特征对应的训练数据训练该特征对应的特征模型,图2所示的智能模型的训练方法由多个参与节点组中的第一参与节点执行。该第一参与节点属于第一参与节点组,该第一参与节点组中的参与节点训练的模型为第一特征模型。
例如,中心节点与该多个参与节点组联合训练的智能模型包括M个特征模型,该M个特征模型分别由M个参与节点组进行训练,一个参与节点组中的参与节点训练一个特征模型。其中第一参与节点组为M个参与节点组中的第m类参与节点组,或者说特征模型m对应的参与节点组,即第一特征模型为特征模型m或称为第m类特征模型。第一参与节点可以是该第一参与节点组中的第k个参与节点。即第一参与节点可以称为第m类 参与节点组中的第k个参与节点。
The method shown in FIG. 2 includes, but is not limited to, the following steps:
S201: The central node sends first information to the first participating node, where the first information includes an inter-feature constraint variable used to characterize the constraint relationship between different features.
Correspondingly, the first participating node receives the first information from the central node and determines the inter-feature constraint variable based on it.
By way of example and not limitation, the first information is broadcast information: the participating nodes in every group can receive the first information and determine the inter-feature constraint variable from it.
S202: The first participating node obtains first gradient information by using a gradient inference model according to the inter-feature constraint variable, the model parameters of the first feature model, and first sample data, where the first gradient information is the gradient information corresponding to the inter-feature constraint variable.
The central node sends third information to the participating nodes in the first group, indicating the updated model parameters of the first feature model. The updated model parameters are obtained by the central node based on the model gradient information fed back by the nodes in the first group after model training. The first participating node updates the parameters of the first feature model based on this parameter information to obtain the parameter-updated first feature model, and may train it using the model training method provided in FIG. 3; see the description of the embodiment shown in FIG. 3 below.
After receiving the inter-feature constraint variable, a participating node may use the gradient inference model to infer the gradient information corresponding to the inter-feature constraint variable based on that variable, the feature model's parameters, and local sample data. The central node can thus obtain the corresponding gradient information fed back by the participating nodes in one or more groups and use it to update the inter-feature constraint variable.
The model parameters of the first feature model may be the updated model parameters indicated by the most recently received third information from the central node.
In one implementation, every participating node in the training infers the gradient information corresponding to the inter-feature constraint variable after receiving it, and the central node updates the variable based on the feedback from every node.
In another implementation, because nodes in the same group train the same feature model, only some of the participating nodes may infer the gradient information after receiving the inter-feature constraint variable, and the central node updates the variable based on the feedback from those nodes.
The first participating node may determine, in the following ways, whether to infer the gradient information corresponding to the inter-feature constraint variable based on that variable, the first feature model's parameters, and local sample data.
Way 1: The central node triggers some or all participating nodes in the multiple groups to infer (or compute) the gradient of the inter-feature constraint variable based on the variable, the model parameters, and the nodes' local sample data.
That is, the central node may select all or some of the participating nodes in training to infer the gradient information. Because nodes in the same group train the same feature model, the central node may select one or more nodes from each group. This application is not limited thereto; alternatively, the central node may, based on the relationships between features, select one or more nodes from only some of the groups.
In one example, the central node may send a first identifier set containing the identifiers of the sample data selected by the central node for the inter-feature constraint variable.
After receiving the first identifier set, the first participating node determines whether to infer the gradient information according to whether its sample data set contains sample data corresponding to an identifier in the first identifier set.
If the first participating node's sample data set contains sample data corresponding to an identifier in the first identifier set, for example the first sample data corresponding to a first identifier, the first participating node infers the gradient information corresponding to the inter-feature constraint variable based on the variable, the first feature model's parameters, and the first sample data.
If the first participating node's sample data set contains no sample data corresponding to any identifier in the first identifier set, the first participating node does not infer the gradient information.
The other participating nodes determine in the same way whether to perform the inference.
In another example, the central node may send inference indication information instructing some or all participating nodes to infer the gradient information corresponding to the inter-feature constraint variable.
For example, the central node may send the inference indication information to the participating nodes that need to perform the inference, and any node that receives it performs the inference.
As another example, the central node may broadcast the inference indication information containing the identifiers of one or more participating nodes; the nodes whose identifiers are included perform the inference.
Way 2: The central node and the participating nodes are configured with the same sample data selector, based on which they can determine which participating nodes will infer the gradient information corresponding to the inter-feature constraint variable.
For example, the sample data selector can generate at least one sample-data identifier, and the sample data corresponding to those identifiers is used in the current round of inferring the inter-feature constraint gradient. If the first participating node's sample data set includes sample data corresponding to one of those identifiers (such as the first sample data corresponding to the first identifier), the first participating node performs the inference; otherwise it does not. The other participating nodes decide in the same way.
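A minimal sketch of this shared-selector decision follows. Deriving the round's identifier set from a shared seed is an assumption for illustration; the patent does not specify how the selector is synchronized.

    import random

    def selected_ids(round_t, num_samples, batch, seed=2022):
        # Center and nodes run the same deterministic selector, so they
        # derive the same identifier set I_t for round t.
        rng = random.Random(seed * 1_000_003 + round_t)
        return set(rng.sample(range(num_samples), batch))

    local_ids = {3, 17, 42}          # identifiers of this node's samples
    I_t = selected_ids(round_t=7, num_samples=100, batch=10)
    if local_ids & I_t:
        pass  # infer the gradient of the inter-feature constraint variable
    else:
        pass  # skip this round; only update the residual bookkeeping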
If the first participating node determines to perform the inference, it infers the gradient information corresponding to the inter-feature constraint variable according to the variable, the first feature model's parameters, and the first sample data.
For example, let I_t denote the set of sample-data identifiers used in the current round (the t-th inference) for inferring the gradient of the inter-feature constraint variable, and suppose the identifier i of the first participating node's first sample data satisfies i ∈ I_t. The first participating node then uses the gradient inference model to infer, from the inter-feature constraint variable λ_i^t corresponding to the first sample data (sample data i), the model parameters w_m^t of the first feature model, and the first sample data itself, the gradient information of λ_i^t, obtaining the first gradient information g_{λ,i}^{m,t}:

    [equation image]

Here, g_{λ,i}^{m,t} denotes the gradient information of the inter-feature constraint variable λ_i^t inferred in the t-th inference, based on the sample data with identifier i (the first sample data), by a participating node training feature model m (the first feature model); w_m^t is the updated model parameter that the first participating node obtained from the central node; b^t is the bias parameter of the t-th training; and u_i^t is the auxiliary variable corresponding to the i-th training data in the t-th training. The bias parameter b^t and the auxiliary variable u_i^t come from the central node. ∇f(·) is the function used for computing the gradient information corresponding to the model parameters.
After obtaining the gradient information corresponding to the inter-feature constraint variable, the first participating node may send quantized first target gradient information, denoted Q(g̃_{λ,i}^{m,t}), to the central node.
In one implementation, the first target gradient information is the first gradient information described above. After obtaining the first gradient information, the first participating node quantization-encodes it to obtain the quantized first gradient information Q(g_{λ,i}^{m,t}) and sends it to the central node, so that the central node can obtain the updated inter-feature constraint variable λ^{t+1} based on the constraint-variable gradient information fed back by the participating nodes.
In another implementation, the first target gradient information includes the first gradient information and first residual gradient information, where the first residual gradient information characterizes the residual amount of gradient information corresponding to the inter-feature constraint variable that the first participating node had not sent to the central node before obtaining the first gradient information. Writing that residual as Δ_λ^{m,t}, the first target gradient information g̃_{λ,i}^{m,t} can be expressed as:

    g̃_{λ,i}^{m,t} = g_{λ,i}^{m,t} + β_t · Δ_λ^{m,t}

where β_t = τ_{t-1}/τ_t; τ_t is the update step size of the model parameters in the t-th model training, i.e., the learning rate of the t-th training, and τ_{t-1} is the update step size in the (t-1)-th model training, i.e., the learning rate of the (t-1)-th training; and Δ_λ^{m,t} is the first residual gradient information, that is, the residual amount of constraint-variable gradient information that had not been sent to the central node before the t-th inference of the constraint-variable gradient.
If the first participating node sends the quantized first target gradient information to the central node, it may update the residual gradient information: based on the first target gradient information g̃_{λ,i}^{m,t} and the quantized first target gradient information Q(g̃_{λ,i}^{m,t}), it obtains second residual gradient information, namely the residual amount of gradient information not sent to the central node before the (t+1)-th inference of the constraint-variable gradient:

    Δ_λ^{m,t+1} = g̃_{λ,i}^{m,t} − Q(g̃_{λ,i}^{m,t})
The second residual gradient information characterizes the residual amount of the first target gradient information that was not sent to the central node, and serves as the residual of constraint-variable gradient information not yet sent to the central node before the (t+1)-th model training. In other words, once the first participating node has sent the quantized first target gradient information, the gradient residual is updated to the part of the first target gradient information that was not delivered to the central node because of quantization encoding.
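The residual-feedback mechanism just described can be sketched as follows. The one-level sign quantizer and the beta schedule are illustrative choices, not the patent's quantization codec.

    import numpy as np

    def quantize(v):
        # Toy sign quantizer that preserves the mean magnitude.
        return np.sign(v) * np.mean(np.abs(v))

    def report_and_update(grad, residual, beta):
        target = grad + beta * residual   # first target gradient information
        sent = quantize(target)           # what actually goes to the center
        new_residual = target - sent      # second residual gradient information
        return sent, new_residual

    g = np.array([0.4, -0.1, 0.3])
    delta = np.zeros(3)
    sent, delta = report_and_update(g, delta, beta=1.0)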
Optionally, the first participating node may determine, based on a scheduling policy, whether to send the quantized target gradient information to the central node.
For example, the sending of the quantized first target gradient information described above takes place after the first participating node, based on the scheduling policy, determines to send the quantized target gradient information Q(g̃_{λ,i}^{m,t}) to the central node.
If the first participating node determines, based on the scheduling policy, not to send the quantized target gradient information, it determines third residual gradient information, which is the first target gradient information itself. The third residual gradient information is then the residual amount of gradient information not sent to the central node before the first participating node obtains gradient information in the (t+1)-th round:

    Δ_λ^{m,t+1} = g̃_{λ,i}^{m,t} = g_{λ,i}^{m,t} + β_t · Δ_λ^{m,t}

That is, if the first participating node decides, based on the scheduling policy, not to send the first target gradient information, the gradient residual Δ_λ^{m,t+1} includes the first gradient information g_{λ,i}^{m,t} obtained in the t-th inference as well as the residual amount of constraint-variable gradient information inferred before the t-th inference that was never sent to the central node.
In one example, the scheduling policy may be notified to the first participating node by the central node.
For example, the central node sends indication information A to the first participating node. Indication information A may instruct the first participating node to send the gradient information obtained after the t-th inference of the constraint-variable gradient; the first participating node then sends the quantized first target gradient information after the t-th inference and computes the gradient residual remaining after that transmission, i.e., the second residual gradient information Δ_λ^{m,t+1}.
Alternatively, indication information A may instruct the first participating node not to send the gradient information obtained after the t-th inference; the first participating node then does not send gradient information after obtaining the first gradient information in the t-th inference, and accumulates the first gradient information into the residual gradient information to obtain the third residual gradient information Δ_λ^{m,t+1}.
In another example, the scheduling policy may be determined by the first participating node based on quantization noise information, channel resource information, and the first target gradient information. The specific way a participating node determines the scheduling policy based on quantization noise information, channel resource information, and target gradient information (such as the first target gradient information in this example) is described in detail in Embodiment 2.
If the first participating node holds no sample data corresponding to the identifiers in the sample-data identifier set I_t, it does not infer the gradient of the inter-feature constraint variable. The first participating node then updates the residual gradient information, obtaining Δ_λ^{m,t+1}:

    [equation image]
S203: The central node receives the target gradient information corresponding to the inter-feature constraint variable from multiple participating nodes.
Taking the first participating node among them as an example, it may send the quantized first target gradient information obtained by inference to the central node. After receiving it, the central node recovers the first target gradient information by quantization decoding, and then updates the inter-feature constraint variable corresponding to each sample data according to the target gradient information fed back by each of the multiple participating nodes:

    [equation image]

where N_b is the number of sample data; the central node may select N_b sample data each time. The participating nodes that store the selected sample data infer the gradient of the inter-feature constraint variable based on the current feature-model parameters and the sample data, and feed it back to the central node. Compared with having every participating node feed back the constraint-variable gradient, this reduces resource overhead and implementation complexity.
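A sketch of this center-side update is given below. The averaging form and the step size tau are assumptions for illustration; the patent gives the exact update rule as an equation image.

    import numpy as np

    def update_constraints(lam, reports, tau):
        # lam: dict sample_id -> constraint variable (np.ndarray)
        # reports: dict sample_id -> list of de-quantized gradients fed
        #          back by the nodes holding that sample
        for i, grads in reports.items():
            lam[i] = lam[i] - tau * np.mean(grads, axis=0)
        return lam

    lam = {3: np.zeros(2)}
    reports = {3: [np.array([0.2, -0.4]), np.array([0.1, 0.0])]}
    lam = update_constraints(lam, reports, tau=0.05)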
The central node also updates the bias parameter b^{t+1} based on the inter-feature constraint variables computed in each round:

    [equation image]

where r(·) is a block-separable regularization function. In addition, the central node updates the auxiliary variable u_i^{t+1} based on the inter-feature constraint variables obtained in each round:

    [equation image]

where l_i(·) is the sampling loss function of the i-th data sample.
After obtaining the updated inter-feature constraint variable, the central node sends the updated bias parameter b^{t+1}, the auxiliary variable u_i^{t+1}, and the inter-feature constraint variable to the participating nodes, so that they can train the feature models based on the bias parameter, the auxiliary variable, and the inter-feature constraint variable.
According to the above solution, as model parameters are updated during training, the different types of participating nodes compute the gradient information corresponding to the inter-feature constraint variable and feed it back to the central node. The central node updates the inter-feature constraint variable, which characterizes the correlation between features, based on the constraint-variable gradients inferred by the participating nodes of the different feature models, thereby decoupling the model across features: each type of participating node can train its feature model using the inter-feature constraint variable and its local feature data, and the central node can keep updating the variable from the gradients the nodes feed back. This achieves both training-data diversity and feature diversity in federated learning without transmitting raw data; that is, it avoids raw-data leakage while also improving the performance of the trained model.
FIG. 3 is another schematic flowchart of the intelligent model training method provided in an embodiment of this application. The method includes, but is not limited to, the following steps:
S301: The central node sends first information to the first participating node, where the first information includes the inter-feature constraint variable used to characterize the constraint relationship between different features.
Correspondingly, the first participating node receives the first information from the central node and determines the inter-feature constraint variable based on it.
S302: The first participating node trains the first feature model based on the inter-feature constraint variable and model training data, obtaining second gradient information.
After obtaining the inter-feature constraint variable, the first participating node performs the t-th model training of the first feature model based on the variable and the model training data.
The central node also sends third information to the participating nodes in the first group, indicating the updated model parameters of the first feature model; these are obtained by the central node from the model gradient information fed back by the group's nodes after the (t-1)-th model training.
The first participating node updates the parameters of the first feature model based on this parameter information to obtain the updated first feature model, and then performs the t-th model training, training the parameter-updated first feature model based on the inter-feature constraint variable and the model training data. After the t-th training of the first feature model, the first participating node obtains the second gradient information.
For example, the second gradient information may be written g_{w,k}^{m,t}, denoting the gradient information obtained by the k-th participating node (the first participating node) in the m-th participating node group after the t-th training. Let I_k^{m,t} be the set of index values of the training data selected by the first participating node for the t-th training; w_m^t the updated model parameters the first participating node obtained from the central node; λ_i^t the inter-feature constraint variable; b^t the bias parameter of the t-th training; and u_i^t the auxiliary variable corresponding to the i-th training data in the t-th training (the bias parameter b^t and the auxiliary variable u_i^t are obtained by the first participating node from the central node). The second gradient information g_{w,k}^{m,t} can then be expressed as:

    [equation image]

where ∇f(·) is the function used for computing the gradient information corresponding to the model parameters. The second gradient information is used to update the model parameters of the first feature model: specifically, the first participating node feeds this gradient information back to the central node, which determines the updated model parameters based on the gradient information fed back by the participating nodes in the first group.
The first participating node obtains one piece of gradient information after training the first feature model on the training data of each index value in the index-value set, and accumulates the gradients obtained from the individual training data. Because the first feature model is one of the M feature models of the intelligent model, the accumulated gradient is divided by M, yielding the second gradient information g_{w,k}^{m,t} of the first participating node's t-th training of the first feature model. This application is not limited thereto.
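A sketch of this accumulate-and-divide-by-M step follows. grad_fn and the toy gradient stand in for the patent's gradient function, which is not specified here.

    import numpy as np

    def second_gradient(w, lam, b, u, samples, index_set, grad_fn, M):
        # Accumulate per-sample gradients over the round's index set and
        # divide by the number of feature models M, as described above.
        g = np.zeros_like(w)
        for i in index_set:
            g += grad_fn(w, lam[i], b, u[i], samples[i])
        return g / M

    def toy_grad(w, lam_i, b, u_i, x):
        # Illustrative placeholder gradient of a scalar residual loss.
        return (w @ x + b - u_i + lam_i) * x

    w = np.zeros(3)
    samples = {0: np.ones(3), 5: np.arange(3.0)}
    lam = {0: 0.1, 5: -0.2}
    u = {0: 0.5, 5: 1.0}
    g = second_gradient(w, lam, 0.0, u, samples, [0, 5], toy_grad, M=2)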
After performing the t-th training of the first feature model, the first participating node may send quantized second target gradient information, denoted Q(g̃_{w,k}^{m,t}), to the central node.
In one implementation, the second target gradient information is the second gradient information described above. After the t-th model training yields the second gradient information, the first participating node quantization-encodes it to obtain the quantized second gradient information Q(g_{w,k}^{m,t}) and sends it to the central node, so that the central node can derive the updated parameters of the first feature model from the gradient information fed back by the first group's participating nodes after the t-th training.
In another implementation, the second target gradient information includes the second gradient information and fourth residual gradient information, where the fourth residual gradient information characterizes the residual amount of gradient information the first participating node had not sent to the central node before obtaining the second gradient information. Writing that residual as Δ_{w,k}^{m,t}, the second target gradient information g̃_{w,k}^{m,t} can be expressed as:

    g̃_{w,k}^{m,t} = g_{w,k}^{m,t} + α_t · Δ_{w,k}^{m,t}

where α_t = η_{t-1}/η_t; η_t is the update step size of the model parameters in the t-th model training, i.e., the learning rate of the t-th training, and η_{t-1} is the update step size in the (t-1)-th model training, i.e., the learning rate of the (t-1)-th training; and Δ_{w,k}^{m,t} is the fourth residual gradient information, that is, the residual amount of gradient information obtained in trainings before the t-th that was never sent to the central node.
After sending the quantized second target gradient information, the first participating node may update the residual gradient information: based on the second target gradient information g̃_{w,k}^{m,t} and its quantized version Q(g̃_{w,k}^{m,t}), it obtains fifth residual gradient information, namely the residual amount of gradient information not sent to the central node before the first participating node obtains gradient information in the (t+1)-th model training:

    Δ_{w,k}^{m,t+1} = g̃_{w,k}^{m,t} − Q(g̃_{w,k}^{m,t})

The fifth residual gradient information characterizes the residual amount of the second target gradient information not sent to the central node, and serves as the unsent residual of gradient information from trainings before the (t+1)-th. In other words, once the first participating node has sent the quantized second target gradient information, the gradient residual is the part of the second target gradient information not delivered to the central node because of quantization encoding.
Optionally, the first participating node may determine, based on a scheduling policy, whether to send the quantized target gradient information to the central node.
For example, the sending of the quantized second target gradient information described above takes place after the first participating node, based on the scheduling policy, determines to send the quantized target gradient information Q(g̃_{w,k}^{m,t}) to the central node.
If the first participating node determines, based on the scheduling policy, not to send the quantized target gradient information, it determines sixth residual gradient information, which is the second target gradient information itself. The sixth residual gradient information is then the residual amount of gradient information not sent to the central node before the (t+1)-th model training:

    Δ_{w,k}^{m,t+1} = g̃_{w,k}^{m,t} = g_{w,k}^{m,t} + α_t · Δ_{w,k}^{m,t}

That is, if the first participating node decides, based on the scheduling policy, not to send the second target gradient information, the gradient residual Δ_{w,k}^{m,t+1} includes the second gradient information g_{w,k}^{m,t} obtained in the t-th model training as well as the residual amount that had not been sent to the central node before the second gradient information was obtained.
In one example, the scheduling policy may be notified to the first participating node by the central node.
For example, the central node sends indication information A to the first participating node. Indication information A may instruct the first participating node to send the gradient information obtained after the t-th model training; the first participating node then sends the quantized second target gradient information after the t-th training and computes the gradient residual remaining after that transmission, i.e., the fifth residual gradient information Δ_{w,k}^{m,t+1}.
Alternatively, indication information A may instruct the first participating node not to send the gradient information obtained after the t-th model training; the node then does not send gradient information after the t-th training yields the second gradient information, and accumulates the second gradient information into the residual gradient information to obtain the sixth residual gradient information Δ_{w,k}^{m,t+1}.
In another example, the scheduling policy may be determined by the first participating node based on quantization noise information, channel resource information, and the second target gradient information. The specific way of determining the scheduling policy based on quantization noise information, channel resource information, and target gradient information is described in detail in Embodiment 2.
The central node receives the quantized target gradient information sent by the participating nodes in the first group after they perform the t-th model training, and aggregates it:

    [equation image]

where N_b is the number of data samples.
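A sketch of how the center could turn the de-quantized reports from one node group into updated parameters for that group's feature model is given below; the plain averaged gradient step and the step size eta are illustrative assumptions.

    import numpy as np

    def update_feature_model(w_m, reports, eta):
        # Average the de-quantized gradient reports from the group's nodes
        # and take one gradient step on feature model m's parameters.
        return w_m - eta * np.mean(reports, axis=0)

    w_m = np.zeros(4)
    reports = [np.full(4, 0.2), np.full(4, -0.1)]
    w_m = update_feature_model(w_m, reports, eta=0.5)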
Embodiment 2
This embodiment of the application provides a way for a participating node to determine a scheduling policy based on channel resource information and a signal to be transmitted. The signal to be transmitted may be the second target gradient information g̃_{w,k}^{m,t} described above, in which case the scheduling policy is used by the first participating node to determine whether to send the second target gradient information to the central node; or it may be the first target gradient information g̃_{λ,i}^{m,t}, in which case the scheduling policy is used by the first participating node to determine whether to send the first target gradient information. This application is not limited thereto; the scheduling policy may also be used in deciding whether to transmit other signals.
By way of example and not limitation, the channel resource information includes channel state information and/or transmission time information, where the channel state information is the state information h_k of the channel between the first participating node and the central node, and the transmission time information is the duration T_0 for which the first participating node occupies channel resources to transmit gradient information.
The first participating node may determine quantization noise information based on the channel resource information and target information, where the quantization noise information characterizes the quantization encoding/decoding loss of the target information. For example, the target information may be the first target gradient information g̃_{λ,i}^{m,t} described above, or the second target gradient information g̃_{w,k}^{m,t}, or other information; this application does not limit this.
As shown in FIG. 4, the first participating node quantization-encodes the target information to be sent with a quantization encoding module, obtaining the quantized target information, and sends it to the central node through the first participating node's transceiver module. After propagating over the channel, the quantized target information is received by the central node through its transceiver module; the central node then quantization-decodes the received signal with a quantization decoder to obtain the recovered target information. The loss of the recovered target information relative to the original target information is the quantization noise.
In one example, the first participating node may quantization-encode and then quantization-decode the target information to obtain the quantization noise information, which is the difference between the target information and the signal obtained by encoding and then decoding it.
In another example, the first participating node may estimate the quantization noise information of the target information based on the channel resource information and the target information. That is, from the acquired channel resource information, the first participating node estimates the loss the target information will suffer after the quantized version passes through the channel and is quantization-decoded at the central node, obtaining the quantization noise information. Optionally, the channel resource information may include channel state information and/or channel occupation time information (i.e., the transmission time information).
Optionally, the first participating node obtains the target information's quantization noise information from the channel resource information, communication cost information, and the target information, where the communication cost information indicates the communication cost weights of communication resources, and the communication resources may include transmission power and/or transmission bandwidth.
Optionally, the central node may send second information to the first participating node indicating the communication cost information; correspondingly, the first participating node receives the second information and determines the communication cost information from it.
For example, the communication cost information may indicate the transmission power cost weight γ_p and the transmission bandwidth cost weight γ_B. Having obtained the communication cost information through the second information from the central node, the first participating node can compute a parameter q_k from the transmission power cost weight γ_p, the channel state information h_k, the transmission time information T_0, and the noise power spectral density N_0, where q_k satisfies:

    [equation image]

The first participating node may also solve the following equation to obtain a further parameter:

    [equation image]

From this parameter and the target information, the first participating node obtains the quantization noise information, which is the covariance matrix of the quantization noise of the target information:

    [equation image]

where V_k is the covariance matrix of the target information.
From the quantization noise information, the communication cost information, the channel resource information, and the target information, the first participating node can determine the transmission bandwidth, which satisfies:

    [equation image]

where I is the identity matrix, det(A) denotes the determinant of matrix A, and log(x) denotes the logarithm of x.
Likewise, from the quantization noise information, the communication cost information, the channel resource information, and the target information, the first participating node can determine the transmission power, which satisfies:

    [equation image]

Having obtained the transmission bandwidth and the transmission power, the first participating node can determine a threshold value from the transmission bandwidth, the transmission power, the quantization noise information of the target information, and the communication cost, where the threshold value satisfies:

    [equation image]
Here, tr(A) denotes the trace of matrix A: in linear algebra, the sum of the elements on the main diagonal (from upper left to lower right) of an n×n matrix A is called its trace, generally written tr(A).
The first participating node may compare the metric value of the target information with the threshold value to determine whether to send the quantized target information to the central node.
By way of example and not limitation, the metric value of the target information may be its norm: if the target information is a vector, its l2 norm; if the target information is a matrix, its Frobenius norm.
When the metric value of the target information is greater than the threshold value, the first participating node sends the quantized target information to the central node, that is, the first participating node is in the active state. Otherwise, when the metric value is less than or equal to the threshold value, the first participating node does not send the target information, that is, it is in the inactive state.
According to the above solution, a participating node decides whether to send the target information based on the target information to be transmitted and the communication cost information broadcast by the central node, taking into account the quantization encoding/decoding loss of the target information. This realizes channel-adaptive scheduling of participating nodes and can improve both the reliability of target-signal transmission and the utilization of channel resources.
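A sketch of this scheduling rule follows. The cost-weighted-trace threshold is an illustrative stand-in: the patent's closed-form expressions for the threshold appear as equation images and are not reproduced here.

    import numpy as np

    def toy_threshold(quant_noise_cov, gamma_p, gamma_B):
        # Illustrative stand-in: cost-weighted trace of the quantization
        # noise covariance, not the patent's exact closed form.
        return (gamma_p + gamma_B) * np.trace(quant_noise_cov)

    def should_send(target, threshold):
        # l2 norm for vectors; np.linalg.norm defaults to the Frobenius
        # norm for matrices, matching the metric described above.
        return np.linalg.norm(target) > threshold

    target = np.array([0.6, -0.8])
    cov = 0.01 * np.eye(2)
    if should_send(target, toy_threshold(cov, gamma_p=1.0, gamma_B=2.0)):
        pass  # transmit the quantized target information (node is active)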
In the embodiments shown in FIG. 2 and FIG. 3 above, a participating node may use the scheduling policy provided in this Embodiment 2 to determine whether to send the quantized target information to the central node.
For example, in the example shown in FIG. 2, the first participating node determines a first threshold value from first quantization noise information and channel resource information, where the first quantization noise information characterizes the quantization encoding/decoding loss of the first target gradient information. As described above, the first participating node can obtain the first quantization noise information from the computed parameters and the first target information g̃_{λ,i}^{m,t}. If the metric value of the first target gradient information is greater than the first threshold value, the first participating node sends the quantized first target gradient information to the central node; if the metric value is less than or equal to the first threshold value, it does not.
As another example, in the example shown in FIG. 3, the first participating node determines a second threshold value from second quantization noise information and channel resource information, where the second quantization noise information characterizes the quantization encoding/decoding loss of the second target gradient information. As described above, the node can obtain the second quantization noise information from the computed parameters and the second target information g̃_{w,k}^{m,t}. If the metric value of the second target gradient information is greater than the second threshold value, the first participating node sends the quantized second target gradient information to the central node; if the metric value is less than or equal to the second threshold value, it does not.
Across the examples in this application, unless otherwise specified or logically conflicting, terms and/or descriptions in different examples are consistent and may be referenced by one another, and technical features in different examples may be combined according to their inherent logical relationships to form new examples.
In this application, a node involved may perform some or all of the steps or operations related to that node. These steps or operations are merely examples; this application may also perform other operations or variations of the various operations. The steps may be performed in an order different from that presented in this application, and possibly not all of the operations in this application need to be performed.
The methods provided in the embodiments of this application have been described above in detail with reference to FIG. 2 to FIG. 4. The apparatuses provided in the embodiments of this application are described in detail below. To implement the functions in the methods provided above, each network element may include a hardware structure and/or a software module, implementing the functions in the form of a hardware structure, a software module, or a hardware structure plus a software module. Whether a given function is performed as a hardware structure, a software module, or both depends on the particular application and design constraints of the technical solution.
FIG. 5 is a schematic block diagram of an intelligent model training apparatus provided in an embodiment of this application. As shown in FIG. 5, the training apparatus 500 may include a processing unit 510 and a transceiver unit 520.
In one possible design, the training apparatus 500 may correspond to the participating node in the method embodiments above, to a chip configured in (or used in) a participating node, or to another apparatus, module, circuit, or unit capable of implementing the method performed by the participating node.
It should be understood that the training apparatus 500 may correspond to the participating node in the methods of the embodiments of this application and may include the units of the first device for performing the methods shown in FIG. 2 and FIG. 3. The units in the training apparatus 500 and the other operations and/or functions above are respectively intended to implement the corresponding procedures of the methods shown in FIG. 2 and FIG. 3.
When the training apparatus 500 is used to implement the procedure performed by the participating node in the method embodiments above: the transceiver unit 520 is configured to receive first information from the central node, the first information indicating the inter-feature constraint variable, which characterizes the constraint relationship between different features; the processing unit 510 is configured to obtain first gradient information by using the gradient inference model according to the inter-feature constraint variable, the model parameters of the first feature model, and first sample data, the first gradient information being the gradient information corresponding to the inter-feature constraint variable; and the transceiver unit 520 is further configured to send the first gradient information to the central node.
Optionally, the processing unit 510 is further configured to determine a threshold value according to quantization noise information and channel resource information, where the quantization noise information characterizes the quantization encoding/decoding loss of target information. The transceiver unit 520 is further configured to send the quantized target information when the metric value of the target information is greater than the threshold value, and not to send the quantized target information when the metric value is less than or equal to the threshold value.
It should also be understood that when the training apparatus 500 is a chip configured in (or used in) a participating node, the transceiver unit 520 may be an input/output interface or circuit of the chip, and the processing unit 510 may be a logic circuit in the chip.
In another possible design, the training apparatus 500 may correspond to the central node in the method embodiments above, to a chip configured in (or used in) the central node, or to another apparatus, module, circuit, or unit capable of implementing the method performed by the central node.
It should be understood that the training apparatus 500 may correspond to the central node in the methods shown in FIG. 2 and FIG. 3 and may include the units of the central node for performing those methods. The units in the training apparatus 500 and the other operations and/or functions above are respectively intended to implement the corresponding procedures of the methods shown in FIG. 2 and FIG. 3.
When the training apparatus 500 is used to implement the procedure performed by the central node in the method embodiments above: the processing unit 510 is configured to determine the inter-feature constraint variable, which characterizes the constraint relationship between the different features; and the transceiver unit 520 is configured to send first information, including the inter-feature constraint variable, to the participating nodes in the multiple participating node groups.
Optionally, the processing unit 510 is further configured to determine a threshold value according to quantization noise information and channel resource information, where the quantization noise information characterizes the quantization encoding/decoding loss of target information. The transceiver unit 520 is further configured to send the quantized target information when the metric value of the target information is greater than the threshold value, and not to send it when the metric value is less than or equal to the threshold value.
It should also be understood that when the training apparatus 500 is a chip configured in (or used in) the central node, the transceiver unit 520 may be an input/output interface or circuit of the chip, and the processing unit 510 may be a logic circuit in the chip. Optionally, the training apparatus 500 may further include a storage unit 530, which may be used to store instructions or data; the processing unit 510 may execute the instructions or data stored in the storage unit so that the training apparatus implements the corresponding operations.
It should be understood that the transceiver unit 520 in the training apparatus 500 can be implemented through a communication interface (such as a transceiver or an input/output interface), for example corresponding to the transceiver 610 of the communication device 600 shown in FIG. 6. The processing unit 510 in the training apparatus 500 can be implemented by at least one processor, for example corresponding to the processor 620 of the communication device 600 shown in FIG. 6, or by at least one logic circuit. The storage unit 530 in the training apparatus 500 may correspond to the memory of the communication device 600 shown in FIG. 6.
It should also be understood that the specific processes in which the units perform the corresponding steps above have been described in detail in the method embodiments and, for brevity, are not repeated here.
FIG. 6 is a schematic structural diagram of a communication device 600 provided in an embodiment of this application.
The communication device 600 may correspond to the participating node in the method embodiments above. As shown in FIG. 6, the participating node 600 includes a processor 620 and a transceiver 610, and optionally also a memory. The processor 620, the transceiver 610, and the memory can communicate with one another over an internal connection path to transfer control and/or data signals. The memory is configured to store a computer program, and the processor 620 is configured to execute the computer program in the memory to control the transceiver 610 to send and receive signals.
It should be understood that the communication device 600 shown in FIG. 6 can implement the processes involving the participating node in the method embodiments shown in FIG. 2 and FIG. 3. The operations and/or functions of the modules in the participating node 600 are respectively intended to implement the corresponding procedures in the method embodiments above; for details, refer to the descriptions in the method embodiments, which are appropriately omitted here to avoid repetition.
The communication device 600 may likewise correspond to the central node in the method embodiments above. As shown in FIG. 6, the central node 600 includes a processor 620 and a transceiver 610, and optionally also a memory, connected in the same way: the memory stores a computer program, and the processor 620 executes it to control the transceiver 610 to send and receive signals.
It should be understood that the communication device 600 shown in FIG. 6 can implement the processes involving the central node in the method embodiments shown in FIG. 2 and FIG. 3. The operations and/or functions of the modules in the central node 600 are respectively intended to implement the corresponding procedures in the method embodiments above; for details, refer to the descriptions in the method embodiments, which are appropriately omitted here to avoid repetition.
The processor 620 and the memory may be combined into one processing apparatus, with the processor 620 executing the program code stored in the memory to implement the functions above. In a specific implementation, the memory may also be integrated in the processor 620 or be independent of it. The processor 620 may correspond to the processing unit in FIG. 5.
The transceiver 610 may correspond to the transceiver unit in FIG. 5, and may include a receiver (or receiving machine, receiving circuit) and a transmitter (or transmitting machine, transmitting circuit), where the receiver is configured to receive signals and the transmitter is configured to transmit signals.
An embodiment of this application further provides a processing apparatus, including a processor and a (communication) interface, the processor being configured to perform the method in any of the method embodiments above.
It should be understood that the processing apparatus may be one or more chips. For example, the processing apparatus may be a field programmable gate array (FPGA), an application-specific integrated circuit (ASIC), a system on chip (SoC), a central processing unit (CPU), a network processor (NP), a digital signal processor (DSP), a micro controller unit (MCU), a programmable logic device (PLD), or another integrated chip.
According to the methods provided in the embodiments of this application, this application further provides a computer program product including computer program code that, when executed by one or more processors, causes an apparatus including the processor(s) to perform the methods in the embodiments shown in FIG. 2 and FIG. 3.
The technical solutions provided in the embodiments of this application may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product, which includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the present invention are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, a network device, a terminal device, a core network device, a machine learning device, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another by wired means (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (such as infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or data center integrating one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or magnetic tape), an optical medium (for example, a digital video disc (DVD)), a semiconductor medium, or the like.
According to the methods provided in the embodiments of this application, this application further provides a computer-readable storage medium storing program code that, when run by one or more processors, causes an apparatus including the processor(s) to perform the methods in the embodiments shown in FIG. 2 and FIG. 3.
According to the methods provided in the embodiments of this application, this application further provides a system including one or more of the aforementioned first devices. The system may further include one or more of the aforementioned second devices.
Optionally, the first device may be a network device or a terminal device, and the second device may be a device that communicates with the first device over a wireless link.
In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division into units is only a division by logical function, and there may be other divisions in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The above is merely the specific implementation of this application, but the protection scope of this application is not limited thereto. Any variation or replacement readily conceivable by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (67)

  1. A training method for an intelligent model, wherein a central node and multiple participating node groups jointly perform training of the intelligent model, the intelligent model consists of multiple feature models corresponding to multiple features of an inference target, and the participating nodes in one of the participating node groups train one of the feature models, the training method being performed by a first participating node, in the multiple participating node groups, that trains a first feature model, and comprising:
    receiving first information from the central node, the first information indicating an inter-feature constraint variable, the inter-feature constraint variable being used to characterize a constraint relationship between the different features;
    obtaining first gradient information by using a gradient inference model according to the inter-feature constraint variable, model parameters of the first feature model, and first sample data, the first gradient information being gradient information corresponding to the inter-feature constraint variable; and
    sending the first gradient information to the central node.
  2. The method according to claim 1, further comprising:
    receiving a first identifier set from the central node, the first identifier set comprising identifiers of sample data selected by the central node for the inter-feature constraint variable;
    wherein the obtaining of the first gradient information by using the gradient inference model according to the inter-feature constraint variable, the model parameters of the first feature model, and the first sample data comprises:
    determining that the sample data set of the first participating node includes the first sample data corresponding to a first identifier, the first identifier belonging to the first identifier set; and
    obtaining the first gradient information by using the gradient inference model according to the inter-feature constraint variable, the model parameters of the first feature model, and the first sample data, the first gradient information being gradient information corresponding to the inter-feature constraint variable.
  3. The method according to claim 1 or 2, wherein sending the first gradient information to the central node comprises: sending quantized first target gradient information to the central node, wherein the first target gradient information comprises the first gradient information, or the first target gradient information comprises the first gradient information and first residual gradient information, the first residual gradient information being used to characterize a residual amount of gradient information corresponding to the inter-feature constraint variable that was not sent to the central node before the first gradient information was obtained.
  4. The method according to claim 3, further comprising: obtaining second residual gradient information based on the first target gradient information and the quantized first target gradient information, the second residual gradient information being the residual amount of the first target gradient information that was not sent to the central node.
  5. The method according to claim 3 or 4, further comprising: determining a first threshold value according to first quantization noise information and channel resource information, wherein the first quantization noise information is used to characterize a quantization encoding/decoding loss amount of the first target gradient information; wherein sending the quantized first target gradient information to the central node comprises: determining that a metric value of the first target gradient information is greater than the first threshold value; and sending the quantized first target gradient information to the central node.
  6. The method according to claim 5, further comprising: if the metric value of the first target gradient information is less than or equal to the first threshold value, determining not to send the quantized first target gradient information to the central node.
  7. The method according to claim 6, further comprising: if the metric value of the first target gradient information is less than the first threshold value, determining third residual gradient information, the third residual gradient information being the first target gradient information.
  8. The method according to any one of claims 5 to 7, further comprising: obtaining the first quantization noise information according to channel resource information, communication cost information, and the first target gradient information, the communication cost information being used to indicate a communication cost weight of communication resources, the communication resources comprising transmission power and/or transmission bandwidth.
  9. The method according to any one of claims 5 to 8, wherein determining the first threshold value according to the first quantization noise information and the channel resource information comprises: determining a transmission bandwidth and/or a transmission power according to the first quantization noise information, communication cost information, the channel resource information, and the first target gradient information, the communication cost information being used to indicate a communication cost weight of communication resources, the communication resources comprising transmission power and/or transmission bandwidth; and determining the first threshold value according to the first quantization noise information and the communication resources.
  10. The method according to claim 8 or 9, further comprising: receiving second information from the central node, the second information being used to indicate the communication cost information.
  11. The method according to any one of claims 1 to 10, further comprising: training the first feature model according to the inter-feature constraint variable and model training data to obtain second gradient information; and sending the second gradient information to the central node.
  12. The method according to claim 11, wherein sending the second gradient information to the central node comprises: sending quantized second target gradient information to the central node, wherein the second target gradient information comprises the second gradient information, or the second target gradient information comprises the second gradient information and fourth residual gradient information, the fourth residual gradient information being used to characterize a residual amount of gradient information that was not sent to the central node before the second gradient information was obtained.
  13. The method according to claim 12, further comprising: obtaining fifth residual gradient information based on the second target gradient information and the quantized second target gradient information, the fifth residual gradient information being used to characterize the residual amount of the second target gradient information that was not sent to the central node.
  14. The method according to claim 12 or 13, further comprising: determining a second threshold value according to second quantization noise information and channel resource information, wherein the second quantization noise information is used to characterize a quantization encoding/decoding loss amount of the second target gradient information; wherein sending the quantized second target gradient information to the central node comprises: determining that a metric value of the second target gradient information is greater than the second threshold value; and sending the quantized second target gradient information to the central node.
  15. The method according to claim 14, further comprising: if the metric value of the second target gradient information is less than or equal to the second threshold value, determining not to send the quantized second target gradient information to the central node.
  16. The method according to claim 15, further comprising: if the metric value of the second target gradient information is less than the second threshold value, determining sixth residual gradient information, the sixth residual gradient information being the second target gradient information.
  17. The method according to any one of claims 14 to 16, further comprising: obtaining the second quantization noise information according to channel resource information, communication cost information, and the second target gradient information, the communication cost information being used to indicate a communication cost weight of communication resources, the communication resources comprising transmission power and/or transmission bandwidth.
  18. The method according to any one of claims 14 to 17, wherein determining the second threshold value according to the second quantization noise information and the channel resource information comprises: determining a transmission bandwidth and/or a transmission power according to the second quantization noise information, communication cost information, the channel resource information, and the second target gradient information, the communication cost information being used to indicate a communication cost weight of communication resources, the communication resources comprising transmission power and/or transmission bandwidth; and determining the second threshold value according to the second quantization noise information and the communication resources.
  19. The method according to any one of claims 1 to 18, further comprising: receiving third information from the central node, the third information being used to indicate updated parameters of the first feature model; and updating the parameters of the first feature model according to the third information.
  20. A training method for an intelligent model, wherein a central node and multiple participating node groups jointly perform training of the intelligent model, the intelligent model consists of multiple feature models corresponding to multiple features of an inference target, and the participating nodes in one of the participating node groups train one of the feature models, the training method being performed by the central node and comprising:
    determining an inter-feature constraint variable, the inter-feature constraint variable being used to characterize a constraint relationship between the different features; and
    sending first information to participating nodes in the multiple participating node groups, the first information comprising the inter-feature constraint variable.
  21. The method according to claim 20, further comprising: receiving at least one piece of second target gradient information from participating nodes in a first participating node group, the multiple participating node groups comprising the first participating node group; determining updated model parameters of a first feature model according to the at least one piece of second target gradient information, the first feature model being the feature model trained by the participating nodes in the first participating node group; and sending the updated model parameters to the first participating node group.
  22. The method according to claim 20 or 21, further comprising: sending a first identifier set to participating nodes in the multiple participating node groups, the first identifier set comprising identifiers of sample data selected by the central node for the inter-feature constraint variable.
  23. The method according to any one of claims 20 to 22, further comprising: receiving multiple pieces of first target gradient information from participating nodes in multiple of the participating node groups, the first target gradient information being gradient information corresponding to the inter-feature constraint variable; and wherein determining the inter-feature constraint variable comprises: determining the inter-feature constraint variable according to the multiple pieces of first target gradient information.
  24. The method according to any one of claims 20 to 23, further comprising: sending second information to participating nodes in the multiple participating node groups, the second information being used to indicate communication cost information, the communication cost information being used to indicate a communication cost weight of communication resources, the communication resources comprising transmission power and/or transmission bandwidth.
  25. A training method for an intelligent model, comprising:
    determining a threshold value according to quantization noise information and channel resource information, wherein the quantization noise information is used to characterize a quantization encoding/decoding loss amount of target information;
    sending the quantized target information when a metric value of the target information is greater than the threshold value; and
    not sending the quantized target information when the metric value of the target information is less than or equal to the threshold value.
  26. The method according to claim 25, wherein the target information comprises gradient information obtained in an N-th model training and first target residual information, the first target residual information being a residual amount of gradient information that was not sent before the gradient information was obtained.
  27. The method according to claim 25 or 26, further comprising: when the metric value of the target information is greater than the threshold value, obtaining second target residual information based on the target information and the quantized target information, the second target residual information being the unsent residual amount of the target information.
  28. The method according to any one of claims 25 to 27, further comprising: if the metric value of the target information is less than or equal to the threshold value, determining third target residual information, the third target residual information being the target information.
  29. The method according to any one of claims 25 to 28, further comprising: obtaining the quantization noise information according to channel resource information, communication cost information, and the target information, the communication cost information being used to indicate a communication cost weight of communication resources, the communication resources comprising transmission power and/or transmission bandwidth.
  30. The method according to any one of claims 25 to 29, wherein determining the threshold value according to the quantization noise information and the channel resource information comprises: determining a transmission bandwidth and/or a transmission power according to the quantization noise information, communication cost information, the channel resource information, and the target information, the communication cost information being used to indicate a communication cost weight of communication resources, the communication resources comprising transmission power and/or transmission bandwidth; and determining the threshold value according to the quantization noise information and the communication resources.
  31. The method according to claim 29 or 30, further comprising: receiving second information, the second information being used to indicate the communication cost information.
  32. A training apparatus for an intelligent model, comprising:
    a transceiver unit configured to receive first information from a central node, the first information indicating an inter-feature constraint variable, the inter-feature constraint variable being used to characterize a constraint relationship between different features; and
    a processing unit configured to obtain first gradient information by using a gradient inference model according to the inter-feature constraint variable, model parameters of a first feature model, and first sample data, the first gradient information being gradient information corresponding to the inter-feature constraint variable;
    the transceiver unit being further configured to send the first gradient information to the central node.
  33. The apparatus according to claim 32, wherein the transceiver unit is further configured to receive a first identifier set from the central node, the first identifier set comprising identifiers of sample data selected by the central node for the inter-feature constraint variable; and the processing unit is specifically configured to: determine that the sample data set of a first participating node includes the first sample data corresponding to a first identifier, the first identifier belonging to the first identifier set, the training apparatus being configured in the first participating node; and obtain the first gradient information by using the gradient inference model according to the inter-feature constraint variable, the model parameters of the first feature model, and the first sample data, the first gradient information being gradient information corresponding to the inter-feature constraint variable.
  34. The apparatus according to claim 32 or 33, wherein the transceiver unit is specifically configured to send quantized first target gradient information to the central node, wherein the first target gradient information comprises the first gradient information, or the first target gradient information comprises the first gradient information and first residual gradient information, the first residual gradient information being used to characterize a residual amount of gradient information corresponding to the inter-feature constraint variable that was not sent to the central node before the first gradient information was obtained.
  35. The apparatus according to claim 34, wherein the processing unit is further configured to obtain second residual gradient information based on the first target gradient information and the quantized first target gradient information, the second residual gradient information being the residual amount of the first target gradient information that was not sent to the central node.
  36. The apparatus according to claim 34 or 35, wherein the processing unit is further configured to determine a first threshold value according to first quantization noise information and channel resource information, the first quantization noise information being used to characterize a quantization encoding/decoding loss amount of the first target gradient information; the processing unit is further configured to determine that a metric value of the first target gradient information is greater than the first threshold value; and the transceiver unit is specifically configured to send the quantized first target gradient information to the central node when the metric value of the first target gradient information is greater than the first threshold value.
  37. The apparatus according to claim 36, wherein the processing unit is further configured to determine, when the metric value of the first target gradient information is less than or equal to the first threshold value, not to send the quantized first target gradient information to the central node.
  38. The apparatus according to claim 37, wherein the processing unit is further configured to determine third residual gradient information when the metric value of the first target gradient information is less than the first threshold value, the third residual gradient information being the first target gradient information.
  39. The apparatus according to any one of claims 36 to 38, wherein the processing unit is further configured to obtain the first quantization noise information according to channel resource information, communication cost information, and the first target gradient information, the communication cost information being used to indicate a communication cost weight of communication resources, the communication resources comprising transmission power and/or transmission bandwidth.
  40. The apparatus according to any one of claims 36 to 39, wherein the processing unit is specifically configured to: determine a transmission bandwidth and/or a transmission power according to the first quantization noise information, communication cost information, the channel resource information, and the first target gradient information, the communication cost information being used to indicate a communication cost weight of communication resources, the communication resources comprising transmission power and/or transmission bandwidth; and determine the first threshold value according to the first quantization noise information and the communication resources.
  41. The apparatus according to claim 39 or 40, wherein the transceiver unit is further configured to receive second information from the central node, the second information being used to indicate the communication cost information.
  42. The apparatus according to any one of claims 32 to 41, wherein the processing unit is further configured to train the first feature model according to the inter-feature constraint variable and model training data to obtain second gradient information; and the transceiver unit is further configured to send the second gradient information to the central node.
  43. The apparatus according to claim 42, wherein the transceiver unit is specifically configured to send quantized second target gradient information to the central node, wherein the second target gradient information comprises the second gradient information, or the second target gradient information comprises the second gradient information and fourth residual gradient information, the fourth residual gradient information being used to characterize a residual amount of gradient information that was not sent to the central node before the second gradient information was obtained.
  44. The apparatus according to claim 43, wherein the processing unit is further configured to obtain fifth residual gradient information based on the second target gradient information and the quantized second target gradient information, the fifth residual gradient information being used to characterize the residual amount of the second target gradient information that was not sent to the central node.
  45. The apparatus according to claim 43 or 44, wherein the processing unit is further configured to determine a second threshold value according to second quantization noise information and channel resource information, the second quantization noise information being used to characterize a quantization encoding/decoding loss amount of the second target gradient information; the processing unit is further configured to determine that a metric value of the second target gradient information is greater than the second threshold value; and the transceiver unit is specifically configured to send the quantized second target gradient information to the central node when the metric value of the second target gradient information is greater than the second threshold value.
  46. The apparatus according to claim 45, wherein the processing unit is further configured to determine, when the metric value of the second target gradient information is less than or equal to the second threshold value, not to send the quantized second target gradient information to the central node.
  47. The apparatus according to claim 46, wherein the processing unit is further configured to determine sixth residual gradient information when the metric value of the second target gradient information is less than the second threshold value, the sixth residual gradient information being the second target gradient information.
  48. The apparatus according to any one of claims 45 to 47, wherein the processing unit is further configured to obtain the second quantization noise information according to channel resource information, communication cost information, and the second target gradient information, the communication cost information being used to indicate a communication cost weight of communication resources, the communication resources comprising transmission power and/or transmission bandwidth.
  49. The apparatus according to any one of claims 45 to 48, wherein the processing unit is specifically configured to: determine a transmission bandwidth and/or a transmission power according to the second quantization noise information, communication cost information, the channel resource information, and the second target gradient information, the communication cost information being used to indicate a communication cost weight of communication resources, the communication resources comprising transmission power and/or transmission bandwidth; and determine the second threshold value according to the second quantization noise information and the communication resources.
  50. The apparatus according to any one of claims 32 to 49, wherein the transceiver unit is further configured to receive third information from the central node, the third information being used to indicate updated parameters of the first feature model; and the processing unit is further configured to update the parameters of the first feature model according to the third information.
  51. A training apparatus for an intelligent model, the training apparatus being configured in a central node and comprising:
    a processing unit configured to determine an inter-feature constraint variable, the inter-feature constraint variable being used to characterize a constraint relationship between different features; and
    a transceiver unit configured to send first information to participating nodes in multiple participating node groups, the first information comprising the inter-feature constraint variable.
  52. The apparatus according to claim 51, wherein the transceiver unit is further configured to receive at least one piece of second target gradient information from participating nodes in a first participating node group, the multiple participating node groups comprising the first participating node group; the processing unit is further configured to determine updated model parameters of a first feature model according to the at least one piece of second target gradient information, the first feature model being the feature model trained by the participating nodes in the first participating node group; and the transceiver unit is further configured to send the updated model parameters to the first participating node group.
  53. The apparatus according to claim 51 or 52, wherein the transceiver unit is further configured to send a first identifier set to the participating nodes in the multiple participating node groups, the first identifier set comprising identifiers of sample data selected by the central node for the inter-feature constraint variable.
  54. The apparatus according to any one of claims 51 to 53, wherein the transceiver unit is further configured to receive multiple pieces of first target gradient information from participating nodes in multiple of the participating node groups, the first target gradient information being gradient information corresponding to the inter-feature constraint variable; and the processing unit is specifically configured to determine the inter-feature constraint variable according to the multiple pieces of first target gradient information.
  55. The apparatus according to any one of claims 51 to 54, wherein the transceiver unit is further configured to send second information to the participating nodes in the multiple participating node groups, the second information being used to indicate communication cost information, the communication cost information being used to indicate a communication cost weight of communication resources, the communication resources comprising transmission power and/or transmission bandwidth.
  56. A training apparatus for an intelligent model, comprising:
    a processing unit configured to determine a threshold value according to quantization noise information and channel resource information, the quantization noise information being used to characterize a quantization encoding/decoding loss amount of target information; and
    a transceiver unit configured to send the quantized target information when a metric value of the target information is greater than the threshold value;
    the transceiver unit being further configured not to send the quantized target information when the metric value of the target information is less than or equal to the threshold value.
  57. The apparatus according to claim 56, wherein the target information comprises gradient information obtained in an N-th model training and first target residual information, the first target residual information being a residual amount of gradient information that was not sent before the gradient information was obtained.
  58. The apparatus according to claim 56 or 57, wherein the processing unit is further configured to obtain, when the metric value of the target information is greater than the threshold value, second target residual information based on the target information and the quantized target information, the second target residual information being the unsent residual amount of the target information.
  59. The apparatus according to any one of claims 56 to 58, wherein the processing unit is further configured to determine third target residual information when the metric value of the target information is less than or equal to the threshold value, the third target residual information being the target information.
  60. The apparatus according to any one of claims 56 to 59, wherein the processing unit is further configured to obtain the quantization noise information according to channel resource information, communication cost information, and the target information, the communication cost information being used to indicate a communication cost weight of communication resources, the communication resources comprising transmission power and/or transmission bandwidth.
  61. The apparatus according to any one of claims 56 to 60, wherein the processing unit is specifically configured to: determine a transmission bandwidth and/or a transmission power according to the quantization noise information, communication cost information, the channel resource information, and the target information, the communication cost information being used to indicate a communication cost weight of communication resources, the communication resources comprising transmission power and/or transmission bandwidth; and determine the threshold value according to the quantization noise information and the communication resources.
  62. The apparatus according to claim 60 or 61, wherein the transceiver unit is further configured to receive second information, the second information being used to indicate the communication cost information.
  63. A communication apparatus, comprising at least one processor coupled to a memory;
    the memory being configured to store a program or instructions; and
    the at least one processor being configured to execute the program or instructions to cause the apparatus to implement the method according to any one of claims 1 to 19, or the method according to any one of claims 20 to 24, or the method according to any one of claims 25 to 31.
  64. A chip, comprising at least one logic circuit and an input/output interface;
    the logic circuit being configured to control the input/output interface and perform the method according to any one of claims 1 to 19, or implement the method according to any one of claims 20 to 24, or implement the method according to any one of claims 25 to 31.
  65. A computer-readable storage medium storing instructions that, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 19, or implement the method according to any one of claims 20 to 24, or implement the method according to any one of claims 25 to 31.
  66. A computer program product comprising instructions that, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 19, or implement the method according to any one of claims 20 to 24, or implement the method according to any one of claims 25 to 31.
  67. A communication system, comprising the training apparatus according to any one of claims 32 to 50 and/or the training apparatus according to any one of claims 56 to 62; and further comprising the training apparatus according to any one of claims 51 to 55.
PCT/CN2022/140797 2021-12-22 2022-12-21 Intelligent model training method and apparatus WO2023116787A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP22910099.5A EP4435675A1 (en) 2021-12-22 2022-12-21 Intelligent model training method and apparatus
US18/750,688 US20240346329A1 (en) 2021-12-22 2024-06-21 Method and apparatus for training intelligent model

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111582987.9A 2021-12-22 2021-12-22 Intelligent model training method and apparatus
CN202111582987.9 2021-12-22

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/750,688 Continuation US20240346329A1 (en) 2021-12-22 2024-06-21 Method and apparatus for training intelligent model

Publications (1)

Publication Number Publication Date
WO2023116787A1 true WO2023116787A1 (zh) 2023-06-29

Family

ID=86901289

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/140797 WO2023116787A1 (zh) 2022-12-21 Intelligent model training method and apparatus

Country Status (4)

Country Link
US (1) US20240346329A1 (zh)
EP (1) EP4435675A1 (zh)
CN (1) CN116362334A (zh)
WO (1) WO2023116787A1 (zh)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020199914A1 (zh) * 2019-04-03 2020-10-08 华为技术有限公司 Method and apparatus for training a neural network
CN110138475A (zh) * 2019-05-08 2019-08-16 南京邮电大学 Adaptive-threshold channel occupancy state prediction method based on an LSTM neural network
CN111860829A (zh) * 2020-06-19 2020-10-30 光之树(北京)科技有限公司 Method and apparatus for training a federated learning model
CN112149266A (zh) * 2020-10-23 2020-12-29 北京百度网讯科技有限公司 Method, apparatus, device, and storage medium for determining a network model quantization strategy
CN113591145A (zh) * 2021-07-28 2021-11-02 西安电子科技大学 Federated learning global model training method based on differential privacy and quantization
CN113642740A (zh) * 2021-08-12 2021-11-12 百度在线网络技术(北京)有限公司 Model training method and apparatus, electronic device, and medium

Also Published As

Publication number Publication date
EP4435675A1 (en) 2024-09-25
CN116362334A (zh) 2023-06-30
US20240346329A1 (en) 2024-10-17

Similar Documents

Publication Publication Date Title
Wang et al. Transfer learning promotes 6G wireless communications: Recent advances and future challenges
WO2021244334A1 (zh) Information processing method and related device
Shi et al. Machine learning for large-scale optimization in 6G wireless networks
US20230262728A1 (en) Communication Method and Communication Apparatus
Alnawayseh et al. Smart congestion control in 5g/6g networks using hybrid deep learning techniques
CN116582871B (zh) UAV swarm federated learning model optimization method based on topology optimization
Chua et al. Resource allocation for mobile metaverse with the Internet of Vehicles over 6G wireless communications: A deep reinforcement learning approach
Elbir et al. Hybrid federated and centralized learning
CN117389290A (zh) Integrated sensing-and-communication multi-UAV path planning method based on a graph neural network
US20220172054A1 (en) Intermediate network node and method performed therein for handling data of communication networks
WO2024017001A1 (zh) Model training method and communication apparatus
WO2023116787A1 (zh) Intelligent model training method and apparatus
CN109194504A (zh) Temporal link prediction method for dynamic networks and computer-readable storage medium
WO2023098860A1 (zh) Communication method and communication apparatus
Lin et al. Heuristic-learning-based network architecture for device-to-device user access control
CN115965078A (zh) Classification prediction model training method, classification prediction method, device, and storage medium
CN117693021A (zh) Beam management method
WO2023279967A1 (zh) Intelligent model training method and apparatus
WO2023179675A1 (zh) Information processing method and communication apparatus
WO2022028793A1 (en) Instantiation, training, and/or evaluation of machine learning models
Elbir et al. A Family of Hybrid Federated and Centralized Learning Architectures in Machine Learning
US20240121622A1 (en) System and method for aerial-assisted federated learning
CN115834580B (zh) Distributed data processing method, apparatus, and device for marine big data
US20240144009A1 (en) Machine Learning
WO2024140353A1 (zh) Model usage method and related device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22910099

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2022910099

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2022910099

Country of ref document: EP

Effective date: 20240620

NENP Non-entry into the national phase

Ref country code: DE