WO2023179675A1 - Information processing method and communication device - Google Patents

Information processing method and communication device

Info

Publication number
WO2023179675A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
gradient
global
compressed
kernel
Prior art date
Application number
PCT/CN2023/083106
Other languages
English (en)
French (fr)
Inventor
马梦瑶
薛烨
苏立群
刘坚能
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2023179675A1


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245Protecting personal data, e.g. for financial or medical purposes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/098Distributed learning, e.g. federated learning
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04WWIRELESS COMMUNICATION NETWORKS
    • H04W4/00Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/30Services specially adapted for particular environments, situations or purposes
    • H04W4/38Services specially adapted for particular environments, situations or purposes for collecting sensor information

Definitions

  • the present application relates to the field of communications, and more specifically, to an information processing method and a communications device.
  • In federated learning (FL), the central node provides the model parameters of an intelligent model to multiple participating nodes. The participating nodes train the intelligent model on their own data sets and feed back the gradient information of the loss function to the central node, and the central node updates the model parameters based on the gradient information fed back by the multiple participating nodes. This avoids the time consumption and communication overhead caused by collecting data for centralized model training and, since the training data does not need to be transmitted, also reduces privacy and security concerns.
  • Gradient sparsification is a compression method that sends only the gradient vector elements whose magnitude exceeds a certain threshold. However, because gradient sparsification selects only some elements of the gradient vector, the gradient vector obtained in each training round cannot be transmitted completely, which results in poor inference performance of the finally trained intelligent model.
  • This application provides an information processing method and a communication device to improve the compression rate of gradient information and improve the transmission efficiency of gradient information.
  • In a first aspect, an information processing method is provided, including: performing the t-th model training to obtain first gradient information, where t is an integer greater than 1; and determining first compression kernel information based on the gradient information obtained from previous model trainings, where the previous model trainings include the t-th model training and at least one model training before the t-th model training, and the gradient information obtained from the previous model trainings includes the first gradient information.
  • The first compression kernel information is sent to the central node, and first information is received from the central node, where the first information is used to indicate first global compression kernel information, and the first global compression kernel information is used to compress the gradient information obtained after model training.
  • In the above technical solution, the correlation between the gradient information obtained in successive model trainings is exploited: the participating nodes dynamically track and learn the compression kernel information as the model is trained and feed it back to the central node, so that the central node can obtain global compression kernel information based on the compression kernel information provided by multiple participating nodes and notify the participating nodes of it. The participating nodes then compress the gradient information based on the global compression kernel information. This can improve the compression rate and transmission efficiency of the gradient information while reducing the adverse impact of information compression on the training results.
  • In a possible implementation, determining the first compression kernel information based on the gradient information obtained from previous model trainings includes: determining first center vector information based on the first gradient information and second center vector information, where the first center vector information is used to represent the mean vector of the gradient information obtained from the previous model trainings, the first compression kernel information includes the first center vector information, and the second center vector information is the center vector information determined after the (t-1)-th model training.
  • In this way, the participating node determines the center vector information after the t-th model training (i.e. the first center vector information) from the gradient information obtained after the t-th model training and the second center vector information determined after the (t-1)-th model training. This avoids iterating over all the gradient information obtained from previous model trainings after each model training, which reduces storage overhead and improves the efficiency of information processing.
  • In a possible implementation, determining the first compression kernel information based on the gradient information obtained from previous model trainings further includes: obtaining covariance matrix information of the deviation between the first gradient information and the first center vector information; determining first covariance mean matrix information based on the covariance matrix information and second covariance mean matrix information, where the first covariance mean matrix information is used to represent the mean matrix of the covariance matrix information of the deviations obtained from the previous model trainings, and the second covariance mean matrix information is the covariance mean matrix information determined after the (t-1)-th model training; and obtaining first main feature matrix information of the first covariance mean matrix information, where the first main feature matrix information is used to represent the eigenvectors corresponding to M eigenvalues of the first covariance mean matrix information, the first compression kernel information includes the first main feature matrix information, and M is an integer greater than 1.
  • In this way, the participating node determines the covariance mean matrix information after this model training based on the gradient information obtained after the t-th model training and the covariance mean matrix information obtained after the previous model training. This avoids iterating over all the gradient information obtained from historical model trainings after each model training, which reduces storage overhead and improves the efficiency of information processing.
  • In a possible implementation, the method further includes: compressing second gradient information based on the first global compression kernel information to obtain second compressed gradient information, where the second gradient information is the gradient information obtained after the (t+i)-th model training and i is a positive integer; and sending the second compressed gradient information.
  • the first global compression kernel information includes global center vector information and global main feature matrix information
  • compressing the second gradient information based on the first global compression kernel information to obtain the second compressed gradient information includes: obtaining the second compressed gradient information based on the deviation between the second gradient information and the global center vector information, and based on the global main feature matrix information.
  • In this design, PCA is used to compress the gradient information obtained after model training based on the first global compression kernel information, which can improve the compression rate of the gradient information and improve the transmission efficiency of the gradient information while reducing the adverse impact of information compression on the training results.
  • In a possible implementation, before the first information is received from the central node, the method further includes: compressing the first gradient information based on second global compression kernel information to obtain first compressed gradient information, where the second global compression kernel information is obtained from the central node.
  • sending the first compressed kernel information to the central node includes: sending the first compressed kernel information to the central node on a first resource.
  • In a possible implementation, sending the first compression kernel information to the central node includes: sending the first compression kernel information to the central node when t/T_1 is an integer, where T_1 is an integer greater than 1.
  • In this way, the participating node updates the first compression kernel information after each model training but sends the most recently updated compression kernel information to the central node only after every T_1 updates, which reduces the transmission overhead that would be caused by sending the compression kernel information to the central node after every update.
  • In a second aspect, an information processing method is provided, including: first, receiving first compression kernel information from multiple participating nodes to obtain first global compression kernel information; and second, sending first information, where the first information is used to indicate the first global compression kernel information, and the first global compression kernel information is used to compress the gradient information obtained after model training.
  • In a possible implementation, receiving the first compression kernel information from the multiple participating nodes to obtain the first global compression kernel information includes: receiving the first compression kernel information from the multiple participating nodes on a first resource to obtain aggregated compression kernel information, where the aggregated compression kernel information is the compression kernel information obtained by superimposing the first compression kernel information of the multiple participating nodes on the first resource; and obtaining the first global compression kernel information according to the aggregated compression kernel information.
  • In a possible implementation, the method further includes: receiving compressed gradient information from the multiple participating nodes; obtaining compressed gradient mean information according to the compressed gradient information of the multiple participating nodes, where the compressed gradient mean information is used to represent the mean vector of the compressed gradient information of the multiple participating nodes; decompressing the compressed gradient mean information based on the first global compression kernel information to obtain decompressed gradient mean information; and finally updating the model parameters of the intelligent model based on the decompressed gradient mean information.
  • the first compression kernel information includes first center vector information
  • the first global compression kernel information includes global center vector information
  • the global center vector information is used to characterize the mean vector of the first center vector information of the multiple participating nodes.
  • the first compression kernel information includes first main feature matrix information.
  • the first global compression kernel information includes global main feature matrix information.
  • the global main feature matrix information is used to represent M global main feature vectors, where M is an integer greater than 1.
  • In a possible implementation, decompressing the compressed gradient mean information based on the first global compression kernel information to obtain the decompressed gradient mean information includes: decompressing the compressed gradient mean information based on the global center vector information and the global main feature matrix information to obtain the decompressed gradient mean information.
  • In another aspect, a communication device is provided, including: a processing unit configured to perform the t-th model training to obtain first gradient information, where t is an integer greater than 1; the processing unit is further configured to determine first compression kernel information based on the gradient information obtained from previous model trainings.
  • the previous model training includes the t-th model training and at least one model training before the t-th model training.
  • the gradient information obtained by the previous model training includes the first gradient information;
  • the transceiver unit is configured to send the first compression kernel information to the central node; the transceiver unit is further configured to receive first information from the central node, where the first information is used to indicate first global compression kernel information, and the first global compression kernel information is used to compress the gradient information obtained after model training.
  • In a possible implementation, the processing unit is specifically configured to determine first center vector information based on the first gradient information and second center vector information, where the first center vector information is used to represent the mean vector of the gradient information obtained from previous model trainings, the first compression kernel information includes the first center vector information, and the second center vector information is the center vector information determined after the (t-1)-th model training.
  • In a possible implementation, the processing unit is specifically configured to: obtain covariance matrix information of the deviation between the first gradient information and the first center vector information; determine first covariance mean matrix information based on the covariance matrix information and second covariance mean matrix information, where the first covariance mean matrix information is used to represent the mean matrix of the covariance matrices of the deviations obtained after the previous model trainings, and the second covariance mean matrix information is the covariance mean matrix information determined after the (t-1)-th model training; and obtain first main feature matrix information of the first covariance mean matrix information, where the first main feature matrix information is used to represent the eigenvectors corresponding to M eigenvalues of the first covariance mean matrix information, the first compression kernel information includes the first main feature matrix information, and M is an integer greater than 1.
  • the processing unit is further configured to compress the second gradient information based on the first global compression kernel information to obtain the second compressed gradient information.
  • the second gradient information is the gradient information obtained after the t+i-th model training, and i is a positive integer.
  • the transceiver unit is also used to send the second compression gradient information.
  • the first global compression kernel information includes global center vector information and global main feature matrix information
  • the processing unit is further configured to obtain the second compressed gradient information based on the deviation between the second gradient information and the global center vector information, and based on the global main feature matrix information.
  • In a possible implementation, the processing unit is further configured to compress the first gradient information based on second global compression kernel information before the first information is received from the central node, to obtain first compressed gradient information, where the second global compression kernel information is obtained from the central node.
  • the transceiver unit is specifically configured to send the first compressed kernel information to the central node on the first resource.
  • In a possible implementation, the transceiver unit is specifically configured to send the first compression kernel information to the central node when t/T_1 is an integer, where T_1 is an integer greater than 1.
  • In another aspect, a communication device is provided, including a processing unit and a transceiver unit, where the processing unit is configured to obtain first global compression kernel information based on first compression kernel information received from multiple participating nodes, and the transceiver unit is configured to send first information.
  • the first information is used to indicate first global compression kernel information.
  • the first global compression kernel information is used to compress gradient information obtained after model training.
  • In a possible implementation, the transceiver unit is further configured to receive the first compression kernel information from the multiple participating nodes on a first resource to obtain aggregated compression kernel information, where the aggregated compression kernel information is the compression kernel information obtained by superimposing the first compression kernel information of the multiple participating nodes on the first resource.
  • the processing unit is specifically configured to obtain the first global compression kernel information based on the aggregated compression kernel information.
  • In a possible implementation, the transceiver unit is further configured to receive compressed gradient information from the multiple participating nodes; the processing unit is further configured to obtain compressed gradient mean information according to the compressed gradient information of the multiple participating nodes, where the compressed gradient mean information is used to represent the mean vector of the compressed gradient information of the multiple participating nodes; and the processing unit is further configured to decompress the compressed gradient mean information based on the first global compression kernel information to obtain decompressed gradient mean information, and then update the model parameters of the intelligent model based on the decompressed gradient mean information.
  • the first compression kernel information includes first center vector information
  • the first global compression kernel information includes global center vector information
  • the global center vector information is used to characterize the mean vector of the first center vector information of the multiple participating nodes.
  • the first compression kernel information includes first main feature matrix information.
  • the first global compression kernel information includes global main feature matrix information.
  • the global main feature matrix information is used to represent M global main feature vectors, where M is an integer greater than 1.
  • In a possible implementation, the processing unit is specifically configured to decompress the compressed gradient mean information based on the global center vector information and the global main feature matrix information to obtain the decompressed gradient mean information.
  • a communication device including a processor.
  • the processor can implement the method in the above-mentioned first aspect and any possible implementation manner of the first aspect, or implement the method in the above-mentioned second aspect and any possible implementation manner of the second aspect.
  • the communication device further includes a memory
  • the processor is coupled to the memory and can be used to execute instructions in the memory to implement the method in the above first aspect and any possible implementation of the first aspect, or to implement the method in the above second aspect and any possible implementation of the second aspect.
  • the communication device further includes a communication interface, and the processor is coupled to the communication interface.
  • the communication interface may be a transceiver, a pin, a circuit, a bus, a module or other types of communication interfaces, which is not limited in this application.
  • In one implementation, the communication device is a complete communication device; in this case, the communication interface may be a transceiver or an input/output interface.
  • In another implementation, the communication device is a chip configured in a communication device; in this case, the communication interface may be an input/output interface.
  • the processor may be a logic circuit.
  • the transceiver may be a transceiver circuit.
  • the input/output interface may be an input/output circuit.
  • a processor including: an input circuit, an output circuit and a processing circuit.
  • the processing circuit is configured to receive a signal through the input circuit and transmit a signal through the output circuit, so that the processor executes the first aspect and the method in any possible implementation manner of the first aspect.
  • the above-mentioned processor can be one or more chips
  • the input circuit can be an input pin
  • the output circuit can be an output pin
  • the processing circuit can be a transistor, a gate circuit, a flip-flop and various logic circuits, etc.
  • the input signal received by the input circuit may be received and input by, for example but not limited to, a receiver, the signal output by the output circuit may be, for example but not limited to, output to and transmitted by a transmitter, and the input circuit and the output circuit may be the same circuit, which functions as the input circuit and the output circuit at different times.
  • the embodiments of this application do not limit the specific implementation methods of the processor and various circuits.
  • In another aspect, a computer program product is provided, including a computer program (which may also be called code or instructions). When the computer program is run, it causes a computer to execute the method in the above first aspect and any possible implementation of the first aspect, or the method in the above second aspect and any possible implementation of the second aspect.
  • In another aspect, a computer-readable storage medium is provided, storing a computer program (which may also be called code or instructions) that, when run on a computer, causes the computer to execute the method in the above first aspect and any possible implementation of the first aspect, or the method in the above second aspect and any possible implementation of the second aspect, or the method in the above third aspect and any possible implementation of the third aspect.
  • a communication system including the aforementioned multiple participating nodes and at least one central node.
  • Figure 1 is a schematic diagram of a communication system suitable for embodiments of the present application.
  • Figure 2 is a schematic flow chart of the information processing method provided by the embodiment of the present application.
  • FIG. 3 is another schematic flow chart of the information processing method provided by the embodiment of the present application.
  • Figure 4 is a schematic block diagram of an example of the communication device of the present application.
  • FIG. 5 is a schematic structural diagram of another example of the communication device of the present application.
  • In this application, "/" may indicate that the associated objects are in an "or" relationship; for example, A/B may indicate A or B. "And/or" may be used to describe three possible relationships between associated objects; for example, A and/or B may mean: A exists alone, both A and B exist, or B exists alone, where A and B may be singular or plural.
  • In this application, words such as "first" and "second" may be used to distinguish technical features with the same or similar functions; they do not limit the quantity or the execution order.
  • In this application, words such as "exemplary" or "for example" are used to express examples or illustrations, and any embodiment or design solution described as "exemplary" or "for example" should not be interpreted as being more preferred or advantageous than other embodiments or design solutions. Rather, the use of such words is intended to present related concepts in a concrete manner that is easier to understand.
  • In this application, "at least one (item)" may also be described as one (item) or multiple (items), and "multiple (items)" may be two, three, four, or more (items); this application does not limit this.
  • The technical solutions of the embodiments of this application can be applied to various communication systems, such as a long term evolution (LTE) system, an LTE frequency division duplex (FDD) system, an LTE time division duplex (TDD) system, a fifth generation (5G) communication system, a future communication system (such as a sixth generation (6G) communication system), a wireless fidelity (Wi-Fi) system, an ultra-wideband (UWB) system, or a system integrating multiple communication systems; this is not limited by the embodiments of this application.
  • 5G can also be called new radio (NR).
  • Figure 1 is a schematic diagram of a communication system suitable for embodiments of the present application.
  • a communication system suitable for embodiments of the present application may include at least one central node and at least one participating node, such as participating nodes 1, 2, and N as shown in Figure 1.
  • The central node may provide model parameters to each participating node, and each participating node uses its local data set to train the updated model.
  • participating node 1 uses local data set 1 to train the model
  • participating node 2 uses local data set 2 to train the model
  • participating node N uses local data set N to train the model.
  • After the multiple participating nodes perform model training, they send the gradient information of the loss function obtained in this training to the central node.
  • the central node determines the aggregated gradient information of the gradient information from multiple participating nodes, determines the updated model parameters based on the aggregated gradient information, and notifies each participating node, and each participating node performs the next model training.
  • the participating nodes can use the information processing method provided by this application to compress the gradient information and send it to the central node.
  • the central node can use the information processing method provided by this application to decompress the received compressed gradient information.
  • the central node provided by the embodiment of this application may be a network device, such as a server, a base station, an access point (AP) in a Wi-Fi system, etc.
  • the central node may be a device deployed in the wireless access network that can communicate directly or indirectly with participating nodes.
  • the participating node may be a device with a transceiver function, such as a terminal or a terminal device.
  • the participating node may be a sensor or a device with a data collection function.
  • Participating nodes can be deployed on land, including indoors, outdoors, handheld, and/or vehicle-mounted; they can also be deployed on water (such as on ships); participating nodes can also be deployed in the air (such as on aircraft, balloons, and satellites).
  • the participating node may be a user equipment (UE), and the UE includes a handheld device, a vehicle-mounted device, a wearable device or a computing device with wireless communication capabilities.
  • the UE may be a mobile phone, a tablet computer, or a computer with wireless transceiver functions.
  • the terminal device can also be a virtual reality (VR) terminal device, an augmented reality (AR) terminal device, a wireless terminal in industrial control, a wireless terminal in self-driving, a wireless terminal in telemedicine, or another smart terminal.
  • the technical solution provided by this application can be applied to smart homes to provide customers with personalized services based on their needs.
  • the central node can be a base station or a server, and the participating nodes can be client devices installed in each home.
  • The client device only provides the server, through a router, with the gradient information obtained by training the model on local data, which protects the privacy of customer data while still sharing training result information with the server.
  • The server obtains aggregated gradient information from the gradient information provided by the multiple client devices, determines the updated model parameters, and notifies each client device to continue the training of the intelligent model; after the model training is completed, the client device applies the trained model to provide personalized services to the customer.
  • the technical solution provided by this application can be applied to industrial wireless sensor networks to achieve industrial intelligence.
  • the central node can be a server, and the participating nodes can be multiple sensors in the factory (for example, mobile intelligent robots, etc.).
  • The sensors conduct model training based on local data and then send the resulting gradient information to the server; the server obtains aggregated gradient information from the gradient information sent by the sensors, determines the updated model parameters, and notifies each sensor to continue the training of the intelligent model.
  • the sensor applies the trained model to perform factory tasks.
  • For example, if the sensor is a mobile intelligent robot, it can obtain movement routes based on the trained model and complete factory handling tasks, parcel sorting tasks, and so on.
  • Artificial intelligence (AI) enables machines to learn and accumulate experience, so that they can solve problems that humans solve through experience, such as natural language understanding, image recognition, and/or playing chess.
  • Training refers to the process of learning a model (or training a model); during this process, the model learns to perform a specific task by optimizing its parameters, such as weight values.
  • the embodiments of this application are applicable to, but are not limited to, one or more of the following training methods: supervised learning, unsupervised learning, reinforcement learning, and transfer learning, etc.
  • Supervised learning uses a set of training samples that have been correctly labeled for training. Among them, having the correct label means that each sample has an expected output value.
  • unsupervised learning refers to a method that automatically classifies or groups input data without being given pre-labeled training samples.
  • Inference refers to using a trained model (the trained model can be called an inference model) to perform data processing. Input the actual data into the inference model for processing and obtain the corresponding inference results.
  • Reasoning can also be called prediction or decision-making, and the reasoning results can also be called prediction results, decision-making results, etc.
  • Federated learning is a distributed AI training method that places the AI algorithm training process on multiple devices instead of aggregating it on one server, which can solve the problems of time consumption and large communication overhead caused by collecting data during centralized AI training.
  • the process of federated learning may include but is not limited to: the central node sends the model parameters of the AI model to multiple participating nodes.
  • The participating nodes train the AI model based on their local data, obtain the gradient information of the loss function after the model training, and send it to the central node.
  • The central node updates the parameters of the AI model based on the gradient information fed back by the multiple participating nodes.
  • the central node can send the updated parameters of the AI model to multiple participating nodes, and the participating nodes execute the training of the AI model again.
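  • As a concrete illustration of the round structure described above, the following is a minimal sketch of one federated learning round in Python; the function names (set_model_parameters, train_local), the plain averaging of gradients, and the gradient-descent step size are illustrative assumptions rather than details taken from this application.

        import numpy as np

        def federated_round(theta, participating_nodes, learning_rate=0.01):
            """One round: the central node distributes model parameters, each participating
            node trains on its own local data set and feeds back a loss-function gradient,
            and the central node aggregates the gradients and updates the parameters."""
            gradients = []
            for node in participating_nodes:
                node.set_model_parameters(theta)        # central node provides model parameters
                gradients.append(node.train_local())    # node returns its loss-function gradient
            aggregated_gradient = np.mean(gradients, axis=0)      # central node aggregates
            return theta - learning_rate * aggregated_gradient    # updated model parameters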
  • the participating nodes selected by the central node may be the same or different, and this application does not limit this.
  • Principal component analysis (PCA) is a data compression method that projects high-dimensional data onto a small number of principal directions (eigenvectors of the data covariance matrix) that capture most of the variance of the data.
  • In the embodiments of this application, participating nodes dynamically track and learn the compression kernel information from the gradient information obtained in model training and feed it back to the central node; the central node obtains global compression kernel information based on the compression kernel information provided by multiple participating nodes and notifies the participating nodes of it. The participating nodes then use PCA to compress the gradient information based on the global compression kernel information. This improves the compression rate of the gradient information, allows the complete gradient information to be compressed, reduces the adverse impact of information compression on the training results, and improves the transmission efficiency of the gradient information.
  • FIG. 2 is a schematic flow chart of the information processing method provided by the embodiment of the present application.
  • the central node jointly performs the training of the intelligent model with multiple participating nodes.
  • the information processing method shown in Figure 2 is executed by one participating node k among the multiple participating nodes.
  • the participating nodes perform the t-th model training and obtain the first gradient information.
  • t is an integer greater than 1.
  • the participating node k updates the intelligent model based on the model weight of the most recently updated intelligent model of the central node, and performs the t-th model training on the updated intelligent model to obtain the first gradient information.
  • the first gradient information is the gradient information of the loss function obtained after the t-th model training.
  • Here the loss function is defined over the target model parameter of federated learning, and the gradient of the loss function obtained by participating node k after the t-th model training is denoted g_k^t, a real vector of the same dimension D as the model parameter.
  • Optionally, the first gradient information includes both the gradient information of the loss function obtained after the t-th model training and the residual gradient information that participating node k did not send to the central node before the t-th model training; in that case the first gradient information can be expressed, for example, as the sum g_k^t + Δ_k^{t-1}, where Δ_k^{t-1} denotes the residual gradient information that was not fed back to the central node before the t-th model training. Including this residual gradient information in the feedback can improve the convergence speed of federated learning training.
  • In the following, the case in which the first gradient information is g_k^t is used as an example to illustrate the information processing method provided by this application; it should be understood that this application is not limited thereto. In general, the first gradient information may be whatever gradient information is to be compressed after participating node k performs the t-th model training; for example, it may also be the sum g_k^t + Δ_k^{t-1} described above, in which case g_k^t in the following expressions is simply replaced by that sum. For brevity, details are not repeated here.
  • After participating node k obtains the first gradient information, it compresses the first gradient information based on the most recent global compression kernel information updated by the central node (for example the l-th update, denoted the second global compression kernel information) to obtain first compressed gradient information.
  • the second global compression kernel information is sent by the central node to the participating nodes.
  • the participating nodes provide compressed kernel information to the central node through the information processing method provided by the embodiment shown in Figure 2.
  • The central node can update the global compression kernel information based on the compression kernel information obtained from multiple participating nodes and send it to the participating nodes, where it is used to compress the gradient information of the participating nodes.
  • For the manner in which the central node obtains the global compression kernel information, please refer to the description of the embodiment shown in Figure 3 below.
  • The second global compression kernel information includes second global center vector information μ^l and second global main feature matrix information U^l, where the second global main feature matrix information is used to characterize M global main feature vectors, i.e. U^l ∈ R^{D×M}.
  • Participating node k can obtain the first compressed gradient information based on the deviation between the first gradient information and the second global center vector information μ^l, and based on the second global main feature matrix information U^l. Specifically, participating node k can project the deviation of the first gradient information from the second global center vector information onto (U^l)^T to obtain the first compressed gradient information c_k^t = (U^l)^T (g_k^t − μ^l), where c_k^t ∈ R^M.
  • (x) T represents the transpose of x.
  • R x ⁇ y represents a set of real matrices with dimensions x ⁇ y.
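  • As a minimal numerical sketch of this projection (dimensions, random data, and variable names are illustrative, not taken from the application), the first compressed gradient information can be computed as follows:

        import numpy as np

        D, M = 1000, 16                    # gradient dimension and compressed dimension (illustrative)
        rng = np.random.default_rng(0)
        g_k_t = rng.standard_normal(D)     # first gradient information of participating node k
        mu_l = rng.standard_normal(D)      # second global center vector information
        # Second global main feature matrix (D x M) with orthonormal columns, a stand-in for U^l.
        U_l, _ = np.linalg.qr(rng.standard_normal((D, M)))

        # Project the deviation from the global center vector onto the M global main feature vectors:
        # c_k^t = (U^l)^T (g_k^t - mu^l), reducing the feedback from D values to M values.
        c_k_t = U_l.T @ (g_k_t - mu_l)
        assert c_k_t.shape == (M,)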
  • After participating node k obtains the first compressed gradient information, it can send the first compressed gradient information to the central node. Specifically, participating node k can obtain a transmission coefficient p_k^t and send the first compressed gradient information weighted by the transmission coefficient, i.e. p_k^t c_k^t, to the central node.
  • Participating node k can determine the transmission coefficient p_k^t based on a transmission power gain η, η ∈ R, and the channel fading h_k^t between participating node k and the central node corresponding to the t-th model training.
  • Participating node k can further use the relationship between a truncation threshold γ, γ ∈ R, and the channel fading h_k^t to determine the value of the transmission coefficient, and thereby determine whether to send the weighted first compressed gradient information to the central node. For example, when |h_k^t| ≥ γ, the transmission coefficient is p_k^t = sqrt(η)/h_k^t and participating node k sends the weighted first compressed gradient information p_k^t c_k^t to the central node; otherwise, when |h_k^t| < γ, the transmission coefficient is 0 and participating node k does not send the first compressed gradient information to the central node.
  • In this way, participating node k can determine, based on the channel condition, whether to send the first compressed gradient information to the central node: when the channel condition does not meet the truncation threshold, participating node k does not send the first compressed gradient information. This reduces the probability of transmission errors or transmission failures of the gradient information caused by poor channel conditions and reduces the power consumption of the participating node. Participating node k then updates the residual gradient information, i.e. the gradient information that has not been sent to the central node before the (t+1)-th model training.
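  • The truncation rule above can be sketched as follows; the exact form of the transmission coefficient (channel inversion scaled by the power gain) is an assumption consistent with the description, and all names are illustrative.

        import numpy as np

        def transmission_coefficient(h, eta, gamma):
            """Truncated channel inversion: when the channel fading h satisfies |h| >= gamma,
            invert the channel with power gain eta; otherwise return 0 and skip the transmission.
            (The exact formula is an assumption; the text only describes the threshold rule.)"""
            if abs(h) >= gamma:
                return np.sqrt(eta) / h
            return 0.0

        c_k_t = np.ones(16)                          # first compressed gradient information (placeholder)
        h_k_t = 0.8 + 0.3j                           # channel fading for the t-th training (illustrative)
        p_k_t = transmission_coefficient(h_k_t, eta=1.0, gamma=0.5)
        signal = p_k_t * c_k_t if p_k_t != 0 else None   # weighted gradient, or nothing is sent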
  • The central node can obtain the first compressed gradient information sent by the multiple participating nodes after the t-th model training and decompress it based on the second global compression kernel information, i.e. the second global center vector information μ^l and the second global main feature matrix information U^l, to obtain gradient mean information.
  • The central node determines the model parameters θ^t of the intelligent model based on the gradient mean information.
  • The central node sends the updated model parameters θ^t to the K participating nodes; after updating the model parameters of the intelligent model based on θ^t, the K participating nodes perform the (t+1)-th model training on the intelligent model with the updated parameters.
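  • A minimal sketch of the central-node side, assuming the received compressed gradients have already been averaged into compressed gradient mean information and assuming a plain gradient-descent update with an illustrative step size:

        import numpy as np

        def central_update(theta, c_mean, mu_l, U_l, learning_rate=0.01):
            """Decompress the compressed gradient mean information with the global compression
            kernel (center vector mu^l and main feature matrix U^l), then update the model.
            g_mean = mu^l + U^l c_mean reverses the projection c = (U^l)^T (g - mu^l)."""
            g_mean = mu_l + U_l @ c_mean             # decompressed gradient mean information
            return theta - learning_rate * g_mean    # updated model parameters theta^t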
  • The manner in which the central node receives the compressed gradient information sent by the participating nodes after the t-th model training and decompresses it based on the second global compression kernel information to obtain the gradient mean information is similar to the manner in which it receives the compressed gradient information sent after the (t+i)-th model training and decompresses it based on the first global compression kernel information, which is described in S307 of the embodiment shown in Figure 3; it suffices to replace t+i with t and l+1 with l in each expression in S307.
  • After the t-th model training, in addition to compressing the first gradient information and sending it to the central node as described above, participating node k updates the compression kernel information based on the first gradient information in S202.
  • the participating nodes determine the first compression kernel information based on the gradient information obtained after previous model trainings.
  • previous model trainings include the t-th model training and at least one model training before the t-th model training.
  • In the following description, the case in which the previous model trainings include the t-th model training and the t-1 model trainings before the t-th model training, i.e. t model trainings in total, is used as an example; it should be understood that this application is not limited thereto.
  • For example, the participating node may instead select the t-th model training and the gradient information obtained from only some of the model trainings before the t-th model training to determine the compression kernel information, and the implementation is similar.
  • the first compression kernel information may include first center vector information and/or first principal feature matrix information.
  • the first compression kernel information may include first center vector information, where the first center vector information is used to represent the mean vector of the gradient information obtained from previous model trainings.
  • After the t-th model training, participating node k obtains the mean of the t gradients, i.e. the first center vector information, based on the t pieces of gradient information obtained in the previous t model trainings.
  • In a possible implementation, participating node k obtains the first center vector information directly from the t pieces of gradient information obtained in the t model trainings; the first center vector information μ_k^t then satisfies the following formula: μ_k^t = (1/t) · Σ_{s=1}^{t} g_k^s.
  • In another possible implementation, participating node k determines the first center vector information based on the first gradient information and the second center vector information, where the second center vector information is the center vector information determined after the (t-1)-th model training.
  • Specifically, participating node k can obtain the first center vector information μ_k^t from the first gradient information g_k^t obtained after the t-th model training and the second center vector information μ_k^{t-1} determined after the (t-1)-th model training through the following formula: μ_k^t = ((t-1)/t) · μ_k^{t-1} + (1/t) · g_k^t.
  • In this way, participating node k determines the center vector information after the t-th model training (i.e. the first center vector information) from the gradient information obtained after the t-th model training and the second center vector information determined after the (t-1)-th model training, as sketched below. This avoids iterating over all the gradient information obtained from previous model trainings after each model training, which reduces storage overhead and improves the efficiency of information processing.
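  • A sketch of this recursive center-vector update (variable names are illustrative):

        import numpy as np

        def update_center_vector(mu_prev, g_t, t):
            """First center vector after the t-th training:
            mu_t = ((t - 1) / t) * mu_{t-1} + (1 / t) * g_t,
            i.e. the running mean of all t gradients without storing them."""
            return ((t - 1) / t) * mu_prev + (1.0 / t) * g_t

        g1, g2 = np.array([1.0, 2.0]), np.array([3.0, 4.0])
        mu = update_center_vector(np.zeros(2), g1, t=1)   # after the 1st training: mu = g1
        mu = update_center_vector(mu, g2, t=2)            # after the 2nd training: mu = (g1 + g2) / 2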
  • the first compression kernel information includes first principal characteristic matrix information of first covariance mean matrix information.
  • the first covariance mean matrix information is used to represent the mean matrix of the covariance matrix of the deviation between the gradient information and the center vector information obtained from previous model trainings.
  • After the t-th model training, participating node k obtains the first covariance mean matrix information based on the t pieces of gradient information obtained in the previous t model trainings.
  • In a possible implementation, participating node k obtains the first covariance mean matrix information C_k^t directly from the t pieces of gradient information obtained in the t model trainings; for example, C_k^t satisfies the following formula: C_k^t = (1/t) · Σ_{s=1}^{t} (g_k^s − μ_k^s)(g_k^s − μ_k^s)^T, i.e. the mean of the covariance matrices of the deviations between each gradient and the corresponding center vector information.
  • In another possible implementation, participating node k obtains the covariance matrix information of the deviation between the first gradient information and the first center vector information together with the second covariance mean matrix information, and determines the first covariance mean matrix information from them, where the second covariance mean matrix information is the covariance mean matrix information obtained after the (t-1)-th model training.
  • Specifically, participating node k obtains the covariance matrix information of the deviation between the first gradient information and the first center vector information, Σ_k^t = (g_k^t − μ_k^t)(g_k^t − μ_k^t)^T.
  • Participating node k then determines the first covariance mean matrix information C_k^t based on the covariance matrix information Σ_k^t and the second covariance mean matrix information C_k^{t-1}; for example, C_k^t satisfies: C_k^t = ((t-1)/t) · C_k^{t-1} + (1/t) · Σ_k^t.
  • The covariance mean matrix information obtained after participating node k performs the first model training is C_k^1 = (g_k^1 − μ_k^1)(g_k^1 − μ_k^1)^T, i.e. the covariance matrix of the deviation between the gradient information g_k^1 and the center vector information μ_k^1.
  • In this way, participating node k determines the covariance mean matrix information after this model training based on the gradient information obtained after this model training and the covariance mean matrix information obtained after the previous model training, as sketched below. This avoids iterating over all the gradient information from historical model trainings after each model training, which reduces storage overhead and improves the efficiency of information processing.
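  • A sketch of this recursive covariance-mean update (variable names are illustrative):

        import numpy as np

        def update_covariance_mean(C_prev, g_t, mu_t, t):
            """First covariance mean matrix after the t-th training:
            C_t = ((t - 1) / t) * C_{t-1} + (1 / t) * (g_t - mu_t)(g_t - mu_t)^T,
            so only the previous covariance mean matrix is stored, not all past gradients."""
            d = (g_t - mu_t).reshape(-1, 1)          # deviation between gradient and center vector
            return ((t - 1) / t) * C_prev + (1.0 / t) * (d @ d.T)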
  • In the above, the first center vector information and the first covariance mean matrix information are calculated using the averaging method, but this application is not limited thereto.
  • The first center vector information and the first covariance mean matrix information can also be calculated in a more general, weighted way, where α and β are weighting coefficients and α and β are floating point numbers less than 1.
  • Participating node k uses eigenvalue decomposition (EVD) to decompose the first covariance mean matrix information and thereby obtains the first main feature matrix information of the first covariance mean matrix information.
  • The first main feature matrix information includes the M eigenvectors corresponding to M eigenvalues of the first covariance mean matrix information; optionally, it includes the M eigenvectors corresponding to the M eigenvalues with the largest amplitude, i.e. the M principal eigenvectors corresponding to the M principal eigenvalues.
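  • A sketch of obtaining the first main feature matrix information by eigenvalue decomposition (numpy is used purely for illustration):

        import numpy as np

        def principal_feature_matrix(C, M):
            """EVD of the (symmetric) covariance mean matrix C; return the M eigenvectors
            whose eigenvalues have the largest amplitude, i.e. the M principal eigenvectors
            forming the first main feature matrix information (shape D x M)."""
            eigvals, eigvecs = np.linalg.eigh(C)              # eigenvalues in ascending order
            order = np.argsort(np.abs(eigvals))[::-1]         # sort by amplitude, descending
            return eigvecs[:, order[:M]]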
  • the participating nodes send the first compressed kernel information to the central node.
  • the central node receives the first compressed kernel information from the participating nodes.
  • the central node may configure a first resource for sending the first compressed kernel information to the participating node, and the participating node sends the first compressed kernel information to the central node on the first resource.
  • For example, the central node can configure multiple participating nodes to all send their first compression kernel information on the first resource, so that the compression kernel information sent by the multiple participating nodes is superimposed on the first resource and the central node receives the superimposed compression kernel information.
  • In a possible implementation, the participating nodes periodically send compression kernel information to the central node with T_1 as the period, where T_1 is an integer greater than 1. Therefore, the participating node sends the first compression kernel information to the central node when t/T_1 is an integer.
  • Participating nodes can update the compression kernel information in the manner described in S202 after each model training, and after every T_1 updates send the most recently updated compression kernel information to the central node (see the sketch after this item). The central node then determines the global compression kernel information based on the first compression kernel information received from the multiple participating nodes and sends it to the participating nodes, so that the participating nodes can compress the gradient information to be sent based on the global compression kernel information. This improves the compression rate of the gradient information.
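  • The periodic reporting condition is simply a check that t is a multiple of the period T_1, for example:

        def should_send_kernel(t, T1):
            """Send the latest compression kernel information only when t / T1 is an integer."""
            return t % T1 == 0

        assert should_send_kernel(10, 5) and not should_send_kernel(11, 5)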
  • In another possible implementation, each time a participating node updates the compression kernel information, it sends the updated compression kernel information to the central node, so that the central node determines the global compression kernel information based on the received first compression kernel information of the multiple participating nodes and sends it to the participating nodes, so that the participating nodes can compress the gradient information to be sent based on the global compression kernel information. This improves the compression rate of the gradient information.
  • the specific way in which the central node determines the first global compression kernel information can refer to the description in the embodiment shown in FIG. 3 .
  • After participating node k obtains the first compression kernel information, it can send the first compression kernel information to the central node. Specifically, participating node k can obtain a transmission coefficient and transmit the first compression kernel information weighted by the transmission coefficient.
  • For example, participating node k sends the first center vector information weighted by the transmission coefficient to the central node.
  • Participating node k can determine the value of the transmission coefficient based on the relationship between the truncation threshold γ and the channel fading h_k^t, and thereby determine whether to send the weighted first center vector information to the central node: when |h_k^t| ≥ γ, participating node k sends the first center vector information weighted by the transmission coefficient to the central node; otherwise, it does not send the first center vector information.
  • Similarly, participating node k sends the first main feature matrix information weighted by the transmission coefficient to the central node.
  • Participating node k can likewise determine the value of the transmission coefficient based on the relationship between the truncation threshold γ and the channel fading h_k^t, and thereby determine whether to send the weighted first main feature matrix information to the central node: when |h_k^t| ≥ γ, participating node k sends the first main feature matrix information weighted by the transmission coefficient to the central node; otherwise, it does not send the first main feature matrix information.
  • the participating nodes receive the first information from the central node.
  • the first information is used to indicate the first global compression kernel information.
  • the first global compression kernel information is used to compress the gradient information obtained after model training.
  • After receiving the first information from the central node, the participating nodes compress the gradient information to be sent based on the first global compression kernel information to obtain compressed gradient information.
  • If the above-mentioned second global compression kernel information used to compress the first gradient information is the global compression kernel information updated by the central node for the l-th time, then the first global compression kernel information is the global compression kernel information updated by the central node for the (l+1)-th time.
  • the first global compression kernel information includes first global center vector information μ^{l+1} and first global main feature matrix information U^{l+1}.
  • After participating node k receives the first information and determines that the central node has updated the global compression kernel to the first global compression kernel information, participating node k performs the (t+i)-th model training to obtain second gradient information g_k^{t+i}, where i is a positive integer.
  • Participating node k compresses the second gradient information based on the first global compression kernel information to obtain the second compressed gradient information, which satisfies: c_k^{t+i} = (U^{l+1})^T (g_k^{t+i} − μ^{l+1}).
  • The compression kernel information includes first center vector information and/or first main feature matrix information.
  • In a possible implementation, the participating nodes send the compression kernel information periodically with T_1 as the period. After a participating node obtains the compression kernel information updated for the (t+i)-th time, if (t+i)/T_1 is not an integer, the participating node does not send the compression kernel information updated for the (t+i)-th time to the central node; if (t+i)/T_1 is an integer, the participating node sends the compression kernel information updated for the (t+i)-th time to the central node.
  • In another possible implementation, each time a participating node updates the compression kernel information, it sends the updated compression kernel information to the central node.
  • a gradient information compression method suitable for federated learning is proposed by utilizing the correlation between gradient information obtained by model training.
  • Participating nodes dynamically track and learn compression kernel information as the model trains, and feed it back to the central node, so that the central node obtains global compression kernel information based on the compression kernel information provided by multiple participating nodes and notifies the participating nodes.
  • The participating nodes then use PCA to compress the gradient information based on the global compression kernel information, which improves the compression rate and the transmission efficiency of the gradient information while reducing the adverse impact of information compression on the training results.
  • FIG 3 is a schematic flow chart of the information processing method 300 provided by the embodiment of the present application.
  • the central node and K participating nodes perform joint training of the intelligent model, where K is an integer greater than or equal to 2.
  • the K participating nodes include but are not limited to participating node 1 and participating node 2 shown in Figure 3. If the K participating nodes also include other participating nodes not shown in Figure 3, those nodes can be implemented with reference to the participating nodes shown in Figure 3.
  • Participating node 1 and participating node 2 send compressed kernel information to the central node respectively.
  • after participating node 1 and participating node 2 each perform the t-th model training, participating node 1 obtains its gradient information and participating node 2 obtains its gradient information.
  • based on the gradient information obtained from its previous rounds of model training, participating node 1 determines its compression kernel information, which may include center vector information and/or main feature matrix information.
  • in the same way, participating node 2 determines its compression kernel information based on the gradient information obtained from its previous rounds of model training, which may also include center vector information and/or main feature matrix information.
  • for the way a participating node determines the compression kernel information, reference can be made to the description of the embodiment shown in Figure 2.
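A minimal sketch of how a participating node could track its compression kernel across training rounds, following the running-average updates of the center vector and of the covariance mean matrix, followed by an eigen-decomposition, as described for the embodiment of Figure 2; the incremental form, class name, and component-selection details are illustrative assumptions.

```python
import numpy as np

class KernelTracker:
    """Tracks center vector mu_t and main feature matrix U_t from successive gradients."""

    def __init__(self, dim: int, num_components: int):
        self.t = 0
        self.mu = np.zeros(dim)                 # running mean of gradients
        self.C = np.zeros((dim, dim))           # running mean of deviation covariances
        self.M = num_components

    def update(self, g: np.ndarray) -> None:
        self.t += 1
        # mu_t = ((t - 1) * mu_{t-1} + g_t) / t
        self.mu = ((self.t - 1) * self.mu + g) / self.t
        d = (g - self.mu)[:, None]
        # C_bar_t = ((t - 1) * C_bar_{t-1} + d d^T) / t
        self.C = ((self.t - 1) * self.C + d @ d.T) / self.t

    def main_feature_matrix(self) -> np.ndarray:
        # Eigen-decomposition; keep eigenvectors of the M largest-magnitude eigenvalues.
        w, V = np.linalg.eigh(self.C)
        order = np.argsort(np.abs(w))[::-1][: self.M]
        return V[:, order]

# Example: one node updating its kernel over a few rounds.
rng = np.random.default_rng(2)
tracker = KernelTracker(dim=50, num_components=4)
for _ in range(10):
    tracker.update(rng.standard_normal(50))
U_k = tracker.main_feature_matrix()
print(tracker.mu.shape, U_k.shape)              # (50,), (50, 4)
```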
  • the participating nodes may send the compressed kernel information to the central node after each update of the compressed kernel information.
  • participating node 1 and participating node 2 respectively send the compressed kernel information updated after the t-th model training to the central node.
  • alternatively, the participating nodes periodically send compression kernel information to the central node with T1 as the period; if t/T1 is an integer, participating node 1 and participating node 2 each send the compression kernel information updated after the t-th model training to the central node. This application does not limit this.
  • both participating node 1 and participating node 2 send their respective obtained compressed kernel information to the central node on the first resource.
  • the first resource may be pre-configured by the central node to the participating nodes.
  • the central node receives the compressed kernel information from multiple participating nodes and obtains the first global compressed kernel information.
  • the plurality of participating nodes include participating node 1 and participating node 2.
  • in one implementation, the multiple participating nodes all send their respective compression kernel information on the first resource.
  • the central node receives the compression kernel information from the multiple participating nodes on the first resource to obtain aggregated compression kernel information.
  • the aggregated compressed kernel information is the compressed kernel information obtained by superimposing the compressed kernel information of the plurality of participating nodes on the first resource.
  • the multiple participating nodes all send compression kernel information on the first resource, so that the compression kernel information of the multiple participating nodes is superimposed in the wireless channel (or on the air interface of the central node), and the central node receives, on the first resource, the aggregated compression kernel information obtained by superimposing the multiple pieces of compression kernel information.
  • after the central node obtains the aggregated compression kernel information, it obtains the first global compression kernel information based on the aggregated compression kernel information.
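A small simulation of this over-the-air superposition on a shared resource, assuming the truncated channel-inversion transmission coefficient sketched earlier, so that the pre-coded signals add up to roughly sqrt(ρ) times the sum of the transmitted center vectors plus receiver noise; the exact pre-coding used in the original formulas is not reproduced, so this is an assumption.

```python
import numpy as np

rng = np.random.default_rng(3)
K, N, rho, gamma, noise_std = 4, 16, 1.0, 0.2, 0.01

mus = rng.standard_normal((K, N))                        # each node's center vector
h = (rng.standard_normal(K) + 1j * rng.standard_normal(K)) / np.sqrt(2)

y = np.zeros(N, dtype=complex)
sent = []
for k in range(K):
    p_k = np.sqrt(rho) / h[k] if np.abs(h[k]) >= gamma else 0.0
    if p_k != 0:
        sent.append(k)
        y += h[k] * p_k * mus[k]                         # channel undoes the pre-coding
y += noise_std * (rng.standard_normal(N) + 1j * rng.standard_normal(N))

K_mu = len(sent)                                         # nodes that actually transmitted
print("aggregated signal ~ sqrt(rho) * sum of sent center vectors:",
      np.allclose(y.real, np.sqrt(rho) * mus[sent].sum(axis=0), atol=0.1))
```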
  • the first global compression kernel information includes the first global center vector information μl+1 and the first global main feature matrix information Ul+1.
  • the first global compression kernel information is the global compression kernel information obtained by the (l+1)-th update of the central node.
  • if the compression kernel information sent by the participating nodes includes center vector information, the aggregated compression kernel information obtained by the central node includes aggregated center vector information yt,μ.
  • the aggregated center vector information yt,μ is the center vector information obtained after the center vector information of the multiple participating nodes is superimposed on the first resource and has undergone channel propagation.
  • C x represents a set of complex vectors with dimension x.
  • the central node determines the first global center vector information μl+1 based on the aggregated center vector information yt,μ received on the first resource.
  • Re(x) means taking the real part of x.
  • the central node and the participating nodes can reach a consensus, through information exchange, on whether each participating node sends center vector information; the central node can then determine the number Kμ of participating nodes among the K participating nodes that have sent center vector information, and determine the first global center vector μl+1 based on Kμ.
  • the participating nodes send transmission coefficient information to the central node before sending the compressed kernel information.
  • the transmission coefficient information is used to indicate whether the transmission coefficient is 0.
  • the central node can determine that the transmission coefficients of K ⁇ participating nodes are not 0 based on the transmission coefficient information of K participating nodes, that is, the K ⁇ participating nodes have sent center vector information.
  • the central node can determine the first global center vector information ⁇ l+1 according to K ⁇ .
  • in another implementation, the channel fading is estimated based on reference signals transmitted between the central node and the participating nodes, and the central node and the participating nodes can reach a consensus on the channel fading through information exchange.
  • both the participating nodes and the central node determine the value of the transmission coefficient based on the relationship between the truncation threshold γ and the channel fading, so that the central node can determine, from the transmission coefficient of each participating node, the number Kμ of participating nodes that have sent center vector information.
  • for example, when the channel is reciprocal, the central node can send a reference signal to the participating nodes, and each participating node estimates the channel fading based on the reference signal; based on channel reciprocity, the participating node can use this estimate as the channel fading for its own transmission. The participating node can feed back channel state information to the central node to notify it of the channel fading estimated by the participating node, so that the central node and the participating node reach a consensus on the channel fading.
  • as another example, a participating node can send a reference signal to the central node, and the central node estimates the channel fading of the channel in the direction from the participating node to the central node based on the reference signal. The central node then sends channel state information to the participating node to notify it of the estimated channel fading, so that the central node and the participating node reach a consensus on the channel fading.
  • both the participating nodes and the central node determine the value of the transmission coefficient based on the relationship between the truncation threshold γ and the channel fading. Therefore, the central node can determine, based on the transmission coefficients of the K participating nodes, the Kμ participating nodes that have sent the center vector information.
  • the central node then determines the first global center vector information μl+1 based on the aggregated center vector information yt,μ and Kμ; one plausible form of this estimate is sketched below.
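Continuing the superposition sketch above, one plausible way for the central node to recover the first global center vector from the aggregated signal is to take the real part and normalize by the number Kμ of transmitting nodes and the power gain; the exact expression in the original formula is not reproduced here, so the estimator form below is an assumption.

```python
import numpy as np

def estimate_global_center(y_mu: np.ndarray, K_mu: int, rho: float = 1.0) -> np.ndarray:
    """Hypothetical estimator: mu_{l+1} = Re(y_mu) / (sqrt(rho) * K_mu)."""
    return np.real(y_mu) / (np.sqrt(rho) * K_mu)

# Example: three nodes transmitted; the superimposed signal is roughly their sum.
rng = np.random.default_rng(4)
true_mus = rng.standard_normal((3, 8))
y_mu = true_mus.sum(axis=0) + 0.01 * rng.standard_normal(8)   # ideal superposition + noise
mu_global = estimate_global_center(y_mu.astype(complex), K_mu=3)
print(np.allclose(mu_global, true_mus.mean(axis=0), atol=0.05))
```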
  • the above describes how the central node obtains the global center vector information based on the received compressed kernel information when the compressed kernel information sent by the participating nodes includes center vector information.
  • the compressed kernel information sent by participating nodes may not include center vector information.
  • the compressed kernel information only includes main feature matrix information.
  • the central node can determine the center vector information corresponding to participating node k based on the gradient information obtained from participating node k over its previous rounds of model training.
  • for the specific way in which the central node determines the center vector information, reference can be made to the way in which participating node k determines the center vector information in the embodiment shown in Figure 2.
  • after determining the center vector information of each participating node, the central node can calculate the mean vector of the center vector information of the multiple participating nodes to obtain the first global center vector information μl+1. This application does not limit this.
  • if the compression kernel information sent by the participating nodes includes main feature matrix information, the aggregated compression kernel information includes aggregated main feature matrix information yt,U, which is the main feature matrix information obtained after the main feature matrix information of the multiple participating nodes is superimposed on the first resource and has undergone channel propagation.
  • C x ⁇ y represents a set of complex matrices with dimensions x ⁇ y.
  • the central node determines the first global main feature matrix information Ul+1 based on the aggregated main feature matrix information yt,U, where the first global main feature matrix information is the global main feature matrix information obtained by the (l+1)-th update of the central node.
  • the first global main feature matrix information U l+1 is used to represent M ⁇ global main feature vectors.
  • the central node and the participating nodes can reach a consensus, through information exchange, on whether each participating node sends main feature matrix information; the central node can then determine the number KU of participating nodes among the K participating nodes that have sent main feature matrix information, and determine the first global main feature matrix information Ul+1 based on KU and the aggregated main feature matrix information, as sketched below.
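The original description of this embodiment notes that the aggregated matrix is projected onto the Stiefel manifold, i.e., the set of orthonormal Mθ-dimensional frames of Euclidean space. A common way to realize such a projection is through the polar factor obtained from an SVD; the normalization by KU and sqrt(ρ) below is an assumption, since the exact formula is not reproduced.

```python
import numpy as np

def project_stiefel(A: np.ndarray) -> np.ndarray:
    """Project a matrix onto the Stiefel manifold (nearest matrix with orthonormal columns)."""
    W, _, Vt = np.linalg.svd(A, full_matrices=False)
    return W @ Vt

def estimate_global_U(y_U: np.ndarray, K_U: int, rho: float = 1.0) -> np.ndarray:
    """Hypothetical estimator: U_{l+1} = Proj_Stiefel( Re(y_U) / (sqrt(rho) * K_U) )."""
    return project_stiefel(np.real(y_U) / (np.sqrt(rho) * K_U))

# Example: superposition of two slightly different orthonormal matrices, re-orthonormalized.
rng = np.random.default_rng(5)
U1, _ = np.linalg.qr(rng.standard_normal((20, 3)))
U2, _ = np.linalg.qr(U1 + 0.05 * rng.standard_normal((20, 3)))
y_U = (U1 + U2).astype(complex)                      # ideal over-the-air superposition
U_global = estimate_global_U(y_U, K_U=2)
print(np.allclose(U_global.T @ U_global, np.eye(3), atol=1e-8))
```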
  • the above describes how the central node obtains the global main feature matrix information based on the received compressed kernel information when the compressed kernel information sent by the participating nodes includes main feature matrix information.
  • the compressed kernel information sent by the participating nodes may not include main feature matrix information.
  • the compressed kernel information only includes center vector information.
  • the central node can determine the main feature matrix information corresponding to participating node k based on the gradient information obtained from participating node k over its previous rounds of model training.
  • for the specific way in which the central node determines the main feature matrix information, reference can be made to the way in which participating node k determines the main feature matrix information in the embodiment shown in Figure 2, which will not be described again here.
  • after the central node determines the main feature matrix information of each participating node, the central node can calculate the mean matrix of the main feature matrix information of the multiple participating nodes to obtain the first global main feature matrix information Ul+1. This application does not limit this.
  • in another implementation, the central node receives the compression kernel information from the multiple participating nodes respectively, superimposes and aggregates the compression kernel information of the multiple participating nodes to obtain aggregated compression kernel information, and then obtains the first global compression kernel information based on the aggregated compression kernel information.
  • that is, in a specific implementation, the above-mentioned approach in which all participating nodes send compression kernel information on the first resource may not be adopted; the participating nodes may send the compression kernel information to the central node on different resources, and the central node performs the superposition and aggregation to obtain the aggregated compression kernel information and then obtains the first global compression kernel information from it.
  • the central node sends first information, which is used to indicate the first global compression kernel information.
  • the central node may broadcast the first information, or may send the first information to K participating nodes respectively, which is not limited in this application.
  • correspondingly, the K participating nodes receive the first information from the central node. After receiving the first global compression kernel information, a participating node compresses gradient information based on the first global compression kernel information. Before a participating node receives the (l+1)-th updated first global compression kernel information of the central node (including the first global center vector information μl+1 and the first global main feature matrix information Ul+1), the participating node compresses the gradient information obtained after model training based on the l-th updated second global compression kernel information from the central node (including the second global center vector information μl and the second global main feature matrix information Ul); refer to the description of the embodiment shown in Figure 2. Before sending the first global compression kernel information, the central node decompresses the received compressed gradient information based on the most recently sent global compression kernel information.
  • for example, if, before sending the first global compression kernel information, the central node receives the first compressed gradient information from the participating nodes (that is, the compressed gradient information obtained after the t-th model training), the central node decompresses the first compressed gradient information based on the second global center vector information μl and the second global main feature matrix information Ul in the second global compression kernel information.
  • for details, reference can be made to the implementation described below in which the central node decompresses, based on the first global compression kernel information, the compressed gradient information obtained from the (t+i)-th model training of the participating nodes; it suffices to replace t+i with t (and l+1 with l) in the corresponding formulas.
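To make this version bookkeeping concrete, the sketch below assumes the central node simply remembers the last global kernel it broadcast and uses that kernel to expand whatever compressed gradients arrive before the next broadcast; the bookkeeping structure and names are illustrative, and the expansion uses the standard PCA reconstruction U·s + μ.

```python
import numpy as np

class CentralKernelStore:
    """Keeps the most recently broadcast global compression kernel (mu, U)."""

    def __init__(self, mu: np.ndarray, U: np.ndarray):
        self.version = 0          # l
        self.mu, self.U = mu, U

    def broadcast_update(self, mu_new: np.ndarray, U_new: np.ndarray) -> None:
        self.version += 1         # l -> l + 1
        self.mu, self.U = mu_new, U_new

    def decompress(self, s: np.ndarray) -> np.ndarray:
        # Compressed gradients received before the next broadcast are expanded
        # with the kernel of the current version.
        return self.U @ s + self.mu

rng = np.random.default_rng(6)
N, M = 30, 3
U_l, _ = np.linalg.qr(rng.standard_normal((N, M)))
store = CentralKernelStore(mu=rng.standard_normal(N), U=U_l)

g_t = store.mu + U_l @ rng.standard_normal(M)          # round-t gradient in the old subspace
s_t = U_l.T @ (g_t - store.mu)                         # compressed with version l
print("round t recovered:", np.allclose(store.decompress(s_t), g_t))

U_next, _ = np.linalg.qr(rng.standard_normal((N, M)))
store.broadcast_update(mu_new=rng.standard_normal(N), U_new=U_next)   # now version l+1
```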
  • Participating node 1 and participating node 2 respectively perform the t+i-th model training to obtain gradient information.
  • Participating node 1 performs the t+ith model training and obtains gradient information. Participating node 2 performs the t+ith model training and obtains gradient information.
  • Participating node 1 and participating node 2 respectively compress the gradient information based on the first global compression kernel information to obtain compressed gradient information.
  • participating node 1 compresses its gradient information based on the first global center vector information μl+1 and the first global main feature matrix information Ul+1 in the first global compression kernel information to obtain its compressed gradient information. In the same way, participating node 2 obtains its compressed gradient information.
  • Participating node 1 and participating node 2 can send compressed gradient information to the central node after obtaining the compressed gradient information, or participating node 1 and participating node 2 can determine whether to send compressed gradient information to the central node based on the truncation threshold and respective channel fading.
  • multiple participating nodes all send compression gradient information to the central node on the second resource, and the central node receives the compression gradient information from multiple participating nodes on the second resource to obtain aggregated compression gradient information.
  • the aggregated compressed gradient information is the compressed gradient information obtained by superimposing the compressed gradient information of the multiple participating nodes on the second resource.
  • the central node estimates the mean of the compressed gradient information based on yt+i to obtain the compressed gradient mean information.
  • for example, the central node can estimate the compressed gradient mean information based on the least squares (LS) method.
  • ⁇ a ⁇ 2 represents the l 2 norm (l 2 -norm) of a.
  • the central node and the participating nodes can reach a consensus on whether the participating nodes send compressed gradient information through information exchange. Then the central node can determine the number K s of participating nodes that have sent compressed gradient information among the K participating nodes.
  • the central node can then estimate the compressed gradient mean information based on Ks and the aggregated compressed gradient information, as sketched below.
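A minimal sketch of such a least-squares estimate of the compressed gradient mean from the aggregated signal. With the channel-inversion pre-coding assumed earlier, the aggregated signal is approximately sqrt(ρ)·Ks times the mean, so the LS solution reduces to a simple rescaling of the real part; this closed form is an assumption, since the original formula is not reproduced.

```python
import numpy as np

def ls_compressed_gradient_mean(y: np.ndarray, K_s: int, rho: float = 1.0) -> np.ndarray:
    """LS estimate of the mean compressed gradient.

    Assumed model: y ~ sqrt(rho) * K_s * s_mean + noise, so
    argmin_s || y - sqrt(rho) * K_s * s ||_2^2  =  Re(y) / (sqrt(rho) * K_s).
    """
    return np.real(y) / (np.sqrt(rho) * K_s)

# Example: 3 nodes each send an M-dimensional compressed gradient on the same resource.
rng = np.random.default_rng(7)
S = rng.standard_normal((3, 16))                         # per-node compressed gradients
y = S.sum(axis=0) + 0.01 * rng.standard_normal(16)       # ideal superposition + noise
s_mean = ls_compressed_gradient_mean(y.astype(complex), K_s=3)
print(np.allclose(s_mean, S.mean(axis=0), atol=0.05))
```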
  • after the central node obtains the compressed gradient mean information, it decompresses the compressed gradient mean information based on the first global center vector information μl+1 and the first global main feature matrix information Ul+1 to obtain the estimated gradient mean information Δt+i (with the standard PCA reconstruction, Δt+i is obtained by mapping the compressed mean back through Ul+1 and adding μl+1).
  • the central node then determines the model parameters of the intelligent model based on the gradient mean information, for example θt+i+1 = θt+i + ηt+i·Δt+i, and sends the updated model parameters θt+i+1 to the K participating nodes. After updating the model parameters of the intelligent model based on θt+i+1, the K participating nodes perform the (t+i+1)-th model training.
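A minimal sketch of this final step at the central node: expand the compressed gradient mean with the first global kernel and apply the parameter update θt+i+1 = θt+i + ηt+i·Δt+i stated in the description; the learning-rate value and variable names are illustrative.

```python
import numpy as np

def central_update(theta: np.ndarray, s_mean: np.ndarray,
                   mu_g: np.ndarray, U_g: np.ndarray, lr: float) -> np.ndarray:
    """Decompress the gradient mean and update the model parameters.

    Delta = U_g @ s_mean + mu_g          (PCA-style expansion with the global kernel)
    theta_next = theta + lr * Delta      (update rule stated in the description)
    """
    delta = U_g @ s_mean + mu_g
    return theta + lr * delta

rng = np.random.default_rng(8)
N, M = 40, 4
U_g, _ = np.linalg.qr(rng.standard_normal((N, M)))
mu_g = rng.standard_normal(N)
theta = rng.standard_normal(N)
s_mean = rng.standard_normal(M)

theta_next = central_update(theta, s_mean, mu_g, U_g, lr=0.1)
print(theta_next.shape)        # updated parameters to be sent to the K participating nodes
```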
  • in another implementation, the multiple participating nodes send compressed gradient information on different resources respectively.
  • the central node receives the compressed gradient information from the multiple participating nodes on the different resources respectively, and decompresses each piece of compressed gradient information based on the first global compression kernel information to obtain the decompressed gradient information. The central node then averages the decompressed gradient information corresponding to the multiple participating nodes to obtain the gradient mean information, determines the model parameters θt+i+1 of the intelligent model based on the gradient mean information, and sends them to the K participating nodes.
  • according to the above solution, a gradient information compression method suitable for federated learning is proposed by utilizing the correlation between the gradient information obtained in successive rounds of model training.
  • participating nodes dynamically track and learn compression kernel information as the model is trained and feed it back to the central node.
  • based on the compression kernel information provided by the multiple participating nodes, the central node obtains the global compression kernel information and notifies the participating nodes.
  • the participating nodes then compress the gradient information using PCA based on the global compression kernel information, which improves the compression rate and transmission efficiency of the gradient information while reducing the adverse impact of information compression on the training results.
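To tie the steps above together, the sketch below runs one idealized round of the scheme end to end (noise-free superposition, all K nodes transmit); it reuses the compression, estimation, and update forms assumed in the earlier snippets, so the same caveats apply.

```python
import numpy as np

rng = np.random.default_rng(9)
K, N, M, lr = 3, 60, 5, 0.1

# Global kernel (mu_g, U_g) already known to all nodes for this round.
U_g, _ = np.linalg.qr(rng.standard_normal((N, M)))
mu_g = rng.standard_normal(N)
theta = rng.standard_normal(N)

# Each participating node: local training yields a gradient, which is PCA-compressed.
gradients = [mu_g + U_g @ rng.standard_normal(M) for _ in range(K)]
compressed = [U_g.T @ (g - mu_g) for g in gradients]

# Over-the-air aggregation on a shared resource (idealized, noise-free, all nodes send).
y = np.sum(compressed, axis=0)

# Central node: mean estimate, decompression, and parameter update.
s_mean = y / K
delta = U_g @ s_mean + mu_g
theta_next = theta + lr * delta

# Sanity check: delta equals the true mean gradient in this idealized setting.
print(np.allclose(delta, np.mean(gradients, axis=0)))
```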
  • each network element may include a hardware structure and/or a software module to implement the above functions in the form of a hardware structure, a software module, or a hardware structure plus a software module. Whether one of the above functions is performed as a hardware structure, a software module, or a hardware structure plus a software module depends on the specific application and design constraints of the technical solution.
  • Figure 4 is a schematic block diagram of a communication device provided by an embodiment of the present application.
  • the communication device 400 may include a processing unit 410 and a transceiver unit 420 .
  • in one possible design, the communication device 400 may correspond to the participating node in the above method embodiment, or may be a chip configured in (or used for) the participating node, or another apparatus, module, circuit, or unit capable of implementing the method executed by the participating node.
  • the communication device 400 may correspond to a participating node in the method of the embodiment of the present application, and the communication device 400 may include various units in the first device for executing the methods shown in FIG. 2 and FIG. 3 . Moreover, each unit in the communication device 400 and the above-mentioned other operations and/or functions are respectively intended to implement the corresponding processes of the methods shown in FIG. 2 and FIG. 3 .
  • when the communication device 400 is a chip configured in (or used for) the participating node, the transceiver unit 420 in the communication device 400 may be an input/output interface or circuit of the chip, and the processing unit 410 in the communication device 400 may be a logic circuit in the chip.
  • in another possible design, the communication device 400 may correspond to the central node in the above method embodiment, or may be a chip configured in (or used for) the central node, or another apparatus, module, circuit, or unit capable of implementing the method executed by the central node.
  • the communication device 400 may correspond to the central node in the methods shown in FIGS. 2 and 3 , and the communication device 400 may include various units of the central node for executing the methods shown in FIGS. 2 and 3 . Moreover, each unit in the communication device 400 and the above-mentioned other operations and/or functions are respectively intended to implement the corresponding processes of the methods shown in FIG. 2 and FIG. 3 .
  • when the communication device 400 is a chip configured in (or used for) the central node, the transceiver unit 420 in the communication device 400 may be an input/output interface or circuit of the chip, and the processing unit 410 in the communication device 400 may be a logic circuit in the chip.
  • optionally, the communication device 400 may also include a storage unit 430, which may be used to store instructions or data, and the processing unit 410 may execute the instructions or data stored in the storage unit, so that the communication device implements the corresponding operations.
  • the transceiver unit 420 in the communication device 400 can be implemented through a communication interface (such as a transceiver or an input/output interface), and can, for example, correspond to the transceiver 510 in the communication device 500 shown in FIG. 5 .
  • the processing unit 410 in the communication device 400 may be implemented by at least one processor, for example, may correspond to the processor 520 in the communication device 500 shown in FIG. 5 .
  • the processing unit 410 in the communication device 400 can also be implemented by at least one logic circuit.
  • the storage unit 430 in the communication device 400 may correspond to the memory in the communication device 500 shown in FIG. 5 .
  • FIG. 5 is a schematic structural diagram of a communication device 500 provided by an embodiment of the present application.
  • the communication device 500 may correspond to a participating node in the above method embodiment.
  • the participating node 500 includes a processor 520 and a transceiver 510 .
  • the participating node 500 further includes a memory.
  • the processor 520, the transceiver 510 and the memory can communicate with each other through internal connection paths to transmit control and/or data signals.
  • the memory is used to store computer programs, and the processor 520 is used to execute the computer program in the memory to control the transceiver 510 to send and receive signals.
  • the communication device 500 shown in Figure 5 can implement the processes involving participating nodes in the method embodiments shown in Figures 2 and 3.
  • the operations and/or functions of each module in the participating node 500 are respectively intended to implement the corresponding processes in the above method embodiments.
  • the communication device 500 may correspond to the central node in the above method embodiment.
  • the central node 500 includes a processor 520 and a transceiver 510 .
  • the central node 500 also includes a memory.
  • the processor 520, the transceiver 510 and the memory can communicate with each other through internal connection paths to transmit control and/or data signals.
  • the memory is used to store computer programs, and the processor 520 is used to execute the computer program in the memory to control the transceiver 510 to send and receive signals.
  • the communication device 500 shown in Figure 5 can implement the processes involving the central node in the method embodiments shown in Figures 2 and 3.
  • the operations and/or functions of each module in the central node 500 are respectively intended to implement the corresponding processes in the above method embodiments.
  • the above-mentioned processor 520 and the memory can be combined into one processing device, and the processor 520 is used to execute the program code stored in the memory to implement the above functions.
  • the memory may also be integrated in the processor 520 or independent of the processor 520 .
  • the processor 520 may correspond to the processing unit in FIG. 4 .
  • the above-mentioned transceiver 510 may correspond to the transceiver unit in FIG. 4 .
  • the transceiver 510 may include a receiver (or receiving circuit) and a transmitter (or transmitting circuit), where the receiver is used to receive signals and the transmitter is used to transmit signals.
  • the communication device 500 shown in Figure 5 can implement the processes involving terminal equipment in the method embodiments shown in Figures 2 and 3.
  • the operations and/or functions of each module in the terminal device 500 are respectively intended to implement the corresponding processes in the above method embodiments.
  • An embodiment of the present application also provides a processing device, including a processor and a (communication) interface; the processor is configured to execute the method in any of the above method embodiments.
  • the processing device may be one or more chips.
  • for example, the processing device may be a field programmable gate array (FPGA), an application specific integrated circuit (ASIC), a system on chip (SoC), a central processing unit (CPU), a network processor (NP), a digital signal processor (DSP), a microcontroller unit (MCU), a programmable logic device (PLD), or another integrated chip.
  • the present application also provides a computer program product.
  • the computer program product includes: computer program code.
  • when the computer program code is executed by one or more processors, the device including the processor is caused to execute the methods in the embodiments shown in FIG. 2 and FIG. 3.
  • the technical solutions provided by the embodiments of this application can be implemented in whole or in part through software, hardware, firmware, or any combination thereof.
  • software When implemented using software, it may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions.
  • when the computer program instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the present invention are generated in whole or in part.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, a network device, a terminal device, a core network device, a machine learning device, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (such as infrared, radio, or microwave) manner.
  • the computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media.
  • the available media may be magnetic media (eg, floppy disk, hard disk, tape), optical media (eg, digital video disc (digital video disc, DVD)), or semiconductor media, etc.
  • the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium stores program code; when the program code is run by one or more processors, the device including the processor is caused to perform the methods in the embodiments shown in Figures 2 and 3.
  • this application also provides a system, which includes one or more participating nodes mentioned above.
  • the system may further include one or more central nodes as mentioned above.
  • the disclosed systems, devices and methods can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • multiple units or components may be combined or can be integrated into another system, or some features can be ignored, or not implemented.
  • the coupling or direct coupling or communication connection between each other shown or discussed may be through some interfaces, and the indirect coupling or communication connection of the devices or units may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or they may be distributed to multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.


Abstract

The present application provides an information processing method and a communication apparatus. The method includes: after a participating node performs the t-th model training to obtain first gradient information, the participating node determines first compression kernel information based on the gradient information obtained in previous rounds of model training, where the previous rounds of model training include the t-th model training and at least one model training before the t-th model training, and the gradient information obtained in the previous rounds of model training includes the first gradient information. The participating node sends the first compression kernel information to a central node and then receives first information from the central node, where the first information is used to indicate first global compression kernel information, and the first global compression kernel information is used to compress gradient information obtained after model training. This can improve the compression rate of the gradient information and improve the transmission efficiency of the gradient information.

Description

信息处理方法和通信装置
本申请要求于2022年03月25日提交中国专利局、申请号为202210303188.1、申请名称为“信息处理方法和通信装置”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及通信领域,并且更具体地,涉及一种信息处理方法和通信装置。
背景技术
人工智能(artificial intelligence,AI)是未来无线通信网络(如物联网)中的一类非常重要的应用。其中,联邦学习(federated learning,FL)是一种分布式智能模型训练方法,由中心节点为联邦学习的多个参与节点提供智能模型的模型参数,由多个参与节点各自基于各自的数据集执行智能模型训练后将损失函数的梯度信息反馈给中心节点,中心节点基于多个参与节点反馈的梯度信息更新模型参数。能够解决集中式模型训练时收集数据导致的耗时和通信开销问题。同时,由于不需要传输训练数据,也能够减少隐私安全问题。
为了减小联邦学习中损失函数的梯度向量传输开销,需要对梯度信息进行压缩后传输,以降低通信成本。梯度稀疏化是一种通过只发送超过一定阈值的梯度向量元素的压缩方法,但由于梯度稀疏化仅选择了梯度向量中的部分元素,这使得每次训练得到的梯度向量均不能够完整地被传输,导致最终训练得到的智能模型的推理性能较差。目前,还缺少有效的梯度压缩机制。
发明内容
本申请提供了一种信息处理方法和通信装置,提高梯度信息的压缩率,提高梯度信息的传输效率。
第一方面,提供了一种信息处理方法,该方法包括:执行第t次模型训练,得到第一梯度信息,t为大于1的整数。以及,基于历次模型训练得到的梯度信息,确定第一压缩内核信息,该历次模型训练包括该第t次模型训练和该第t次模型训练之前的至少一次模型训练,该历次模型训练得到的梯度信息包括该第一梯度信息。向中心节点发送该第一压缩内核信息后,再接收来自该中心节点的第一信息,该第一信息用于指示第一全局压缩内核信息,该第一全局压缩内核信息用于对模型训练后得到的梯度信息进行压缩。
根据上述方案,利用模型训练得到的梯度信息之间的相关性,参与节点基于模型训练得到的梯度信息,随着模型训练动态跟踪学习压缩内核信息,并反馈给中心节点,以便中心节点基于多个参与节点提供的压缩内核信息,得到全局压缩内核信息并通知参与节点,参与节点基于全局压缩内核信息对梯度信息进行压缩。能够在减小信息压缩对训练结果产生的不利影响的基础上,提高梯度信息的压缩率,提高梯度信息的传输效率。
结合第一方面,在第一方面的某些实施方式中,该基于历次模型训练得到的梯度信息,确定第一压缩内核信息,包括:基于该第一梯度信息和第二中心向量信息,确定第一中心向量信息,该第一中心向量信息用于表征该历次模型训练得到的梯度信息的均值向量,该 第一压缩内核信息包括该第一中心向量信息,该第二中心向量信息是第t-1次模型训练后确定的中心向量信息。
根据上述方案,参与节点通过第t次模型训练后得到的梯度信息和第t-1次模型训练后确定的第二中心向量信息,确定第t次模型训练后的中心向量信息(即第一中心向量信息),能够避免每次模型训练后使用历次模型训练得到的所有梯度信息在每次模型训练后进行迭代计算,能够减小存储开销,以及提高信息处理的效率。
结合第一方面,在第一方面的某些实施方式中,该基于历次模型训练得到的梯度信息,确定第一压缩内核信息,还包括:获取该第一梯度信息与第一中心向量信息之间的偏差量的协方差矩阵信息;基于该协方差矩阵信息和第二协方差均值矩阵信息,确定第一协方差均值矩阵信息,该第一协方差均值矩阵信息用于表征该历次模型训练后获取到的偏差量的协方差矩阵信息的均值矩阵,该第二协方差均值信息为第t-1次模型训练后确定的协方差均值矩阵信息;获取该第一协方差均值矩阵信息的第一主特征矩阵信息,该第一主特征矩阵信息用于表征该第一协方差均值矩阵信息的M个特征值对应的特征向量,该第一压缩内核信息包括该第一主特征矩阵信息,M为大于1的整数。
根据上述方案,参与节点基于第t次模型训练后得到的梯度信息和前一次模型训练后得到的协方差均值矩阵信息,确定本次模型训练后的协方差均值矩阵信息,能够避免每次模型训练后使用历史模型训练得到的所有梯度信息在每次模型训练后进行迭代计算,能够减小存储开销,以及提高信息处理的效率
结合第一方面,在第一方面的某些实施方式中,该方法还包括:基于该第一全局压缩内核信息对第二梯度信息进行压缩,得到第二压缩梯度信息,该第二梯度信息为第t+i次模型训练后得到的梯度信息,i为正整数;发送该第二压缩梯度信息。
结合第一方面,在第一方面的某些实施方式中,该第一全局压缩内核信息包括全局中心向量信息和全局主特征矩阵信息,该基于该第一全局压缩内核信息对该第二梯度信息进行压缩,得到第二压缩梯度信息,包括:基于该第二梯度信息与该全局中心向量信息之间的偏差量以及该全局主特征矩阵信息,得到该第二压缩梯度信息。
根据上述方案,基于第一全局压缩梯度信息采用PCA对模型训练后得到的梯度信息进行压缩,能够在减小信息压缩对训练结果产生的不利影响的基础上,提高梯度信息的压缩率,提高梯度信息的传输效率。
结合第一方面,在第一方面的某些实施方式中,在接收来自该中心节点的该第一信息之前,该方法还包括:基于第二全局压缩内核信息对该第一梯度信息进行压缩,得到第一压缩梯度信息,该第二全局压缩内核信息是从中心节点获取到的。
结合第一方面,在第一方面的某些实施方式中,该向中心节点发送该第一压缩内核信息,包括:在第一资源上向该中心节点发送该第一压缩内核信息。
结合第一方面,在第一方面的某些实施方式中,该向中心节点发送该第一压缩内核信息,包括:当t/T1为整数时,向该中心节点发送第一压缩内核信息,其中,T1为大于1的整数。
根据上述方案,参与节点在每次模型训练后更新第一压缩内核信息,且每T1次更新压缩内核信息后,向中心节点发送最近一次更新后的压缩内核信息,能够减小每次更新压缩内核信息后均发送给中心节点带来的传输开销。
第二方面,提供了一种信息处理方法,该方法包括:首先,接收来自多个参与节点的 第一压缩内核信息,得到第一全局压缩内核信息。其次,发送第一信息,该第一信息用于指示第一全局压缩内核信息,该第一全局压缩内核信息用于对模型训练后得到的梯度信息进行压缩。
结合第二方面,在第二方面的某些实施方式中,该接收来自多个参与节点的第一压缩内核信息,得到第一全局压缩内核信息,包括:在第一资源上接收来自该多个参与节点的第一压缩内核信息,得到聚合压缩内核信息,该聚合压缩内核信息是该多个参与节点的第一压缩内核信息在该第一资源上叠加后的压缩内核信息。以及,根据该聚合压缩内核信息,得到该第一全局压缩内核信息。
结合第二方面,在第二方面的某些实施方式中,该方法还包括:接收来自多个参与节点的压缩梯度信息。以及,根据该多个参数节点的压缩梯度信息,得到压缩梯度均值信息,该压缩梯度均值信息用于表征该多个参数节点的压缩梯度信息的均值向量。再基于该第一全局压缩内核信息,对该压缩梯度均值信息解压缩,得到解压后的梯度均值信息。最后,基于该梯度均值信息,更新智能模型的模型参数。
结合第二方面,在第二方面的某些实施方式中,该第一压缩内核信息包括第一中心向量信息,该第一全局压缩内核信息包括全局中心向量信息,该全局中心向量信息用于表征该多个参与节点的该第一中心向量信息的均值向量。该第一压缩内核信息包括第一主特征矩阵信息,该第一全局压缩内核信息包括全局主特征矩阵信息,该全局主特征矩阵信息用于表征M个全局主特征向量,M为大于1的整数。
结合第二方面,在第二方面的某些实施方式中,该基于该第一全局压缩内核信息,对该压缩梯度均值信息解压缩,得到解压后的梯度均值信息,包括:基于该全局中心向量信息和该全局主特征矩阵信息,对该压缩梯度均值信息解压缩,得到压缩后的梯度均值信息。
第三方面,提供了一种通信装置,包括:处理单元,用于执行第t次模型训练,得到第一梯度信息,t为大于1的整数;该处理单元还用于基于历次模型训练得到的梯度信息,确定第一压缩内核信息,该历次模型训练包括该第t次模型训练和该第t次模型训练之前的至少一次模型训练,该历次模型训练得到的梯度信息包括该第一梯度信息;收发单元,用于向中心节点发送该第一压缩内核信息;该收发单元还用于接收来自该中心节点的第一信息,该第一信息用于指示第一全局压缩内核信息,该第一全局压缩内核信息用于对模型训练后得到的梯度信息进行压缩。
结合第三方面,在第三方面的某些实施方式中,该处理单元具体用于基于该第一梯度信息和第二中心向量信息,确定第一中心向量信息,该第一中心向量信息用于表征该历次模型训练得到的梯度信息的均值向量,该第一压缩内核信息包括该第一中心向量信息,该第二中心向量信息是第t-1次模型训练后确定的中心向量信息。
结合第三方面,在第三方面的某些实施方式中,该处理单元具体用于获取该第一梯度信息与第一中心向量信息之间的偏差量的协方差矩阵信息。以及,基于该协方差矩阵信息和第二协方差均值矩阵信息,确定第一协方差均值矩阵信息,该第一协方差均值矩阵信息用于表征该历次模型训练后获取到的偏差量的协方差矩阵信息的均值矩阵,该第二协方差均值信息为第t-1次模型训练后确定的协方差均值矩阵信息。再获取该第一协方差均值矩阵信息的第一主特征矩阵信息,该第一主特征矩阵信息用于表征该第一协方差均值矩阵信息的M个特征值对应的特征向量,该第一压缩内核信息包括该第一主特征矩阵信息,M为大于1的整数。
结合第三方面,在第三方面的某些实施方式中,该处理单元还用于基于该第一全局压缩内核信息对第二梯度信息进行压缩,得到第二压缩梯度信息,该第二梯度信息为第t+i次模型训练后得到的梯度信息,i为正整数。该收发单元还用于发送该第二压缩梯度信息。
结合第三方面,在第三方面的某些实施方式中,该第一全局压缩内核信息包括全局中心向量信息和全局主特征矩阵信息,该处理单元还用于基于该第二梯度信息与该全局中心向量信息之间的偏差量以及该全局主特征矩阵信息,得到该第二压缩梯度信息。
结合第三方面,在第三方面的某些实施方式中,该处理单元还用于在接收到来自该中心节点的该第一信息之前,基于第二全局压缩内核信息对该第一梯度信息进行压缩,得到第一压缩梯度信息,该第二全局压缩内核信息是从中心节点获取到的。
结合第三方面,在第三方面的某些实施方式中,该收发单元具体用于在第一资源上向该中心节点发送该第一压缩内核信息。
结合第三方面,在第三方面的某些实施方式中,该收发单元具体用于在t/T1为整数的情况下,向该中心节点发送第一压缩内核信息,其中,T1为大于1的整数。
第四方面,提供了一种通信装置,包括:处理单元,用于根据接收到的来自多个参与节点的第一压缩内核信息,得到第一全局压缩内核信息。该收发单元还用于发送第一信息,该第一信息用于指示第一全局压缩内核信息,该第一全局压缩内核信息用于对模型训练后得到的梯度信息进行压缩。
结合第四方面,在第四方面的某些实施方式中,该收发单元还用于在第一资源上接收来自该多个参与节点的第一压缩内核信息,得到聚合压缩内核信息,该聚合压缩内核信息是该多个参与节点的第一压缩内核信息在该第一资源上叠加后的压缩内核信息。该处理单元具体用于根据该聚合压缩内核信息,得到该第一全局压缩内核信息。
结合第四方面,在第四方面的某些实施方式中,该收发单元还用于接收来自多个参与节点的压缩梯度信息;以及,该处理单元还用于根据该多个参数节点的压缩梯度信息,得到压缩梯度均值信息,该压缩梯度均值信息用于表征该多个参数节点的压缩梯度信息的均值向量。以及,该处理单元还用于基于该第一全局压缩内核信息,对该压缩梯度均值信息解压缩,得到解压后的梯度均值信息,再基于该梯度均值信息,更新智能模型的模型参数。
结合第四方面,在第四方面的某些实施方式中,该第一压缩内核信息包括第一中心向量信息,该第一全局压缩内核信息包括全局中心向量信息,该全局中心向量信息用于表征该多个参与节点的该第一中心向量信息的均值向量。该第一压缩内核信息包括第一主特征矩阵信息,该第一全局压缩内核信息包括全局主特征矩阵信息,该全局主特征矩阵信息用于表征M个全局主特征向量,M为大于1的整数。
结合第四方面,在第四方面的某些实施方式中,该处理单元具体用于基于该全局中心向量信息和该全局主特征矩阵信息,对该压缩梯度均值信息解压缩,得到压缩后的梯度均值信息。
第五方面,提供了一种通信装置,包括处理器。该处理器可以实现上述第一方面以及第一方面中任一种可能实现方式中的方法,或实现上述第二方面以及第二方面中任一种可能实现方式中的方法。
可选地,该通信装置还包括存储器,该处理器与该存储器耦合,可用于执行存储器中的指令,以实现上述第一方面以及第一方面中任一种可能实现方式中的方法,或实现上述第二方面以及第二方面中任一种可能实现方式中的方法。
可选地,该通信装置还包括通信接口,处理器与通信接口耦合。本申请实施例中,通信接口可以是收发器、管脚、电路、总线、模块或其它类型的通信接口,本申请对此不作限定。
在一种实现方式中,该通信装置为通信设备。当该通信装置为通信设备时,该通信接口可以是收发器,或,输入/输出接口。
在另一种实现方式中,该通信装置为配置于通信设备中的芯片。当该通信装置为配置于通信设备中的芯片时,该通信接口可以是输入/输出接口,该处理器可以是逻辑电路。
可选地,该收发器可以为收发电路。可选地,该输入/输出接口可以为输入/输出电路。
第六方面,提供了一种处理器,包括:输入电路、输出电路和处理电路。该处理电路用于通过该输入电路接收信号,并通过该输出电路发射信号,使得该处理器执行第一方面以及第一方面中任一种可能实现方式中的方法。
在具体实现过程中,上述处理器可以为一个或多个芯片,输入电路可以为输入管脚,输出电路可以为输出管脚,处理电路可以为晶体管、门电路、触发器和各种逻辑电路等。输入电路所接收的输入的信号可以是由例如但不限于接收器接收并输入的,输出电路所输出的信号可以是例如但不限于输出给发射器并由发射器发射的,且输入电路和输出电路可以是同一电路,该电路在不同的时刻分别用作输入电路和输出电路。本申请实施例对处理器及各种电路的具体实现方式不做限定。
第七方面,提供了一种计算机程序产品,该计算机程序产品包括:计算机程序(也可以称为代码,或指令),当该计算机程序被运行时,使得计算机执行上述第一方面以及第一方面中任一种可能实现方式中的方法,或实现上述第二方面以及第二方面中任一种可能实现方式中的方法。
第八方面,提供了一种计算机可读存储介质,该计算机可读存储介质存储有计算机程序(也可以称为代码,或指令)当其在计算机上运行时,使得计算机执行上述第一方面以及第一方面中任一种可能实现方式中的方法,或实现上述第二方面以及第二方面中任一种可能实现方式中的方法,或实现上述第三方面以及第三方面中任一种可能实现方式中的方法。
第九方面,提供了一种通信系统,包括前述的多个参与节点和至少一个中心节点。
上述第二方面至第九方面中任一方面及其任一方面中任意一种可能的实现可以达到的技术效果,请参照上述第一方面及其第一方面中相应实现可以带来的技术效果描述,这里不再重复赘述。
附图说明
图1是适用于本申请实施例的通信系统的一个示意图;
图2是本申请实施例提供的信息处理方法的一个示意性流程图;
图3是本申请实施例提供的信息处理方法的另一个示意性流程图;
图4是本申请的通信装置的一例的示意性框图;
图5是本申请的通信装置的另一例的示意性结构图。
具体实施方式
在本申请实施例中,“/”可以表示前后关联的对象是一种“或”的关系,例如,A/B可以 表示A或B;“和/或”可以用于描述关联对象存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况,其中A,B可以是单数或者复数。为了便于描述本申请实施例的技术方案,在本申请实施例中,可以采用“第一”、“第二”等字样对功能相同或相似的技术特征进行区分。该“第一”、“第二”等字样并不对数量和执行次序进行限定,并且“第一”、“第二”等字样也并不限定一定不同。在本申请实施例中,“示例性的”或者“例如”等词用于表示例子、例证或说明,被描述为“示例性的”或者“例如”的任何实施例或设计方案不应被解释为比其它实施例或设计方案更优选或更具优势。使用“示例性的”或者“例如”等词旨在以具体方式呈现相关概念,便于理解。
在本申请实施例中,至少一个(种)还可以描述为一个(种)或多个(种),多个(种)可以是两个(种)、三个(种)、四个(种)或者更多个(种),本申请不做限制。
下面将结合附图,对本申请中的技术方案进行描述。
本申请实施例的技术方案可以应用于各种通信系统,例如:长期演进(long term evolution,LTE)系统、LTE频分双工(frequency division duplex,FDD)系统、LTE时分双工(time division duplex,TDD)、第五代(5th generation,5G)通信系统、未来的通信系统(如第六代(6th generation,6G)通信系统)、无线保真系统(wireless fidelity,Wi-Fi)、超宽带(ultra wide band,UWB)系统或者多种通信系统融合的系统等,本申请实施例不做限定。其中,5G还可以称为新无线(new radio,NR)。
图1是适用于本申请实施例的通信系统的示意图。
如图1所示,适用于本申请实施例的通信系统可以包括至少一个中心节点,以及至少一个参与节点,如图1所示的参与节点1、2、N,中心节点可以向各个参与节点提供模型参数,各个参与节点基于中心节点提供的模型参数更新模型后,采用本地数据集分别对更新后的模型进行训练。例如,参与节点1采用本地数据集1对模型进行训练,参与节点2采用本地数据集2对模型进行训练,参与节点N采用本地数据集N对模型进行训练。多个参与节点进行模型训练后向中心节点发送本次训练得到的损失函数的梯度信息。中心节点确定来自多个参与节点的梯度信息的聚合梯度信息,并基于聚合梯度信息确定更新后的模型参数,并通知各个参与节点,由各个参与节点执行下一次模型训练。参与节点可以采用本申请提供的信息处理方法对梯度信息进行压缩后发送给中心节点,中心节点可以采用本申请提供的信息处理方法对接收到的压缩后的梯度信息进行解压缩。
本申请实施例提供的中心节点可以是网络设备,例如,服务器、基站、Wi-Fi系统中的接入点(access point,AP)等。中心节点可以是一种部署在无线接入网中能够与参与节点进行直接或间接通信的设备。
本申请实施例提供的参与节点可以是一种具有收发功能的设备,如终端、终端设备,示例性地,参与节点可以是传感器或具有数据采集功能的设备。参与节点可以被部署在陆地上,包括室内、室外、手持、和/或车载;也可以被部署在水面上(如轮船等);参与节点还可以被部署在空中(例如飞机、气球和卫星上等)。参与节点可以是用户设备(user equipment,UE),UE包括具有无线通信功能的手持式设备、车载设备、可穿戴设备或计算设备。示例性地,UE可以是手机(mobile phone)、平板电脑或带无线收发功能的电脑。终端设备还可以是虚拟现实(virtual reality,VR)终端设备、增强现实(augmented reality,AR)终端设备、工业控制中的无线终端、无人驾驶中的无线终端、远程医疗中的无线终端、智能电网中的无线终端、智慧城市(smart city)中的无线终端、和/或智慧家庭(smart home) 中的无线终端等等。
本申请实施例提供的技术方案可以用于在多种场景中,例如,智能零售、智慧家庭、视频监控(video surveillance)、车辆网(如自动驾驶、无人驾驶等)、以及工业无线传感器网络(industrial wireless sens or network,IWSN)等。但本申请不限于此。
在一种实施方式中,本申请提供的技术方案可以应用于智能家庭,实现基于客户需求为客户提供个性化服务。中心节点可以是基站或服务器,参与节点可以是设置在各个家庭中的客户端设备。基于本申请提供的技术方案,客户端设备仅向服务器提供基于本地数据进行模型训练后通过路由器将合成梯度信息,能够在保护客户数据隐私的同时与服务器共享训练结果信息。服务器获取多个客户端设备提供的合成梯度信息的聚合梯度信息,确定更新后的模型参数并通知各个客户端设备,继续智能模型的训练,完成模型训练后客户端设备应用训练后的模型为客户提供个性化服务。
在另一种实施方式中,本申请提供的技术方案可以应用于工业无线传感器网络,实现工业智能化。中心节点可以是服务器,参与节点可以是工厂内的多个传感器(例如,可移动智能机器人等),传感器基于本地数据进行模型训练后向服务器发送合成梯度信息,并由服务器获基于传感器提供的合成梯度信息的聚合梯度信息,确定更新后的模型参数并通知各个传感器,继续智能模型的训练,完成模型训练后传感器应用训练后的模型为执行工厂任务,例如,传感器为可移动智能机器人,可以基于训练后的模型获取移动路线,完成工厂搬运任务、快递分拣任务等。
为了更好地理解本申请实施例,下面对本文中涉及到的术语做简单说明。
1、人工智能AI
人工智能AI是让机器具有学习能力,能够积累经验,从而能够解决人类通过经验可以解决的诸如自然语言理解、图像识别和/或下棋等问题。
2、训练(training)或学习
训练是指对模型(或称为训练模型)的处理过程。在该处理过程中通过优化该模型中的参数,如加权值,使该模型学会执行某项特定的任务。本申请实施例适用于但不限于以下一种或多种训练方法:监督学习、无监督学习、强化学习、和迁移学习等。有监督学习是利用一组具有已经打好正确标签的训练样本来训练。其中,已经打好正确标签是指每个样本有一个期望的输出值。与有监督学习不同,无监督学习是指一种方法,该方法没有给定事先标记过的训练样本,自动对输入的数据进行分类或分群。
3、推理
推理是指利用训练后的模型(训练后的模型可以称为推理模型)执行数据处理。将实际数据输入推理模型进行处理,得到对应的推理结果。推理还可以称为预测或决策,推理结果还可以称为预测结果、或决策结果等。
4、联邦学习(federated learning)
一种分布式AI训练方法,将AI算法的训练过程放在多个设备上进行,而不是聚合到一个服务器上,能够解决集中式AI训练时收集数据导致的耗时和大量通信开销问题。同时,由于不用将设备数据发送到服务器,也能够减少隐私安全问题。联邦学习的过程可以包括但不限于:中心节点向多个参与节点发送AI模型的模型参数,参与节点基于参与节点的本地数据进行AI模型训练,并在模型训练后获取损失函数的梯度信息,发送中心节点。中心节点对多个参与节点反馈的梯度信息,基于多个参与节点反馈的梯度信 息,更新AI模型的参数。中心节点可以将AI模型的更新后的参数发送给多个参与节点,参与节点再次执行对AI模型的训练。不同次联邦学习过程中,中心节点选择的参与节点可能相同,也可能不同,本申请对此不做限定。
在联邦学习中,参与节点反馈梯度信息的开销较大,需要压缩后传输以降低通信成本。然而,目前的压缩方式并不适合联邦学习,如梯度稀疏化由于梯度稀疏化仅选择了梯度向量中的部分元素,这将使得训练结果产生偏差。
基于主成分分析(principal component analysis,PCA)是一种统计方法。通过正交变换将一组存在相关性的变量转换为一组线性不相关的变量,转换后的这组变量称为主成分。具体地,对于一组存在的潜在相关性的数据中,PCA将维度为N的数据集中的每个高维数据点dl∈RN投影到M个主成分(principal component,PC)上,以获得RM中的低维表示(M<<N),PCA捕获了数据中最重要的子空间,其中,Rx表示维度为x的实数向量集合。传统的PCA通常通过对数据样本的奇异值分解(singular value decomposition,SVD)来实现。具体来说,PCA要求计算数据集中样本数据的样本均值μ,以及由数据D构建的协方差矩阵的前M个特征向量U。因此,给定数据集D中的数据点dl∈RN,可以通过sl=UT(dl-μ)∈RM(M<<N)有效地对数据压缩,以及,可以通过进行解压缩。
在联邦学习中,每个参与节点的模型训练后得到的梯度信息之间具有较强的相关性,本申请提出参与节点可以随着模型训练得到的梯度信息动态跟踪学习压缩内核信息,并反馈给中心节点,中心节点可以基于多个参与节点提供的压缩内核信息,得到全局压缩内核信息并通知参与节点,参与节点基于全局压缩内核信息利用PCA对梯度信息进行压缩。提高梯度信息的压缩率,并且能够对完整的梯度信息进行压缩,减小信息压缩对训练结果产生的不利影响,提高梯度信息的传输效率。
下面结合附图对本申请实施例提供的智能模型的训练方法进行说明。
图2是本申请实施例提供的信息处理方法的一个示意性流程图。中心节点与多个参与节点联合执行智能模型的训练,图2所示的信息处理方法由多个参与节点中的一个参与节点k执行。
S201,参与节点执行第t次模型训练,得到第一梯度信息。
其中,t为大于1的整数。参与节点k基于中心节点最近一次更新的智能模型的模型权重,更新智能模型,并对更新后的智能模型执行第t次模型训练,得到第一梯度信息。
一种实施方式中,该第一梯度信息为第t次模型训练后得到的损失函数的梯度信息其中,θ为联邦学习的目标模型参数,θ为实数向量。
另一种实施方式中,该第一梯度信息包括第t次模型训练后得到的损失函数的梯度信息和第t次模型训练之前该参与节点k未发送给中心节点的残差梯度信息。第一梯度信息可以表示如下:
其中,为第t次模型训练之前该参与节点未发送给中心节点的残差梯度信息。 是组合权重,ηt是第t次模型训练的学习率。由于压缩处理会使得参与节点k反馈给中心节点的梯度信息存在损失,影响联邦学习的训练收敛速度,因此,本申请提出可以通过在反馈的梯度信息中包含第t次模型训练之前未反馈给中心节点的残差梯度信息的方式,可以提高联邦学习训练的收敛速度。
需要说明的是,下文中以第一梯度信息为为例说明本申请提供的信息处理方法,应理解,本申请并不限于此,第一梯度信息可以是参与节点k在执行第t次模型训练后待压缩的梯度信息,如第一梯度信息还可以是上述当该第一梯度信息为时,在具体实施中可以将下文中替换为为了简要,在此不再赘述。
参与节点k得到第一梯度信息后,基于由中心节点最近一次(如第l次)更新的全局压缩内核信息(记作第二全局压缩内核信息)对第一梯度信息进行压缩,得到第一压缩梯度信息。该第二全局压缩内核信息是中心节点发送给参与节点的。参与节点通过图2所示实施例提供的信息处理方法为中心节点提供压缩内核信息,中心节点可以基于获取到的来自多个参与节点的压缩内核信息更新全局压缩内核信息,并发送给参与节点,用于参与节点压缩梯度信息。具体中心节点得到全局压缩内核信息的方式可以参见下文对图3所示实施例的描述。
可选地,第二全局压缩内核信息包括第二全局中心向量信息μl和第二全局主特征矩阵信息,该第二全局主特征信息用于表征Mθ个全局主特征向量。参与节点k可以基于第一梯度信息与第二全局中心向量信息μl之间的偏差量以及第二全局主特征矩阵信息Ul,得到第一压缩梯度信息。
例如,参与节点k可以将第一梯度信息与第二全局中心向量信息μl之间的偏差量投影至(Ul)T上,得到第一压缩梯度信息 即:
其中,(x)T表示x的转置。Rx×y表示维度为x×y的实数矩阵集合。
参与节点k得到第一压缩梯度信息后,参与节点k可以向中心节点发送第一压缩梯度信息。具体地,参与节点k可以获取传输系数并使用传输系数传输参与节点k向中心节点发送传输系数加权后的第一压缩梯度信息,即
一种实施方式中,参与节点k可以基于传输功率增益ρ,ρ∈R,和第t次模型训练对应的参与节点k与中心节点之间的信道衰落确定传输系数满足:
另一种实施方式中,参与节点可以基于截断阈值γ与信道衰落之间的关系,确定传输系数的取值,从而确定是否向中心节点发送加权后的第一压缩梯度信息,其中,γ∈R。如满足:
其中,当时,传输系数为参与节点k向中心节点发送传输系数加权后的第一压缩梯度信息,即否则,即当时,传输系数为0,参与节点k不向中心节点发送第一压缩梯度信息。
也就是说,该实施方式中,参与节点k可以基于信道条件判断是否向中心节点发送第一压缩梯度信息,当信道条件不满足截断阈值时,参与节点k不向中心节点发送第一梯度信息。能够减小因信道条件较差使得梯度信息传输错误或传输失败的概率,能够减小参与节点的功率消耗。参与节点k更新残差梯度信息 为第t+1次模型训练之前未发 送给中心节点的残差梯度信息。
其中,表示对事件A的指示函数(indicator function),当事件A发生时取值为1,当事件A不发生时取值为0。
也就是说,当时,参与节点k向中心节点发送了加权后的第一压缩梯度信息,则
不满足时,参与节点k不向中心节点发送第一压缩梯度信息,是第一梯度信息与第二全局中心向量信息μl之间的偏差量,即满足:
中心节点可以获取多个参与节点在第t次模型训练后发送的第一压缩梯度信息,以及基于第二全局压缩内核信息中的第二全局中心向量信息μl和第二全局主特征矩阵信息Ul进行信息解压缩,得到梯度均值信息,中心节点再基于梯度均值信息,确定智能模型的模型参数θt。中心节点将更新后的模型参数θt发送给K个参与节点,K个参与节点基于θt更新智能模型的模型参数后,对该更新模型参数后的智能模型执行t+1次模型训练。
具体中心节点接收来自参与节点的第t次模型训练后发送的压缩梯度信息并基于第二全局压缩内核信息对信息解压缩得到压缩均值信息的方式,可以参考图3所示实施例中S307中,中心节点接收来自参与节点的第t+i次模型训练后发送的压缩梯度信息,以及基于第一全局压缩内核信息对信息解压缩得到压缩均值信息的实施方式,如将S307中的各表达式中的t+i替换为t,将l+1替换为l即可,为了简要,在此不再赘述。
参与节点k在第t次模型训练之后除了以上描述的对第一梯度信息进行压缩后发送给中心节点以外,在S202中基于第一梯度信息更新压缩内核信息。
S202,参与节点基于历次模型训练后得到的梯度信息,确定第一压缩内核信息。
其中,历次模型训练包括该第t次模型训练和该第t次模型训练之前的至少一次模型训练。以下以本申请提供的优选方案,即历次模型训练包括该第t次模型训练以及该第t次模型训练之前的t-1次模型训练,共t次模型训练为例进行说明,应理解,本申请并不限于此,可以基于下文中的描述,在具体实施中,以参与节点选择该第t次模型训练和该第t次模型训练之前的至少一次模型训练得到的梯度信息确定压缩内核信息进行实施。
该第一压缩内核信息可以包括第一中心向量信息和/或第一主特征矩阵信息。
可选地,该第一压缩内核信息可以包括第一中心向量信息,其中,第一中心向量信息用于表征包括历次模型训练得到的梯度信息的均值向量。
参与节点k在第t次模型训练后,基于前t次模型训练后得到的t个梯度信息,得到该t个梯度信息的均值,即第一中心向量信息。
一种实施方式中,参与节点k可以基于t次模型训练得到的t个梯度信息得到该第一中心向量信息如第一中心向量信息满足下式:
另一种实施方式中,参与节点k基于第一梯度信息和第二中心向量信息,确定第一中心向量信息,其中,第二中心向量信息是第t-1次模型训练后确定的中心向量信息。
例如,参与节点k可以基于第t次模型训练后得到的第一梯度信息和第t-1次模型训练后确定的第二中心向量信息通过下式得到第一中心向量信息
在本申请中,参与节点k执行第一次模型训练后,中心向量信息为第一次模型训练得到的梯度信息
在该实施方式中,参与节点k通过第t次模型训练后得到的梯度信息和第t-1次模型训练后确定的第二中心向量信息,确定第t次模型训练后的中心向量信息(即第一中心向量信息),能够避免每次模型训练后使用历次模型训练得到的所有梯度信息在每次模型训练后进行迭代计算,能够减小存储开销,以及提高信息处理的效率。
可选地,第一压缩内核信息包括第一协方差均值矩阵信息的第一主特征矩阵信息。其中,第一协方差均值矩阵信息用于表征历次模型训练获取到的梯度信息与中心向量信息之间的偏差量的协方差矩阵的均值矩阵。
参与节点k在第t次模型训练后,基于前t次模型训练后得到的t个梯度信息,得到第一协方差均值矩阵信息。
一种实施方式中,参与节点k可以基于t次模型训练得到的t个梯度信息得到第一协方差均值矩阵信息 满足下式:
另一种实施方式中,参与节点k获取第一梯度信息与第一中心向量信息的偏差量和第二协方差均值矩阵信息,确定第一协方差均值矩阵信息,其中,第二协方差均值矩阵信息是第t-1次模型训练后获取到的协方差均值矩阵信息。
例如,参与节点k获取第一梯度信息与第一中心向量信息的偏差量的协方差矩阵信息 满足下式:
其中,是维度为Nθ×Nθ的矩阵,即
参与节点k再基于协方差矩阵信息和第二协方差均值矩阵信息确定第一协方差均值矩阵信息 满足:
在本申请中,参与节点k执行第一次模型训练后得到的协方差均值矩阵信息为梯度信息与中心向量信息的偏差量的协方差矩阵信息
在该实施方式中,参与节点k基于本次模型训练后得到的梯度信息和前一次模型训练后得到的协方差均值矩阵信息,确定本次的协方差均值矩阵信息,能够避免每次模型训练后使用历史模型训练得到的所有梯度信息在每次模型训练后进行迭代计算,能够减小存储开销,以及提高信息处理的效率。
前文中,第一中心向量信息和第一协方差均值矩阵信息的计算使用了求平均的方法得到,但本申请不限于此,第一中心向量信息和第一协方差均值矩阵信息也可以用更通用的方式表达,如:

其中,β、δ为加权系数,β、δ均为小于1的浮点数。
参与节点k确定第一协方差均值矩阵信息后,利用特征值分解(eigenvalue decomposition,EVD)对该第一协方差均值矩阵信息进行分解,可以得到该第一协方差均 值矩阵信息的第一主特征矩阵信息即,
其中,包括的Mθ个特征值对应的Mθ个特征向量,即可选地,包括的幅度最大的Mθ个特征值对应的Mθ个特征向量,即包括的Mθ个主特征值对应的Mθ个主特征向量。
S203,参与节点向中心节点发送第一压缩内核信息。
相应地,中心节点接收来自参与节点的第一压缩内核信息。
可选地,中心节点可以为参与节点配置发送第一压缩内核信息的第一资源,该参与节点在第一资源上向中心节点发送该第一压缩内核信息。
例如,中心节点可以配置多个参与节点均在第一资源上发送第一压缩内核信息,使得多个参与节点发送的压缩内核信息可以在第一资源上实现空中叠加,中心节点在第一资源上可以接收到叠加的压缩内核信息。
一种实施方式中,参与节点以T1为周期,周期性地向中心节点发送压缩内核信息,T1为大于1的整数。因此,参与节点向中心节点发送第一压缩内核信息,包括:当t/T1为整数时,参与节点向中心节点发送第一压缩内核信息。
参与节点可以在每次模型训练后采用如S202中描述的方式,更新压缩内核信息。在每T1次更新压缩内核信息后,参与节点向中心节点发送最近一次更新后的压缩内核信息,以便中心节点基于接收到的多个参与节点的第一压缩内核信息,确定全局压缩内核信息,并发送给参与节点,使得参与节点可以基于全局压缩内核信息对待发送的梯度信息进行压缩。能够提高梯度信息的压缩率。
另一种实施方式中,参与节点每更新一次压缩内核信息,向中心节点发送一次更新后的压缩内核信息,以便中心节点基于接收到的多个参与节点的第一压缩内核信息,确定全局压缩内核信息,并发送给参与节点,使得参与节点可以基于全局压缩内核信息对待发送的梯度信息进行压缩。能够提高梯度信息的压缩率。
中心节点确定第一全局压缩内核信息的具体方式可以参见图3所示实施例中的描述。
参与节点k得到第一压缩内核信息后,参与节点k可以向中心节点发送第一压缩内核信息。具体地,参与节点k可以获取传输系数,使用传输系数传输第一压缩内核信息。
若第一压缩内核信息包括第一中心向量信息参与节点k向中心节点发送传输系数加权后的第一中心向量信息,即 满足:
其中,是第t次更新后的中心向量信息对应的参与节点k与中心节点之间的信道衰落。
或者,参与节点k可以基于截断阈值γ与信道衰落之间的关系,确定传输系数的取值,从而确定是否向中心节点发送加权后的第一中心向量信息,满足:
其中,当时,参与节点k向中心节点发送传输系数加权后的第一中心向量信息,否则,不发送第一中心向量信息。
若第一压缩内核信息包括第一主特征矩阵信息参与节点k向中心节点发送传输系数加权后的第一主特征矩阵信息,即 满足:
其中,是第t次更新后的主特征矩阵信息对应的参与节点k与中心节点之间的信道衰落。
或者,参与节点k可以基于截断阈值γ与信道衰落之间的关系,确定传输系数的取值,从而确定是否向中心节点发送加权后的第一主特征矩阵信息,满足:
其中,当时,参与节点k向中心节点发送传输系数加权后的第一主特征矩阵信息,否则,不发送第一主特征矩阵信息。
S204,参与节点接收来自中心节点的第一信息,该第一信息用于指示第一全局压缩内核信息,该第一全局压缩内核信息用于对模型训练后得到的梯度信息进行压缩。
参与节点接收到来自中心节点的第一信息后,基于第一全局压缩内核信息对待发送的梯度信息进行压缩得到压缩后的梯度信息。上述用于压缩第一梯度信息的第二全局压缩内核信息为中心节点第l次更新的全局压缩内核信息,则该第一全局压缩内核信息为中心节点第l+1次更新的全局压缩内核信息。该第一全局压缩内核信息包括第一全局中心向量信息μl+1以及第一全局主特征矩阵信息Ul+1
示例性地,参与节点k在接收到第一信息,确定中心节点将全局压缩内核更新为第一全局压缩内核信息后,参与节点k执行第t+i次模型训练,得到第二梯度信息i为正整数。参与节点k基于该第一全局压缩内核信息对第二梯度信息进行压缩,得到第二压缩梯度信息 满足:
以及,参与节点k在得到第二梯度信息后,基于第二梯度信息更新压缩内核信息(即t+i次更新压缩内核信息)。该压缩内核信息包括第一中心向量信息和/或第一主特征矩阵信息具体可以参考前文中第t次模型训练后更新压缩内核信息的描述,为了简要在此不再赘述。
一种实施方式中,参与节点以T1为周期周期性地更新压缩内核信息,则参与节点在得到t+i次更新的压缩内核信息后,若(t+i)/T1不是整数,则参与节点不向中心节点发送该t+i次更新的压缩内核信息;若(t+i)/T1是整数,则参与节点向中心节点发送该t+i次更新的压缩内核信息。
另一种实施方式中,参与节点每更新一次压缩内核信息,向中心节点发送一次更新后的压缩内核信息,则参与节点向中心节点发送该压缩内核信息。
根据本申请提供的上述方案,利用模型训练得到的梯度信息之间的相关性,提出了适用于联邦学习的梯度信息压缩方法。参与节点随着模型训练动态跟踪学习压缩内核信息,并反馈给中心节点,以便中心节点基于多个参与节点提供的压缩内核信息,得到全局压缩内核信息并通知参与节点,参与节点基于全局压缩内核信息利用PCA对梯度信息进行压缩。能够在减小信息压缩对训练结果产生的不利影响的基础上,提高梯度信息的压缩率, 提高梯度信息的传输效率。
以上结合图2示例性地介绍了本申请提供的参与节点随着模型训练动态跟踪学习压缩内核信息,并反馈给中心节点,从而获取来自中心节点的用于压缩梯度信息的全局压缩内核信息的信息处理方法。下面结合图3介绍本申请提供的中心节点基于来自参与节点的压缩内核信息,获取全局压缩内核信息的方式。
图3是本申请实施例提供的信息处理方法300的示意性流程图。在图3所示的信息处理方法300中,中心节点与K个参与节点执行智能模型的联合训练,K为大于或等于2的整数。该K个参与节点包括但不限于图3所示的参与节点1和参与节点2,若该K个参与节点还包括图3为示出的其他参与节点,可以参考图3所示的参与节点进行实施。
S301,参与节点1和参与节点2分别向中心节点发送压缩内核信息。
参与节点1、参与节点2分别执行第t次模型训练后,参与节点1得到梯度信息参与节点2得到梯度信息参与节点1得到梯度信息基于历次模型训练得到的梯度信息,确定压缩内核信息,该压缩内核信息可以包括中心向量信息和/或主特征矩阵信息参与节点2得到梯度信息基于历次模型训练得到的梯度信息,确定压缩内核信息,该压缩内核信息可以包括中心向量信息和/或主特征矩阵信息参与节点确定压缩内核信息的实施方式可以参考图2所示实施例中的描述,为了简要,在此不再赘述。
参与节点可以是在每次更新压缩内核信息后,向中心节点发送压缩内核信息。如参与节点1、参与节点2分别将该第t次模型训练后更新的压缩内核信息发送给中心节点。或者,参与节点以T1为周期,周期性地向中心节点发送压缩内核信息,t/T1为整数,则参与节点1、参与节点2分别向中心节点发送该第t次模型训练后更新的压缩内核信息。本申请对此不作限定。
可选地,参与节点1和参与节点2均在第一资源上向中心节点发送各自得到的压缩内核信息。其中,该第一资源可以是中心节点预配置给参与节点的。
S302,中心节点接收来自多个参与节点的压缩内核信息,得到第一全局压缩内核信息。
其中,该多个参与节点包括参与节点1和参与节点2。
一种实施方式中,多个参与节点均在第一资源上发送各自得到的压缩内核信息,中心节点在该第一资源上接收来自该多个参与节点的压缩内核信息,得到聚合压缩内核信息,该聚合压缩内核信息是该多个参与节点的压缩内核信息在第一资源上叠加后的压缩内核信息。
该多个参与节点均在第一资源上发送压缩内核信息,使得多个参与节点的压缩内核信息在无线信道中(或者说在中心节点的空中接口(air interface)上)叠加,使得中心节点在第一资源上接收到多个压缩内核信息叠加后的聚合压缩内核信息。
中心节点得到聚合压缩内核信息后,根据该聚合压缩内核信息,得到该第一全局压缩内核信息。该第一全局压缩内核信息包括第一全局中心向量信息μl+1和第一全局主特征矩阵信息Ul+1,第一全局压缩内核信息是中心节点第l+1次更新得到的全局压缩内核信息。
若参与节点发送的压缩内核信息包括中心向量信息参与节点得到的聚合压缩内核信息包括聚合中心向量信息yt,μ,聚合中心向量信息yt,μ是多个参与节点的中心向量信息在第一资源上叠加且经历了信道传播后的中心向量信息。
其中,为噪声,Cx表示维度为x的复数向量集合。
中心节点再基于在第一资源上接收到的该聚合中心向量信息yt,μ,确定第一全局中心向量信息μt+1
其中,Re(x)表示取x的实部。
可选地,中心节点与参与节点可以通过信息交互对参与节点是否发送中心向量信息可以达成共识,则中心节点可以确定K个参与节点中发送了中心向量信息的参与节点的个数Kμ,中心节点可以根据Kμ确定第一全局中心向量μl+1,μl+1满足:
一种实施方式中,参与节点在发送压缩内核信息之前,向中心节点发送传输系数信息,该传输系数用于指示传输系数是否为0。则中心节点可以根据K个参与节点的传输系数信息,确定Kμ个参与节点的传输系数不为0,即该Kμ个参与节点发送了中心向量信息。中心节点可以根据Kμ确定第一全局中心向量信息μl+1
另一种实施方式中,信道衰落是基于中心节点与参与节点之间传输的参考信号估计得到的,中心节点与参与节点可以通过信息交互对信道衰落达成共识。参与节点和中心节点均基于截断阈值γ与信道衰落之间的关系,确定传输系数的取值,使得中心节点能够基于每个参与节点的传输系数,确定发送了中心向量信息的参与节点的个数Kμ
例如,在信道具有互易性的情况下,中心节点可以向参与节点发送参考信号,参与节点基于该参考信号估计得到信道衰落,基于信道具有互易性,参与节点可以将该信道衰落作为参与节点可以向中心节点反馈该信道状态信息,用于向中心节点通知该参与节点估计得到的信道衰落使得中心节点与参与节点对信道衰落达成共识。再例如,参与节点可以向中心节点发送参考信号,中心节点基于该参考信号估计得到参与节点至中心节点方向的信道对应的信道衰落中心节点向参与节点发送信道状态信息,用于通知参与节点估计得到的信道衰落使得中心节点与参与节点对信道衰落达成共识。
参与节点和中心节点均基于截断阈值γ与信道衰落之间的关系,确定传输系数的取值。从而中心节点可以基于K个参与节点的传输系数,确定发送了中心向量信息的Kμ个参与节点发送了中心向量信息,其中,
中心节点在基于聚合中心向量信息yt,μ,确定第一全局中心向量信息μl+1,μl+1满足:
以上介绍了参与节点发送的压缩内核信息中包括中心向量信息时,中心节点基于接收到的压缩内核信息,得到全局中心向量信息的方式。但本申请不限于此,参与节点发送的压缩内核信息可以不包括中心向量信息如压缩内核信息仅包括主特征矩阵信息。中心节点可以基于从参与节点k获取到的该参与节点k执行历次模型训练后的梯度信息,确定参与节点k对应的中心向量信息具体中心节点确定中心向量信息的方式,可以参考参考图2所示实施例中参与节点k确定中心向量信息的方式,在此不再赘述。中心节点确定参与节点的中心向量信息后,中心节点可以计算得到多个参与节点的中心向量信息的均值向量,得到该第一全局中心向量信息μl+1。本申请对此不作限定。
若参与节点发送的压缩内核信息包括主特征矩阵信息参与节点得到的聚合压缩内 核信息包括聚合主特征矩阵信息yt,U,聚合主特征矩阵信息yt,U是多个参与节点的主特征矩阵信息在第一资源上叠加且经历了信道传播后的主特征矩阵信息。
其中,为噪声,Cx×y表示维度为x×y的复数矩阵集合。
中心节点在基于聚合主特征矩阵信息yt,U,确定第一全局主特征矩阵信息Ul+1,其中,第一全局主特征矩阵信息是中心节点第l+1次更新得到的全局主特征矩阵信息。
其中,将输入投影到斯蒂弗尔流形(Stiefel manifold)空间(或集合)上,由欧几里得空间的所有正交归一的Mθ维标架(frame)所组成。该第一全局主特征矩阵信息Ul+1用于表征Mθ个全局主特征向量。
可选地,中心节点与参与节点可以通过信息交互对参与节点是否发送主特征矩阵信息可以达成共识,则中心节点可以确定K个参与节点中发送了主特征矩阵信息的参与节点的个数KU,中心节点可以根据KU确定第一全局主特征矩阵信息Ul+1,Ul+1满足:
中心节点确定KU的具体实施方式,可以参考前文中中心节点确定Kμ的实施方式,为了简要在此不再赘述。
以上介绍了参与节点发送的压缩内核信息中包括主特征矩阵信息时,中心节点基于接收到的压缩内核信息,得到全局主特征矩阵信息的方式。但本申请不限于此,参与节点发送的压缩内核信息可以不包括主特征矩阵信息,如压缩内核信息仅包括中心向量信息。中心节点可以基于从参与节点k获取到的该参与节点k执行历次模型训练后的梯度信息,确定参与节点k对应的主特征矩阵信息具体中心节点确定主特征矩阵信息的方式,可以参考图2所示实施例中参与节点k确定主特征矩阵信息的方式,在此不再赘述。中心节点确定每个参与节点的主特征矩阵信息后,中心节点可以计算得到多个参与节点的主特征矩阵信息的均值矩阵,得到该第一全局主特征矩阵信息Ul+1。本申请对此不作限定。
另一种实施方式中,中心节点分别接收来自该多个参与节点的压缩内核信息,中心节点叠加聚合该多个参与节点的压缩内核信息,得到聚合压缩内核信息。中心节点再基于该聚合压缩内核信息,得到第一全局压缩内核信息。
也就是说,具体实施中,可以不采用上述参与节点均在第一资源上发送压缩内核信息的方式,参与节点可以在不同的资源上向中心节点发送压缩内核信息,由中心节点进行叠加聚合,得到聚合压缩内核信息,再基于聚合压缩内核信息得到第一全局压缩内核信息。
S303,中心节点发送第一信息,该第一信息用于指示第一全局压缩内核信息。
中心节点可以广播该第一信息,或者可以分别向K个参与节点发送该第一信息,本申请对此不作限定。
相应地,K个参与节点接收来自中心节点的该第一全局压缩内核信息,参与节点接收到该第一全局压缩内核信息后,基于该第一全局压缩内核信息对梯度信息进行压缩。参与节点在接收到中心节点第l+1次更新后的第一全局压缩内核信息(包括第一全局中心向量信息μl+1和第一全局主特征矩阵信息Ul+1)之前,参与节点基于来自中心节点的第l次更新后的第二全局压缩内核信息(包括第二全局中心向量信息μl和第二全局主特征矩阵信息 Ul)对模型训练后得到的梯度信息进行压缩,可以参考图2所示实施例中的描述。中心节点在发送第一全局压缩内核信息之前,基于最近一次发送的全局压缩内核信息对接收到的压缩梯度信息进行解压。
例如,中心节点在发送第一全局压缩内核信息之前,接收到了来自参与节点的第一压缩梯度信息(即第t次模型训练后得到的压缩梯度信息),则中心节点基于第二全局压缩内核信息中的第二全局中心向量信息μl和第二全局主特征矩阵信息Ul对第一压缩梯度信息进行解压缩。具体可以参考下文中介绍的中心节点基于第一全局压缩内核信息对来自参与节点的第t+i次模型训练得到的压缩梯度信息进行解压缩的实施方式,即相应公式中的t+i替换为t即可。
S304,参与节点1和参与节点2分别执行第t+i次模型训练,得到梯度信息。
参与节点1执行第t+i次模型训练,得到梯度信息参与节点2执行第t+i次模型训练,得到梯度信息
S305,参与节点1和参与节点2分别基于第一全局压缩内核信息,对该梯度信息进行压缩,得到压缩梯度信息。
参与节点1基于第一全局压缩内核信息中的第一全局中心向量信息μl+1和第一全局主特征矩阵信息Ul+1对梯度信息进行压缩,得到压缩梯度信息同理,参与节点2可以得到压缩梯度信息
S306,参与节点1和参与节点2分别向中心节点发送压缩梯度信息。
参与节点1和参与节点2可以在得到压缩梯度信息后向中心节点发送压缩梯度信息,或者参与节点1和参与节点2可以基于截断阈值和各自的信道衰落,确定是否向中心节点发送压缩梯度信息。
S307,中心节点基于第一全局压缩内核信息,解压缩获取到的压缩梯度信息。
一种实施方式中,多个参与节点均在第二资源上向中心节点发送压缩梯度信息,中心节点在第二资源上接收来自多个参与节点的压缩梯度信息,得到聚合压缩梯度信息,该聚合压缩梯度信息是多个参与节点的压缩梯度信息在第二资源上叠加后的压缩梯度信息。
其中,
中心节点基于yt+i对压缩梯度信息的均值进行估计,得到压缩梯度均值信息例如,中心节点可以基于最小二乘法(least square,LS)估计得到压缩梯度均值信息 满足:
其中,‖a‖2表示a的l2范数(l2-norm)。
可选地,中心节点与参与节点可以通过信息交互对参与节点是否发送压缩梯度信息可以达成共识,则中心节点可以确定K个参与节点中发送了压缩梯度信息的参与节点的个数Ks,中心节点可以根据Ks估计压缩梯度均值信息 满足:
中心节点得到压缩梯度均值信息后,基于第一全局中心向量信息μl+1和第一全局主特征矩阵信息Ul+1,对压缩梯度均值信息进行解压缩,得到估计的梯度均值信息Δt+i
中心节点再基于梯度均值信息,确定智能模型的模型参数θt+i+1
θt+i+1=θt+it+iΔt+i
中心节点将更新后的模型参数发送给K个参与节点,K个参与节点基于θt+i+1更新智能模型的模型参数后,执行t+i+1次模型训练。
另一种实施方式中,多个参与节点分别在不同的资源上发送压缩梯度信息,中心节点分别在不同的资源上接收来自该多个参与节点的压缩梯度信息,并基于第一全局压缩梯度信息分别对获取到的压缩梯度信息解压缩后,得到解压缩后的梯度信息。中心节点再对多个参与节点对应的解压缩后的梯度信息求平均,得到梯度均值信息。中心节点再基于均值梯度信息,确定智能模型的模型参数θt+i+1并发送给K个参与节点。
根据本申请提供的上述方案,利用模型训练得到的梯度信息之间的相关性,提出了适用于联邦学习的梯度信息压缩方法。参与节点随着模型训练动态跟踪学习压缩内核信息,并反馈给中心节点,中心节点基于多个参与节点提供的压缩内核信息,得到全局压缩内核信息并通知参与节点,参与节点基于全局压缩内核信息利用PCA对梯度信息进行压缩。能够在减小信息压缩对训练结果产生的不利影响的基础上,提高梯度信息的压缩率,提高梯度信息的传输效率。
以上,结合图2、图3详细说明了本申请实施例提供的方法。以下详细说明本申请实施例提供的装置。为了实现上述本申请实施例提供的方法中的各功能,各网元可以包括硬件结构和/或软件模块,以硬件结构、软件模块、或硬件结构加软件模块的形式来实现上述各功能。上述各功能中的某个功能以硬件结构、软件模块、还是硬件结构加软件模块的方式来执行,取决于技术方案的特定应用和设计约束条件。
图4是本申请实施例提供的通信装置的示意性框图。如图4所示,该通信装置400可以包括处理单元410和收发单元420。
在一种可能的设计中,该通信装置400可对应于上文方法实施例中的参与节点,或者配置于(或用于)参与节点中的芯片,或者是其他能够实现参与节点执行的方法的装置、模块、电路或单元等。
应理解,该通信装置400可对应于本申请实施例的方法中的参与节点,该通信装置400可以包括用于执行图2、图3所示的方法的第一设备中的各个单元。并且,该通信装置400中的各单元和上述其他操作和/或功能分别为了实现图2、图3所示的方法的相应流程。
还应理解,该通信装置400为配置于(或用于)参与节点中的芯片时,该通信装置400中的收发单元420可以为芯片的输入/输出接口或电路,该通信装置400中的处理单元410可以为芯片中的逻辑电路。
在另一种可能的设计中,该通信装置400可对应于上文方法实施例中的中心节点,例如,或者配置于(或用于)中心节点中的芯片,或者是其他能够实现中心节点执行的方法的装置、模块、电路或单元等。
应理解,该通信装置400可对应于图2、图3所示的方法中的中心节点,该通信装置400可以包括用于执行图2、图3所示的方法的中心节点的各个单元。并且,该通信装置400中的各单元和上述其他操作和/或功能分别为了实现图2、图3所示的方法的相应流程。
还应理解,该通信装置400为配置于(或用于)中心节点中的芯片时,该通信装置400中的收发单元420可以为芯片的输入/输出接口或电路,该通信装置400中的处理单元410可以为芯片中的逻辑电路。可选地,通信装置400还可以包括存储单元430,该存储单元430可以用于存储指令或者数据,处理单元410可以执行该存储单元中存储的指令或者数 据,以使该通信装置实现相应的操作。
应理解,该通信装置400中的收发单元420为可通过通信接口(如收发器或输入/输出接口)实现,例如可对应于图5中示出的通信装置500中的收发器510。该通信装置400中的处理单元410可通过至少一个处理器实现,例如可对应于图5中示出的通信装置500中的处理器520。该通信装置400中的处理单元410还可以通过至少一个逻辑电路实现。该通信装置400中的存储单元430可对应于图5中示出的通信装置500中的存储器。
还应理解,各单元执行上述相应步骤的具体过程在上述方法实施例中已经详细说明,为了简洁,在此不再赘述。
图5是本申请实施例提供的通信装置500的结构示意图。
该通信装置500可对应于上述方法实施例中的参与节点,如图5所示,该参与节点500包括处理器520和收发器510。可选地,该参与节点500还包括存储器。其中,处理器520、收发器510和存储器之间可以通过内部连接通路互相通信,传递控制和/或数据信号。该存储器用于存储计算机程序,该处理器520用于执行该存储器中的该计算机程序,以控制该收发器510收发信号。
应理解,图5所示的通信装置500能够实现图2、图3所示方法实施例中涉及参与节点的过程。参与节点500中的各个模块的操作和/或功能,分别为了实现上述方法实施例中的相应流程。具体可参见上述方法实施例中的描述,为避免重复,此处适当省略详细描述。
该通信装置500可对应于上述方法实施例中的中心节点,如图5所示,该中心节点500包括处理器520和收发器510。可选地,该中心节点500还包括存储器。其中,处理器520、收发器510和存储器之间可以通过内部连接通路互相通信,传递控制和/或数据信号。该存储器用于存储计算机程序,该处理器520用于执行该存储器中的该计算机程序,以控制该收发器510收发信号。
应理解,图5所示的通信装置500能够实现图2、图3所示方法实施例中涉及中心节点的过程。中心节点500中的各个模块的操作和/或功能,分别为了实现上述方法实施例中的相应流程。具体可参见上述方法实施例中的描述,为避免重复,此处适当省略详细描述。
上述处理器520可以和存储器可以合成一个处理装置,处理器520用于执行存储器中存储的程序代码来实现上述功能。具体实现时,该存储器也可以集成在处理器520中,或者独立于处理器520。该处理器520可以与图4中的处理单元对应。
上述收发器510可以与图4中的收发单元对应。收发器510可以包括接收器(或称接收机、接收电路)和发射器(或称发射机、发射电路)。其中,接收器用于接收信号,发射器用于发射信号。
应理解,图5所示的通信装置500能够实现图2、图3所示方法实施例中涉及终端设备的过程。终端设备500中的各个模块的操作和/或功能,分别为了实现上述方法实施例中的相应流程。具体可参见上述方法实施例中的描述,为避免重复,此处适当省略详细描述。
本申请实施例还提供了一种处理装置,包括处理器和(通信)接口;所述处理器用于执行上述任一方法实施例中的方法。
应理解,上述处理装置可以是一个或多个芯片。例如,该处理装置可以是现场可编程门阵列(field programmable gate array,FPGA),可以是专用集成芯片(application specific integrated circuit,ASIC),还可以是系统芯片(system on chip,SoC),还可以是中央处理器(central processor unit,CPU),还可以是网络处理器(network processor,NP),还 可以是数字信号处理电路(digital signal processor,DSP),还可以是微控制器(micro controller unit,MCU),还可以是可编程控制器(programmable logic device,PLD)或其他集成芯片。
根据本申请实施例提供的方法,本申请还提供一种计算机程序产品,该计算机程序产品包括:计算机程序代码,当该计算机程序代码由一个或多个处理器执行时,使得包括该处理器的装置执行图2、图3所示实施例中的方法。
本申请实施例提供的技术方案可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地产生按照本发明实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、网络设备、终端设备、核心网设备、机器学习设备或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一个计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、服务器或数据中心通过有线(例如同轴电缆、光纤、数字用户线(digital subscriber line,DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、服务器或数据中心进行传输。所述计算机可读存储介质可以是计算机可以存取的任何可用介质或者是包含一个或多个可用介质集成的服务器、数据中心等数据存储设备。所述可用介质可以是磁性介质(例如,软盘、硬盘、磁带)、光介质(例如,数字视频光盘(digital video disc,DVD))、或者半导体介质等。
根据本申请实施例提供的方法,本申请还提供一种计算机可读存储介质,该计算机可读存储介质存储有程序代码,当该程序代码由一个或多个处理器运行时,使得包括该处理器的装置执行图2、图3所示实施例中的方法。
根据本申请实施例提供的方法,本申请还提供一种系统,其包括前述的一个或多个参与节点。该系统还可以进一步包括前述的一个或多个中心节点。
In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the described apparatus embodiments are merely examples. For example, the division into units is merely a logical function division and there may be other division manners in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be implemented through some interfaces, and the indirect couplings or communication connections between apparatuses or units may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
The foregoing descriptions are merely specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement readily figured out by a person skilled in the art within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims (31)

  1. An information processing method, comprising:
    performing a t-th model training to obtain first gradient information, where t is an integer greater than 1;
    determining first compression kernel information based on gradient information obtained from previous model trainings, where the previous model trainings include the t-th model training and at least one model training before the t-th model training, and the gradient information obtained from the previous model trainings includes the first gradient information;
    sending the first compression kernel information to a central node; and
    receiving first information from the central node, where the first information indicates first global compression kernel information, and the first global compression kernel information is used to compress gradient information obtained after model training.
  2. The method according to claim 1, wherein the determining first compression kernel information based on gradient information obtained from previous model trainings comprises:
    determining first center vector information based on the first gradient information and second center vector information, where the first center vector information represents a mean vector of the gradient information obtained from the previous model trainings, the first compression kernel information includes the first center vector information, and the second center vector information is center vector information determined after a (t-1)-th model training.
  3. The method according to claim 2, wherein the determining first compression kernel information based on gradient information obtained from previous model trainings further comprises:
    obtaining covariance matrix information of a deviation between the first gradient information and the first center vector information;
    determining first covariance mean matrix information based on the covariance matrix information and second covariance mean matrix information, where the first covariance mean matrix information represents a mean matrix of the covariance matrix information of the deviations obtained after the previous model trainings, and the second covariance mean matrix information is covariance mean matrix information determined after the (t-1)-th model training; and
    obtaining first principal feature matrix information of the first covariance mean matrix information, where the first principal feature matrix information represents feature vectors corresponding to M eigenvalues of the first covariance mean matrix information, the first compression kernel information includes the first principal feature matrix information, and M is an integer greater than 1.
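For illustration only, one possible reading of the kernel update in claims 2 and 3 is sketched below, assuming that the "mean" over the previous model trainings is a simple running average, which the claims do not mandate; the function update_kernel and all variable names are hypothetical and not terms defined in this application.

```python
# Possible kernel update at a participating node: running mean of gradients (center vector),
# running mean of deviation covariances, and the M leading eigenvectors of the latter.
import numpy as np

def update_kernel(grad_t, mu_prev, cov_prev, t, M):
    """Return (first center vector, first covariance mean matrix, first principal feature matrix)."""
    mu_t = mu_prev + (grad_t - mu_prev) / t                   # first center vector information
    dev = grad_t - mu_t                                       # deviation of the gradient from the center
    cov_t = cov_prev + (np.outer(dev, dev) - cov_prev) / t    # first covariance mean matrix information
    vals, vecs = np.linalg.eigh(cov_t)
    E_t = vecs[:, np.argsort(vals)[::-1][:M]]                 # eigenvectors of the M largest eigenvalues
    return mu_t, cov_t, E_t
```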
  4. The method according to any one of claims 1 to 3, further comprising:
    compressing second gradient information based on the first global compression kernel information to obtain second compressed gradient information, where the second gradient information is gradient information obtained after a (t+i)-th model training and i is a positive integer; and
    sending the second compressed gradient information.
  5. The method according to claim 4, wherein the first global compression kernel information includes global center vector information and global principal feature matrix information, and
    the compressing the second gradient information based on the first global compression kernel information to obtain second compressed gradient information comprises:
    obtaining the second compressed gradient information based on a deviation between the second gradient information and the global center vector information and on the global principal feature matrix information.
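For illustration, the compression of claim 5 can be realized as a PCA-style projection of the deviation from the global center vector onto the M global principal feature vectors. The sketch below is one such realization with hypothetical names; it is not asserted to be the exact operation used in the embodiments.

```python
# Project the deviation of a gradient from the global center vector onto the global
# principal feature vectors, producing an M-dimensional compressed gradient.
import numpy as np

def compress(grad, global_mu, global_E):
    """grad: length-D gradient; global_E: D x M principal feature matrix."""
    return global_E.T @ (grad - global_mu)    # length-M compressed gradient information
```

Under this reading, a gradient of dimension D is reduced to an M-dimensional compressed gradient, which is what is transmitted to the central node.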
  6. The method according to any one of claims 1 to 5, wherein before the receiving of the first information from the central node, the method further comprises:
    compressing the first gradient information based on second global compression kernel information to obtain first compressed gradient information, where the second global compression kernel information is obtained from the central node.
  7. The method according to any one of claims 1 to 6, wherein the sending the first compression kernel information to a central node comprises:
    sending the first compression kernel information to the central node on a first resource.
  8. The method according to any one of claims 1 to 7, wherein the sending the first compression kernel information to a central node comprises:
    sending the first compression kernel information to the central node when t/T1 is an integer, where T1 is an integer greater than 1.
  9. An information processing method, comprising:
    receiving first compression kernel information from multiple participating nodes to obtain first global compression kernel information; and
    sending first information, where the first information indicates the first global compression kernel information, and the first global compression kernel information is used to compress gradient information obtained after model training.
  10. The method according to claim 9, wherein
    the receiving first compression kernel information from multiple participating nodes to obtain first global compression kernel information comprises:
    receiving the first compression kernel information from the multiple participating nodes on a first resource to obtain aggregated compression kernel information, where the aggregated compression kernel information is compression kernel information obtained after the first compression kernel information of the multiple participating nodes is superposed on the first resource; and
    obtaining the first global compression kernel information according to the aggregated compression kernel information.
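Claim 10 describes the first compression kernel information of several nodes being superposed on one shared resource. A simple reading, offered here only as an assumption, is that the central node observes the element-wise sum and recovers an average; the QR step below is just one illustrative way to turn averaged principal feature matrices back into M orthonormal global principal feature vectors (cf. claim 12). All names are hypothetical.

```python
# From the superposed (summed) kernel reports to a global compression kernel.
import numpy as np

def global_kernel_from_superposition(sum_mu, sum_E, num_nodes):
    """sum_mu: summed center vectors; sum_E: summed D x M principal feature matrices."""
    global_mu = sum_mu / num_nodes            # global center vector information (mean of reports)
    E_avg = sum_E / num_nodes                 # element-wise average of the reported matrices
    global_E, _ = np.linalg.qr(E_avg)         # re-orthonormalized D x M global principal feature matrix
    return global_mu, global_E
```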
  11. The method according to claim 9 or 10, further comprising:
    receiving compressed gradient information from multiple participating nodes;
    obtaining compressed gradient mean information according to the compressed gradient information of the multiple participating nodes, where the compressed gradient mean information represents a mean vector of the compressed gradient information of the multiple participating nodes;
    decompressing the compressed gradient mean information based on the first global compression kernel information to obtain decompressed gradient mean information; and
    updating model parameters of an intelligent model based on the gradient mean information.
  12. The method according to claim 11, wherein
    the first compression kernel information includes first center vector information, the first global compression kernel information includes global center vector information, and the global center vector information represents a mean vector of the first center vector information of the multiple participating nodes; and
    the first compression kernel information includes first principal feature matrix information, the first global compression kernel information includes global principal feature matrix information, the global principal feature matrix information represents M global principal feature vectors, and M is an integer greater than 1.
  13. The method according to claim 12, wherein the decompressing the compressed gradient mean information based on the first global compression kernel information to obtain decompressed gradient mean information comprises:
    decompressing the compressed gradient mean information based on the global center vector information and the global principal feature matrix information to obtain the decompressed gradient mean information.
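A hedged sketch of the decompression and model update described in claims 11 to 13 follows, under the assumption that compression was the projection shown after claim 5; the reconstruction formula, the function name, and the learning rate lr are illustrative assumptions rather than operations fixed by the claims.

```python
# Rebuild the decompressed gradient mean from the global kernel and apply an
# illustrative gradient-descent update of the model parameters.
import numpy as np

def decompress_and_update(theta, z_mean, global_mu, global_E, lr=0.1):
    """z_mean: length-M compressed gradient mean; global_E: D x M principal feature matrix."""
    g_hat = global_mu + global_E @ z_mean     # decompressed gradient mean information
    return theta - lr * g_hat                 # updated parameters of the intelligent model
```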
  14. A communication apparatus, comprising:
    a processing unit, configured to perform a t-th model training to obtain first gradient information, where t is an integer greater than 1,
    the processing unit being further configured to determine first compression kernel information based on gradient information obtained from previous model trainings, where the previous model trainings include the t-th model training and at least one model training before the t-th model training, and the gradient information obtained from the previous model trainings includes the first gradient information; and
    a transceiver unit, configured to send the first compression kernel information to a central node,
    the transceiver unit being further configured to receive first information from the central node, where the first information indicates first global compression kernel information, and the first global compression kernel information is used to compress gradient information obtained after model training.
  15. The apparatus according to claim 14, wherein
    the processing unit is specifically configured to determine first center vector information based on the first gradient information and second center vector information, where the first center vector information represents a mean vector of the gradient information obtained from the previous model trainings, the first compression kernel information includes the first center vector information, and the second center vector information is center vector information determined after a (t-1)-th model training.
  16. The apparatus according to claim 15, wherein the processing unit is specifically configured to:
    obtain covariance matrix information of a deviation between the first gradient information and the first center vector information;
    determine first covariance mean matrix information based on the covariance matrix information and second covariance mean matrix information, where the first covariance mean matrix information represents a mean matrix of the covariance matrix information of the deviations obtained after the previous model trainings, and the second covariance mean matrix information is covariance mean matrix information determined after the (t-1)-th model training; and
    obtain first principal feature matrix information of the first covariance mean matrix information, where the first principal feature matrix information represents feature vectors corresponding to M eigenvalues of the first covariance mean matrix information, the first compression kernel information includes the first principal feature matrix information, and M is an integer greater than 1.
  17. The apparatus according to any one of claims 14 to 16, wherein
    the processing unit is further configured to compress second gradient information based on the first global compression kernel information to obtain second compressed gradient information, where the second gradient information is gradient information obtained after a (t+i)-th model training and i is a positive integer; and
    the transceiver unit is further configured to send the second compressed gradient information.
  18. The apparatus according to claim 17, wherein the first global compression kernel information includes global center vector information and global principal feature matrix information, and
    the processing unit is further configured to obtain the second compressed gradient information based on a deviation between the second gradient information and the global center vector information and on the global principal feature matrix information.
  19. The apparatus according to any one of claims 14 to 18, wherein
    the processing unit is further configured to: before the first information from the central node is received, compress the first gradient information based on second global compression kernel information to obtain first compressed gradient information, where the second global compression kernel information is obtained from the central node.
  20. The apparatus according to any one of claims 14 to 19, wherein
    the transceiver unit is specifically configured to send the first compression kernel information to the central node on a first resource.
  21. The apparatus according to any one of claims 14 to 20, wherein
    the transceiver unit is specifically configured to send the first compression kernel information to the central node when t/T1 is an integer, where T1 is an integer greater than 1.
  22. An information processing apparatus, comprising:
    a processing unit, configured to obtain first global compression kernel information according to first compression kernel information received from multiple participating nodes; and
    a transceiver unit, configured to send first information, where the first information indicates the first global compression kernel information, and the first global compression kernel information is used to compress gradient information obtained after model training.
  23. The apparatus according to claim 22, wherein
    the transceiver unit is further configured to receive the first compression kernel information from the multiple participating nodes on a first resource to obtain aggregated compression kernel information, where the aggregated compression kernel information is compression kernel information obtained after the first compression kernel information of the multiple participating nodes is superposed on the first resource; and
    the processing unit is specifically configured to obtain the first global compression kernel information according to the aggregated compression kernel information.
  24. The apparatus according to claim 22 or 23, wherein the transceiver unit is further configured to receive compressed gradient information from multiple participating nodes, and the processing unit is further configured to:
    obtain compressed gradient mean information according to the compressed gradient information of the multiple participating nodes, where the compressed gradient mean information represents a mean vector of the compressed gradient information of the multiple participating nodes;
    decompress the compressed gradient mean information based on the first global compression kernel information to obtain decompressed gradient mean information; and
    update model parameters of an intelligent model based on the gradient mean information.
  25. The apparatus according to claim 24, wherein
    the first compression kernel information includes first center vector information, the first global compression kernel information includes global center vector information, and the global center vector information represents a mean vector of the first center vector information of the multiple participating nodes; and
    the first compression kernel information includes first principal feature matrix information, the first global compression kernel information includes global principal feature matrix information, the global principal feature matrix information represents M global principal feature vectors, and M is an integer greater than 1.
  26. The apparatus according to claim 25, wherein the processing unit is specifically configured to decompress the compressed gradient mean information based on the global center vector information and the global principal feature matrix information to obtain the decompressed gradient mean information.
  27. A communication apparatus, comprising at least one processor coupled to a memory, wherein
    the memory is configured to store a program or instructions; and
    the at least one processor is configured to execute the program or instructions, so that the apparatus implements the method according to any one of claims 1 to 8 or the method according to any one of claims 9 to 13.
  28. A chip, comprising at least one logic circuit and an input/output interface, wherein
    the logic circuit is configured to control the input/output interface and perform the method according to any one of claims 1 to 8, or implement the method according to any one of claims 9 to 13.
  29. A computer-readable storage medium, storing instructions which, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 8 or implement the method according to any one of claims 9 to 13.
  30. A computer program product, comprising instructions which, when run on a computer, cause the computer to perform the method according to any one of claims 1 to 8 or implement the method according to any one of claims 9 to 13.
  31. A communication system, comprising the communication apparatus according to any one of claims 14 to 21 and the communication apparatus according to any one of claims 22 to 26.
PCT/CN2023/083106 2022-03-25 2023-03-22 Information processing method and communication apparatus WO2023179675A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210303188.1A CN116882487A (zh) 2022-03-25 2022-03-25 Information processing method and communication apparatus
CN202210303188.1 2022-03-25

Publications (1)

Publication Number Publication Date
WO2023179675A1 true WO2023179675A1 (zh) 2023-09-28

Family

ID=88100071

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/083106 WO2023179675A1 (zh) 2022-03-25 2023-03-22 Information processing method and communication apparatus

Country Status (2)

Country Link
CN (1) CN116882487A (zh)
WO (1) WO2023179675A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111901829A (zh) * 2020-07-10 2020-11-06 Jiangsu Intelligent Transportation and Intelligent Driving Research Institute — Wireless federated learning method based on compressed sensing and quantization coding
CN112906046A (zh) * 2021-01-27 2021-06-04 Tsinghua University — Model training method and apparatus using one-bit compressed sensing technology
CN113591145A (zh) * 2021-07-28 2021-11-02 Xidian University — Federated learning global model training method based on differential privacy and quantization
CN113934578A (zh) * 2021-10-28 2022-01-14 University of Electronic Science and Technology of China — Data recovery attack method in a federated learning scenario

Also Published As

Publication number Publication date
CN116882487A (zh) 2023-10-13


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23773922

Country of ref document: EP

Kind code of ref document: A1