WO2024060002A1 - Communication method and related devices - Google Patents

Communication method and related devices

Info

Publication number
WO2024060002A1
WO2024060002A1 (PCT/CN2022/119814)
Authority
WO
WIPO (PCT)
Prior art keywords
model
information
parameters
quantization
parameter
Application number
PCT/CN2022/119814
Other languages
English (en)
French (fr)
Inventor
张公正
徐晨
李榕
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to PCT/CN2022/119814
Publication of WO2024060002A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 27/00: Modulated-carrier systems

Definitions

  • the present application relates to the field of communication technology, and in particular to a communication method and related devices.
  • Distributed learning is a learning method that implements joint learning. Specifically, multiple node devices use local data to train local models, and a central node device fuses the multiple local models to obtain a global model. In this way, joint learning can be achieved while protecting the privacy of the node devices' user data.
  • Multiple node devices can separately train their local models to obtain relevant parameters of the local models, for example, weight parameters or weight gradients. The node devices then send the relevant parameters of their local models to the central node device. The central node device fuses the relevant parameters sent by the multiple node devices to obtain the relevant parameters of the global model and sends them to each node device. Each node device can then update its local model with the relevant parameters of the global model.
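For illustration, a minimal Python sketch of the fusion step described above; the function name and the plain-averaging fusion rule are assumptions, since the text does not fix how the central node device fuses the local parameters.

```python
import numpy as np

def aggregate_local_updates(local_updates):
    # Fuse the relevant parameters (e.g., weight gradients) reported by the
    # node devices into global parameters. Plain averaging is assumed here.
    return np.mean(np.stack(local_updates), axis=0)

# Each node device trains locally and reports its parameters:
local_updates = [np.array([0.1, -0.3, 0.2]),   # node device 1
                 np.array([0.0, -0.1, 0.4])]   # node device 2
global_update = aggregate_local_updates(local_updates)  # sent to all nodes
```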
  • each node device sends relevant parameters of the local model to the central node device respectively. This results in a large amount of data reported by the node device and a large communication overhead. Therefore, how node devices can report relevant parameters of the local model with low communication overhead is an issue that needs to be solved urgently.
  • Embodiments of the present application provide a communication method and related devices, which are used to reduce the communication overhead of the first device reporting related information of the first model and save system overhead.
  • a first aspect of the present application provides a communication method.
  • the communication method can be executed by a first device.
  • the first device can be a communication device or a component (such as a chip (system)) in the communication device.
  • the communication method includes:
  • the first device receives at least one quantization threshold value from the second device. Then, the first device performs quantization processing on the relevant information of the first model of the first device according to at least one quantization threshold value. The first device sends first information to the second device, where the first information is used to indicate relevant information of the quantized first model. This reduces the communication overhead of the first device reporting related information of the first model and saves communication resources.
  • the relevant information of the first model includes: the output parameters or update parameters of the first model, and the update parameters include the weight gradient or weight parameters of the first model.
  • the model on each device can be understood as the same model. The model may be called the first model, and it may also be called a global model.
  • before the first device receives the at least one quantization threshold value from the second device, the method further includes: the first device sends second information to the second device; the second information is used to indicate information obtained by processing the relevant information of the first model, or information obtained by processing the relevant information obtained by the first device performing the M-th round of training on the first model. The relevant information of the first model is the relevant information obtained by the first device performing the Q-th round of training on the first model, where M is an integer greater than or equal to 1 and less than Q, and Q is an integer greater than 1.
  • the first device may send the second information to the second device, thereby facilitating the second device to determine the at least one quantization threshold value. It is helpful for the second device to determine an appropriate quantization threshold value, and for the first device to perform reasonable quantification processing on the relevant information of the first model. Therefore, while ensuring the accuracy of the related information of the first model reported by the first device, the overhead of reporting the related information of the first model by the first device is reduced.
  • the relevant information of the first model includes the output parameters of the first model, and the information obtained by processing the relevant information of the first model includes the average of the absolute values of the output parameters of the first model; or, the relevant information of the first model includes the update parameters of the first model, and the information obtained by processing the relevant information of the first model includes the average of the absolute values of the update parameters of the first model.
  • the first device can calculate the average of the absolute values of the output parameters of the first model, or the average of the absolute values of the update parameters of the first model, and report it to the second device. This facilitates the second device to determine an appropriate quantization threshold value.
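A minimal sketch, assuming NumPy arrays, of how the first device could compute this second information; the function name is hypothetical.

```python
import numpy as np

def second_information(parameters):
    # Average of the absolute values of the first model's update (or output)
    # parameters, reported to the second device before quantization.
    return float(np.mean(np.abs(np.asarray(parameters))))
```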
  • the method further includes: the first device receiving third information from the second device, where the third information is used to indicate global information of the first model.
  • the first device can implement updating or training of the first model in combination with global information of the first model.
  • the global information of the first model includes the global output parameters of the first model; or, the global information of the first model includes the global update parameters and/or the global learning rate of the first model.
  • the global information of the first model includes the global output parameters of the first model, thereby facilitating the first device to train the first model through the global output parameters, which is beneficial to improving the training performance of the first model and improving the accuracy of the first model.
  • the global information of the first model includes global update parameters and/or global learning rate of the first model. This facilitates the first device to update the first model in combination with the global update parameter and/or the global learning rate, which is beneficial to improving the accuracy of the first model.
  • the relevant information of the first model includes N parameters of the first model, where N is an integer greater than or equal to 1. The first device performing quantization processing on the relevant information of the first model according to the at least one quantization threshold value includes: the first device quantizes the N parameters according to the at least one quantization threshold value to obtain N quantized parameters, and the first information includes the N quantized parameters. The first device sending the first information to the second device includes: the first device modulates the N quantized parameters to obtain N first signals, and sends the N first signals to the second device.
  • the first device may perform quantization processing on the N parameters of the first model, modulate the N quantized parameters, and then send the N modulated first signals, thereby realizing the sending of the first information.
  • the at least one quantization threshold value includes a first quantization threshold value and a second quantization threshold value. The first device quantizing the N parameters according to the at least one quantization threshold value to obtain the N quantized parameters includes: if the i-th parameter among the N parameters is greater than the first quantization threshold value, the first device quantizes the i-th parameter to a first value, where i is an integer greater than or equal to 1 and less than or equal to N; or, if the i-th parameter is less than or equal to the first quantization threshold value and greater than or equal to the second quantization threshold value, the first device quantizes the i-th parameter to a second value; or, if the i-th parameter is less than the second quantization threshold value, the first device quantizes the i-th parameter to a third value.
  • this shows the specific process by which the first device quantizes the i-th parameter, facilitating the implementation of the solution.
  • when the at least one quantization threshold value includes a plurality of quantization threshold values, the first device can quantize the parameters of the first model with finer accuracy, which is beneficial to improving the accuracy with which the first device updates the first model and the training performance of the first model.
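A minimal sketch of the two-threshold (ternary) quantization just described, assuming the first, second, and third values are +1, 0, and -1 (the text does not fix the codebook):

```python
import numpy as np

FIRST_VALUE, SECOND_VALUE, THIRD_VALUE = 1, 0, -1  # assumed codebook

def quantize(params, t1, t2):
    # t1 is the first quantization threshold value, t2 the second (t1 >= t2).
    # > t1 -> first value; in [t2, t1] -> second value; < t2 -> third value.
    params = np.asarray(params, dtype=float)
    out = np.full(params.shape, SECOND_VALUE)
    out[params > t1] = FIRST_VALUE
    out[params < t2] = THIRD_VALUE
    return out

quantize([0.7, 0.1, -0.5], t1=0.3, t2=-0.3)  # -> array([ 1,  0, -1])
```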
  • the first device modulating the N quantized parameters to obtain the N first signals includes: the first device modulates the i-th quantized parameter to obtain the i-th first signal, where the i-th first signal corresponds to two sequences. When the i-th quantized parameter is the first value, the transmission power used by the first device to send the first sequence of the two sequences is less than the transmission power used to send the second sequence; when the i-th quantized parameter is the second value, the transmission power used to send the first sequence is equal to the transmission power used to send the second sequence; when the i-th quantized parameter is the third value, the transmission power used to send the first sequence is greater than the transmission power used to send the second sequence.
  • the first device modulates each of the N parameters of the first model onto two sequences.
  • the first device controls the transmission power used to transmit each of the two sequences, thereby facilitating the second device to determine the value of the parameter.
  • the first device does not need to perform channel estimation and equalization, and therefore does not need corresponding pilot overhead.
  • the first device may carry the quantized i-th parameter through an all-0 sequence and/or a non-all-0 sequence. Under the same total transmission power, this helps the second device identify the value of the quantized i-th parameter, thereby improving power utilization efficiency.
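A sketch of the two-sequence mapping, assuming an all-0 sequence carries no energy and a fixed known non-all-0 base sequence. The orientation (which value puts energy on which sequence) is chosen here to match the power relations above and the energy-based decision rule described in the second aspect; the text elsewhere also describes the mirrored mapping, so treat the orientation as a convention:

```python
import numpy as np

SEQ_LEN = 8
BASE_SEQ = np.ones(SEQ_LEN)    # any known non-all-0 sequence
ZERO_SEQ = np.zeros(SEQ_LEN)   # all-0 sequence: no energy is transmitted

def modulate(q):
    # First value: less power on the first sequence than on the second.
    # Second value: equal (zero) power on both sequences.
    # Third value: more power on the first sequence than on the second.
    if q == 1:                 # first value
        return ZERO_SEQ, BASE_SEQ
    if q == 0:                 # second value
        return ZERO_SEQ, ZERO_SEQ
    return BASE_SEQ, ZERO_SEQ  # third value
```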
  • the first device sending the first information to the second device includes: the first device sends the first information to the second device L times, where L is an integer greater than or equal to 1.
  • when the number of transmissions L is greater than 1, the first device repeatedly sends the first information, which enables the second device to select the judgment result that occurs most often across the L judgments as the final judgment result. This reduces the probability of judgment errors and improves the performance of model training.
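A minimal sketch of the majority decision implied here, assuming the second device collects one judgment per repetition:

```python
from collections import Counter

def fuse_repeated_decisions(decisions):
    # Keep the judgment result that occurs most often over the L repetitions.
    return Counter(decisions).most_common(1)[0][0]

fuse_repeated_decisions([1, 1, 0, 1])  # -> 1
```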
  • the method further includes: the first device receives first indication information from the second device, where the first indication information is used to indicate the number of times L that the first device sends the first information to the second device.
  • the first device may receive the number of sending times indicated by the second device, and send the first information according to the number of sending times. This facilitates the second device to determine the number of transmissions based on actual needs, thereby rationally utilizing communication resources.
  • the relevant information of the first model includes N parameters of the first model after quantization error compensation. The N parameters after quantization error compensation are obtained by the first device performing error compensation on the N parameters obtained from the Q-th round of training on the first model, according to the quantization errors corresponding to those N parameters. The quantization error corresponding to the i-th parameter among the N parameters is determined based on the i-th parameter obtained by the first device performing the (Q-1)-th round of training on the first model and undergoing quantization error compensation, where i is an integer greater than or equal to 1 and less than or equal to N, N is an integer greater than or equal to 1, and Q is an integer greater than 1.
  • the first device may first perform quantization error compensation on the N parameters of the first model, and then perform quantization processing on the N parameters after the quantization error compensation according to the at least one quantization threshold value. This is conducive to improving the accuracy of the first device in updating the first model and improving the training performance of the first model.
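A minimal sketch of quantization error compensation across rounds, assuming the quantized values live on the parameter scale (a simplification; a scaling factor may be needed in practice). The `quantizer` argument stands in for the ternary quantizer sketched earlier:

```python
import numpy as np

def compensate_and_quantize(params, carried_error, quantizer):
    # Round Q: add the quantization error carried over from round Q-1,
    # quantize the compensated parameters, and keep the new quantization
    # error for round Q+1.
    compensated = np.asarray(params, dtype=float) + carried_error
    quantized = quantizer(compensated)
    return quantized, compensated - quantized

# Usage with a trivial sign quantizer as a stand-in:
q, err = compensate_and_quantize([0.4, -0.2], carried_error=0.0,
                                 quantizer=np.sign)
```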
  • the relevant information of the first model includes N parameters of the first model obtained through sparse processing. These N parameters are selected by the first device, according to a common sparse mask, from the K parameters of the first model, where the K parameters are obtained by the first device performing a round of training on the first model, K is an integer greater than or equal to N, K is an integer greater than or equal to 1, and N is an integer greater than or equal to 1.
  • the first device may first select N parameters from K parameters of the first model through a common sparse mask, and then perform quantization processing on the N parameters according to the at least one quantization threshold value. This is beneficial to reducing the overhead caused by the first device reporting the parameters of the first model.
  • the common sparse mask is a bit sequence that includes K bits, and the K bits correspond one-to-one to the K parameters. When the value of one of the K bits is 0, it instructs the first device not to select the parameter corresponding to that bit; when the value of one of the K bits is 1, it instructs the first device to select the parameter corresponding to that bit.
  • a specific form of the common sparse mask is provided: the first device selects parameters based on the values of the bits in the bit sequence, which is simple and convenient to operate. This reduces the overhead of the first device reporting the parameters of the first model and reduces the occupation of communication resources.
  • the common sparse mask is determined by the first device based on the sparse ratio and the pseudo-random number, and the sparse ratio is indicated by the second device to the first device.
  • a generation method for the common sparse mask is provided, facilitating the implementation of the solution. This enables the first device to report only some parameters of the first model based on the common sparse mask, reducing the overhead caused by the first device reporting parameters of the first model. A sketch of one possible generation and selection procedure follows.
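This sketch assumes the sparse ratio is the fraction of parameters to keep and that both devices derive the same mask from a shared seed; the text fixes only the two inputs (sparse ratio and pseudo-random number), not the exact construction:

```python
import numpy as np

def common_sparse_mask(k, sparse_ratio, seed):
    # K-bit mask: bit i is 1 if parameter i is selected, 0 otherwise.
    n = int(round(k * sparse_ratio))       # assumed: ratio of kept parameters
    rng = np.random.default_rng(seed)      # shared pseudo-random number
    mask = np.zeros(k, dtype=np.uint8)
    mask[rng.choice(k, size=n, replace=False)] = 1
    return mask

def select_parameters(params, mask):
    # Keep only the parameters whose mask bit is 1.
    return np.asarray(params)[mask == 1]
```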
  • the method further includes: the first device receives second indication information from the second device, where the second indication information is used to indicate the common sparse mask.
  • the method further includes: the first device sends third indication information to the second device, where the third indication information is used to indicate the indexes of the N parameters with the largest absolute values among the K parameters.
  • the first device may indicate to the second device the indexes of the N parameters corresponding to the largest absolute values among its K parameters. This facilitates the second device to determine an appropriate common sparse mask.
  • because the third indication information indicates the indexes of the N parameters with the largest absolute values among the K parameters, the first device can give priority, in subsequent feedback, to parameters with large changes, thereby improving the accuracy and performance of model training.
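A minimal sketch of selecting those indexes, assuming NumPy:

```python
import numpy as np

def top_n_indexes(params, n):
    # Indexes of the N parameters with the largest absolute values among
    # the K parameters, largest first.
    return np.argsort(np.abs(np.asarray(params)))[-n:][::-1]

top_n_indexes([0.1, -0.9, 0.4, 0.2], n=2)  # -> array([1, 2])
```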
  • the first model is a neural network model
  • the relevant information of the first model includes relevant parameters of the neurons in the P layer of the neural network model, where P is an integer greater than or equal to 1.
  • the first device may report parameters of a certain layer or multiple layers in the neural network model. That is, the first device reports the parameters of the neural network model in units of layers of the neural network model, which facilitates the first device to accurately report the parameters of each layer and improves the accuracy of model training.
  • a second aspect of the present application provides a communication method.
  • the communication method can be executed by a second device.
  • the second device can be a communication device or a component (such as a chip (system)) in the communication device.
  • the communication method includes:
  • the second device sends at least one quantization threshold value to the first device, where the at least one quantization threshold value is used to quantize the relevant information of the first model of the first device; the second device receives first information sent from the first device, where the first information is used to indicate the relevant information of the first model after quantization processing.
  • the relevant information of the first model includes: the output parameters or update parameters of the first model, and the update parameters include the weight gradient or weight parameters of the first model.
  • the models on each device can be understood as the same model. The model may be called a first model, and it may also be called a global model.
  • in a possible implementation, the method further includes:
  • the second device receives second information from the first device, where the second information is used to indicate information obtained by processing the relevant information of the first model, or information obtained by processing the relevant information obtained by the first device performing the M-th round of training on the first model. The relevant information of the first model is the relevant information obtained by the first device performing the Q-th round of training on the first model, where M is an integer greater than or equal to 1 and less than Q, and Q is an integer greater than 1. The second device determines the at least one quantization threshold value according to the second information.
  • the second device receives the second information from the first device, thereby enabling the second device to determine at least one quantization threshold value based on the second information. It is helpful for the second device to determine an appropriate quantization threshold value, and for the first device to perform reasonable quantification processing on the relevant information of the first model. Therefore, while ensuring the accuracy of the related information of the first model reported by the first device, the overhead of reporting the related information of the first model by the first device is reduced.
  • the relevant information of the first model includes the output parameters of the first model, and the information obtained by processing the relevant information of the first model includes the average of the absolute values of the output parameters of the first model; or, the relevant information of the first model includes the update parameters of the first model, and the information obtained by processing the relevant information of the first model includes the average of the absolute values of the update parameters of the first model. The second device can thus receive, from the first device, the average of the absolute values of the output parameters of the first model, or the average of the absolute values of the update parameters of the first model. This facilitates the second device to determine an appropriate quantization threshold value.
  • the method further includes: the second device receives third information from a third device, where the third information is used to indicate information obtained by processing the relevant information of the second model of the third device, or information obtained by processing the relevant information obtained by the third device performing the S-th round of training on the second model. The relevant information of the second model is the relevant information obtained by the third device performing the R-th round of training on the second model, where S is an integer greater than or equal to 1 and less than R, and R is an integer greater than 1. The second device determining the at least one quantization threshold value according to the second information includes: the second device determines the at least one quantization threshold value according to the second information and the third information.
  • the second device may also receive third information from the third device, and determine at least one quantization threshold value by combining the second information and the third information. It is helpful for the second device to determine an appropriate quantization threshold value, thereby reducing the overhead of the first device reporting the relevant information of the first model while ensuring the accuracy of the relevant information of the first model reported by the first device.
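One plausible rule for this determination, shown only as a hedged sketch (the text does not specify how the threshold values are computed from the reports): set the two thresholds symmetrically, proportional to the mean of the per-device mean-absolute-value reports.

```python
import numpy as np

def derive_thresholds(reported_means, scale=1.0):
    # reported_means: second/third information from the devices (each the
    # average of absolute parameter values). `scale` is a free design knob.
    m = scale * float(np.mean(reported_means))
    return m, -m   # (first threshold value, second threshold value)
```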
  • the method further includes: the second device determines the global information of the first model based on the first information; the second device sends fourth information to the first device, where the fourth information is used to indicate the global information of the first model.
  • the second device may determine the global information of the first model in combination with the first information, and send the global information of the first model to the first device, so that the first device can update or train the first model.
  • the global information of the first model includes a global output parameter of the first model; or, the global information of the first model includes a global update parameter and/or a global learning rate of the first model.
  • the global information of the first model includes the global output parameters of the first model, thereby facilitating the first device to train the first model through the global output parameters, which is beneficial to improving the training performance of the first model and improving the accuracy of the first model.
  • the global information of the first model includes global update parameters and/or global learning rate of the first model. This facilitates the first device to update the first model in combination with the global update parameter and/or the global learning rate, which is beneficial to improving the accuracy of the first model.
  • the method further includes: the second device receives fifth information from the third device, where the fifth information is used to indicate the relevant information of the second model of the third device. The second device determining the global information of the first model based on the first information includes: the second device determines the global information of the first model based on the first information and the fifth information.
  • the second device may also receive fifth information from the third device, and determine global information of the first model by combining the first information and the fifth information. It is beneficial to improve the accuracy of the second device in determining the global information of the first model and improve the accuracy of model update.
  • the relevant information of the first model includes N parameters of the first model, where N is an integer greater than or equal to 1; the relevant information of the second model includes N parameters of the second model; the first information includes the N quantized parameters of the first model. The second device receiving the first information sent from the first device includes: the second device receives N first signals from the first device, where the N first signals carry the N quantized parameters of the first model and correspond one-to-one to the N quantized parameters of the first model. The fifth information includes the N quantized parameters of the second model. The second device receiving the fifth information from the third device includes: the second device receives N second signals from the third device, where the N second signals carry the N quantized parameters of the second model and correspond one-to-one to the N quantized parameters of the second model. The second device determining the global information of the first model based on the first information and the fifth information includes: the second device determines the global information of the first model based on the N first signals and the N second signals.
  • the i-th first signal among the N first signals corresponds to a first sequence and a second sequence, and the i-th second signal among the N second signals corresponds to a third sequence and a fourth sequence. The time-frequency resources used by the first device to send the first sequence are the same as those used by the third device to send the third sequence, and the time-frequency resources used by the first device to send the second sequence are the same as those used by the third device to send the fourth sequence. The global information of the first model includes N global parameters of the first model, and i is an integer greater than or equal to 1 and less than or equal to N. The second device determining the global information of the first model based on the N first signals and the N second signals includes: the second device determines a first signal energy sum with which it receives the first sequence and the third sequence; the second device determines a second signal energy sum with which it receives the second sequence and the fourth sequence; and the second device determines the i-th global parameter among the N global parameters according to the first signal energy sum and the second signal energy sum.
  • the second device can determine the i-th global parameter from the received signal energy of the two sequences corresponding to the i-th first signal and of the two sequences corresponding to the i-th second signal. This enables the second device to achieve non-coherent reception of superposed over-the-air transmissions from multiple users and to achieve robustness to fading channels.
  • the second device determines the i-th global parameter among N global parameters based on the first signal energy sum and the second signal energy sum, including: if the sum of the first signal energy sum and the decision threshold value is less than the second signal energy sum, the second device determines the value of the i-th global parameter to be the first value; or, if the sum of the first signal energy sum and the decision threshold value is greater than or equal to the second signal energy sum, and the sum of the second signal energy sum and the decision threshold value is greater than or equal to the first signal energy sum, the second device determines the value of the i-th global parameter to be the second value; or, if the sum of the second signal energy sum and the decision threshold value is less than the first signal energy sum, the second device determines the value of the i-th global parameter to be the third value.
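A minimal sketch of this energy-based decision, assuming the first, second, and third values are +1, 0, and -1 as before:

```python
def decide_global_parameter(e1, e2, delta):
    # e1: received energy sum of the first and third sequences.
    # e2: received energy sum of the second and fourth sequences.
    # delta: decision threshold value.
    if e1 + delta < e2:
        return 1     # first value
    if e2 + delta < e1:
        return -1    # third value
    return 0         # second value: |e1 - e2| <= delta
```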
  • the method further includes: the second device sends first indication information to the first device, where the first indication information is used to indicate the number of times L that the first device sends the first information to the second device, and L is an integer greater than or equal to 1.
  • the second device indicates to the first device the number of times to send the first information, so that the first device sends the first information that number of times. This facilitates the second device to determine the number of transmissions based on actual needs, thereby rationally utilizing communication resources.
  • the method further includes: the second device sends second indication information to the first device, where the second indication information is used to indicate a common sparse mask, and the common sparse mask is used to instruct the first device to report some of the parameters obtained by the first device training the first model.
  • the second device sends second indication information to the first device, and the second indication information is used to indicate the common sparse mask. This facilitates the first device to select N parameters from the K parameters of the first model according to the common sparse mask. This is beneficial to reducing the overhead caused by the first device reporting the parameters of the first model.
  • the method further includes: the second device receives third indication information from the first device, where the third indication information is used to indicate the indexes of the N parameters with the largest absolute values among the K parameters obtained by the first device performing a round of training on the first model; the second device receives fourth indication information from the third device, where the fourth indication information is used to indicate the indexes of the N parameters with the largest absolute values among the K parameters of the second model of the third device, and the K parameters of the second model are the K parameters obtained by the third device performing a round of training on the second model; the second device determines the common sparse mask according to the third indication information and the fourth indication information.
  • each device indicates the indexes of the parameters with the largest absolute values among its K parameters, which helps the second device determine an appropriate common sparse mask based on the third indication information and the fourth indication information. The first device can then preferentially feed back parameters with large changes based on the common sparse mask, thereby improving the accuracy and performance of model training.
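A hedged sketch of how the second device might fuse the reported indexes into a common sparse mask; taking the union of the devices' top-N indexes is an assumed fusion rule, not one fixed by the text:

```python
import numpy as np

def mask_from_reported_indexes(k, index_lists):
    # index_lists: one top-N index list per device (third/fourth indication
    # information). Bit i of the K-bit mask is set if any device reported i.
    mask = np.zeros(k, dtype=np.uint8)
    for idx in index_lists:
        mask[np.asarray(idx, dtype=int)] = 1
    return mask
```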
  • a third aspect of the present application provides a communication method.
  • the communication method can be executed by a first device.
  • the first device can be a communication device or a component (such as a chip (system)) in the communication device.
  • the communication method includes:
  • the first device sends first indication information to the second device, where the first indication information is used to indicate the indexes of the N parameters with the largest absolute values among the K parameters of the first model of the first device. The K parameters of the first model are the K parameters obtained by the first device performing a round of training on the first model, where K is an integer greater than or equal to N, K is an integer greater than or equal to 1, and N is an integer greater than or equal to 1.
  • the first device receives second indication information from the second device, where the second indication information is used to indicate a common sparse mask. The common sparse mask is determined by the second device based on the first indication information and is used to instruct the first device to report some of the parameters obtained by the first device training the first model.
  • the first device may report the first indication information to the second device, thereby indicating the indexes of the N parameters with the largest absolute values among the K parameters of the first model. This enables the second device to determine an appropriate common sparse mask based on the first indication information.
  • after receiving the second indication information indicating the common sparse mask, the first device can preferentially feed back parameters with large changes according to the common sparse mask. This reduces the overhead caused by the first device reporting the parameters of the first model, while also improving the accuracy and performance of model training.
  • a fourth aspect of the present application provides a communication method.
  • the communication method can be executed by a second device.
  • the second device can be a communication device or a component (such as a chip (system)) in the communication device.
  • the communication method includes:
  • the second device receives first indication information from the first device, where the first indication information is used to indicate the indexes of the N parameters with the largest absolute values among the K parameters of the first model of the first device. The K parameters of the first model are the K parameters obtained by the first device performing a round of training on the first model, where K is an integer greater than or equal to N, K is an integer greater than or equal to 1, and N is an integer greater than or equal to 1.
  • the second device determines the common sparse mask according to the first indication information, where the common sparse mask is used to instruct the first device to report some of the parameters obtained by the first device training the first model. Then, the second device sends second indication information to the first device, where the second indication information is used to indicate the common sparse mask.
  • the second device receives the first indication information from the first device, where the first indication information is used to indicate the indexes of the N parameters with the largest absolute values among the K parameters of the first model. The second device can thus determine an appropriate common sparse mask according to the first indication information. This facilitates the first device to preferentially feed back parameters with large changes based on the common sparse mask, thereby reducing the overhead caused by the first device reporting parameters of the first model, while also improving the accuracy and performance of model training.
  • the method further includes: the second device receives third indication information from the third device, where the third indication information is used to indicate the indexes of the N parameters with the largest absolute values among the K parameters of the second model of the third device, and the K parameters of the second model are the K parameters obtained by the third device performing a round of training on the second model. The second device determining the common sparse mask based on the first indication information includes: the second device determines the common sparse mask based on the first indication information and the third indication information.
  • the second device may also determine the common sparse mask in combination with the third indication information reported by the third device, thereby facilitating the second device to determine an appropriate common sparse mask for the first device. The first device can then preferentially feed back parameters with large changes based on the common sparse mask, improving the accuracy and performance of model training.
  • a fifth aspect of this application provides a first device, including:
  • a transceiver module, configured to receive at least one quantization threshold value from the second device; a processing module, configured to perform quantization processing on the relevant information of the first model of the first device according to the at least one quantization threshold value; the transceiver module is further configured to send first information to the second device, where the first information is used to indicate the relevant information of the quantized first model.
  • the relevant information of the first model includes: the output parameters or update parameters of the first model, and the update parameters include the weight gradient or weight parameters of the first model.
  • the transceiver module is further configured to: send second information to the second device, where the second information is used to indicate information obtained by processing the relevant information of the first model, or information obtained by processing the relevant information obtained by the first device performing the M-th round of training on the first model. The relevant information of the first model is the relevant information obtained by the first device performing the Q-th round of training on the first model, where M is an integer greater than or equal to 1 and less than Q, and Q is an integer greater than 1.
  • the relevant information of the first model includes the output parameters of the first model, and the information obtained by processing the relevant information of the first model includes the average of the absolute values of the output parameters of the first model; or, the relevant information of the first model includes the update parameters of the first model, and the information obtained by processing the relevant information of the first model includes the average of the absolute values of the update parameters of the first model.
  • the transceiver module is further configured to: receive third information from the second device, where the third information is used to indicate global information of the first model.
  • the global information of the first model includes a global output parameter of the first model; or, the global information of the first model includes a global update parameter and/or a global learning rate of the first model.
  • the relevant information of the first model includes N parameters of the first model, where N is an integer greater than or equal to 1. The processing module is specifically configured to: perform quantization processing on the N parameters according to the at least one quantization threshold value to obtain N quantized parameters, where the first information includes the N quantized parameters. The transceiver module is specifically configured to: modulate the N quantized parameters to obtain N first signals, and send the N first signals to the second device.
  • the at least one quantization threshold value includes a first quantization threshold value and a second quantization threshold value; the processing module is specifically configured to: if the i-th parameter among the N parameters is greater than the first quantization threshold value, quantize the i-th parameter to a first value, where i is an integer greater than or equal to 1 and less than or equal to N; or, if the i-th parameter is less than or equal to the first quantization threshold value and greater than or equal to the second quantization threshold value, quantize the i-th parameter to a second value; or, if the i-th parameter is less than the second quantization threshold value, quantize the i-th parameter to a third value.
  • the transceiver module is specifically configured to: modulate the i-th quantized parameter to obtain the i-th first signal, where the i-th first signal corresponds to two sequences. When the i-th quantized parameter is the first value, the transmission power used by the first device to send the first sequence of the two sequences is less than the transmission power used to send the second sequence; when the i-th quantized parameter is the second value, the transmission power used to send the first sequence is equal to the transmission power used to send the second sequence; when the i-th quantized parameter is the third value, the transmission power used to send the first sequence is greater than the transmission power used to send the second sequence.
  • when the i-th quantized parameter is the first value, the first sequence of the two sequences is a non-all-0 sequence and the second sequence is an all-0 sequence; when the i-th quantized parameter is the second value, both sequences are all-0 sequences; when the i-th quantized parameter is the third value, the first sequence is an all-0 sequence and the second sequence is a non-all-0 sequence.
  • the transceiver module is specifically configured to: send the first information L times to the second device, where L is an integer greater than or equal to 1.
  • the transceiver module is further configured to: receive first indication information from the second device, where the first indication information is used to indicate the number of times L that the first device sends the first information to the second device.
  • the relevant information of the first model includes N parameters of the first model after quantization error compensation. The N parameters after quantization error compensation are obtained by the first device performing error compensation on the N parameters obtained from the Q-th round of training on the first model, according to the quantization errors corresponding to those N parameters, where Q is an integer greater than 1. The quantization error corresponding to the i-th parameter among the N parameters is determined based on the i-th parameter obtained by the first device performing the (Q-1)-th round of training on the first model and undergoing quantization error compensation.
  • the relevant information of the first model includes N parameters of the first model obtained after sparse processing; the N parameters of the first model obtained after sparse processing are N parameters selected by the first device from K parameters of the first model according to a common sparse mask, and the K parameters of the first model are parameters obtained by the first device performing the Qth round of training on the first model, where K is an integer greater than or equal to N, and K is an integer greater than or equal to 1.
  • the common sparse mask is a bit sequence that includes K bits, and the K bits correspond one-to-one to the K parameters. When the value of one of the K bits is 0, it instructs the first device not to select the parameter corresponding to that bit; when the value of one of the K bits is 1, it instructs the first device to select the parameter corresponding to that bit.
  • the common sparse mask is determined by the first device based on the sparse ratio and the pseudo-random number, and the sparse ratio is indicated by the second device to the first device.
  • the transceiver module is further configured to: receive second indication information from the second device, where the second indication information is used to indicate the common sparse mask.
  • the transceiver module is further configured to: send third indication information to the second device, where the third indication information is used to indicate the indexes of the N parameters with the largest absolute values among the K parameters.
  • the first model is a neural network model
  • the relevant information of the first model includes relevant parameters of the neurons in the P layer of the neural network model, where P is an integer greater than or equal to 1.
  • a sixth aspect of this application provides a second device, including:
  • a transceiver module, configured to: send at least one quantization threshold value to the first device, where the at least one quantization threshold value is used to perform quantization processing on the relevant information of the first model of the first device; and receive first information sent from the first device, where the first information is used to indicate the relevant information of the first model after quantization processing.
  • the relevant information of the first model includes: the output parameters or update parameters of the first model, and the update parameters include the weight gradient or weight parameters of the first model.
  • the transceiver module is further configured to: receive second information from the first device, where the second information is used to indicate information obtained by processing the relevant information of the first model, or information obtained by processing the relevant information obtained by the first device performing the M-th round of training on the first model. The relevant information of the first model is the relevant information obtained by the first device performing the Q-th round of training on the first model, where M is an integer greater than or equal to 1 and less than Q, and Q is an integer greater than 1. The second device further includes a processing module configured to determine the at least one quantization threshold value according to the second information.
  • the relevant information of the first model includes the output parameters of the first model, and the information obtained by processing the relevant information of the first model includes the average of the absolute values of the output parameters of the first model; or, the relevant information of the first model includes the update parameters of the first model, and the information obtained by processing the relevant information of the first model includes the average of the absolute values of the update parameters of the first model.
  • the transceiver module is further configured to: receive third information from a third device, where the third information is used to indicate information obtained by processing the relevant information of the second model of the third device, or information obtained by processing the relevant information obtained by the third device performing the S-th round of training on the second model. The relevant information of the second model is the relevant information obtained by the third device performing the R-th round of training on the second model, where S is an integer greater than or equal to 1 and less than R, and R is an integer greater than 1. The processing module is configured to determine the at least one quantization threshold value based on the second information and the third information.
  • the processing module is further configured to: determine the global information of the first model based on the first information; the transceiver module is further configured to: send fourth information to the first device, where the fourth information is used to indicate the global information of the first model.
  • the global information of the first model includes the global output parameters of the first model; or, the global information of the first model includes the global update parameters and/or the global learning rate of the first model.
  • the transceiver module is further configured to: receive fifth information from the third device, where the fifth information is used to indicate the relevant information of the second model of the third device; the processing module is specifically configured to: determine the global information of the first model based on the first information and the fifth information.
  • the relevant information of the first model includes N parameters of the first model, where N is an integer greater than or equal to 1; the relevant information of the second model includes N parameters of the second model; the first information includes the N quantized parameters of the first model. The transceiver module is specifically configured to: receive N first signals from the first device, where the N first signals carry the N quantized parameters of the first model and correspond one-to-one to the N quantized parameters of the first model. The fifth information includes the N quantized parameters of the second model. The transceiver module is specifically configured to: receive N second signals from the third device, where the N second signals carry the N quantized parameters of the second model and correspond one-to-one to the N quantized parameters of the second model. The processing module is specifically configured to: determine the global information of the first model based on the N first signals and the N second signals.
  • the i-th first signal among the N first signals corresponds to a first sequence and a second sequence, and the i-th second signal among the N second signals corresponds to a third sequence and a fourth sequence. The time-frequency resources used by the first device to send the first sequence are the same as those used by the third device to send the third sequence, and the time-frequency resources used by the first device to send the second sequence are the same as those used by the third device to send the fourth sequence. The global information of the first model includes N global parameters of the first model, and i is an integer greater than or equal to 1 and less than or equal to N. The processing module is specifically configured to: determine a first signal energy sum with which the second device receives the first sequence and the third sequence; determine a second signal energy sum with which the second device receives the second sequence and the fourth sequence; and determine the i-th global parameter among the N global parameters based on the first signal energy sum and the second signal energy sum.
  • the processing module is specifically configured to: if the sum of the first signal energy sum and the decision threshold value is less than the second signal energy sum, determine the value of the i-th global parameter to be the first value; or, if the sum of the first signal energy sum and the decision threshold value is greater than or equal to the second signal energy sum, and the sum of the second signal energy sum and the decision threshold value is greater than or equal to the first signal energy sum, determine the value of the i-th global parameter to be the second value; or, if the sum of the second signal energy sum and the decision threshold value is less than the first signal energy sum, determine the value of the i-th global parameter to be the third value.
  • the transceiver module is further configured to: send first indication information to the first device, where the first indication information is used to indicate the number of times L that the first device sends the first information to the second device, and L is an integer greater than or equal to 1.
  • the transceiver module is further configured to: send second indication information to the first device, where the second indication information is used to indicate a common sparse mask, and the common sparse mask is used to instruct the first device to report some of the parameters obtained by the first device training the first model.
  • the transceiver module is further configured to: receive third indication information from the first device, where the third indication information is used to indicate the indexes of the N parameters with the largest absolute values among the K parameters obtained by the first device performing a round of training on the first model; and receive fourth indication information from the third device, where the fourth indication information is used to indicate the indexes of the N parameters with the largest absolute values among the K parameters of the second model of the third device, and the K parameters of the second model are the K parameters obtained by the third device performing a round of training on the second model. The second device further includes a processing module configured to determine the common sparse mask according to the third indication information and the fourth indication information.
  • a seventh aspect of this application provides a first device, including:
  • a transceiver module, configured to send first indication information to the second device, where the first indication information is used to indicate the indexes of the N parameters with the largest absolute values among the K parameters of the first model of the first device. The K parameters of the first model are the K parameters obtained by the first device performing a round of training on the first model, where K is an integer greater than or equal to N, K is an integer greater than or equal to 1, and N is an integer greater than or equal to 1.
  • the transceiver module is further configured to receive second indication information from the second device, where the second indication information is used to indicate a common sparse mask determined by the second device based on the first indication information; the common sparse mask is used to instruct the first device to report some of the parameters obtained by the first device training the first model.
  • An eighth aspect of this application provides a second device, including:
  • a transceiver module, configured to receive first indication information from the first device, where the first indication information is used to indicate the indexes of the N parameters with the largest absolute values among the K parameters of the first model of the first device. The K parameters of the first model are the K parameters obtained by the first device performing a round of training on the first model, where K is an integer greater than or equal to N, K is an integer greater than or equal to 1, and N is an integer greater than or equal to 1.
  • a processing module, configured to determine a common sparse mask according to the first indication information, where the common sparse mask is used to instruct the first device to report some of the parameters obtained by the first device training the first model;
  • the transceiver module is also used to send second indication information to the first device, where the second indication information is used to indicate a common sparse mask.
• the transceiver module is further configured to: receive third indication information from a third device, where the third indication information is used to indicate the indexes of the N parameters with the largest absolute values among the K parameters of the second model of the third device, and the K parameters of the second model are the K parameters obtained by the third device performing a round of training on the second model;
• the processing module is specifically configured to: determine the common sparse mask according to the first indication information and the third indication information.
  • the first device may be a communication device
  • the transceiver module may be a transceiver, or an input/output interface
  • the processing module may be a processor
  • the first device is a chip, a chip system or a circuit configured in a communication device.
• the transceiver module may be an input/output interface, an interface circuit, an output circuit, an input circuit, or pins or related circuits on the chip, chip system or circuit; the processing module may be a processor, a processing circuit, a logic circuit, etc.
  • the second device may be a communication device
  • the transceiver module may be a transceiver, or an input/output interface
  • the processing module may be a processor
  • the second device is a chip, a chip system or a circuit configured in a communication device.
• the transceiver module may be an input/output interface, an interface circuit, an output circuit, an input circuit, or pins or related circuits on the chip, chip system or circuit; the processing module may be a processor, a processing circuit, a logic circuit, etc.
  • a ninth aspect of the present application provides a first device, the first device including: a processor and a memory.
• Computer programs or computer instructions are stored in the memory, and the processor is used to call and run the computer program or computer instructions stored in the memory, so that the processor implements any implementation manner of the first aspect or the third aspect.
  • the first device further includes a transceiver, and the processor is used to control the transceiver to send and receive signals.
  • a tenth aspect of the present application provides a second device, the second device including: a processor and a memory.
• Computer programs or computer instructions are stored in the memory, and the processor is used to call and run the computer program or computer instructions stored in the memory, so that the processor implements any implementation manner of the second aspect or the fourth aspect.
  • the second device further includes a transceiver, and the processor is used to control the transceiver to send and receive signals.
  • An eleventh aspect of the present application provides a first device, including a processor and an interface circuit.
  • the processor is configured to communicate with other devices through the interface circuit and execute the method described in the first or third aspect.
  • the processor includes one or more.
  • a twelfth aspect of the present application provides a second device, including a processor and an interface circuit.
  • the processor is configured to communicate with other devices through the interface circuit and execute the method described in the second or fourth aspect.
  • the processor includes one or more.
  • a thirteenth aspect of the present application provides a first device, including a processor, connected to a memory, and configured to call a program stored in the memory to execute the method described in the first or third aspect.
  • the memory may be located within the first device, or may be located outside the first device.
  • the processor includes one or more.
  • a fourteenth aspect of the present application provides a second device, including a processor, connected to a memory, and configured to call a program stored in the memory to execute the method described in the second aspect or the fourth aspect.
  • the memory may be located within the second device or may be located outside the second device.
  • the processor includes one or more.
  • the first device of the fifth aspect, the seventh aspect, the ninth aspect, the eleventh aspect, and the thirteenth aspect may be a chip (system).
  • the second device of the above-mentioned sixth, eighth, tenth, twelfth, and fourteenth aspects may be a chip (system).
• a fifteenth aspect of the present application provides a computer program product including instructions which, when run on a computer, cause the computer to execute any implementation manner of any one of the first to fourth aspects.
• a sixteenth aspect of the present application provides a computer-readable storage medium, including computer instructions which, when run on a computer, cause the computer to execute any implementation manner of any one of the first to fourth aspects.
• a seventeenth aspect of the present application provides a chip device, including a processor for calling a computer program or computer instructions in a memory, so that the processor executes any implementation manner of any one of the above-mentioned first to fourth aspects.
  • the processor is coupled to the memory through an interface.
• An eighteenth aspect of the present application provides a communication system, which includes a first device as in the fifth aspect and a second device as in the sixth aspect; or, the communication system includes a first device as in the seventh aspect and a second device as in the eighth aspect.
• Figure 1 is a schematic diagram of a communication system applied in an embodiment of the present application.
  • Figure 2 is a schematic diagram of a communication method according to an embodiment of the present application.
  • Figure 3 is a schematic flow chart of a communication method according to an embodiment of the present application.
  • Figure 4 is a schematic diagram of another embodiment of the communication method according to the embodiment of the present application.
  • Figure 5 is a schematic diagram of generating a public sparse mask according to an embodiment of the present application.
  • FIG6 is a schematic structural diagram of a first device according to an embodiment of the present application.
  • Figure 7 is another structural schematic diagram of the first device according to the embodiment of the present application.
  • Figure 8 is a schematic structural diagram of the second device according to the embodiment of the present application.
  • Figure 9 is another structural schematic diagram of the second device according to the embodiment of the present application.
  • Figure 10 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
  • Figure 11 is a schematic structural diagram of a network device according to an embodiment of the present application.
  • Embodiments of the present application provide a communication method and related devices, which are used to reduce the communication overhead of the first device reporting related information of the first model and save communication resources.
• At least one of a, b, or c can represent: a; b; c; a and b; a and c; b and c; or a, b, and c, where each of a, b, and c can be singular or plural.
• Indication may include direct indication, indirect indication, explicit indication, and implicit indication.
• When it is described that certain indication information is used to indicate A, it can be understood that the indication information carries A, indicates A directly, or indicates A indirectly.
  • the information indicated by the indication information is called information to be indicated.
• the information to be indicated can be directly indicated, such as by the information to be indicated itself or an index of the information to be indicated, or can be indirectly indicated by indicating other information, where there is an association relationship between the other information and the information to be indicated. It is also possible to indicate only a part of the information to be indicated, while other parts of the information to be indicated are known or agreed in advance.
  • the indication of specific information can also be achieved by means of a pre-agreed (for example, protocol stipulated) arrangement order of each piece of information, thereby reducing the indication overhead to a certain extent.
• the information to be indicated can be sent together as a whole, or can be divided into multiple pieces of sub-information and sent separately, and the sending period and/or sending timing of these pieces of sub-information can be the same or different.
  • the specific sending method is not limited in this application.
  • the sending period and/or sending timing of these sub-information may be predefined, for example, according to a protocol, or may be configured by the transmitting device by sending configuration information to the receiving device.
  • the technical solution of this application can be applied to cellular communication systems related to the 3rd generation partnership project (3GPP).
• For example, the fourth generation (4G) communication system, the fifth generation (5G) communication system, and communication systems after the fifth generation communication system, such as the sixth generation (6G) communication system.
• the fourth generation communication system may include a long term evolution (LTE) communication system.
  • the fifth generation communication system may include a new radio (NR) communication system.
• the technical solution of this application can also be applied to wireless fidelity (WiFi) communication systems, communication systems that support the integration of multiple wireless technologies, device-to-device (D2D) communication systems, vehicle-to-everything (V2X) communication systems, etc.
  • the communication system to which the technical solution of the present application is applicable includes a first device and a second device.
  • the communication system also includes a third device.
  • the first device is the first terminal device or a chip in the first terminal device
  • the second device is a network device or a chip in the network device.
  • the first device and the second device can execute the communication method provided by this application.
  • the third device is the second terminal device or a chip within the second terminal device.
  • the third device can execute the communication method provided by this application.
  • the above introduction takes the first terminal device and the second terminal device as an example.
  • the network device can perform the communication method provided by this application with more terminal devices.
  • the first device is the first network device or a chip in the first network device
  • the second device is a terminal device or a chip in the terminal device.
  • the first device and the second device can execute the communication method provided by this application.
  • the third device is the second network device or a chip within the second network device.
  • the third device can execute the communication method provided by this application.
  • the above introduction takes the first network device and the second network device as an example.
  • the terminal device can perform the communication method provided by this application with more network devices.
  • the first device is the first terminal device or a chip in the first terminal device
  • the second device is the second terminal device or a chip in the second terminal device.
  • the first device and the second device can execute the communication method provided by this application.
  • the third device is a third terminal device or a chip in the third terminal device.
  • the third device can execute the communication method provided in this application.
  • the first terminal device can perform the communication method provided by this application with more terminal devices.
  • a terminal device is a device with wireless transceiver functions and also has computing capabilities.
  • the terminal device can perform machine learning training through local data and send relevant information about the model trained by the terminal device to the network device.
• Terminal equipment can refer to user equipment (UE), an access terminal, a subscriber unit, a user station, a mobile station, a remote station, a remote terminal, a mobile device, a user terminal, wireless communication equipment, a user agent or a user device.
• the terminal device may also be a satellite phone, a cellular phone, a smartphone, a wireless data card, a wireless modem, a machine type communications device, a cordless phone, a session initiation protocol (SIP) phone, a wireless local loop (WLL) station, a personal digital assistant (PDA), a handheld device with wireless communication capabilities, a computing device or other processing device connected to a wireless modem, vehicle-mounted equipment, communication equipment carried on high-altitude aircraft, wearable equipment, drones, robots, terminals in D2D, terminals in V2X, virtual reality (VR) terminal equipment, augmented reality (AR) terminal equipment, wireless terminals in industrial control, wireless terminals in self-driving, wireless terminals in remote medical, wireless terminals in smart grid, wireless terminals in
  • Network equipment has wireless transceiver functions and also has computing capabilities.
  • Network devices are used to communicate with end devices.
  • the network device may be a device that connects the terminal device to the wireless network.
  • the network device may be a network node with computing capabilities.
  • the network device may be an artificial intelligence (AI) node, a computing power node, or an access network node with AI capabilities on the network side (for example, access network or core network).
  • AI artificial intelligence
  • the network device can fuse the models trained by multiple terminal devices and then send them to these terminal devices. This enables joint learning between multiple terminal devices.
  • the network device may be a node in a wireless access network.
  • Network equipment can be called base stations, radio access network (RAN) nodes or RAN equipment.
• the network equipment can be an evolved base station (evolved NodeB, eNB or eNodeB) in LTE, a next generation node B (gNB) in the 5G network, a base station in a future evolved public land mobile network (PLMN), a broadband network gateway (BNG), an aggregation switch or non-3GPP access equipment, etc.
  • the network equipment in the embodiment of this application may include various forms of base stations.
• For example, macro base stations, micro base stations (also called small stations), relay stations, access points, equipment that implements base station functions in communication systems evolved after 5G, access points (APs) in WiFi systems, transmission and reception points (TRPs), transmission points (TPs), mobile switching centers, and equipment that performs base station functions in D2D communication, V2X communication or machine-to-machine (M2M) communication, etc.
• Network equipment can also include centralized units (CUs) and distributed units (DUs) in cloud radio access network (C-RAN) systems.
• Network equipment in a non-terrestrial network (NTN) communication system can be deployed on a high-altitude platform or a satellite, and this application does not impose restrictions.
  • Figure 1 is a schematic diagram of a communication system applied in an embodiment of the present application.
  • the communication system includes a terminal device 101, a terminal device 102, a network device 103, a network device 104 and a server 105.
  • the terminal device 101 can establish a communication connection with the network device 103
  • the terminal device 102 can establish a communication connection with the network device 103.
  • the terminal device 101, the terminal device 102 and the network device 103 can execute the communication method provided by this application. This reduces the overhead of terminal devices reporting relevant information about their models and saves communication overhead.
  • the communication system includes at least one network device and at least one terminal device.
  • abs(y) means finding the absolute value of each element in vector y.
• Figure 2 is a schematic diagram of a communication method according to an embodiment of the present application. Referring to Figure 2, the method includes:
  • the second device sends at least one quantization threshold value to the first device.
  • the first device receives at least one quantization threshold value from the second device.
  • the at least one quantization threshold value is used by the first device to perform quantization processing on the relevant information of the first model.
  • the first model may be a model configured by the second device for the first device.
  • the first model may be a neural network model.
  • the relevant information of the first model is obtained by the first device performing a round of training on the first model.
  • the related information of the first model includes output parameters or update parameters of the first model.
  • the output parameters of the first model can be understood as the output data of the first model.
  • the updated parameters of the first model include weight parameters or weight gradients of the first model.
  • the first model is a neural network model, and the relevant information of the first model includes output parameters of the neural network model.
  • the relevant information of the first model includes weight parameters or weight gradients in the neural network model.
  • the first device is a first terminal device and the second device is a network device.
• the at least one quantization threshold value can be carried in downlink control information, a radio resource control (RRC) message or a media access control control element (MAC CE).
  • the first device is a network device
  • the second device is a terminal device
  • the at least one quantization threshold value can be carried in the uplink control information.
  • the embodiment shown in Figure 2 also includes step 201a and step 201b. Steps 201a and 201b may be performed before step 201.
  • the first device sends the second information to the second device.
  • the second device receives the second information from the first device.
  • the second information is used to indicate information obtained by processing the relevant information of the first model.
  • the second information includes information obtained by processing the relevant information of the first model, or the second information indicates information obtained by processing the relevant information of the first model.
  • the relevant information of the first model includes the output parameters of the first model.
  • the information obtained by processing the relevant information of the first model includes the average value or weighted value of the absolute value of the output parameters of the first model.
  • the output parameters of the first model include output parameter A, output parameter B and output parameter C of the first model.
  • the first device averages the absolute values corresponding to output parameter A, output parameter B and output parameter C respectively to obtain the average value of the absolute values of the output parameters.
  • the second information includes the average value or weighted value of the absolute value of the output parameter of the first model.
  • the second information indicates the average value or weighted value of the absolute value of the output parameter of the first model.
  • the second information is indication information
• the corresponding relationship between the value of the indication information and the average value or weighted value of the absolute value of the output parameter of the first model can be as shown in Table 1 or Table 2.
  • the relevant information of the first model includes the update parameters of the first model.
  • the information obtained by processing the relevant information of the first model includes the average value or weighted value of the absolute value of the update parameters of the first model.
• For example, the update parameters of the first model include a plurality of weight gradients obtained by the first device performing the Q-th round of training on the first model. The first device averages the absolute values corresponding to these weight gradients respectively to obtain the average value of the absolute values of the weight gradients of the first model.
  • the second information includes the average value or weighted value of the absolute value of the update parameter of the first model.
  • the second information indicates the average value or weighted value of the absolute value of the update parameter of the first model.
  • the second information is indication information, and the corresponding relationship between the value of the indication information and the average value or weighted value of the absolute value of the update parameter of the first model can be shown in Table 2:
• Table 2
Indication value | Average or weighted value of the absolute value of the update parameter
00 | 0.5
01 | 1
10 | 1.5
11 | 2
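• A minimal sketch of using the Table 2 mapping, assuming the two indication bits are conveyed as a string; the names below are hypothetical.

```python
# Mapping of the 2-bit indication value to the average or weighted value
# of the absolute value of the update parameter, per Table 2.
TABLE_2 = {"00": 0.5, "01": 1.0, "10": 1.5, "11": 2.0}

def decode_indication(bits: str) -> float:
    return TABLE_2[bits]
```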
• Implementation mode 2: the second information is used to indicate information obtained by processing the relevant information obtained by the first device performing the M-th round of training on the first model.
  • the relevant information of the first model is the relevant information obtained by the first device performing the Qth round of training on the first model.
  • M is an integer greater than or equal to 1 and less than Q
  • Q is an integer greater than 1.
• the second information includes information obtained by processing the relevant information obtained by the first device performing the M-th round of training on the first model; or, the second information indicates information obtained by processing the relevant information obtained by the first device performing the M-th round of training on the first model.
• For the information obtained by processing the relevant information obtained by the first device performing the M-th round of training on the first model, refer to the related introduction of the information obtained by processing the relevant information of the first model.
• Implementation mode 2 is similar to implementation mode 1. For details, refer to the related introduction of implementation mode 1.
  • the first device is a terminal device and the second device is a network device.
  • the second information can be carried in downlink control information, RRC messages or MAC CE.
  • the first device is a network device
  • the second device is a terminal device
  • the second information can be carried in uplink control information.
  • the second device determines the at least one quantization threshold value according to the second information.
  • the at least one quantization threshold value includes a quantization threshold value.
  • the second information includes the average of the absolute values of the weight gradients of the first model.
• the quantization threshold value θ1 = mean(abs(Δw_Q))*a, where a is a control factor used to control the interval of quantization processing, and the value range of a is [0, +∞).
• abs(Δw_Q) represents the absolute value of each weight gradient obtained by the first device performing the Q-th round of training on the first model.
  • At least one quantization threshold value includes two quantization threshold values, which are a first quantization threshold value and a second quantization threshold value respectively.
• the first quantization threshold value θ1 = mean(abs(Δw_Q))*a;
• the second quantization threshold value -θ1 = -mean(abs(Δw_Q))*a.
• abs(Δw_Q) represents the absolute value of each weight gradient obtained by the first device performing the Q-th round of training on the first model.
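• A minimal sketch of deriving the quantization threshold value(s) from the weight gradients, assuming delta_w_q holds the weight gradients Δw_Q of the Q-th training round; the function name is hypothetical.

```python
import numpy as np

def quantization_thresholds(delta_w_q: np.ndarray, a: float):
    # theta1 = mean(abs(delta_w_Q)) * a, with the control factor a in [0, +inf)
    theta1 = float(np.mean(np.abs(delta_w_q))) * a
    # With one threshold, only theta1 is used; with two, the pair (theta1, -theta1).
    return theta1, -theta1
```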
• Optionally, the embodiment shown in Figure 2 also includes step 201c, and step 201c may be performed before step 201.
  • the third device sends the third information to the second device.
  • the second device receives the third information from the third device.
  • the third information is used to indicate information obtained by processing the relevant information of the second model of the third device.
• Optionally, the third information is used to indicate information obtained by processing the relevant information obtained by the third device performing the S-th round of training on the second model.
  • the relevant information of the second model is the relevant information obtained by the third device performing the R-th round of training on the second model.
  • S is an integer greater than or equal to 1 and less than R
  • R is an integer greater than 1.
  • the third information is similar to the second information. For details, please refer to the related introduction about the second information mentioned above.
  • the second model may be a model configured by the second device for the third device.
  • the first model and the second model may be the same model.
  • the first model and the second model are both global models configured by the second device.
• the first model and the second model herein are used to distinguish the models on the first device and the third device; they may actually be the same model.
  • the above step 201b specifically includes:
  • the second device determines the at least one quantization threshold value according to the second information and the third information.
  • the second information includes an average of absolute values of weight gradients of the first model.
  • the third information includes the average of the absolute values of the weight gradients of the second model.
  • the second device determines the at least one quantization threshold value based on the average of the absolute values of the weight gradients of the first model and the average of the absolute values of the weight gradients of the second model.
  • the at least one quantization threshold value includes two quantization threshold values, which are a first quantization threshold value and a second quantization threshold value respectively.
• the first quantization threshold value θ1 = mean(mean(abs(Δw_Q)), mean(abs(Δw_R)))*a;
• the second quantization threshold value -θ1 = -mean(mean(abs(Δw_Q)), mean(abs(Δw_R)))*a.
• the N weight gradients obtained by the first device performing the Q-th round of training on the first model are represented by the vector Δw_Q.
• the N weight gradients obtained by the third device performing the R-th round of training on the second model are represented by the vector Δw_R.
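• A sketch of the two-device variant, assuming mean_q and mean_r are the averages of abs(Δw_Q) and abs(Δw_R) indicated by the first and third devices respectively; the function name is hypothetical.

```python
import numpy as np

def fused_threshold(mean_q: float, mean_r: float, a: float) -> float:
    # theta1 = mean(mean(abs(dw_Q)), mean(abs(dw_R))) * a; the second threshold is -theta1
    return float(np.mean([mean_q, mean_r])) * a
```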
  • the above-mentioned steps 201a to 201c are only an example of the technical solution of the present application in which the second device determines the at least one quantization threshold based on the second information of the first device and the third information of the third device.
  • the second device may receive relevant information of models indicated by multiple devices, and determine the at least one quantization threshold value based on the relevant information of these models, which is not specifically limited in this application.
  • step 201a can be performed first, and then step 201c can be performed; or step 201c can be performed first, and then step 201a can be performed; or step 201a and step 201c can be performed simultaneously depending on the situation, and this application does not limit the specifics.
  • the first device performs quantization processing on the relevant information of the first model of the first device according to at least one quantization threshold value.
  • the relevant information of the first model includes the output parameters or update parameters of the first model.
• the technical solution of the present application is introduced by taking the relevant information of the first model including N parameters of the first model as an example. N is an integer greater than or equal to 1. Therefore, the above-mentioned step 202 specifically includes: the first device performs quantization processing on the N parameters of the first model according to at least one quantization threshold value, and obtains the quantized N parameters. For example, as shown in Figure 3, the first device performs the Q-th round of training on the first model to obtain relevant information of the first model. Then, the first device performs quantization processing on the relevant information of the first model.
• the at least one quantization threshold value includes one quantization threshold value θ1.
  • the relevant information of the first model includes N parameters of the first model.
• the above-mentioned step 202 specifically includes: if the i-th parameter among the N parameters is greater than the quantization threshold value θ1, the first device quantizes the i-th parameter to a first value, where i is an integer greater than or equal to 1 and less than or equal to N. If the i-th parameter among the N parameters is less than or equal to the quantization threshold value θ1, the first device quantizes the i-th parameter to a third value.
• Alternatively, the above step 202 specifically includes: if the i-th parameter among the N parameters is greater than or equal to the quantization threshold value θ1, the first device quantizes the i-th parameter to the first value, where i is an integer greater than or equal to 1 and less than or equal to N. If the i-th parameter among the N parameters is less than the quantization threshold value θ1, the first device quantizes the i-th parameter to a third value.
  • the first value is +1 and the third value is -1.
  • the N parameters of the first model are the N weight gradients of the first model.
• For example, the i-th weight gradient among the N weight gradients is expressed as Δw_Q^i. When the weight gradient Δw_Q^i is greater than the quantization threshold value θ1, the weight gradient Δw_Q^i is quantized to +1; when the weight gradient Δw_Q^i is less than or equal to the quantization threshold value θ1, the weight gradient Δw_Q^i is quantized to -1.
• the i-th weight gradient s_i after quantization processing can be expressed by the following formula 1:
s_i = +1, if Δw_Q^i > θ1; s_i = -1, if Δw_Q^i ≤ θ1. (Formula 1)
  • the above shows the quantization process of the i-th parameter among the N parameters of the first model by the first device.
  • the quantization process of other parameters among the N parameters is also applicable, and will not be explained one by one here.
• In the case where the first device quantizes the i-th parameter to the first value when it is greater than or equal to the quantization threshold value θ1 (i being an integer greater than or equal to 1 and less than or equal to N) and quantizes the i-th parameter to the third value when it is less than or equal to the quantization threshold value θ1, if the i-th parameter is equal to the quantization threshold value θ1, the first device can quantize the i-th parameter to either the first value or the third value. In this case, the first device can randomly quantize the i-th parameter into the first value or the third value through random quantization processing.
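• A sketch of the single-threshold quantization of formula 1, including the random quantization of a parameter that is exactly equal to θ1 as described above; the names below are hypothetical.

```python
import numpy as np

def quantize_binary(params: np.ndarray, theta1: float,
                    rng: np.random.Generator) -> np.ndarray:
    # Greater than theta1 -> +1 (first value); otherwise -> -1 (third value).
    s = np.where(params > theta1, 1.0, -1.0)
    # A parameter exactly equal to theta1 is randomly quantized to +1 or -1.
    ties = params == theta1
    s[ties] = rng.choice([1.0, -1.0], size=int(np.count_nonzero(ties)))
    return s
```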
• At least one quantization threshold value includes two quantization threshold values, which are the first quantization threshold value θ1 and the second quantization threshold value -θ1 respectively.
  • the relevant information of the first model includes N parameters of the first model.
• the above step 202 specifically includes: if the i-th parameter among the N parameters is greater than the first quantization threshold value θ1, the first device quantizes the i-th parameter to a first value, where i is an integer greater than or equal to 1 and less than or equal to N; if the i-th parameter among the N parameters is less than or equal to the first quantization threshold value θ1 and greater than or equal to the second quantization threshold value -θ1, the first device quantizes the i-th parameter to a second value; if the i-th parameter among the N parameters is less than the second quantization threshold value -θ1, the first device quantizes the i-th parameter to a third value.
• Alternatively, the above step 202 specifically includes: if the i-th parameter among the N parameters is greater than or equal to the first quantization threshold value θ1, the first device quantizes the i-th parameter to the first value, where i is an integer greater than or equal to 1 and less than or equal to N; if the i-th parameter among the N parameters is less than the first quantization threshold value θ1 and greater than the second quantization threshold value -θ1, the first device quantizes the i-th parameter to the second value; or, if the i-th parameter among the N parameters is less than or equal to the second quantization threshold value -θ1, the first device quantizes the i-th parameter to the third value.
  • the first value is +1
  • the second value is 0, and the third value is -1.
  • the N parameters of the first model are the N weight gradients of the first model.
• For example, the i-th weight gradient among the N weight gradients is expressed as Δw_Q^i. When the weight gradient Δw_Q^i is greater than the first quantization threshold value θ1, the weight gradient Δw_Q^i is quantized to +1. When the weight gradient Δw_Q^i is less than the second quantization threshold value -θ1, the weight gradient Δw_Q^i is quantized to -1.
• When the weight gradient Δw_Q^i is less than or equal to the first quantization threshold value θ1 and greater than or equal to the second quantization threshold value -θ1, the weight gradient Δw_Q^i is quantized to 0. Therefore, the i-th weight gradient s_i after quantization processing can be expressed by the following formula 2:
s_i = +1, if Δw_Q^i > θ1; s_i = 0, if -θ1 ≤ Δw_Q^i ≤ θ1; s_i = -1, if Δw_Q^i < -θ1. (Formula 2)
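• A sketch of the two-threshold (ternary) quantization of formula 2; the names below are hypothetical, and a floating-point parameter array is assumed.

```python
import numpy as np

def quantize_ternary(params: np.ndarray, theta1: float) -> np.ndarray:
    # +1 above theta1, -1 below -theta1, and 0 in the interval [-theta1, theta1].
    s = np.zeros_like(params)
    s[params > theta1] = 1.0
    s[params < -theta1] = -1.0
    return s
```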
• the first device can quantize the parameters of the first model through multiple quantization threshold values, which is beneficial to improving the quantization accuracy and improving the convergence speed and performance of the model.
• In formula 2, s_i can take a value of 0, which means that when the value of the i-th parameter falls in the interval between the second quantization threshold value and the first quantization threshold value, the first device may not update the i-th parameter. For example, if the i-th parameter is caused by training noise, the first device does not update the i-th parameter, which is beneficial to improving the accuracy of the first model trained by the second device.
• In the case where the first device quantizes the i-th parameter to the first value when it is greater than or equal to the first quantization threshold value θ1 (i being an integer greater than or equal to 1 and less than or equal to N), to the second value when it is less than or equal to the first quantization threshold value θ1 and greater than or equal to the second quantization threshold value -θ1, and to the third value when it is less than the second quantization threshold value -θ1, if the i-th parameter is equal to the first quantization threshold value θ1, the first device can quantize the i-th parameter into the first value or the second value.
• For example, the first device can randomly quantize the i-th parameter into the first value or the second value through random quantization processing.
• Similarly, in the case where the first device quantizes the i-th parameter to a first value when it is greater than the first quantization threshold value θ1 (i being an integer greater than or equal to 1 and less than or equal to N), to a second value when it is less than or equal to the first quantization threshold value θ1 and greater than or equal to the second quantization threshold value -θ1, and to a third value when it is less than or equal to the second quantization threshold value -θ1, if the i-th parameter is equal to the second quantization threshold value -θ1, the first device may quantize the i-th parameter to the second value or the third value.
• For example, the first device may randomly quantize the i-th parameter to the second value or the third value by random quantization processing.
• The above describes examples in which the at least one quantization threshold value includes one quantization threshold value or two quantization threshold values.
  • the at least one quantization threshold value may include three quantization threshold values, four quantization threshold values, or more quantization threshold values. This application does not limit the details, and no examples are given here.
  • the relevant information of the first model includes N parameters of the first model after quantization error compensation.
  • N parameters after quantization error compensation please refer to the relevant introduction of step 202a below.
  • step 202a which may be performed before step 202.
  • the first device performs error compensation on the N parameters of the first model according to the quantization errors corresponding to the N parameters of the first model, and obtains N parameters after quantization error compensation.
  • the N parameters of the first model are obtained by the first device performing the Qth round of training on the first model.
• the quantization error corresponding to the i-th parameter among the N parameters is determined by the i-th parameter that is obtained by the first device performing the (Q-1)-th round of training on the first model and has undergone quantization error compensation.
• For example, the i-th parameter among the N parameters of the first model is the i-th weight gradient Δw_Q^i.
• the i-th weight gradient after quantization error compensation can be expressed as Δw'_Q^i = Δw_Q^i + Δw'_{Q-1}^i - η·q(Δw'_{Q-1}^i), where Δw'_{Q-1}^i represents the i-th weight gradient obtained in the (Q-1)-th round of training after quantization error compensation, η is the global learning rate, and q(Δw'_{Q-1}^i) represents performing quantization processing on Δw'_{Q-1}^i.
  • the first device can determine the quantization error corresponding to the i-th parameter obtained in the Q+1 round of training. This facilitates the first device to perform quantization error compensation on the N parameters obtained in the Q+1 round of training.
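• A sketch of the error compensation step under the reconstruction above: the residual left by the previous round's quantization (scaled by the global learning rate η) is carried into the current round's gradients before quantization; the names below are hypothetical.

```python
import numpy as np

def compensate(delta_w_q: np.ndarray, prev_compensated: np.ndarray,
               prev_quantized: np.ndarray, eta: float) -> np.ndarray:
    # Quantization error of round Q-1: compensated gradient minus eta * its quantization.
    error = prev_compensated - eta * prev_quantized
    # Round-Q gradients with quantization error compensation applied.
    return delta_w_q + error
```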
  • the relevant information of the above first model includes N parameters after quantization error compensation.
  • the above step 202 specifically includes: the first device quantizes the N parameters after quantization error compensation according to the at least one quantization threshold value.
  • the specific quantization process please refer to the relevant introduction of the above step 202.
• the first device performs quantization error compensation on the N parameters of the first model with the quantization errors corresponding to the N parameters, which is beneficial to improving the accuracy of the second device updating the first model and improving model training performance.
  • the relevant information of the first model includes N parameters of the first model that have been sparsely processed.
  • N parameters of the first model please refer to the relevant introduction in step 202b below.
  • Step 202b may be performed before step 202.
  • the first device selects N parameters from K parameters of the first model according to the common sparse mask, and obtains N parameters of the first model that have been sparsely processed.
  • the K parameters of the first model are obtained by the first device performing a round of training on the first model.
  • the K parameters of the first model are obtained by the first device performing a round of training on the first model and performing quantization error compensation.
  • the process of quantization error compensation by the first device for the K parameters is similar to the aforementioned step 202a.
  • the common sparse mask is a bit sequence, and the bit sequence includes K bits.
  • K bits correspond to the K parameters one-to-one.
• When the value of one bit among the K bits is 0, it is used to instruct the first device not to select the parameter corresponding to the bit.
• When the value of one bit among the K bits is 1, it is used to instruct the first device to select the parameter corresponding to the bit.
• Alternatively, when the value of one bit among the K bits is 0, it may instead be used to instruct the first device to select the parameter corresponding to the bit.
  • the K parameters include the ten weight gradients of the first model.
  • the bit sequence is 1000111100, which corresponds to ten weight gradients from high to low.
  • the first bit of the bit sequence corresponds to the first weight gradient of the ten weight gradients.
  • the second bit of the bit sequence corresponds to the second weight gradient of the ten weight gradients, and so on.
  • the tenth bit of the bit sequence corresponds to the tenth weight gradient among the ten weight gradients. It can be seen that the relevant information of the first model includes the first weight gradient, the fifth weight gradient, the sixth weight gradient, the seventh weight gradient and the eighth weight gradient among the ten weight gradients.
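• A sketch of applying the common sparse mask from the example above; the names below are hypothetical.

```python
import numpy as np

def apply_common_mask(params: np.ndarray, mask_bits: str) -> np.ndarray:
    # A bit value of 1 selects the corresponding parameter; 0 skips it.
    mask = np.array([b == "1" for b in mask_bits])
    return params[mask]

ten_gradients = np.arange(10, dtype=float)
selected = apply_common_mask(ten_gradients, "1000111100")
# selects the 1st, 5th, 6th, 7th and 8th gradients, as in the example above
```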
• Implementation mode 1: the common sparse mask is determined by the first device based on a sparsity ratio and a pseudo-random number.
• the sparsity ratio is indicated by the second device to the first device.
  • multiple devices need to use the same common sparse mask, so that each of the multiple devices can send parameters of the same index of the model configured on each device to the second device.
• the multiple devices may send parameters with the same index through the same time-frequency resource, which is beneficial to reducing the communication resources required by the multiple devices to report model parameters and improving the utilization of communication resources. This supports the second device in receiving parameters with the same index sent by multiple devices on the same time-frequency resource, that is, it supports the second device in achieving model fusion through the superposition of over-the-air signals.
• the second device may indicate different sparsity ratios to the first device at different training stages. For example, at the beginning of training, the sparsity ratio can be smaller, which facilitates the second device obtaining more relevant information about the model and achieving rapid convergence of the model; during the training convergence phase, the sparsity ratio can be larger.
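• A sketch of implementation mode 1, assuming the pseudo-random number is derived from a seed shared by all devices so every device computes the same mask, and assuming the sparsity ratio is the fraction of the K parameters that are dropped; both assumptions go beyond what is stated above.

```python
import numpy as np

def common_sparse_mask(k: int, sparsity_ratio: float, seed: int) -> np.ndarray:
    rng = np.random.default_rng(seed)       # identical seed on every device
    n_selected = int(round(k * (1.0 - sparsity_ratio)))
    mask = np.zeros(k, dtype=bool)
    mask[rng.choice(k, size=n_selected, replace=False)] = True
    return mask
```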
  • Implementation method 2 is introduced below in conjunction with step 201e.
  • Step 201e may be performed before step 202b.
• the second device sends the second indication information to the first device.
  • the first device receives the second indication information from the second device.
  • the second indication information is used to indicate the common sparse mask.
  • the above step 202 specifically includes: the first device performs quantization processing on the N parameters of the first model that have undergone sparse processing according to the at least one quantization threshold value.
  • the first device performs the Q-th round of training on the first model to obtain K parameters of the first model.
• the first device performs quantization error compensation on the K parameters of the first model to obtain K parameters after quantization error compensation.
  • the first device selects N parameters from the K parameters after quantization error compensation according to the common sparse mask, and then performs quantization processing on the N parameters according to the at least one quantization threshold value.
  • the first device selects some parameters of the first model based on the common sparse mask, which is beneficial to reducing the overhead of the first device reporting parameters of the first model.
  • Step 201e may be performed first, and then step 201a, step 201b, step 201c and step 201 may be performed.
  • step 201a, step 201b, step 201c, and step 201 may be performed first, and then step 201e may be performed; or, depending on the situation, step 201e, step 201a, step 201b, step 201c, and step 201 may be performed simultaneously.
  • the first device sends the first information to the second device.
  • the first information is used to indicate relevant information of the first model after quantization processing.
  • the second device receives the first information from the first device.
  • the first information includes information related to the quantized first model.
  • the relevant information of the first model includes N parameters of the first model, and the first information includes the N parameters of the first model after quantization processing.
  • the first information is indication information
  • the indication information indicates relevant information of the first model after quantization processing.
  • the relevant information of the first model includes N parameters of the first model after quantization processing.
  • a possible implementation of the above step 203 is introduced below.
  • the above step 203 specifically includes step 2003a and step 2003b.
  • the first device modulates the N parameters of the quantized first model to obtain N first signals.
  • the N first signals correspond to the N parameters one-to-one.
  • the first device sends N first signals to the second device.
  • the second device receives the N first signals from the first device.
  • the first device modulates the i-th parameter among the N parameters of the quantized first model to obtain the i-th first signal.
  • the i-th first signal corresponds to two sequences, each of the two sequences includes at least one symbol.
  • the following describes two possible implementation methods for the first device to send the two sequences, so as to facilitate the second device to determine the value of the i-th parameter after the quantization process.
• Implementation mode 1: when the i-th parameter after quantization is the first value, the transmission power with which the first device sends the first sequence of the two sequences is less than the transmission power with which it sends the second sequence of the two sequences. When the i-th parameter after quantization is the second value, the transmission power with which the first device sends the first sequence of the two sequences is equal to the transmission power with which it sends the second sequence. When the i-th parameter after quantization is the third value, the transmission power with which the first device sends the first sequence of the two sequences is greater than the transmission power with which it sends the second sequence.
• For example, when the i-th parameter after quantization is the first value, the first sequence of the two sequences is an all-0 sequence and the second sequence is a non-all-0 sequence; when the i-th parameter after quantization is the second value, both sequences are all-0 sequences; when the i-th parameter after quantization is the third value, the first sequence of the two sequences is a non-all-0 sequence and the second sequence is an all-0 sequence.
• For example, the first value is +1, the second value is 0, and the third value is -1.
• the i-th first signal carries the i-th parameter s_i and the two sequences corresponding to the i-th parameter.
• For the various values of the i-th parameter, the corresponding two sequences are shown in Table 3:
Table 3
s_i | Sequence 1 | Sequence 2
+1 | all-0 sequence | c1
0 | all-0 sequence | all-0 sequence
-1 | c0 | all-0 sequence
• c0 and c1 are both sequences of a specific length.
• For example, the length of c0 and the length of c1 are both 1, that is, they each include one symbol.
• For example, both c0 and c1 can be Zadoff-Chu sequences, which can be referred to as ZC sequences for short.
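• A sketch of implementation mode 1's mapping of a quantized parameter to the two sequences of Table 3, assuming length-1 sequences where an all-0 sequence is the symbol 0; the names below are hypothetical.

```python
def to_sequences(s_i: float, c0: complex, c1: complex):
    # Returns (sequence 1, sequence 2) for the i-th first signal.
    if s_i == 1.0:
        return 0.0, c1      # all-0 sequence, then non-all-0 sequence
    if s_i == -1.0:
        return c0, 0.0      # non-all-0 sequence, then all-0 sequence
    return 0.0, 0.0         # s_i == 0: both sequences all-0
```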
• Implementation mode 2: when the i-th parameter after quantization is the first value, the transmission power with which the first device sends the first sequence of the two sequences is greater than the transmission power with which it sends the second sequence of the two sequences. When the i-th parameter after quantization is the second value, the transmission power with which the first device sends the first sequence of the two sequences is equal to the transmission power with which it sends the second sequence. When the i-th parameter after quantization is the third value, the transmission power with which the first device sends the first sequence of the two sequences is less than the transmission power with which it sends the second sequence.
• For example, when the i-th parameter after quantization is the first value, the first sequence of the two sequences is a non-all-0 sequence and the second sequence is an all-0 sequence; when the i-th parameter after quantization is the second value, both sequences are all-0 sequences; when the i-th parameter after quantization is the third value, the first sequence of the two sequences is an all-0 sequence and the second sequence is a non-all-0 sequence.
• For example, the first value is +1, the second value is 0, and the third value is -1.
• the i-th first signal carries the i-th parameter s_i and the two sequences corresponding to the i-th parameter. For the various values of the i-th parameter, the corresponding two sequences (ie, sequence 1 and sequence 2) are shown in Table 4:
Table 4
s_i | Sequence 1 | Sequence 2
+1 | c0 | all-0 sequence
0 | all-0 sequence | all-0 sequence
-1 | all-0 sequence | c1
  • the first value, the second value and the third value can also be other values, which are not specifically limited in this application.
  • the first value is 0.7
  • the second value is 0,
  • the third value is -0.7.
• After the quantization processing in step 202, the quantized N parameters of the first model are obtained. The first device modulates the N parameters of the quantized first model, maps the modulated sequences to the corresponding time-frequency resources, and performs waveform shaping to obtain the N first signals.
  • the first device sends the N first signals to the second device.
  • the first device modulates each of the N parameters of the first model to two sequences.
  • the first device controls the transmission power used to transmit each of the two sequences, thereby facilitating the second device to determine the value of the parameter.
  • the first device does not need to perform channel estimation and equalization, and therefore does not need corresponding pilot overhead.
  • the embodiment shown in Figure 2 also includes step 204 and step 205. Steps 204 and 205 may be performed after step 203.
  • the second device determines global information of the first model based on the first information.
  • the global information of the first model includes global output parameters of the first model.
  • the global information of the first model includes global update parameters and/or global learning rate of the first model.
  • the global output parameters of the first model can be understood as the global output data of the first model.
  • the global update parameters of the first model include global weight parameters or global weight gradients of the first model.
  • the global information of the first model includes N global parameters of the first model, and the global parameters are output parameters or update parameters.
  • N global parameters are output parameters or update parameters.
  • the first information includes N parameters of the first model after quantization processing
• the second device can determine the global learning rate η based on the N parameters of the first model.
  • the N parameters of the first model after quantization include N weight gradients obtained by performing Q-th round of training on the first model by the first device and undergoing quantization.
• the N weight gradients of the first model are represented by the vector Δw_Q. That is, the vector Δw_Q includes the N weight gradients obtained by the first device performing the Q-th round of training on the first model.
• the vector Δw_q includes the quantized weight parameters in the vector Δw_Q that are not 0.
  • the first device may also send the sixth information to the second device.
  • the sixth information is used to indicate the average of the absolute values of the quantized parameters among the N parameters of the first model that are not 0.
  • the second device determines the global learning rate based on the sixth information.
  • the N parameters of the first model are the N weight gradients obtained by performing the Q-th round of training on the first model by the first device.
• the N weight gradients of the first model are represented by the vector Δw_Q.
• Optionally, the global learning rate η is variable. For example, the global learning rate η changes with the number of training rounds.
  • the second device is used to determine the global learning rate based on the first information.
  • the second device may determine the global learning rate based on the second information.
  • the second device determines the global learning rate based on the second information and the third information, which is not limited in this application.
  • the first model is a neural network model.
  • the relevant information of the first model includes relevant parameters of neurons in all layers of the neural network model.
  • the N global parameters of the first model included in the global information of the first model in step 204 are global parameters of neurons in all layers.
  • the at least one quantization threshold value and the global learning rate are uniformly set for the neurons in each layer of the neural network model.
  • the first model is a neural network model.
  • the relevant information of the first model includes relevant parameters of the neurons in layer P of the neural network model, where P is an integer greater than or equal to 1.
  • the N global parameters of the first model included in the global information of the first model in step 204 are global parameters of the neurons of the P layer.
  • the at least one quantization threshold value and the global learning rate are uniformly set for the neurons of the P-th layer of the neural network model. For neurons in layers of the neural network model other than the P-th layer, corresponding quantization threshold values and global learning rates are determined separately.
  • optionally, the embodiment shown in FIG. 2 further includes step 203a, and step 203a may be performed before step 204.
  • the third device sends fifth information to the second device.
  • the fifth information is used to indicate relevant information of the second model after quantization processing.
  • the second device receives the fifth information from the third device.
  • the quantized related information of the second model is obtained by the third device quantizing the related information of the second model according to the at least one quantization threshold value.
  • for the specific quantization process, please refer to the relevant introduction of the aforementioned step 202.
  • the relevant information of the second model includes N parameters of the second model.
  • for the second model, please refer to the relevant introduction above.
  • the above step 203a specifically includes step 1 and step 2.
  • Step 1: the third device modulates the N parameters of the second model to obtain N second signals.
  • the N second signals carry N parameters of the second model, and the N second signals correspond to the N parameters of the second model one-to-one.
  • Step 2: the third device sends the N second signals to the second device.
  • the second device receives N second signals from the third device.
  • Step 1 to Step 2 are similar to the aforementioned step 2003a to step 2003b.
  • the i-th first signal among the N first signals corresponds to the first sequence and the second sequence.
  • the first sequence is the first sequence of the two sequences corresponding to the i-th first signal
  • the second sequence is the second sequence of the two sequences corresponding to the i-th first signal.
  • the i-th second signal among the N second signals corresponds to the third sequence and the fourth sequence.
  • the third sequence is the first sequence of the two sequences corresponding to the i-th second signal
  • the fourth sequence is the second sequence of the two sequences corresponding to the i-th second signal.
  • i is an integer greater than or equal to 1 and less than or equal to N.
  • the time-frequency resources used by the first device to send the first sequence are the same as the time-frequency resources used by the third device to send the third sequence.
  • the time-frequency resources used by the first device to send the second sequence are the same as the time-frequency resources used by the third device to send the fourth sequence. This enables the second device to perform non-coherent reception of the superimposed over-the-air signals of multiple users.
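  • As a hedged illustration of this superposition (the flat-fading channel model, the sequence length, and all names below are assumptions, not taken from this application), a minimal Python sketch of the energy the second device observes on shared time-frequency resources:

    import numpy as np

    rng = np.random.default_rng(0)

    def received_energy(seq_a, seq_b, noise_std=0.1):
        # Non-coherent reception: the receiver observes only the superposition of
        # the two sequences over unknown flat-fading channels and measures energy.
        h_a = rng.normal() + 1j * rng.normal()
        h_b = rng.normal() + 1j * rng.normal()
        noise = noise_std * (rng.normal(size=len(seq_a)) + 1j * rng.normal(size=len(seq_a)))
        superposed = h_a * np.asarray(seq_a) + h_b * np.asarray(seq_b) + noise
        return float(np.sum(np.abs(superposed) ** 2))

    # The first sequence (first device) and the third sequence (third device) share
    # the same time-frequency resources, so one energy value is observed for both.
    print(received_energy(np.ones(8), np.zeros(8)))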
  • step 203 may be executed first, and then step 203a may be executed; or step 203a may be executed first, and then step 203 may be executed; or step 203 and step 203a may be executed at the same time depending on the situation, and this application does not limit the details.
  • the above step 204 specifically includes: the second device determines the global information of the first model based on the first information and the fifth information.
  • the second device determines the global information of the first model based on N first signals and N second signals.
  • the following describes the above step 204 by taking the i-th first signal among the N first signals corresponding to the first sequence and the second sequence, and the i-th second signal among the N second signals corresponding to the third sequence and the fourth sequence as an example.
  • step 204 specifically includes steps 204a to 204c.
  • Step 204a: the second device determines the first signal energy sum, that is, the sum of the energies with which the second device receives the first sequence and the third sequence; the first signal energy sum is denoted E1 below.
  • Step 204b: the second device determines the second signal energy sum, that is, the sum of the energies with which the second device receives the second sequence and the fourth sequence; the second signal energy sum is denoted E2 below.
  • Step 204c: the second device determines the i-th global parameter among the N global parameters based on the first signal energy sum and the second signal energy sum.
  • the aforementioned step 204c specifically includes:
  • if the sum of the first signal energy sum and the decision threshold value is less than the second signal energy sum, the second device determines that the value of the i-th global parameter is the first value; or, if the sum of the first signal energy sum and the decision threshold value is greater than or equal to the second signal energy sum and the sum of the second signal energy sum and the decision threshold value is greater than or equal to the first signal energy sum, the second device determines that the value of the i-th global parameter is the second value; or, if the sum of the second signal energy sum and the decision threshold value is less than the first signal energy sum, the second device determines that the value of the i-th global parameter is the third value.
  • the global information of the first model includes N global weight gradients of the first model.
  • the i-th global weight gradient a_i of the N global weight gradients can be expressed as Formula 3: a_i equals the first value if E1 + γ² < E2; a_i equals the second value if E1 + γ² ≥ E2 and E2 + γ² ≥ E1; and a_i equals the third value if E2 + γ² < E1. Here γ² is the decision threshold value, E1 is the first signal energy sum, and E2 is the second signal energy sum.
  • the aforementioned step 204c specifically includes:
  • if the first signal energy sum is greater than the sum of the second signal energy sum and the decision threshold value, the second device determines that the value of the i-th global parameter is the first value; or, if the first signal energy sum is less than or equal to the sum of the second signal energy sum and the decision threshold value and the second signal energy sum is less than or equal to the sum of the first signal energy sum and the decision threshold value, the second device determines that the value of the i-th global parameter is the second value; or, if the second signal energy sum is greater than the sum of the first signal energy sum and the decision threshold value, the second device determines that the value of the i-th global parameter is the third value.
  • the global information of the first model includes N global weight gradients of the first model.
  • the i-th global weight gradient a_i of the N global weight gradients can be expressed as Formula 4: a_i equals the first value if E1 > E2 + γ²; a_i equals the second value if E1 ≤ E2 + γ² and E2 ≤ E1 + γ²; and a_i equals the third value if E2 > E1 + γ². Here γ² is the decision threshold value, E1 is the first signal energy sum, and E2 is the second signal energy sum.
  • steps 204a to 204c show the process of the second device determining the i-th global parameter.
  • the second device may use a similar process to determine other global parameters among the N global parameters, which will not be explained one by one here.
  • the second device may determine the decision threshold in combination with the N first signals and/or the N second signals.
  • the first device sends the i-th first signal to the second device
  • the third device sends the i-th second signal to the second device.
  • the i-th first signal and the i-th second signal occupy the same time-frequency resources.
  • the other first signals and second signals are also similar and will not be explained one by one here.
  • the decision threshold value can be expressed as γ² = b·mean(abs(...)), where b is a control factor used to control the decision threshold, affecting the number of non-zero elements in the global parameters and the update of the first model.
  • the second device can determine the i-th global parameter from the received signal energies of the two sequences corresponding to the i-th first signal and the two sequences corresponding to the i-th second signal. This enables the second device to perform non-coherent reception of the superimposed over-the-air signals of multiple users and to be robust to fading channels.
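  • A minimal Python sketch of this energy-based decision, assuming the first, second, and third values are +1, 0, and -1, and assuming an illustrative argument for the truncated mean(abs(...)) threshold expression:

    import numpy as np

    def decision_threshold(observed_energies, b=0.5):
        # Assumed form gamma^2 = b * mean(abs(...)); the argument of mean(abs(...))
        # is not fully specified here, so the observed energy sums are used purely
        # for illustration. b trades off how many global parameters decide to 0.
        return b * float(np.mean(np.abs(observed_energies)))

    def decide_global_parameter(e1, e2, gamma2, values=(1, 0, -1)):
        # Formula 3 style decision; mapping the first/second/third values to
        # +1/0/-1 is an assumption.
        first, second, third = values
        if e1 + gamma2 < e2:
            return first
        if e2 + gamma2 < e1:
            return third
        return second

    e1s, e2s = [4.1, 0.2, 1.1], [0.3, 0.1, 3.9]
    gamma2 = decision_threshold(e1s + e2s)
    print([decide_global_parameter(a, c, gamma2) for a, c in zip(e1s, e2s)])  # -> [-1, 0, 1]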
  • the second device may determine the global learning rate based on the first information and the fifth information.
  • the N quantized parameters of the first model include N weight gradients obtained by the first device performing the Q-th round of training on the first model and then quantizing them.
  • the N weight gradients of the first model are represented by the vector Δw_Q. That is, the vector Δw_Q includes the N weight gradients obtained by the first device performing the Q-th round of training on the first model.
  • the N quantized parameters of the second model include N weight gradients obtained by the third device performing the R-th round of training on the second model and then quantizing them.
  • the N weight gradients of the second model are represented by the vector Δw_R. That is, the vector Δw_R includes the N weight gradients obtained by the third device performing the R-th round of training on the second model.
  • the vector Δw_q includes the weight gradients in the vector Δw_Q whose quantized values are not 0.
  • the vector Δw_r includes the weight gradients in the vector Δw_R whose quantized values are not 0.
  • the first device may send the sixth information to the second device.
  • the sixth information is used to indicate the average of the absolute values of the parameters, among the N parameters of the first model, whose quantized values are not 0.
  • the third device sends seventh information to the second device.
  • the seventh information is used to indicate the average of the absolute values of the quantized parameters among the N parameters of the second model that are not 0.
  • the second device determines the global learning rate based on the sixth information and the seventh information.
  • the first device indicates to the second device, through the sixth information, the average mean(abs(Δw_q)) of the absolute values of the weight gradients in the vector Δw_Q whose quantized values are not 0. Similarly, the third device indicates the average mean(abs(Δw_r)) to the second device through the seventh information.
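  • A minimal Python sketch, assuming Δw_q collects the raw round-Q gradients at the non-zero quantized positions and assuming (this application does not fix the rule) that the second device averages the per-device means to obtain η:

    import numpy as np

    def mean_abs_nonzero(raw_gradients, quantized_gradients):
        # mean(abs(delta_w_q)): average magnitude of the round-Q gradients at the
        # positions whose quantized value is non-zero (interpretation assumed).
        mask = np.asarray(quantized_gradients) != 0
        return float(np.mean(np.abs(np.asarray(raw_gradients)[mask])))

    sixth = mean_abs_nonzero([0.9, -0.02, 0.7], [1, 0, 1])      # from the first device
    seventh = mean_abs_nonzero([-0.8, 0.6, 0.01], [-1, 1, 0])   # from the third device

    # One plausible combining rule (not fixed by this application): average the
    # per-device means to obtain the global learning rate eta.
    eta = (sixth + seventh) / 2
    print(eta)   # -> 0.75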
  • the second device sends fourth information to the first device.
  • the fourth information is used to indicate global information of the first model determined by the second device.
  • the first device receives the fourth information from the second device.
  • the fourth information includes global information of the first model determined by the second device.
  • the fourth information indicates global information of the first model determined by the second device.
  • the second device encodes or modulates the global information of the first model to obtain the fourth information, and indicates the global information of the first model to the first device through the fourth information.
  • for the global information of the first model, please refer to the relevant introduction above.
  • the fourth information includes N global weight gradients of the first model determined by the second device.
  • w_{Q-1} is the global weight parameter of the first model obtained by the first device performing the (Q-1)-th round of updating on the first model.
  • w_Q is the global weight parameter of the first model obtained by the first device performing the Q-th round of updating on the first model.
  • η is the global learning rate.
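  • A minimal sketch of the resulting update, assuming the standard gradient-descent sign convention w_Q = w_{Q-1} − η·a (the update formula itself is not reproduced in the text above, so the minus sign is an assumption):

    import numpy as np

    def update_weights(w_prev, global_gradients, eta):
        # w_Q = w_{Q-1} - eta * a: apply the N global weight gradients indicated by
        # the fourth information with the global learning rate eta.
        return np.asarray(w_prev) - eta * np.asarray(global_gradients)

    print(update_weights([0.5, -0.3, 0.1], [1, 0, -1], eta=0.05))  # -> [0.45 -0.3  0.15]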
  • the fourth information includes N global output parameters of the first model determined by the second device.
  • the first device can perform the Q+1 round of training on the first model to obtain N actual output parameters of the first model.
  • the first device trains the first model according to the N actual output parameters and the N global output parameters to obtain the weight parameters of the first model.
  • optionally, the embodiment shown in FIG. 2 further includes step 201d, and step 201d may be performed before step 203.
  • the second device sends the first instruction information to the first device.
  • the first instruction information is used to indicate the number of times L that the first device sends the first information to the second device.
  • the first device receives the first indication information from the second device.
  • L is an integer greater than or equal to 1.
  • the above step 203 specifically includes: the first device sends the first information to the second device L times.
  • the second device receives the first information from the first device L times.
  • the second device may instruct the first device to repeatedly send the first information to the second device.
  • the second device's energy-based gradient decision may result in decision errors due to the randomness of channel noise and signal incoherent superposition. Therefore, the first device repeatedly sends the first information, which is conducive to the second device selecting the decision result with the most occurrences as the best decision result after making separate decisions, thereby reducing the probability of decision errors and improving the performance of model training.
  • the first device quantizes the N parameters of the first model and obtains the quantized N parameters of the first model.
  • the first device modulates the N parameters of the quantized first model.
  • the first device can map the modulated sequence to the corresponding time-frequency resource according to the number of transmissions L, and perform waveform shaping to obtain the corresponding first signal.
  • the first device sends the first signal to the second device. For example, if L is equal to 2, the first device can map the modulated sequence to the corresponding time-frequency resource twice.
  • the number of transmissions L can be set in combination with at least one factor including the training stage of the model, the number of users participating in model training, and the signal-to-noise ratio of the channel. For example, in the later stages of training, when the number of users participating in model training is small and the signal-to-noise ratio is low, the number of sending times can be larger.
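  • A minimal sketch of the repetition-based decision, assuming majority voting over the L per-repetition decisions as described above:

    from collections import Counter

    def majority_decision(per_repetition_decisions):
        # Keep the decision value that occurs most often across the L repetitions,
        # reducing the probability that channel noise flips a single decision.
        return Counter(per_repetition_decisions).most_common(1)[0][0]

    print(majority_decision([1, 0, 1]))   # L = 3 repetitions -> 1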
  • the above description takes the case in which the second device determines the global learning rate in combination with the first information as an example to introduce the technical solution of the present application.
  • the second device may determine the global learning rate in combination with the first information and/or the third information, which is not limited in this application.
  • the embodiment shown in FIG. 2 described above is a solution in which the second device determines at least one quantization threshold value based on the second information and the third information.
  • the second device may send the third information to the first device.
  • the first device determines the at least one quantization threshold value by itself based on the second information and the third information, which is not limited in this application.
  • This application also provides another embodiment, which is similar to the embodiment shown in FIG. 2 , except for step 204 .
  • the above step 204 is replaced by step 2004a, and this embodiment also includes step 2004b and step 2004c. Steps 2004b and 2004c may be performed before step 205.
  • the second device sends the first information to the fourth device.
  • the fourth device receives the first information from the second device.
  • the second device is a network device
  • the fourth device is a server.
  • the server may receive the first information sent from the network device.
  • the fourth device determines global information of the first model based on the first information.
  • Step 2004b is similar to step 204 in the embodiment shown in FIG. 2 .
  • this embodiment also includes step 2004d, which may be executed before step 2004b.
  • the second device sends the fifth information to the fourth device.
  • the fourth device receives the fifth information from the second device.
  • for the fifth information, please refer to step 203a in the embodiment shown in FIG. 2 .
  • Step 2004a may be performed first and then step 2004d; or step 2004d may be performed first and then step 2004a; or step 2004a and step 2004d may be performed simultaneously depending on the situation.
  • the fourth device sends fourth information to the second device, where the fourth information is used to indicate the determined global information of the first model.
  • the second device receives the fourth information from the fourth device.
  • for the subsequent processing, please refer to step 205 in the embodiment shown in FIG. 2 , which will not be described again here.
  • the first device may be a first terminal device.
  • the second device may be a network device.
  • the third device may be a second terminal device.
  • the fourth device may be a server.
  • the above embodiment describes a process in which the server obtains relevant information about the models of the terminal devices managed by the network device, and combines the relevant information of these models to determine the global information of the first model.
  • the server may also obtain relevant information about the models of terminal devices managed by multiple network devices respectively, and combine the relevant information of these models to determine the global information of the first model, which is not limited in this application.
  • Figure 4 is a schematic diagram of another embodiment of the communication method according to an embodiment of the present application. Referring to Figure 4, the method includes:
  • the first device sends third indication information to the second device.
  • the third indication information is used to indicate the indexes of the N parameters with the largest absolute values among the K parameters obtained by the first device through a round of training on the first model.
  • the second device receives the third indication information from the first device.
  • the K parameters of the first model are obtained by the first device performing a round of training on the first model.
  • the first device determines the N parameters whose corresponding values among the K parameters have the largest absolute values. Then, the first device sends the third instruction information to the second device.
  • the third indication information is a bit sequence, the bit sequence includes K bits, and the K bits correspond to the K parameters of the first model one-to-one.
  • when the value of a bit in the bit sequence is 0, it indicates that the first device does not indicate the parameter corresponding to the bit; when the value of a bit in the bit sequence is 1, it indicates that the first device indicates the parameter corresponding to the bit.
  • for the bit sequences, please refer to the relevant introduction of Figure 5 below.
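  • A minimal Python sketch of building such a bit sequence (the parameter values are invented purely so that the result reproduces the first bit sequence of the Figure 5 example):

    import numpy as np

    def top_n_bit_sequence(params, n):
        # Bit k is 1 when the k-th of the K parameters is among the N parameters
        # with the largest absolute values.
        params = np.asarray(params)
        bits = np.zeros(params.size, dtype=int)
        bits[np.argsort(np.abs(params))[-n:]] = 1
        return "".join(map(str, bits))

    print(top_n_bit_sequence([0.9, -0.8, 0.1, 0.0, 0.5, 0.05, 0.6, 0.0, 0.0], n=4))
    # -> 110010100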
  • the third device sends fourth instruction information to the second device.
  • the fourth indication information is used to indicate the indexes of the N parameters with the largest absolute values among the K parameters of the second model of the third device.
  • the second device receives the fourth indication information from the third device.
  • the K parameters of the second model are obtained by the third device performing one round of training on the second model.
  • the third device determines the N parameters corresponding to the largest absolute values among the K parameters of the second model. Then, the third device sends the fourth instruction information to the second device.
  • the format of the fourth indication information is similar to that of the third indication information.
  • the second device determines the common sparse mask according to the third indication information and the fourth indication information.
  • FIG. 4 describes the process in which the second device determines the common sparse mask based on the third indication information and the fourth indication information.
  • the second device may receive indication information sent by each device in the plurality of devices to indicate corresponding N parameters with the largest absolute values among the K parameters of the model of the device. Then, the second device determines the common sparse mask by combining the indication information of the plurality of devices.
  • the network device may receive, from each of the plurality of terminal devices, indication information indicating the N parameters with the largest absolute values among the K parameters of the model of that terminal device.
  • the first terminal device sends a first bit sequence to the network device, and the first bit sequence is 110010100.
  • Each bit in the first bit sequence corresponds to one of K parameters of the model of the first terminal device, that is, K is equal to 9.
  • the first bit in the first bit sequence corresponds to the first parameter among the K parameters
  • the second bit corresponds to the second parameter among the K parameters
  • and the last bit corresponds to the last parameter among the K parameters.
  • the parameters corresponding to the bits with a value of 1 in the first bit sequence are the four parameters corresponding to the largest absolute values among the nine parameters.
  • the first terminal device indicates the index of the four parameters to the network device through the first bit sequence.
  • the second terminal device sends a second bit sequence to the network device, and the second bit sequence is 101000101.
  • Each bit in the second bit sequence corresponds to one of the K parameters of the model of the second terminal device, that is, K is equal to 9.
  • the parameters corresponding to the bits with a value of 1 in the second bit sequence are the four parameters corresponding to the largest absolute values among the nine parameters.
  • the second terminal device indicates the index of the four parameters to the network device through the second bit sequence.
  • the third terminal device sends a third bit sequence to the network device, and the third bit sequence is 110001001. Each bit in the third bit sequence corresponds to one of the K parameters of the model of the third terminal device, that is, K is equal to 9.
  • the parameters corresponding to the bits with a value of 1 in the third bit sequence are the four parameters corresponding to the largest absolute values among the nine parameters.
  • the third terminal device indicates the index of the four parameters to the network device through the third bit sequence.
  • the network device determines the common sparse mask based on the first bit sequence, the second bit sequence and the third bit sequence. As shown in Figure 5, the public sparse mask is a bit sequence, specifically 110001101.
  • through the bit sequence, the network device instructs the terminal devices to report the model parameters corresponding to the bits with a value of 1 in the bit sequence. This reduces the overhead of the terminal devices reporting model parameters and saves communication resources.
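  • A minimal Python sketch of one plausible combining rule (this application does not specify how the bit sequences are combined, so the vote threshold below is an assumption):

    def common_sparse_mask(bit_sequences, min_votes=2):
        # Keep a position when at least min_votes devices marked it with a 1.
        k = len(bit_sequences[0])
        votes = [sum(int(seq[i]) for seq in bit_sequences) for i in range(k)]
        return "".join("1" if v >= min_votes else "0" for v in votes)

    print(common_sparse_mask(["110010100", "101000101", "110001001"]))
    # -> 110000101 under this assumed rule; the rule behind the mask 110001101
    #    shown in Figure 5 may differ.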
  • FIG. 6 is a schematic structural diagram of the first device according to an embodiment of the present application.
  • the first device 600 may be used to perform the steps performed by the first device in the embodiments shown in Figures 2 and 4. For details, please refer to the relevant introduction of the above method embodiments.
  • the first device 600 includes a transceiver module 601 and a processing module 602.
  • Transceiver module 601 configured to receive at least one quantization threshold value from the second device
  • the processing module 602 is configured to perform quantization processing on the relevant information of the first model of the first device 600 according to at least one quantization threshold value;
  • the transceiver module 601 is also configured to send first information to the second device, where the first information is used to indicate relevant information of the quantized first model.
  • the relevant information of the first model includes: the output parameters or update parameters of the first model, and the update parameters include the weight gradient or weight parameters of the first model.
  • the transceiver module 601 is also used to: send second information to the second device;
  • the second information is used to indicate information obtained by processing the relevant information of the first model; or, the second information is used to indicate information obtained by processing the relevant information obtained by the first device 600 performing the M-th round of training on the first model;
  • the relevant information of the first model is the relevant information obtained by the first device 600 performing the Q-th round of training on the first model;
  • M is an integer greater than or equal to 1 and less than Q;
  • Q is an integer greater than 1.
  • the relevant information of the first model includes the output parameters of the first model, and the information obtained after processing the relevant information of the first model includes the average of the absolute values of the output parameters of the first model; or,
  • the related information of the first model includes the updated parameters of the first model, and the information obtained after processing the related information of the first model includes the average of the absolute values of the updated parameters of the first model.
  • the transceiver module 601 is also used to:
  • Third information is received from the second device, and the third information is used to indicate global information of the first model.
  • the global information of the first model includes the global output parameters of the first model; or, the global information of the first model includes the global update parameters and/or the global learning rate of the first model.
  • the relevant information of the first model includes N parameters of the first model, where N is an integer greater than or equal to 1; the processing module 602 is specifically used to: quantize the N parameters according to the at least one quantization threshold value to obtain N quantized parameters, where the first information includes the N quantized parameters;
  • the transceiver module 601 is specifically used to: modulate the N quantized parameters to obtain N first signals, and send the N first signals to the second device.
  • At least one quantization threshold includes a first quantization threshold and a second quantization threshold; the processing module 602 is specifically used to:
  • if the i-th parameter among the N parameters is greater than the first quantization threshold value, quantize the i-th parameter to the first value, where i is an integer greater than or equal to 1 and less than or equal to N; or,
  • if the i-th parameter among the N parameters is less than or equal to the first quantization threshold value and greater than or equal to the second quantization threshold value, quantize the i-th parameter to the second value; or,
  • if the i-th parameter among the N parameters is less than the second quantization threshold value, quantize the i-th parameter to the third value.
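  • A minimal Python sketch of this two-threshold quantization, assuming the first, second, and third values are +1, 0, and -1:

    def quantize(params, t1, t2, values=(1, 0, -1)):
        # Two-threshold quantization: above t1 -> first value, between t2 and t1
        # -> second value, below t2 -> third value; (+1, 0, -1) is an assumption.
        first, second, third = values
        return [first if p > t1 else (second if p >= t2 else third) for p in params]

    print(quantize([0.7, 0.01, -0.5], t1=0.1, t2=-0.1))   # -> [1, 0, -1]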
  • the transceiver module 601 is specifically used to:
  • when the quantized i-th parameter is the first value, the transmit power of the first sequence of the two sequences sent by the first device 600 is less than the transmit power of the second sequence of the two sequences sent by the first device 600; when the quantized i-th parameter is the second value, the transmit power of the first sequence of the two sequences sent by the first device 600 is equal to the transmit power of the second sequence of the two sequences sent by the first device 600; when the quantized i-th parameter is the third value, the transmit power of the first sequence of the two sequences sent by the first device 600 is greater than the transmit power of the second sequence of the two sequences sent by the first device 600.
  • in one case, the first sequence of the two sequences is a non-all-0 sequence and the second sequence is an all-0 sequence;
  • in another case, both sequences are all-0 sequences;
  • in yet another case, the first sequence of the two sequences is an all-0 sequence and the second sequence is a non-all-0 sequence.
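  • A minimal Python sketch of this mapping, choosing the direction that is consistent with the Formula 3 decision above (the value labels +1/0/-1 and the all-ones base sequence are assumptions):

    import numpy as np

    def sequences_for(quantized_value, length=8):
        # Map a quantized parameter to its two sequences.
        nonzero, zero = np.ones(length), np.zeros(length)
        if quantized_value == 1:       # first value: energy on the second sequence
            return zero, nonzero
        if quantized_value == -1:      # third value: energy on the first sequence
            return nonzero, zero
        return zero, zero              # second value: both sequences all-0

    first_seq, second_seq = sequences_for(1)
    print(first_seq.any(), second_seq.any())   # -> False True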
  • the transceiver module 601 is specifically used to:
  • the transceiver module 601 is also used to:
  • First instruction information is received from the second device, and the first instruction information is used to instruct the first device 600 to send the first information to the second device a number of times L.
  • the relevant information of the first model includes N parameters of the first model after quantization error compensation; the N parameters after quantization error compensation are obtained by the first device 600 performing error compensation on the N parameters obtained by the Q-th round of training of the first model, according to the quantization errors corresponding to the N parameters;
  • the quantization error corresponding to the i-th parameter among the N parameters is determined from the i-th parameter obtained by the first device 600 in the (Q-1)-th round of training on the first model and the i-th parameter obtained in the (Q-1)-th round of training after quantization error compensation, where i is an integer greater than or equal to 1 and less than or equal to N, N is an integer greater than or equal to 1, and Q is an integer greater than 1.
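  • A minimal Python sketch of one common error-feedback realization of this compensation (the exact error definition and the ternary values are assumptions):

    import numpy as np

    def compensate_and_quantize(raw_params, prev_error, t1, t2):
        # Add the error carried over from round Q-1, quantize, and keep the new
        # error (compensated minus quantized) for the next round.
        compensated = np.asarray(raw_params, dtype=float) + prev_error
        quantized = np.where(compensated > t1, 1.0,
                             np.where(compensated < t2, -1.0, 0.0))
        return quantized, compensated - quantized

    q, e = compensate_and_quantize([0.7, 0.04], np.array([0.0, 0.08]), t1=0.1, t2=-0.1)
    print(q, e)   # -> [1. 1.] [-0.3 -0.88]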
  • the relevant information of the first model includes N parameters obtained by sparse processing of the first model; the N parameters obtained by sparse processing of the first model are N parameters that the first device 600 selects from the K parameters of the first model according to the public sparse mask;
  • the K parameters of the first model are parameters obtained by the first device 600 performing a round of training on the first model, where K is an integer greater than or equal to N, K is an integer greater than or equal to 1, and N is an integer greater than or equal to 1.
  • the public sparse mask is a bit sequence; the bit sequence includes K bits, and the K bits correspond to the K parameters one-to-one; when the value of a bit among the K bits is 0, it is used to instruct the first device 600 not to select the parameter corresponding to the bit; when the value of a bit among the K bits is 1, it is used to instruct the first device 600 to select the parameter corresponding to the bit.
  • the public sparse mask is determined by the first device 600 based on the sparse ratio and the pseudo-random number, and the sparse ratio is indicated by the second device to the first device 600 .
  • the transceiver module 601 is also used to:
  • Second indication information is received from the second device, and the second indication information is used to indicate the common sparse mask.
  • the transceiver module 601 is also used to:
  • Third indication information is sent to the second device, where the third indication information is used to indicate the indexes of the N parameters whose values have the largest absolute values among the K parameters.
  • the first model is a neural network model
  • the relevant information of the first model includes relevant parameters of neurons in layer P of the neural network model, where P is an integer greater than or equal to 1.
  • FIG 7 is another structural schematic diagram of the first device according to the embodiment of the present application.
  • the first device 700 can be used to perform the steps performed by the first device in the embodiment shown in Figure 4.
  • for details, please refer to the relevant introduction of the above method embodiment.
  • the first device 700 includes a transceiver module 701. Optionally, the first device 700 also includes a processing module 702.
  • the transceiver module 701 is configured to send first indication information to the second device.
  • the first indication information is used to indicate the index of the corresponding N parameters with the largest absolute value among the K parameters of the first model of the first device 700 .
  • the K parameters of the first model are K parameters obtained by the first device 700 performing a round of training on the first model, where K is an integer greater than or equal to N, K is an integer greater than or equal to 1, and N is an integer greater than or equal to 1; the transceiver module 701 is further configured to receive second indication information from the second device, where the second indication information is used to indicate a public sparse mask determined by the second device based on the first indication information; the public sparse mask is used to instruct the first device 700 to report some of the parameters obtained by the first device 700 training the first model.
  • the second device provided in the embodiment of the present application is described below. Please refer to Figure 8, which is a schematic diagram of the structure of the second device in the embodiment of the present application.
  • the second device 800 can be used to execute the steps performed by the second device in the embodiments shown in Figures 2 and 4. For details, please refer to the relevant introduction of the above method embodiment.
  • the second device 800 includes a transceiver module 801.
  • the second device 800 also includes a processing module 802.
  • the transceiver module 801 is configured to send at least one quantization threshold value to the first device, where the at least one quantization threshold value is used to perform quantization processing on the relevant information of the first model of the first device; and to receive first information sent from the first device, where the first information is used to indicate the relevant information of the first model after quantization processing.
  • the relevant information of the first model includes: the output parameters or update parameters of the first model, and the update parameters include the weight gradient or weight parameters of the first model.
  • the transceiver module 801 is also used to: receive second information from the first device;
  • the second information is used to indicate information obtained by processing the relevant information of the first model; or, the second information is used to indicate information obtained by processing the relevant information obtained by the first device performing the M-th round of training on the first model;
  • the relevant information of the first model is the relevant information obtained by the first device performing the Q-th round of training on the first model;
  • M is an integer greater than or equal to 1 and less than Q, and Q is an integer greater than 1;
  • the processing module 802 is configured to determine at least one quantization threshold value according to the second information.
  • the relevant information of the first model includes the output parameters of the first model, and the information obtained after processing the relevant information of the first model includes the average of the absolute values of the output parameters of the first model; or,
  • the related information of the first model includes the updated parameters of the first model, and the information obtained after processing the related information of the first model includes the average of the absolute values of the updated parameters of the first model.
  • the transceiver module 801 is also used to: receive third information from a third device, where the third information is used to indicate information obtained by processing the relevant information of the second model of the third device; or, the third information is used to indicate information obtained by processing the relevant information obtained by the third device performing the S-th round of training on the second model;
  • the relevant information of the second model is the relevant information obtained by the third device performing the R-th round of training on the second model;
  • S is an integer greater than or equal to 1 and less than R, and R is an integer greater than 1;
  • the processing module 802 is configured to determine at least one quantization threshold value according to the second information and the third information.
  • the processing module 802 is also used to: determine global information of the first model based on the first information;
  • the transceiver module 801 is also used for:
  • Fourth information is sent to the first device, where the fourth information is used to indicate global information of the first model.
  • the global information of the first model includes the global output parameters of the first model; or, the global information of the first model includes the global update parameters and/or the global learning rate of the first model.
  • the transceiver module 801 is also used to: receive fifth information from the third device, where the fifth information is used to indicate the relevant information of the second model after quantization processing;
  • the processing module 802 is specifically used for:
  • Global information of the first model is determined based on the first information and the fifth information.
  • the relevant information of the first model includes N parameters of the first model, where N is an integer greater than or equal to 1; the relevant information of the second model includes N parameters of the second model;
  • the transceiver module 801 is specifically used to: receive N first signals from the first device, where the N first signals carry the N parameters of the first model; and receive N second signals from the third device, where the N second signals carry the N parameters of the second model;
  • the processing module 802 is specifically used for:
  • Global information of the first model is determined based on the N first signals and the N second signals.
  • the i-th first signal among the N first signals corresponds to the first sequence and the second sequence
  • the i-th second signal among the N second signals corresponds to the third sequence and the fourth sequence
  • the time-frequency resource used by the first device to send the first sequence is the same as the time-frequency resource used by the third device to send the third sequence
  • the time-frequency resources used by the first device to send the second sequence are the same as the time-frequency resources used by the third device to send the fourth sequence;
  • the global information of the first model includes N global parameters of the first model; i is an integer greater than or equal to 1 and less than or equal to N; the processing module 802 is specifically used to:
  • determine the first signal energy sum with which the second device 800 receives the first sequence and the third sequence, determine the second signal energy sum with which the second device 800 receives the second sequence and the fourth sequence, and determine the i-th global parameter among the N global parameters according to the first signal energy sum and the second signal energy sum.
  • the processing module 802 is specifically used to: if the sum of the first signal energy sum and the decision threshold value is less than the second signal energy sum, determine that the value of the i-th global parameter is the first value; or, if the sum of the first signal energy sum and the decision threshold value is greater than or equal to the second signal energy sum and the sum of the second signal energy sum and the decision threshold value is greater than or equal to the first signal energy sum, determine that the value of the i-th global parameter is the second value; or, if the sum of the second signal energy sum and the decision threshold value is less than the first signal energy sum, determine that the value of the i-th global parameter is the third value.
  • the transceiver module 801 is also used to:
  • the first instruction information is sent to the first device.
  • the first instruction information is used to instruct the first device to send the first information to the second device 800 a number of times L, where L is an integer greater than or equal to 1.
  • the transceiver module 801 is further configured to:
  • Second instruction information is sent to the first device.
  • the second instruction information is used to indicate a common sparse mask.
  • the public sparse mask is used to instruct the first device to report some parameters obtained by training the first model by the first device.
  • the transceiver module 801 is also used to:
  • Receive fourth indication information from the third device, where the fourth indication information is used to indicate the indexes of the N parameters with the largest absolute values among the K parameters of the second model of the third device, and the K parameters of the second model are K parameters obtained by the third device performing a round of training on the second model;
  • the processing module 802 is also used to:
  • the common sparse mask is determined according to the third indication information and the fourth indication information.
  • FIG 9 is another structural schematic diagram of the second device according to the embodiment of the present application.
  • the second device 900 can be used to perform the steps performed by the second device in the embodiment shown in Figure 4. For details, please refer to the relevant introduction of the above method embodiment.
  • the second device 900 includes a transceiver module 901 and a processing module 902.
  • the transceiver module 901 is configured to receive first indication information from the first device.
  • the first indication information is used to indicate the index of the corresponding N parameters with the largest absolute value among the K parameters of the first model of the first device.
  • the K parameters of the first model are K parameters obtained by the first device performing a round of training on the first model, where K is an integer greater than or equal to N, K is an integer greater than or equal to 1, and N is an integer greater than or equal to 1;
  • the processing module 902 is configured to determine a public sparse mask according to the first indication information.
  • the public sparse mask is used to instruct the first device to report some parameters obtained by training the first model by the first device;
  • the transceiver module 901 is also configured to send second indication information to the first device, where the second indication information is used to indicate the common sparse mask.
  • the transceiver module 901 is also used to:
  • Receive third indication information from a third device, where the third indication information is used to indicate the indexes of the N parameters whose values have the largest absolute values among the K parameters of the second model of the third device, and the K parameters of the second model are K parameters obtained by the third device performing one round of training on the second model;
  • the processing module 902 is specifically used for:
  • the common sparse mask is determined according to the first indication information and the third indication information.
  • FIG. 10 is a schematic structural diagram of a terminal device 1000 provided by an embodiment of the present application.
  • the terminal device 1000 can be applied in the system as shown in Figure 1.
  • the terminal device 1000 can be the terminal device in the system of Figure 1 to perform the functions of the first device or the second device in the above method embodiment.
  • the terminal device 1000 includes a processor 1010 and a transceiver 1020.
  • the terminal device 1000 further includes a memory 1030.
  • the processor 1010, the transceiver 1020 and the memory 1030 can communicate with each other through internal connection channels and transmit control and/or data signals.
  • the memory 1030 is used to store a computer program, and the processor 1010 is used to call and run the computer program from the memory 1030 to control the transceiver 1020 to send and receive signals.
  • the terminal device 1000 may also include an antenna 1040 for sending the uplink data or uplink control signaling output by the transceiver 1020 through wireless signals.
  • the above-mentioned processor 1010 and the memory 1030 can be combined into one processing device, and the processor 1010 is used to execute the program code stored in the memory 1030 to implement the above functions.
  • the memory 1030 can also be integrated in the processor 1010 or independent of the processor 1010 .
  • the processor 1010 may correspond to the processing module 602 in FIG. 6 , the processing module 702 in FIG. 7 , the processing module 802 in FIG. 8 , or the processing module 902 in FIG. 9 .
  • the transceiver 1020 described above may correspond to the transceiver module 601 in FIG. 6 , the transceiver module 701 in FIG. 7 , the transceiver module 801 in FIG. 8 , or the transceiver module 901 in FIG. 9 .
  • the transceiver 1020 may also be called a transceiver unit.
  • the transceiver 1020 may include a receiver (or receiver, receiving circuit) and a transmitter (or transmitter, transmitting circuit). Among them, the receiver is used to receive signals, and the transmitter is used to transmit signals.
  • terminal device 1000 shown in Figure 10 can implement various processes involving the first device or the second device in the method embodiments shown in Figures 2 and 4.
  • the operations and/or functions of each module in the terminal device 1000 are respectively to implement the corresponding processes in the above device embodiment.
  • the above-mentioned processor 1010 may be used to perform the actions internally implemented by the first device or the second device described in the previous method embodiments, and the transceiver 1020 may be used to perform the sending and receiving actions of the first device or the second device described in the previous method embodiments.
  • the above terminal device 1000 may also include a power supply 1050, used to provide power to various devices or circuits in the terminal device.
  • the terminal device 1000 may also include one or more of an input unit 1060, a display unit 1070, an audio circuit 1080, a camera 1090, a sensor, etc.; the audio circuit may also include a speaker 1082, a microphone 1084, etc.
  • Figure 11 is a schematic structural diagram of a network device 1100 provided by an embodiment of the present application.
  • the network device 1100 can be applied in the system shown in Figure 1.
  • the network device 1100 can be the network device in the system shown in Figure 1, and is used to perform the functions of the first device or the second device in the above method embodiments. It should be understood that the following is only an example; in future communication systems, network equipment may have other forms and configurations.
  • the network device 1100 may include a CU, a DU, and an AAU.
  • for example, a network device in an LTE communication system is composed of one or more radio frequency units, such as a remote radio unit (RRU), and one or more baseband units (BBU).
  • the non-real-time part of the original BBU will be separated and redefined as CU, which is responsible for processing non-real-time protocols and services.
  • Some of the physical layer processing functions of the BBU will be merged with the original RRU and passive antenna into AAU, and the remaining functions of the BBU will be redefined as DU.
  • CU and DU are distinguished by the real-time nature of processing content, and AAU is a combination of RRU and antenna.
  • CU, DU, and AAU can be separated or combined. Therefore, there will be multiple network deployment forms.
  • One possible deployment form is shown in Figure 11, which is consistent with traditional 4G network equipment.
  • CU and DU are deployed on the same hardware. It should be understood that Figure 11 is only an example and does not limit the scope of protection of this application.
  • the deployment form can also be DU deployed in the BBU computer room, CU centralized deployment or DU centralized deployment, CU centralized at a higher level, etc.
  • the AAU 11100 that can implement transceiver functions is called a transceiver unit 11100, which corresponds to the transceiver module 601 in Figure 6 , the transceiver module 701 in Figure 7 , the transceiver module 801 in Figure 8 , or the transceiver module 901 in Figure 9 .
  • the transceiver unit 11100 may also be called a transceiver, a transceiver circuit, a transceiver, etc., and may include at least one antenna 11101 and a radio frequency unit 11102.
  • the transceiver unit 11100 may include a receiving unit and a transmitting unit, the receiving unit may correspond to a receiver (or receiver, receiving circuit), and the transmitting unit may correspond to a transmitter (or transmitter, transmitting circuit).
  • the CU and DU 11200, which can implement internal processing functions, are called a processing unit 11200, which corresponds to the processing module 602 in Figure 6 , the processing module 702 in FIG. 7 , the processing module 802 in Figure 8 , or the processing module 902 in FIG. 9 .
  • the processing unit 11200 can control network devices, etc., and can be called a controller.
  • the AAU, CU and DU may be physically placed together or physically separated.
  • the network equipment is not limited to the form shown in Figure 11, and can also be in other forms: for example, including a BBU and an adaptive radio unit (ARU), or including a BBU and an active antenna unit (AAU); it can also be customer premises equipment (CPE), or it can be in other forms, which is not limited by this application.
  • the processing unit 11200 may be composed of one or more boards. Multiple boards may jointly support a radio access network of a single access standard (such as an LTE network), or may respectively support radio access networks of different access standards (such as an LTE network, a 5G network, a future network, or other networks).
  • the CU and DU 11200 also include a memory 11201 and a processor 11202.
  • the memory 11201 is used to store necessary instructions and data.
  • the processor 11202 is used to control the network device to perform necessary actions, for example, to control the network device to execute the operation process of the first device or the second device in the above method embodiment.
  • the memory 11201 and processor 11202 may serve one or more single boards. In other words, the memory and processor can be set independently on each board. It is also possible for multiple boards to share the same memory and processor. In addition, necessary circuits can also be installed on each board.
  • the network device 1100 shown in Figure 11 can implement the first device or the second device function involved in the method embodiments of Figures 2 and 4.
  • the operations and/or functions of each unit in the network device 1100 are respectively intended to implement the corresponding processes executed by the network device in the method embodiments of this application. To avoid repetition, detailed descriptions are appropriately omitted here.
  • the structure of the network device illustrated in Figure 11 is only one possible form, and should not constitute any limitation on the embodiment of the present application. This application does not exclude the possibility of other forms of network equipment structures that may appear in the future.
  • the above-mentioned CU and DU 11200 can be used to perform the actions internally implemented by the first device or the second device described in the previous method embodiments, and the AAU 11100 can be used to perform the sending and receiving actions of the first device or the second device described in the previous method embodiments.
  • This application also provides a computer program product, which includes computer program code. When the computer program code is run on a computer, it causes the computer to execute the method of any one of the embodiments shown in Figures 2 and 4.
  • This application also provides a computer-readable medium, which stores program code. When the program code is run on a computer, it causes the computer to execute the method of any one of the embodiments shown in Figures 2 and 4.
  • This application also provides a communication system, which includes a first device and a second device.
  • the first device is used to perform some or all of the steps performed by the first device in the embodiments shown in Figures 2 and 4, and the second device is used to perform some or all of the steps performed by the second device in the embodiments shown in Figures 2 and 4.
  • the communication system also includes a third device.
  • the third device is used to perform some or all of the steps performed by the third device in the embodiments shown in FIGS. 2 and 4 .
  • An embodiment of the present application also provides a chip device, including a processor, configured to call a computer program or computer instructions stored in the memory, so that the processor executes the method of the embodiment shown in FIG. 2 and FIG. 4 .
  • the input of the chip device corresponds to the receiving operations in the embodiments shown in FIG. 2 and FIG. 4, and the output of the chip device corresponds to the sending operations in the embodiments shown in FIG. 2 and FIG. 4.
  • the processor is coupled to the memory through an interface.
  • the chip device further includes a memory, in which computer programs or computer instructions are stored.
  • the processor mentioned in any of the above places can be a general-purpose central processing unit, a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the program execution of the methods of the embodiments shown in Figure 2 and Figure 4.
  • the memory mentioned in any of the above places can be a read-only memory (ROM) or another type of static storage device that can store static information and instructions, a random access memory (RAM), etc.
  • the disclosed systems, devices and methods can be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
  • multiple units or components may be combined or can be integrated into another system, or some features can be ignored, or not implemented.
  • the coupling or direct coupling or communication connection between each other shown or discussed may be through some interfaces, and the indirect coupling or communication connection of the devices or units may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or they may be distributed to multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application can be integrated into one processing unit, each unit can exist physically alone, or two or more units can be integrated into one unit.
  • the above integrated units can be implemented in the form of hardware or software functional units.
  • the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium.
  • the technical solution of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions to cause a computer device (which can be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of this application.
  • the aforementioned storage media include: U disk, mobile hard disk, ROM, RAM, magnetic disk or optical disk and other media that can store program codes.

Abstract

本申请实施例提供一种通信方法以及相关装置,用于降低第一装置上报第一模型的相关信息的通信开销,节省通信资源。本申请提供的方法包括:第一装置接收来自第二装置的至少一个量化门限值;所述第一装置根据所述至少一个量化门限值对所述第一装置的第一模型的相关信息进行量化处理;所述第一装置向所述第二装置发送第一信息,所述第一信息用于指示量化处理后的所述第一模型的相关信息。

Description

通信方法以及相关装置 技术领域
本申请涉及通信技术领域,尤其涉及一种通信方法以及相关装置。
背景技术
分布式学习是实现联合学习的一种学习方法。具体的,多个节点设备利用本地数据训练得到本地模型,中心节点设备将多个本地模型融合得到全局模型。从而实现在保护节点设备的用户数据的隐私的前提下,实现联合学习。
多个节点设备可以分别训练其本地模型得到本地模型的相关参数。例如,本地模型的权重参数或权重梯度。然后,多个节点设备将本地模型的相关参数发送给中心节点设备。中心节点设备对多个节点设备发送的本地模型的相关参数进行融合得到全局模型的相关参数,并下发给各个节点设备。各个节点设备可以通过全局模型的相关参数更新该节点设备的本地模型。
由上述技术方案可知,各个节点设备分别向中心节点设备发送本地模型的相关参数。导致节点设备上报的数据量较大,通信开销较大。因此,节点设备如何以较低的通信开销来上报本地模型的相关参数,是亟待解决的问题。
发明内容
本申请实施例提供一种通信方法以及相关装置,用于降低第一装置上报第一模型的相关信息的通信开销,节省系统开销。
本申请第一方面提供一种通信方法,该通信方法可以由第一装置执行,第一装置可以是通信设备,也可以是通信设备中的组件(如,芯片(系统)),所述通信方法包括:
第一装置接收来自第二装置的至少一个量化门限值。然后,第一装置根据至少一个量化门限值对第一装置的第一模型的相关信息进行量化处理。第一装置向第二装置发送第一信息,第一信息用于指示量化处理后的第一模型的相关信息。从而降低第一装置上报第一模型的相关信息的通信开销,节省通信资源。
基于第一方面,一种可能的实现方式中,第一模型的相关信息包括:第一模型的输出参数或更新参数,更新参数包括第一模型的权重梯度或权重参数。在该实现方式中,示出了第一模型的相关信息包括的两种可能的参数,从而便于第二装置对各个装置上报的训练结果进行融合得到全局模型的相关信息。本申请中,每个装置上的模型可以理解为同一模型。为了区分不同装置上的模型,在第一装置侧,该模型可以称为第一模型。在第二装置侧,该模型可以称为全局模型。
基于第一方面,一种可能的实现方式中,在第一装置接收来自第二装置的至少一个量化门限值之前,方法还包括:第一装置向第二装置发送第二信息;其中,第二信息用于指示第一模型的相关信息经过处理得到的信息;或者,第二信息用于指示第一装置对第一模型进行第M轮训练得到的相关信息经过处理得到的信息,第一模型的相关信息是第一装置 对第一模型进行第Q轮训练得到的相关信息,M为大于或等于1且小于Q的整数,Q为大于1的整数。
在该实现方式中,第一装置可以向第二装置发送第二信息,从而便于第二装置确定该至少一个量化门限值。有利于第二装置确定合适的量化门限值,便于第一装置对第一模型的相关信息进行合理的量化处理。从而在保证第一装置上报的第一模型的相关信息的精度的情况下,降低第一装置上报第一模型的相关信息的开销。
基于第一方面,一种可能的实现方式中,第一模型的相关信息包括第一模型的输出参数,第一模型的相关信息经过处理得到的信息包括第一模型的输出参数的取值的绝对值的平均值;或者,第一模型的相关信息包括第一模型的更新参数,第一模型的相关信息经过处理得到的信息包括第一模型的更新参数的取值的绝对值的平均值。在该实现方式中,示出了第一模型的相关信息的两种可能的实现方式,第一装置可以将第一模型的输出参数的取值的绝对值的平均值或第一模型的更新参数的取值的绝对值的平均值上报给第二装置。从而便于第二装置确定合适的量化门限值。
基于第一方面,一种可能的实现方式中,方法还包括:第一装置接收来自第二装置的第三信息,第三信息用于指示第一模型的全局信息。在该实现方式中,第一装置可以结合该第一模型的全局信息实现对第一模型的更新或训练。
基于第一方面,一种可能的实现方式中,第一模型的全局信息包括第一模型的全局输出参数;或者,第一模型的全局信息包括第一模型的全局更新参数和/或全局学习率。
在该实现方式中,示出了第一模型的全局信息的两种实现方式。例如,第一模型的全局信息包括第一模型的全局输出参数,从而便于第一装置通过该全局输出参数对第一模型进行训练,有利于提升第一模型的训练性能,提升第一模型的准确性。例如,第一模型的全局信息包括第一模型的全局更新参数和/或全局学习率。从而便于第一装置结合该全局更新参数和/或全局学习率对第一模型进行更新,有利于提升第一模型的准确性。
基于第一方面,一种可能的实现方式中,第一模型的相关信息包括第一模型的N个参数,N为大于或等于1的整数;第一装置根据至少一个量化门限值对第一装置的第一模型的相关信息进行量化处理,包括:第一装置根据至少一个量化门限值对N个参数进行量化,得到量化处理后的N个参数;第一信息包括量化处理后的N个参数;第一装置向第二装置发送第一信息,包括:第一装置对量化处理后的N个参数进行调制得到N个第一信号;第一装置向第二装置发送N个第一信号。
在该实现方式中,第一信息包括量化处理后的N个参数。第一装置可以对该第一模型的N个参数进行量化处理,并将量化处理后的N个参数进行调制,再发送调制得到的N个第一信号。从而实现对第一信息的发送。
基于第一方面,一种可能的实现方式中,至少一个量化门限值包括第一量化门限值和第二量化门限值;第一装置根据至少一个量化门限值对N个参数进行量化处理,得到量化处理后的N个参数,包括:若N个参数中的第i个参数大于第一量化门限值时,第一装置将第i个参数量化为第一值,i为大于或等于1且小于或等于N的整数;或者,若N个参数中的第i个参数小于或等于第一量化门限值且大于或等于第二量化门限值时,第一装置 将第i个参数量化为第二值;或者,若N个参数中第i个参数小于第二量化门限值时,第一装置将第i个参数量化为第三值。在该实现方式中,示出了第一装置量化第i个参数的具体量化过程,从而便于方案的实施。进一步的,该至少一个量化门限值包括多个量化门限值,从而实现第一装置对第一模型的参数的量化精度更细,有利于提升第一装置更新第一模型的准确度,提升第一模型的训练性能。
基于第一方面,一种可能的实现方式中,第一装置对量化处理后的N个参数进行调制得到N个第一信号,包括:第一装置对量化处理后的第i个参数进行调制得到第i个第一信号,该第i个第一信号对应两个序列;当量化处理后的第i个参数为所述第一值时,第一装置发送两个序列中的第一个序列的发送功率小于第一装置发送两个序列中的第二个序列的发送功率;当量化处理后的第i个参数为第二值时,第一装置发送两个序列中的第一个序列的发送功率等于第一装置发送所述两个序列中的第二个序列的发送功率;当量化处理后的第i个参数为第三值时,第一装置发送所述两个序列中的第一个序列的发送功率大于第一装置发送两个序列中的第二个序列的发送功率。
在该实现方式中,第一装置将第一模型的N个参数中每个参数调制到两个序列上。第一装置控制发送该两个序列中每个序列分别采用的发送功率,从而便于第二装置确定该参数的取值。第一装置无需进行信道的估计和均衡,从而无需相应的导频开销。
基于第一方面,一种可能的实现方式中,当量化处理后的第i个参数为第一值时,两个序列中的第一个序列为非全0序列,第二个序列为全0序列;当量化处理后的第i个参数为第二值时,两个序列均为全0序列;当量化处理后的第i个参数为第三值时,两个序列中的第一个序列为全0序列,第二个序列为非全0序列。在该实现方式中,第一装置可以通过全0序列和/或非全0序列承载该量化处理后的第i个参数。在相同的发送总功率下,有利于第二装置识别该量化处理后的第i个参数的取值,提升功率利用效率。
基于第一方面,一种可能的实现方式中,第一装置向第二装置发送第一信息,包括:第一装置向第二装置发送L次第一信息,L为大于或等于1的整数。在该实现方式中,当发送次数L大于1时,第一装置重复发送该第一信息,有利于第二装置分别判决后选择出现次数最多的判决结果作为最好的判决结果。从而降低判决错误概率,进而提升模型训练的性能。
基于第一方面,一种可能的实现方式中,方法还包括:第一装置接收来自第二装置的第一指示信息,第一指示信息用于指示第一装置向第二装置发送第一信息的发送次数L。在该实现方式中,第一装置可以接收第二装置指示的发送次数,并按照该发送次数发送第一信息。从而有利于第二装置结合实际需求确定该发送次数,从而合理利用通信资源。
基于第一方面,一种可能的实现方式中,第一模型的相关信息包括第一模型的量化误差补偿后的N个参数,量化误差补偿后的N个参数是第一装置根据第一装置对第一模型进行第Q轮训练得到的N个参数分别对应的量化误差对N个参数进行误差补偿得到的,N个参数中的第i个参数对应的量化误差是根据第一装置对第一模型进行第Q-1轮训练且经过量化误差补偿得到的第i个参数确定的,i为大于或等于1且小于或等于N的整数,N为大于或等于1的整数,Q为大于1的整数。
在该实现方式中,第一装置可以先对第一模型的N个参数进行量化误差补偿,再根据该至少一个量化门限值对该量化误差补偿后的N个参数进行量化处理。从而有利于提升第一装置更新第一模型的准确度,提升第一模型的训练性能。
基于第一方面,一种可能的实现方式中,第一模型的相关信息包括第一模型的经过稀疏处理得到的N个参数;第一模型的经过稀疏处理得到的N个参数是第一装置根据公共稀疏掩码从第一模型的K个参数中选择N个参数,第一模型的K个参数是第一装置对第一模型进行一轮训练得到的参数,K为大于或等于N的整数,K为大于或等于1的整数,N为大于或等于1的整数。
在该实现方式中,第一装置可以先通过公共稀疏掩码从第一模型的K个参数选择N个参数,再根据该至少一个量化门限值对该N个参数进行量化处理。从而有利于降低第一装置上报第一模型的参数产生的开销。
基于第一方面,一种可能的实现方式中,公共稀疏掩码为比特序列,比特序列包括K个比特,K个比特与K个参数一一对应;当K个比特中的一个比特的取值为0时,用于指示第一装置不选择该比特对应的参数;当K个比特中的一个比特的取值为1时,用于指示第一装置选择该比特对应的参数。在该实现方式中,提供了公共稀疏掩码的一种具体形式,第一装置通过比特序列中的比特的取值选择哪些参数,操作简单方便。从而降低第一装置上报第一模型的参数的开销,降低通信资源的占用。
基于第一方面,一种可能的实现方式中,公共稀疏掩码是第一装置根据稀疏比例和伪随机数确定的,稀疏比例是第二装置向第一装置指示的。在该实现方式中,提供了公共稀疏掩码的一种生成方式,方便方案的实施。从而实现第一装置基于该公共稀疏掩码上报第一模型的部分参数,降低第一装置上报第一模型的参数产生的开销。
基于第一方面,一种可能的实现方式中,方法还包括:第一装置接收来自第二装置的第二指示信息,第二指示信息用于指示公共稀疏掩码。在该实现方式中,从而便于第一装置根据公共稀疏掩码从第一模型的K个参数选择N个参数。从而有利于降低第一装置上报第一模型的参数产生的开销。
基于第一方面,一种可能的实现方式中,方法还包括:第一装置向第二装置发送第三指示信息,第三指示信息用于指示K个参数中对应的取值的绝对值最大的N个参数的索引。
在该实现方式中,第一装置可以向第二装置指示其K个参数中对应的取值的绝对值最大的N个参数的索引。从而便于第二装置确定合适的公共稀疏掩码。第三指示信息用于指示K个参数中对应的取值的绝对值最大的N个参数的索引。有利于第一装置后续优先反馈变化较大的参数,从而提升模型训练的准确性,提升模型训练的性能。
基于第一方面,一种可能的实现方式中,第一模型为神经网络模型,第一模型的相关信息包括神经网络模型的其中P层的神经元的相关参数,P为大于或等于1的整数。在该实现方式中,第一装置可以上报的是神经网络模型中的某一层或某多层的参数。也就是第一装置以神经网络模型的层为单位上报该神经网络模型的参数,从而有利于第一装置准确上报各层的参数,提升模型训练的准确性。
本申请第二方面提供一种通信方法,该通信方法可以由第二装置执行,第二装置可以 是通信设备,也可以是通信设备中的组件(如,芯片(系统)),所述通信方法包括:
第二装置向第一装置发送至少一个量化门限值,至少一个量化门限值用于对第一装置的第一模型的相关信息进行量化处理;第二装置接收来自第一装置发送的第一信息,第一信息用于指示量化处理后的第一模型的相关信息。由上述技术方案可知,有利于降低第一装置上报第一模型的相关信息的通信开销,节省通信资源。
基于第二方面,一种可能的实现方式中,第一模型的相关信息包括:第一模型的输出参数或更新参数,更新参数包括第一模型的权重梯度或权重参数。在该实现方式中,示出了第一模型的相关信息包括的两种可能的参数,从而便于第二装置对各个装置上报的训练结果进行融合得到全局模型的相关信息。本申请中,各个装置上的模型可以理解为同一模型。为了区分不同装置上的模型,在第一装置侧,该模型可以称为第一模型,在第二装置侧,该模型可以称为全局模型。
基于第二方面,一种可能的实现方式中,方法还包括:
第二装置接收来自第一装置的第二信息;其中,第二信息用于指示第一模型的相关信息经过处理得到的信息;或者,第二信息用于指示第一装置对第一模型进行第M轮训练并经过处理得到的信息,第一模型的相关信息是第一装置对第一模型进行第Q轮训练得到的相关信息,M为大于或等于1且小于Q的整数,Q为大于1的整数;第二装置根据第二信息确定至少一个量化门限值。
在该实现方式中,第二装置接收来自第一装置的第二信息,从而实现第二装置根据第二信息确定至少一个量化门限值。有利于第二装置确定合适的量化门限值,便于第一装置对第一模型的相关信息进行合理的量化处理。从而在保证第一装置上报的第一模型的相关信息的精度的情况下,降低第一装置上报第一模型的相关信息的开销。
基于第二方面,一种可能的实现方式中,第一模型的相关信息包括第一模型的输出参数,第一模型的相关信息经过处理得到的信息包括第一模型的输出参数的取值的绝对值的平均值;或者,第一模型的相关信息包括第一模型的更新参数,第一模型的相关信息经过处理得到的信息包括第一模型的更新参数的取值的绝对值的平均值。在该实现方式中,示出了第一模型的相关信息的两种可能的实现方式,第二装置可以接收来自第一装置的第一模型的输出参数的取值的绝对值的平均值或第一模型的更新参数的取值的绝对值的平均值。从而便于第二装置确定合适的量化门限值。
基于第二方面,一种可能的实现方式中,方法还包括:第二装置接收来自第三装置的第三信息;其中,第三信息用于指示第三装置的第二模型的相关信息经过处理得到的信息;或者,第三信息用于指示第三装置对第二模型进行第S轮训练并经过处理得到的信息,第二模型的相关信息是第三装置对第二模型进行第R轮训练得到的相关信息,S为大于或等于1且小于R的整数,R为大于1的整数;第二装置根据第二信息确定至少一个量化门限值,包括:第二装置根据第二信息和第三信息确定至少一个量化门限值。
在该实现方式中,第二装置还可以接收第三装置的第三信息,并联合第二信息和第三信息确定至少一个量化门限值。有利于第二装置确定合适的量化门限值,从而在保证第一装置上报的第一模型的相关信息的精度的情况下,降低第一装置上报第一模型的相关信息 的开销。
基于第二方面,一种可能的实现方式中,方法还包括:第二装置根据第一信息确定第一模型的全局信息;第二装置向第一装置发送第四信息,第四信息用于指示第一模型的全局信息。在该实现方式中,第二装置可以结合第一信息确定该第一模型的全局信息,并向第一装置发送该第一模型的全局信息。从而实现第一装置对第一模型的更新或训练。
基于第二方面,一种可能的实现方式中,第一模型的全局信息包括第一模型的全局输出参数;或者,第一模型的全局信息包括第一模型的全局更新参数和/或全局学习率。
在该实现方式中,示出了第一模型的全局信息的两种实现方式。例如,第一模型的全局信息包括第一模型的全局输出参数,从而便于第一装置通过该全局输出参数对第一模型进行训练,有利于提升第一模型的训练性能,提升第一模型的准确性。例如,第一模型的全局信息包括第一模型的全局更新参数和/或全局学习率。从而便于第一装置结合该全局更新参数和/或全局学习率对第一模型进行更新,有利于提升第一模型的准确性。
基于第二方面,一种可能的实现方式中,方法还包括:第二装置接收来自第三装置的第五信息,第五信息用于指示第三装置的第二模型的相关信息;第二装置根据第一信息确定第一模型的全局信息,包括:第二装置根据第一信息和第五信息确定第一模型的全局信息。在该实现方式中,第二装置还可以接收来自第三装置的第五信息,并联合第一信息和第五信息确定该第一模型的全局信息。有利于提升第二装置确定第一模型的全局信息的准确性,提升模型更新的准确度。
基于第二方面,一种可能的实现方式中,第一模型的相关信息包括第一模型的N个参数,N为大于或等于1的整数;第二模型的相关信息包括第二模型的N个参数;第一信息包括量化处理后的第一模型的N个参数;第二装置接收来自第一装置发送的第一信息,包括:第二装置接收来自第一装置的N个第一信号,N个第一信号承载量化处理后的第一模型的N个参数,N个第一信号与量化处理后的第一模型的N个参数一一对应;第五信息包括量化处理后的第二模型的N个参数;第二装置接收来自第三装置的第五信息,包括:第二装置接收来自第三装置的N个第二信号,N个第二信号承载量化处理后的第二模型的N个参数,N个第二信号与量化处理后的第二模型的N个参数一一对应;第二装置根据第一信息和第五信息确定第一模型的全局信息,包括:第二装置根据N个第一信号和N个第二信号确定第一模型的全局信息。
基于第二方面,一种可能的实现方式中,N个第一信号中第i个第一信号对应第一序列和第二序列,N个第二信号中第i个第二信号对应第三序列和第四序列,第一装置发送第一序列采用的时频资源与第三装置发送第三序列采用的时频资源相同,第一装置发送第二序列采用的时频资源与第三装置发送第四序列采用的时频资源相同;第一模型的全局信息包括第一模型的N个全局参数;i为大于或等于1且小于或等于N的整数;第二装置根据N个第一信号和N个第二信号确定第一模型的全局信息,包括:第二装置确定第二装置接收第一序列和第三序列的第一信号能量和;第二装置确定第二装置接收第二序列和第四序列的第二信号能量和;第二装置根据第一信号能量和与第二信号能量和确定N个全局参数中的第i个全局参数。由此可知,第二装置可以通过第二装置接收第i个第一信号对应 的两个序列的信号能量以及接收第i个第二信号对应的两个序列的信号能量确定第i个全局参数。从而支持第二装置实现对多用户空中信号叠加传输的非相干接收,实现对衰落信道鲁棒。
基于第二方面,一种可能的实现方式中,第二装置根据第一信号能量和与第二信号能量和确定N个全局参数中的第i个全局参数,包括:若第一信号能量和与判决门限值的和小于第二信号能量和,则第二装置确定第i个全局参数的取值为第一值;或者,若第一信号能量和与判决门限值的和大于或等于第二信号能量和,且第二信号能量和与判决门限值的和大于或等于第一信号能量和,则第二装置确定第i个全局参数的取值为第二值;或者,若第二信号能量和与判决门限值的和小于第一信号能量和,则第二装置确定第i个全局参数的取值为第三值。
在该实现方式中,示出了第二装置确定第i个全局参数的过程。由上述可知,第一信号能量和与第二信号能量和的三种可能的条件对应第i个全局参数的三种判决结果。从而实现对第i个全局参数的准确判决,有利于提升第一装置更新第一模型的准确度,提升第一模型的训练性能。
基于第二方面,一种可能的实现方式中,方法还包括:第二装置向第一装置发送第一指示信息,第一指示信息用于指示第一装置向第二装置发送第一信息的发送次数L,L为大于或等于1的整数。在该实现方式中,第二装置向第一装置指示发送第一信息的发送次数,使得第一装置按照该发送次数发送第一信息。从而有利于第二装置结合实际需求确定该发送次数,从而合理利用通信资源。
基于第二方面,一种可能的实现方式中,方法还包括:第二装置向第一装置发送第二指示信息,第二指示信息用于指示公共稀疏掩码,公共稀疏掩码用于指示第一装置上报第一装置训练第一模型得到的部分参数。在该实现方式中,第二装置向第一装置发送第二指示信息,第二指示信息用于指示公共稀疏掩码。从而便于第一装置根据公共稀疏掩码从第一模型的K个参数选择N个参数。从而有利于降低第一装置上报第一模型的参数产生的开销。
基于第二方面,一种可能的实现方式中,方法还包括:第二装置接收来自第一装置的第三指示信息,第三指示信息用于指示第一装置对第一模型进行一轮训练得到的K个参数中对应的取值的绝对值最大的N个参数的索引;第二装置接收来自第三装置的第四指示信息,第四指示信息用于指示第三装置的第二模型的K个参数中对应的取值的绝对值最大的N个参数的索引,第二模型的K个参数是第三装置对第二模型进行一轮训练得到的K个参数;第二装置根据第三指示信息和第四指示信息确定公共稀疏掩码。在该实现方式中,各个装置指示其K个参数中对应的取值的绝对值最大的参数的索引,有利于第二装置根据第三指示信息和第四指示信息确定合适的公共稀疏掩码。这样第一装置根据该公共稀疏掩码可以优先反馈变化较大的参数,从而提升模型训练的准确性,提升模型训练的性能。
本申请第三方面提供一种通信方法,该通信方法可以由第一装置执行,第一装置可以是通信设备,也可以是通信设备中的组件(如,芯片(系统)),所述通信方法包括:
第一装置向第二装置发送第一指示信息,第一指示信息用于指示第一装置的第一模型 的K个参数中对应的取值的绝对值最大的N个参数的索引,第一模型的K个参数是第一装置对第一模型进行一轮训练得到的K个参数,K为大于或等于所述N的整数,K为大于或等于1的整数,N为大于或等于1的整数。然后,第一装置接收来自第二装置的第二指示信息。该第二指示信息用于指示公共稀疏掩码,公共稀疏掩码是第二装置根据第一指示信息确定的;公共稀疏掩码用于指示第一装置上报第一装置训练第一模型得到的部分参数。
上述技术方案中,第一装置可以向第二装置上报第一指示信息,从而指示第一模型的K个参数中对应的取值的绝对值最大的N个参数的索引。从而实现第二装置根据第一指示信息确定合适的公共稀疏掩码。第一装置接收来自第二装置的第二指示信息。该第二指示信息用于指示公共稀疏掩码。从而便于实现第一装置根据据该公共稀疏掩码可以优先反馈变化较大的参数。有利于降低第一装置上报第一模型的参数产生的开销,同时还提升了模型训练的准确性,提升模型训练的性能。
本申请第四方面提供一种通信方法,该通信方法可以由第二装置执行,第二装置可以是通信设备,也可以是通信设备中的组件(如,芯片(系统)),所述通信方法包括:
第二装置接收来自第一装置的第一指示信息,第一指示信息用于指示第一装置的第一模型的K个参数中对应的取值的绝对值最大的N个参数的索引,第一模型的K个参数是第一装置对第一模型进行一轮训练得到的K个参数,K为大于或等于所述N的整数,K为大于或等于1的整数,N为大于或等于1的整数;第二装置根据第一指示信息确定公共稀疏掩码,公共稀疏掩码用于指示第一装置上报第一装置训练第一模型得到的部分参数。然后,第二装置向第一装置发送第二指示信息,第二指示信息用于指示公共稀疏掩码。
上述技术方案中,第二装置接收来自第一装置的第一指示信息,该第一指示信息用于第一模型的K个参数中对应的取值的绝对值最大的N个参数的索引。从而实现第二装置可以根据第一指示信息确定合适的公共稀疏掩码。便于第一装置根据该公共稀疏掩码可以优先反馈变化较大的参数,降低第一装置上报第一模型的参数产生的开销,同时还提升了模型训练的准确性,提升模型训练的性能。
基于第四方面,一种可能的实现方式中,方法还包括:第二装置接收来自第三装置的第三指示信息,第三指示信息用于指示第三装置的第二模型的K个参数中对应的取值的绝对值最大的N个参数的索引,第二模型的K个参数是第三装置对第二模型进行一轮训练得到的K个参数;第二装置根据第一指示信息确定公共稀疏掩码,包括:第二装置根据第一指示信息和第三指示信息确定公共稀疏掩码。
在该实现方式中,第二装置还可以结合第三装置上报的第三指示信息确定该公共稀疏掩码,从而便于第二装置为第一装置确定合适的公共稀疏掩码。实现第一装置根据该公共稀疏掩码可以优先反馈变化较大的参数,从而提升模型训练的准确性,提升模型训练的性能。
本申请第五方面提供一种第一装置,包括:
收发模块,用于接收来自第二装置的至少一个量化门限值;处理模块,用于根据至少一个量化门限值对第一装置的第一模型的相关信息进行量化处理;收发模块,还用于向第二装置发送第一信息,第一信息用于指示量化处理后的第一模型的相关信息。
基于第五方面,一种可能的实现方式中,第一模型的相关信息包括:第一模型的输出参数或更新参数,更新参数包括第一模型的权重梯度或权重参数。
基于第五方面,一种可能的实现方式中,收发模块还用于:向第二装置发送第二信息;其中,第二信息用于指示第一模型的相关信息经过处理得到的信息;或者,第二信息用于指示第一装置对第一模型进行第M轮训练得到的相关信息经过处理得到的信息,第一模型的相关信息是第一装置对第一模型进行第Q轮训练得到的相关信息,M为大于或等于1且小于Q的整数,Q为大于1的整数。
基于第五方面,一种可能的实现方式中,第一模型的相关信息包括第一模型的输出参数,第一模型的相关信息经过处理得到的信息包括第一模型的输出参数的取值的绝对值的平均值;或者,第一模型的相关信息包括第一模型的更新参数,第一模型的相关信息经过处理得到的信息包括第一模型的更新参数的取值的绝对值的平均值。
基于第五方面,一种可能的实现方式中,收发模块还用于:接收来自第二装置的第三信息,第三信息用于指示第一模型的全局信息。
基于第五方面,一种可能的实现方式中,第一模型的全局信息包括第一模型的全局输出参数;或者,第一模型的全局信息包括第一模型的全局更新参数和/或全局学习率。
基于第五方面,一种可能的实现方式中,第一模型的相关信息包括第一模型的N个参数,N为大于或等于1的整数;处理模块具体用于:根据至少一个量化门限值对N个参数进行量化处理,得到量化处理后的N个参数;第一信息包括量化处理后的N个参数;收发模块具体用于:对量化处理后的N个参数进行调制得到N个第一信号;向第二装置发送N个第一信号。
基于第五方面,一种可能的实现方式中,至少一个量化门限值包括第一量化门限值和第二量化门限值;处理模块具体用于:
若N个参数中的第i个参数大于第一量化门限值时,将第i个参数量化为第一值,i为大于或等于1且小于或等于N的整数;或者,若N个参数中的第i个参数小于或等于第一量化门限值且大于或等于第二量化门限值时,将第i个参数量化为第二值;或者,若N个参数中第i个参数小于第二量化门限值时,将第i个参数量化为第三值。
基于第五方面,一种可能的实现方式中,收发模块具体用于:对量化处理后的第i个参数进行调制得到第i个第一信号,该第i个第一信号对应两个序列;当量化处理后的第i个参数为第一值时,第一装置发送两个序列中的第一个序列的发送功率小于第一装置发送所述两个序列中的第二个序列的发送功率;当量化处理后的第i个参数为第二值时,第一装置发送两个序列中的第一个序列的发送功率等于第一装置发送两个序列中的第二个序列的发送功率;当量化处理后的第i个参数为第三值时,第一装置发送两个序列中的第一个序列的发送功率大于第一装置发送所述两个序列中的第二个序列的发送功率。
基于第五方面,一种可能的实现方式中,当量化处理后的第i个参数为第一值时,两个序列中的第一个序列为非全0序列,第二个序列为全0序列;当量化处理后的第i个参数为第二值时,两个序列均为全0序列;当量化处理后的第i个参数为第三值时,两个序列中的第一个序列为全0序列,第二个序列为非全0序列。
基于第五方面,一种可能的实现方式中,收发模块具体用于:向第二装置发送L次第一信息,L为大于或等于1的整数。
基于第五方面,一种可能的实现方式中,收发模块还用于:接收来自第二装置的第一指示信息,第一指示信息用于指示第一装置向第二装置发送第一信息的发送次数L。
基于第五方面,一种可能的实现方式中,第一模型的相关信息包括第一模型的量化误差补偿后的N个参数,量化误差补偿后的N个参数是第一装置根据第一装置对第一模型进行第Q轮训练得到的N个参数分别对应的量化误差对N个参数进行误差补偿得到的,Q为大于1的整数,所述N个参数中的第i个参数对应的量化误差是根据第一装置对第一模型进行第Q-1轮训练且经过量化误差补偿得到的第i个参数确定的。
基于第五方面,一种可能的实现方式中,第一模型的相关信息包括第一模型的经过稀疏处理得到的N个参数;第一模型的经过稀疏处理得到的N个参数是第一装置根据公共稀疏掩码从第一模型的K个参数中选择N个参数,第一模型的K个参数是第一装置对第一模型进行第Q轮训练得到的参数,K为大于或等于N的整数,K为大于或等于1的整数。
基于第五方面,一种可能的实现方式中,公共稀疏掩码为比特序列,比特序列包括K个比特,K个比特与K个参数一一对应;当K个比特中的一个比特的取值为0时,用于指示第一装置不选择该比特对应的参数;当K个比特中的一个比特的取值为1时,用于指示第一装置选择该比特对应的参数。
基于第五方面,一种可能的实现方式中,公共稀疏掩码是第一装置根据稀疏比例和伪随机数确定的,稀疏比例是第二装置向第一装置指示的。
基于第五方面,一种可能的实现方式中,收发模块还用于:接收来自第二装置的第二指示信息,第二指示信息用于指示公共稀疏掩码。
基于第五方面,一种可能的实现方式中,收发模块还用于:向第二装置发送第三指示信息,第三指示信息用于指示K个参数中对应的取值的绝对值最大的N个参数的索引。
基于第五方面,一种可能的实现方式中,第一模型为神经网络模型,第一模型的相关信息包括神经网络模型的其中P层的神经元的相关参数,P为大于或等于1的整数。
本申请第六方面提供一种第二装置,包括:
收发模块,用于向第一装置发送至少一个量化门限值,至少一个量化门限值用于对第一装置的第一模型的相关信息进行量化处理;接收来自第一装置发送的第一信息,第一信息用于指示量化处理后的第一模型的相关信息。
基于第六方面,一种可能的实现方式中,第一模型的相关信息包括:第一模型的输出参数或更新参数,更新参数包括第一模型的权重梯度或权重参数。
基于第六方面,一种可能的实现方式中,收发模块还用于:接收来自第一装置的第二信息;其中,第二信息用于指示第一模型的相关信息经过处理得到的信息;或者,第二信息用于指示第一装置对第一模型进行第M轮训练并经过处理得到的信息,第一模型的相关信息是第一装置对第一模型进行第Q轮训练得到的相关信息,M为大于或等于1且小于Q的整数,Q为大于1的整数;第二装置还包括处理模块,处理模块用于根据第二信息确定至少一个量化门限值。
基于第六方面,一种可能的实现方式中,第一模型的相关信息包括第一模型的输出参数,第一模型的相关信息经过处理得到的信息包括第一模型的输出参数的取值的绝对值的平均值;或者,第一模型的相关信息包括第一模型的更新参数,第一模型的相关信息经过处理得到的信息包括第一模型的更新参数的取值的绝对值的平均值。
基于第六方面,一种可能的实现方式中,收发模块还用于:接收来自第三装置的第三信息;其中,第三信息用于指示第三装置的第二模型的相关信息经过处理得到的信息;或者,第三信息用于指示第三装置对第二模型进行第S轮训练并经过处理得到的信息,第二模型的相关信息是第三装置对第二模型进行第R轮训练得到的相关信息,S为大于或等于1且小于R的整数,R为大于1的整数;处理模块,用于根据第二信息和第三信息确定至少一个量化门限值。
基于第六方面,一种可能的实现方式中,处理模块还用于:根据第一信息确定第一模型的全局信息;收发模块还用于:向第一装置发送第四信息,第四信息用于指示第一模型的全局信息。
基于第六方面,一种可能的实现方式中,第一模型的全局信息包括第一模型的全局输出参数;或者,第一模型的全局信息包括第一模型的全局更新参数和/或全局学习率。
基于第六方面,一种可能的实现方式中,收发模块还用于:接收来自第三装置的第五信息,第五信息用于指示第三装置的第二模型的相关信息;处理模块具体用于:根据第一信息和第五信息确定第一模型的全局信息。
基于第六方面,一种可能的实现方式中,第一模型的相关信息包括第一模型的N个参数,N为大于或等于1的整数;第二模型的相关信息包括第二模型的N个参数;第一信息包括量化处理后的第一模型的N个参数;收发模块具体用于:接收来自第一装置的N个第一信号,N个第一信号承载第一模型的N个参数,N个第一信号与量化处理后的第一模型的N个参数一一对应;第五信息包括量化处理后的第二模型的N个参数;收发模块具体用于:接收来自第三装置的N个第二信号,N个第二信号承载量化处理后的第二模型的N个参数,N个第二信号与量化处理后的第二模型的N个参数一一对应;处理模块具体用于:根据N个第一信号和N个第二信号确定第一模型的全局信息。
基于第六方面,一种可能的实现方式中,N个第一信号中第i个第一信号对应第一序列和第二序列,N个第二信号中第i个第二信号对应第三序列和第四序列,第一装置发送第一序列采用的时频资源与第三装置发送第三序列采用的时频资源相同,第一装置发送第二序列采用的时频资源与第三装置发送所述第四序列采用的时频资源相同;第一模型的全局信息包括第一模型的N个全局参数;i为大于或等于1且小于或等于N的整数;处理模块具体用于:确定第二装置接收第一序列和第三序列的第一信号能量和;确定第二装置接收第二序列和第四序列的第二信号能量和;根据第一信号能量和与第二信号能量和确定N个全局参数中的第i个全局参数。
基于第六方面,一种可能的实现方式中,处理模块具体用于:若第一信号能量和与判决门限值的和小于第二信号能量和,则确定第i个全局参数的取值为第一值;或者,若第一信号能量和与判决门限值的和大于或等于第二信号能量和,且第二信号能量和与判决门 限值的和大于或等于第一信号能量和,则确定第i个全局参数的取值为第二值;或者,若第二信号能量和与判决门限值的和小于第一信号能量和,则确定第i个全局参数的取值为第三值。
基于第六方面,一种可能的实现方式中,收发模块还用于:向第一装置发送第一指示信息,第一指示信息用于指示第一装置向第二装置发送第一信息的发送次数L,L为大于或等于1的整数。
基于第六方面,一种可能的实现方式中,收发模块还用于:向第一装置发送第二指示信息,第二指示信息用于指示公共稀疏掩码,公共稀疏掩码用于指示第一装置上报第一装置训练第一模型得到的部分参数。
基于第六方面,一种可能的实现方式中,收发模块还用于:接收来自第一装置的第三指示信息,第三指示信息用于指示第一装置对第一模型进行一轮训练得到的K个参数中对应的取值的绝对值最大的N个参数的索引;接收来自第三装置的第四指示信息,第四指示信息用于指示第三装置的第二模型的K个参数中对应的取值的绝对值最大的N个参数的索引,第二模型的K个参数是第三装置对第二模型进行一轮训练得到的K个参数;第二装置还包括处理模块,处理模块还用于:根据第三指示信息和第四指示信息确定公共稀疏掩码。
本申请第七方面提供一种第一装置,包括:
收发模块,用于向第二装置发送第一指示信息,第一指示信息用于指示第一装置的第一模型的K个参数中对应的取值的绝对值最大的N个参数的索引,第一模型的K个参数是第一装置对第一模型进行一轮训练得到的K个参数,K为大于或等于所述N的整数,K为大于或等于1的整数,N为大于或等于1的整数;接收来自第二装置的第二指示信息;该第二指示信息用于指示公共稀疏掩码,公共稀疏掩码是第二装置根据第一指示信息确定的;公共稀疏掩码用于指示第一装置上报第一装置训练第一模型得到的部分参数。
本申请第八方面提供一种第二装置,包括:
收发模块,用于接收来自第一装置的第一指示信息,第一指示信息用于指示第一装置的第一模型的K个参数中对应的取值的绝对值最大的N个参数的索引,第一模型的K个参数是第一装置对第一模型进行一轮训练得到的K个参数,K为大于或等于所述N的整数,K为大于或等于1的整数,N为大于或等于1的整数;
处理模块,用于根据第一指示信息确定公共稀疏掩码,公共稀疏掩码用于指示第一装置上报第一装置训练第一模型得到的部分参数;
收发模块,还用于向第一装置发送第二指示信息,第二指示信息用于指示公共稀疏掩码。
基于第八方面,一种可能的实现方式中,收发模块还用于:
接收来自第三装置的第三指示信息,第三指示信息用于指示第三装置的第二模型的K个参数中对应的取值的绝对值最大的N个参数的索引,第二模型的K个参数是第三装置对第二模型进行一轮训练得到的K个参数;
处理模块具体用于:
根据第一指示信息和第三指示信息确定公共稀疏掩码。
针对上述第五方面或第七方面,该第一装置可以为通信设备,所述收发模块可以是收发器,或,输入/输出接口;所述处理模块可以是处理器。
在另一种实现方式中,该第一装置为配置于通信设备中的芯片、芯片系统或电路。当该第一装置为配置于通信设备中的芯片、芯片系统或电路时,所述收发模块可以是该芯片、芯片系统或电路上的输入/输出接口、接口电路、输出电路、输入电路、管脚或相关电路等;所述处理模块可以是处理器、处理电路或逻辑电路等。
针对上述第六方面或第八方面,该第二装置可以为通信设备,所述收发模块可以是收发器,或,输入/输出接口;所述处理模块可以是处理器。
在另一种实现方式中,该第二装置为配置于通信设备中的芯片、芯片系统或电路。当该第二装置为配置于通信设备中的芯片、芯片系统或电路时,所述收发模块可以是该芯片、芯片系统或电路上的输入/输出接口、接口电路、输出电路、输入电路、管脚或相关电路等;所述处理模块可以是处理器、处理电路或逻辑电路等。
本申请第九方面提供一种第一装置,该第一装置包括:处理器和存储器。该存储器中存储有计算机程序或计算机指令,该处理器用于调用并运行该存储器中存储的计算机程序或计算机指令,使得处理器实现如第一方面或第三方面的任意一种实现方式。
可选的,该第一装置还包括收发器,该处理器用于控制该收发器收发信号。
本申请第十方面提供一种第二装置,该第二装置包括:处理器和存储器。该存储器中存储有计算机程序或计算机指令,该处理器用于调用并运行该存储器中存储的计算机程序或计算机指令,使得处理器实现如第二方面或第四方面的任意一种实现方式。
可选的,该第二装置还包括收发器,该处理器用于控制该收发器收发信号。
本申请第十一方面提供一种第一装置,包括处理器和接口电路,所述处理器用于通过接口电路与其它装置通信,并执行上述第一方面或第三方面所述的方法。该处理器包括一个或多个。
本申请第十二方面提供一种第二装置,包括处理器和接口电路,所述处理器用于通过接口电路与其它装置通信,并执行上述第二方面或第四方面所述的方法。该处理器包括一个或多个。
本申请第十三方面提供一种第一装置,包括处理器,用于与存储器相连,用于调用所述存储器中存储的程序,以执行上述第一方面或第三方面所述的方法。该存储器可以位于该第一装置之内,也可以位于该第一装置之外。且该处理器包括一个或多个。
本申请第十四方面提供一种第二装置,包括处理器,用于与存储器相连,用于调用所述存储器中存储的程序,以执行上述第二方面或第四方面所述的方法。该存储器可以位于该第二装置之内,也可以位于该第二装置之外。且该处理器包括一个或多个。
在一种实现方式中,上述第五方面、第七方面、第九方面、第十一方面、第十三方面的第一装置,可以是芯片(系统)。
在一种实现方式中,上述第六方面、第八方面、第十方面、第十二方面、第十四方面的第二装置,可以是芯片(系统)。
本申请第十五方面提供一种包括指令的计算机程序产品,其特征在于,当其在计算机 上运行时,使得该计算机执行如第一方面至第四方面中任一方面中的任一种的实现方式。
本申请第十六方面提供一种计算机可读存储介质,包括计算机指令,当该指令在计算机上运行时,使得计算机执行如第一方面至第四方面中任一方面中的任一种实现方式。
本申请第十七方面提供一种芯片装置,包括处理器,用于调用存储器中的计算机程序或计算机指令,以使得该处理器执行上述第一方面至第四方面中任一方面中的任一种实现方式。
可选的,该处理器通过接口与该存储器耦合。
本申请第十八方面提供一种通信系统,该通信系统包括如第五方面的第一装置和如第六方面的第二装置;或者,该通信系统包括如第七方面的第一装置和如第八方面的第二装置。
从以上技术方案可以看出,本申请实施例具有以下优点:
上述技术方案中,第一装置接收来自第二装置的至少一个量化门限值。然后,第一装置根据至少一个量化门限值对第一装置的第一模型的相关信息进行量化处理。第一装置向第二装置发送第一信息,第一信息用于指示量化处理后的第一模型的相关信息。从而降低第一装置上报第一模型的相关信息的通信开销,节省通信资源。
附图说明
图1为本发明实施例应用的通信系统的一个示意图;
图2为本申请实施例通信方法的一个实施例示意图;
图3为本申请实施例通信方法的一个流程示意图;
图4为本申请实施例通信方法的另一个实施例示意图;
图5为本申请实施例公共稀疏掩码的一个生成示意图;
图6为本申请实施例第一装置的一个结构示意图;
图7为本申请实施例第一装置的另一个结构示意图;
图8为本申请实施例第二装置的一个结构示意图;
图9为本申请实施例第二装置的另一个结构示意图;
图10为本申请实施例终端设备的一个结构示意图;
图11为本申请实施例网络设备的一个结构示意图。
具体实施方式
本申请实施例提供了一种通信方法以及相关装置,用于降低第一装置上报第一模型的相关信息的通信开销,节省通信资源。
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述。显然,所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
在本申请中描述的参考“一个实施例”或“一些实施例”等意味着在本申请的一个或 多个实施例中包括结合该实施例描述的特定特征、结构或特点。由此,在本说明书中的不同之处出现的语句“在一个实施例中”、“在一些实施例中”、“在其他一些实施例中”、“在另外一些实施例中”等不是必然都参考相同的实施例,而是意味着“一个或多个但不是所有的实施例”,除非是以其他方式另外特别强调。术语“包括”、“包含”、“具有”及它们的变形都意味着“包括但不限于”,除非是以其他方式另外特别强调。
在本申请的描述中,除非另有说明,“/”表示“或”的意思,例如,A/B可以表示A或B。本文中的“和/或”仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。此外,“至少一个”是指一个或多个,“多个”是指两个或两个以上。“以下至少一项(个)”或其类似表达,是指的这些项中的任意组合,包括单项(个)或复数项(个)的任意组合。例如,a,b,或c中的至少一项(个),可以表示:a,b,c;a和b;a和c;b和c;或a和b和c。其中a,b,c可以是单个,也可以是多个。
可以理解,在本申请中,“指示”可以包括直接指示、间接指示、显示指示、隐式指示。当描述某一指示信息用于指示A时,可以理解为该指示信息携带A、直接指示A,或间接指示A。
本申请中,指示信息所指示的信息,称为待指示信息。在具体实现过程中,对待指示信息进行指示的方式有很多种,例如但不限于,可以直接指示待指示信息,如待指示信息本身或者该待指示信息的索引等,也可以通过指示其他信息来间接指示待指示信息,其中,该其他信息与待指示信息之间存在关联关系。还可以仅仅指示待指示信息的一部分,而待指示信息的其他部分则是已知的或者提前约定的。例如,还可以借助预先约定(例如协议规定)的各个信息的排列顺序来实现对特定信息的指示,从而在一定程度上降低指示开销。
待指示信息可以作为一个整体一起发送,也可以分成多个子信息分开发送,而且这些子信息的发送周期和/或发送时机可以相同,也可以不同。具体发送方法本申请不进行限定。其中,这些子信息的发送周期和/或发送时机可以是预先定义的,例如根据协议预先定义的,也可以是发射端设备通过向接收端设备发送配置信息来配置的。
本申请的技术方案可以应用于第三代合作伙伴计划(3rd generation partnership project,3GPP)相关的蜂窝通信系统。例如,第四代(4th generation,4G)通信系统、第五代(5th generation,5G)通信系统、第五代通信系统之后的通信系统。例如,第六代通信系统。例如,第四代通信系统可以包括长期演进(long term evolution,LTE)通信系统。第五代通信系统可以包括新无线(new radio,NR)通信系统。本申请的技术方案也可以应用于无线保真(wireless fidelity,WiFi)系统,支持多种无线技术融合的通信系统、设备到设备(device-to-device,D2D)系统,车联网(vehicle to everything,V2X)通信系统等。
本申请的技术方案适用的通信系统包括第一装置和第二装置。可选的,通信系统还包括第三装置。
下面介绍第一装置、第二装置的一些可能的形态。对于其他形态本申请仍适用,下述实现方式不属于对本申请的限定。
1、第一装置为第一终端设备或第一终端设备内的芯片,第二装置为网络设备或网络设备内的芯片。在该实现方式中,第一装置和第二装置可以执行本申请提供的通信方法。
可选的,第三装置为第二终端设备或第二终端设备内的芯片。第三装置可以执行本申请提供的通信方法。
需要说明的是,上述是以第一终端设备和第二终端设备为例进行介绍。实际应用中,网络设备可以与更多终端设备执行本申请提供的通信方法。
2、第一装置为第一网络设备或第一网络设备内的芯片,第二装置为终端设备或终端设备内的芯片。在该实现方式中,第一装置和第二装置可以执行本申请提供的通信方法。
可选的,第三装置为第二网络设备或第二网络设备内的芯片。第三装置可以执行本申请提供的通信方法。
需要说明的是,上述是以第一网络设备和第二网络设备为例进行介绍。实际应用中,终端设备可以与更多网络设备可以执行本申请提供的通信方法。
3、第一装置为第一终端设备或第一终端设备内的芯片,第二装置为第二终端设备或第二终端设备的芯片。在该实现方式中,第一装置和第二装置可以执行本申请提供的通信方法。
可选的,第三装置为第三终端设备或第三终端设备内的芯片。第三装置可以执行本申请提供的通信方法。
需要说明的是,上述是以第一终端设备、第二终端设备和第三终端设备为例进行介绍。实际应用中,第一终端设备可以与更多终端设备执行本申请提供的通信方法。
下面介绍本申请涉及的终端设备和网络设备。
终端设备是具有无线收发功能的设备,还具有计算能力。终端设备可以通过本地的数据进行机器学习的训练,并向网络设备发送终端设备训练得到的模型的相关信息。
终端设备可以指用户设备(user equipment,UE)、接入终端、用户单元(subscriber unit)、用户站、移动台(mobile station)、远方站、远程终端、移动设备、用户终端、无线通信设备、用户代理或用户装置。终端设备还可以是卫星电话、蜂窝电话、智能手机、无线数据卡、无线调制解调器、机器类型通信设备、可以是无绳电话、会话启动协议(session initiation protocol,SIP)电话、无线本地环路(wireless local loop,WLL)站、个人数字处理(personal digital assistant,PDA)、具有无线通信功能的手持设备、计算设备或连接到无线调制解调器的其它处理设备、车载设备、高空飞机上搭载的通信设备、可穿戴设备、无人机、机器人、D2D中的终端、V2X中的终端、虚拟现实(virtual reality,VR)终端设备、增强现实(augmented reality,AR)终端设备、工业控制(industrial control)中的无线终端、无人驾驶(self driving)中的无线终端、远程医疗(remote medical)中的无线终端、智能电网(smart grid)中的无线终端、运输安全(transportation safety)中的无线终端、智慧城市(smart city)中的无线终端、智慧家庭(smart home)中的无线终端或者未来通信网络中的终端设备等,本申请不作限制。
网络设备具有无线收发功能,还具有计算能力。网络设备用于与终端设备进行通信。或者说,网络设备可以是一种将终端设备接入到无线网络的设备。例如,网络设备可以为 具有计算能力的网络节点。例如,网络设备可以为网络侧(例如,接入网或核心网)的人工智能(artificial intelligence,AI)节点、算力节点、具有AI能力的接入网节点。网络设备可以对多个终端设备训练的模型进行融合,再发送给这些终端设备。从而实现多个终端设备之间的联合学习。
网络设备可以为无线接入网中的节点。网络设备可以称为基站,还可以称为无线接入网(radio access network,RAN)节点或RAN设备。网络设备可以是LTE中的演进型基站(evolved Node B,eNB或eNodeB),或者5G网络中的下一代节点B(next generation node B,gNB)或者未来演进的公共陆地移动网络(public land mobile network,PLMN)中的基站,宽带网络业务网关(broadband network gateway,BNG),汇聚交换机或者非第三代合作伙伴项目(3rd generation partnership project,3GPP)接入设备等。可选的,本申请实施例中的网络设备可以包括各种形式的基站。例如,宏基站,微基站(也称为小站),中继站,接入点,5G之后演进的通信系统中实现基站功能的设备,WiFi系统中的接入点(access point,AP),传输点(transmitting and receiving point,TRP)、发射点(transmitting point,TP),移动交换中心、D2D通信、V2X设备通信或机器到机器(machine-to-machine,M2M)通信中承担基站功能的设备等。网络设备还可以包括云接入网(cloud radio access network,C-RAN)系统中的集中式单元(centralized unit,CU)和分布式单元(distributed unit,DU)、非陆地通信网络(non-terrestrial network,NTN)通信系统中的网络设备,即可以部署于高空平台或者卫星,本申请不作限制。
下面介绍本申请适用的一种可能的通信系统。
图1为本申请实施例应用的通信系统的一个示意图。请参阅图1,通信系统包括终端设备101、终端设备102、网络设备103、网络设备104和服务器105。终端设备101可以与网络设备103建立通信连接,终端设备102可以与网络设备103建立通信连接。
一种可能的实现方式中,终端设备101、终端设备102与网络设备103可以执行本申请提供的通信方法。从而降低终端设备上报其模型的相关信息的开销,节省通信开销。
需要说明的是,上述图1仅仅是一种示例。实际应用中,该通信系统中包括至少一个网络设备和至少一个终端设备。
分布式学习是实现联合学习的一种学习方法。具体的,多个节点设备利用本地数据训练得到本地模型,中心节点设备将多个本地模型融合得到全局模型。从而实现在保护节点设备的用户数据的隐私的前提下,实现联合学习。
多个节点设备可以分别训练其本地模型得到本地模型的相关参数。例如,本地模型的权重参数或权重梯度。然后,多个节点设备将本地模型的相关参数发送给中心节点设备。中心节点设备对多个节点设备发送的本地模型的相关参数进行融合得到全局模型的相关参数,并下发给各个节点设备。各个节点设备可以通过全局模型的相关参数更新该节点设备的本地模型。由上述技术方案可知,各个节点设备分别向中心节点设备发送本地模型的相关参数。导致节点设备上报的数据量较大,通信开销较大。因此,节点设备如何以较低的通信开销来上报本地模型的相关参数,是亟待解决的问题。
下面介绍本申请涉及的数学符号。
mean(x):表示求向量x中的所有元素的平均值。
abs(y):表示求向量y中每个元素的绝对值。
mean(x_1, y_1):表示求元素 x_1 和元素 y_1 的平均值。
下面结合具体实施例介绍本申请的技术方案。
图2为本申请实施例通信方法的一个实施例示意图。请参阅图2,方法包括:
201、第二装置向第一装置发送至少一个量化门限值。相应的,第一装置接收来自第二装置的至少一个量化门限值。
该至少一个量化门限值用于第一装置对第一模型的相关信息进行量化处理。可选的,该第一模型可以是第二装置为第一装置配置的模型。可选的,第一模型可以为神经网络模型。
可选的,第一模型的相关信息是第一装置对第一模型进行一轮训练得到的。
可选的,第一模型的相关信息包括第一模型的输出参数或更新参数。第一模型的输出参数可以理解为第一模型的输出数据,为了便于描述,后文将统一称为输出参数。第一模型的更新参数包括第一模型的权重参数或权重梯度。例如,第一模型为神经网络模型,第一模型的相关信息包括神经网络模型的输出参数。或者,第一模型的相关信息包括神经网络模型中的权重参数或权重梯度。
一种可能的实现方式中,第一装置为第一终端设备,第二装置为网络设备,该至少一个量化门限值可以承载于下行控制信息、无线资源控制(radio resource control,RRC)消息或媒体接入控制控制元素(medium access control control element,MAC CE)中。
另一种可能的实现方式中,第一装置为网络设备,第二装置为终端设备,该至少一个量化门限值可以承载于上行控制信息。
下面介绍第二装置确定该至少一个量化门限值的一种可能的实现方式。可选的,图2所示的实施例还包括步骤201a和步骤201b。步骤201a和步骤201b可以在步骤201之前执行。
201a、第一装置向第二装置发送第二信息。相应的,第二装置接收来自第一装置的第二信息。
下面介绍第二信息的两种可能的实现方式。
实现方式1:第二信息用于指示第一模型的相关信息经过处理得到的信息。
可选的,第二信息包括该第一模型的相关信息经过处理得到的信息,或者,第二信息指示该第一模型的相关信息经过处理得到的信息。
例如,第一模型的相关信息包括第一模型的输出参数。第一模型的相关信息经过处理得到的信息包括第一模型的输出参数的绝对值的平均值或加权值。例如,第一模型的输出参数包括第一模型的输出参数A、输出参数B和输出参数C。第一装置对输出参数A、输出参数B和输出参数C分别对应的绝对值进行平均得到输出参数的绝对值的平均值。第二信息包括该第一模型的输出参数的绝对值的平均值或加权值。或者,第二信息指示该第一模型的输出参数的绝对值的平均值或加权值。
例如,第二信息为指示信息,该指示信息的取值与第一模型的输出参数的绝对值的平均值或加权值之间的对应关系可以如表1所示:
表1
指示信息的取值 | 输出参数的绝对值的平均值或加权值
00 | 0.25
01 | 0.5
10 | 0.75
11 | 1
例如,第一模型的相关信息包括第一模型的更新参数。第一模型的相关信息经过处理得到的信息包括第一模型的更新参数的绝对值的平均值或加权值。例如,第一模型的更新参数包括第一装置对第一模型进行第Q轮训练得到的权重梯度 Δw_Q^1、权重梯度 Δw_Q^2 和权重梯度 Δw_Q^3。第一装置对权重梯度 Δw_Q^1、权重梯度 Δw_Q^2 和权重梯度 Δw_Q^3
分别对应的绝对值进行平均,得到第一模型的权重梯度的绝对值的平均值。第二信息包括该第一模型的更新参数的绝对值的平均值或加权值。或者,第二信息指示该第一模型的更新参数的绝对值的平均值或加权值。例如,第二信息为指示信息,该指示信息的取值与第一模型的更新参数的绝对值的平均值或加权值之间的对应关系可以如表2所示:
表2
指示信息的取值 | 更新参数的绝对值的平均值或加权值
00 | 0.5
01 | 1
10 | 1.5
11 | 2
实现方式2:第二信息用于指示第一装置对第一模型进行第M轮训练得到的相关信息经过处理得到的信息。第一模型的相关信息是第一装置对第一模型进行第Q轮训练得到的相关信息。M为大于或等于1且小于Q的整数,Q为大于1的整数。
实现方式2中,第二信息包括第一装置对第一模型进行第M轮训练得到的相关信息经过处理得到的信息;或者,第二信息指示第一装置对第一模型进行第M轮训练得到的相关信息经过处理得到的信息。关于第一装置对第一模型进行第M轮训练得到的相关信息经过处理得到的信息可以参阅前述第一模型的相关信息经过处理得到的信息的相关介绍。
实现方式2与实现方式1类似,具体可以参阅实现方式1的相关介绍。
一种可能的实现方式中,第一装置为终端设备,第二装置为网络设备,该第二信息可以承载于下行控制信息、RRC消息或MAC CE中。另一种可能的实现方式中,第一装置为网络设备,第二装置为终端设备,该第二信息可以承载于上行控制信息。
201b、第二装置根据第二信息确定该至少一个量化门限值。
例如,至少一个量化门限值包括一个量化门限值。第二信息包括该第一模型的权重梯度的绝对值的平均值。该量化门限值 γ_1=mean(abs(Δw_Q))·a,a为控制因子,用于控制量化处理的区间,a的取值范围为[0,+∞)。abs(Δw_Q) 表示第一装置对第一模型进行第Q轮训练得到的权重梯度的绝对值。
例如,至少一个量化门限值包括两个量化门限值,分别为第一量化门限值和第二量化门限值。第一量化门限值 γ_1=mean(abs(Δw_Q))·a,第二量化门限值 -γ_1=-mean(abs(Δw_Q))·a。abs(Δw_Q) 表示第一装置对第一模型进行第Q轮训练得到的权重梯度的绝对值。
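As a concrete illustration of the threshold rule above, the following Python sketch computes the pair (γ_1, -γ_1) from one round's weight gradients; the function name and the NumPy dependency are illustrative assumptions, not part of this application:

```python
import numpy as np

def quantization_thresholds(grad: np.ndarray, a: float) -> tuple[float, float]:
    """Compute gamma_1 = mean(abs(grad)) * a and return the threshold
    pair (gamma_1, -gamma_1); a is the control factor in [0, +inf)."""
    gamma_1 = float(np.mean(np.abs(grad)) * a)
    return gamma_1, -gamma_1
```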
可选的,图2所示的实施例还包括步骤201c。步骤201c可以在步骤201之前执行。
201c、第三装置向第二装置发送第三信息。相应的,第二装置接收来自第三装置的第三信息。
第三信息用于指示第三装置的第二模型的相关信息经过处理得到的信息。或者,第三信息用于指示第三装置对第二模型进行第S轮训练并经过处理得到的信息。第二模型的相关信息是第三装置对第二模型进行第R轮训练得到的相关信息。S为大于或等于1且小于R的整数,R为大于1的整数。第三信息与第二信息类似,具体可以参阅前述关于第二信息的相关介绍。
需要说明的是,第二模型可以是第二装置为第三装置配置的模型。第一模型和第二模型可以是同一模型,例如,第一模型和第二模型都为第二装置配置的全局模型。本文第一模型和第二模型是为了区别第一装置和第二装置上的模型,实际可以是同一模型。
基于上述步骤201c,可选的,上述步骤201b具体包括:
第二装置根据第二信息和第三信息确定该至少一个量化门限值。
例如,第二信息包括第一模型的权重梯度的绝对值的平均值。第三信息包括第二模型的权重梯度的绝对值的平均值。第二装置根据第一模型的权重梯度的绝对值的平均值和第二模型的权重梯度的绝对值的平均值确定该至少一个量化门限值。例如,该至少一个量化门限值包括两个量化门限值,分别为第一量化门限值和第二量化门限值。第一量化门限值 γ_1=mean(mean(abs(Δw_Q)),mean(abs(Δw_R)))·a,第二量化门限值 -γ_1=-mean(mean(abs(Δw_Q)),mean(abs(Δw_R)))·a。第一装置对第一模型进行第Q轮训练得到的N个权重梯度通过向量 Δw_Q 表示。第三装置对第二模型进行第R轮训练得到的N个权重梯度通过向量 Δw_R 表示。
需要说明的是,上述步骤201a至步骤201c仅仅是以第二装置根据第一装置的第二信息和第三装置的第三信息确定该至少一个量化门限值为例进行本申请的技术方案。实际应用中,第二装置可以接收多个装置指示的模型的相关信息,并结合这些模型的相关信息确定该至少一个量化门限值,具体本申请不做限定。
需要说明的是,步骤201c与步骤201a之间没有固定的执行顺序。可以先执行步骤201a,再执行步骤201c;或者,可以先执行步骤201c,再执行步骤201a;或者,依据情况同时执行步骤201a和步骤201c,具体本申请不做限定。
202、第一装置根据至少一个量化门限值对第一装置的第一模型的相关信息进行量化处理。
由前述介绍可知,第一模型的相关信息包括第一模型的输出参数或更新参数。这里以第一模型的相关信息包括第一模型的N个参数为例介绍本申请的技术方案。N为大于或等于1的整数。因此上述步骤202具体包括:第一装置根据至少一个量化门限值对第一模型的N个参数进行量化处理,得到量化处理后的N个参数。例如,如图3所示,第一装置对第一模型进行第Q轮训练得到第一模型的相关信息。然后,第一装置对第一模型的相关信 息进行量化处理。
一种可能的实现方式中,该至少一个量化门限值包括一个量化门限值γ 1。第一模型的相关信息包括第一模型的N个参数。上述步骤202具体包括:若N个参数中第i个参数大于该量化门限值γ 1,则第一装置将第i个参数量化为第一值,i为大于或等于1且小于或等于N的整数。若N个参数中第i个参数小于或等于该量化门限值γ 1,则第一装置将第i个参数量化为第三值。或者,上述步骤202具体包括:若N个参数中第i个参数大于或等于该量化门限值γ 1,则第一装置将第i个参数量化为第一值,i为大于或等于1且小于或等于N的整数。若N个参数中第i个参数小于该量化门限值γ 1,则第一装置将第i个参数量化为第三值。
例如,第一值为+1,第三值为-1。第一模型的N个参数为第一模型的N个权重梯度。该N个权重梯度中第i个权重梯度表示为 Δw_Q^i。当该权重梯度 Δw_Q^i 大于量化门限值 γ_1 时,该权重梯度量化为+1;当该权重梯度 Δw_Q^i 小于或等于量化门限值 γ_1 时,该权重梯度量化为-1。量化处理后的第i个权重梯度 s_i 可以通过如下公式1表示:
$$s_i=\begin{cases}+1, & \Delta w_Q^i>\gamma_1\\ -1, & \Delta w_Q^i\le\gamma_1\end{cases}\qquad(公式1)$$
上述示出了第一装置对第一模型的N个参数中第i个参数的量化过程,对于该N个参数中的其他参数的量化过程同样适用,具体这里不再一一说明。
需要说明的是,可选的,若N个参数中第i个参数大于或等于该量化门限值γ 1,则第一装置将第i个参数量化为第一值,i为大于或等于1且小于或等于N的整数。若N个参数中第i个参数小于或等于该量化门限值γ 1,则第一装置将第i个参数量化为第三值。也就是说,如果第i个参数等于该量化门限值γ 1,第一装置可以将该第i个参数量化为第一值或第三值。那么对于该情况,第一装置可以通过随机量化处理的方式随机将第i个参数量化为第一值或第三值。
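A minimal sketch of the single-threshold rule of 公式1, including the randomized handling of values exactly equal to γ_1 described in the note above; the helper name and the use of NumPy's Generator are assumptions for illustration:

```python
import numpy as np

def quantize_one_threshold(params: np.ndarray, gamma_1: float,
                           rng: np.random.Generator) -> np.ndarray:
    """公式1: values above gamma_1 map to +1, values below to -1;
    values exactly equal to gamma_1 are randomly mapped to +1 or -1."""
    out = np.where(params > gamma_1, 1.0, -1.0)
    ties = params == gamma_1
    out[ties] = rng.choice([1.0, -1.0], size=int(ties.sum()))
    return out
```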
另一种可能的实现方式中,至少一个量化门限值包括两个量化门限值,分别为第一量化门限值γ 1和第二量化门限值-γ 1。第一模型的相关信息包括第一模型的N个参数。上述步骤202具体包括:若N个参数中的第i个参数大于第一量化门限值γ 1时,第一装置将第i个参数量化为第一值,i为大于或等于1且小于或等于N的整数;若N个参数中的第i个参数小于或等于第一量化门限值γ 1且大于或等于第二量化门限值-γ 1时,第一装置将第i个参数量化为第二值;若N个参数中第i个参数小于第二量化门限值-γ 1时,第一装置将第i个参数量化为第三值。或者,上述步骤202具体包括:若N个参数中的第i个参数大于或等于第一量化门限值γ 1时,第一装置将第i个参数量化为第一值,i为大于或等于1且小于或等于N的整数;若N个参数中的第i个参数小于第一量化门限值γ 1且大于第二量化门限值-γ 1时,第一装置将第i个参数量化为第二值;或者,若N个参数中第i个参数小于或等于第二量化门限值-γ 1时,第一装置将第i个参数量化为第三值。
例如,第一值为+1,第二值为0,第三值为-1。第一模型的N个参数为第一模型的N个权重梯度。该N个权重梯度中第i个权重梯度表示为 Δw_Q^i。当该权重梯度 Δw_Q^i 大于第一量化门限值 γ_1 时,量化为+1;当该权重梯度 Δw_Q^i 小于第二量化门限值 -γ_1 时,量化为-1;当该权重梯度 Δw_Q^i 小于或等于第一量化门限值 γ_1 且大于或等于第二量化门限值 -γ_1 时,量化为0。因此,量化处理后的第i个权重梯度 s_i 可以通过如下公式2表示:
$$s_i=\begin{cases}+1, & \Delta w_Q^i>\gamma_1\\ 0, & -\gamma_1\le\Delta w_Q^i\le\gamma_1\\ -1, & \Delta w_Q^i<-\gamma_1\end{cases}\qquad(公式2)$$
上述示出了第一装置对第一模型的N个参数中第i个参数的量化过程,对于该N个参数中的其他参数的量化过程同样适用,具体这里不再一一说明。上述实现方式中,第一装置可以通过多个量化门限值量化第一模型的参数,有利于提升量化精度,提升模型的收敛速度和性能。进一步的,由上述公式2可知,s_i 可以取值为0,表示当该第i个参数的取值落在第二量化门限值至第一量化门限值之间的区间范围时,第一装置可以不更新该第i个参数。例如,如果该第i个参数是由于训练噪声带来的,那么第一装置不更新该第i个参数,有利于提高第二装置训练得到的第一模型的准确性。
需要说明的是,可选的,若N个参数中的第i个参数大于或等于第一量化门限值γ 1时,第一装置将第i个参数量化为第一值,i为大于或等于1且小于或等于N的整数;若N个参数中的第i个参数小于或等于第一量化门限值γ 1且大于或等于第二量化门限值-γ 1时,第一装置将第i个参数量化为第二值;若N个参数中第i个参数小于第二量化门限值-γ 1时,第一装置将第i个参数量化为第三值。也就是说对于第i个参数来说,如果第i个参数等于第一量化门限值γ 1,第一装置可以将该第i个参数量化为第一值或第二值。那么对于该情况,第一装置可以通过随机量化处理的方式随机将第i个参数量化为第一值或第二值。
需要说明的是,可选的,若N个参数中的第i个参数大于第一量化门限值γ 1时,第一装置将第i个参数量化为第一值,i为大于或等于1且小于或等于N的整数;若N个参数中的第i个参数小于或等于第一量化门限值γ 1且大于或等于第二量化门限值-γ 1时,第一装置将第i个参数量化为第二值;若N个参数中第i个参数小于或等于第二量化门限值-γ 1时,第一装置将第i个参数量化为第三值。也就是说对于第i个参数来说,如果第i个参数等于第二量化门限值-γ 1,第一装置可以将该第i个参数量化为第二值或第三值。那么对于该情况,第一装置可以通过随机量化处理的方式随机将第i个参数量化为第二值或第三值。
上述示出了该至少一个量化门限值包括一个量化门限值和两个量化门限值的示例。实际应用中,该至少一个量化门限值可以包括三个量化门限值,四个量化门限值,或更多量化门限值,具体本申请不做限定,这里不再一一示例。
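For the two-threshold case, 公式2 amounts to a ternary quantizer; a minimal NumPy sketch (names are assumptions):

```python
import numpy as np

def quantize_two_thresholds(params: np.ndarray, gamma_1: float) -> np.ndarray:
    """公式2: +1 above gamma_1, -1 below -gamma_1, and 0 inside
    [-gamma_1, gamma_1], so parameters attributable to training noise
    are left unchanged."""
    return np.where(params > gamma_1, 1.0,
                    np.where(params < -gamma_1, -1.0, 0.0))
```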
可选的,上述步骤202中,第一模型的相关信息包括第一模型的量化误差补偿后的N个参数。关于量化误差补偿后的N个参数请参阅下述步骤202a的相关介绍。
可选的,图2所示的实施例还包括步骤202a,步骤202a可以在步骤202之前执行。
202a、第一装置根据第一模型的N个参数分别对应的量化误差对N个参数进行误差补偿,得到量化误差补偿后的N个参数。
第一模型的N个参数是第一装置对第一模型进行第Q轮训练得到的。该N个参数中的第i个参数对应的量化误差是第一装置对第一模型进行第Q-1轮训练且经过量化误差补偿得到的第i个参数确定的。
例如,第一模型的N个参数中第i个参数为第i个权重梯度 Δw_Q^i,量化误差补偿后的第i个权重梯度可以表示为
$$\tilde{\Delta}w_Q^i=\Delta w_Q^i+\tilde{\Delta}w_{Q-1}^i-\eta\,q\big(\tilde{\Delta}w_{Q-1}^i\big)$$
其中,$\tilde{\Delta}w_{Q-1}^i$ 表示量化误差补偿后的第Q-1轮训练得到的第i个权重梯度,η为全局学习率,$q(\cdot)$ 表示对 $\tilde{\Delta}w_{Q-1}^i$ 进行量化处理。
需要说明的是,第一装置可以确定第Q+1轮训练得到的第i个参数对应的量化误差 $e_{Q+1}^i=\tilde{\Delta}w_Q^i-\eta\,q\big(\tilde{\Delta}w_Q^i\big)$,
从而便于第一装置对第Q+1轮训练得到的N个参数进行量化误差补偿。
基于上述步骤202a,上述第一模型的相关信息包括量化误差补偿后的N个参数。可选的,上述步骤202具体包括:第一装置根据该至少一个量化门限值对量化误差补偿后的N个参数进行量化处理。具体的量化处理过程请参阅前述步骤202的相关介绍。
由此可知,上述步骤202a中第一装置对第一模型的N个参数分别对应的量化误差对该N个参数进行量化误差补偿,从而有利于提高第二装置更新第一模型的准确性,提升模型训练的性能。
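The round-to-round error compensation of step 202a can be sketched as the error-feedback loop below; the class name and the reuse of the 公式2 ternary quantizer are assumptions consistent with the description above, not a definitive implementation:

```python
import numpy as np

class ErrorFeedbackQuantizer:
    """Step 202a sketch: the residual left over after quantizing round
    Q-1 is added back to the raw gradients of round Q before they are
    quantized, so quantization error is not silently discarded."""

    def __init__(self, num_params: int, eta: float, gamma_1: float):
        self.error = np.zeros(num_params)  # e_Q^i, initially zero
        self.eta = eta                     # global learning rate
        self.gamma_1 = gamma_1

    def step(self, grad: np.ndarray) -> np.ndarray:
        compensated = grad + self.error
        quantized = np.where(compensated > self.gamma_1, 1.0,
                             np.where(compensated < -self.gamma_1, -1.0, 0.0))
        # residual carried into the next round:
        # compensated minus the de-quantized update eta * q(compensated)
        self.error = compensated - self.eta * quantized
        return quantized
```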
可选的,上述步骤202中,第一模型的相关信息包括第一模型的经过稀疏处理的N个参数。关于第一模型的经过稀疏处理的N个参数请参阅下述步骤202b的相关介绍。
可选的,图2所示的实施例还包括步骤202b。步骤202b可以在步骤202之前执行。
202b、第一装置根据公共稀疏掩码从第一模型的K个参数中选择N个参数,得到第一模型的经过稀疏处理的N个参数。
一种可能的实现方式中,第一模型的K个参数是第一装置对第一模型进行一轮训练得到的。
另一种可能的实现方式中,第一模型的K个参数是第一装置对第一模型进行一轮训练并经过量化误差补偿得到的。第一装置对该K个参数进行量化误差补偿的过程与前述步骤202a类似,具体可以参阅前述步骤202a的相关介绍。
可选的,该公共稀疏掩码为比特序列,该比特序列包括K个比特。K个比特与该K个参数一一对应。当K个比特中的一个比特的取值为0时,用于指示第一装置不选择该比特对应的参数。当K个比特中的一个比特的取值为1时,用于指示第一装置选择该比特对应的参数。或者,当K个比特中的一个比特的取值为0时,用于指示第一装置选择该比特对应的参数。当K个比特中的一个比特的取值为1时,用于指示第一装置不选择该比特对应的参数。例如,K个参数包括第一模型的十个权重梯度。比特序列为1000111100,该比特序列从高位到低位与十个权重梯度一一对应。例如,比特序列的第一个比特对应该十个权重梯度中的第一个权重梯度。比特序列的第二个比特对应该十个权重梯度中的第二个权重梯度,以此类推。比特序列的第十个比特对应该十个权重梯度中的第十个权重梯度。那么可知,该第一模型的相关信息包括该十个权重梯度中的第一个权重梯度、第五个权重梯度、第六个权重梯度、第七个权重梯度以及第八个权重梯度。
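A sketch of how a device could apply the bit-sequence mask described above, assuming the "1 = select" convention; all names are illustrative:

```python
import numpy as np

def apply_sparse_mask(params: np.ndarray, mask_bits: str) -> np.ndarray:
    """Keep only the parameters whose mask bit is '1'; e.g. with
    mask_bits='1000111100' the 1st, 5th, 6th, 7th and 8th of ten
    parameters are selected for reporting."""
    mask = np.array([b == "1" for b in mask_bits])
    return params[mask]
```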
下面介绍第一装置获取公共稀疏掩码的两种可能的实现方式。
实现方式1:公共稀疏掩码是第一装置根据稀疏比例和伪随机数确定的。该稀疏比例是第二装置向第一装置指示的。
需要说明的是,多个装置需要采用相同的公共稀疏掩码,从而实现多个装置中各个装置向第二装置发送各个装置上配置的模型的相同索引的参数。并且,该多个装置可以通过相同的时频资源发送索引相同的参数。有利于降低多个装置上报模型参数所需的通信资源。提高通信资源的利用率。从而支持第二装置在同一时频资源上接收到多个装置发送的索引相同的参数。即支持第二装置通过空中信号的叠加实现模型融合。
需要说明的是,第二装置在不同的训练阶段可以向第一装置指示不同的稀疏比例。例如,在训练开始阶段,该稀疏比例可以较小。这样方便第二装置获取更多模型的相关信息,实现模型的快速收敛。在训练收敛阶段,该稀疏比例可以较大。
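Implementation 1 leaves the exact pseudo-random construction open; one plausible sketch is to derive the mask from a seed shared by all devices plus the signalled sparsity ratio (here read as the fraction of parameters to drop), so that every device selects the same indices. The generator choice, the ratio interpretation, and all names are assumptions:

```python
import numpy as np

def common_sparse_mask(num_params: int, sparsity: float, seed: int) -> np.ndarray:
    """Derive an identical 0/1 mask on every device from a shared seed
    and the sparsity ratio indicated by the second device."""
    rng = np.random.default_rng(seed)          # same seed on all devices
    keep = int(num_params * (1.0 - sparsity))  # positions to report
    chosen = rng.choice(num_params, size=keep, replace=False)
    mask = np.zeros(num_params, dtype=np.uint8)
    mask[chosen] = 1
    return mask
```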
实现方式2:下面结合步骤201e介绍实现方式2。
可选的,图2所示的实施例还包括步骤201e。步骤201e可以在步骤202b之前执行。
201e、第二装置向第一装置发送第二指示信息。相应的,第一装置接收来自第二装置的第二指示信息。第二指示信息用于指示公共稀疏掩码。
后文结合图4所示的实施例介绍第二装置确定该公共稀疏掩码的一种可能的实现方式,具体请参阅后文图4所示的实施例的相关介绍。
基于上述步骤202b,可选的,上述步骤202具体包括:第一装置根据该至少一个量化门限值对该第一模型的经过稀疏处理的N个参数进行量化处理。具体的量化处理过程可以参阅前述步骤202中的相关介绍。例如,如图3所示,第一装置对第一模型进行第Q轮训练得到第一模型的K个参数。然后,第一装置对第一模型的K个参数进行量化误差补偿,得到量化误差补偿后的K个参数。第一装置根据公共稀疏掩码从量化误差补偿后的K个参数中选择N个参数,再根据该至少一个量化门限值对该N个参数进行量化处理。
由此可知,上述步骤202b中第一装置根据公共稀疏掩码选择第一模型的部分参数,从而有利于降低第一装置上报第一模型的参数的开销。
上述步骤201e与上述步骤201a、步骤201b、步骤201c和步骤201之间没有固定的执行顺序。可以先执行步骤201e,再执行步骤201a、步骤201b、步骤201c和步骤201。或者,可以先执行步骤201a、步骤201b、步骤201c和步骤201,再执行步骤201e;或者,依据情况同时步骤201e、步骤201a、步骤201b、步骤201c和步骤201。
203、第一装置向第二装置发送第一信息。第一信息用于指示量化处理后的第一模型的相关信息。相应的,第二装置接收来自第一装置的第一信息。
一种可能的实现方式中,第一信息包括量化处理后的第一模型的相关信息。例如,第一模型的相关信息包括第一模型的N个参数,第一信息包括量化处理后的第一模型的N个参数。
另一种可能的实现方式中,第一信息为指示信息,该指示信息指示量化处理后的第一模型的相关信息。
可选的,第一模型的相关信息包括量化处理后的第一模型的N个参数。下面介绍上述步骤203的一种可能的实现方式。可选的,上述步骤203具体包括步骤2003a和步骤2003b。
2003a、第一装置对量化处理后的第一模型的N个参数进行调制得到N个第一信号。其中,N个第一信号与该N个参数一一对应。
2003b、第一装置向第二装置发送N个第一信号。相应的,第二装置接收来自第一装置的N个第一信号。
下面结合上述公式2所示的量化示例介绍上述步骤2003a和步骤2003b。
第一装置对该量化处理后的第一模型的N个参数中的第i个参数进行调制得到第i个第一信号。该第i个第一信号对应两个序列,该两个序列中每个序列包括至少一个符号。 下面介绍第一装置发送该两个序列的两种可能的实现方式,从而便于第二装置确定该量化处理后的第i个参数的取值。
实现方式1:当量化处理后的第i个参数为第一值时,第一装置发送两个序列中的第一个序列的发送功率小于第一装置发送两个序列中的第二个序列的发送功率。当量化处理后的第i个参数为第二值时,第一装置发送该两个序列中的第一个序列的发送功率等于第一装置发送该两个序列中的第二个序列的发送功率。当量化处理后的第i个参数为第三值时,第一装置发送该两个序列中的第一个序列的发送功率大于第一装置发送两个序列中的第二个序列的发送功率。
可选的,当量化处理后的第i个参数为第一值时,该两个序列中的第一个序列为全0序列,第二个序列为非全0序列。当量化处理后的第i个参数为第二值时,该两个序列均为全0序列。当量化处理后的第i个参数为第三值时,两个序列中的第一个序列为非全0序列,第二个序列为全0序列。例如,第一值为+1,第二值为0,第三值为-1。第i个第一信号承载第i个参数s i,第i个参数对应的两个序列。对于第i个参数的各种取值下,对应的两个序列(即序列1和序列2)分别如表3所示:
表3
s_i | +1 | 0 | -1
序列1 | 0 | 0 | c_1
序列2 | c_0 | 0 | 0
其中,c_0 和 c_1 均为特定长度的序列。例如,c_0 的长度和 c_1 的长度都为1,即都包括一个符号。可选的,c_0 和 c_1 均可以为Zadoff–Chu序列,该Zadoff–Chu序列可以简称为ZC序列。
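The Table 3 mapping can be written out directly; since the quantized value only selects which of the two sequences carries energy, the receiver can later decide it non-coherently from received energy alone. Here c0 and c1 stand for the (e.g. ZC) sequences above, and the function name is an assumption:

```python
import numpy as np

def modulate_ternary(s_i: float, c0: np.ndarray, c1: np.ndarray):
    """Implementation 1 / Table 3: return (sequence 1, sequence 2)
    for one quantized parameter s_i in {-1, 0, +1}."""
    if s_i == +1:
        return np.zeros_like(c1), c0   # all power on the second sequence
    if s_i == -1:
        return c1, np.zeros_like(c0)   # all power on the first sequence
    return np.zeros_like(c1), np.zeros_like(c0)  # s_i == 0: nothing sent
```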
实现方式2:当量化处理后的第i个参数为第一值时,第一装置发送两个序列中的第一个序列的发送功率大于第一装置发送两个序列中的第二个序列的发送功率。当量化处理后的第i个参数为第二值时,第一装置发送该两个序列中的第一个序列的发送功率等于第一装置发送该两个序列中的第二个序列的发送功率。当量化处理后的第i个参数为第三值时,第一装置发送该两个序列中的第一个序列的发送功率小于第一装置发送两个序列中的第二个序列的发送功率。
可选的,当量化处理后的第i个参数为第一值时,该两个序列中的第一个序列为非全0序列,第二个序列为全0序列。当量化处理后的第i个参数为第二值时,该两个序列均为全0序列。当量化处理后的第i个参数为第三值时,两个序列中的第一个序列为全0序列,第二个序列为非全0序列。例如,第一值为+1,第二值为0,第三值为-1。第i个第一信号承载第i个参数s i,第i个参数对应的两个序列。对于第i个参数的各种取值下,对应的两个序列(即序列1和序列2)分别如表4所示:
表4
s_i | +1 | 0 | -1
序列1 | c_0 | 0 | 0
序列2 | 0 | 0 | c_1
关于 c_0 和 c_1 请参阅前述的相关介绍,这里不再说明。
需要说明的是,上述示出了第一值、第二值和第三值的一种可能的示例。实际应用中,第一值、第二值和第三值也可以是其他取值,具体本申请不做限定。例如,第一值为0.7,第二值为0,第三值为-0.7。
如图3所示,第一装置对第一模型的N个参数进行量化处理后,得到量化处理后的第一模型的N个参数。第一装置对量化处理后的第一模型的N个参数进行调制,再将调制得到的序列映射到相应的时频资源上,并进行波形成型得到该N个第一信号。第一装置向第二装置发送该N个第一信号。由上述介绍可知,第一装置将第一模型的N个参数中每个参数调制到两个序列上。第一装置控制发送该两个序列中每个序列分别采用的发送功率,从而便于第二装置确定该参数的取值。第一装置无需进行信道的估计和均衡,从而无需相应的导频开销。
由此可知,第一装置接收来自第二装置的至少一个量化门限值。然后,第一装置根据至少一个量化门限值对第一装置的第一模型的相关信息进行量化处理。第一装置向第二装置发送第一信息,第一信息用于指示量化处理后的第一模型的相关信息。从而降低第一装置上报第一模型的相关信息的通信开销,节省通信资源。
可选的,图2所示的实施例还包括步骤204和步骤205。步骤204和步骤205可以在步骤203之后执行。
204、第二装置根据第一信息确定第一模型的全局信息。
第一模型的全局信息包括第一模型的全局输出参数。或者,第一模型的全局信息包括第一模型的全局更新参数和/或全局学习率。第一模型的全局输出参数可以理解为第一模型的全局输出数据。第一模型的全局更新参数包括第一模型的全局权重参数或全局权重梯度。
可选的,第一模型的全局信息包括第一模型的N个全局参数,全局参数为输出参数或更新参数。关于N个全局参数的确定过程可以参阅后文的相关介绍。
可选的,第一信息包括量化处理后的第一模型的N个参数,第二装置可以根据该第一模型的N个参数确定全局学习率η。
例如,量化处理后的第一模型的N个参数包括第一装置对第一模型进行第Q轮训练且经过量化处理得到的N个权重梯度。具体通过向量 Δw_Q 表示该第一模型的N个权重梯度。即向量 Δw_Q 包括第一装置对第一模型进行第Q轮训练得到的N个权重梯度。第二装置可以确定全局学习率 η=mean(abs(Δw_q))。向量 Δw_q 包括向量 Δw_Q 中的量化处理后的不为0的权重参数。
需要说明的是,可选的,第一装置也可以向第二装置发送第六信息。该第六信息用于指示该第一模型的N个参数中经过量化处理后的不为0的参数的取值的绝对值的平均值。第二装置根据该第六信息确定该全局学习率。
例如,该第一模型的N个参数是第一装置对第一模型进行第Q轮训练得到的N个权重梯度,具体通过向量 Δw_Q 表示该第一模型的N个权重梯度。那么第二装置可以确定全局学习率 η=mean(abs(Δw_q)),mean(abs(Δw_q)) 是第一装置通过第六信息向第二装置指示的,abs(Δw_q) 是向量 Δw_Q 中量化处理后的不为0的权重参数的取值的绝对值。
需要说明的是,可选的,全局学习率η是可变的。例如,全局学习率η是随着训练轮数 变化的常数。
需要说明的是,在上述步骤204中是以第二装置根据第一信息确定该全局学习率。实际应用中,第二装置可以根据第二信息确定该全局学习率。可选的,第二装置根据第二信息和第三信息确定该全局学习率,具体本申请不做限定。
一种可能的实现方式中,第一模型为神经网络模型。第一模型的相关信息包括神经网络模型的全部层的神经元的相关参数。基于该实现方式中,上述步骤204中第一模型的全局信息中包括的第一模型的N个全局参数是全部层的神经元的全局参数。
该实现方式中,该至少一个量化门限值和全局学习率是针对神经网络模型中各层的神经元统一设置的。
另一种可能的实现方式中,第一模型为神经网络模型。第一模型的相关信息包括神经网络模型的其中P层的神经元的相关参数,P为大于或等于1的整数。
基于该实现方式中,上述步骤204中第一模型的全局信息中包括的第一模型的N个全局参数是该P层的神经元的全局参数。
该实现方式中,该至少一个量化门限值和全局学习率是针对神经网络模型中该P层的神经元统一设置的。对于该神经网络模型的除了该P层之外的其他层的神经元,应当另外确定对应的量化门限值和全局学习率。
可选的,图2所示的实施例还包括步骤203a,步骤203a可以在步骤204之前执行。
203a、第三装置向第二装置发送第五信息。第五信息用于指示量化处理后的第二模型的相关信息。相应的,第二装置接收来自第三装置的第五信息。
具体的,量化处理后的第二模型的相关信息是第三装置根据该至少一个量化门限值量对第二模型的相关信息进行量化处理得到的。具体的量化处理过程可以参阅前述步骤202的相关介绍。
可选的,第二模型的相关信息包括第二模型的N个参数。关于第二模型请参阅前述的相关介绍。可选的,上述步骤203a具体包括步骤1和步骤2。
步骤1:第三装置对第二模型的N个参数进行调制得到N个第二信号。N个第二信号承载第二模型的N个参数,N个第二信号与第二模型的N个参数一一对应。
步骤2:第三装置向第二装置发送该N个第二信号。相应的,第二装置接收来自第三装置的N个第二信号。
步骤1至步骤2与前述步骤2003a至步骤2003b类似,具体可以参阅前述步骤2003a至步骤2003b的相关介绍,这里不再赘述。
可选的,N个第一信号中第i个第一信号对应第一序列和第二序列。第一序列是第i个第一信号对应的两个序列中的第一个序列,第二序列是第i个第一信号对应的两个序列中的第二个序列。N个第二信号中第i个第二信号对应第三序列和第四序列。第三序列是第i个第二信号对应的两个序列中的第一个序列,第四序列是第i个第二信号对应的两个序列中的第二个序列。i为大于或等于1且小于或等于N的整数。第一装置发送第一序列采用的时频资源与第三装置发送第三序列采用的时频资源相同。第一装置发送第二序列采用的时频资源与第三装置发送第四序列采用的时频资源相同。从而支持第二装置实现对多 用户空中信号叠加传输的非相干接收。
需要说明的是,步骤203和步骤203a之间没有固定的执行顺序。可以先执行步骤203,再执行步骤203a;或者,先执行步骤203a,再执行步骤203;或者,依据情况同时执行步骤203和步骤203a,具体本申请不做限定。
基于上述步骤203和步骤203a,可选的,上述步骤204具体包括:第二装置根据第一信息和第五信息确定第一模型的全局信息。
具体的,第二装置根据N个第一信号和N个第二信号确定该第一模型的全局信息。下面以N个第一信号中第i个第一信号对应第一序列和第二序列,N个第二信号中第i个第二信号对应第三序列和第四序列为例介绍上述步骤204的一种可能的实现方式。其中,第一装置发送第一序列采用的时频资源与第三装置发送第三序列采用的时频资源相同。第一装置发送第二序列采用的时频资源与第三装置发送第四序列采用的时频资源相同。
可选的,上述步骤204具体包括步骤204a至步骤204c。
204a、第二装置确定第二装置接收第一序列和第三序列的第一信号能量和。
例如,该第一信号能量和可以表示为 |y_{2i-1}|²,其中 y_{2i-1} 为第二装置在承载第一序列和第三序列的时频资源上接收到的叠加信号。
204b、第二装置确定第二装置接收第二序列和第四序列的第二信号能量和。
例如,第二信号能量和可以表示为 |y_{2i}|²,其中 y_{2i} 为第二装置在承载第二序列和第四序列的时频资源上接收到的叠加信号。
204c、第二装置根据第一信号能量和和第二信号能量和确定N个全局参数中的第i个全局参数。
基于前述步骤2003b中的实现方式1,可选的,上述步骤204c具体包括:
若第一信号能量和与判决门限值的和小于第二信号能量和,则第二装置确定第i个全局参数的取值为第一值;或者,若第一信号能量和与判决门限值的和大于或等于第二信号能量和,且第二信号能量和与判决门限值的和大于或等于第一信号能量和,则第二装置确定第i个全局参数的取值为第二值;或者,若第二信号能量和与判决门限值的和小于第一信号能量和,则第二装置确定第i个全局参数的取值为第三值。
例如,第一值为+1,第二值为0,第三值为-1。第一模型的全局信息包括第一模型的N个全局权重梯度,N个全局权重梯度中第i个全局权重梯度 a_i 可以通过如下公式3表示:
$$a_i=\begin{cases}+1, & |y_{2i-1}|^2+\gamma_2<|y_{2i}|^2\\ 0, & |y_{2i-1}|^2+\gamma_2\ge|y_{2i}|^2\ \text{且}\ |y_{2i}|^2+\gamma_2\ge|y_{2i-1}|^2\\ -1, & |y_{2i}|^2+\gamma_2<|y_{2i-1}|^2\end{cases}\qquad(公式3)$$
其中,γ_2 为判决门限值,第一信号能量和可以表示为 |y_{2i-1}|²,第二信号能量和可以表示为 |y_{2i}|²。
基于前述步骤2003b中的实现方式2,可选的,上述步骤204c具体包括:
若第一信号能量和大于第二信号能量和与判决门限值的和,则第二装置确定第i个全局参数的取值为第一值;或者,若第一信号能量和小于或等于第二信号能量和与判决门限值的和,且第二信号能量和小于或等于第一信号能量和与判决门限值的和,则第二装置确定第i个全局参数的取值为第二值;或者,若第二信号能量和大于第一信号能量和与判决门限值的和,则第二装置确定第i个全局参数的取值为第三值。
例如,第一值为+1,第二值为0,第三值为-1。第一模型的全局信息包括第一模型的N个全局权重梯度,N个全局权重梯度中第i个全局权重梯度 a_i 可以通过如下公式4表示:
$$a_i=\begin{cases}+1, & |y_{2i-1}|^2>|y_{2i}|^2+\gamma_2\\ 0, & |y_{2i-1}|^2\le|y_{2i}|^2+\gamma_2\ \text{且}\ |y_{2i}|^2\le|y_{2i-1}|^2+\gamma_2\\ -1, & |y_{2i}|^2>|y_{2i-1}|^2+\gamma_2\end{cases}\qquad(公式4)$$
其中,γ_2 为判决门限值,第一信号能量和可以表示为 |y_{2i-1}|²,第二信号能量和可以表示为 |y_{2i}|²。
上述步骤204a至步骤204c的过程示出了第二装置确定第i个全局参数的过程。第二装置可以采用类似的过程确定该N个全局参数中的其他全局参数,具体这里不再一一说明。
需要说明的是,第二装置可以结合该N个第一信号和/或该N个第二信号确定该判决门限值。例如,第一装置向第二装置发送第i个第一信号,第三装置向第二装置发送第i个第二信号。第i个第一信号和第i个第二信号占用相同的时频资源。第二装置在该时频资源上接收到叠加信号 y_i。对于其他第一信号和第二信号同样类似,这里不再一一举例说明。例如,判决门限值 γ_2=mean(abs(|y_{2i}|²-|y_{2i-1}|²), 0<i≤N, i为整数)·b。其中,|y_{2i-1}|² 表示第一信号能量和,|y_{2i}|² 表示第二信号能量和,关于第一信号能量和与第二信号能量和请参阅前述的相关介绍。b是控制因子,用于控制判决的门限,影响全局参数中非0元素的个数和第一模型的更新。
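Putting 公式3 and the threshold γ_2 together, a receiver-side sketch follows (0-based indexing; the flat layout of the received samples and all names are assumptions):

```python
import numpy as np

def decide_global_params(y: np.ndarray, b: float) -> np.ndarray:
    """Non-coherent decision of 公式3: y holds the 2N superimposed
    received samples, with y[2i] / y[2i+1] the resources carrying the
    first / second sequences of the (i+1)-th parameter. gamma_2 is
    derived from the received energies via the control factor b."""
    e1 = np.abs(y[0::2]) ** 2            # first-sequence energy sums
    e2 = np.abs(y[1::2]) ** 2            # second-sequence energy sums
    gamma_2 = np.mean(np.abs(e2 - e1)) * b
    return np.where(e1 + gamma_2 < e2, 1.0,
                    np.where(e2 + gamma_2 < e1, -1.0, 0.0))
```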
由此可知,第二装置可以通过第二装置接收第i个第一信号对应的两个序列的信号能量以及接收第i个第二信号对应的两个序列的信号能量确定第i个全局参数。从而支持第二装置实现对多用户空中信号叠加传输的非相干接收,实现对衰落信道鲁棒。
可选的,第二装置可以根据第一信息和第五信息确定该全局学习率。
例如,量化处理后的第一模型的N个参数包括第一装置对第一模型进行第Q轮训练且经过量化处理得到的N个权重梯度。具体通过向量 Δw_Q 表示该第一模型的N个权重梯度。即向量 Δw_Q 包括第一装置对第一模型进行第Q轮训练得到的N个权重梯度。量化处理后的第二模型的N个参数包括第三装置对第二模型进行第R轮训练且经过量化处理得到的N个权重梯度。具体通过向量 Δw_R 表示该第二模型的N个权重梯度。即向量 Δw_R 包括第三装置对第二模型进行第R轮训练得到的N个权重梯度。因此,第二装置可以确定全局学习率 η=mean(mean(abs(Δw_q)),mean(abs(Δw_r)))。向量 Δw_q 包括向量 Δw_Q 中量化处理后的不为0的权重参数。向量 Δw_r 包括向量 Δw_R 中的量化处理后的不为0的权重梯度。
需要说明的是,可选的,第一装置可以向第二装置发送第六信息。该第六信息用于指示该第一模型的N个参数中经过量化处理后的不为0的参数的取值的绝对值的平均值。第三装置向第二装置发送第七信息。该第七信息用于指示第二模型的N个参数中经过量化处理后的不为0的参数的取值的绝对值的平均值。第二装置根据第六信息和第七信息确定该全局学习率。
例如,量化处理后的第一模型的N个参数包括第一装置对第一模型进行第Q轮训练且经过量化处理得到的N个权重梯度。具体通过向量 Δw_Q 表示该第一模型的N个权重梯度。即向量 Δw_Q 包括第一装置对第一模型进行第Q轮训练得到的N个权重梯度。量化处理后的第二模型的N个参数包括第三装置对第二模型进行第R轮训练且经过量化处理得到的N个权重梯度。具体通过向量 Δw_R 表示该第二模型的N个权重梯度。即向量 Δw_R 包括第三装置对第二模型进行第R轮训练得到的N个权重梯度。第一装置通过第六信息向第二装置指示向量 Δw_Q 中量化处理后的不为0的权重梯度的取值的绝对值的平均值 mean(abs(Δw_q))。第三装置通过第七信息向第二装置指示向量 Δw_R 中量化处理后的不为0的权重梯度的取值的绝对值的平均值 mean(abs(Δw_r))。那么第二装置可以确定全局学习率 η=mean(mean(abs(Δw_q)),mean(abs(Δw_r)))。
205、第二装置向第一装置发送第四信息。第四信息用于指示第二装置确定的第一模型的全局信息。相应的,第一装置接收来自第二装置的第四信息。
其中,第四信息包括第二装置确定的第一模型的全局信息。或者,第四信息指示第二装置确定的第一模型的全局信息。例如,第二装置对该第一模型的全局信息进行编码或调制得到该第四信息,并通过第四信息向第一装置指示该第一模型的全局信息。关于第一模型的全局信息请参阅前述的相关介绍。
例如,第四信息包括第二装置确定的第一模型的N个全局权重梯度。该N个全局权重梯度通过向量A表示。因此第一装置可以将第一模型的权重参数更新为 w_Q=w_{Q-1}+η·A。w_{Q-1} 为第一装置对第一模型进行第Q-1轮更新得到的第一模型的全局权重参数。w_Q 为第一装置对第一模型进行第Q轮更新得到的第一模型的全局权重参数。η为全局学习率。
例如,第四信息包括第二装置确定的第一模型的N个全局输出参数。第一装置可以对第一模型进行第Q+1轮训练得到第一模型的N个实际输出参数。第一装置根据该N个实际输出参数和该N个全局输出参数训练第一模型得到该第一模型的权重参数。
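The update rule w_Q = w_{Q-1} + η·A of the example above, as a one-line sketch (names are assumptions):

```python
import numpy as np

def update_weights(w_prev: np.ndarray, a: np.ndarray, eta: float) -> np.ndarray:
    """Apply the fused global update of step 205: w_Q = w_{Q-1} + eta * A,
    where A collects the N global weight gradients decided above."""
    return w_prev + eta * a
```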
可选的,图2所示的实施例还包括步骤201d。步骤201d可以在步骤203之前执行。
201d、第二装置向第一装置发送第一指示信息。第一指示信息用于指示第一装置向第二装置发送第一信息的发送次数L。相应的,第一装置接收来自第二装置的第一指示信息。其中,L为大于或等于1的整数。
基于上述步骤201d,可选的,上述步骤203具体包括:第一装置向第二装置发送L次第一信息。相应的,第二装置接收L次来自第一装置的第一信息。
在该实现方式中,第二装置可以指示第一装置重复多次向第二装置发送第一信息。由上述步骤204的相关介绍可知,第二装置基于能量的梯度判决会因为信道噪声和信号非相干叠加的随机性出现判决错误。因此,第一装置重复发送该第一信息,有利于第二装置分别判决后选择出现次数最多的判决结果作为最好的判决结果,从而降低判决错误概率,进而提升模型训练的性能。
例如,如图3所示,第一装置将第一模型的N个参数进行量化处理后,得到量化处理后的第一模型的N个参数。第一装置对量化处理后的第一模型的N个参数进行调制。第一装置可以按照发送次数L将调制得到的序列映射到相应的时频资源上,并进行波形成型得到相应的第一信号。第一装置向第二装置发送该第一信号。例如,L等于2,第一装置可以重复两次将调制得到的序列映射到相应的时频资源上。
需要说明的是,可选的,发送次数L可以结合模型的训练阶段、参与模型训练的用户数量以及信道的信噪比中的至少一项因素设定。例如,在训练的后期阶段,参与模型训练的用户数较少以及信噪比较低的情况下,发送次数可以较大。
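A sketch of the receiver-side majority selection over the L repetitions described above; the shape convention and names are assumptions:

```python
import numpy as np

def majority_vote(decisions: np.ndarray) -> np.ndarray:
    """Combine L repeated per-parameter decisions (shape (L, N), values
    in {-1, 0, +1}) by keeping, for each parameter, the value decided
    most often, which lowers the per-round decision error probability."""
    candidates = np.array([-1.0, 0.0, 1.0])
    counts = np.stack([(decisions == c).sum(axis=0) for c in candidates])
    return candidates[np.argmax(counts, axis=0)]
```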
需要说明的是,上述是以第二装置结合第一信息确定全局学习率的方式为例介绍本申 请的技术方案。第二装置可以结合第一信息和/或第三信息确定该全局学习率,具体本申请不做限定。
需要说明的是,上述图2所示的实施例中介绍的是第二装置根据第二信息和第三信息确定至少一个量化门限值的方案。实际应用中,第二装置可以向第一装置发送第三信息。第一装置根据第二信息和第三信息自行确定该至少一个量化门限值,具体本申请不做限定。
本申请还提供另一个实施例,该实施例与图2所示的实施例类似,不同的地方在于:步骤204。上述步骤204替换为步骤2004a,本实施例还包括步骤2004b和步骤2004c。步骤2004b和步骤2004c可以在步骤205之前执行。
2004a、第二装置向第四装置发送第一信息。相应的,第四装置接收来自第二装置的第一信息。
关于第一信息请参阅前述图2所示的实施例中的步骤203的相关介绍。例如,第二装置为网络设备,第四装置为服务器。服务器可以接收来自网络设备发送的第一信息。
2004b、第四装置根据该第一信息确定第一模型的全局信息。
步骤2004b与前述图2所示的实施例中的步骤204类似,具体可以参阅前述图2所示的实施例中的步骤204的相关介绍。
可选的,本实施例还包括步骤2004d,步骤2004d可以在步骤2004b之前执行。
2004d、第二装置向第四装置发送第五信息。相应的,第四装置接收来自第二装置的第五信息。
关于第五信息请参阅前述图2所示的实施例中的步骤203a中的相关介绍。
需要说明的是,步骤2004a与步骤2004d之间没有固定的执行顺序。可以先执行步骤2004a,再执行步骤2004d;或者,先执行步骤2004d,再执行步骤2004a;或者依据情况同时执行步骤2004a和步骤2004d。
2004c、第四装置向第二装置发送第四信息,该第四信息用于指示确定的第二模型的全局信息。相应的,第二装置接收来自第四装置的第四信息。
关于第四信息请参阅前述图2所示的实施例中的步骤205的相关介绍,这里不再赘述。
需要说明的是,第一装置可以为第一终端设备。第二装置可以为网络设备。第三装置可以为第二终端设备。第四装置可以为服务器。上述实施例介绍的是服务器获取该网络设备所管理的终端设备的模型的相关信息,并结合这些模型的相关信息确定第一模型的全局信息的过程。实际应用中,服务器可以获取多个网络设备分别所管理的终端设备的模型的相关信息,并结合这些模型的相关信息确定第一模型的全局信息的过程,具体本申请不做限定。
下面结合图4介绍第二装置确定公共稀疏掩码的一种可能的实现方式。
图4为本申请实施例通信方法的另一个实施例示意图。请参阅图4,方法包括:
401、第一装置向第二装置发送第三指示信息。第三指示信息用于指示第一装置对第一模型进行一轮训练得到的K个参数中对应的取值的绝对值最大的N个参数的索引。相应的,第二装置接收来自第一装置的第三指示信息。
该第一模型的K个参数是第一装置对第一模型进行一轮训练得到的。第一装置确定该 K个参数中对应的取值的绝对值最大的N个参数。然后,第一装置向第二装置发送该第三指示信息。
可选的,第三指示信息为比特序列,该比特序列包括K个比特,K个比特与该第一模型的K个参数一一对应。当比特序列中的一个比特的取值为0时,表示该第一装置不指示该比特对应的参数;当比特序列中的一个比特的取值为1时,表示该第一装置指示该比特对应的参数。关于比特序列的相关示例可以参阅后文图5的相关介绍。
402、第三装置向第二装置发送第四指示信息。第四指示信息用于指示第三装置的第二模型的K个参数中对应的取值的绝对值最大的N个参数的索引。相应的,第二装置接收来自第三装置的第四指示信息。
该第二模型的K个参数是第三装置对第二模型进行一轮训练得到的。第三装置确定该第二模型的K个参数中对应的取值的绝对值最大的N个参数。然后,第三装置向第二装置发送该第四指示信息。
可选的,第四指示信息的形式与第三指示信息类似,具体可以参阅前述步骤401中的相关介绍,这里不再赘述。
403、第二装置根据第三指示信息和第四指示信息确定公共稀疏掩码。
关于公共稀疏掩码请参阅前述图2所示的实施例中的相关介绍,这里不再赘述。
需要说明的是,上述图4介绍的是第二装置根据第三指示信息和第四指示信息确定公共稀疏掩码的过程。实际应用中,第二装置可以接收多个装置中每个装置发送的用于指示该装置的模型的K个参数中对应的取值的绝对值最大的N个参数的指示信息。然后,第二装置结合该多个装置的指示信息确定该公共稀疏掩码。
例如,如图5所示,网络设备可以接收来自多个终端设备中每个终端设备的用于指示该终端设备的模型的K个参数中对应的取值的绝对值最大的N个参数的指示信息。如图5所示,第一终端设备向网络设备发送第一比特序列,该第一比特序列为110010100。该第一比特序列中每个比特对应第一终端设备的模型的K个参数中的一个参数,即K等于9。例如,该第一比特序列中第一个比特对应K个参数中的第一个参数,第二个比特对应K个参数中的第二个参数,以此类推,最后一个比特对应K个参数中的最后一个参数。该第一比特序列中取值为1的比特对应的参数是该九个参数中对应的取值的绝对值最大的四个参数。第一终端设备通过该第一比特序列向该网络设备指示该四个参数的索引。
对于第二终端设备和第三终端设备同样类似。例如,第二终端设备向网络设备发送第二比特序列,该第二比特序列为101000101。该第二比特序列中每个比特对应第二终端设备的模型的K个参数中的一个参数,即K等于9。该第二比特序列中取值为1的比特对应的参数是该九个参数中对应的取值的绝对值最大的四个参数。第二终端设备通过该第二比特序列向该网络设备指示该四个参数的索引。第三终端设备向网络设备发送第三比特序列,该第三比特序列为110001001。该第三比特序列中每个比特对应第三终端设备的模型的K个参数中的一个参数,即K等于9。该第三比特序列中取值为1的比特对应的参数是该九个参数中对应的取值的绝对值最大的四个参数。第三终端设备通过该第三比特序列向该网络设备指示该四个参数的索引。可选的,网络设备根据第一比特序列、第二比特序列和第三比特序列确定公共稀疏掩码。如图5所示,该公共稀疏掩码为比特序列,具体为110001101。网络设备通过该比特序列指示终端设备上报该比特序列中取值为1的比特对应的模型参数。从而降低终端设备上报模型参数的开销,节省通信资源。
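The text does not pin down how the network device fuses the per-device index reports into the common mask (the Figure 5 example is illustrative only), so the voting-based sketch below is just one plausible rule, with every name an assumption:

```python
import numpy as np

def fuse_masks(device_bitmaps: list[str], keep: int) -> str:
    """Hypothetical fusion rule for Figure 5: count, per position, how
    many devices flagged that parameter among their largest-magnitude
    ones, then keep the `keep` most-voted positions as the common mask."""
    votes = np.array([[int(b) for b in bits] for bits in device_bitmaps]).sum(axis=0)
    chosen = np.argsort(-votes, kind="stable")[:keep]
    mask = np.zeros(votes.size, dtype=int)
    mask[chosen] = 1
    return "".join(map(str, mask))
```

With the three bitmaps of Figure 5 this rule keeps the most-voted positions; different tie-breaking choices would yield slightly different masks, which is consistent with the rule being left open here.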
上述各个方法实施例可以单独实施,也可以结合实施。各实施例中涉及的术语和相关技术可以互相参考。也就是说不同实施例之间不矛盾或逻辑上没有冲突的技术方案之间是可以相互结合的,具体本申请不做限定。
下面对本申请实施例提供的第一装置进行描述。请参阅图6,图6为本申请实施例第一装置的一个结构示意图。第一装置600可以用于执行图2和图4所示的实施例中第一装置执行的步骤,具体请参阅上述方法实施例的相关介绍。
第一装置600包括收发模块601和处理模块602。
收发模块601,用于接收来自第二装置的至少一个量化门限值;
处理模块602,用于根据至少一个量化门限值对第一装置600的第一模型的相关信息进行量化处理;
收发模块601,还用于向第二装置发送第一信息,第一信息用于指示量化处理后的第一模型的相关信息。
一种可能的实现方式中,第一模型的相关信息包括:第一模型的输出参数或更新参数,更新参数包括第一模型的权重梯度或权重参数。
另一种可能的实现方式中,收发模块601还用于:
向第二装置发送第二信息;其中,第二信息用于指示第一模型的相关信息经过处理得到的信息;或者,
第二信息用于指示第一装置600对第一模型进行第M轮训练得到的相关信息经过处理得到的信息,第一模型的相关信息是第一装置600对第一模型进行第Q轮训练得到的相关信息,M为大于或等于1且小于Q的整数,Q为大于1的整数。
另一种可能的实现方式中,第一模型的相关信息包括第一模型的输出参数,第一模型的相关信息经过处理得到的信息包括第一模型的输出参数的取值的绝对值的平均值;或者,
第一模型的相关信息包括第一模型的更新参数,第一模型的相关信息经过处理得到的信息包括第一模型的更新参数的取值的绝对值的平均值。
另一种可能的实现方式中,收发模块601还用于:
接收来自第二装置的第三信息,第三信息用于指示第一模型的全局信息。
另一种可能的实现方式中,第一模型的全局信息包括第一模型的全局输出参数;或者,第一模型的全局信息包括第一模型的全局更新参数和/或全局学习率。
另一种可能的实现方式中,第一模型的相关信息包括第一模型的N个参数,N为大于或等于1的整数;处理模块602具体用于:
根据至少一个量化门限值对N个参数进行量化处理,得到量化处理后的N个参数;
收发模块601具体用于:
对量化处理后的N个参数进行调制得到N个第一信号;
向第二装置发送N个第一信号。
另一种可能的实现方式中,至少一个量化门限值包括第一量化门限值和第二量化门限值;处理模块602具体用于:
若N个参数中的第i个参数大于第一量化门限值时,将第i个参数量化为第一值,i为大于或等于1且小于或等于N的整数;或者,
若N个参数中的第i个参数小于或等于第一量化门限值且大于或等于第二量化门限值时,将第i个参数量化为第二值;或者,
若N个参数中第i个参数小于第二量化门限值时,将第i个参数量化为第三值。
另一种可能的实现方式中,收发模块601具体用于:
对量化处理后的第i个参数进行调制得到第i个第一信号,该第i个第一信号对应两个序列;
当量化处理后的第i个参数为第一值时,第一装置600发送两个序列中的第一个序列的发送功率小于第一装置600发送所述两个序列中的第二个序列的发送功率;当量化处理后的第i个参数为第二值时,第一装置600发送两个序列中的第一个序列的发送功率等于第一装置600发送两个序列中的第二个序列的发送功率;当量化处理后的第i个参数为第三值时,第一装置600发送两个序列中的第一个序列的发送功率大于第一装置600发送所述两个序列中的第二个序列的发送功率。
另一种可能的实现方式中,当量化处理后的第i个参数为第一值时,两个序列中的第一个序列为非全0序列,第二个序列为全0序列;当量化处理后的第i个参数为第二值时,两个序列均为全0序列;当量化处理后的第i个参数为第三值时,两个序列中的第一个序列为全0序列,第二个序列为非全0序列。
另一种可能的实现方式中,收发模块601具体用于:
向第二装置发送L次第一信息,L为大于或等于1的整数。
另一种可能的实现方式中,收发模块601还用于:
接收来自第二装置的第一指示信息,第一指示信息用于指示第一装置600向第二装置发送第一信息的发送次数L。
另一种可能的实现方式中,第一模型的相关信息包括第一模型的量化误差补偿后的N个参数,量化误差补偿后的N个参数是第一装置600根据第一装置600对第一模型进行第Q轮训练得到的N个参数分别对应的量化误差对N个参数进行误差补偿得到的,N个参数中的第i个参数对应的量化误差是根据第一装置600对第一模型进行第Q-1轮训练得到的第i个参数和量化误差补偿后的第Q-1轮训练得到的第i个参数确定的,i为大于或等于1且小于或等于N的整数,N为大于或等于1的整数,Q为大于1的整数。
另一种可能的实现方式中,第一模型的相关信息包括第一模型的经过稀疏处理得到的N个参数;第一模型的经过稀疏处理得到的N个参数是第一装置600根据公共稀疏掩码从第一模型的K个参数中选择N个参数,第一模型的K个参数是第一装置600对第一模型进行一轮训练得到的参数,K为大于或等于N的整数,K为大于或等于1的整数,N为大于或等于1的整数。
另一种可能的实现方式中,公共稀疏掩码为比特序列,比特序列包括K个比特,K个 比特与K个参数一一对应;当K个比特中的一个比特的取值为0时,用于指示第一装置600不选择该比特对应的参数;当K个比特中的一个比特的取值为1时,用于指示第一装置600选择该比特对应的参数。
另一种可能的实现方式中,公共稀疏掩码是第一装置600根据稀疏比例和伪随机数确定的,稀疏比例是第二装置向第一装置600指示的。
另一种可能的实现方式中,收发模块601还用于:
接收来自第二装置的第二指示信息,第二指示信息用于指示公共稀疏掩码。
另一种可能的实现方式中,收发模块601还用于:
向第二装置发送第三指示信息,第三指示信息用于指示K个参数中对应的取值的绝对值最大的N个参数的索引。
另一种可能的实现方式中,第一模型为神经网络模型,第一模型的相关信息包括神经网络模型的其中P层的神经元的相关参数,P为大于或等于1的整数。
图7为本申请实施例第一装置的另一个结构示意图。请参阅图7,第一装置700可以用于执行图4所示的实施例中第一装置执行的步骤,具体请参阅上述方法实施例的相关介绍。
第一装置700包括收发模块701。可选的,第一装置700还包括处理模块702。
收发模块701,用于向第二装置发送第一指示信息,第一指示信息用于指示第一装置700的第一模型的K个参数中对应的取值的绝对值最大的N个参数的索引,第一模型的K个参数是第一装置700对第一模型进行一轮训练得到的K个参数,K为大于或等于所述N的整数,K为大于或等于1的整数,N为大于或等于1的整数;接收来自第二装置的第二指示信息;该第二指示信息用于指示公共稀疏掩码,公共稀疏掩码是第二装置根据第一指示信息确定的;公共稀疏掩码用于指示第一装置700上报第一装置训练第一模型得到的部分参数。
下面对本申请实施例提供的第二装置进行描述。请参阅图8,图8为本申请实施例第二装置的一个结构示意图。第二装置800可以用于执行图2和图4所示的实施例中第二装置执行的步骤,具体请参阅上述方法实施例的相关介绍。
第二装置800包括收发模块801。可选的,第二装置800还包括处理模块802。
收发模块801,用于向第一装置发送至少一个量化门限值,至少一个量化门限值用于对第一装置的第一模型的相关信息进行量化处理;接收来自第一装置发送的第一信息,第一信息用于指示量化处理后的第一模型的相关信息。
一种可能的实现方式中,第一模型的相关信息包括:第一模型的输出参数或更新参数,更新参数包括第一模型的权重梯度或权重参数。
另一种可能的实现方式中,收发模块801还用于:
接收来自第一装置的第二信息;其中,第二信息用于指示第一模型的相关信息经过处理得到的信息;或者,
第二信息用于指示第一装置对第一模型进行第M轮训练并经过处理得到的信息,第一模型的相关信息是第一装置对第一模型进行第Q轮训练得到的相关信息,M为大于或等于1 且小于Q的整数,Q为大于1的整数;
处理模块802,用于根据第二信息确定至少一个量化门限值。
另一种可能的实现方式中,第一模型的相关信息包括第一模型的输出参数,第一模型的相关信息经过处理得到的信息包括第一模型的输出参数的取值的绝对值的平均值;或者,
第一模型的相关信息包括第一模型的更新参数,第一模型的相关信息经过处理得到的信息包括第一模型的更新参数的取值的绝对值的平均值。
另一种可能的实现方式中,收发模块801还用于:
接收来自第三装置的第三信息;其中,第三信息用于指示第三装置的第二模型的相关信息经过处理得到的信息;或者,第三信息用于指示第三装置对第二模型进行第S轮训练并经过处理得到的信息,第二模型的相关信息是第三装置对第二模型进行第R轮训练得到的相关信息,S为大于或等于1且小于R的整数,R为大于1的整数;
处理模块802,用于根据第二信息和第三信息确定至少一个量化门限值。
另一种可能的实现方式中,处理模块802还用于:
根据第一信息确定第一模型的全局信息;
收发模块801还用于:
向第一装置发送第四信息,第四信息用于指示第一模型的全局信息。
另一种可能的实现方式中,第一模型的全局信息包括第一模型的全局输出参数;或者,第一模型的全局信息包括第一模型的全局更新参数和/或全局学习率。
另一种可能的实现方式中,收发模块801还用于:
接收来自第三装置的第五信息,第五信息用于指示第三装置的第二模型的相关信息;
处理模块802具体用于:
根据第一信息和第五信息确定第一模型的全局信息。
另一种可能的实现方式中,第一模型的相关信息包括第一模型的N个参数,N为大于或等于1的整数;第二模型的相关信息包括第二模型的N个参数;
收发模块801具体用于:
接收来自第一装置的N个第一信号,N个第一信号承载第一模型的N个参数,N个第一信号与第一模型的N个参数一一对应;
收发模块801具体用于:
接收来自第三装置的N个第二信号,N个第二信号承载第二模型的N个参数,N个第二信号与第二模型的N个参数一一对应;
处理模块802具体用于:
根据N个第一信号和N个第二信号确定第一模型的全局信息。
另一种可能的实现方式中,N个第一信号中第i个第一信号对应第一序列和第二序列,N个第二信号中第i个第二信号对应第三序列和第四序列,第一装置发送第一序列采用的时频资源与第三装置发送第三序列采用的时频资源相同,第一装置发送第二序列采用的时频资源与第三装置发送所述第四序列采用的时频资源相同;第一模型的全局信息包括第一模型的N个全局参数;i为大于或等于1且小于或等于N的整数;处理模块802具体用于:
确定第二装置800接收第一序列和第三序列的第一信号能量和;
确定第二装置800接收第二序列和第四序列的第二信号能量和;
根据第一信号能量和与第二信号能量和确定N个全局参数中的第i个全局参数。
另一种可能的实现方式中,处理模块802具体用于:
若第一信号能量和小于第二信号能量和与判决门限值的和,则确定第i个全局参数的取值为第一值;或者,
若第一信号能量和大于或等于第二信号能量和与判决门限值的和,且第二信号能量和小于或等于第一信号能量和与判决门限值的和,则确定第i个全局参数的取值为第二值;或者,
若第二信号能量和大于第一信号能量和与判决门限值的和,则确定第i个全局参数的取值为第三值。
另一种可能的实现方式中,收发模块801还用于:
向第一装置发送第一指示信息,第一指示信息用于指示第一装置向第二装置800发送第一信息的发送次数L,L为大于或等于1的整数。
另一种可能的实现方式中,收发模块801还用于:
向第一装置发送第二指示信息,第二指示信息用于指示公共稀疏掩码,公共稀疏掩码用于指示第一装置上报第一装置训练第一模型得到的部分参数。
另一种可能的实现方式中,收发模块801还用于:
接收来自第一装置的第三指示信息,第三指示信息用于指示第一装置对第一模型进行一轮训练得到的K个参数中对应的取值的绝对值最大的N个参数的索引;
接收来自第三装置的第四指示信息,第四指示信息用于指示第三装置的第二模型的K个参数中对应的取值的绝对值最大的N个参数的索引,第二模型的K个参数是第三装置对第二模型进行一轮训练得到的K个参数;
处理模块802还用于:
根据第三指示信息和第四指示信息确定公共稀疏掩码。
图9为本申请实施例第二装置的另一个结构示意图。请参阅图9,第二装置900可以用于执行图4所示的实施例中第二装置执行的步骤,具体请参阅上述方法实施例的相关介绍。
第二装置900包括收发模块901和处理模块902。
收发模块901,用于接收来自第一装置的第一指示信息,第一指示信息用于指示第一装置的第一模型的K个参数中对应的取值的绝对值最大的N个参数的索引,第一模型的K个参数是第一装置对第一模型进行一轮训练得到的K个参数,K为大于或等于所述N的整数,K为大于或等于1的整数,N为大于或等于1的整数;
处理模块902,用于根据第一指示信息确定公共稀疏掩码,公共稀疏掩码用于指示第一装置上报第一装置训练第一模型得到的部分参数;
收发模块901,还用于向第一装置发送第二指示信息,第二指示信息用于指示公共稀疏掩码。
一种可能的实现方式中,收发模块901还用于:
接收来自第三装置的第三指示信息,第三指示信息用于指示第三装置的第二模型的K个参数中对应的取值的绝对值最大的N个参数的索引,第二模型的K个参数是第三装置对第二模型进行一轮训练得到的K个参数;
处理模块902具体用于:
根据第一指示信息和第三指示信息确定公共稀疏掩码。
本申请实施例还提供一种终端设备。图10是本申请实施例提供的终端设备1000的结构示意图。该终端设备1000可应用于如图1所示的系统中,例如终端设备1000可以为图1系统中的终端设备,用以执行上述方法实施例中第一装置或第二装置的功能。
如图所示,该终端设备1000包括处理器1010和收发器1020。可选地,该终端设备1000还包括存储器1030。其中,处理器1010、收发器1020和存储器1030之间可以通过内部连接通路互相通信,传递控制和/或数据信号,该存储器1030用于存储计算机程序,该处理器1010用于从该存储器1030中调用并运行该计算机程序,以控制该收发器1020收发信号。可选地,终端设备1000还可以包括天线1040,用于将收发器1020输出的上行数据或上行控制信令通过无线信号发送出去。
上述处理器1010和存储器1030可以合成一个处理装置,处理器1010用于执行存储器1030中存储的程序代码来实现上述功能。具体实现时,该存储器1030也可以集成在处理器1010中,或者独立于处理器1010。例如,该处理器1010可以与图6中的处理模块602、图7中的处理模块702、图8中的处理模块802或图9中的处理模块902对应。
上述收发器1020可以与图6中的收发模块601、图7中的收发模块701、图8中的收发模块801或图9中的收发模块901对应。该收发器1020也可以称为收发单元。收发器1020可以包括接收器(或称接收机、接收电路)和发射器(或称发射机、发射电路)。其中,接收器用于接收信号,发射器用于发射信号。
应理解,图10所示的终端设备1000能够实现图2和图4所示方法实施例中涉及第一装置或第二装置的各个过程。终端设备1000中的各个模块的操作和/或功能,分别为了实现上述装置实施例中的相应流程。具体可参见上述装置实施例中的描述,为避免重复,此处适当省略详述描述。
上述处理器1010可以用于执行前面装置实施例中描述的由第一装置或第二装置内部实现的动作,而收发器1020可以用于执行前面装置实施例中描述的第一装置或第二装置的收发动作。具体请见前面装置实施例中的描述,此处不再赘述。
可选地,上述终端设备1000还可以包括电源1050,用于给终端设备中的各种器件或电路提供电源。
除此之外,为了使得终端设备的功能更加完善,该终端设备1000还可以包括输入单元1060、显示单元1070、音频电路1080、摄像头1090和传感器1000等中的一个或多个,所述音频电路还可以包括扬声器1082、麦克风1084等。
本申请还提供一种网络设备。请参阅图11,图11是本申请实施例提供的网络设备1100的结构示意图,该网络设备1100可应用于如图1所示的系统中,例如网络设备1100可以为图1所示的系统中的网络设备,用以执行上述方法实施例中第一装置或第二装置的功能。应理解以下仅为示例,未来通信系统中,网络设备可以有其他形态和构成。
举例来说,在5G通信系统中,网络设备1100可以包括CU、DU和AAU,相比于LTE通信系统中的网络设备由一个或多个射频单元,如远端射频单元(remote radio unit,RRU)和一个或多个基带单元(base band unit,BBU)来说:
原BBU的非实时部分将分割出来,重新定义为CU,负责处理非实时协议和服务、BBU的部分物理层处理功能与原RRU及无源天线合并为AAU、BBU的剩余功能重新定义为DU,负责处理物理层协议和实时服务。简而言之,CU和DU,以处理内容的实时性进行区分、AAU为RRU和天线的组合。
CU、DU、AAU可以采取分离或合设的方式,所以,会出现多种网络部署形态,一种可能的部署形态如图11所示与传统4G网络设备一致,CU与DU共硬件部署。应理解,图11只是一种示例,对本申请的保护范围并不限制,例如,部署形态还可以是DU部署在BBU机房,CU集中部署或DU集中部署,CU更高层次集中等。
所述AAU 11100可以实现收发功能,称为收发单元11100,与图6中的收发模块601、图7中的收发模块701、图8中的收发模块801或图9中的收发模块901对应。可选地,该收发单元11100还可以称为收发机、收发电路、或者收发器等,其可以包括至少一个天线11101和射频单元11102。可选地,收发单元11100可以包括接收单元和发送单元,接收单元可以对应于接收器(或称接收机、接收电路),发送单元可以对应于发射器(或称发射机、发射电路)。
所述CU和DU 11200可以实现内部处理功能,称为处理单元11200,与图6中的处理模块602、图7中的处理模块702、图8中的处理模块802或图9中的处理模块902对应。可选地,该处理单元11200可以对网络设备进行控制等,可以称为控制器。所述AAU与CU和DU可以是物理上设置在一起,也可以物理上分离设置的。
另外,网络设备不限于图11所示的形态,也可以是其它形态:例如:包括BBU和自适应无线单元(adaptive radio unit,ARU),或者包括BBU和有源天线单元(active antenna unit,AAU);也可以为客户终端设备(customer premises equipment,CPE),还可以为其它形态,本申请不限定。
在一个示例中,所述处理单元11200可以由一个或多个单板构成,多个单板可以共同支持单一接入制式的无线接入网(如LTE网),也可以分别支持不同接入制式的无线接入网(如LTE网,5G网,未来网络或其他网)。所述CU和DU11200还包括存储器11201和处理器11202。所述存储器11201用以存储必要的指令和数据。所述处理器11202用于控制网络设备进行必要的动作,例如用于控制网络设备执行上述方法实施例中关于第一装置或第二装置的操作流程。所述存储器11201和处理器11202可以服务于一个或多个单板。也就是说,可以每个单板上单独设置存储器和处理器。也可以是多个单板共用相同的存储器和处理器。此外每个单板上还可以设置有必要的电路。
应理解,图11所示的网络设备1100能够实现图2和图4的方法实施例中涉及的第一装置或第二装置功能。网络设备1100中的各个单元的操作和/或功能,分别为了实现本申请方法实施例中由网络设备执行的相应流程。为避免重复,此处适当省略详述描述。图11示例的网络设备的结构仅为一种可能的形态,而不应对本申请实施例构成任何限定。本申请并不排除未来可能出现的其他形态的网络设备结构的可能。
上述CU和DU11200可以用于执行前面方法实施例中描述的由第一装置或第二装置内部实现的动作,而AAU 11100可以用于执行前面方法实施例中描述的第一装置或第二装置的收发动作。具体请见前面方法实施例中的描述,此处不再赘述。
本申请还提供一种计算机程序产品,该计算机程序产品包括:计算机程序代码,当该计算机程序代码在计算机上运行时,使得该计算机执行图2和图4所示实施例中任意一个实施例的方法。
本申请还提供一种计算机可读介质,该计算机可读介质存储有程序代码,当该程序代码在计算机上运行时,使得该计算机执行图2和图4所示实施例中任意一个实施例的方法。
本申请还提供一种通信系统,该通信系统包括第一装置和第二装置。第一装置用于执行上述图2和图4所示的实施例中第一装置执行的部分或全部步骤,第二装置用于执行上述图2和图4所示的实施例中第二装置执行的部分或全部步骤。
可选的,该通信系统还包括第三装置。第三装置用于执行图2和图4所示的实施例中第三装置执行的部分或全部步骤。
本申请实施例还提供一种芯片装置,包括处理器,用于调用存储器中存储的计算机程序或计算机指令,以使得该处理器执行上述图2和图4所示的实施例的方法。
一种可能的实现方式中,该芯片装置的输入对应上述图2和图4所示的实施例中的接收操作,该芯片装置的输出对应上述图2和图4所示的实施例中的发送操作。
可选的,该处理器通过接口与存储器耦合。
可选的,该芯片装置还包括存储器,该存储器中存储有计算机程序或计算机指令。
其中,上述任一处提到的处理器,可以是一个通用中央处理器,微处理器,特定应用集成电路(application-specific integrated circuit,ASIC),或一个或多个用于控制上述图2和图4所示的实施例的方法的程序执行的集成电路。上述任一处提到的存储器可以为只读存储器(read-only memory,ROM)或可存储静态信息和指令的其他类型的静态存储设备,随机存取存储器(random access memory,RAM)等。
所属领域的技术人员可以清楚地了解到,为描述方便和简洁,上述提供的任一种通信装置中相关内容的解释及有益效果均可参考上文提供的对应的方法实施例,此处不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统,装置和方法,可以通 过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。
所述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、ROM、RAM、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述,以上实施例仅用以说明本申请的技术方案,而非对其限制;尽管参照前述实施例对本申请进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本申请各实施例技术方案的范围。

Claims (45)

  1. 一种通信方法,其特征在于,所述方法包括:
    第一装置接收来自第二装置的至少一个量化门限值;
    所述第一装置根据所述至少一个量化门限值对所述第一装置的第一模型的相关信息进行量化处理;
    所述第一装置向所述第二装置发送第一信息,所述第一信息用于指示量化处理后的所述第一模型的相关信息。
  2. 根据权利要求1所述的方法,其特征在于,所述第一模型的相关信息包括:所述第一模型的输出参数或更新参数,所述更新参数包括所述第一模型的权重梯度或权重参数。
  3. 根据权利要求1或2所述的方法,其特征在于,在所述第一装置接收来自第二装置的至少一个量化门限值之前,所述方法还包括:
    所述第一装置向所述第二装置发送第二信息;其中,第二信息用于指示所述第一模型的相关信息经过处理得到的信息;或者,
    所述第二信息用于指示所述第一装置对所述第一模型进行第M轮训练得到的相关信息经过处理得到的信息,所述第一模型的相关信息是所述第一装置对所述第一模型进行第Q轮训练得到的相关信息,所述M为大于或等于1且小于所述Q的整数,所述Q为大于1的整数。
  4. 根据权利要求3所述的方法,其特征在于,所述第一模型的相关信息包括所述第一模型的输出参数,所述第一模型的相关信息经过处理得到的信息包括所述第一模型的输出参数的取值的绝对值的平均值;或者,
    所述第一模型的相关信息包括所述第一模型的更新参数,所述第一模型的相关信息经过处理得到的信息包括所述第一模型的更新参数的取值的绝对值的平均值。
  5. 根据权利要求1至4中任一项所述的方法,其特征在于,所述方法还包括:
    所述第一装置接收来自所述第二装置的第三信息,所述第三信息用于指示所述第一模型的全局信息。
  6. 根据权利要求5所述的方法,其特征在于,所述第一模型的全局信息包括所述第一模型的全局输出参数;或者,所述第一模型的全局信息包括所述第一模型的全局更新参数和/或全局学习率。
  7. 根据权利要求1至6中任一项所述的方法,其特征在于,所述第一模型的相关信息包括所述第一模型的N个参数,所述N为大于或等于1的整数;
    所述第一装置根据所述至少一个量化门限值对所述第一装置的第一模型的相关信息进行量化处理,包括:
    所述第一装置根据所述至少一个量化门限值对所述N个参数进行量化处理,得到量化处理后的N个参数;
    所述第一信息包括所述量化处理后的N个参数;所述第一装置向所述第二装置发送第一信息,包括:
    所述第一装置对所述量化处理后的N个参数进行调制得到N个第一信号;
    所述第一装置向所述第二装置发送所述N个第一信号。
  8. 根据权利要求7所述的方法,其特征在于,所述至少一个量化门限值包括第一量化门限值和第二量化门限值;所述第一装置根据所述至少一个量化门限值对所述N个参数进行量化处理,得到量化处理后的N个参数,包括:
    若所述N个参数中的第i个参数大于所述第一量化门限值时,所述第一装置将所述第i个参数量化为第一值,所述i为大于或等于1且小于或等于N的整数;或者,
    若所述N个参数中的第i个参数小于或等于所述第一量化门限值且大于或等于所述第二量化门限值时,所述第一装置将所述第i个参数量化为第二值;或者,
    若所述N个参数中第i个参数小于所述第二量化门限值时,所述第一装置将所述第i个参数量化为第三值。
  9. 根据权利要求7或8所述的方法,其特征在于,所述第一装置对所述量化处理后的N个参数进行调制得到N个第一信号,包括:
    所述第一装置对量化处理后的第i个参数进行调制得到第i个第一信号,该第i个第一信号对应两个序列;
    当所述量化处理后的第i个参数为所述第一值时,所述第一装置发送所述两个序列中的第一个序列的发送功率小于所述第一装置发送所述两个序列中的第二个序列的发送功率;当所述量化处理后的第i个参数为所述第二值时,所述第一装置发送所述两个序列中的第一个序列的发送功率等于所述第一装置发送所述两个序列中的第二个序列的发送功率;当所述量化处理后的第i个参数为所述第三值时,所述第一装置发送所述两个序列中的第一个序列的发送功率大于所述第一装置发送所述两个序列中的第二个序列的发送功率。
  10. 根据权利要求9所述的方法,其特征在于,当所述量化处理后的第i个参数为所述第一值时,所述两个序列中的第一个序列为非全0序列,第二个序列为全0序列;当所述量化处理后的第i个参数为所述第二值时,所述两个序列均为全0序列;当所述量化处理后的第i个参数为所述第三值时,所述两个序列中的第一个序列为全0序列,第二个序列为非全0序列。
  11. 根据权利要求1至10中任一项所述的方法,其特征在于,所述第一装置向所述第二装置发送第一信息,包括:
    所述第一装置向所述第二装置发送L次所述第一信息,所述L为大于或等于1的整数。
  12. 根据权利要求11所述的方法,其特征在于,所述方法还包括:
    所述第一装置接收来自所述第二装置的第一指示信息,所述第一指示信息用于指示所述第一装置向所述第二装置发送所述第一信息的发送次数L。
  13. 根据权利要求1至12中任一项所述的方法,其特征在于,所述第一模型的相关信息包括所述第一模型的量化误差补偿后的N个参数,所述量化误差补偿后的N个参数是所述第一装置根据所述第一装置对所述第一模型进行第Q轮训练得到的N个参数分别对应的量化误差对所述N个参数进行误差补偿得到的,所述N个参数中的第i个参数对应的量化误差是根据所述第一装置对所述第一模型进行第Q-1轮训练且经过量化误差补偿得到的第i个参数确定的,所述i为大于或等于1且小于或等于N的整数,所述N为大于或等于1 的整数,所述Q为大于1的整数。
  14. 根据权利要求1至13中任一项所述的方法,其特征在于,所述第一模型的相关信息包括所述第一模型的经过稀疏处理得到的N个参数;所述第一模型的经过稀疏处理得到的N个参数是所述第一装置根据公共稀疏掩码从所述第一模型的K个参数中选择N个参数,所述第一模型的K个参数是所述第一装置对所述第一模型进行一轮训练得到的参数,所述K为大于或等于N的整数,所述K为大于或等于1的整数,所述N为大于或等于1的整数。
  15. 根据权利要求14所述的方法,其特征在于,所述公共稀疏掩码为比特序列,所述比特序列包括K个比特,所述K个比特与所述K个参数一一对应;当所述K个比特中的一个比特的取值为0时,用于指示所述第一装置不选择所述比特对应的参数;当所述K个比特中的一个比特的取值为1时,用于指示所述第一装置选择所述比特对应的参数。
  16. 根据权利要求14或15所述的方法,其特征在于,所述公共稀疏掩码是所述第一装置根据稀疏比例和伪随机数确定的,所述稀疏比例是所述第二装置向所述第一装置指示的。
  17. 根据权利要求14或15所述的方法,其特征在于,所述方法还包括:
    所述第一装置接收来自所述第二装置的第二指示信息,所述第二指示信息用于指示所述公共稀疏掩码。
  18. 根据权利要求14至17中任一项所述的方法,其特征在于,所述方法还包括:
    所述第一装置向所述第二装置发送第三指示信息,所述第三指示信息用于指示所述K个参数中对应的取值的绝对值最大的N个参数的索引。
  19. The method according to any one of claims 1 to 18, characterized in that the first model is a neural network model, and the related information of the first model comprises related parameters of the neurons of P layers of the neural network model, P being an integer greater than or equal to 1.
  20. A communication method, characterized in that the method comprises:
    a second apparatus sending at least one quantization threshold to a first apparatus, the at least one quantization threshold being used to perform quantization processing on related information of a first model of the first apparatus;
    the second apparatus receiving first information from the first apparatus, wherein the first information is used to indicate the related information of the first model after quantization processing.
  21. The method according to claim 20, characterized in that the related information of the first model comprises an output parameter or an update parameter of the first model, wherein the update parameter comprises a weight gradient or a weight parameter of the first model.
  22. The method according to claim 20 or 21, characterized in that the method further comprises:
    the second apparatus receiving second information from the first apparatus, wherein the second information is used to indicate information obtained by processing the related information of the first model; or
    the second information is used to indicate information obtained by the first apparatus performing an M-th round of training on the first model and processing the result, the related information of the first model being related information obtained by the first apparatus performing a Q-th round of training on the first model, M being an integer greater than or equal to 1 and less than Q, and Q being an integer greater than 1;
    the second apparatus determining the at least one quantization threshold according to the second information.
  23. The method according to claim 22, characterized in that the related information of the first model comprises the output parameters of the first model, and the information obtained by processing the related information of the first model comprises an average of the absolute values of the output parameters of the first model; or
    the related information of the first model comprises the update parameters of the first model, and the information obtained by processing the related information of the first model comprises an average of the absolute values of the update parameters of the first model.
  24. The method according to claim 22 or 23, characterized in that the method further comprises:
    the second apparatus receiving third information from a third apparatus, wherein the third information is used to indicate information obtained by processing related information of a second model of the third apparatus, or the third information is used to indicate information obtained by the third apparatus performing an S-th round of training on the second model and processing the result, the related information of the second model being related information obtained by the third apparatus performing an R-th round of training on the second model, S being an integer greater than or equal to 1 and less than R, and R being an integer greater than 1;
    the second apparatus determining the at least one quantization threshold according to the second information comprises:
    the second apparatus determining the at least one quantization threshold according to the second information and the third information.
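Claims 22 to 24 leave open how the second apparatus turns the reported statistics into thresholds. One plausible rule, sketched below under the assumption of a symmetric threshold pair scaled by a hypothetical factor of 0.5 (the scale is a design choice, not taken from the claims):

    import numpy as np

    def derive_thresholds(reported_means, scale=0.5):
        # reported_means: per-device means of absolute parameter values
        # (the second and third information of claims 22-24).
        m = float(np.mean(reported_means))
        return +scale * m, -scale * m   # (first threshold, second threshold)

    print(derive_thresholds([0.225, 0.300]))  # (0.13125, -0.13125)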
  25. The method according to any one of claims 20 to 24, characterized in that the method further comprises:
    the second apparatus determining global information of the first model according to the first information;
    the second apparatus sending fourth information to the first apparatus, wherein the fourth information is used to indicate the global information of the first model.
  26. The method according to claim 25, characterized in that the global information of the first model comprises a global output parameter of the first model; or the global information of the first model comprises a global update parameter and/or a global learning rate of the first model.
  27. The method according to claim 25 or 26, characterized in that the method further comprises:
    the second apparatus receiving fifth information from a third apparatus, wherein the fifth information is used to indicate related information of a second model of the third apparatus;
    the second apparatus determining the global information of the first model according to the first information comprises:
    the second apparatus determining the global information of the first model according to the first information and the fifth information.
  28. The method according to claim 27, characterized in that the related information of the first model comprises N parameters of the first model, N being an integer greater than or equal to 1; the related information of the second model comprises N parameters of the second model; the first information comprises the N quantized parameters of the first model; and the fifth information comprises the N quantized parameters of the second model;
    the second apparatus receiving the first information from the first apparatus comprises:
    the second apparatus receiving N first signals from the first apparatus, the N first signals carrying the N quantized parameters of the first model and being in one-to-one correspondence with them;
    the second apparatus receiving the fifth information from the third apparatus comprises:
    the second apparatus receiving N second signals from the third apparatus, the N second signals carrying the N quantized parameters of the second model and being in one-to-one correspondence with them;
    the second apparatus determining the global information of the first model according to the first information and the fifth information comprises:
    the second apparatus determining the global information of the first model according to the N first signals and the N second signals.
  29. The method according to claim 28, characterized in that an i-th first signal of the N first signals corresponds to a first sequence and a second sequence, an i-th second signal of the N second signals corresponds to a third sequence and a fourth sequence, the time-frequency resources used by the first apparatus to send the first sequence are the same as the time-frequency resources used by the third apparatus to send the third sequence, and the time-frequency resources used by the first apparatus to send the second sequence are the same as the time-frequency resources used by the third apparatus to send the fourth sequence; the global information of the first model comprises N global parameters of the first model; and i is an integer greater than or equal to 1 and less than or equal to N;
    the second apparatus determining the global information of the first model according to the N first signals and the N second signals comprises:
    the second apparatus determining a first signal energy sum with which it receives the first sequence and the third sequence;
    the second apparatus determining a second signal energy sum with which it receives the second sequence and the fourth sequence;
    the second apparatus determining an i-th global parameter of the N global parameters according to the first signal energy sum and the second signal energy sum.
  30. The method according to claim 29, characterized in that the second apparatus determining the i-th global parameter of the N global parameters according to the first signal energy sum and the second signal energy sum comprises:
    if the sum of the first signal energy sum and a decision threshold is less than the second signal energy sum, the second apparatus determining that the i-th global parameter takes a first value; or
    if the sum of the first signal energy sum and the decision threshold is greater than or equal to the second signal energy sum, and the sum of the second signal energy sum and the decision threshold is greater than or equal to the first signal energy sum, the second apparatus determining that the i-th global parameter takes a second value; or
    if the sum of the second signal energy sum and the decision threshold is less than the first signal energy sum, the second apparatus determining that the i-th global parameter takes a third value.
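The decision rule of claims 29 and 30 compares the two received energy sums, with a decision threshold absorbing noise, so that the over-the-air superposition of all devices' sequences yields one global parameter per position. A sketch, with +1/0/-1 again standing in for the first, second, and third values:

    def decide_global_parameter(e1, e2, delta):
        # e1: energy summed over the first sequences of all devices
        # e2: energy summed over the second sequences of all devices
        # delta: decision threshold
        if e1 + delta < e2:
            return 1.0    # first value: energy concentrated on the second sequences
        if e2 + delta < e1:
            return -1.0   # third value: energy concentrated on the first sequences
        return 0.0        # second value: the two energies are comparable

    print(decide_global_parameter(e1=0.2, e2=3.1, delta=0.5))  # 1.0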
  31. The method according to any one of claims 20 to 30, characterized in that the method further comprises:
    the second apparatus sending first indication information to the first apparatus, wherein the first indication information is used to indicate the number of times L that the first apparatus sends the first information to the second apparatus, L being an integer greater than or equal to 1.
  32. The method according to any one of claims 20 to 31, characterized in that the method further comprises:
    the second apparatus sending second indication information to the first apparatus, wherein the second indication information is used to indicate a common sparse mask, the common sparse mask being used to instruct the first apparatus to report some of the parameters obtained by the first apparatus by training the first model.
  33. The method according to claim 32, characterized in that the method further comprises:
    the second apparatus receiving third indication information from the first apparatus, wherein the third indication information is used to indicate the indices of the N parameters whose values have the largest absolute values among K parameters obtained by the first apparatus performing one round of training on the first model;
    the second apparatus receiving fourth indication information from the third apparatus, wherein the fourth indication information is used to indicate the indices of the N parameters whose values have the largest absolute values among the K parameters of the second model of the third apparatus, the K parameters of the second model being parameters obtained by the third apparatus performing one round of training on the second model;
    the second apparatus determining the common sparse mask according to the third indication information and the fourth indication information.
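Claim 33 does not fix the merge rule by which the per-device index reports become one common sparse mask; keeping the N indices reported most often is one illustrative choice:

    from collections import Counter

    def merge_to_common_mask(k, index_sets, n):
        # index_sets: one top-N index list per device (claims 18 and 33).
        votes = Counter(i for s in index_sets for i in s)
        keep = {i for i, _ in votes.most_common(n)}
        return [1 if i in keep else 0 for i in range(k)]

    # Two devices, K = 8 parameters, keep N = 3.
    print(merge_to_common_mask(8, [[1, 4, 6], [1, 4, 7]], 3))
    # [0, 1, 0, 0, 1, 0, 1, 0]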
  34. A communication method, characterized in that the method comprises:
    a first apparatus sending first indication information to a second apparatus, wherein the first indication information is used to indicate the indices of the N parameters whose values have the largest absolute values among K parameters of a first model of the first apparatus, the K parameters of the first model being parameters obtained by the first apparatus performing one round of training on the first model, K being an integer greater than or equal to N and greater than or equal to 1, and N being an integer greater than or equal to 1;
    the first apparatus receiving second indication information from the second apparatus, wherein the second indication information is used to indicate a common sparse mask determined by the second apparatus according to the first indication information, the common sparse mask being used to instruct the first apparatus to report some of the parameters obtained by the first apparatus by training the first model.
  35. A communication method, characterized in that the method comprises:
    a second apparatus receiving first indication information from a first apparatus, wherein the first indication information is used to indicate the indices of the N parameters whose values have the largest absolute values among K parameters of a first model of the first apparatus, the K parameters of the first model being parameters obtained by the first apparatus performing one round of training on the first model, K being an integer greater than or equal to N and greater than or equal to 1, and N being an integer greater than or equal to 1;
    the second apparatus determining a common sparse mask according to the first indication information, the common sparse mask being used to instruct the first apparatus to report some of the parameters obtained by the first apparatus by training the first model;
    the second apparatus sending second indication information to the first apparatus, wherein the second indication information is used to indicate the common sparse mask.
  36. The method according to claim 35, characterized in that the method further comprises:
    the second apparatus receiving third indication information from a third apparatus, wherein the third indication information is used to indicate the indices of the N parameters whose values have the largest absolute values among K parameters of a second model of the third apparatus, the K parameters of the second model being parameters obtained by the third apparatus performing one round of training on the second model;
    the second apparatus determining the common sparse mask according to the first indication information comprises:
    the second apparatus determining the common sparse mask according to the first indication information and the third indication information.
  37. A first apparatus, characterized in that the first apparatus comprises a transceiver module and a processing module; the transceiver module is configured to perform the transceiving operations of any one of claims 1 to 19, and the processing module is configured to perform the processing operations of any one of claims 1 to 19.
  38. A first apparatus, characterized in that the first apparatus comprises a transceiver module; the transceiver module is configured to perform the transceiving operations of claim 34.
  39. A second apparatus, characterized in that the second apparatus comprises a transceiver module, the transceiver module being configured to perform the transceiving operations of any one of claims 20 to 33.
  40. The second apparatus according to claim 39, characterized in that the second apparatus further comprises a processing module; the processing module is configured to perform the processing operations of any one of claims 20 to 33.
  41. A second apparatus, characterized in that the second apparatus comprises a transceiver module and a processing module, the transceiver module being configured to perform the transceiving operations of claim 35 or 36, and the processing module being configured to perform the processing operations of claim 35 or 36.
  42. An apparatus, characterized in that the apparatus comprises a processor; the processor is configured to execute a computer program or computer instructions in a memory so as to perform the method of any one of claims 1 to 19; or the processor is configured to execute the computer program or computer instructions in the memory so as to perform the method of any one of claims 20 to 33; or the processor is configured to execute the computer program or computer instructions in the memory so as to perform the method of claim 34; or the processor is configured to execute the computer program or computer instructions in the memory so as to perform the method of claim 35 or 36.
  43. The apparatus according to claim 42, characterized in that the apparatus further comprises the memory.
  44. A computer-readable storage medium, characterized in that a computer program is stored thereon, and when the computer program is executed by an apparatus, the apparatus is caused to perform the method of any one of claims 1 to 19, or the method of any one of claims 20 to 33, or the method of claim 34, or the method of claim 35 or 36.
  45. A computer program product, characterized in that, when the computer program product runs on a computer, the computer is caused to perform the method of any one of claims 1 to 19, or the method of any one of claims 20 to 33, or the method of claim 34, or the method of claim 35 or 36.
PCT/CN2022/119814 2022-09-20 2022-09-20 Communication method and related apparatus WO2024060002A1 (zh)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
PCT/CN2022/119814 (WO2024060002A1, zh) | 2022-09-20 | 2022-09-20 | Communication method and related apparatus


Publications (1)

Publication Number
WO2024060002A1 (zh)

Family ID: 90453574

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
PCT/CN2022/119814 | WO2024060002A1 (zh) | 2022-09-20 | 2022-09-20

Country Status (1)

Country | Link
WO | WO2024060002A1 (zh)

Citations (4)

* Cited by examiner, † Cited by third party

Publication number | Priority date | Publication date | Assignee | Title
WO2021053381A1 (zh) * | 2019-09-20 | 2021-03-25 | 字节跳动有限公司 | Compression and acceleration method for a neural network model, and data processing method and apparatus
CN113887746A (zh) * | 2021-10-21 | 2022-01-04 | 新智我来网络科技有限公司 | Method and apparatus for reducing communication pressure based on joint learning
CN114301573A (zh) * | 2021-11-24 | 2022-04-08 | 超讯通信股份有限公司 | Federated learning model parameter transmission method and system
CN114692844A (zh) * | 2020-12-25 | 2022-07-01 | 中科寒武纪科技股份有限公司 | Data processing apparatus, data processing method, and related products



Legal Events

Code | Title | Description
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 22959014; Country of ref document: EP; Kind code of ref document: A1