CN114793453A - Training method, training device and storage medium - Google Patents


Info

Publication number
CN114793453A
Authority
CN
China
Prior art keywords
model
training
compression
node
parameters
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202080003605.XA
Other languages
Chinese (zh)
Inventor
牟勤
洪伟
赵中原
王屹东
熊可欣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Beijing Xiaomi Mobile Software Co Ltd
Original Assignee
Beijing University of Posts and Telecommunications
Beijing Xiaomi Mobile Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications and Beijing Xiaomi Mobile Software Co Ltd
Publication of CN114793453A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The present disclosure relates to a training method, a training apparatus, and a storage medium. The training method includes: training a first training model in response to receiving a model training request, where the model training request includes model compression parameters; and obtaining a first compression model of the first training model based on the first training model and the model compression parameters. Because the compression model achieves the same effect as the training model, the signaling overhead of transmitting the model is reduced while the precision and reliability of the model are ensured, which in turn protects the security of user information.

Description

Training method, training device and storage medium
Technical Field
The present disclosure relates to the field of wireless communications technologies, and in particular, to a training method, a training apparatus, and a storage medium.
Background
To meet the requirements of multi-service scenarios, the communication network is characterized by ultra-high speed, ultra-low latency, ultra-high reliability, and massive connectivity. These service scenarios, their corresponding requirements, and these network characteristics bring unprecedented challenges to the deployment, operation, and maintenance of the communication network.
In the related art, artificial intelligence is introduced to improve the resource utilization of the communication network, the terminal service experience, and the automated and intelligent control and management of the network, and a model obtained through deep learning can achieve better performance. However, its large memory footprint and computing-resource consumption make it difficult to deploy effectively on every hardware platform, and the related art suffers from high communication overhead, low precision, and low security.
Disclosure of Invention
To overcome the problems in the related art, the present disclosure provides a training method, a training apparatus, and a storage medium.
According to a first aspect of the embodiments of the present disclosure, there is provided a training method applied to a first node, the method including:
training a first training model in response to receiving a model training request, wherein the model training request comprises model compression parameters; and obtaining a first compression model of the first training model based on the first training model and the model compression parameters.
In one embodiment, the model compression parameters include a plurality of model compression options;
obtaining a first compression model of the first training model based on the first training model and the model compression parameters, including:
determining a first model compression option in the plurality of model compression options, and compressing the first training model based on the first model compression option to obtain a second compression model; determining a first loss function according to the output of the first training model, the output of the second compression model and a sample parameter set used for training the first training model; and updating the second compression model parameters based on the first loss function to obtain the first compression model.
In one embodiment, the determining a first loss function from the output of the first training model, the output of the second compression model, and the set of sample parameters used to train the first training model comprises:
determining a first cross entropy between the output of the second compression model and the set of sample parameters, and determining a first relative entropy divergence between the output of the second compression model and the output of the first training model; determining the first loss function based on the first cross entropy and a first relative entropy divergence.
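The first loss described above combines a cross entropy between the compressed (student) model's output and the sample labels with a relative entropy (KL divergence) between the compressed model's and the training (teacher) model's output distributions. A minimal sketch follows; the mixing weight `alpha`, the KL direction, and all function names are illustrative assumptions, since the patent does not fix a formula:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def first_loss(student_logits, teacher_logits, labels, alpha=0.5):
    """First loss for updating the second compression model:
    alpha * CE(student output, labels) + (1 - alpha) * KL(teacher || student).
    `alpha` and the KL direction are assumed, not specified by the source."""
    p_s = softmax(student_logits)
    p_t = softmax(teacher_logits)
    n = labels.shape[0]
    # First cross entropy: compressed model's output vs. the sample labels.
    ce = -np.mean(np.log(p_s[np.arange(n), labels] + 1e-12))
    # First relative entropy: divergence between the two models' outputs.
    kl = np.mean(np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1))
    return alpha * ce + (1 - alpha) * kl
```

When the two models agree exactly, the relative-entropy term vanishes and only the cross-entropy term drives the update.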
In one embodiment, the method further comprises:
determining a second loss function for updating the first training model parameters according to the output of the first training model, the output of the second compression model, and the set of sample parameters used to train the first training model.
In one embodiment, determining a second loss function for updating parameters of the first training model based on an output of the first training model, an output of the second compression model, and a set of sample parameters used to train the first training model comprises:
determining a second cross entropy between the output of the first training model and the set of sample parameters, and determining a second relative entropy divergence between the output of the first training model and the output of the second compression model; determining the second loss function based on the second cross entropy and a second relative entropy divergence.
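The second loss mirrors the first with the roles of the two models reversed, so that the first training (teacher) model can also be updated. A sketch under the same assumptions (`beta` and the KL direction are illustrative, not taken from the source):

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def second_loss(teacher_logits, student_logits, labels, beta=0.5):
    """Second loss for updating the first training model:
    beta * CE(teacher output, labels) + (1 - beta) * KL(student || teacher)."""
    p_t = softmax(teacher_logits)
    p_s = softmax(student_logits)
    n = labels.shape[0]
    # Second cross entropy: training model's output vs. the sample labels.
    ce = -np.mean(np.log(p_t[np.arange(n), labels] + 1e-12))
    # Second relative entropy: divergence between the two models' outputs.
    kl = np.mean(np.sum(p_s * (np.log(p_s + 1e-12) - np.log(p_t + 1e-12)), axis=-1))
    return beta * ce + (1 - beta) * kl
```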
In one embodiment, the model compression parameters include a model training mode, the model training mode including a single-training-node mode for training a single first training model and a multi-training-node mode for training a plurality of first training models;
the number of first training models is determined based on the model training mode.
In one embodiment, the method further comprises:
sending a second indication message, the second indication message including a number of first compression models corresponding to the model training mode.
In one embodiment, the method further comprises:
receiving a third indication message, the third indication message comprising an indication to determine a training model.
In one embodiment, the model training mode includes the multi-training-node mode, and the method further includes:
receiving a fourth indication message, the fourth indication message being used to indicate a third compression model, where the third compression model is obtained by performing federated averaging on the first compression models based on the number of first training models; re-determining the model compression parameters based on the third compression model, and updating the first compression model based on the re-determined model compression parameters.
In one embodiment, the method further comprises:
receiving a fifth indication message, wherein the fifth indication message is used for indicating that the training of the first compression model is finished.
According to a second aspect of the embodiments of the present disclosure, there is provided a training method applied to a second node, the method including:
sending a model training request; the model training request comprises model compression parameters, the model compression parameters are used for compressing a first training model to obtain a first compression model, and the first training model is obtained based on the model training request.
In one embodiment, the model compression parameters include a model training mode, the model training mode including a single-training-node mode for training a single first training model and a multi-training-node mode for training a plurality of first training models;
the number of first training models is determined based on the model training mode.
In one embodiment, the method further comprises:
receiving a second indication message, wherein the second indication message comprises a number of first compression models corresponding to the model training mode.
In one embodiment, the method further comprises:
sending a third indication message, the third indication message comprising an indication to determine a training model.
In one embodiment, the model training mode includes the multi-training-node mode, and the method further includes:
sending a fourth indication message, the fourth indication message being used to indicate a third compression model, where the third compression model is obtained by performing federated averaging on the first compression models based on the number of first training models.
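The federated averaging used to form the third compression model can be sketched as a weighted element-wise average of the parameter sets uploaded by the training nodes. Per-node weights (for example, local sample counts) are an assumption; the patent only names federated averaging:

```python
import numpy as np

def federated_average(models, weights=None):
    """Federated averaging (FedAvg) sketch: `models` is a list of parameter
    dicts (name -> array), one per training node. Returns the element-wise
    weighted average. Equal weights are used when none are given."""
    if weights is None:
        weights = [1.0] * len(models)
    total = float(sum(weights))
    # Average each named parameter tensor across all uploaded models.
    return {k: sum(w * m[k] for m, w in zip(models, weights)) / total
            for k in models[0].keys()}
```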
In one embodiment, the method further comprises:
sending a fifth indication message, where the fifth indication message is used to indicate the end of training of the first compression model.
In one embodiment, the method further comprises:
receiving a subscription requirement, and sending the model training request based on the subscription requirement.
According to a third aspect of the embodiments of the present disclosure, there is provided a training apparatus applied to a first node, the apparatus including:
the model training and compression module is used for training a first training model in response to receiving a model training request, wherein the model training request comprises model compression parameters; and obtaining a first compression model of the first training model based on the first training model and the model compression parameters.
In one embodiment, the model compression parameters include a plurality of model compression options;
the model training and compressing module is used for determining a first model compression option in the plurality of model compression options and compressing the first training model based on the first model compression option to obtain a second compression model; determining a first loss function based on an output of the first training model, an output of the second compression model, and a set of sample parameters used to train the first training model; and updating the second compression model parameters based on the first loss function to obtain the first compression model.
In one embodiment, the apparatus further comprises a data processing and storage module;
the data processing and storage module is to determine a first cross entropy between an output of the second compression model and the set of sample parameters, and to determine a first relative entropy divergence between the output of the second compression model and an output of the first training model; determining the first loss function based on the first cross entropy and a first relative entropy divergence.
In one embodiment, the data processing and storage module is further configured to determine a second loss function for updating the first training model parameters based on the output of the first training model, the output of the second compression model, and the set of sample parameters used to train the first training model.
In one embodiment, the data processing and storage module is further configured to determine a second cross entropy between the output of the first training model and the set of sample parameters, and determine a second relative entropy divergence between the output of the first training model and the output of the second compression model; determining the second loss function based on the second cross entropy and a second relative entropy divergence.
In one embodiment, the model compression parameters include a model training mode, the model training mode including a single-training-node mode for training a single first training model and a multi-training-node mode for training a plurality of first training models; the number of first training models is determined based on the model training mode.
In one embodiment, the apparatus further includes a first network communication module;
the first network communication module is configured to send a second indication message, where the second indication message includes a number of first compression models corresponding to the model training mode.
In one embodiment, the first network communication module is further configured to receive a third indication message, where the third indication message includes an indication to determine a training model.
In one embodiment, the first network communication module is further configured to receive a fourth indication message, where the fourth indication message is used to indicate a third compression model, and the third compression model is obtained by performing federated averaging on the first compression models based on the number of first training models; the model compression parameters are re-determined based on the third compression model, and the first compression model is updated based on the re-determined model compression parameters.
In one embodiment, the first network communication module is further configured to receive a fifth indication message, where the fifth indication message is used to indicate that training of the first compression model is finished.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a training apparatus applied to a second node, the apparatus including:
the second network communication module is used for sending a model training request; the model training request comprises model compression parameters, the model compression parameters are used for compressing a first training model to obtain a first compression model, and the first training model is obtained based on the model training request.
In one embodiment, the model compression parameters include a model training mode, the model training mode including a single-training-node mode for training a single first training model and a multi-training-node mode for training a plurality of first training models;
the number of first training models is determined based on the model training mode.
In one embodiment, the second network communication module is further configured to receive a second indication message, the second indication message including a number of the first compression models corresponding to the model training pattern.
In one embodiment, the second network communication module is further configured to send a third indication message, where the third indication message includes an indication to determine a training model.
In one embodiment, the model training mode includes the multi-training-node mode, and the second network communication module is further configured to send a fourth indication message, where the fourth indication message is used to indicate a third compression model, and the third compression model is obtained by performing federated averaging on the first compression models based on the number of first training models.
In one embodiment, the second network communication module is further configured to send a fifth indication message, where the fifth indication message is used to indicate that training of the first compression model is finished.
In one embodiment, the apparatus further includes a service management module;
the service management module is configured to receive a subscription requirement and to send a model training request based on the subscription requirement.
According to a fifth aspect of embodiments of the present disclosure, there is provided a training apparatus, comprising:
a processor; a memory for storing processor-executable instructions; wherein the processor is configured to: performing the training method as described in the first aspect or any one of the embodiments of the first aspect, or performing the training method as described in the second aspect or any one of the embodiments of the second aspect.
According to a sixth aspect of embodiments of the present disclosure, there is provided a non-transitory computer-readable storage medium, wherein instructions of the storage medium, when executed by a processor of a mobile terminal, enable the mobile terminal to perform a training method as described in the first aspect or any one of the implementations of the first aspect, or perform a training method as described in the second aspect or any one of the implementations of the second aspect.
The technical solution provided by the embodiments of the present disclosure may have the following beneficial effects: in the method, the trained model is compressed and the parameters of the compressed model are updated so that the compressed model achieves the same effect as the trained model; this reduces the signaling overhead of transmitting the model while ensuring the precision and reliability of the model, which in turn protects the security of user information.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a schematic system architecture diagram of a training method according to the present disclosure.
FIG. 2 is a flow chart illustrating a training method according to an exemplary embodiment.
FIG. 3 is a flow chart illustrating another training method in accordance with an exemplary embodiment.
FIG. 4 is a flowchart illustrating yet another training method according to an exemplary embodiment.
FIG. 5 is a flowchart illustrating yet another training method according to an exemplary embodiment.
Fig. 6 is a flowchart of an embodiment of determining a first compression model by a single training node pattern in a training method according to the present disclosure.
Fig. 7 is a flowchart of an embodiment of determining a first compression model by a multi-training-node mode in a training method according to the present disclosure.
Fig. 8 is a schematic diagram of a protocol and an interface of a model training and compression decision part in a training method provided by the present disclosure.
Fig. 9 is a schematic diagram of a protocol and an interface of model training and compression parts in a single training node mode in a training method provided by the present disclosure.
Fig. 10 is a schematic diagram of protocols and interfaces of model training and compression parts in a multi-training node mode in a training method provided by the present disclosure.
Fig. 11 is a schematic diagram of a protocol and an interface of a wireless data transmission part in a training method provided by the present disclosure.
FIG. 12 is a block diagram illustrating a training apparatus according to an exemplary embodiment.
FIG. 13 is a block diagram illustrating another training apparatus according to an example embodiment.
FIG. 14 is a block diagram illustrating an apparatus for training in accordance with an example embodiment.
FIG. 15 is a block diagram illustrating another apparatus for training in accordance with an example embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
To meet the requirements of multi-service scenarios, the communication network is characterized by ultra-high speed, ultra-low latency, ultra-high reliability, and massive connectivity. These service scenarios, their corresponding requirements, and these network characteristics bring unprecedented challenges to the deployment, operation, and maintenance of the communication network.
Breakthroughs in artificial intelligence technology — especially the richness of deep-learning algorithms, the improvement of hardware computing capability, and the introduction of massive data in new-generation communication networks — provide powerful support for making new-generation networks intelligent. Using artificial intelligence further improves the resource utilization of the communication network, improves the terminal service experience, and realizes automated and intelligent control and management of the communication network.
In the related art, the process of training a model with a deep-learning algorithm is as follows. The model request node determines a model structure and a model training mode according to the model/analysis subscription requirement, where the model training mode includes a single-training-node mode and a multi-training-node mode. The model request node sends the model structure and the model training mode to the model training nodes, and each model training node either trains a model independently or participates in cooperative multi-node training according to the mode. After training is completed, the model training nodes send their models to the model request node; in the multi-training-node mode, the model request node performs federated averaging on the models sent by the model training nodes to obtain a global model. The model request node then checks whether the resulting model meets the model/analysis subscription requirement; if so, it sends the model to the model/analysis consumer. If not, the training process is repeated until the model obtained by the model request node meets the subscription requirement. The related art therefore has the following disadvantages:
(1) The data volume of the model is relatively large, and, particularly in the multi-training-node mode, the model must be transmitted many times between the model training nodes and the model request node, which greatly increases the communication overhead.
(2) The large volume of model data transmitted between the model training nodes and the model request node can aggravate the shortage of radio resources, which increases the probability of data-transmission errors, reduces the reliability of the model received by the model request node, and makes the model precision impossible to guarantee.
(3) The model trained by a model training node on local data is sent to the model request node without any processing, which increases the risk that, after the model is maliciously intercepted in the network, information about the terminal and the network data can be reversely deduced; thus the security of terminal privacy data cannot be ensured.
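The related-art flow described above (request, train, aggregate in multi-training-node mode, check the subscription requirement, repeat) can be sketched as a toy control loop. All names here are hypothetical, and the scalar "model", the "training" step, and the "requirement check" are deliberately simplified stand-ins:

```python
from dataclasses import dataclass

@dataclass
class TrainingNode:
    local_value: float  # toy scalar "model" held by this training node

    def train(self, target):
        # Toy "training": move the local model halfway toward the target.
        self.local_value += 0.5 * (target - self.local_value)
        return self.local_value

def related_art_loop(nodes, target, tolerance, multi_node=True, max_rounds=100):
    """Each round, every training node trains and uploads its model; in
    multi-training-node mode the request node federated-averages them; the
    loop repeats until the 'subscription requirement' (here, closeness to
    `target`) is met or the round budget is exhausted."""
    global_model = None
    for _ in range(max_rounds):
        models = [n.train(target) for n in nodes]
        global_model = sum(models) / len(models) if multi_node else models[0]
        if abs(global_model - target) < tolerance:
            break
    return global_model
```

Note that the full (uncompressed) model is uploaded every round, which is exactly the communication-overhead problem the disclosure targets.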
Based on these defects in the related art, the present disclosure provides a training method to solve the problems of high communication overhead, insufficient model precision, and the security of terminal privacy data. The training method provided by the disclosure determines the model structure and the model training mode according to the network service requirement (such as the model subscription requirement), and formulates multiple model compression options by fully considering factors such as the locally available computing power, communication conditions, and training-sample characteristics of the model training nodes, thereby reducing unnecessary communication overhead, improving the utilization of radio network resources, and applying deep learning to network intelligence in a more efficient and secure manner.
Fig. 1 is a schematic diagram of a system architecture of a training method provided in the present disclosure. As shown in fig. 1, the system includes a core network portion and a radio access network portion. A terminal (User) accesses a base station through a wireless channel; base stations are connected to each other through an Xn interface; a base station accesses a User Plane Function (UPF) network element of the core network through an N3 interface; the UPF network element accesses a Session Management Function (SMF) network element through an N4 interface; and the SMF network element accesses the bus structure of the core network and is connected with the other network functions (NFs) of the core network.
It is understood that the communication system between the network device and the terminal shown in fig. 1 is only a schematic illustration, and other network devices, such as a wireless relay device and a wireless backhaul device, may also be included in the wireless communication system, which is not shown in fig. 1. The number of network devices and the number of terminals included in the wireless communication system are not limited in the embodiments of the present disclosure.
It is further understood that the wireless communication system of the embodiments of the present disclosure is a network providing a wireless communication function. The wireless communication system may employ different communication technologies, such as Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Time Division Multiple Access (TDMA), Frequency Division Multiple Access (FDMA), Orthogonal Frequency Division Multiple Access (OFDMA), Single-Carrier FDMA (SC-FDMA), and Carrier Sense Multiple Access with Collision Avoidance (CSMA/CA). According to factors such as capacity, rate, and delay, networks can be classified into 2G (second-generation) networks, 3G networks, 4G networks, or future evolved networks such as 5G networks; a 5G network may also be referred to as a New Radio (NR) network. For ease of description, this disclosure will sometimes simply refer to a wireless communication network as a network.
Further, the network devices referred to in this disclosure may also be referred to as radio access network devices. The radio access network device may be: a base station, an evolved NodeB (eNB), a home base station, an Access Point (AP) in a wireless fidelity (Wi-Fi) system, a wireless relay node, a wireless backhaul node, a Transmission Point (TP), or a Transmission and Reception Point (TRP), and the like; it may also be a gNB in an NR system, or a component or part of a device constituting the base station. In a vehicle-to-everything (V2X) communication system, the network device may also be a vehicle-mounted device. It should be understood that the embodiments of the present disclosure do not limit the specific technology and the specific device form adopted by the network device.
Further, the terminal referred to in this disclosure may also be referred to as a terminal device, User Equipment (UE), a Mobile Station (MS), a Mobile Terminal (MT), and the like, and is a device that provides voice and/or data connectivity to a user; for example, the terminal may be a handheld device or a vehicle-mounted device having a wireless connection function. Some current examples of terminals are: a smartphone (Mobile Phone), a Pocket PC (PPC), a palmtop computer, a Personal Digital Assistant (PDA), a notebook computer, a tablet computer, a wearable device, or a vehicle-mounted device. Furthermore, in a vehicle-to-everything (V2X) communication system, the terminal device may also be a vehicle-mounted device. It should be understood that the embodiments of the present disclosure do not limit the specific technology and the specific device form adopted by the terminal.
FIG. 2 is a flow chart illustrating a training method according to an exemplary embodiment. As shown in fig. 2, the training method is used in the first node and includes the following steps.
In step S11, in response to receiving the model training request, a first training model is trained.
In an embodiment of the present disclosure, the first node is a model training node, and the model training node is referred to as the first node for convenience of description; the second node is a model request node, and similarly, this disclosure refers to the model request node as the second node for ease of description. Wherein, the model training request comprises model compression parameters.
The model compression parameters include at least one of the following:
the method comprises a model training structure, a plurality of model compression options and a model training mode.
In the embodiments of the present disclosure, the model compression options are determined based on the model subscription requirement received by the second node (for example, the model request node). The second node determines to send a model training request according to the received model subscription requirement. After the first node (for example, the model training node) receives the model training request sent by the second node, the first node sends third indication information in response to the model training request, determines to train the first training model based on the local sample parameter set and the model training structure, and determines the relevant parameters required for model compression. The information sent by the first node in response to the model training request further includes one or more of the local computing capability of the first node, its communication conditions, the characteristics of the training sample parameter set, and the like.
In step S12, a first compression model of the first training model is obtained based on the first training model and the model compression parameters.
In the embodiment of the disclosure, the first node compresses the first training model based on the model compression option in the model compression parameters and the relevant parameters required for model compression. The relevant parameters required for model compression are determined by the first node based on the model compression parameters sent by the second node, the local computing capability, and other parameters; the model compression options include model accuracy and model parameter data volume. The model compression parameters comprise a plurality of model compression options, which are determined by the second node based on one or more of the local computing capability, communication conditions, training sample parameter set characteristics, and the like reported by the plurality of first nodes.
FIG. 3 is a flow chart illustrating a training method according to an exemplary embodiment. As shown in fig. 3, obtaining a compression model of the first training model based on the first training model and the model compression parameters includes the following steps.
In step S21, a first model compression option is determined among the plurality of model compression options, and the first training model is compressed based on the first model compression option to obtain a second compression model.
In the embodiment of the disclosure, the first node determines a first model compression option for model compression among the plurality of model compression options according to one or more of local computing power, communication conditions, and training samples. A matrix representing the accuracy contribution value of each channel in the network is determined according to the model accuracy included in the first model compression option and the relevant parameters required for model compression determined during training, and this matrix is identified by the symbol g. The first training model is compressed according to the model parameter data volume requirement in the model compression option to obtain a second compression model, identified by the symbol θ_S. The compression of the first training model using the matrix g and the first model compression option may adopt the following implementation:

The first node takes the model parameter data volume as a constraint condition and designs a pruning matrix X to retain the channels in the model that contribute most to accuracy. The first node takes the element sum of each column of the pruning matrix X as an unknown, retains the channels corresponding to the largest column elements according to the size of the elements in each column of the matrix g, and prunes the other channels. After the pruning matrix X is solved, X is used to prune θ to obtain the second compression model θ_S.
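The channel-pruning step described above can be sketched as follows. This is a minimal illustration, assuming the matrix g reduces to a per-channel accuracy-contribution score and the pruning matrix X to a binary keep/prune mask; all names are hypothetical.

```python
# Illustrative channel pruning: given a per-channel accuracy-contribution
# score g and a parameter budget (number of channels to keep), retain the
# channels with the largest contribution and zero out (prune) the rest.
# The binary mask plays the role of the pruning matrix X.
def prune_channels(theta, g, keep):
    """theta: per-channel weight lists; g: contribution score per channel;
    keep: number of channels the parameter-volume budget allows."""
    # rank channels by contribution, largest first
    ranked = sorted(range(len(g)), key=lambda c: g[c], reverse=True)
    kept = set(ranked[:keep])
    mask = [1 if c in kept else 0 for c in range(len(g))]
    # apply the mask: pruned channels have their weights zeroed
    theta_s = [[w * mask[c] for w in ch] for c, ch in enumerate(theta)]
    return theta_s, mask

theta = [[0.2, 0.1], [0.9, 0.8], [0.5, 0.4], [0.05, 0.02]]
g = [0.1, 0.7, 0.4, 0.05]   # channels 1 and 2 contribute most to accuracy
theta_s, mask = prune_channels(theta, g, keep=2)
```

In the patent's formulation the budget is stated as a model parameter data volume rather than a channel count; the translation between the two is omitted here for brevity.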
In the embodiment of the disclosure, the first node selects a suitable model compression option, compresses the training model according to that option, and then transmits the compressed model to the second node, reducing the data volume of the model as much as possible while retaining most of the accuracy of the deep learning model.
In step S22, a first loss function is determined based on the output of the first training model, the output of the second compression model, and the set of sample parameters used to train the first training model.
In an embodiment of the disclosure, the sample training parameter set further comprises a sample verification parameter set, and at least one input-output data pair of the sample verification parameter set is determined. The first node feeds the input of each data pair into the first training model and the second compression model, and determines the output of the first training model, the output of the second compression model, and the corresponding output in the sample verification parameter set, where the latter is the true value corresponding to the model input.
Further, a first cross entropy between the output of the second compression model and the true value from the sample parameter set (i.e., the output of the sample verification parameter set data pair) is determined, a first relative entropy divergence between the output of the second compression model and the output of the first training model is determined, and the sum of the first cross entropy and the first relative entropy divergence is determined as the loss function of the second compression model. For ease of distinction, the present disclosure refers to the loss function of the second compression model as the first loss function. According to the above embodiment, a plurality of first loss functions are determined based on a plurality of input-output data pairs in the sample parameter set, the average value of the plurality of first loss functions is determined, and the parameters of the second compression model are updated by gradient descent according to this average value to obtain the first compression model.
Wherein, the first loss function (i.e., the loss function of the second compression model) is expressed by the following formula:

L_θS = L_CS + D_KL(p_S || p_1)

in the formula, L_θS is the loss function of the second compression model; L_CS is the first cross entropy between the second compression model output value and the true value of the input-output data pairs of the sample verification parameter set; D_KL(p_S || p_1) is the first relative entropy divergence between the second compression model output value and the first training model output value; p_S is the second compression model output value; p_1 is the first training model output value.
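As a worked numerical illustration of the loss functions used here (a cross entropy plus a relative entropy divergence, in the style of knowledge distillation), the following minimal pure-Python sketch evaluates both the compression-model loss and the training-model loss for a single data pair. Variable names are assumptions; in practice the losses are averaged over the validation data pairs and minimized by gradient descent.

```python
# Minimal sketch of the two distillation-style losses: the compression
# (student) model's loss L_CS + D_KL(p_S || p_1) and, symmetrically, the
# training (teacher) model's loss L_C + D_KL(p_1 || p_S).
import math

def cross_entropy(p, y):
    """p: predicted class probabilities; y: index of the true label."""
    return -math.log(p[y])

def kl_divergence(p, q):
    """D_KL(p || q) for two discrete probability distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p_1 = [0.7, 0.2, 0.1]   # first training model output (teacher)
p_s = [0.6, 0.3, 0.1]   # second compression model output (student)
y = 0                   # true label from the sample verification set

loss_student = cross_entropy(p_s, y) + kl_divergence(p_s, p_1)
loss_teacher = cross_entropy(p_1, y) + kl_divergence(p_1, p_s)
```

Note that the two KL terms point in opposite directions, matching the asymmetric roles of the two models in the text.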
In the embodiment of the present disclosure, it should be noted that the first training model here is the first training model whose parameters have been updated based on its own loss function. In other words, in the embodiment of the present disclosure, the loss function of the first training model is determined first, and the loss function of the second compression model (i.e., the first loss function) is determined after the first training model parameters have been updated based on the loss function of the first training model. For ease of distinction, the present disclosure refers to the loss function of the first training model as the second loss function. As described above, the sample training parameter set further includes a sample verification parameter set, and the first node determines at least one input-output data pair of the sample verification parameter set. The first node feeds the input of each data pair into the first training model and the second compression model, and determines the output of the first training model, the output of the second compression model, and the corresponding output in the sample verification parameter set, where the latter is the true value corresponding to the model input.
Further, a second cross entropy between the output of the first training model and the true value from the sample parameter set (i.e., the output of the sample verification parameter set data pair) is determined, a second relative entropy divergence between the output of the first training model and the output of the second compression model is determined, and the sum of the second cross entropy and the second relative entropy divergence is determined as the second loss function. According to the above embodiment, a plurality of second loss functions are determined based on a plurality of input-output data pairs in the sample parameter set, the average value of the plurality of second loss functions is determined, and the first training model parameters are updated by gradient descent according to this average value to obtain the updated first training model.
Wherein, the second loss function (i.e., the loss function of the first training model) is expressed by the following formula:

L_θ = L_C + D_KL(p_1 || p_S)

in the formula, L_θ is the loss function of the first training model; L_C is the second cross entropy between the first training model output value and the true value of the input-output data pairs of the sample verification parameter set; D_KL(p_1 || p_S) is the second relative entropy divergence between the first training model output value and the second compression model output value; p_S is the second compression model output value; p_1 is the first training model output value.
In embodiments of the present disclosure, the model training patterns in the model compression parameters include a single training node pattern for training a single first training model and a multi-training node pattern for training a plurality of first training models.
The first node determines the number of first training models to be trained according to the model training mode included in the model training parameters. If the model training mode is single-training-node training, one first training model is trained based on a single first node, in the manner described above. If the model training mode is multi-training-node training, a plurality of first training models are trained based on a plurality of first nodes, and different sequence marks are set for the plurality of first nodes. The mth model training node (i.e., the mth first node) is described below as an example to illustrate the multi-training-node mode.
In the embodiment of the present disclosure, the first node determines a first model compression option for model compression among the plurality of model compression options according to one or more of local computing power, communication conditions, and training samples. A matrix representing the accuracy contribution value of each channel in the network is determined according to the model accuracy included in the first model compression option and the relevant parameters required for model compression determined during training, and this matrix is identified by the symbol g. The first training model is compressed according to the model parameter data volume requirement in the model compression option to obtain a second compression model, identified by the symbol θ_S. The compression of the first training model using the matrix g and the first model compression option may adopt the following implementation:

The first node takes the model parameter data volume as a constraint condition and designs a pruning matrix X to retain the channels in the model that contribute most to accuracy. The first node takes the element sum of each column of the pruning matrix X as an unknown, retains the channels corresponding to the largest column elements according to the size of the elements in each column of the matrix g, and prunes the other channels. After the pruning matrix X is obtained, X is used to prune θ_m to obtain the mth second compression model θ_S^m.
In an embodiment of the disclosure, the sample training parameter set further comprises a sample verification parameter set, and at least one input-output data pair of the sample verification parameter set is determined. The first node feeds the input of each data pair into the mth first training model and the mth second compression model, and determines the output of the mth first training model, the output of the mth second compression model, and the corresponding output in the sample verification parameter set, where the latter is the true value corresponding to the model input.
Further, an mth first cross entropy between the output of the mth second compression model and the true value from the sample parameter set (i.e., the output of the sample verification parameter set data pair) is determined, an mth first relative entropy divergence between the output of the mth second compression model and the output of the mth first training model is determined, and the sum of the mth first cross entropy and the mth first relative entropy divergence is determined as the loss function of the mth second compression model. For ease of distinction, the present disclosure refers to the loss function of the mth second compression model as the mth first loss function. According to the above embodiment, a plurality of mth first loss functions are determined based on the plurality of input-output data pairs in the sample parameter set, the average value of the plurality of mth first loss functions is determined, and the mth second compression model parameters are updated by gradient descent according to this average value to obtain the mth first compression model.
Wherein, the mth first loss function (i.e., the loss function of the mth second compression model) is expressed by the following formula:

L_θS^m = L_CS^m + D_KL(p_S^m || p_m)

in the formula, L_θS^m is the loss function of the mth second compression model; L_CS^m is the first cross entropy between the mth second compression model output value and the true value of the input-output data pairs of the sample verification parameter set; D_KL(p_S^m || p_m) is the first relative entropy divergence between the mth second compression model output value and the mth first training model output value; p_S^m is the mth second compression model output value; p_m is the mth first training model output value.
In the embodiment of the present disclosure, it should be noted that the mth first training model here is the mth first training model whose parameters have been updated based on its own loss function. In other words, in the embodiment of the present disclosure, the loss function of the mth first training model is determined first, and the loss function of the mth second compression model (i.e., the mth first loss function) is determined after the mth first training model parameters have been updated based on the loss function of the mth first training model. For ease of distinction, the present disclosure refers to the loss function of the mth first training model as the mth second loss function. As described above, the sample training parameter set further includes a sample verification parameter set, and the first node determines at least one input-output data pair of the sample verification parameter set. The first node feeds the input of each data pair into the mth first training model and the mth second compression model, and determines the output of the mth first training model, the output of the mth second compression model, and the corresponding output in the sample verification parameter set, where the latter is the true value corresponding to the model input.
Further, an mth second cross entropy between the output of the mth first training model and the true value from the sample parameter set (i.e., the output of the sample verification parameter set data pair) is determined, an mth second relative entropy divergence between the output of the mth first training model and the output of the mth second compression model is determined, and the sum of the mth second cross entropy and the mth second relative entropy divergence is determined as the mth second loss function. According to the above embodiment, a plurality of mth second loss functions are determined based on a plurality of input-output data pairs in the sample parameter set, the average value of the plurality of mth second loss functions is determined, and the mth first training model parameters are updated by gradient descent according to this average value to obtain the updated mth first training model.
Wherein, the mth second loss function (i.e., the loss function of the mth first training model) is expressed by the following formula:

L_θm = L_C^m + D_KL(p_m || p_S^m)

in the formula, L_θm is the loss function of the mth first training model; L_C^m is the second cross entropy between the mth first training model output value and the true value of the input-output data pairs of the sample verification parameter set; D_KL(p_m || p_S^m) is the second relative entropy divergence between the mth first training model output value and the mth second compression model output value; p_S^m is the mth second compression model output value; p_m is the mth first training model output value.
Of course, in the embodiment of the present disclosure, the first compression model may also be determined by other model compression methods, such as model sparsification and parameter quantization, and the present disclosure is not specifically limited herein.
FIG. 4 is a flow chart illustrating a training method according to an exemplary embodiment. As shown in fig. 4, obtaining a compression model of the first training model based on the first training model and the model compression parameters includes the following steps.
In step S31, a fourth indication message is received.
In an embodiment of the disclosure, the second node determines the first compression model from the received second indication message. If there is one first compression model, the second node determines whether the first compression model meets the model subscription requirement or the analysis subscription requirement. If there are multiple first compression models, federated averaging is performed on the multiple first compression models to obtain a federated-averaged third compression model (or global model), and the second node determines whether the third compression model meets the model subscription requirement or the analysis subscription requirement. In one embodiment, if the single first compression model, or the third compression model obtained by federated averaging of multiple first compression models, does not meet the subscription requirement, a fourth indication message is sent, where the fourth indication message is used to indicate the third compression model. The first node receives the fourth indication message.
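The federated-averaging step that produces the third (global) compression model can be sketched as an element-wise mean of the first compression models' parameters; this is a minimal illustration under that standard FedAvg interpretation, not the patent's literal implementation, and all names are assumptions.

```python
# Sketch of the federated-averaging step the second node performs over the
# first compression models returned by the training nodes: the third
# (global) compression model is the element-wise mean of the parameters.
def federated_average(models):
    """models: list of flat parameter vectors, one per first compression model."""
    n = len(models)
    return [sum(params) / n for params in zip(*models)]

model_a = [0.2, 0.4, 0.6]   # parameters from training node A
model_b = [0.4, 0.6, 0.8]   # parameters from training node B
global_model = federated_average([model_a, model_b])  # third compression model
```

In a weighted variant, each node's contribution would be scaled by its local sample count; the unweighted mean is used here for brevity.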
In step S32, the model compression parameters are re-determined based on the third compression model, and the first compression model is updated based on the re-determined model compression parameters.
In the embodiment of the present disclosure, the first node re-determines the model compression parameters according to the third compression model indicated by the fourth indication message, updates the first compression model based on the re-determined model compression parameters, determines the loss function of the first compression model, and re-updates the parameters of the first compression model until the second node determines the compression model satisfying the model subscription requirement.
In an exemplary embodiment of the present disclosure, in another implementation manner, when the second node determines that the first compression model meets the model subscription requirement, it determines to send a fifth indication message, where the fifth indication message is used to indicate that training of the first compression model is finished. And after receiving the fifth indication message, the first node determines that the training of the first compression model is finished, and the second node sends the determined compression model to the model subscriber.
In the embodiment of the present disclosure, after obtaining one or more first compression models corresponding to the model training mode, the first node sends a second indication message to the second node through a wireless channel. Wherein the second indication message includes a number of the first compression models corresponding to the model training pattern.
The embodiment of the present disclosure solves the problem of the excessive data volume of deep learning models, effectively alleviates the shortage of wireless resources, and reduces data transmission errors under network congestion, thereby improving the reliability of model transmission in a wireless network while ensuring model accuracy. In this method, the first node compresses the model obtained by training on the local training parameter set before uploading it to the second node; the user's private data is kept local, and at the same time the difficulty of reverse-inferring the data from the model over the network is greatly increased, further ensuring the security of user information.
Based on the same/similar concept, the embodiment of the disclosure also provides a training method.
FIG. 5 is a flowchart illustrating a training method according to an exemplary embodiment. As shown in fig. 5, the training method is used in the second node and includes the following steps.
In step S41, a model training request is transmitted.
In the embodiment of the present disclosure, the model training request includes a model compression parameter, the model compression parameter is used to compress the first training model to obtain a first compression model, and the first training model is obtained based on the training of the model training request.
In an embodiment of the present disclosure, the first node is a model training node, and the model training node is referred to as the first node for convenience of description; the second node is a model request node, and similarly, this disclosure refers to the model request node as the second node for ease of description. Wherein, the model training request comprises model compression parameters.
The model compression parameters include at least one of the following:
the method comprises a model training structure, a plurality of model compression options and a model training mode.
In embodiments of the present disclosure, the model compression options are determined based on the model subscription requirement received by the second node (e.g., the model request node). The second node determines to send a model training request according to the received model subscription requirement. After the first node (e.g., the model training node) receives the model training request sent by the second node, the first node sends third indication information in response to the model training request, determines to train the first training model based on the local sample parameter set and the model training structure, and determines the relevant parameters required for model compression. The response to the model training request sent by the first node further includes one or more of the local computing capability of the first node, communication conditions, characteristics of the training sample parameter set, and the like.
In an embodiment of the present disclosure, the first node compresses the first training model based on the model compression options in the model compression parameters and the relevant parameters required for model compression. The model compression options comprise model precision and model parameter data volume. The model compression parameters comprise a plurality of model compression options, and the plurality of model compression options are determined by the second node based on one or more of local computing capacity reported by the plurality of first nodes, communication conditions, training sample parameter set characteristics and the like.
In an embodiment of the present disclosure, the model training patterns in the model compression parameters include a single training node pattern for training a single first training model and a multi-training node pattern for training a plurality of first training models.
The first node determines the number of first training models to be trained according to model training modes included in the model training parameters, and if the model training mode is single training node training, determines that one first training model is trained based on a single first node, wherein the training mode is as described above. And if the model training mode is multi-training-node training, determining that a plurality of first training models are trained on the basis of a plurality of first nodes, setting different sequence marks for the plurality of first nodes, and training the plurality of first training models.
In the embodiment of the present disclosure, after obtaining one or more first compression models corresponding to the model training mode, the first node sends a second indication message to the second node through a wireless channel, where the second indication message includes the number of first compression models corresponding to the model training mode. The second node receives the second indication message, determines the number of first compression models corresponding to the model training mode, performs federated averaging on the one or more received first compression models to obtain a federated-averaged third compression model, and determines whether the third compression model meets the model subscription requirement or the analysis subscription requirement.
In the embodiment of the present disclosure, the subscription requirement may be issued by Operation Administration and Maintenance (OAM) or by the core network. The subscription requirement includes: an analysis ID, used to identify the analysis type of the model training request; a notification target model training node address, used to associate a notification received by the requestor with the subscription; analysis report information, including parameters such as the preferred analysis accuracy level and the analysis time interval; and, optionally, analysis filter information, indicating the conditions to be met for reporting the analysis information.
In an implementation manner of the embodiment of the present disclosure, if a single first compression model, or the third compression model obtained by federated averaging of a plurality of first compression models, does not meet the subscription requirement, a fourth indication message is sent, where the fourth indication message is used to indicate the third compression model. The first node receives the fourth indication message.
In an exemplary embodiment of the present disclosure, in another implementation manner, when the second node determines that the first compression model meets the model subscription requirement, it determines to send a fifth indication message, where the fifth indication message is used to indicate that training of the first compression model is finished. And after receiving the fifth indication message, the first node determines that the training of the first compression model is finished, and the second node sends the determined compression model to the model subscriber.
In the embodiment of the present disclosure, the second node receives a subscription requirement sent by the OAM or the core network, and determines to send the model training request based on the received subscription requirement.
In the embodiment of the present disclosure, the first node and the second node may be applied between two base stations, between a base station and a terminal, or between a base station and the core network. For example, in one application environment the first node is a terminal and the second node is a base station; the method is also applicable to an application environment in which both the first node and the second node are base stations. Of course, this is merely an illustration of the application environment of the first node and the second node, and the present disclosure does not specifically limit the application environment of the specific embodiments.
In the embodiment of the present disclosure, after obtaining one or more first compression models corresponding to the model training mode, the first node sends a second indication message to the second node through a wireless channel. Wherein the second indication message includes a number of the first compression models corresponding to the model training pattern.
The embodiment of the present disclosure solves the problem of the excessive data volume of deep learning models, effectively alleviates the shortage of wireless resources, and reduces data transmission errors under network congestion, thereby improving the reliability of model transmission in a wireless network while ensuring model accuracy. In this method, the first node compresses the model obtained by training on the local training parameter set before uploading it to the second node; the user's private data is kept local, and at the same time the difficulty of reverse-inferring the data from the model over the network is greatly increased, further ensuring the security of user information.
In some embodiments of the present disclosure, the first node is referred to as a model training node and the second node is referred to as a model requesting node. The present disclosure is further illustrated in terms of model training nodes interacting with model requesting nodes.
Fig. 6 is a flowchart of an embodiment of determining a first compression model by a single training node pattern in a training method according to the present disclosure. As shown in FIG. 6, a model request node initiates a model training request to a model training node.
The model training node sends local computing power, communication conditions and training sample parameter set characteristics to the model requesting node.
The model request node determines a model structure and a model training mode according to model/analysis subscription requirements, and provides various model compression options including model precision and model parameter data volume according to information reported by the model training node.
And the model request node sends the model structure, the model training mode and the model compression option to the model training node, and the model training node selects a proper model compression option.
And the model training node performs model training by adopting a local sample parameter set to obtain a first training model and relevant parameters required by model compression.
And the model training node compresses the first training model according to the selected model compression option and relevant parameters required by model compression to obtain a first compression model, and transmits the first compression model to the model request node through a wireless channel.
And when the first compression model obtained by the model request node meets the model/analysis subscription requirement, ending the model training process, and reporting the model to the model/analysis subscriber by the model request node.
Fig. 7 is a flowchart of an embodiment of determining a first compression model by using a multi-training-node mode in a training method according to the present disclosure. As shown in fig. 7, the model request node initiates a model training request to the model training node.
The model training node sends local computing power, communication conditions and training sample parameter set characteristics to the model requesting node.
The model request node determines a model structure and a model training mode according to model/analysis subscription requirements, and provides various model compression options including model precision and model parameter data quantity according to information reported by the model training node.
The model request node sends the model structure, the model training mode, and the model compression options to the model training nodes, and each model training node selects an appropriate model compression option.
Each model training node performs model training using its local sample parameter set to obtain a first training model and the relevant parameters required for model compression.
Each model training node compresses its first training model according to the selected model compression option and the relevant parameters required for model compression to obtain a first compression model, and transmits the first compression model to the model request node over a wireless channel.
The model request node performs federated averaging on the first compression models transmitted by the model training nodes to obtain a global model.
It is determined whether the global model satisfies the model/analysis subscription requirements.
If the global model meets the model/analysis subscription requirement, the model training process ends, and the model request node reports the global model to the model/analysis subscriber. If the global model does not meet the model/analysis subscription requirement, each model training node reselects an appropriate model compression option and updates its first compression model according to the newly determined model compression option.
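The federated averaging step can be sketched as an element-wise weighted average of the parameters reported by the model training nodes. This is a minimal NumPy sketch; weighting each node by its local sample count follows the standard FedAvg recipe and is an assumption, since the disclosure does not fix the averaging weights.

```python
import numpy as np

def federated_average(node_params, sample_counts):
    """Average parameter dicts from several model training nodes, weighting
    each node by the size of its local sample parameter set (FedAvg)."""
    total = sum(sample_counts)
    weights = [n / total for n in sample_counts]
    global_model = {}
    for name in node_params[0]:
        global_model[name] = sum(
            w * params[name] for w, params in zip(weights, node_params)
        )
    return global_model
```

The resulting global model is what the model request node checks against the model/analysis subscription requirement and, if needed, redistributes to the training nodes.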
Fig. 8 is a schematic diagram of the protocol and interfaces of the model training and compression decision part in a training method provided by the present disclosure. As shown in fig. 8, the arrangement includes the service management module and the network communication module in the model request node, and the network communication module, the model training and compression module, and the data processing and storage module in the model training node device. Information interaction among these modules proceeds through the following steps.
In the embodiment of the present disclosure, step 1 includes steps 1a-1c. In step 1a, the model request node service management module sends a model training request signaling to the model request node network communication module; the signaling indicates that a model training request is initiated to the model training node. In step 1b, the model request node network communication module sends the model training request signaling to the model training node network communication module. In step 1c, the model training node network communication module sends a model training request response signaling to the model request node network communication module; the signaling indicates that the model training request has been received.
Step 2 includes steps 2a-2c. In step 2a, the model training node model training and compression module sends a computing capability information reporting signaling to the model training node network communication module; the signaling indicates that the computing capability information of the model training node device is reported to the receiving party. In step 2b, the model training node data processing and storage module sends a training sample characteristic information reporting signaling to the model training node network communication module; the signaling indicates that the local training sample characteristic information of the model training node is reported to the receiving party. In step 2c, the model training node network communication module sends the computing capability and training sample characteristic information reporting signaling to the model request node network communication module; the signaling indicates that the computing capability and local training sample characteristic information of the model training node are reported to the receiving party.
In step 3, if the model training node is a terminal and the model request node is a base station, the model training node network communication module measures the Channel Quality Indicator (CQI) and sends a CQI measurement and reporting signaling to the model request node network communication module; the signaling indicates that the CQI measurement is performed and the CQI information is reported to the receiving party.
In step 4, the model request node network communication module sends the model training node's computing capability, training sample characteristics, and (optionally) CQI information to the model request node service management module; the signaling indicates that the received information is aggregated and forwarded to the receiving party.
In step 5, the model request node service management module determines the model structure and model training mode according to the model/analysis subscription requirements.
In step 6, the model request node service management module provides multiple model compression options according to the information reported by the model training node.
Step 7 includes steps 7a-7b. In step 7a, the model request node service management module sends a model structure and model training mode signaling to the model request node network communication module; the signaling indicates that the model structure and model training mode are sent to the receiving party. In step 7b, the model request node service management module sends a model compression option signaling to the model request node network communication module; the signaling indicates that the multiple model compression options are sent to the receiving party.
Step 8 includes steps 8a-8b. In step 8a, the model request node network communication module sends the model structure and model training mode signaling to the model training node network communication module. In step 8b, the model request node network communication module sends the model compression option signaling to the model training node network communication module.
Step 9 includes steps 9a-9b. In step 9a, the model training node network communication module sends the model structure and model training mode signaling to the model training and compression module. In step 9b, the model training node network communication module sends the model compression option signaling to the model training and compression module.
In step 10, the model training node selects an appropriate model compression option according to its locally available computing power, real-time communication conditions, and training sample characteristics.
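Step 10's selection logic might look as follows. This is a hypothetical heuristic: the option fields `accuracy`, `param_bytes`, and `train_flops`, and the feasibility thresholds, are illustrative assumptions, since the disclosure leaves the selection criterion to the training node.

```python
def select_compression_option(options, available_flops, link_budget_bytes):
    """Pick the most accurate model compression option whose training cost fits
    the locally available computing power and whose compressed size fits the
    current wireless link budget (derived from the real-time CQI)."""
    feasible = [
        o for o in options
        if o["train_flops"] <= available_flops
        and o["param_bytes"] <= link_budget_bytes
    ]
    if not feasible:  # nothing fits: fall back to the smallest model
        return min(options, key=lambda o: o["param_bytes"])
    return max(feasible, key=lambda o: o["accuracy"])
```

This mirrors the trade-off the disclosure describes between model precision and model parameter data volume.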
Fig. 9 is a schematic diagram of the protocol and interfaces of the model training and compression parts in the single-training-node mode in a training method provided by the present disclosure. As shown in fig. 9, the arrangement includes the data processing and storage module, the model training and compression module, and the network communication module in the model training node, and the network communication module and the service management module in the model request node device. These modules perform the following steps.
Step 1 includes steps 1a-1b. In step 1a, the model training node model training and compression module sends a local training data set request signaling to the model training node data processing and storage module; the signaling indicates that a training data set is to be obtained from local data. In step 1b, the model training node data processing and storage module sends a local training data set signaling to the model training and compression module; the signaling indicates that a training data set is generated from the collected local data and sent to the receiving party.
In step 2, the model training node model training and compression module performs model training by using the local training data set to obtain a training model and relevant parameters required by model compression.
In step 3, the model training node compresses the original training model according to the selected model compression option and the relevant parameters required for model compression to obtain a compression model.
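One concrete compression step matching a "model parameter data volume" option is magnitude pruning, sketched below. This is an illustrative choice only: the disclosure does not fix the compression algorithm, and the `keep_ratio` parameter is an assumed knob corresponding to the selected option.

```python
import numpy as np

def prune_by_magnitude(weights, keep_ratio):
    """Zero out the smallest-magnitude entries of a weight array, keeping
    roughly keep_ratio of them; the zeroed entries can then be stored and
    transmitted in compressed (sparse) form."""
    flat = np.abs(weights).ravel()
    k = max(1, int(round(flat.size * keep_ratio)))
    threshold = np.partition(flat, flat.size - k)[flat.size - k]
    return np.where(np.abs(weights) >= threshold, weights, 0.0)
```

Other options named in the disclosure, such as reduced model precision, would instead quantize the surviving parameters to fewer bits.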
Step 4 includes steps 4a-4c. In step 4a, the model training node model training and compression module sends the compression model to the model training node network communication module. In step 4b, the model training node network communication module sends the compression model to the model request node network communication module. In step 4c, the model request node network communication module sends the compression model to the model request node service management module.
In step 5, the model request node service management module determines whether the obtained model meets the model/analysis subscription requirements. If so, step 6 is performed.
In step 6, the model request node service management module sends a model training completion notification signaling to the model training node network communication module through the model request node network communication module. This procedure and the corresponding signaling are newly added in the present disclosure; the signaling indicates that the model training node is notified that the model training process is complete.
Otherwise, steps 6a-6b are performed.
In step 6a, the model request node service management module sends a model training continuation notification signaling to the model training node network communication module through the model request node network communication module. This procedure and the corresponding signaling are newly added in the present disclosure; the signaling indicates that the model training process continues. In step 6b, the model training node network communication module sends the model training continuation notification signaling to the model training node model training and compression module.
In step 7, the model training node model training and compression module trains the compression model using the local training data set, and steps 4a-7 are repeated until the model obtained by the model request node meets the model/analysis subscription requirement.
Fig. 10 is a schematic diagram of the protocols and interfaces of the model training and compression parts in the multi-training-node mode in a training method provided by the present disclosure. As shown in fig. 10, the arrangement includes the model training and compression module, the transmission control module, and the network communication module in the model training node, and the network communication module, the transmission control module, and the service management module in the model request node device. Information interaction among these modules proceeds through the following steps.
Step 1 includes steps 1a-1b. In step 1a, the model training node model training and compression module sends a local training data set request signaling to the model training node data processing and storage module. In step 1b, the model training node data processing and storage module sends a local training data set signaling to the model training node model training and compression module.
In step 2, the model training node model training and compression module performs model training by using a local training data set to obtain a first training model and relevant parameters required by model compression.
In step 3, the model training node compresses the first training model according to the selected model compression option and the relevant parameters required by the model compression, so as to obtain a first compression model.
Step 4 includes steps 4a-4c. In step 4a, the model training node model training and compression module sends the first compression model to the model training node network communication module. In step 4b, the model training node network communication module sends the first compression model to the model request node network communication module. In step 4c, the model request node network communication module sends the first compression model to the model request node model calculation and update module.
In step 5, the model request node model calculation and update module collects the first compression models sent by each model training node and performs federated averaging to obtain the global model.
In step 6, the model request node model calculation and update module sends the global model to the model request node service management module.
In step 7, the model request node service management module determines whether the obtained model meets the model/analysis subscription requirements. If so, step 8 is performed.
In step 8, the model request node service management module sends a model training end notification signaling to the model training node network communication module via the model request node network communication module.
Otherwise, steps 8a-8b are performed.
In step 8a, the model request node service management module sends a model training continuation notification signaling to the model training node network communication module via the model request node network communication module, and distributes the global model to the model training node network communication module via the model request node network communication module. In step 8b, the model training node network communication module sends the model training continuation notification signaling and the global model to the model training node model training and compression module.
In step 9, the model training node model training and compression module performs model training and compression on the global model sent by the model request node using the local training data set, and steps 4a-9 are repeated until the model obtained by the model request node meets the model/analysis subscription requirement.
Fig. 11 is a schematic diagram of the protocol and interfaces of the wireless data transmission part in a training method provided by the present disclosure. As shown in fig. 11, the arrangement includes the model training and compression module, the transmission control module, and the network communication module in the model training node, and the network communication module, the transmission control module, and the service management module in the model request node. It applies to the scenario in which the model request node is a base station and the model training node is a terminal. Information interaction among these modules proceeds through the following steps.
In step 1, the model training node model training and compression module sends the compression model to the model training node transmission control module.
In step 2, the model training node network communication module measures the CQI and sends a CQI report signaling to the model training node transmission control module.
In step 3, the model training node transmission control module formulates a data transmission scheme according to the characteristics of the compression model and the wireless communication conditions.
In step 4, the model training node transmission control module sends a data transmission scheme information signaling to the model training node network communication module. This procedure and the corresponding signaling are newly added in the present disclosure; the signaling indicates that the data transmission scheme information, including the modulation scheme, the code rate, and similar parameters, is sent to the receiving party.
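The scheme formulation in steps 3-4, which chooses a modulation scheme and code rate from the measured CQI, can be sketched as a table lookup. The table values below are illustrative placeholders, not taken from a 3GPP CQI table.

```python
# CQI thresholds mapped to (modulation scheme, code rate); illustrative values only.
CQI_TO_MCS = [
    (1, "QPSK", 0.08),
    (5, "QPSK", 0.44),
    (8, "16QAM", 0.48),
    (12, "64QAM", 0.65),
    (15, "64QAM", 0.93),
]

def transmission_scheme(cqi):
    """Map a reported CQI to the highest modulation and code rate it supports."""
    modulation, code_rate = "QPSK", 0.08
    for threshold, mod, rate in CQI_TO_MCS:
        if cqi >= threshold:
            modulation, code_rate = mod, rate
    return {"modulation": modulation, "code_rate": code_rate}
```

A better channel (higher CQI) yields a denser modulation and higher code rate, shortening the time needed to upload the compression model.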
In step 5, the model training node model training and compression module sends the compression model to the model training node network communication module.
In step 6, the model training node network communication module encapsulates the compression model into data packets according to the data transmission scheme.
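The encapsulation in step 6 and the de-encapsulation on the receiving side can be sketched as splitting the serialized compression model into payload-sized packets with a small sequence header. The 6-byte header layout (4-byte sequence number plus 2-byte payload length) is an assumption for illustration; the disclosure does not specify a packet format.

```python
def packetize(model_bytes, payload_size):
    """Split a serialized compression model into packets, each prefixed with a
    4-byte sequence number and a 2-byte payload length."""
    packets = []
    for seq, offset in enumerate(range(0, len(model_bytes), payload_size)):
        chunk = model_bytes[offset:offset + payload_size]
        header = seq.to_bytes(4, "big") + len(chunk).to_bytes(2, "big")
        packets.append(header + chunk)
    return packets

def depacketize(packets):
    """Reassemble the model bytes on the receiving side, tolerating reordering."""
    ordered = sorted(packets, key=lambda p: int.from_bytes(p[:4], "big"))
    return b"".join(p[6:6 + int.from_bytes(p[4:6], "big")] for p in ordered)
```

The sequence numbers let the model request node confirm that all packets arrived before sending the correct-data-received notification of step 7c.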
Step 7 includes steps 7a-7d. In step 7a, the model training node network communication module transmits the compression model data packets to the model request node network communication module. In step 7b, the model request node network communication module sends the compression model to the model request node transmission control module; the data transmitted at this point has been de-encapsulated. In step 7c, the model request node transmission control module sends a correct-data-received notification signaling to the model request node network communication module; the signaling indicates that the receiving party is notified that the correct data has been received. In step 7d, the model request node network communication module sends the correct-data-received confirmation signaling to the model training node network communication module.
In step 8, the model request node transmission control module sends the compression model to the model request node service management module. In the single-training-node mode, the compression model is sent directly to the model request node service management module; in the multi-training-node mode, the model request node model calculation and update module first derives the global model, which is then sent to the model request node service management module.
In step 9, the model request node service management module determines whether the model satisfies the model/analysis subscription requirements. If so, steps 10a1-10b1 are performed.
In step 10a1, the model request node service management module sends a model training end notification signaling to the model request node transmission control module. In step 10b1, the model request node network communication module sends the model training end notification signaling to the model training node network communication module.
Otherwise, steps 10a2-10b2 are performed.
In step 10a2, the model request node service management module sends a model training continuation notification signaling to the model request node transmission control module; the signaling indicates that the model training node is notified that the model training process continues. In step 10b2, the model request node network communication module sends the model training continuation notification signaling to the model training node network communication module.
In the single-training-node mode, only the model training continuation notification signaling is sent; in the multi-training-node mode, the global model is also distributed to the model training nodes.
The protocol and interface principles of global model distribution are similar to those of steps 1-7 above, with the sending modules located in the model request node, the receiving modules in the model training node, and the compression model replaced by the global model. In the CQI measurement and reporting of step 2, the model request node sends a CQI measurement request to the model training node, and the model training node performs the CQI measurement and feeds the result back to the model request node.
Based on the same conception, the embodiment of the disclosure also provides a training device.
It is understood that, in order to implement the above functions, the training apparatus provided in the embodiments of the present disclosure includes corresponding hardware structures and/or software modules for performing each function. In combination with the exemplary units and algorithm steps disclosed herein, the disclosed embodiments can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends upon the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
Fig. 12 is a block diagram illustrating a training apparatus 100 according to an exemplary embodiment. Referring to fig. 12, the apparatus is applied to a first node and includes a model training and compression module 110, a first network communication module 120, a first transmission control module 130, and a data processing and storage module 140.
The model training and compression module 110 is configured to train a first training model in response to receiving a model training request, where the model training request includes model compression parameters. And obtaining a first compression model of the first training model based on the first training model and the model compression parameters.
In an embodiment of the present disclosure, the model compression parameters include a plurality of model compression options.
The model training and compression module 110 is configured to determine a first model compression option from the plurality of model compression options, and compress the first training model based on the first model compression option to obtain a second compression model. A first loss function is determined based on an output of the first training model, an output of the second compression model, and a set of sample parameters used to train the first training model. And updating the parameters of the second compression model based on the first loss function to obtain the first compression model.
In the disclosed embodiment, the apparatus further comprises a data processing and storage module 140.
The data processing and storage module 140 is configured to determine a first cross entropy between the output of the second compression model and the set of sample parameters, and determine a first relative entropy divergence between the output of the second compression model and the output of the first training model. A first loss function is determined based on the first cross entropy and the first relative entropy divergence.
In an embodiment of the present disclosure, the data processing and storage module 140 is further configured to determine a second loss function for updating parameters of the first training model according to an output of the first training model, an output of the second compression model, and a set of sample parameters used for training the first training model.
In an embodiment of the present disclosure, the data processing and storage module 140 is further configured to determine a second cross entropy between the output of the first training model and the set of sample parameters, and determine a second relative entropy divergence between the output of the first training model and the output of the second compression model. A second loss function is determined based on the second cross entropy and the second relative entropy divergence.
In an embodiment of the present disclosure, the model compression parameters include a model training mode; the model training modes include a single-training-node mode for training a single first training model and a multi-training-node mode for training a plurality of first training models. The number of first training models is determined based on the model training mode.
In the disclosed embodiment, the apparatus further comprises a first network communication module 120.
The first network communication module 120 is configured to send a second indication message, where the second indication message includes a number of first compression models corresponding to the model training mode.
In an embodiment of the present disclosure, the first network communication module 120 is further configured to receive a third indication message, where the third indication message includes an indication of determining a training model.
In the embodiment of the present disclosure, the first network communication module 120 is further configured to receive a fourth indication message. The fourth indication message is used to indicate a third compression model, where the third compression model is obtained by performing federated averaging on the first compression models based on the number of first training models. Based on the third compression model, the model compression parameters are re-determined, and the first compression model is updated based on the re-determined model compression parameters.
In the embodiment of the present disclosure, the network communication module 120 is further configured to receive a fifth indication message, where the fifth indication message is used to indicate that training of the first compression model is finished.
The first network communication module 120 is configured to perform data transmission and control signaling interaction between the model requesting node and the model training node.
The first transmission control module 130 is configured to formulate a data transmission scheme according to the characteristics of the data to be transmitted and the wireless communication conditions, and to package the data to be transmitted according to the data transmission scheme. The transmission control module is only required in embodiments where the model request node is a base station and the model training node is a terminal.
The data processing and storage module is configured to manage local data, generate training sample characteristic information, collect data to generate a local training data set, and store the data set.
The model training and compression module is configured to perform model training using the local data set and to compress the model according to the information required for model compression obtained during training.
Fig. 13 is a block diagram illustrating a training apparatus 200 according to an exemplary embodiment. Referring to fig. 13, the apparatus is applied to a second node and includes a second network communication module 210, a second transmission control module 220, a service management module 230, and a model calculation and update module 240.
And a second network communication module 210, configured to send a model training request. The model training request comprises model compression parameters, the model compression parameters are used for compressing the first training model to obtain a first compression model, and the first training model is obtained through training based on the model training request.
In embodiments of the present disclosure, the model compression parameters include a model training mode; the model training modes include a single-training-node mode for training a single first training model and a multi-training-node mode for training a plurality of first training models.
The number of first training models is determined based on the model training mode.
In an embodiment of the present disclosure, the second network communication module 210 is further configured to receive a second indication message, where the second indication message includes a number of first compression models corresponding to the model training mode.
In the embodiment of the present disclosure, the second network communication module 210 is further configured to send a third indication message, where the third indication message includes an indication of determining a training model.
In the embodiment of the present disclosure, the model training mode includes the multi-training-node mode, and the second network communication module 210 is further configured to send a fourth indication message. The fourth indication message is used to indicate a third compression model, where the third compression model is obtained by performing federated averaging on the first compression models based on the number of first training models.
In this embodiment of the present disclosure, the second network communication module 210 is further configured to send a fifth indication message, where the fifth indication message is used to indicate that training of the first compression model is finished.
In the disclosed embodiment, the apparatus further comprises a traffic management module 230.
The service management module 230 is configured to receive the subscription requirement and send a model training request based on the subscription requirement.
The second network communication module 210 is configured to perform data transmission and control signaling interaction between the model requesting node and the model training node.
The second transmission control module 220 is configured to formulate a data transmission scheme according to the characteristics of the data to be transmitted and the wireless communication conditions, and to package the data to be transmitted according to the data transmission scheme. The transmission control module is only required in embodiments where the model request node is a base station and the model training node is a terminal.
The service management module 230 is further configured to process the model/analysis subscription request, initiate a model training request to the model training node, formulate the model structure, model training mode, and model compression options, and check whether the obtained model meets the model/analysis subscription requirement.
The model calculation and update module 240 is configured to perform federated averaging on the compression models sent by the plurality of model training nodes in the multi-training-node mode to obtain a global model, and to distribute the global model to the model training nodes.
In wireless-network-oriented deep learning model training and compression, the model training node device is responsible for responding to the model training request of the model request node and reporting local resource information, and for selecting an appropriate model compression option and performing model training and compression according to the model training mode and the selected model compression option.
With regard to the apparatus in the above embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be described in detail here.
Fig. 14 is a block diagram illustrating an apparatus 300 for training, according to an example embodiment. For example, the apparatus 300 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 14, the apparatus 300 may include one or more of the following components: a processing component 302, a memory 304, a power component 306, a multimedia component 308, an audio component 310, an input/output (I/O) interface 312, a sensor component 314, and a communication component 316.
The processing component 302 generally controls overall operation of the apparatus 300, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing component 302 may include one or more processors 320 to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component 302 may include one or more modules that facilitate interaction between the processing component 302 and other components. For example, the processing component 302 may include a multimedia module to facilitate interaction between the multimedia component 308 and the processing component 302.
The memory 304 is configured to store various types of data to support operations at the apparatus 300. Examples of such data include instructions for any application or method operating on device 300, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 304 may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
Power components 306 provide power to the various components of device 300. The power components 306 may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the apparatus 300.
The multimedia component 308 includes a screen that provides an output interface between the apparatus 300 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may not only sense the boundary of a touch or swipe action, but also detect the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 308 includes a front camera and/or a rear camera. The front camera and/or the rear camera may receive external multimedia data when the apparatus 300 is in an operation mode, such as a photographing mode or a video mode. Each of the front camera and the rear camera may be a fixed optical lens system or have focal length and optical zoom capability.
The audio component 310 is configured to output and/or input audio signals. For example, audio component 310 includes a Microphone (MIC) configured to receive external audio signals when apparatus 300 is in an operating mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signals may further be stored in the memory 304 or transmitted via the communication component 316. In some embodiments, audio component 310 also includes a speaker for outputting audio signals.
The I/O interface 312 provides an interface between the processing component 302 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 314 includes one or more sensors for providing various aspects of status assessment for the device 300. For example, sensor assembly 314 may detect the open/closed status of device 300, the relative positioning of components, such as a display and keypad of device 300, the change in position of device 300 or a component of device 300, the presence or absence of user contact with device 300, the orientation or acceleration/deceleration of device 300, and the change in temperature of device 300. Sensor assembly 314 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 314 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 314 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 316 is configured to facilitate wired or wireless communication between the apparatus 300 and other devices. The device 300 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 316 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component 316 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 300 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors, or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as the memory 304 comprising instructions, executable by the processor 320 of the apparatus 300 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Fig. 15 is a block diagram illustrating an apparatus 400 for training according to an example embodiment. For example, the apparatus 400 may be provided as a server. Referring to fig. 15, apparatus 400 includes a processing component 422, which further includes one or more processors, and memory resources, represented by memory 432, for storing instructions, such as applications, that are executable by processing component 422. The application programs stored in memory 432 may include one or more modules that each correspond to a set of instructions. Further, the processing component 422 is configured to execute instructions to perform the training method described above.
The apparatus 400 may also include a power component 426 configured to perform power management of the apparatus 400, a wired or wireless network interface 450 configured to connect the apparatus 400 to a network, and an input/output (I/O) interface 458. The apparatus 400 may operate based on an operating system stored in the memory 432, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
It is further understood that "a plurality" in this disclosure means two or more; other quantifying terms are to be understood analogously. "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate that A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the associated objects before and after it are in an "or" relationship. The singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It will be further understood that the terms "first," "second," and the like, are used to describe various information and should not be limited by these terms. These terms are only used to distinguish one type of information from another and do not denote a particular order or importance. Indeed, the terms "first," "second," etc. are used interchangeably throughout. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present disclosure.
It will be further appreciated that while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (21)

  1. A training method, applied to a first node, the method comprising:
    training a first training model in response to receiving a model training request, wherein the model training request comprises model compression parameters;
    and obtaining a first compression model of the first training model based on the first training model and the model compression parameters.
  2. The training method of claim 1, wherein the model compression parameters comprise a plurality of model compression options;
    obtaining a first compression model of the first training model based on the first training model and the model compression parameters, including:
    determining a first model compression option in the plurality of model compression options, and compressing the first training model based on the first model compression option to obtain a second compression model;
    determining a first loss function according to the output of the first training model, the output of the second compression model and a sample parameter set used for training the first training model;
    and updating the second compression model parameters based on the first loss function to obtain the first compression model.
  3. A training method according to claim 2, wherein the determining a first loss function from the output of the first training model, the output of the second compression model, and the set of sample parameters used to train the first training model comprises:
    determining a first cross entropy between the output of the second compression model and the set of sample parameters, and determining a first relative entropy divergence between the output of the second compression model and the output of the first training model;
    determining the first loss function based on the first cross entropy and a first relative entropy divergence.
  4. A training method as claimed in claim 2 or 3, characterized in that the method further comprises:
    determining a second loss function for updating the first training model parameters based on the output of the first training model, the output of the second compression model, and the set of sample parameters used to train the first training model.
  5. The training method of claim 4, wherein determining a second loss function for updating the first training model parameters based on the output of the first training model, the output of the second compression model, and the set of sample parameters used to train the first training model comprises:
    determining a second cross entropy between the output of the first training model and the set of sample parameters, and determining a second relative entropy divergence between the output of the first training model and the output of the second compression model;
    determining the second loss function based on the second cross entropy and a second relative entropy divergence.
  6. A training method according to claim 1, wherein the model compression parameters comprise model training patterns, the model training patterns including a single-training-node pattern for training a single first training model and a multi-training-node pattern for training a plurality of first training models;
    the number of first training models is determined based on the model training patterns.
  7. The training method of claim 6, further comprising:
    sending a second indication message, the second indication message comprising a number of first compression models corresponding to the model training patterns.
  8. Training method according to claim 1, characterized in that the method further comprises:
    receiving a third indication message, the third indication message comprising an indication to determine a training model.
  9. A training method as recited in claim 6, wherein the model training patterns comprise a multi-training-node pattern, the method further comprising:
    receiving a fourth indication message; the fourth indication message is used for indicating a third compression model, and the third compression model is obtained by performing federated averaging on the first compression models based on the number of the first training models;
    based on the third compression model, the model compression parameters are re-determined, and the first compression model is updated based on the re-determined model compression parameters.
  10. Training method according to claim 1, characterized in that the method further comprises:
    receiving a fifth indication message, wherein the fifth indication message is used for indicating that the training of the first compression model is finished.
  11. A training method applied to a second node, the method comprising:
    sending a model training request;
    the model training request comprises model compression parameters, the model compression parameters are used for compressing a first training model to obtain a first compression model, and the first training model is obtained based on the model training request.
  12. A training method as recited in claim 11, wherein the model compression parameters comprise model training patterns, the model training patterns comprising a single-training-node pattern for training a single first training model and a multi-training-node pattern for training a plurality of first training models;
    the number of first training models is determined based on the model training patterns.
  13. Training method according to claim 12, characterized in that the method further comprises:
    receiving a second indication message, the second indication message comprising a number of first compression models corresponding to the model training patterns.
  14. A training method as recited in claim 11, the method further comprising:
    sending a third indication message, the third indication message comprising an indication to determine a training model.
  15. A training method as recited in claim 12, wherein the model training patterns comprise a multi-training-node pattern, the method further comprising:
    sending a fourth indication message; the fourth indication message is used for indicating a third compression model, and the third compression model is obtained by performing federated averaging on the first compression models based on the number of the first training models.
  16. A training method as recited in claim 11, the method further comprising:
    and sending a fifth indication message, wherein the fifth indication message is used for indicating the end of training the first compression model.
  17. Training method according to claim 11, characterized in that the method further comprises:
    and receiving a subscription requirement, and sending a model training request based on the subscription requirement.
  18. An exercise apparatus, applied to a first node, the apparatus comprising:
    the model training module is used for training a first training model in response to receiving a model training request, wherein the model training request comprises model compression parameters;
    and the model compression module is used for obtaining a first compression model of the first training model based on the first training model and the model compression parameters.
  19. An exercise apparatus, for use in a second node, the apparatus comprising:
    the network communication module is used for sending a model training request;
    the model training request comprises model compression parameters, the model compression parameters are used for compressing a first training model to obtain a first compression model, and the first training model is obtained based on the model training request.
  20. An exercise device, comprising:
    a processor;
    a memory for storing processor-executable instructions;
    wherein the processor is configured to: performing the training method of any one of claims 1-10, or performing the training method of any one of claims 11-17.
  21. A non-transitory computer readable storage medium, wherein instructions, when executed by a processor of a mobile terminal, enable the mobile terminal to perform the training method of any one of claims 1-10 or to perform the training method of any one of claims 11-17.
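Claims 3 and 5 define a complementary pair of loss functions: one updates the second compression model, the other the first training model, each mixing a cross entropy against the sample parameter set with a relative entropy divergence toward the other model's output. Written out, with p_c and p_t denoting the outputs of the second compression model and the first training model, y the sample labels, and an assumed weighting factor λ that the claims do not specify:

```latex
% Claim 3: first loss function, used to update the second compression model
\mathcal{L}_1 = \mathrm{CE}(p_c, y) + \lambda \, D_{\mathrm{KL}}\!\left(p_c \,\|\, p_t\right)

% Claim 5: second loss function, used to update the first training model
\mathcal{L}_2 = \mathrm{CE}(p_t, y) + \lambda \, D_{\mathrm{KL}}\!\left(p_t \,\|\, p_c\right)
```

This is a mutual-distillation structure: the compressed model is pulled toward both the labels and the original model, while the original model is in turn regularized toward its compressed counterpart.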
CN202080003605.XA 2020-11-23 2020-11-23 Training method, training device and storage medium Pending CN114793453A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/130896 WO2022104799A1 (en) 2020-11-23 2020-11-23 Training method, training apparatus, and storage medium

Publications (1)

Publication Number Publication Date
CN114793453A true CN114793453A (en) 2022-07-26

Family

ID=81708237

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080003605.XA Pending CN114793453A (en) 2020-11-23 2020-11-23 Training method, training device and storage medium

Country Status (2)

Country Link
CN (1) CN114793453A (en)
WO (1) WO2022104799A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023098860A1 (en) * 2021-12-02 2023-06-08 华为技术有限公司 Communication method and communication apparatus

Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
DE112019005831T5 (en) * 2018-12-18 2021-08-12 Movidius Ltd. COMPRESSION IN NEURAL NETWORKS
CN109784474B (en) * 2018-12-24 2020-12-11 宜通世纪物联网研究院(广州)有限公司 Deep learning model compression method and device, storage medium and terminal equipment
CN109978144B (en) * 2019-03-29 2021-04-13 联想(北京)有限公司 Model compression method and system
CN111898484A (en) * 2020-07-14 2020-11-06 华中科技大学 Method and device for generating model, readable storage medium and electronic equipment


Also Published As

Publication number Publication date
WO2022104799A1 (en) 2022-05-27

Similar Documents

Publication Publication Date Title
CN111264068B (en) Positioning processing method, positioning processing device, base station, terminal equipment and storage medium
US20220201650A1 (en) Paging response method and device, paging method and device
CN107926000B (en) Information receiving and transmitting method, device and system
US20230262665A1 (en) Method and apparatus for determining resource, and storage medium
CN114097259B (en) Communication processing method, communication processing device and storage medium
US20240023082A1 (en) Data processing method and apparatus, communication device, and storage medium
CN110383749B (en) Control channel transmitting and receiving method, device and storage medium
CN115244965A (en) Beam measurement method, beam measurement device, and storage medium
JP2024522211A (en) Information transmission method, apparatus, communication device and storage medium
CN115349279A (en) AI model determining method, device, communication equipment and storage medium
US20220361212A1 (en) Method for transmitting dci, communication device and storage medium
CN111466127B (en) Processing method, device and storage medium for enhancing uplink coverage
CN114793453A (en) Training method, training device and storage medium
CN115443675A (en) Measurement interval processing method, device, communication equipment and storage medium
CN114916253A (en) Wireless communication method, device, communication equipment and storage medium
US11387923B2 (en) Information configuration method and apparatus, method and apparatus for determining received power, and base station
CN113228552B (en) Beam measurement method, device, communication equipment and storage medium
CN114080852A (en) Method and device for reporting capability information, communication equipment and storage medium
EP4093067A1 (en) Downlink control information transmission method and apparatus, communication device and storage medium
CN113711674A (en) Information configuration method, information configuration device and storage medium
CN112689965A (en) Transmission method, transmission device and storage medium
CN114391272A (en) Communication method, communication device and storage medium
CN114258724A (en) Logical channel multiplexing method and apparatus, communication device, and storage medium
CN111566985B (en) Transmission processing method, device, user equipment, base station and storage medium
CN113383589B (en) Data transmission method, device and computer storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination