WO2022104550A1 - Method for model distillation training, related apparatus and device, and readable storage medium - Google Patents

Method for model distillation training, related apparatus and device, and readable storage medium

Info

Publication number
WO2022104550A1
Authority
WO
WIPO (PCT)
Prior art keywords
training
neural network
network model
distillation
information
Prior art date
Application number
PCT/CN2020/129478
Other languages
English (en)
French (fr)
Inventor
闫文
于益俊
秦东润
辛阳
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to CN202080106943.6A (published as CN116438546A)
Priority to EP20961841.2A (published as EP4227857A4)
Priority to PCT/CN2020/129478 (published as WO2022104550A1)
Publication of WO2022104550A1
Priority to US18/318,616 (published as US20230289600A1)



Classifications

    • GPHYSICS — G06 COMPUTING; CALCULATING OR COUNTING — G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS — G06N3/00 Computing arrangements based on biological models — G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons, using electronic means
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06N3/084 Backpropagation, e.g. using gradient descent

Definitions

  • the present application relates to the technical field of model distillation training, and in particular, to a method for model distillation training, related devices and equipment, and readable storage media.
  • central network elements and edge network elements are a common application scenario in the network.
  • Examples of such network elements include the radio access network (Radio Access Network, RAN), user equipment (UE), and the network data analytics function (NWDAF).
  • the central network element can send the trained neural network model to the edge network element.
  • Figure 1 is a schematic diagram of a central network element and an edge network element deploying neural networks. As shown in Figure 1, because the central network element has powerful computing capability and stores a large amount of data resources, it usually deploys a large-scale neural network (large network) covering all categories of scenarios and trains that neural network.
  • Edge network elements have limited storage and computing resources and usually focus on only a few specific categories, so they usually deploy small neural networks (small networks).
  • At present, there are mainly two ways for edge network elements to deploy neural networks. In the first, the central network element directly delivers the large network to the edge network element, which deploys it; because the edge network element's storage and computing resources are limited, deploying a neural network in this way occupies those limited resources. In the second, the central network element performs network model distillation to obtain a unified small network and delivers it to the edge network element, which deploys it; since the central network element does not know each edge network element's storage resources, computing capability, or categories of interest, it cannot distill a small network model adapted to each edge network element.
  • The mismatch between the small network model obtained by the central network element's unified distillation training and the edge network elements causes the following problems: the categories supported by the small network model are all categories, while an edge network element is concerned with only a few specific ones.
  • Because the computing power and storage resources required by the all-category small network model exceed the capabilities of the edge network element, the small network model occupies the edge network element's storage space and running space, affecting the edge network element's ability to complete other tasks.
  • The edge network element may also be unable to fully run the all-category small network model, so that the small network model achieves a low degree of completion and accuracy when performing functions such as portrait recognition and traffic recognition.
  • In view of this, the present application provides a method for model distillation training, which solves the problem that deploying the large network delivered by the central network element occupies the limited storage resources of the edge network element, and also solves the problem that the unified small network delivered by the central network element is not adapted to the edge network element.
  • In a first aspect, an embodiment of the present application provides a method for model distillation training, including:
  • the first device designs the first neural network model
  • the first device sends first configuration information to the second device, where the first configuration information is used for the second device to configure the first reference neural network model;
  • the first device sends first data information to the second device, where the first data information includes first sample data used by the first reference neural network model for distillation training;
  • the first device receives first indication information returned by the second device, and trains the first neural network model with the first indication information, where the first indication information is information obtained by inputting the first sample data into the first reference neural network model.
  • The first device designs the first neural network model and sends the first configuration information and the first data information to the second device, so that the first device can individually design and train the first neural network model. This achieves a balance between the size of the first neural network model and its accuracy, improves distillation training efficiency, solves the problem that the first device, with its limited storage resources, cannot store the large network delivered by the second device, and also solves the problem that a small network obtained by unified distillation training and delivered by the second device is not adapted to the first device.
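  • As an illustration of the student-side flow above, the following is a minimal sketch in Python, assuming a generic message link and a standard soft-label distillation loss; the message names, the link abstraction, and the loss weighting are assumptions for illustration and are not mandated by this application.

      import torch
      import torch.nn.functional as F

      def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
          # Standard knowledge-distillation loss (an assumed choice): KL between softened
          # student and teacher distributions, plus cross-entropy on the hard labels.
          soft = F.kl_div(
              F.log_softmax(student_logits / T, dim=-1),
              F.softmax(teacher_logits / T, dim=-1),
              reduction="batchmean",
          ) * (T * T)
          hard = F.cross_entropy(student_logits, labels)
          return alpha * soft + (1 - alpha) * hard

      def first_device_training_round(link, student, optimizer, first_sample_data, labels):
          # First configuration information: lets the second device configure the
          # first reference neural network model (the teacher).
          link.send("first_configuration_information", {"model_id": "ref-1", "temperature": 2.0})
          # First data information: carries the first sample data used for distillation.
          link.send("first_data_information", {"samples": first_sample_data.tolist()})
          # First indication information: the teacher's outputs for the first sample data.
          indication = link.receive("first_indication_information")
          teacher_logits = torch.as_tensor(indication["outputs"])
          # Train the first neural network model (the student) with the indication information.
          student_logits = student(first_sample_data)
          loss = distillation_loss(student_logits, teacher_logits, labels)
          optimizer.zero_grad()
          loss.backward()
          optimizer.step()
          return loss.item()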
  • In this way, the problems caused in the prior art by the mismatch between the small neural network model obtained by the central network element's unified distillation training and the edge network elements can be solved, including: the categories supported by the small neural network model are all categories, while the edge network element focuses on only a few specific categories.
  • Because the computing power and storage resources required by the all-category small neural network model exceed the capacity of the edge network element, the small neural network model occupies the edge network element's storage space and running space, affects the edge network element's other tasks, and thereby reduces the edge network element's working performance.
  • The edge network element may also be unable to fully run the all-category small neural network model, which causes the technical problem of a low degree of completion and accuracy when the small neural network model performs functions such as portrait recognition and traffic recognition.
  • In a possible implementation, before the first device designs the first neural network model, the method further includes:
  • the first device sends a first training request to the second device, where the first training request includes a first training type ID, and the first training type ID is used to indicate the functional type of the neural network model to be distilled and trained by the first device;
  • the first device receives a first response sent by the second device, where the first response includes first storage information; the first storage information is used to indicate the size of the storage space occupied by the first reference neural network model; a second training type ID is used to indicate the functional type of neural network model for which the second device supports distillation training;
  • the first device sends first inquiry information to the second device based on the first storage information, where the first inquiry information includes first distillation inquiry information and first distillation capability information; the first distillation inquiry information is used by the first device to ask whether the second device supports distillation training of the neural network model and, if it does, which distillation training modes it supports; the first distillation capability information includes one or more of: the first device's capability to design a neural network model, its capability to perform distillation training of the neural network model, and whether it has distillation training data;
  • the first device receives a second response from the second device, where the second response is used to indicate whether the second device supports the first device in distillation training of the neural network model and, if it does, which distillation training modes it supports.
  • The first device sends the first training request to the second device, and the second device provides a corresponding reference neural network model to the first device according to the first training request, which helps the first device design the first neural network model.
  • This achieves a balance between the size of the first neural network model and its accuracy, improves distillation training efficiency, solves the problem that the first device, with its limited storage resources, cannot store the large network delivered by the second device, and also solves the problem that a small network obtained by unified distillation training and delivered by the second device is not adapted to the first device.
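  • The following sketch shows one possible shape for the messages exchanged in this negotiation; the field names follow the description above, while the concrete types and example values are assumptions.

      from dataclasses import dataclass, field
      from typing import List

      @dataclass
      class FirstTrainingRequest:
          first_training_type_id: str          # functional type of the model to be distilled

      @dataclass
      class FirstResponse:
          first_storage_information: int       # bytes occupied by the first reference model
          second_training_type_id: str         # functional type the second device supports

      @dataclass
      class FirstDistillationCapabilityInformation:
          can_design_model: bool               # first device can design a neural network model
          can_distill_train: bool              # first device can run distillation training
          has_training_data: bool              # first device holds distillation training data

      @dataclass
      class FirstInquiryInformation:
          first_distillation_inquiry_information: str
          first_distillation_capability_information: FirstDistillationCapabilityInformation

      @dataclass
      class SecondResponse:
          supports_distillation_training: bool
          supported_modes: List[str] = field(default_factory=list)   # empty if unsupported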
  • In a possible implementation, before the first device sends the first configuration information to the second device, the method further includes:
  • the first device sends a first attention category list to the second device, where the first attention category list includes the set of categories that the first device's distillation training is concerned with; this set of concerned categories is a subset of the category set in a first category list, and the first category list includes the set of preset categories of the first reference neural network model.
  • The first device sends the first attention category list to the second device, which helps the second device filter, according to the categories in the first attention category list, the data obtained after distillation training on the first data information sent by the first device, so as to obtain the first indication information. The first indication information corresponds to the categories in the first attention category list, which helps the first device distill and train an adapted first neural network model based on the first indication information, further reduces communication overhead without affecting the training effect of the first neural network model, and saves the network resources used for separate distillation training.
  • In a possible implementation, the first indication information is obtained by the second device computing the first sample data through the first reference neural network model and filtering the processed results based on the first attention category list.
  • The second device filters the data obtained after distillation training according to the categories in the first attention category list to obtain the first indication information, which corresponds to the categories in the first attention category list.
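  • A minimal sketch of how the second device could produce such category-filtered first indication information, assuming the teacher's outputs are indexed by the positions of the categories in the first category list (the index-based filtering scheme is an assumption for illustration):

      import torch

      @torch.no_grad()
      def build_first_indication_information(reference_model, first_sample_data,
                                             first_category_list, first_attention_category_list):
          # Run the first reference neural network model (teacher) on the first sample data.
          logits = reference_model(first_sample_data)        # [batch, num_preset_categories]
          # Keep only the output columns for the categories the first device cares about.
          keep = [first_category_list.index(c) for c in first_attention_category_list]
          filtered = logits[:, keep]
          return {"categories": first_attention_category_list, "outputs": filtered.tolist()}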
  • In a possible implementation, the first device designs the first neural network model as follows:
  • the first device sends a first network structure request to the second device, where the first network structure request is used by the first device to obtain the structure information of the first reference neural network model from the second device;
  • the first device receives a first structure request response sent by the second device, where the first structure request response includes the structure information of the first reference neural network model;
  • the first device designs the first neural network model based on the structure information of the first reference neural network model.
  • After sending the first network structure request to the second device, the first device receives the first structure request response sent by the second device and obtains the structure information of the first reference neural network model. This helps the first device design the first neural network model according to the structure information of the first reference neural network model, achieves a balance between the size of the first neural network model and its accuracy, improves distillation training efficiency, solves the problem that the first device, with its limited storage resources, cannot store the large network delivered by the second device, and also solves the problem that a small network obtained by unified distillation training and delivered by the second device is not adapted to the first device.
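  • As one way to picture this step, the sketch below derives a smaller student from the reference model's structure information by scaling down the hidden-layer widths; the structure-information format and the width-scaling heuristic are assumptions, not part of this application.

      import torch.nn as nn

      def design_first_model(reference_structure, width_factor=0.25):
          # reference_structure is assumed to look like {"layer_sizes": [1024, 512, 256, 10]}.
          sizes = reference_structure["layer_sizes"]
          # Keep the input and output sizes, shrink the hidden layers.
          hidden = [max(8, int(s * width_factor)) for s in sizes[1:-1]]
          student_sizes = [sizes[0]] + hidden + [sizes[-1]]
          layers = []
          for i in range(len(student_sizes) - 1):
              layers.append(nn.Linear(student_sizes[i], student_sizes[i + 1]))
              if i < len(student_sizes) - 2:
                  layers.append(nn.ReLU())
          return nn.Sequential(*layers)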
  • the method further includes:
  • the first device sends a first training result notification to the second device, where the first training result notification includes completion-degree information about the first device's distillation training of the first neural network model and feedback information about the distillation-trained first neural network model.
  • The first training result notification is used to inform the second device whether training of the first neural network model is completed and to feed back information about the first neural network model to the second device. The second device determines, according to the first training result notification, whether to release the communication resources used for distillation training between the first device and the second device, so as to avoid occupying other channel resources of the first device and the second device.
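  • A small sketch of the notification and the release decision it enables on the second device; the field names and the release call are assumptions for illustration.

      from dataclasses import dataclass

      @dataclass
      class FirstTrainingResultNotification:
          training_completed: bool       # completion-degree information
          final_accuracy: float          # feedback about the distilled first model

      def on_training_result(notification, session):
          # Release the distillation-training communication resources once the first
          # device reports that training of the first neural network model is complete.
          if notification.training_completed:
              session.release_distillation_resources()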
  • In a second aspect, an embodiment of the present application provides a method for model distillation training, including:
  • the second device receives first configuration information sent by the first device, and configures the first reference neural network model with the first configuration information;
  • the second device receives first data information sent by the first device, where the first data information includes first sample data used by the first reference neural network model for distillation training;
  • the second device sends first indication information to the first device, where the first indication information is used by the first device to train the first neural network model; the first indication information is information obtained by inputting the first sample data into the first reference neural network model.
  • The second device receives the first configuration information and the first data information sent by the first device, which helps the second device obtain the first indication information based on them so that the first device can train the first neural network model with the first indication information. This achieves a balance between the size of the first neural network model and its accuracy, improves distillation training efficiency, solves the problem that the first device, with its limited storage resources, cannot store the large network delivered by the second device, and also solves the problem that a small network obtained by unified distillation training and delivered by the second device is not adapted to the first device.
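  • A minimal teacher-side sketch of this second-device flow, assuming the same message link abstraction as above and an assumed configure() hook on the reference model:

      import torch

      def second_device_serve_one_round(link, reference_model):
          # Configure the first reference neural network model from the first configuration information.
          config = link.receive("first_configuration_information")
          reference_model.configure(config)                  # assumed configuration hook
          # Receive the first data information carrying the first sample data.
          data = link.receive("first_data_information")
          samples = torch.as_tensor(data["samples"])
          # Compute the teacher outputs and return them as the first indication information.
          with torch.no_grad():
              outputs = reference_model(samples)
          link.send("first_indication_information", {"outputs": outputs.tolist()})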
  • In a possible implementation, before the second device receives the first configuration information sent by the first device and configures the first reference neural network model with it, the method further includes:
  • the second device receives the first training request sent by the first device, where the first training request includes a first training type ID, and the first training type ID is used to indicate the functional type of the neural network model to be distilled and trained by the first device;
  • if a second training type ID is consistent with the first training type ID, the second device generates a first response based on the first training request and sends the first response to the first device, where the second training type ID is used to indicate the functional type of neural network model for which the second device supports distillation training; the first response includes first storage information, and the first storage information is used to indicate the size of the storage space occupied by the first reference neural network model;
  • the second device receives first inquiry information sent by the first device, where the first inquiry information includes first distillation inquiry information and first distillation capability information; the first distillation inquiry information is used by the first device to ask whether the second device supports distillation training of the neural network model and, if it does, which distillation training modes it supports; the first distillation capability information includes one or more of: the first device's capability to design a neural network model, its capability to perform distillation training of the neural network model, and whether it has distillation training data;
  • the second device generates a second response based on the first distillation inquiry information and sends the second response to the first device, where the second response is used to indicate whether the second device supports the first device in distillation training of the neural network model and, if it does, which distillation training modes it supports.
  • The second device receives the first training request and the first inquiry information sent by the first device and sends corresponding responses to the first device, which helps the first device design the first neural network model. This achieves a balance between the size of the first neural network model and its accuracy, improves distillation training efficiency, solves the problem that the first device, with its limited storage resources, cannot store the large network delivered by the second device, and also solves the problem that a small network obtained by unified distillation training and delivered by the second device is not adapted to the first device.
  • In a possible implementation, before the second device receives the first configuration information sent by the first device, the method further includes:
  • the second device receives a first attention category list sent by the first device, where the first attention category list includes the set of categories that the first device's distillation training is concerned with; this set of concerned categories is a subset of the category set in the first category list, and the first category list includes the set of preset categories of the first reference neural network model.
  • The second device receives the first attention category list sent by the first device, which helps the second device obtain first indication information based on the first attention category list, where the first indication information corresponds to the categories that the first device is concerned with. The first device trains the first neural network model with the first indication information, and the trained first neural network model is a neural network model adapted to the first device. This achieves a balance between the size of the first neural network model and its accuracy, improves distillation training efficiency, solves the problem that the first device, with its limited storage resources, cannot store the large network delivered by the second device, and also solves the problem that a small network obtained by unified distillation training and delivered by the second device is not adapted to the first device.
  • In a possible implementation, the first indication information is obtained by the second device computing the first sample data through the first reference neural network model and filtering the processed results based on the first attention category list.
  • The second device sends the first indication information obtained based on the first attention category list to the first device, which helps the first device obtain a better-adapted first neural network model based on the first indication information. This solves the problem that the first device, with its limited storage resources, cannot store the large network delivered by the second device, and also solves the problem that a small network obtained by unified distillation training and delivered by the second device is not adapted to the first device.
  • In a possible implementation, before the second device receives the first configuration information sent by the first device, the method further includes:
  • the second device receives a first network structure request sent by the first device, where the first network structure request is used by the first device to obtain the structure information of the first reference neural network model from the second device;
  • the second device sends a first structure request response to the first device based on the first network structure request, where the first structure request response includes the structure information of the first reference neural network model.
  • The second device sends the structure information of the first reference neural network model to the first device, which helps the first device design the first neural network model according to that structure information. This achieves a balance between the size of the first neural network model and its accuracy, improves distillation training efficiency, solves the problem that the first device, with its limited storage resources, cannot store the large network delivered by the second device, and also solves the problem that a small network obtained by unified distillation training and delivered by the second device is not adapted to the first device.
  • the method further includes:
  • the second device receives the first training result notification sent by the first device, where the first training result notification includes completion-degree information about the first device's distillation training of the first neural network model and feedback information about the distillation-trained first neural network model.
  • The second device receives the first training result notification, which helps the second device determine, according to the first training result notification, whether to release the communication resources used for distillation training between the first device and the second device, so as to avoid occupying other channel resources of the first device and the second device.
  • In a third aspect, an embodiment of the present application provides a method for model distillation training, including:
  • the third device receives registration information sent by the second device, where the registration information includes a third training type ID, a third neural network model ID, second storage information, a second category list, and training response information; the training response information is used to indicate whether the second device supports distillation training of the neural network model and, if it does, which distillation training modes it supports; the third training type ID is used to indicate the functional type of neural network model for which the second device supports distillation training;
  • the third device receives a second training request sent by the first device, where the second training request includes a fourth training type ID, second distillation inquiry information, and second distillation capability information; the fourth training type ID is used to indicate the functional type of the neural network model to be distilled and trained by the first device;
  • if the fourth training type ID is consistent with the third training type ID, the third device generates a third response based on the second training request and sends the third response to the first device, where the third response includes the training response information, the third neural network model ID, the second storage information, and the second category list;
  • the third device receives a distillation notification sent by the first device, where the distillation notification is used to indicate whether the first device and the second device are successfully matched.
  • The third device receives the registration information sent by the second device and the second training request sent by the first device, which helps the third device analyze the registration information and the second training request and establish a distillation-training communication connection between the first device and the second device, so as to achieve a balance between the size of the second neural network model and its accuracy and thereby improve distillation training efficiency.
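  • A sketch of the matching role of the third device: it stores the second device's registration information and answers a first device's second training request when the training type IDs match. The in-memory registry and the field types are assumptions for illustration.

      from dataclasses import dataclass
      from typing import Dict, List, Optional

      @dataclass
      class RegistrationInformation:
          third_training_type_id: str
          third_neural_network_model_id: str
          second_storage_information: int
          second_category_list: List[str]
          training_response_information: dict    # whether/how distillation training is supported

      class ThirdDevice:
          def __init__(self):
              self.registry: Dict[str, RegistrationInformation] = {}

          def register(self, info: RegistrationInformation):
              # Keyed by the third training type ID reported by the second device.
              self.registry[info.third_training_type_id] = info

          def handle_second_training_request(self, fourth_training_type_id: str) -> Optional[dict]:
              info = self.registry.get(fourth_training_type_id)
              if info is None:
                  return None    # no registered second device supports this functional type
              # Third response: training response info, model ID, storage info, category list.
              return {
                  "training_response_information": info.training_response_information,
                  "third_neural_network_model_id": info.third_neural_network_model_id,
                  "second_storage_information": info.second_storage_information,
                  "second_category_list": info.second_category_list,
              }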
  • In a fourth aspect, an embodiment of the present application provides a method for model distillation training, including:
  • the first device sends a second training request to the third device, where the second training request includes a fourth training type ID, second distillation inquiry information, and second distillation capability information; the fourth training type ID is used to indicate the functional type of the neural network model to be distilled and trained by the first device;
  • the first device receives a third response sent by the third device, where the third response includes training response information, a third neural network model ID, second storage information, and a second category list; the third training type ID is used to indicate the functional type of neural network model for which the second device supports distillation training;
  • the first device sends a distillation notification to the third device, where the distillation notification is used to indicate whether the first device and the second device are successfully matched.
  • The first device sends the second training request to the third device, and the third device sends the third response to the first device based on the second training request, which helps the third device establish a distillation-training communication connection between the first device and the second device, thereby improving distillation training efficiency.
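  • The first-device side of this matching flow might look like the sketch below; the message names, the link abstraction, and the storage-budget matching criterion are assumptions for illustration.

      def first_device_match(link, fourth_training_type_id, storage_budget_bytes):
          # Second training request: functional type plus distillation inquiry and capability info.
          link.send("second_training_request", {
              "fourth_training_type_id": fourth_training_type_id,
              "second_distillation_inquiry_information": "supported_modes?",
              "second_distillation_capability_information": {"can_design_model": True,
                                                             "has_training_data": True},
          })
          third_response = link.receive("third_response")
          # One possible (assumed) matching criterion: a second device supports the requested
          # functional type and its reference model fits the first device's storage budget.
          matched = (third_response is not None
                     and third_response["second_storage_information"] <= storage_budget_bytes)
          # Distillation notification: indicates whether the first and second device matched.
          link.send("distillation_notification", {"match_successful": matched})
          return matched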
  • the method further includes:
  • the first device designs a second neural network model;
  • the first device sends second configuration information to the second device, where the second configuration information is used for the second device to configure the second reference neural network model;
  • the first device sends second data information to the second device, where the second data information includes second sample data used by the second reference neural network model for distillation training;
  • the first device receives second indication information returned by the second device, and trains the second neural network model with the second indication information, where the second indication information is information obtained by inputting the second sample data into the second reference neural network model.
  • The first device designs the second neural network model, and the first device and the second device exchange the second configuration information and the second data information in this distillation-training communication process. This helps the second device acquire second indication information adapted to the first device, which in turn helps the first device use the second indication information to train a second neural network model adapted to it.
  • In a possible implementation, before the first device sends the second configuration information to the second device, the method further includes:
  • the first device sends a second attention category list to the second device, where the second attention category list includes the set of categories that the first device's distillation training is concerned with; this set of concerned categories is a subset of the category set in a second category list, and the second category list includes the set of preset categories of the second reference neural network model.
  • The first device sends the second attention category list to the second device, which helps the second device obtain second indication information based on the second attention category list; the second indication information corresponds to the categories in the second attention category list. The first device trains the second neural network model with the second indication information and can thereby obtain a neural network model adapted to the first device. This achieves a balance between the size of the second neural network model and its accuracy, improves distillation training efficiency, solves the problem that the first device, with its limited storage resources, cannot store the large network delivered by the second device, and also solves the problem that a small network obtained by unified distillation training and delivered by the second device is not adapted to the first device.
  • In a possible implementation, the second indication information is obtained by the second device computing the second sample data through the second reference neural network model and filtering the processed results based on the second attention category list.
  • The second device obtains the second indication information based on the second attention category list, and the second indication information corresponds to the categories in the second attention category list. The first device trains the second neural network model with the second indication information and can thereby obtain a neural network model adapted to the first device. This achieves a balance between the size of the second neural network model and its accuracy, improves distillation training efficiency, solves the problem that the first device, with its limited storage resources, cannot store the large network delivered by the second device, and also solves the problem that a small network obtained by unified distillation training and delivered by the second device is not adapted to the first device.
  • In a possible implementation, the first device designs the second neural network model as follows:
  • the first device sends a second network structure request to the second device, where the second network structure request is used by the first device to obtain the structure information of the second reference neural network model from the second device;
  • the first device receives a second structure request response sent by the second device, where the second structure request response includes the structure information of the second reference neural network model;
  • the first device designs the second neural network model based on the structure information of the second reference neural network model.
  • After sending the second network structure request to the second device, the first device receives the second structure request response sent by the second device and obtains the structure information of the second reference neural network model. This helps the first device design the second neural network model according to that structure information, achieves a balance between the size of the second neural network model and its accuracy, improves distillation training efficiency, solves the problem that the first device, with its limited storage resources, cannot store the large network delivered by the second device, and also solves the problem that a small network obtained by unified distillation training and delivered by the second device is not adapted to the first device.
  • In a possible implementation, after the first device sends the second data information to the second device, the method further includes:
  • the first device sends a second training result notification to the second device, where the second training result notification includes completion-degree information about the first device's distillation training of the second neural network model and feedback information about the distillation-trained second neural network model.
  • The second training result notification is used to inform the second device whether training of the second neural network model is completed and to feed back information about the second neural network model to the second device. The second device determines, according to the second training result notification, whether to release the communication resources used for distillation training between the first device and the second device, so as to avoid occupying other channel resources of the first device and the second device.
  • In a fifth aspect, an embodiment of the present application provides a method for model distillation training, including:
  • the second device sends registration information to the third device, where the registration information includes a third training type ID, a third neural network model ID, second storage information, a second category list, and training response information; the training response information is used to indicate whether the second device supports distillation training of the neural network model and, if it does, which distillation training modes it supports;
  • the second device receives second configuration information sent by the first device, and configures the second reference neural network model with the second configuration information;
  • the second device receives second data information sent by the first device, where the second data information includes second sample data used by the second reference neural network model for distillation training;
  • the second device sends second indication information to the first device, where the second indication information is used by the first device to train the second neural network model; the second indication information is information obtained by inputting the second sample data into the second reference neural network model.
  • The second device sends the registration information to the third device, which helps the third device quickly and effectively establish a distillation-training connection between the first device and the second device, improving distillation training efficiency.
  • The second device receives the second configuration information and the second data information sent by the first device, which helps the second device obtain the second indication information based on them so that the first device can train the second neural network model with the second indication information. This achieves a balance between the size of the second neural network model and its accuracy, improves distillation training efficiency, solves the problem that the first device, with its limited storage resources, cannot store the large network delivered by the second device, and also solves the problem that a small network obtained by unified distillation training and delivered by the second device is not adapted to the first device.
  • In a possible implementation, before the second device receives the second configuration information sent by the first device, the method further includes:
  • the second device receives a second attention category list sent by the first device, where the second attention category list includes the set of categories that the first device's distillation training is concerned with; this set of concerned categories is a subset of the category set in the second category list, and the second category list includes the set of preset categories of the second reference neural network model.
  • The second device receives the second attention category list sent by the first device, which helps the second device obtain second indication information based on the second attention category list, where the second indication information corresponds to the categories that the first device is concerned with. The first device trains the second neural network model with the second indication information, and the trained second neural network model is a neural network model adapted to the first device. This achieves a balance between the size of the second neural network model and its accuracy, improves distillation training efficiency, solves the problem that the first device, with its limited storage resources, cannot store the large network delivered by the second device, and also solves the problem that a small network obtained by unified distillation training and delivered by the second device is not adapted to the first device.
  • In a possible implementation, the second indication information is obtained by the second device computing the second sample data through the second reference neural network model and filtering the processed results based on the second attention category list.
  • The second device sends the second indication information obtained based on the second attention category list to the first device, which helps the first device obtain a better-adapted second neural network model based on the second indication information. This solves the problem that the first device, with its limited storage resources, cannot store the large network delivered by the second device, and also solves the problem that a small network obtained by unified distillation training and delivered by the second device is not adapted to the first device.
  • In a possible implementation, before the second device receives the second configuration information sent by the first device, the method further includes:
  • the second device receives a second network structure request sent by the first device, where the second network structure request is used by the first device to obtain the structure information of the second reference neural network model from the second device;
  • the second device sends a second structure request response to the first device based on the second network structure request, where the second structure request response includes the structure information of the second reference neural network model.
  • The second device sends the structure information of the second reference neural network model to the first device, which helps the first device design the second neural network model according to that structure information. This achieves a balance between the size of the second neural network model and its accuracy, improves distillation training efficiency, solves the problem that the first device, with its limited storage resources, cannot store the large network delivered by the second device, and also solves the problem that a small network obtained by unified distillation training and delivered by the second device is not adapted to the first device.
  • the method further includes:
  • the second device receives the second training result notification sent by the first device, where the second training result notification includes completion-degree information about the first device's distillation training of the second neural network model and feedback information about the distillation-trained second neural network model.
  • The second device receives the second training result notification, which helps the second device determine, according to the second training result notification, whether to release the communication resources used for distillation training between the first device and the second device, so as to avoid occupying other channel resources of the first device and the second device.
  • an embodiment of the present application provides a model distillation training device, where the model distillation training device is a first device, including:
  • a first neural network model design unit, configured to design the first neural network model if the second device supports distillation training of the neural network model for the first device;
  • a first configuration information sending unit configured to send the first configuration information to the second device
  • a first data information sending unit configured to send the first data information to the second device
  • a first indication information receiving unit configured to receive the first indication information returned by the second device
  • a first neural network model training unit configured to use the first indication information to train the first neural network model.
  • model distillation training device further includes:
  • a first training request sending unit used for sending a first training request to the second device before the first neural network model design unit designs the first neural network model
  • a first response receiving unit, configured to receive, after the first training request sending unit sends the first training request to the second device and if the second training type ID is consistent with the first training type ID, the first response sent by the second device;
  • a first inquiry information sending unit configured to send first inquiry information to the second device based on the first stored information
  • the second response receiving unit is configured to receive the second response from the second device after the first query information sending unit sends the first query information to the second device.
  • model distillation training device further includes:
  • a first attention category list sending unit configured to send the first attention category list to the second device before the first configuration information sending unit sends the first configuration information to the second device.
  • the first neural network model design unit further includes:
  • a first network structure request sending unit configured to send a first network structure request to the second device before the first neural network model design unit designs the first neural network model
  • the first structure request response receiving unit is configured to receive the first structure request response sent by the second device after the first network structure request sending unit sends the first network structure request to the second device.
  • an embodiment of the present application provides a model distillation training device, where the model distillation training device is a second device, including:
  • a first configuration information receiving unit configured to receive the first configuration information sent by the first device
  • a first configuration unit configured to use the first configuration information to configure the first reference neural network model
  • a first data information receiving unit configured to receive the first data information sent by the first device
  • the first indication information sending unit is configured to send the first indication information to the first device.
  • model distillation training device further includes:
  • a first training request receiving unit, configured to receive a first training request sent by the first device before the first configuration unit configures the first reference neural network model;
  • a first response sending unit, configured to generate a first response based on the first training request if the second training type ID is consistent with the first training type ID, and to send the first response to the first device;
  • a first inquiry information receiving unit configured to receive the first inquiry information sent by the first device
  • the second response sending unit is configured to send the second response to the first device after the first query information receiving unit receives the first query information sent by the first device.
  • model distillation training device further includes:
  • the first attention category list receiving unit is configured to receive the first attention category list sent by the first device before the first configuration information receiving unit receives the first configuration information sent by the first device.
  • model distillation training device further includes:
  • a first network structure request receiving unit configured to receive the first network structure request sent by the first device before the first configuration information receiving unit receives the first configuration information sent by the first device;
  • a first structure request response sending unit configured to send a first structure request response to the first device.
  • the embodiment of the present application provides a model distillation training device, and the model distillation training device is a third device, including:
  • a registration information receiving unit configured to receive the registration information sent by the second device
  • a second training request receiving unit configured to receive a second training request sent by the first device
  • a third response sending unit, configured to generate a third response based on the second training request if the fourth training type ID is consistent with the third training type ID, and to send the third response to the first device;
  • a distillation notification receiving unit configured to receive a distillation notification sent by the first device.
  • an embodiment of the present application provides a model distillation training device, where the model distillation training device is a first device, including:
  • a second training request sending unit configured to send a second training request to the third device
  • a third response receiving unit configured to receive a third response sent by the third device if the fourth training type ID is consistent with the third training type ID;
  • a distillation notification sending unit configured to send a distillation notification to the third device.
  • model distillation training device further includes:
  • a second neural network model design unit used for designing a second neural network model after the distillation notification sending unit sends a distillation notification to the third device;
  • a second configuration information sending unit configured to send the second configuration information to the second device
  • a second data information sending unit configured to send second data information to the second device
  • the second indication information receiving unit is configured to receive the second indication information returned by the second device.
  • model distillation training device further includes:
  • a second attention category list sending unit, configured to send the second attention category list to the second device before the second configuration information sending unit sends the second configuration information to the second device.
  • the second neural network model design unit further includes:
  • a second network structure request sending unit configured to send a second network structure request to the second device
  • the second structure request response receiving unit is configured to receive the second structure request response sent by the second device.
  • an embodiment of the present application provides a model distillation training device, where the model distillation training device is a second device, including:
  • a registration information sending unit configured to send registration information to the third device
  • a second configuration information receiving unit configured to receive the second configuration information sent by the first device
  • a second data information receiving unit configured to receive the second data information sent by the first device
  • the second indication information sending unit is configured to send the second indication information to the first device.
  • model distillation training device further includes:
  • the second attention category list receiving unit is configured to receive the second attention category list sent by the first device before the second configuration information receiving unit receives the second configuration information sent by the first device.
  • model distillation training device further includes:
  • a second network structure request receiving unit configured to receive the second network structure request sent by the first device before the second configuration information receiving unit receives the second configuration information sent by the first device;
  • a second structure request response sending unit is configured to send a second structure request response to the first device based on the second network structure request.
  • an embodiment of the present application provides a model distillation training device, including a processor and a memory;
  • the memory is used for storing program codes
  • the processor is used for calling the program codes stored in the memory to execute the model distillation training method in the first aspect and various possible implementation manners thereof.
  • an embodiment of the present application provides a model distillation training device, including a processor and a memory;
  • the memory is used for storing program codes
  • the processor is used for calling the program codes stored in the memory to execute the model distillation training method in the second aspect and various possible implementation manners thereof.
  • an embodiment of the present application provides a model distillation training device, including a processor and a memory;
  • the memory is used for storing program codes
  • the processor is used for calling the program codes stored in the memory to execute the model distillation training method in the third aspect and various possible implementation manners thereof.
  • an embodiment of the present application provides a model distillation training device, including a processor and a memory;
  • the memory is used for storing program codes
  • the processor is used for calling the program codes stored in the memory to execute the model distillation training method in the fourth aspect and various possible implementation manners thereof.
  • an embodiment of the present application provides a model distillation training device, including a processor and a memory;
  • the memory is used for storing program codes
  • the processor is used for calling the program codes stored in the memory to execute the model distillation training method in the fifth aspect and various possible implementation manners thereof.
  • An embodiment of the present application provides a computer-readable storage medium that stores a computer program. When the computer program is executed by a processor, it enables a first device to implement the method for model distillation training in the first aspect and the fourth aspect and their various possible implementations, or enables a second device to implement the method for model distillation training in the second aspect and the fifth aspect and their various possible implementations, or enables a third device to implement the method for model distillation training in the third aspect and its various possible implementations.
  • An embodiment of the present application provides a computer program including instructions. When the computer program is executed by a computer, it enables a first device to perform the processes performed by the first device in the first aspect and the fourth aspect and their various possible implementations, or enables a second device to perform the processes performed by the second device in the second aspect and the fifth aspect and their various possible implementations, or enables a third device to perform the processes performed by the third device in the third aspect and its various possible implementations.
  • FIG. 1 is a schematic diagram of a central network element and an edge network element deploying a neural network according to an embodiment of the present application
  • FIG. 2 is a schematic diagram of model distillation provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of the architecture of a model distillation training system provided by an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of a model distillation training provided by an embodiment of the present application.
  • FIG. 5 is a schematic flowchart of another model distillation training provided by an embodiment of the present application.
  • FIG. 6 is a schematic flowchart of another model distillation training provided by an embodiment of the present application.
  • FIG. 7 is a schematic flowchart of another model distillation training provided by an embodiment of the present application.
  • FIG. 8 is a schematic structural diagram of a model distillation training device provided by an embodiment of the present application.
  • FIG. 9 is a schematic structural diagram of another model distillation training device provided by an embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of another model distillation training device provided by an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of another model distillation training device provided by an embodiment of the present application.
  • FIG. 12 is a schematic structural diagram of another model distillation training device provided by an embodiment of the present application.
  • FIG. 13 is a schematic structural diagram of another model distillation training device provided by an embodiment of the present application.
  • FIG. 14 is a schematic structural diagram of another model distillation training device provided by an embodiment of the present application.
  • FIG. 15 is a schematic structural diagram of another model distillation training device provided by an embodiment of the present application.
  • FIG. 16 is a schematic structural diagram of another model distillation training device provided by an embodiment of the present application.
  • FIG. 17 is a schematic structural diagram of another model distillation training device provided by an embodiment of the present application.
  • a component may be, but is not limited to, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer.
  • an application running on a computing device and the computing device may be components.
  • One or more components may reside within a process and/or thread of execution, and a component may be localized on one computer and/or distributed between 2 or more computers.
  • these components can execute from various computer readable media having various data structures stored thereon.
  • a component may, for example, communicate through local and/or remote processes based on a signal having one or more data packets (e.g., data from one component interacting with another component in a local system, in a distributed system, and/or across a network such as the Internet that interacts with other systems by means of the signal).
  • the first device and the second device described in the claims and the description of the present application are network devices or user devices that support AI model training and neural network model distillation training, and the third device is an information exchange device that supports inter-device communication procedures and receives device registration information.
  • FIG. 2 is a schematic diagram of a model distillation provided by an embodiment of the present application.
  • the schematic diagram of the model distillation includes a teacher network and a student network.
  • the teacher network is a trained neural network, the network structure is complex, usually has more nodes and layers, and has powerful computing processing capabilities and a large amount of data resources.
  • the student network is a neural network to be trained, which is a small network with a simpler network structure and usually has fewer nodes and layers.
  • the teacher network and the student network receive the same input X; the teacher network processes and computes X, and the result is processed by the normalized exponential function Softmax1 to obtain the indication information for distillation training.
  • the indication information for distillation training is the soft target 1 output by the teacher network, which can also be understood as the "knowledge" obtained by the teacher network from processing X.
  • the X is processed and calculated by the student network, and after being processed by the Softmax2 function, the soft target 2 will be output.
  • the student network compares and analyzes soft target 1 and soft target 2, and obtains the loss function Fn1.
  • the X is also processed and calculated by the student network and, after being processed by the Softmax3 function, the hard target 1 is output; the student network compares and analyzes the hard target 1 and the hard target 2 to obtain a loss function Fn2, wherein the hard target 2 is the preset output result for X.
  • the loss functions Fn1 and Fn2 are used to indicate the degree of deviation between soft target 1 and soft target 2, and between hard target 1 and hard target 2, respectively.
  • the student network adjusts itself according to the loss functions Fn1 and Fn2 so that, after it processes and computes X, the loss functions Fn1 and Fn2 are reduced below a preset value; for the same input X, the output of the student network then approaches the output of the teacher network, achieving the purpose of training the student network by distillation from the teacher model. In this way, the teacher network cultivates the generalization ability of the student network, and the transfer of "knowledge" from the teacher network to the student network is realized.
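  • as a minimal sketch of the procedure described above (assuming the teacher and student networks are simply callables that map an input to a list of per-category scores, and leaving the actual parameter update to the training framework; the squared-error deviation used below is only one possible measure, and all names are illustrative), one distillation step could look as follows:

```python
import math
from typing import Callable, List, Sequence

def softmax(scores: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Normalized exponential function with distillation temperature T."""
    exps = [math.exp(s / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_step(teacher: Callable, student: Callable,
                      x, true_label: int, temperature: float):
    # Teacher side: Softmax1 with temperature T produces soft target 1 ("knowledge").
    soft_target_1 = softmax(teacher(x), temperature)

    # Student side: Softmax2 with the same T produces soft target 2,
    # Softmax3 (temperature 1) produces hard target 1.
    student_scores = student(x)
    soft_target_2 = softmax(student_scores, temperature)
    hard_target_1 = softmax(student_scores, 1.0)

    # Fn1: deviation between soft target 1 and soft target 2.
    fn1 = sum((a - b) ** 2 for a, b in zip(soft_target_1, soft_target_2))

    # Fn2: deviation between hard target 1 and the preset result (hard target 2).
    hard_target_2 = [1.0 if i == true_label else 0.0 for i in range(len(hard_target_1))]
    fn2 = sum((a - b) ** 2 for a, b in zip(hard_target_1, hard_target_2))

    # The student parameters would then be adjusted to reduce Fn1 and Fn2
    # (the parameter update itself is omitted in this sketch).
    return fn1, fn2
```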
  • the process of training a neural network of a certain function type by the teacher network on the student network is as follows:
  • the teacher network is a trained neural network, which can realize various functions such as image recognition and traffic recognition.
  • the teacher network can realize image recognition and perform model distillation, so that the student network can realize the function of image recognition as an example.
  • the input X of the teacher network and the student network is an example of a car picture.
  • X is input to the teacher network, and the X is processed by relevant algorithms in each network layer in the teacher network.
  • the processing process is mainly to compare X with each preset category in the teacher network.
  • the preset categories may be different types of things from the input X, things with a large difference in appearance and shape from the input X, or other categories such as things with different attributes from the input X.
  • the preset category of the teacher network is taken as an example of different types of things from the input X.
  • the preset categories in the teacher network include four categories: vehicles, trains, airplanes, and ships.
  • the preset categories in the teacher network are denoted as X1, X2, X3 and X4, respectively.
  • the teacher network compares the input X with the preset categories X1, X2, X3 and X4 respectively, and then outputs the similarities Y1, Y2, Y3 and Y4 between X and X1, X and X2, X and X3, and X and X4, respectively.
  • since the teacher network is a trained neural network, it will determine that X and X1 belong to the same category (correspondingly, cars and vehicles belong to the same category). Therefore, the similarity between X and X1 output by the teacher network is the highest, that is, the value of Y1 is the highest (close to 100%), while the similarities between X and X2, X and X3, and X and X4 are very low, that is, the values of Y2, Y3 and Y4 are extremely low (close to 0%). Because the gap between the value of Y1 and the values of Y2, Y3 and Y4 is so large, the information output by the teacher network is almost only the information of Y1, the information of Y2, Y3 and Y4 being almost 0, so the amount of information output by the teacher network is small.
  • the Y 1 , Y 2 , Y 3 and Y 4 are processed through the Softmax 1 function to narrow the numerical gap between Y 1 and Y 2 , Y 3 and Y 4 .
  • the expression of the Softmax1 function is: Z_i = exp(Y_i / T) / Σ_j exp(Y_j / T);
  • Z_i is the output information of the teacher network after Y_i is processed by the Softmax1 function;
  • T is the distillation temperature information (the larger T is, the smoother the distribution of the output Z_i; the function then acts as a smoothing function and plays a role in preserving similarity information; if T approaches infinity, the output is a uniform distribution), and the value of T is obtained based on the difference between the network structures of the teacher network and the student network.
  • Z1, Z2, Z3 and Z4 are respectively output, and the distributions of Z1, Z2, Z3 and Z4 are relatively flat (Z1 still has the highest value, but the gap between Z2, Z3, Z4 and the value of Z1 is narrower).
  • because the teacher network is a trained neural network, the similarity between X and its matching category output by the teacher network has a high degree of confidence (in this embodiment, the car and the vehicle category have a high degree of similarity), which makes the numerical difference between Y1 and Y2, Y3, Y4 very large. For example, the teacher network will determine that the similarity between the car picture and the vehicle category is very high, and its output Y1 may be 95%; the teacher network determines that the car picture has low similarity to trains, airplanes and ships, and its outputs Y2, Y3 and Y4 may be 3%, 1% and 1%, respectively. In this case, the "knowledge" obtained by comparing the input X with the categories X1, X2, X3 and X4 is difficult to transfer to the student network, because the similarities to X2, X3 and X4 are almost zero and carry little information. Training the student network in this way is not conducive to strengthening the generalization ability of the student network and affects the effect of distillation training.
  • the teacher network therefore processes Y1, Y2, Y3 and Y4 through the Softmax1 function, which makes the numerical distribution of Y1, Y2, Y3 and Y4 smoother and increases the amount of information carried by Y2, Y3 and Y4, so that the information the teacher network learns by comparing X with X1, X2, X3 and X4 is transferred to the student network, thereby enhancing the generalization ability of the student network.
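  • a small numeric illustration of this smoothing effect of the temperature T (the score values below are illustrative only and are not taken from the embodiment):

```python
import math

def softmax_t(scores, temperature):
    exps = [math.exp(s / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative teacher scores for the four categories X1..X4 (vehicle, train, airplane, ship).
y = [6.0, 1.0, 0.5, 0.5]

print(softmax_t(y, temperature=1.0))  # sharply peaked: about [0.985, 0.007, 0.004, 0.004]
print(softmax_t(y, temperature=5.0))  # flatter:        about [0.49, 0.18, 0.16, 0.16]
```

  • with the higher temperature, the relative ordering of the categories is preserved while the similarity information in Y2, Y3 and Y4 is no longer drowned out, which is exactly the additional information the student network can learn from.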
  • X is input to the student network, and the X is processed by relevant algorithms in each network layer in the student network, and compared with each preset category in the student network.
  • the student network compares the input X with the preset categories X1, X2, X3 and X4 respectively, and then outputs the similarities H1, H2, H3 and H4 between X and X1, X and X2, X and X3, and X and X4, respectively.
  • since the student network is a neural network to be trained, its ability to compare the input X with its preset categories is insufficient, so the similarities it outputs between X and the preset categories differ greatly from the preset result (for example, the value of H4 is extremely large, close to 100%, while the values of H2, H3 and H1 are extremely small, close to 0). Similar to the teacher network, because the gap between the value of H4 and the values of H2, H3 and H1 is too large, the information output by the student network is almost only the information of H4, and the amount of information in H2, H3 and H1 is almost 0.
  • H1, H2, H3 and H4 are processed through the Softmax2 function to narrow the numerical gap between H4 and H2, H3 and H1, thereby increasing the amount of information output by the student network; the distillation temperature information T used here is the same as that used by the teacher network.
  • M1, M2, M3 and M4 are respectively output, and the distributions of M1, M2, M3 and M4 are relatively flat (the value of M4 is still the highest, but the gap between the values of M2, M3, M1 and the value of M4 is narrowed); M1, M2, M3 and M4 constitute the soft target 2 output by the student network.
  • the student network compares the soft target 2 output by the student network with the soft target 1 output by the teacher network, and obtains a loss function Fn1 based on the degree of deviation between the soft target 2 and the soft target 1.
  • the student network also processes the output H1, H2, H3 and H4 through the Softmax3 function to obtain B4; B4 obtained in this way contains only the information of H4, and B4 is the hard target 1 output by the student network.
  • the student network compares and analyzes the output hard target 1 and hard target 2, and obtains a loss function Fn2 based on the degree of deviation between the hard target 1 and the hard target 2.
  • the hard target 2 is a preset result that should be obtained after inputting X, and the preset relationship between X and the hard target 2 is stored in the student network as local data.
  • the student network adjusts its neural network model based on the loss functions Fn1 and Fn2 to reduce the deviation between the output information of the student network and that of the teacher network, thereby reducing the values of the loss functions Fn1 and Fn2, enhancing the generalization ability of the student network, and realizing the transfer of "knowledge" from the teacher network.
  • FIG. 3 is a schematic diagram of the architecture of a model distillation training system provided by an embodiment of the present application.
  • the second device is a radio access network (Radio Access Network, RAN)
  • the first device is a user equipment (User Equipment, UE)
  • the RAN accesses multiple user equipments.
  • the large network deployed by the RAN is the first reference neural network model
  • the first reference neural network model is a model of all categories
  • the categories in the first reference neural network model include: 3D video, high-definition video transmission, and Augmented Reality (AR)/Virtual Reality (VR);
  • the small network deployed by UE1 is the first neural network model, and the categories in the first neural network model are a subset of the categories in the first reference neural network model, including: 3D video and HD video transmission.
  • the UE1 distillation trains the first neural network model so that the first neural network model has the function of traffic identification as an example.
  • the steps of UE1 distillation to train the first neural network model are as follows:
  • the UE1 sends a first training request to the RAN, where the first training request includes a first training type ID, where the first training type ID is used to indicate that the functional type of the first neural network model trained by the UE1 distillation is traffic identification.
  • the RAN receives the first training request and compares the first training type ID with the second training type ID in the RAN. If the two are consistent, the RAN supports UE1 in distillation training the first neural network model, and sends a first response to UE1 according to the second training type ID.
  • the first response includes first storage information, where the first storage information is the size of the memory or storage space occupied by the first reference neural network model with the traffic identification function.
  • the UE1 sends the first inquiry information to the RAN according to the first response, asking whether the RAN supports the UE1 to perform distillation training on the first neural network model, so that the first neural network model has the traffic identification function, and reports the distillation training ability of the UE1 .
  • the RAN sends a second response to UE1 based on the first inquiry information, which is used to inform UE1 that the RAN supports UE1 in distillation training the first neural network model so that it can obtain the traffic identification function, and that the manner of supporting the distillation training is that the RAN sends the first indication information to UE1 to instruct UE1 to train the first neural network model by distillation.
  • UE1 designs the first neural network model.
  • the UE1 sends the first configuration information to the RAN, and the RAN configures the first reference neural network model based on the first configuration information.
  • the first configuration information includes distillation temperature information T and the knowledge type KnowledgeType; the distillation temperature information T is obtained according to the first reference neural network model deployed by the RAN and the first neural network model designed and deployed by UE1, or can also be obtained from an empirical library or by other methods, and T is used to distill out the information obtained by the comparative analysis of the traffic data to be identified with the preset traffic categories in the first reference neural network model.
  • the RAN configures a normalized Softmax(T) function according to the distillation temperature information T in the first configuration information, and learns from the knowledge type KnowledgeType that the type of the first indication information is the soft target output after the first reference neural network model performs traffic identification; the first indication information is then sent to UE1 (in this embodiment, the knowledge type KnowledgeType being the soft target Soft target output by the first reference neural network model is taken as an example).
  • UE1 sends first data information to the RAN, where the first data information includes first sample data; the first sample data is the training data required for UE1 to perform distillation training for traffic identification on the first neural network model.
  • the first sample data includes an index for indicating each piece of traffic data.
  • the first sample data includes: traffic data of 3D movies, traffic data of movie transmission, and traffic data of human-computer interaction; Index 1, Index 2 and Index 3 are used to indicate the traffic data of 3D movies, the traffic data of movie transmission and the traffic data of human-computer interaction, respectively.
  • the RAN processes the first sample data through the first reference neural network model to obtain the first indication information, and sends the first indication information to UE1; the first indication information is the information obtained in the process of the first reference neural network model identifying the traffic based on the first sample data.
  • the process of obtaining the first indication information by the RAN is as follows: the RAN inputs the first sample data (the traffic data of the 3D movie, the traffic data of the movie transmission, and the traffic data of the human-computer interaction) into the first reference neural network model, and the first reference neural network model compares and analyzes the traffic data of 3D movies, the traffic data of movie transmission and the traffic data of human-computer interaction with the three preset categories 3D video, high-definition video transmission and AR/VR in the first reference neural network model, respectively.
  • the model outputs the similarities A1, A2 and A3 between the traffic data of 3D movies and the three preset categories 3D video, HD video transmission and AR/VR; the similarities B1, B2 and B3 between the traffic data of movie transmission and the three preset categories; and the similarities C1, C2 and C3 between the traffic data of human-computer interaction and the three preset categories. Since the first reference neural network model is a trained neural network model, it can accurately identify which preset category the input traffic data is most similar to.
  • therefore, among A1, A2 and A3, the value of A1 is the highest, close to 100%, and the values of A2 and A3 are very small, close to 0; among B1, B2 and B3, the value of B2 is the highest, close to 100%, and the values of B1 and B3 are very small, close to 0; among C1, C2 and C3, the value of C3 is the highest, close to 100%, and the values of C1 and C2 are very small, close to 0.
  • A1, A2 and A3 are used as the input of the normalized Softmax(T) function, which outputs a1, a2 and a3; a1, a2 and a3 are the soft target 1 output by the first reference neural network model, and the index Index 1 is used to indicate the soft target 1 (the index Index of a soft target corresponds to the index Index of the corresponding traffic data in the first data information). B1, B2, B3 and C1, C2, C3 are processed in the same way to obtain the soft target 2 and the soft target 3.
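  • the way the RAN could assemble this first indication information, keeping each soft target associated with the index Index of its traffic data sample, can be sketched as follows (a hypothetical illustration; the reference model is represented by a stub that returns per-category similarity scores, and the message layout is not prescribed by the embodiment):

```python
import math

def softmax_t(scores, temperature):
    exps = [math.exp(s / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def build_first_indication_info(samples, reference_model, temperature, model_id):
    """samples: list of (index, traffic_data); reference_model(traffic_data) -> raw scores."""
    distillation_knowledge = {}
    for index, traffic_data in samples:
        raw_scores = reference_model(traffic_data)  # e.g. A1, A2, A3 for the sample with Index 1
        distillation_knowledge[index] = softmax_t(raw_scores, temperature)  # soft target
    return {"reference_model_id": model_id,
            "DistillationKnowledge": distillation_knowledge}
```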
  • the UE1 receives the first indication information, and uses the first indication information to distill and train the first neural network model for the traffic identification function of each category included in it.
  • the specific process for UE1 to perform distillation training on the first neural network model using the first indication information is as follows: UE1 inputs the first sample data (the flow data of 3D movies, the flow data of movie transmission, and the flow data of human-computer interaction) into the first neural network model, and the first neural network model compares and analyzes each piece of flow data with the preset categories 3D video and HD video transmission in the first neural network model respectively, outputting the similarities D1 and D2 for the traffic data of 3D movies, E1 and E2 for the traffic data of movie transmission, and F1 and F2 for the traffic data of human-computer interaction.
  • the first neural network model may consider that the traffic data of 3D movies is highly similar to the HD video transmission category, that the traffic data of movie transmission is highly similar to the 3D video category, and that the traffic data of human-computer interaction is highly similar to the HD video transmission category. Therefore, D2 has a high value, close to 100%, and D1 has a very low value, close to 0; E1 has a high value, close to 100%, and E2 has a very low value, close to 0; F2 has a high value, close to 100%, and F1 has a very low value, close to 0.
  • D1 and D2 are used as the input of the normalized Softmax(T) function.
  • the distillation temperature information T in the Softmax function is consistent with the T used by the first reference neural network model
  • output d1 and d2 are the soft target 4 output by the first neural network model
  • the index Index 1 is used to indicate soft target 4 (the index Index of a soft target corresponds to the index Index of the corresponding traffic data in the first data information); E1, E2 and F1, F2 are processed in the same way to obtain the soft target 5 and the soft target 6.
  • UE1 compares and analyzes the soft target 4 with the soft target 1 in the first indication information, the soft target 5 with the soft target 2 in the first indication information, and the soft target 6 with the soft target 3 in the first indication information, so as to obtain the difference between the output result of traffic identification performed by the first neural network model and the output result of traffic identification performed by the first reference neural network model, and the loss function Fn1 is obtained.
  • UE1 also processes the outputs of the first neural network model through the Softmax3 function to obtain D2', E1' and F3' (D2' only contains the information of D2, E1' only contains the information of E1, and F3' only contains the information of F3); D2', E1' and F3' are respectively hard target 1, hard target 2 and hard target 3 output by the first neural network model. Then, hard target 1 is compared and analyzed with preset result 1, hard target 2 with preset result 2, and hard target 3 with preset result 3, to obtain a loss function Fn2.
  • the preset result 1, the preset result 2 and the preset result 3 are the traffic data of the preset category corresponding to the traffic data information in the first data information stored in the UE1.
  • preset result 1 is that the traffic data identification of 3D movies belongs to the 3D video traffic category
  • preset result 2 is that the traffic data of movie transmission belongs to the traffic category of high-definition video transmission
  • preset result 3 is that the traffic data of human-computer interaction belongs to the traffic category of AR/VR.
  • if, for example, the output result of the first neural network model identifies the traffic data of a 3D movie as traffic of the AR/VR category, the output deviates from preset result 1, and this deviation is reflected in the loss function Fn2.
  • UE1 adjusts the first neural network model based on the loss functions Fn1 and Fn2 to reduce the values of the loss functions Fn1 and Fn2, thereby strengthening the generalization ability of the first neural network model for traffic identification, and realizing UE1's ability to recognize the first neural network model. distillation training.
  • the UE1 sends a first training result notification to the RAN, which is used to notify the RAN that the distillation training of the first neural network model is completed, and feed back the information of the first neural network model whose distillation training is completed.
  • the RAN releases the channel resources for the distillation training communication connection with UE1.
  • the RAN uses the first configuration information sent by UE1 to configure the first reference neural network model; wherein, the first reference neural network model is a trained neural network model.
  • the RAN uses the first data information to perform arithmetic processing on the first sample data in the first data information through the configured first reference neural network model to obtain the first indication information, where the first indication information is the "knowledge" obtained by distillation from the first reference neural network model.
  • the RAN sends the first indication information to the UE1, and the UE1 uses the first indication information to train the first neural network model designed by the UE1.
  • the RAN guides the distillation training of UE1 by transferring "knowledge”.
  • This method enables the RAN and UE1 to perform separate distillation training, enables UE1 to design and train a personalized small network, achieves a balance between the size of the neural network model and the accuracy of the neural network model, improves the efficiency of distillation training, and saves the storage space of UE1.
  • FIG. 4 is a schematic flowchart of a model distillation training provided by an embodiment of the present application.
  • before the first device performs distillation training on the neural network model, it establishes a communication connection for distillation training with the second device and, through the exchange of relevant distillation training information with the second device, the transfer of the "knowledge" of the second device is realized, thereby guiding the first device in distillation training the neural network model.
  • the steps of performing distillation training on the neural network model by the edge network element are as follows:
  • Step S401 The edge network element sends a first training request to the central network element.
  • the edge network element needs to establish a distillation communication connection with the central network element before the distillation training of the neural network model.
  • the edge network element first sends a first training request to the central network element, where the first training request is used for the edge network element to inform the central network element that the edge network element has a demand for distillation training a neural network model of a certain functional type.
  • the first training request includes a first training type ID, where the first training type ID is used to indicate the function type of the neural network model to be distillation trained by the edge network element; the function type includes image recognition, face recognition, and the like.
  • Step S402 The central network element sends a first response to the edge network element.
  • the central network element compares the first training type ID with the second training type ID of the central network element. If the first training type ID and the second training type ID are consistent, it means that the central network element has a reference neural network model that supports the distillation training of the edge network element, and the reference neural network model has the function type indicated by the first training type ID.
  • the second training type ID is the function type of the central network element that supports distillation training.
  • the central network element finds a first reference neural network model from a plurality of reference neural network models deployed by the central network element based on the second training type ID, where the first reference neural network model has the function type indicated by the second training type ID.
  • the central network element sends a first response to the edge network element, where the first response includes the ID of the first reference neural network model and the first storage information.
  • the first storage information is used to indicate the size of the storage space occupied by the first reference neural network model.
  • if the first training type ID is inconsistent with the second training type ID, the central network element does not have a reference neural network model corresponding to the first training type ID. In that case, the central network element cannot process the first data information sent by the edge network element through a reference neural network model to obtain the first indication information; since the edge network element lacks the first indication information, it cannot use the first indication information to train the neural network model by distillation, and consequently the central network element cannot guide the edge network element in distillation training the neural network model.
  • the first training type ID is consistent with the second training type ID as an example.
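  • steps S401 and S402 can be pictured as a simple lookup on the central network element side: it checks the requested training type against the types it supports and, when they match, selects the reference model of that type and returns its ID and storage size (the registry contents and field names below are illustrative, not part of the protocol definition):

```python
from typing import Dict, Optional, Tuple

# Illustrative registry: second training type ID -> (reference model ID, storage size in MB).
REFERENCE_MODELS: Dict[str, Tuple[str, int]] = {
    "traffic_identification": ("ref-model-001", 250),
    "image_recognition": ("ref-model-002", 480),
}

def handle_first_training_request(first_training_type_id: str) -> Optional[dict]:
    entry = REFERENCE_MODELS.get(first_training_type_id)
    if entry is None:
        # No reference neural network model of this function type: distillation cannot be guided.
        return None
    model_id, storage_mb = entry
    # First response: ID of the first reference neural network model plus first storage information.
    return {"reference_model_id": model_id, "first_storage_info_mb": storage_mb}
```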
  • Step S403 The edge network element sends first inquiry information to the central network element based on the first storage information.
  • the first inquiry information includes first distillation inquiry information and first distillation capability information.
  • the first distillation inquiry information is sent by the edge network element to the central network element based on the first storage information, specifically when the resource storage and other capabilities of the edge network element exceed a certain threshold that is set with reference to the first storage information.
  • the first distillation inquiry information is used to inquire whether the central network element supports the distillation training of the edge network element by sending the first indication information to the edge network element.
  • alternatively, the first distillation inquiry information is used to inquire whether the central network element supports the distillation training of the edge network element by performing distillation training at the central network element to obtain a small network and delivering that small network to the edge network element.
  • This embodiment takes as an example that a small network can be designed when the resource storage and other capabilities of the edge network element exceed a certain threshold. Therefore, the first distillation inquiry information is used to inquire whether the central network element supports the distillation training of the edge network elements, and in what manner to support the training.
  • the first distillation capability information is used to indicate the capability of the edge network element for distillation training, and includes one or more of the following: the design capability of the neural network model, the distillation training capability of the neural network model, and whether the edge network element has distillation training data information.
  • the design capability of the neural network model is used to indicate whether the edge network element has the capability to independently design a small neural network model.
  • the distillation training capability of the neural network model is used to indicate whether the edge network element has the capability of distillation training the first neural network model under the condition that the edge network element can independently design the first neural network model.
  • the distillation training capability of the neural network model includes one or more of the following: whether the edge network element supports a training package for algorithm training, whether it has a GPU chip and a tensor core of the GPU chip, the number of floating-point operations per second, memory bandwidth, The main frequency size of the CPU, the cache size of the CPU, the architecture category, and hardware capabilities such as storage space.
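  • for illustration only, the first distillation capability information enumerated above can be represented as a simple structure (field names and types are hypothetical; the actual encoding is not specified here):

```python
from dataclasses import dataclass

@dataclass
class DistillationCapabilityInfo:
    can_design_model: bool            # design capability of the neural network model
    can_train_by_distillation: bool   # distillation training capability of the neural network model
    has_training_data: bool           # whether distillation training data is available
    supports_training_package: bool = False  # training package for algorithm training
    has_gpu: bool = False
    gpu_tensor_cores: int = 0
    flops_per_second: float = 0.0
    memory_bandwidth_gbps: float = 0.0
    cpu_frequency_ghz: float = 0.0
    cpu_cache_mb: float = 0.0
    storage_space_gb: float = 0.0
```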
  • the distillation training data is the first data information of the edge network element, which is used by the central network element to obtain the first indication information based on the first sampling data in the first data information, and guide the edge network element to perform distillation training.
  • the embodiments of the present application take as an example that the edge network element has the ability to design a neural network model, the ability of distillation training of the neural network model, and the information of the distillation training data.
  • Step S404 The central network element generates a second response based on the first distillation query information, and sends the second response to the edge network element.
  • after receiving the first distillation inquiry information, the central network element sends a second response to the edge network element according to its own parameters such as storage resources and hardware performance.
  • the second response is used to indicate that the central network element supports the distillation training of the edge network element, and that it supports the edge network element in distillation training by sending the first indication information to the edge network element.
  • Step S405 The edge network element designs the first neural network model.
  • the edge network element uses the structural information of the first reference neural network model as a reference to design a small network; the small network is the first neural network model.
  • if the edge network element needs to refer to the structural information of the first reference neural network model when designing the first neural network model, and that structural information has not been loaded, the edge network element sends a first network structure request to the central network element, where the first network structure request is used by the edge network element to request the central network element to provide the structural information of the first reference neural network model.
  • the central network element sends a first structure request response to the edge network element based on the first network structure request, where the first structure request response includes structure information of the first reference neural network model.
  • the edge network element designs a small network with reference to the structural information of the first reference neural network model, and the small network is the first neural network model.
  • Step S406 The edge network element sends the first configuration information to the central network element.
  • the first configuration information sent by the edge network element to the central network element is used for the central network element to configure the first reference neural network model
  • the first configuration information includes first distillation temperature information
  • the first distillation temperature information is the temperature parameter T of the normalized exponential function Softmax
  • the temperature parameter T is obtained according to the size difference between the first neural network model and the first reference neural network model, and is used to improve the amount of information output by the first reference neural network model after distillation .
  • the first configuration information also includes the knowledge type KnowledgeType and the ID of the first reference neural network model, where the knowledge type KnowledgeType is used to indicate the type of the first indication information required by the edge network element, wherein the type of the first indication information includes: The soft target output by the first reference neural network model, the intermediate output feature information of the first reference neural network model, the relationship between network layers in the first reference neural network model, and the like.
  • the ID of the first reference neural network model is used to associate the first configuration information with the first reference neural network model, so as to avoid confusion with other neural network models of the central network element.
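  • for illustration, the first configuration information can be modelled as the following structure, where the KnowledgeType values mirror the three types of first indication information listed above (names are illustrative; the actual message format is not defined here):

```python
from dataclasses import dataclass
from enum import Enum

class KnowledgeType(Enum):
    SOFT_TARGET = "soft_target"                      # soft target output by the first reference model
    INTERMEDIATE_FEATURES = "intermediate_features"  # intermediate output feature information
    LAYER_RELATION = "layer_relation"                # relationship between network layers

@dataclass
class FirstConfigurationInfo:
    distillation_temperature: float   # first distillation temperature information T
    knowledge_type: KnowledgeType     # type of the first indication information required
    reference_model_id: str           # ID of the first reference neural network model
```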
  • Step S407 The edge network element sends the first data information to the central network element.
  • the first data information sent by the edge network element to the central network element is used by the central network element to process the first data information through the first reference neural network model to obtain the first indication information.
  • the first data information includes the first sample data, which is the sample data used by the first reference neural network model for distillation training; the first data information also includes the sample data index Index and the ID of the first reference neural network model.
  • the sample data index Index is the label of each piece of first sample data and is used to indicate the first sample data; the ID of the first reference neural network model is used to associate the first data information with the first reference neural network model, so as to avoid confusion with other neural network models of the central network element.
  • Step S408 The central network element performs calculation processing on the first sample data through the first reference neural network model to obtain first indication information, and sends the first indication information to the edge network element.
  • the first indication information includes the distillation knowledge DistillationKnowledge and the ID of the first reference neural network model, where the first indication information is used to instruct the edge network element to train the first neural network model.
  • the central network element inputs the first sample data into the first reference neural network model one by one according to the Index for calculation and processing.
  • the specific process is as follows: the inputted first sample data is compared and analyzed with the preset categories in the first reference neural network model to obtain the first indication information, and the first indication information, carrying the sample data index Index and the distillation knowledge DistillationKnowledge, is sent to the edge network element.
  • the ID of the first reference neural network model is used to associate the distillation knowledge DistillationKnowledge with the first reference neural network model, so as to avoid confusion with other neural network models of the central network element.
  • Step S409 The edge network element uses the first indication information to perform distillation training on the first neural network model.
  • the process that the edge network element uses the first indication information to perform distillation training on the first neural network model is as follows: the edge network element inputs the first sample data into the first neural network model one by one according to the Index, and then Comparing and analyzing the first sample data with the preset categories in the first neural network model, respectively, to obtain the similarity between the first sample data and the preset categories.
  • the edge network element outputs a soft target through the Softmax function (the temperature parameter T configured for the Softmax function of the first neural network model is the same as the temperature parameter T configured for the Softmax function of the first reference neural network model), and compares and analyzes this soft target with the soft target output by the first reference neural network model to obtain the loss function Fn1; the hard target output by the first neural network model is likewise compared with the locally stored preset result to obtain the loss function Fn2.
  • the edge network element adjusts the first neural network model based on Fn1 and Fn2 to reduce the values of the loss functions Fn1 and Fn2, enhance the generalization ability of the first neural network model, and realize the distillation training of the first neural network model by the edge network element.
  • when the first indication information is the intermediate output feature information of the first reference neural network model, the process of performing distillation training on the first neural network model by the edge network element using the first indication information is as follows: the edge network element inputs the first sample data into the first neural network model one by one according to the Index and, while the first sample data is compared and analyzed with the preset categories of the first neural network model, obtains the second feature information output by the intermediate layers of the first neural network model; the second feature information is compared and analyzed with the first indication information to obtain a loss function Fn3.
  • the edge network element adjusts the parameters of the first neural network model based on Fn3 to reduce the value of the loss function Fn3 and enhance the generalization ability of the first neural network model, thereby realizing the distillation training of the first neural network model by the edge network element.
  • when the first indication information is the relationship between network layers in the first reference neural network model, the process by which the edge network element uses the first indication information to perform distillation training on the first neural network model is as follows: the edge network element inputs the first sample data into the first neural network model one by one according to the Index, compares and analyzes the first sample data with the preset categories in the first neural network model to obtain the relationship between the network layers in the first neural network model, and compares and analyzes the relationship between the network layers in the first neural network model with the first indication information to obtain a loss function Fn4. The edge network element adjusts the parameters of the first neural network model based on Fn4 to reduce the value of the loss function Fn4, enhance the generalization ability of the first neural network model, and realize the distillation training of the first neural network model by the edge network element.
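  • taking the intermediate-feature case as an example, the loss Fn3 compares the feature information of corresponding layers; a minimal sketch, using mean squared error as one possible deviation measure (the embodiment does not prescribe a particular measure, and matched layers are assumed here to have feature vectors of equal length):

```python
from typing import Sequence

def feature_matching_loss(student_features: Sequence[Sequence[float]],
                          teacher_features: Sequence[Sequence[float]]) -> float:
    """Fn3: deviation between the second feature information of the first neural network model
    and the intermediate feature information carried in the first indication information.
    Both arguments are lists of per-layer feature vectors for the same input sample."""
    total, count = 0.0, 0
    for s_layer, t_layer in zip(student_features, teacher_features):
        for s, t in zip(s_layer, t_layer):
            total += (s - t) ** 2
            count += 1
    return total / max(count, 1)
```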
  • Step S410 The edge network element sends a first training result notification to the central network element.
  • a first training result notification is sent to the central network element, and the first training result notification is used to inform the central network element whether the distillation training is completed;
  • the information of the first neural network model trained by the edge network element is fed back to the central network element (for example, whether the edge network element receives the first neural network model trained by distillation, etc.).
  • the training of the first neural network model by the edge network element usually cannot be completed in a single pass, and steps S407 to S409 are often repeated.
  • whether the distillation training of the edge network element is completed is determined by the convergence of the loss function Fn of the edge network element in the distillation training, or by reaching the preset maximum number of iterations. For example, when the loss function Fn of the edge network element in the distillation training reaches the preset value y, it can be considered that the first neural network model trained by the edge network element has reached the required accuracy, and the distillation training of the edge network element is completed; alternatively, when the distillation training has reached the preset maximum number of iterations, the distillation training is ended.
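  • the completion criterion described here can be expressed, for example, as the following check, repeated after each round of steps S407 to S409 (the parameter names are illustrative):

```python
def distillation_finished(loss_fn: float, iteration: int,
                          preset_value_y: float, max_iterations: int) -> bool:
    """True when the loss has converged to the preset value y or the iteration limit is reached."""
    return loss_fn <= preset_value_y or iteration >= max_iterations
```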
  • Step S411 The central network element releases the channel resources for the distillation training communication connection with the edge network element.
  • the central network element judges whether the distillation training of the edge network element has been completed; if it has, the central network element releases the distillation training communication connection resources between the central network element and the edge network element, so as to avoid occupying other channel resources of the central network element and the edge network element.
  • the central network element uses the first configuration information sent by the edge network element to configure the first reference neural network model; wherein, the first reference neural network model is a trained neural network model.
  • using the first data information, the central network element performs operation processing on the first sample data in the first data information through the configured first reference neural network model to obtain the first indication information, where the first indication information is the "knowledge" obtained by distillation from the first reference neural network model.
  • the central network element sends the first instruction information to the edge network element, and the edge network element uses the first instruction information to train the first neural network model designed by the edge network element.
  • the central network element guides the distillation training of edge network elements by transferring "knowledge”.
  • This method enables the central network element and the edge network element to perform separate distillation training, enables the edge network element to design and train a personalized small network, achieves a balance between the size of the neural network model and the accuracy of the neural network model, improves the efficiency of distillation training, and saves the storage space of the edge network element.
  • FIG. 5 is a schematic flowchart of another model distillation training provided by an embodiment of the present application.
  • before the first device sends the first configuration information to the second device, it needs to send the first attention category list to the second device, and the first indication information received by the first device is the information filtered out by the second device based on the first attention category list.
  • the steps of performing distillation training on the neural network model by the edge network element are as follows:
  • steps S501 to S505 please refer to steps S401 to S405, which will not be repeated here.
  • Step S506 The edge network element sends the first attention category list to the central network element.
  • the first attention category list sent by the edge network element to the central network element includes the set of categories concerned by the distillation training of the edge network element; wherein, the set of categories concerned by the distillation training of the edge network element is the first category list A subset of the medium category set; wherein, the first category list includes a set of preset categories of the first reference neural network model.
  • the edge network element can obtain the first indication information corresponding to the category concerned by the distillation training of the edge network element, which is beneficial to realize the personalized training of the first neural network model of the edge network element.
  • steps S507 to S508 please refer to steps S406 to S407, which will not be repeated here.
  • Step S509 The central network element performs calculation processing on the first sample data through the first reference neural network model, and filters the processed first sample data based on the first attention category list to obtain the first indication information, and uses The first indication information is sent to the edge network element.
  • the central network element inputs the first sample data into the first reference neural network model one by one according to the Index, and the first sample data is compared and analyzed with the preset categories in the first reference neural network model to obtain indication information; the central network element, based on the categories in the first attention category list, selects from that indication information the indication information corresponding to the categories in the first attention category list to obtain the first indication information, and then sends the first indication information to the edge network element.
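  • this per-category filtering can be sketched as selecting, from the indication information, only the entries whose categories appear in the first attention category list (an illustrative sketch; the message layout is not defined by the embodiment):

```python
from typing import Dict, Set

def filter_by_attention_list(indication_info: Dict[int, Dict[str, float]],
                             attention_category_list: Set[str]) -> Dict[int, Dict[str, float]]:
    """indication_info maps a sample Index to per-category values; keep only attended categories."""
    return {index: {category: value
                    for category, value in per_category.items()
                    if category in attention_category_list}
            for index, per_category in indication_info.items()}

# Example: the edge network element only cares about two of the reference model's preset categories.
first_attention_category_list = {"3D video", "HD video transmission"}
```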
  • steps S510 to S512 please refer to steps S409 to S411, which will not be repeated here.
  • the edge network element sends the first attention category list to the central network element, which is used to instruct the central network element to filter, based on the concerned categories in the first attention category list, the indication information obtained from the first sample data through the first reference neural network model.
  • the central network element filters the indication information obtained by the comparative analysis of the first sample data with the preset categories in the first reference neural network model, so that the obtained first indication information better matches the requirements of the distillation training of the edge network element, further reduces communication overhead without affecting the effect of distillation training of the first neural network model, saves network resources for separate distillation training, and saves the storage space of the edge network element.
  • FIG. 6 is a schematic flowchart of another model distillation training provided by an embodiment of the present application.
  • a distillation training communication connection between the first device and the second device should be established through the third device.
  • the first device is an edge NWDAF
  • the second device is a central NWDAF
  • the third device is a network storage function (Network Repository Function, NRF) as an example.
  • Step S601 The central NWDAF sends registration information to the NRF.
  • the registration information includes: a third training type ID, used to indicate the function type of neural network model for which the central NWDAF supports distillation training; a third neural network model ID, used to indicate the second reference neural network model corresponding to the third training type ID, where the second reference neural network model has the function indicated by the third training type ID; second storage information, used to indicate the size of the storage space occupied by the second reference neural network model; and training response information, used to indicate whether the central NWDAF supports distillation training of the neural network model and, if so, the manner in which the distillation training is supported.
  • the registration information further includes a second category list, where the second category list includes the set of preset categories of the central NWDAF, and the ID of the central NWDAF, which is used to indicate the device number of the central NWDAF so as to avoid confusion when multiple central NWDAFs register with the NRF.
  • the central NWDAF sends registration information to the NRF, which is beneficial for the NRF to match the central NWDAF and the edge NWDAF based on the registration information, and then establish a distillation training communication connection between the central NWDAF and the edge NWDAF.
  • Step S602 The edge NWDAF sends a second training request to the network NRF.
  • the second training request includes a fourth training type ID, which is used to indicate the function type of the neural network model to be distilled and trained by the edge NWDAF;
  • the second query information is used for the edge NWDAF to query the NRF, whether the center NWDAF supports distillation training , and in the case of supporting distillation training, the method for supporting distillation training;
  • the second distillation capability information is used to indicate the capability of edge NWDAF distillation training, and the second distillation capability information includes one or more of the following: edge NWDAF neural The design ability of the network model, the training ability of the neural network model, and whether it has distilled training data.
  • the second training request also includes an exclusion set, and the exclusion set includes the ID of the central NWDAF that fails to establish a distillation training communication connection with the edge NWDAF.
  • for a central NWDAF whose ID is in the exclusion set, the NRF does not establish a distillation training communication connection between that central NWDAF and the edge NWDAF.
  • Step S603 The NRF generates a third response based on the second training request, and sends the third response to the edge NWDAF.
  • the NRF determines whether the third training type ID is consistent with the fourth training type ID, and if so, generates a third response (in this embodiment, the case where the third training type ID is consistent with the fourth training type ID is taken as an example) and sends the third response to the edge NWDAF.
  • the third response includes the training response information, the third neural network model ID, the second storage information and the second category list, and also includes the ID of the central NWDAF.
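  • steps S601 to S603 amount to the NRF keeping a registry of central NWDAF registrations and matching the edge NWDAF's second training request against it while skipping any central NWDAF listed in the exclusion set; a hypothetical sketch (field names and structures are illustrative only):

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

@dataclass
class CentralNwdafRegistration:
    nwdaf_id: str
    third_training_type_id: str
    third_neural_network_model_id: str
    second_storage_info_mb: float
    training_response_info: str               # whether/how distillation training is supported
    second_category_list: List[str] = field(default_factory=list)

class Nrf:
    def __init__(self) -> None:
        self.registry: List[CentralNwdafRegistration] = []

    def register(self, registration: CentralNwdafRegistration) -> None:
        self.registry.append(registration)

    def match(self, fourth_training_type_id: str,
              exclusion_set: Set[str]) -> Optional[CentralNwdafRegistration]:
        for registration in self.registry:
            if registration.nwdaf_id in exclusion_set:
                continue  # this central NWDAF previously failed to connect with the edge NWDAF
            if registration.third_training_type_id == fourth_training_type_id:
                return registration  # used to build the third response
        return None
```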
  • Step S604 The edge NWDAF sends a distillation notification to the NRF.
  • the edge NWDAF learns from the training response information in the third response whether the central NWDAF supports distillation training and the manner of supporting it (in this embodiment, the training response information indicating that the central NWDAF supports the edge NWDAF's distillation training by sending indication information to the edge NWDAF to guide its distillation training of the neural network model is taken as an example); and it learns, from the third neural network model ID and the second storage information, the information of the second reference neural network model and the size of the storage space occupied by the second reference neural network model.
  • the edge NWDAF establishes a communication connection for distillation training with the central NWDAF according to the ID of the central NWDAF based on the information of the central NWDAF obtained by the third response, and sends a distillation notification to the NRF after the communication connection is established.
  • the distillation notification is used to inform the NRF that the connection between the edge NWDAF and the center NWDAF is successful. Then, the NRF releases the communication resources between the NRF and the edge NWDAF and the center NWDAF according to the distillation notification.
  • steps S605 to S611 please refer to steps S405 to S411, which will not be repeated here.
• the central NWDAF registers the training type ID, the training response information and other information with the NRF, and the edge NWDAF sends a training request to the NRF, so that the NRF can quickly and effectively establish a distillation communication connection between the edge NWDAF and the central NWDAF. This improves the efficiency of distillation training, helps the edge NWDAF distill and train the neural network model, and saves the storage space of the edge NWDAF.
  • FIG. 7 is a schematic flowchart of another model distillation training provided by an embodiment of the present application.
• in this embodiment, before sending the second configuration information to the second device, the first device sends a second attention category list to the second device, and the second indication information received by the first device is information filtered by the second device based on that list.
  • the steps of performing distillation training on the neural network model by the edge NWDAF are as follows:
  • steps S701 to S705 please refer to steps S601 to S605, which will not be repeated here.
  • Step S706 The edge NWDAF sends the second attention category list to the central NWDAF.
• the second attention category list sent by the edge NWDAF to the central NWDAF includes the set of categories that the edge NWDAF's distillation training focuses on; this set is a subset of the set of categories in the second category list, and the second category list includes the set of preset categories of the second reference neural network model.
• by sending the second attention category list to the central NWDAF, the edge NWDAF can obtain the second indication information corresponding to the categories its distillation training focuses on, which helps the edge NWDAF train the second neural network model in a personalized way.
  • steps S707 to S708 please refer to steps S606 to S607, which will not be repeated here.
• Step S709 The central NWDAF computes and processes the second sample data through the second reference neural network model, filters the processed results based on the second attention category list to obtain the second indication information, and sends the second indication information to the edge NWDAF.
• specifically, the central NWDAF inputs the second sample data in the second data information into the second reference neural network model one by one according to the Index, and compares and analyzes the second sample data against the preset categories in the second reference neural network model to obtain indication information. Based on the categories in the second attention category list, the central NWDAF selects from that indication information the entries related to those categories to obtain the second indication information, and then sends the second indication information to the edge NWDAF.
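As an illustration only, the following sketch shows the per-sample filtering of step S709 under the assumption that each piece of indication information is a mapping from preset category to soft-target value, keyed by the sample Index; the function and variable names are illustrative and not defined by the embodiment.

```python
def filter_indication_info(indication_info, attention_categories):
    """For every sample Index, keep only the entries whose category is in the
    second attention category list; the result plays the role of the second indication information."""
    return {
        index: {cat: value for cat, value in per_sample.items() if cat in attention_categories}
        for index, per_sample in indication_info.items()
    }

# Illustrative values: the reference model covers three preset categories,
# while the edge NWDAF only follows two of them.
indication_info = {
    1: {"3D video": 0.90, "HD video transmission": 0.07, "AR/VR": 0.03},
    2: {"3D video": 0.05, "HD video transmission": 0.92, "AR/VR": 0.03},
}
second_indication_info = filter_indication_info(indication_info,
                                                {"3D video", "HD video transmission"})
```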
  • steps S710 to S712 please refer to steps S609 to S611, which will not be repeated here.
• in this embodiment, before the central NWDAF sends the second indication information, the edge NWDAF sends the second attention category list to the central NWDAF, which instructs the central NWDAF to filter the indication information obtained by passing the second sample data through the second reference neural network model according to the categories in the second attention category list.
• by receiving the second attention category list, the central NWDAF filters the indication information obtained by comparing and analyzing the second sample data against the preset categories in the second reference neural network model, so that the resulting second indication information better meets the needs of the edge NWDAF's distillation training. This further reduces communication overhead without affecting the effect of distillation training of the second neural network model, saves network resources for separated distillation training, and thus saves NWDAF storage space.
  • FIG. 8 is a schematic structural diagram of a model distillation training device provided by an embodiment of the present application.
  • the model distillation training device 8 may be the first device in the above method embodiment.
• the model distillation training device 8 may include a first neural network model design unit 801, a first configuration information sending unit 802, a first data information sending unit 803, a first indication information receiving unit 804 and a first distillation training unit 805, where the detailed description of each unit is as follows:
  • the first neural network model design unit 801 is configured to design the first neural network model if the second device supports the distillation training neural network model of the first device;
  • a first configuration information sending unit 802 configured to send the first configuration information to the second device
  • a first data information sending unit 803, configured to send first data information to the second device
  • a first indication information receiving unit 804 configured to receive the first indication information returned by the second device
  • the first distillation training unit 805 is configured to use the first indication information to train the first neural network model.
  • model distillation training device 8 further includes:
  • a first training request sending unit used for sending a first training request to the second device before the first neural network model design unit 801 designs the first neural network model
• the first response receiving unit is used for, after the first training request sending unit sends the first training request to the second device and if the second training type ID is consistent with the first training type ID, receiving the first response sent by the second device;
  • a first inquiry information sending unit configured to send first inquiry information to the second device based on the first storage information
  • the second response receiving unit is configured to receive the second response of the second device after the first query information sending unit sends the first query information to the second device.
  • model distillation training device 8 further includes:
  • the first attention category list sending unit is configured to send the first attention category list to the second device before the first configuration information sending unit 802 sends the first configuration information to the second device.
  • model distillation training device 8 further includes:
  • a first network structure request sending unit used for sending a first network structure request to the second device before the first neural network model design unit 801 designs the first neural network model
  • the first structure request response receiving unit is configured to receive the first structure request response sent by the second device after the first network structure request sending unit sends the first network structure request to the second device.
  • FIG. 9 is a schematic structural diagram of a model distillation training device provided by an embodiment of the present application.
  • the model distillation training device 9 may be the second device in the above method embodiment.
• the model distillation training device 9 may include a first configuration information receiving unit 901, a first parameter configuration unit 902, a first data information receiving unit 903 and a first indication information sending unit 904, where the detailed description of each unit is as follows:
  • a first configuration information receiving unit 901 configured to receive the first configuration information sent by the first device
  • a first parameter configuration unit 902 configured to configure a first reference neural network model by using the first configuration information
  • a first data information receiving unit 903 configured to receive the first data information sent by the first device
  • the first indication information sending unit 904 is configured to send the first indication information to the first device.
  • model distillation training device 9 further includes:
  • a first training request receiving unit configured to receive a first training request sent by the first device before the first parameter configuration unit 902 configures the first reference neural network model
  • a first response sending unit configured to generate a first response based on the first training request if the second training type ID is consistent with the first training type ID, and send the first response to the first device;
  • a first inquiry information receiving unit configured to receive the first inquiry information sent by the first device
  • the second response sending unit is configured to send the second response to the first device after the first query information receiving unit receives the first query information sent by the first device.
  • model distillation training device 9 further includes:
  • the first attention category list receiving unit is configured to receive the first attention category list sent by the first device before the first configuration information receiving unit receives the first configuration information sent by the first device.
  • model distillation training device 9 further includes:
  • a first network structure request receiving unit configured to receive the first network structure request sent by the first device before the first configuration information receiving unit 901 receives the first configuration information sent by the first device;
  • a first structure request response sending unit configured to send a first structure request response to the first device.
  • FIG. 10 is a schematic structural diagram of a model distillation training apparatus provided in an embodiment of the present application.
  • the model distillation training apparatus 10 may be the third device in the above method embodiment.
• the model distillation training apparatus 10 may include a registration information receiving unit 1001, a second training request receiving unit 1002, a third response sending unit 1003 and a distillation notification receiving unit 1004, where the detailed description of each unit is as follows:
  • a registration information receiving unit 1001 configured to receive registration information sent by the second device
  • a second training request receiving unit 1002 configured to receive a second training request sent by the first device
  • a third response sending unit 1003, configured to generate a third response based on the second training request if the fourth training type ID is consistent with the third training type ID, and send the third response to the first device;
  • the distillation notification receiving unit 1004 is configured to receive the distillation notification sent by the first device.
  • FIG. 11 is a schematic structural diagram of a model distillation training device provided by an embodiment of the present application.
  • the model distillation training device 11 may be the first device in the above method embodiment.
• the model distillation training device 11 may include a second training request sending unit 1101, a third response receiving unit 1102, and a distillation notification sending unit 1103, where the detailed description of each unit is as follows:
  • a second training request sending unit 1101, configured to send a second training request to a third device
  • the third response receiving unit 1102 is configured to receive the third response sent by the third device if the fourth training type ID is consistent with the third training type ID;
  • a distillation notification sending unit 1103, configured to send a distillation notification to the third device.
  • model distillation training device 11 further includes:
  • a second neural network model design unit configured to design a second neural network model based on the structural information of the second reference neural network model after the distillation notification sending unit 1103 sends a distillation notification to the third device;
  • a second configuration information sending unit configured to send the second configuration information to the second device
  • a second data information sending unit configured to send second data information to the second device
  • a second indication information receiving unit configured to receive the second indication information returned by the second device
  • a second distillation training unit configured to perform distillation training on the second neural network model by using the second indication information.
  • model distillation training device 11 further includes:
  • the second attention category list sending unit is configured to send the second attention category list to the second device after the second configuration information sending unit sends the second configuration information to the second device.
  • FIG. 12 is a schematic structural diagram of a model distillation training device provided by an embodiment of the present application.
  • the model distillation training device 12 may be the second device in the above method embodiment.
• the model distillation training device 12 may include a registration information sending unit 1201, a second configuration information receiving unit 1202, a second data information receiving unit 1203, and a second indication information sending unit 1204, where the detailed description of each unit is as follows:
  • a registration information sending unit 1201, configured to send registration information to a third device
  • the second configuration information receiving unit 1202 is configured to receive the second configuration information sent by the first device
  • the second data information receiving unit 1203 is configured to receive the second data information sent by the first device
  • the second indication information sending unit 1204 is configured to send the second indication information to the first device.
  • model distillation training device 12 further includes:
  • the second attention category list receiving unit is configured to receive the second attention category list sent by the first device before the second configuration information receiving unit 1202 receives the second configuration information sent by the first device.
  • model distillation training device 12 further includes:
  • the second network structure request receiving unit is used for receiving the second network structure request sent by the first device before the second configuration information receiving unit 1202 receives the second configuration information sent by the first device;
  • a second structure request response sending unit is configured to send a second structure request response to the first device based on the second network structure request.
  • FIG. 13 is a schematic structural diagram of a model distillation training device provided by an embodiment of the present application.
  • the model distillation training device 13 may be the first device in the above method embodiment.
• the model distillation training device 13 may include a memory 1301, a communication module 1302 and a processor 1303, where the detailed description of each unit is as follows:
  • the memory 1301 is used to store program codes.
  • the processor 1303 is configured to call the program code stored in the memory to perform the following steps:
• if the second device supports the first device in distillation training a neural network model, design the first neural network model;
  • the first neural network model is trained using the first indication information.
  • the processor 1303 is further configured to: before designing the first neural network model, send the first training request to the second device through the communication module 1302;
  • the second response of the second device is received through the communication module 1302 .
  • the processor 1303 is further configured to: send the first attention category list to the second device through the communication module 1302 .
  • the processor 1303 is further configured to: before designing the first neural network model, send a first network structure request to the second device through the communication module 1302;
  • the first structure request response sent by the second device is received by the communication module 1302 .
  • FIG. 14 is a schematic structural diagram of a model distillation training device provided by an embodiment of the present application.
  • the model distillation training device 14 may be the second device in the above method embodiment.
• the model distillation training device 14 may include a memory 1401, a communication module 1402 and a processor 1403, where the detailed description of each unit is as follows:
  • the memory 1401 is used to store program codes.
  • the processor 1403 is configured to call the program code stored in the memory to perform the following steps:
  • the processor 1403 is further configured to: receive the first training request sent by the first device through the communication module 1402;
  • the second response is sent to the first device through the communication module 1402 .
  • the processor 1403 is further configured to: before receiving the first configuration information sent by the first device through the communication module 1402, receive the first attention sent by the first device through the communication module 1402 Category list.
  • the processor 1403 is further configured to: before receiving the first configuration information sent by the first device through the communication module 1402, receive the first network sent by the first device through the communication module 1402 Structure request; send a first structure request response to the first device through the communication module 1402 .
  • FIG. 15 is a schematic structural diagram of a model distillation training device provided by an embodiment of the present application.
  • the model distillation training device 15 may be the third device in the above method embodiment.
• the model distillation training device 15 may include a memory 1501, a communication module 1502 and a processor 1503, where the detailed description of each unit is as follows:
  • the memory 1501 is used to store program codes.
  • the processor 1503 is configured to call the program code stored in the memory to perform the following steps:
  • the distillation notification sent by the first device is received through the communication module 1502 .
  • FIG. 16 is a schematic structural diagram of a model distillation training device provided by an embodiment of the present application.
  • the model distillation training device 16 may be the first device in the above method embodiment.
• the model distillation training device 16 may include a memory 1601, a communication module 1602 and a processor 1603, where the detailed description of each unit is as follows:
  • Memory 1601 is used to store program codes.
• the processor 1603 is used to call the program code stored in the memory to perform the following steps: send a second training request to the third device through the communication module 1602; if the fourth training type ID is consistent with the third training type ID, receive the third response sent by the third device;
  • a distillation notification is sent to the third device through the communication module 1602 .
  • the processor 1603 is further configured to: design the second neural network model after sending the distillation notification to the third device through the communication module 1602;
  • the second neural network model is trained using the second indication information.
  • the processor 1603 is further configured to: before sending the second configuration information to the second device through the communication module 1602, send the second attention category list to the second device through the communication module 1602.
  • Distillation training is performed on the second neural network model using the second indication information.
  • the processor 1603 is further configured to: before designing the second neural network model, send a second network structure request to the second device through the communication module 1602;
  • the second structure request response sent by the second device is received by the communication module 1602 .
  • FIG. 17 is a schematic structural diagram of a model distillation training device provided by an embodiment of the present application.
  • the model distillation training device 17 may be the second device in the above method embodiment.
• the model distillation training device 17 may include a memory 1701, a communication module 1702 and a processor 1703, where the detailed description of each unit is as follows:
  • the memory 1701 is used to store program codes.
  • the processor 1703 is configured to call the program code stored in the memory to perform the following steps:
  • the processor 1703 is further configured to: before receiving the second configuration information sent by the first device through the communication module 1702, receive through the communication module 1702 the second category of interest list sent by the first device .
  • the processor 1703 is further configured to: before receiving the second configuration information sent by the first device through the communication module 1702, receive the second network structure request sent by the first device through the communication module 1702 ;
  • a second structure request response is sent to the first device through the communication module 1702 .
• Embodiments of the present application provide a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium.
• when the computer program is executed by a processor, the model distillation training methods in the foregoing embodiments and their various possible implementations are implemented.
• An embodiment of the present application provides a computer program. The computer program includes instructions which, when the computer program is executed by a computer, enable a first device to execute the processes performed by the first device in the foregoing embodiments and their various possible implementations, or enable a second device to execute the processes performed by the second device in the foregoing embodiments and their various possible implementations, or enable a third device to execute the processes performed by the third device in the foregoing embodiments and their various possible implementations.
• An embodiment of the present application provides a chip system. The chip system includes a processor configured to support a first device in implementing the functions involved in the methods of the foregoing embodiments and their various possible implementations, or a second device in implementing the functions involved in the methods of the foregoing embodiments and their various possible implementations, or a third device in implementing the functions involved in the methods of the foregoing embodiments and their various possible implementations.
  • the chip system further includes a memory for storing necessary program instructions and data of the first device or the second device or the third device.
  • the chip system may be composed of chips, or may include chips and other discrete devices.
  • Computer-readable media may include computer-readable storage media, which corresponds to tangible media, such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another (eg, according to a communication protocol) .
  • a computer-readable medium may generally correspond to (1) a non-transitory tangible computer-readable storage medium, or (2) a communication medium such as a signal or carrier wave.
  • Data storage media can be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementing the techniques described in this application.
  • the computer program product may comprise a computer-readable medium.
  • the disclosed system, apparatus and method may be implemented in other manners.
  • the apparatus embodiments described above are only illustrative.
  • the division of the units is only a logical function division. In actual implementation, there may be other division methods.
• for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the functions, if implemented in the form of software functional units and sold or used as independent products, may be stored in a computer-readable storage medium.
• in essence, the technical solution of the present application, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product.
• the computer software product is stored in a storage medium and includes several instructions used to cause a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application.
• the aforementioned storage medium includes media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disc.

Abstract

A method for model distillation training, comprising: before the first device distills and trains a neural network model, it establishes a distillation training communication connection with a second device; through the exchange of distillation training information between the first device and the second device, the second device configures a first reference neural network model using the first configuration information sent by the first device. After the first reference neural network model is configured, the second device uses the first data information, processes the first sample data in the first data information through the configured first reference neural network model to obtain first indication information, and sends the first indication information to the first device. The first device uses the first indication information to train the first neural network model it has designed. In this way, the second device guides the first device's distillation training by transferring "knowledge".

Description

模型蒸馏训练的方法及相关装置和设备、可读存储介质 技术领域
本申请涉及模型蒸馏训练的技术领域,尤其涉及一种模型蒸馏训练的方法及相关装置和设备、可读存储介质。
背景技术
在信息技术中,中心网元和边缘网元是网络中常见的一种应用场景,例如,在无线场景下,无线接入网(Radio Access Network,RAN)可作为中心网元,用户设备(User Equipment,UE)可作为边缘网元。类似的,在3GPP中,引入标准化网络数据分析功能(Network Data Analytics Function,NWDAF),也有中心NWDAF和边缘NWDAF的部署场景。中心网元可向边缘网元发送训练好的神经网络模型。图1是中心网元和边缘网元部署神经网络示意图,如图1所示,由于中心网元具有强大的计算能力以及存储有大量的数据资源,因此中心网元通常针对全类别的场景部署大的神经网络(大网络),并训练所述神经网络。边缘网元存储资源和计算资源受限,且边缘网元通常只关注几个特定的类别,因此边缘网元通常部署小的神经网络(小网络)。
目前,边缘网元部署神经网络的方式主要有两种:一种方式是,中心网元直接将大网络下发给边缘网元,边缘网元部署所述大网络。由于边缘网元的存储资源和计算资源受限,边缘网元通过这种方式部署神经网络,会占用边缘网元受限的存储资源。另一种方式是,中心网元做网络模型蒸馏,得到统一的小网络,并将小网络下发给边缘网元,边缘网元部署所述小网络。由于中心网元不知道每个边缘网元的存储资源、计算能力以及边缘网元关注的类别,因此,无法蒸馏得到适配每个边缘网元的小网络模型。中心网元统一蒸馏训练得到的小网络模型与边缘网元的不适配所带来的问题为:所述小网络模型所支持的类别为全类别,而边缘网元所关注的类别只有特定几个,当所述全类别的小网络模型要求的计算能力以及存储资源超出边缘网元的能力时,所述小网络模型会占用边缘网元的存储空间和运行空间,影响边缘网元完成其他工作,进而降低边缘网元的工作性能;另外,由于边缘网元的计算能力以及存储资源有限,边缘网元可能无法完全运行所述全类别的小网络模型,使得小网络模型执行如人像识别、流量识别等功能时,完成度和精准度不高。
发明内容
本申请提供了一种模型蒸馏训练的方法,解决了边缘网元部署中心网元下发的大网络而占用其受限的存储资源的问题,也解决了边缘网元部署中心网元下发的统一蒸馏训练而得到的小网络与边缘网元不适配的问题。
第一方面,本申请实施例提供了一种模型蒸馏训练的方法,包括:
若第二设备支持第一设备蒸馏训练神经网络模型,所述第一设备设计第一神经网络模型;
所述第一设备向所述第二设备发送第一配置信息;其中,所述第一配置信息用于所述 第二设备配置所述第一参考神经网络模型;
所述第一设备向所述第二设备发送第一数据信息;其中,所述第一数据信息包括所述第一参考神经网络模型用于蒸馏训练的第一样本数据;
所述第一设备接收所述第二设备返回的第一指示信息,并使用所述第一指示信息训练所述第一神经网络模型;所述第一指示信息为所述第一样本数据输入到所述第一参考神经网络模型得到的信息。
本申请实施例,第一设备设计第一神经网络模型,以及向第二设备发送第一配置信息和第一数据信息,使得第一设备可对第一神经网络模型进行个性化地设计和训练,实现第一神经网络模型大小和第一神经网络模型精度的平衡,提高了蒸馏训练效率,解决了第一设备因其有限的存储资源而无法存储第二设备下发的大网络的问题,也解决了第一设备部署第二设备下发的、统一蒸馏训练得到的小网络而与第一设备不适配的问题。也就是说,可以解决现有技术中由中心网元统一蒸馏训练得到的小神经网络模型与边缘网元的不适配所带来的问题,包括:所述小神经网络模型所支持的类别为全类别,而边缘网元所关注的类别只有特定几个,当所述全类别的小神经网络模型要求的计算能力以及存储资源超出边缘网元的能力时,所述小神经网络模型会占用边缘网元的存储空间和运行空间,影响边缘网元完成其他工作,进而降低边缘网元的工作性能的技术问题;另外,由于边缘网元的计算能力以及存储资源有限,边缘网元可能无法完全运行所述全类别的小神经网络模型,使得小神经网络模型执行如人像识别、流量识别等功能时,完成度和精准度不高的技术问题。
在一种可能的实现方式中,所述第一设备设计第一神经网络模型之前,还包括:
所述第一设备向所述第二设备发送第一训练请求;其中,所述第一训练请求包括第一训练类型ID,所述第一训练类型ID用于指示所述第一设备待蒸馏训练的神经网络模型的功能类型;
若第二训练类型ID与所述第一训练类型ID一致,所述第一设备接收所述第二设备发送的第一响应;其中,所述第一响应包括第一存储信息;所述第一存储信息用于指示所述第一参考神经网络模型所占存储空间的大小;所述第二训练类型ID用于指示所述第二设备支持蒸馏训练的神经网络模型的功能类型;
所述第一设备基于所述第一存储信息向所述第二设备发送第一询问信息;其中,所述第一询问信息包括第一蒸馏询问信息和第一蒸馏能力信息;所述第一蒸馏询问信息用于所述第一设备询问所述第二设备是否支持蒸馏训练神经网络模型以及在支持所述第一设备蒸馏训练神经网络模型的情况下,支持蒸馏训练神经网络模型的方式;所述第一蒸馏能力信息包括所述第一设备对神经网络模型的设计能力、对神经网络模型的蒸馏训练能力、是否存在蒸馏训练数据中的一项或多项;
所述第一设备接收所述第二设备的第二响应;其中,所述第二响应用于指示所述第二设备是否支持所述第一设备蒸馏训练神经网络模型以及在支持所述第一设备蒸馏训练神经网络模型的情况下,支持蒸馏训练神经网络模型的方式。
本申请实施例,第一设备向第二设备发送第一训练请求,第二设备根据第一训练请求向第一设备提供相应的参考神经网络模型,有利于第一设备设计第一神经网络模型。进而, 实现了第一神经网络模型大小和第一神经网络模型精度的平衡,提高了蒸馏训练效率,解决了第一设备因其有限的存储资源而无法存储第二设备下发的大网络的问题,也解决了第一设备部署第二设备下发的统一蒸馏训练所得到的小网络而与第一设备不适配的问题。
在一种可能的实现方式中,所述第一设备向所述第二设备发送第一配置信息之前,还包括:
所述第一设备向所述第二设备发送第一关注类别列表;其中,所述第一关注类别列表包括所述第一设备蒸馏训练所关注类别的集合;所述第一关注类别列表中第一设备蒸馏训练所关注类别的集合为第一类别列表中类别集合的子集;所述第一类别列表包括所述第一参考神经网络模型预设类别的集合。
本申请实施例,第一设备向第二设备发送第一关注类别列表,有利于第二设备基于第一设备发送的第一数据信息获取蒸馏训练的数据后,根据所述第一关注类别列表中类别对蒸馏训练后获取的数据进行筛选,得到第一指示信息,所述第一指示信息与所述第一关注类别列表中类别相对应,有利于第一设备基于所述第一指示信息蒸馏训练出适配的第一神经网络模型,也在不影响第一神经网络模型训练效果下进一步减少了通信开销,节省了分离式蒸馏训练的网络资源。
在一种可能的实现方式中,所述第一指示信息为所述第二设备将所述第一样本数据通过所述第一参考神经网络模型计算处理,并将处理后的第一样本数据基于所述第一关注类别列表进行筛选所得到的。
本申请实施例,第二设备根据所述第一关注类别列表中类别对蒸馏训练后获取的数据进行筛选,得到第一指示信息,所述第一指示信息与所述第一关注类别列表中类别相对应,有利于第一设备基于所述第一指示信息蒸馏训练出适配的第一神经网络模型,也在不影响第一神经网络模型训练效果下进一步减少了通信开销,节省了分离式蒸馏训练的网络资源。
在一种可能的实现方式中,所述第一设备设计第一神经网络模型,包括:
所述第一设备向所述第二设备发送第一网络结构请求;其中,所述第一网络结构请求用于所述第一设备向所述第二设备获取所述第一参考神经网络模型的结构信息;
所述第一设备接收所述第二设备发送的第一结构请求响应;其中,所述第一结构请求响应包括所述第一参考神经网络模型的结构信息;
所述第一设备基于所述第一参考神经网络模型的结构信息设计第一神经网络模型。
本申请实施例,第一设备向第二设备发送第一网络结构请求后,接收第二设备发送的第一结构请求响应,获取了第一参考神经网络模型的结构信息。有利于第一设备根据所述第一参考神经网络模型的结构信息设计第一神经网络模型,实现了第一神经网络模型大小和第一神经网络模型精度的平衡,提高了蒸馏训练效率,解决了第一设备因其有限的存储资源而无法存储第二设备下发的大网络的问题,也解决了第一设备部署第二设备下发的统一蒸馏训练所得到的小网络而与第一设备不适配的问题。
在一种可能的实现方式中,所述第一设备向所述第二设备发送第一数据信息之后,还包括:
所述第一设备向所述第二设备发送第一训练结果通知;其中,所述第一训练结果通知包括所述第一设备蒸馏训练第一神经网络模型的完成度信息、蒸馏训练的所述第一神经网络模型的反馈信息。
本申请实施例,第一设备向第二设备发送第一训练结果通知,所述第一训练结果通知用于告知第二设备第一神经网络模型是否训练完成以及向第二设备反馈对训练的第一神经网络模型的信息。第二设备根据第一训练结果通知判断是否释放第一设备与第二设备之间的蒸馏训练的通信资源,避免占用第一设备与第二设备的其它信道资源。
第二方面,本申请实施例提供了一种模型蒸馏训练的方法,包括:
第二设备接收第一设备发送的第一配置信息,并使用所述第一配置信息配置第一参考神经网络模型;
所述第二设备接收所述第一设备发送的第一数据信息;其中,所述第一数据信息包括所述第一参考神经网络模型用于蒸馏训练的第一样本数据;
所述第二设备向所述第一设备发送第一指示信息;其中所述第一指示信息用于所述第一设备训练第一神经网络模型;所述第一指示信息为所述第一样本数据输入到所述第一参考神经网络模型得到的信息。
本申请实施例,第二设备接收第一设备发送的第一配置信息和第一数据信息,有利于第二设备基于第一设备发送的第一配置信息和第一数据信息获得第一指示信息,以便第一设备使用第一指示信息训练第一神经网络模型。从而实现了第一神经网络模型大小和第一神经网络模型精度的平衡,提高了蒸馏训练效率,解决了第一设备因其有限的存储资源而无法存储第二设备下发的大网络的问题,也解决了第一设备部署第二设备下发的统一蒸馏训练所得到的小网络而与第一设备不适配的问题。
在一种可能的实现方式中,所述第二设备接收第一设备发送的第一配置信息,使用所述第一配置信息配置第一参考神经网络模型之前,还包括:
所述第二设备接收所述第一设备发送的第一训练请求;其中,所述第一训练请求包括第一训练类型ID,所述第一训练类型ID用于指示第一设备待蒸馏训练的神经网络模型的功能类型;
若第二训练类型ID与所述第一训练类型ID一致,所述第二设备基于所述第一训练请求,生成第一响应,并将所述第一响应发送给所述第一设备;其中,所述第二训练类型ID用于指示所述第二设备支持蒸馏训练神经网络模型的功能类型;所述第一响应包括第一存储信息;所述第一存储信息用于指示所述第一参考神经网络模型所占存储空间的大小;
所述第二设备接收所述第一设备发送的第一询问信息;其中,所述第一询问信息包括第一蒸馏询问信息和第一蒸馏能力信息;所述第一蒸馏询问信息用于所述第一设备询问所述第二设备是否支持蒸馏训练神经网络模型以及在支持所述第一设备蒸馏训练神经网络模型的情况下,支持蒸馏训练神经网络模型的方式;所述第一蒸馏能力信息包括所述第一设 备对神经网络模型的设计能力、对神经网络模型的蒸馏训练能力、是否存在蒸馏训练数据中的一项或多项;
所述第二设备基于所述第一蒸馏询问信息生成第二响应,并将所述第二响应发送给所述第一设备;其中,所述第二响应用于指示所述第二设备是否支持所述第一设备蒸馏训练神经网络模型以及在支持所述第一设备蒸馏训练神经网络模型的情况下,支持蒸馏训练神经网络模型的方式。
本申请实施例,第二设备接收第一设备发送的第一训练请求和第一询问信息,并向第一设备发送对应的响应,有利于第一设备进行第一神经网络模型的设计。进而,实现了第一神经网络模型大小和第一神经网络模型精度的平衡,提高了蒸馏训练效率,解决了第一设备因其有限的存储资源而无法存储第二设备下发的大网络的问题,也解决了第一设备部署第二设备下发的统一蒸馏训练所得到的小网络而与第一设备不适配的问题。
在一种可能的实现方式中,所述第二设备接收所述第一设备发送的第一配置信息之前,还包括:
所述第二设备接收所述第一设备发送的第一关注类别列表;其中,所述第一关注类别列表包括所述第一设备蒸馏训练所关注类别的集合;所述第一关注类别列表中,所述第一设备蒸馏训练所关注类别的集合为第一类别列表中类别集合的子集;所述第一类别列表包括所述第一参考神经网络模型预设类别的集合。
本申请实施例,第二设备接收第一设备发送的第一关注类别列表,有利于第二设备基于所述第一关注类别列表获得第一指示信息,所述第一指示信息包括与第一设备关注类别相关的指示信息,第一设备使用第一指示信息训练第一神经网络模型,训练完成后的第一神经网络模型为与所述第一设备相适配的神经网络模型,实现第一神经网络模型大小和第一神经网络模型精度的平衡,提高了蒸馏训练效率,解决了第一设备因其有限的存储资源而无法存储第二设备下发的大网络的问题,也解决了第一设备部署第二设备下发的、统一蒸馏训练得到的小网络而与第一设备不适配的问题。
在一种可能的实现方式中,所述第一指示信息为所述第二设备将所述第一样本数据通过所述第一参考神经网络模型计算处理,并将处理后的第一样本数据基于所述第一关注类别列表进行筛选所得到的。
本申请实施例,第二设备基于所述第一关注类别列表所得到的第一指示信息发送给第一设备,有利于第一设备基于所述第一指示信息得到更加适配的第一神经网络模型,进而解决第一设备因其有限的存储资源而无法存储第二设备下发的大网络的问题,也解决第一设备部署第二设备下发的统一蒸馏训练所得到的小网络而与第一设备不适配的问题。
在一种可能的实现方式中,所述第二设备接收所述第一设备发送的第一配置信息之前,还包括:
所述第二设备接收所述第一设备发送的第一网络结构请求;其中,所述第一网络结构请求用于所述第一设备向所述第二设备获取所述第一参考神经网络模型的结构信息;
所述第二设备基于所述第一网络结构请求向所述第一设备发送第一结构请求响应;其中,所述第一结构请求响应包括所述第一参考神经网络模型的结构信息。
本申请实施例,第二设备向第一设备发送第一参考神经网络模型的结构信息,有利于第一网络根据所述第一参考神经网络模型的结构信息设计第一神经网络模型,实现了第一神经网络模型大小和第一神经网络模型精度的平衡,提高了蒸馏训练效率,解决了第一设备因其有限的存储资源而无法存储第二设备下发的大网络的问题,也解决了第一设备部署第二设备下发的统一蒸馏训练所得到的小网络而与第一设备不适配的问题。
在一种可能的实现方式中,所述第二设备接收所述第一设备发送的第一数据信息之后,还包括:
所述第二设备接收所述第一设备发送的第一训练结果通知;其中,所述第一训练结果通知包括所述第一设备蒸馏训练第一神经网络模型的完成度信息、蒸馏训练的所述第一神经网络模型的反馈信息。
本申请实施例,第二设备接收第一结果通知,有利于第二设备根据第一训练结果通知判断是否释放第一设备与第二设备之间的蒸馏训练通信资源,避免占用第一设备与第二设备的其它信道资源。
第三方面,本申请实施例提供了一种模型蒸馏训练的方法,包括:
第三设备接收第二设备发送的注册信息;其中,所述注册信息包括第三训练类型ID、第三神经网络模型ID、第二存储信息、第二类别列表、训练响应信息;所述训练响应信息用于指示所述第二设备是否支持蒸馏训练神经网络模型以及在支持蒸馏训练神经网络模型的情况下,支持蒸馏训练神经网络模型的方式;所述第三训练类型ID用于指示所述第二设备支持蒸馏训练神经网络模型的功能类型;
所述第三设备接收所述第一设备发送的第二训练请求;其中,所述第二训练请求包括第四训练类型ID、第二蒸馏询问信息和第二蒸馏能力信息;所述第四训练类型ID用于指示所述第一设备待蒸馏训练神经网络模型的功能类型;
若第四训练类型ID与所述第三训练类型ID一致,所述第三设备基于所述第二训练请求生成第三响应,并将所述第三响应发送给所述第一设备;其中,所述第三响应包括训练响应信息、第三神经网络模型ID、第二存储信息和第二类别列表;
所述第三设备接收所述第一设备发送的蒸馏通知;其中,所述蒸馏结果通知用于指示所述第一设备与所述第二设备是否匹配成功。
本申请实施例,第三设备接收第二设备发送的注册信息以及第三设备接收第一设备发送的第二训练请求,有利于第三设备通过分析所述注册信息与所述第二训练请求,为第一设备和第二设备建立蒸馏训练通信连接,进而实现第二神经网络模型大小和第二神经网络模型精度的平衡,从而提高蒸馏训练效率。
第四方面,本申请实施例提供了一种模型蒸馏训练的方法,包括:
第一设备向第三设备发送第二训练请求;其中,所述第二训练请求包括第四训练类型 ID、第二蒸馏询问信息和第二蒸馏能力信息;其中,所述第四训练类型ID用于指示第一设备待蒸馏训练的神经网络模型的功能类型;
若所述第四训练类型ID与所述第三训练类型ID一致,所述第一设备接收所述第三设备发送的第三响应;其中,所述第三响应包括训练响应信息、第三神经网络模型ID、第二存储信息和第二类别列表;所述第三训练类型ID用于指示所述第二设备支持蒸馏训练神经网络模型的功能类型;
所述第一设备向所述第三设备发送蒸馏通知;其中,所述蒸馏结果通知用于指示所述第一设备与所述第二设备是否匹配成功。
本申请实施例,第一设备向第三设备发送第二训练请求,第三设备基于所述第二训练请求向第一设备发送第三响应,有利于第三设备为第一设备和第二设备建立蒸馏训练通信连接,从而提高蒸馏训练效率。
在一种可能的实现方式中,所述第一设备向所述第三设备发送蒸馏通知之后,还包括:
所述第一设备设计第二神经网络模型;
所述第一设备向第二设备发送第二配置信息;其中,所述第二配置信息用于所述第二设备配置所述第二参考神经网络模型;
所述第一设备向所述第二设备发送第二数据信息;其中,所述第二数据信息包括所述第二参考神经网络模型用于蒸馏训练的第二样本数据;
所述第一设备接收所述第二设备返回的第二指示信息,并使用所述第二指示信息训练所述第二神经网络模型;所述第二指示信息为所述第二样本数据输入到所述第二参考神经网络模型得到的信息。
本申请实施例,第一设备和第二设备建立蒸馏训练通信连接之后,第一设备设计第二神经网络模型,第一设备和第二设备进行第二配置信息和第二数据信息的发送、接收的这种蒸馏训练通信过程。有利于第二设备获取与第一设备相适配的第二指示信息,从而有利于第一设备使用第二指示信息训练出与其相适配的第二神经网络模型。
在一种可能的实现方式中,所述第一设备向第二设备发送第二配置信息之前,还包括:
所述第一设备向所述第二设备发送第二关注类别列表;其中,所述第二关注类别列表包括所述第一设备蒸馏训练所关注类别的集合;所述第二关注类别列表中,第一设备蒸馏训练所关注类别的集合为第二类别列表中类别集合的子集;所述第二类别列表包括所述第二参考神经网络模型预设类别的集合。
本申请实施例,第一设备向第二设备发送第二关注类别列表,有利于第二设备基于所述第二关注类别列表获得第二指示信息,所述第二指示信息与第二关注类别列表中的类别相对应,第一设备使用第二指示信息训练第二神经网络模型,可以得到与所述第一设备相适配的神经网络模型,实现第二神经网络模型大小和第二神经网络模型精度的平衡,提高了蒸馏训练效率,解决了第一设备因其有限的存储资源而无法存储第二设备下发的大网络的问题,也解决了第一设备部署第二设备下发的、统一蒸馏训练得到的小网络而与第一设备不适配的问题。
在一种可能的实现方式中,所述第二指示信息为所述第二设备将所述第二样本数据通过所述第二参考神经网络模型计算处理,并将处理后的第二样本数据基于所述第二关注类别列表进行筛选所得到的。
本申请实施例,第二设备基于所述第二关注类别列表获得第二指示信息,所述第二指示信息与第二关注类别列表中的类别相对应,第一设备使用第二指示信息训练第二神经网络模型,可以得到与所述第一设备相适配的神经网络模型,实现第二神经网络模型大小和第二神经网络模型精度的平衡,提高了蒸馏训练效率,解决了第一设备因其有限的存储资源而无法存储第二设备下发的大网络的问题,也解决了第一设备部署第二设备下发的、统一蒸馏训练得到的小网络而与第一设备不适配的问题。
在一种可能的实现方式中,所述第一设备设计第二神经网络模型,包括:
所述第一设备向所述第二设备发送第二网络结构请求;其中,所述第二网络结构请求用于所述第一设备向所述第二设备获取第二参考神经网络模型的结构信息;
所述第一设备接收所述第二设备发送的第二结构请求响应;其中,所述第二结构请求响应包括所述第二参考神经网络模型的结构信息;
所述第一设备基于第二参考神经网络模型的结构信息设计第二神经网络模型。
本申请实施例,第一设备向第二设备发送第二网络结构请求后,接收第二设备发送的第二结构请求响应,获取了第二参考神经网络模型的结构信息。有利于第一设备根据所述第一参考神经网络模型的结构信息设计第二神经网络模型,实现了第二神经网络模型大小和第二神经网络模型精度的平衡,提高了蒸馏训练效率,解决了第一设备因其有限的存储资源而无法存储第二设备下发的大网络的问题,也解决了第一设备部署第二设备下发的统一蒸馏训练所得到的小网络而与第一设备不适配的问题。
在一种可能的实现方式中,所述第一设备向所述第二设备发送的第二数据信息之后,还包括:
所述第一设备向所述第二设备发送第二训练结果通知;其中,所述第二训练结果通知包括所述第一设备蒸馏训练第二神经网络模型的完成度信息、蒸馏训练所述第二神经网络模型的反馈信息。
本申请实施例,第一设备向第二设备发送第二训练结果通知,所述第二训练结果通知用于告知第二设备第二神经网络模型是否训练完成以及向第二设备反馈对训练的第二神经网络模型的信息。第二设备根据第二训练结果通知判断是否释放第一设备与第二设备之间的蒸馏训练通信资源,避免占用第一设备与第二设备的其它信道资源。
第五方面,本申请实施例提供了一种模型蒸馏训练的方法,包括:
第二设备向第三设备发送注册信息;其中,所述注册信息包括第三训练类型ID、第三神经网络模型ID、第二存储信息、第二类别列表、训练响应信息;所述训练响应信息用于指示所述第二设备是否支持蒸馏训练神经网络模型以及在支持蒸馏训练神经网络模型的情 况下,支持蒸馏训练神经网络模型的方式;
所述第二设备接收第一设备发送的第二配置信息;其中,所述第二配置信息用于所述第二设备配置所述第二参考神经网络模型;
所述第二设备接收所述第一设备发送的第二数据信息;其中,所述第二数据信息包括所述第二参考神经网络模型用于蒸馏训练的第二样本数据;
所述第二设备向所述第一设备发送第二指示信息;其中所述第二指示信息用于所述第一设备训练第二神经网络模型;所述第二指示信息为所述第二本数据输入到所述第二参考神经网络模型得到的信息。
本申请实施例,第二设备向第三设备发送注册信息,有利于第三设备可以快速有效地建立第一设备和第二设备的蒸馏训练连接,提升蒸馏训练的效率。第二设备接收第一设备发送的第二配置信息和第二数据信息,有利于第二设备基于第二配置信息和第二数据信息获得第二指示信息,使得第一设备使用第二指示信息训练第二神经网络模型。从而实现了第二神经网络模型大小和第二神经网络模型精度的平衡,提高了蒸馏训练效率,解决了第一设备因其有限的存储资源而无法存储第二设备下发的大网络的问题,也解决了第一设备部署第二设备下发的统一蒸馏训练所得到的小网络而与第一设备不适配的问题。
在一种可能的实现方式中,所述第二设备接收第一设备发送的第二配置信息之前,还包括:
所述第二设备接收所述第一设备发送的第二关注类别列表;其中,所述第二关注类别列表包括所述第一设备蒸馏训练所关注类别的集合;所述第二关注类别列表中,第一设备蒸馏训练所关注类别的集合为第二类别列表中类别集合的子集;所述第二类别列表包括所述第二参考神经网络模型预设类别的集合。
本申请实施例,第二设备接收第一设备发送的第二关注类别列表,有利于第二设备基于所述第二关注类别列表获得第二指示信息,所述第二指示信息包括与第一设备关注类别相关的指示信息,第一设备使用第二指示信息训练第一神经网络模型,训练完成后的第一神经网络模型为与所述第一设备相适配的神经网络模型,实现第一神经网络模型大小和第二神经网络模型精度的平衡,提高了蒸馏训练效率,解决了第一设备因其有限的存储资源而无法存储第二设备下发的大网络的问题,也解决了第一设备部署第二设备下发的、统一蒸馏训练得到的小网络而与第一设备不适配的问题。
在一种可能的实现方式中,所述第二指示信息为所述第二设备将所述第二样本数据通过所述第二参考神经网络模型计算处理,并将处理后的第二样本数据基于所述第二关注类别列表进行筛选所得到的。
本申请实施例,第二设备基于所述第二关注类别列表所得到的第二指示信息发送给第一设备,有利于第一设备基于所述第二指示信息得到更加适配的第二神经网络模型,进而解决第一设备因其有限的存储资源而无法存储第二设备下发的大网络的问题,也解决第一设备部署第二设备下发的统一蒸馏训练所得到的小网络而与第一设备不适配的问题。
在一种可能的实现方式中,所述第二设备接收第一设备发送的第二配置信息之前,还包括:
所述第二设备接收所述第一设备发送的第二网络结构请求;其中,所述第二网络结构请求用于所述第一设备向所述第二设备获取所述第二参考神经网络模型的结构信息;
所述第二设备基于所述第二网络结构请求向所述第一设备发送第二结构请求响应;其中,所述第二结构请求响应包括所述第二参考神经网络模型的结构信息。
本申请实施例,第二设备向第一设备发送第二参考神经网络模型的结构信息,有利于第二网络根据所述第二参考神经网络模型的结构信息设计第二神经网络模型,实现了第二神经网络模型大小和第二神经网络模型精度的平衡,提高了蒸馏训练效率,解决了第一设备因其有限的存储资源而无法存储第二设备下发的大网络的问题,也解决了第一设备部署第二设备下发的统一蒸馏训练所得到的小网络而与第一设备不适配的问题。
在一种可能的实现方式中,所述第二设备接收所述第一设备发送的第二数据信息之后,还包括:
所述第二设备接收所述第一设备发送的第二训练结果通知;其中,所述第二训练结果通知包括所述第一设备蒸馏训练第二神经网络模型的完成度信息、蒸馏训练所述第二神经网络模型的反馈信息。
本申请实施例,第二设备接收第二结果通知,有利于第二设备根据第二训练结果通知判断是否释放第一设备与第二设备之间的蒸馏训练通信资源,避免占用第一设备与第二设备的其它信道资源。
第六方面,本申请实施例提供了一种模型蒸馏训练装置,所述模型蒸馏训练装置为第一设备,包括:
第一神经网络模型设计单元,用于若第二设备支持所述第一设备蒸馏训练神经网络模型,设计第一神经网络模型;
第一配置信息发送单元,用于向所述第二设备发送第一配置信息;
第一数据信息发送单元,用于向所述第二设备发送第一数据信息;
第一指示信息接收单元,用于接收所述第二设备返回的第一指示信息;
第一神经网络模型训练单元,用于使用所述第一指示信息训练所述第一神经网络模型。
在一种可能实现的方式中,所述模型蒸馏训练装置,还包括:
第一训练请求发送单元,用于所述第一神经网络模型设计单元设计第一神经网络模型之前,向所述第二设备发送第一训练请求;
第一响应接收单元,用于第一训练请求发送单元向所述第二设备发送第一训练请求之后,若第二训练类型ID与所述第一训练类型ID一致,接收所述第二设备发送的第一响应;
第一询问信息发送单元,用于基于所述第一存储信息向所述第二设备发送第一询问信息;
第二响应接收单元,用于所述第一询问信息发送单元向所述第二设备发送第一询问信 息之后,接收所述第二设备的第二响应。
在一种可能实现的方式中,所述模型蒸馏训练装置,还包括:
第一关注类别列表发送单元,用于所述第一配置信息发送单元向所述第二设备发送第一配置信息之前,向所述第二设备发送第一关注类别列表。
在一种可能实现的方式中,所述第一神经网络模型设计单元,还包括:
第一网络结构请求发送单元,用于所述第一神经网络模型设计单元设计第一神经网络模型之前,向所述第二设备发送第一网络结构请求;
第一结构请求响应接收单元,用于所述第一网络结构请求发送单元向所述第二设备发送第一网络结构请求之后,接收所述第二设备发送的第一结构请求响应。
第七方面,本申请实施例提供了一种模型蒸馏训练装置,所述模型蒸馏训练装置为第二设备,包括:
第一配置信息接收单元,用于接收所述第一设备发送的第一配置信息;
第一配置单元,用于使用所述第一配置信息配置所述第一参考神经网络模型;
第一数据信息接收单元,用于接收所述第一设备发送的第一数据信息;
第一指示信息发送单元,用于向所述第一设备发送第一指示信息。
在一种可能实现的方式中,所述模型蒸馏训练装置,还包括:
第一训练请求接收单元,用于所述第一参数配置单元配置所述第一参考神经网络模型之前,接收所述第一设备发送的第一训练请求;
第一响应发送单元,用于若第二训练类型ID与所述第一训练类型ID一致,基于所述第一训练请求,生成第一响应,并将所述第一响应发送给所述第一设备;
第一询问信息接收单元,用于接收所述第一设备发送的第一询问信息;
第二响应发送单元,用于所述第一询问信息接收单元接收所述第一设备发送的第一询问信息后,将所述第二响应发送给所述第一设备。
在一种可能实现的方式中,所述模型蒸馏训练装置,还包括:
第一关注类别列表接收单元,用于所述第一配置信息接收单元接收所述第一设备发送的第一配置信息之前,接收所述第一设备发送的第一关注类别列表。
在一种可能实现的方式中,所述模型蒸馏训练装置,还包括:
第一网络结构请求接收单元,用于所述第一配置信息接收单元接收所述第一设备发送的第一配置信息之前,接收所述第一设备发送的第一网络结构请求;
第一结构请求响应发送单元,用于向所述第一设备发送第一结构请求响应。
第八方面,本申请实施例提供了一种模型蒸馏训练装置,所述模型蒸馏训练装置为第 三设备,包括:
注册信息接收单元,用于接收第二设备发送的注册信息;
第二训练请求接收单元,用于接收所述第一设备发送的第二训练请求;
第三响应发送单元,用于若所述第四训练类型ID与所述第三训练类型ID一致,基于所述第二训练请求生成第三响应,并将所述第三响应发送给所述第一设备;
蒸馏通知接收单元,用于接收所述第一设备发送的蒸馏通知。
第九方面,本申请实施例提供了一种模型蒸馏训练装置,所述模型蒸馏训练装置为第一设备,包括:
第二训练请求发送单元,用于向第三设备发送第二训练请求;
第三响应接收单元,用于若所述第四训练类型ID与所述第三训练类型ID一致,接收所述第三设备发送的第三响应;
蒸馏通知发送单元,用于向所述第三设备发送蒸馏通知。
在一种可能实现的方式中,所述模型蒸馏训练装置,还包括:
第二神经网络模型设计单元,用于所述蒸馏通知发送单元向所述第三设备发送蒸馏通知之后,设计第二神经网络模型;
第二配置信息发送单元,用于向第二设备发送第二配置信息;
第二数据信息发送单元,用于向所述第二设备发送第二数据信息;
第二指示信息接收单元,用于接收所述第二设备返回的第二指示信息。
在一种可能实现的方式中,所述模型蒸馏训练装置,还包括:
第二关注类别列表发送单元,用于所述第二配置信息发送单元向第二设备发送第二配置信息之后,向所述第二设备发送第二关注类别列表。
在一种可能的实现方式中,所述第二神经网络模型设计单元,还包括:
第二网络结构请求发送单元,用于向所述第二设备发送第二网络结构请求;
第二结构请求响应接收单元,用于接收所述第二设备发送的第二结构请求响应。
第十方面,本申请实施例提供了一种模型蒸馏训练装置,所述模型蒸馏训练装置为第二设备,包括:
注册信息发送单元,用于向第三设备发送注册信息;
第二配置信息接收单元,用于接收第一设备发送的第二配置信息;
第二数据信息接收单元,用于接收所述第一设备发送的第二数据信息;
第二指示信息发送单元,用于向所述第一设备发送第二指示信息。
在一种可能实现的方式中,所述模型蒸馏训练装置,还包括:
第二关注类别列表接收单元,用于所述第二配置信息接收单元接收第一设备发送的第 二配置信息之前,接收所述第一设备发送的第二关注类别列表。
在一种可能实现的方式中,所述模型蒸馏训练装置,还包括:
第二网络结构请求接收单元,用于第二配置信息接收单元接收第一设备发送的第二配置信息之前,接收所述第一设备发送的第二网络结构请求;
第二结构请求响应发送单元,用于基于所述第二网络结构请求向所述第一设备发送第二结构请求响应。
第十一方面,本申请实施例提供了一种模型蒸馏训练设备,包括处理器和存储器;
其中,所述存储器用于存储程序代码,所述处理器用于调用所述存储器存储的程序代码,执行上述第一方面及其各种可能的实现方式中的模型蒸馏训练方法。
第十二方面,本申请实施例提供了一种模型蒸馏训练设备,包括处理器和存储器;
其中,所述存储器用于存储程序代码,所述处理器用于调用所述存储器存储的程序代码,执行上述第二方面及其各种可能的实现方式中的模型蒸馏训练方法。
第十三方面,本申请实施例提供了一种模型蒸馏训练设备,包括处理器和存储器;
其中,所述存储器用于存储程序代码,所述处理器用于调用所述存储器存储的程序代码,执行上述第三方面及其各种可能的实现方式中的模型蒸馏训练方法。
第十四方面,本申请实施例提供了一种模型蒸馏训练设备,包括处理器和存储器;
其中,所述存储器用于存储程序代码,所述处理器用于调用所述存储器存储的程序代码,执行上述第四方面及其各种可能的实现方式中的模型蒸馏训练方法。
第十五方面,本申请实施例提供了一种模型蒸馏训练设备,包括处理器和存储器;
其中,所述存储器用于存储程序代码,所述处理器用于调用所述存储器存储的程序代码,执行上述第五方面及其各种可能的实现方式中的模型蒸馏训练方法。
第十六方面,本申请实施例提供了一种计算机可读存储介质,所述计算机可读存储介质存储有计算机程序,该计算机程序被处理器执行时,使得第一设备实现上述第一方面和第四方面及其各种可能的实现方式中的模型蒸馏训练的方法、或第二设备实现上述第二方面和第五方面及其各种可能的实现方式中的模型蒸馏训练的方法、或第三设备实现上述第三方面及其各种可能的实现方式中的模型蒸馏训练的方法。
第十七方面,本申请实施例提供了一种计算机程序,该计算机程序包括指令,当所述计算机程序被计算机执行时,使得第一设备可以执行上述第一方面和第四方面及其各种可能的实现方式中第一设备所执行的流程、或第二设备可以执行上述第二方面和第五方面及其各种可能的实现方式中第二设备所执行的流程、或使得第三设备可以执行上述第三方面 及其各种可能的实现方式中第三设备所执行的流程。
附图说明
图1是本申请实施例提供的中心网元和边缘网元部署神经网络的示意图;
图2是本申请实施例提供的一种模型蒸馏示意图;
图3是本申请实施例提供的一种模型蒸馏训练系统架构示意图;
图4是本申请实施例提供的一种模型蒸馏训练的流程示意图;
图5是本申请实施例提供的另外一种模型蒸馏训练的流程示意图;
图6是本申请实施例提供的另外一种模型蒸馏训练的流程示意图;
图7是本申请实施例提供的另外一种模型蒸馏训练的流程示意图;
图8是本申请实施例提供的一种模型蒸馏训练装置的结构示意图;
图9是本申请实施例提供的另外一种模型蒸馏训练装置的结构示意图;
图10是本申请实施例提供的另外一种模型蒸馏训练装置的结构示意图;
图11是本申请实施例提供的另外一种模型蒸馏训练装置的结构示意图;
图12是本申请实施例提供的另外一种模型蒸馏训练装置的结构示意图;
图13是本申请实施例提供的另外一种模型蒸馏训练设备的结构示意图;
图14是本申请实施例提供的另外一种模型蒸馏训练设备的结构示意图;
图15是本申请实施例提供的另外一种模型蒸馏训练设备的结构示意图;
图16是本申请实施例提供的另外一种模型蒸馏训练设备的结构示意图;
图17是本申请实施例提供的另外一种模型蒸馏训练设备的结构示意图。
具体实施方式
下面将结合本申请实施例中的附图,对本申请实施例进行描述。
本申请的说明书和权利要求书及所述附图中的术语“第一”、“第二”、“第三”和“第四”等是用于区别不同对象,而不是用于描述特定顺序。此外,术语“包括”和“具有”以及它们任何变形,意图在于覆盖不排他的包含。例如包含了一系列步骤或单元的过程、方法、系统、产品或设备没有限定于已列出的步骤或单元,而是可选地还包括没有列出的步骤或单元,或可选地还包括对于这些过程、方法、产品或设备固有的其它步骤或单元。
在本文中提及“实施例”意味着,结合实施例描述的特定特征、结构或特性可以包含在本申请的至少一个实施例中。在说明书中的各个位置出现该短语并不一定均是指相同的实施例,也不是与其它实施例互斥的独立的或备选的实施例。本领域技术人员显式地和隐式地理解的是,本文所描述的实施例可以与其它实施例相结合。
在本说明书中使用的术语“部件”、“模块”、“系统”等用于表示计算机相关的实体、硬件、固件、硬件和软件的组合、软件、或执行中的软件。例如,部件可以是但不限于,在处理器上运行的进程、处理器、对象、可执行文件、执行线程、程序和/或计算机。通过图示,在计算设备上运行的应用和计算设备都可以是部件。一个或多个部件可驻留在进程和/或执行线程中,部件可位于一个计算机上和/或分布在2个或更多个计算机之间。此外,这些部件可从在上面存储有各种数据结构的各种计算机可读介质执行。部件可例如根据具有一个 或多个数据分组(例如来自与本地系统、分布式系统和/或网络间的另一部件交互的二个部件的数据,例如通过信号与其它系统交互的互联网)的信号通过本地和/或远程进程来通信。
本申请权利要求书以及说明书中所述的第一设备和第二设备为支持AI模型训练和神经模型蒸馏训练的网络设备或用户设备,所述第三设备为支持设备间通信进程,接收设备注册信息的交换设备。
请参见图2,图2是本申请实施例提供的一种模型蒸馏示意图。如图2所示,所述模型蒸馏示意图包括教师网络和学生网络。所述教师网络是一个是训练好的神经网络,网络结构复杂,通常具有更多的节点和层数,具有强大的计算处理能力和大量的数据资源。所述学生网络是待训练的神经网络,是一个小网络,网络结构较简单,通常具有更少的节点和层数。教师网络和学生网络输入相同的X,所述X通过教师网络处理和计算,并经过归一化指数函数Softmax 1函数处理后,得到蒸馏训练的指示信息,在本实施例中,所述蒸馏训练的指示信息为教师网络输出的软目标1,也可以理解为教师网络处理X所获得的“知识”。所述X通过学生网络处理和计算,并经过Softmax2函数处理后,会输出软目标2。学生网络将软目标1和软目标2进行对比分析,得到损失函数Fn1。所述X通过学生网络处理和计算,并经过Softmax3函数处理后,会输出硬目标1,学生网络将硬目标1与硬目标2进行对比分析,得到损失函数Fn2;其中所述硬目标2为X的预设输出结果。所述损失函数Fn1和Fn2用于指示软目标1和软目标2、硬目标1和硬目标2的偏差程度。所述学生网络根据损失函数Fn1和Fn2调整所述学生网络,使得所述学生网络处理和计算X后,损失函数Fn1和Fn2降低到某一预设值,从而使得输入相同X,学生网络的输出与教师网络的输出接近,达到教师模型蒸馏训练学生网络的目的。通过这种方式,教师网络培养了学生网络的泛化能力,实现了教师网络对学生网络“知识”的迁移。教师网络对学生网络进行某一功能类型的神经网络训练过程如下:
教师网络是一个训练好的神经网络,能够实现图像识别、流量识别等多种功能。本实施例以教师网络能够实现图像识别,并进行模型蒸馏,使得学生网络能够实现图像识别的功能为例,本实施例以教师网络和学生网络的输入X是汽车图片进行举例说明。向教师网络输入X,所述X经过所述教师网络内各网络层进行相关的算法处理,处理过程主要是将X与所述教师网络中预设的各个类别进行比对。其中,所述预设的各个类别可以是与输入X不同种类的事物,可以是与输入X外观形状差异较大的事物,也可以是与输入X属性不同的事物等其他类别。本实施例以教师网络预设类别为与输入X为不同种类的事物为例,在所述教师网络中的预设类别有车辆、火车、飞机、船只四种类别,所述教师网络中的预设类别分别记为X 1、X 2、X 3和X 4。教师网络将输入的X分别与预设类别X 1、X 2、X 3和X 4进行比对,然后分别输出X与X 1、X与X 2、X与X 3和X与X 4的相似度Y 1、Y 2、Y 3和Y 4。由于所述教师网络是已经训练好了的神经网络,因此所述教师网络会判断出X与X 1属于同类别(对应汽车与车辆属于同类别)。因此,所述教师网络输出的X与X 1的相似度最高,即Y 1的数值最高(接近于100%);所述教师网络输出的X与X 2、X与X 3和X与X 4的相似度很低,即Y 2、Y 3和Y 4数值极低(接近于0%),所述Y 1与所述Y 2、Y 3和Y 4的数值差距过大,所述教师 网络输出的信息几乎只有Y 1的信息,Y 2、Y 3和Y 4的信息几乎为0,因此,教师网络输出的信息量较少。为了提高教师网络输出的信息量,将所述Y 1、Y 2、Y 3和Y 4通过Softmax 1函数处理,缩小Y 1与Y 2、Y 3和Y 4的数值差距。其中,所述Softmax 1函数的表达式为:
$$Z_i = \frac{\exp(Y_i / T)}{\sum_{j} \exp(Y_j / T)}$$
其中,Z i是Y i经过Softmax 1函数处理后,教师网络的输出信息;T是蒸馏温度信息(T越大,则输出的Z i分布越平缓,相当于平滑的一个作用,起到保留相似信息的作用;如果T趋近无穷,就是一个均匀分布),T的取值是基于教师网络与学生网络的网络结构差异所得到的。
所述Y 1、Y 2、Y 3和Y 4通过Softmax 1函数处理后,分别输出Z 1、Z 2、Z 3和Z 4,所述Z 1、Z 2、Z 3和Z 4的分布较为平缓(Z 1的数值是最高的,Z 2、Z 3和Z 4的数值与Z 1数值的差距缩小)。通过将Y 1、Y 2、Y 3和Y 4经过Softmax 1函数进行处理的方式,提高了Y 2、Y 3和Y 4的信息量,进而,提高了教师网络输出的信息量,所述Z 1、Z 2、Z 3和Z 4为教师网络输出的软目标1。
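As an illustration of the temperature-scaled Softmax described above, the following sketch (with purely illustrative score values, not taken from the embodiment) shows how a larger distillation temperature T flattens the output distribution and so preserves more information in the soft target:

```python
import math

def softmax_with_temperature(scores, T):
    """Compute Z_i = exp(Y_i / T) / sum_j exp(Y_j / T); a larger T gives a flatter distribution."""
    exps = [math.exp(s / T) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Illustrative similarity scores of one input against four preset categories.
Y = [9.5, 0.3, 0.1, 0.1]
print(softmax_with_temperature(Y, T=1))   # nearly one-hot: almost all mass on the first category
print(softmax_with_temperature(Y, T=5))   # flatter soft target: the other categories keep visible mass
```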
需要说明的是,教师网络是一个已训练好的神经网络,教师网络输出X与其相似类别的相似度有一个很高的置信度(本实施例中汽车与车辆有很高的相似度),从而使得Y 1与Y 2、Y 3、Y 4有极大的数值差距。例如,在本实施例中,对于汽车图片的输入,所述教师网络会判定所述汽车图片与车辆类别的相似度很高,所述教师网络输出Y 1可能为95%;所述教师网络判定所述汽车图片与火车、飞机和船只的相似度很低,所述教师网络输出Y 2、Y 3和Y 4可能分别为3%、1%、1%,在这种情况下,教师网络输入X与X 1、X 2、X 3和X 4类别比对的“知识”很难迁移学生网络,因为它们的相似度几乎接近0,具备很少的信息量。通过这种方式训练学生网络,不利于强化学生网络的泛化能力,影响蒸馏训练的效果。因此,教师网络将Y 1、Y 2、Y 3和Y 4通过Softmax 1函数进行处理,可以使Y 1、Y 2、Y 3和Y 4数值分布更加平缓,增加Y 2、Y 3和Y 4的信息量,使得教师网络学习X与X 1、X 2、X 3和X 4比对的“知识”迁移到学生网络,进而加强学生网络的泛化能力。
同样的,向学生网络输入X,所述X经过所述学生网络内各个网络层进行相关的算法处理,与所述学生网络中预设的各个类别进行比对。学生网络将输入的X分别与预设类别X 1、X 2、X 3和X 4进行比对,然后分别输出X与X 1、X与X 2、X与X 3和X与X 4的相似度H 1、H 2、H 3和H 4。由于学生网络是待训练的神经网络,学生网络对输入X与其预设类别的对比能力不够,使得学生网络输出的X与其预设类别的相似度与预设的结果差别大(例如,H 4的数值极大,接近于100%;H 2、H 3和H 1的数值极小,接近于0)。与教师网络类似,由于,H 4和H 2、H 3和H 1的数值差距过大,使得所述学生网络输出的信息几乎只有H 4的信息,H 2、H 3和H 1的信息量几乎为0。为了增加所述学生网络输出的信息量,将H 1、H 2、H 3和H 4通过Softmax 2函数处理,缩小H 4与H 2、H 3和H 1的数值差距,进而提高所述学生网络输出的信息量;其中,蒸馏温度信息T与教师网络使用相同。所述H 1、H 2、H 3和H 4通过Softmax 2函数处理后,分别输出M 1、M 2、M 3和M 4,所述M 1、M 2、M 3和M 4的分布较为平缓(M 4的数值是最高的,但是M 2、M 3和M 1的数值与M 4数值的差距缩小了),所述M 1、M 2、M 3和M 4为学生网络输出的软目标2。
学生网络将其输出的软目标2与教师网络输出的软目标1进行比对,基于所述软目标 2与所述软目标1的偏差程度得到损失函数Fn1。另外,学生网络将其输出的H 1、H 2、H 3和H 4通过Softmax 3函数处理得到B 4。其中,Softmax 3中使用的蒸馏温度信息T=1,所述H 1、H 2、H 3和H 4通过Softmax 3函数处理得到的B 4只含有H 4的信息,所述B 4为学生网络输出的硬目标1。学生网络将其输出的硬目标1与硬目标2进行比对分析,基于所述硬目标1与所述硬目标2的偏差程度得到损失函数Fn2。其中,所述硬目标2为输入X后应该得到的一个预设结果,X-硬目标2的预设关系作为本地数据存储在学生网络中。
学生网络基于损失函数Fn1和Fn2对其神经网络模型进行调整,降低学生网络与教师网络输出信息的偏差程度,从而减小损失函数Fn1和Fn2的数值,进而加强学生网络的泛化能力,实现教师网络的“知识”迁移。
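As an illustration of the two loss terms Fn1 and Fn2 described for Figure 2, the following sketch assumes cross-entropy as the comparison function for both the soft-target and the hard-target comparison; the embodiment does not fix a particular comparison function, so this choice and all names are assumptions:

```python
import math

def softmax(scores, T=1.0):
    exps = [math.exp(s / T) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target, predicted, eps=1e-12):
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))

def distillation_losses(teacher_scores, student_scores, true_label, T):
    """Return (Fn1, Fn2) for one training sample."""
    # Fn1: soft target 1 (teacher, temperature T) compared with soft target 2 (student, same T).
    fn1 = cross_entropy(softmax(teacher_scores, T), softmax(student_scores, T))
    # Fn2: hard target 1 (student, T = 1) compared with hard target 2 (the preset, one-hot result).
    hard_target_2 = [1.0 if i == true_label else 0.0 for i in range(len(student_scores))]
    fn2 = cross_entropy(hard_target_2, softmax(student_scores, T=1.0))
    return fn1, fn2

# Illustrative scores against the four preset categories (vehicle, train, airplane, ship).
fn1, fn2 = distillation_losses([9.5, 0.3, 0.1, 0.1], [1.0, 0.5, 3.0, 0.2], true_label=0, T=5.0)
```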
请参见图3,图3是本申请实施例提供的一种模型蒸馏训练系统架构示意图。图3表示在无线场景下,第二设备为无线接入网(Radio Access Network,RAN),第一设备为用户设备(User Equipment,UE),所述RAN接入多个用户设备。其中,所述RAN部署的大网络为第一参考神经网络模型,所述第一参考神经网络模型为全类别的模型,第一参考神经网络模型中的类别包括:3D视频、高清视频传输和增强现实(Augmented Reality,AR)/虚拟现实(Virtual Reality,VR);UE1部署的小网络为第一神经网络模型,所述第一神经网络模型中的类别为所述第一参考神经网络模型所包含类别的子集,包括:3D视频和高清视频传输。本实施例以UE1蒸馏训练第一神经网络模型,使得第一神经网络模型具备流量识别的功能为例。UE1蒸馏训练第一神经网络模型的步骤如下:
1、UE1向RAN发送第一训练请求,所述第一训练请求包括第一训练类型ID,所述第一训练类型ID用于指示UE1蒸馏训练的第一神经网络模型的功能类型为流量识别。
2、RAN接收所述第一训练请求,并将所述第一训练类型ID与RAN中的第二训练类型ID相比较,若二者相一致,则RAN支持UE1蒸馏训练第一神经网络模型,并根据所述第二训练类型ID,向UE1发送第一响应。所述第一响应包括第一存储信息,所述第一存储信息为具备流量识别功能的第一参考神经网络模型所占的内存或存储空间的大小。
3、UE1根据第一响应向RAN发送第一询问信息,询问RAN是否支持UE1对第一神经网络模型进行蒸馏训练,使得所述第一神经网络模型具备流量识别功能,并上报UE1的蒸馏训练能力。
4、RAN基于所述第一询问信息,向UE1发送第二响应,用于告知UE1:RAN支持UE1蒸馏训练第一神经网络模型,使其获得流量识别功能;支持蒸馏训练的方式为RAN向UE1发送第一指示信息来指导UE1蒸馏训练第一神经网络模型。
5、UE1设计第一神经网络模型。
6、UE1向RAN发送第一配置信息,RAN基于所述第一配置信息配置第一参考神经网络模型。其中,所述第一配置信息包括蒸馏温度信息T、知识类型KnowledgeType;所述蒸馏温度信息T根据RAN部署的第一参考神经网络模型和UE1设计部署的第一神经网络模型而得到的,也可根据经验库或其它方式得到,所述T用于蒸馏出第一参考神经网络模型对待识别的流量数据与所述第一参考神经网络模型中预设流量类别对比分析所得到的信息。RAN根据所述第一配置信息中的蒸馏温度信息T配置归一化Softmax(T)函数,通过所 述知识类型KnowledgeType获知第一指示信息的类型为第一参考神经网络模型进行流量识别后输出的软目标。然后,将第一指示信息发送给UE1(本实施例以知识类型KnowledgeType为第一参考神经网络模型输出的软目标Soft target为例)。
7、UE1向RAN发送第一数据信息,所述第一数据信息包括第一样本数据;所述第一样本数据为UE1对所述第一神经网络模型进行流量识别的蒸馏训练所需的训练数据。此外,所述第一样本数据中包括索引Index用于指示每条流量数据。例如,所述第一样本数据包括:3D电影的流量数据、电影传输的流量数据以及人机交互的流量数据,其索引Index分别为1、2、3,用于指示3D电影的流量数据、电影传输的流量数据以及人机交互的流量数据。
8、RAN将所述第一样本数据通过第一参考神经网络模型进行处理,得到第一指示信息,并将所述第一指示信息发送给UE1;所述第一指示信息为所述第一参考神经网络模型基于第一样本数据进行流量识别的过程中所得到的信息。RAN得到第一指示信息的过程为:RAN将所述第一样本数据信息(3D电影的流量数据、电影传输的流量数据以及人机交互的流量数据)输入第一参考神经网络模型中,所述第一参考神经网络模型会将3D电影的流量数据、电影传输的流量数据以及人机交互的流量数据分别与第一参考神经网络模型中的3D视频、高清视频传输、AR/VR这三个预设类别进行对比分析,并输出3D电影的流量数据与3D视频、高清视频传输、AR/VR这三个预设类别流量的相似度A1、A2和A3;电影传输的流量数据与3D视频、高清视频传输、AR/VR这四个预设类别流量的相似度B1、B2和B3;人机交互的流量数据与3D视频、高清视频传输、AR/VR这三个预设类别流量的相似度C1、C2和C3。由于第一参考神经网络模型是已经训练好的神经网络模型,能够准确识别输入的流量数据与哪个预设类别的相似度最高,因此,在A1、A2和A3中,A1的数值最高接近于100%,A2和A3的数值很小,接近于0;在B1、B2和B3中,B2的数值最高接近于100%,B1和B3的数值很小,接近于0;在C1、C2和C3中,C3的数值最高接近于100%,C1和C2的数值很小,接近于0。为了提高第一参考神经网络模型输出的信息量,使得每一条样本数据信息中的流量数据与预设类别的相似度的数值分布更加平缓,将A1、A2和A3作为归一化Softmax(T)函数的输入,输出a1、a2和a3,所述a1、a2和a3为第一参考神经网络模型输出的软目标1,并使用索引Index 1用于指示软目标1(软目标的索引Index与第一样本数据信息中的流量数据的索引Index相对应)。同理,使用相同的方法处理B1~B3、C1~C3,得到软目标2(b1~b3)和软目标3(c1~c3),用索引Index 2和3分别指示软目标2和软目标3。所述软目标1、软目标2和软目标3为第一指示信息。然后,RAN将所述第一指示信息发送给UE1。
9、UE1接收第一指示信息,并使用所述第一指示信息蒸馏训练第一神经网络模型对于其包含的各类别的流量识别功能。UE1使用所述第一指示信息对所述第一神经网络模型进行蒸馏训练的具体过程为:UE1将所述第一样本数据信息(3D电影的流量数据、电影传输的流量数据以及人机交互的流量数据)输入第一神经网络模型中,所述第一神经网络模型会将3D电影的流量数据、电影传输的流量数据以及人机交互的流量数据分别与第一神经网络模型中的3D视频和高清视频传输这两个预设类别进行对比分析,并输出3D电影的流量数据与3D视频和高清视频传输这两个预设类别流量的相似度D1和D2;电影传输的流 量数据与3D视频和高清视频传输这两个预设类别流量的相似度E1和E2;人机交互的流量数据与3D视频和高清视频传输这两个预设类别流量的相似度F1和F2。由于第一神经网络模型是一个蒸馏训练的神经网络模型,对于流量识别的输出的结果置信度很低。例如,第一神经网络模型可能认为3D电影的流量数据与高清视频传输类别的流量数据相似度高,电影传输的流量数据与3D视频类别的流量数据相似度高,人机交互的流量数据与高清视频传输类别的流量数据相似度高。因此,D2的值很高,接近于100%,D1的值很低,接近于0;E1的值很高,接近于100%,E2的值很低,接近于0;F2的值很高,接近于100%,F1的值很低,接近于0。为了提高第一神经网络模型输出的信息量,使得每一条样本数据信息中的流量数据与预设类别的相似度的数值分布更加平缓,将D1和D2作为归一化Softmax(T)函数的输入(Softmax函数中的蒸馏温度信息T与第一参考神经网络模型所用的T一致),输出d1和d2,所述d1和d2为第一神经网络模型输出的软目标4,并使用索引Index 1用于指示软目标4(软目标的索引Index与第一样本数据信息中的流量数据的索引Index相对应)。同理,使用相同的方法处理E1~E2、F1~F2,得到软目标5(e1~e2)和软目标6(f1~f2),用索引Index 2和3分别指示软目标5和软目标6。然后UE1将软目标4与所述第一指示信息中的软目标1进行对比分析,将软目标5与所述第一指示信息中的软目标2进行对比分析,将软目标6与所述第一指示信息中的软目标3进行对比分析,以此获知第一神经网络模型进行流量识别输出的结果与第一参考神经网络模型进行流量识别输出结果的差异,得到损失函数Fn1。另外,边缘网元将D1~D2、E1~E2和F1~F2分别作为Softmax(T=1)的输入,得到D2′、E1′和F2′(D2′只含有D2的信息,E1′只含有E1的信息,F3′只含有F3的信息),所述D2′、E1′和F3′分别为第一神经网络模型输出的硬目标1、硬目标2和硬目标3。然后将所述硬目标1与预设结果1进行对比分析,将所述硬目标2与预设结果2进行对比分析,将所述硬目标3与预设结果3进行对比分析得到损失函数Fn2。其中,所述预设结果1、预设结果2和预设结果3是存储在UE1中的第一数据信息内的流量数据信息对应的预设类别的流量数据。例如,预设结果1为3D电影的流量数据识别属于3D视频流量类别,预设结果2为电影传输的流量数据属于高清视频传输的流量类别,预设结果3为人机交互的流量数据属于AR/VR的流量类别。预设结果用于第一神经网络模型进行流量识别任务时,对其输出结果进行比对,判断所述第一神经网络模型输出的结果是否正确。例如,当第一神经网络模型的输出结果为:将3D电影的流量数据识别为AR/VR类别的流量,显然是不正确的,因为按照预设结果1,3D电影的流量数据识别属于3D视频流量类别。然后UE1基于损失函数Fn1和Fn2调整第一神经网络模型,以降低损失函数Fn1和Fn2的数值,从而强化所述第一神经网络模型对于流量识别的泛化能力,实现UE1对第一神经网络模型的蒸馏训练。
10、UE1向RAN发送第一训练结果通知,用于通知RAN第一神经网络模型的蒸馏训练完成,反馈蒸馏训练完成的第一神经网络模型的信息。
11、RAN释放与UE1之间的蒸馏训练通信连接的信道资源。
在本实施例中,UE1蒸馏训练神经网络模型之前,要和RAN建立蒸馏训练的通信连接,通过UE1和RAN之间蒸馏训练信息的交互,RAN使用UE1发送的第一配置信息,配置第一参考神经网络模型;其中,所述第一参考神经网络模型是已经训练好的神经网络模型。 RAN配置好第一参考神经网络模型后,使用第一数据信息,将所述第一数据信息中的第一样本数据通过已配置好的第一参考神经网络模型进行运算处理后,得到第一指示信息,所述第一指示信息为第一参考神经网络模型蒸馏训练所得到的“知识”。RAN将第一指示信息发送给UE1,UE1使用所述第一指示信息训练其设计的第一神经网络模型。通过这种方式,实现了RAN通过迁移“知识”来指导UE1的蒸馏训练。该方法使得RAN和UE1进行分离式蒸馏训练,令UE1可进行个性化小网络的设计和训练,实现神经网络模型大小和神经网络模型精度的平衡,提升了蒸馏训练的效率,节省了UE1的存储空间。
请参见图4,图4是本申请实施例提供的一种模型蒸馏训练的流程示意图。第一设备对神经网络模型进行蒸馏训练之前与第二设备建立蒸馏训练的通信连接,通过与第二设备进行相关蒸馏训练信息的交互,实现第二设备“知识”的迁移,从而指导第一设备对神经网络模型进行蒸馏训练。在图4中,以第一设备为边缘网元,第二设备为中心网元为例,边缘网元对神经网络模型进行蒸馏训练的步骤如下:
步骤S401:边缘网元向中心网元发送第一训练请求。
具体地,边缘网元在蒸馏训练神经网络模型之前,要和中心网元建立蒸馏通信连接。边缘网元首先向中心网元发送第一训练请求,所述第一训练请求用于边缘网元向中心网元告知边缘网元有蒸馏训练某一功能类型的神经网络模型的需求。另外,第一训练请求包括第一训练类型ID,所述第一训练ID用于指示边缘网元蒸馏训练神经网络模型的功能类型;所述功能类型有图像识别、人脸识别等。
步骤S402:中心网元向边缘网元发送第一响应。
具体地,中心网元接收到边缘网元发送的第一训练请求之后,将第一训练类型ID与中心网元的第二训练类型ID进行对比,若第一训练类型ID与第二训练类型ID一致,则说明中心网元有支持边缘网元蒸馏训练的参考神经网络模型,所述参考神经网络模型具备第一训练类型ID所指示的功能类型。其中,所述第二训练类型ID为中心网元支持蒸馏训练的功能类型。中心网元基于第二训练类型ID从中心网元部署的多个参考神经网络模型中找到第一参考神经网络模型,所述第一参考神经网络模型具备第二训练类型ID所指示的功能类型。中心网元向边缘网元发送第一响应,所述第一响应包括第一参考神经网络模型的ID以及第一存储信息。其中,所述第一存储信息用于指示第一参考神经网络模型所占的存储空间的大小。
若第一训练类型ID与第二训练类型ID不一致,则说明中心网元没有与第一训练类型ID相对应的参考神经网络模型。那么,中心网元不能根据边缘网元发送的第一数据信息通过参考神经网络模型进行蒸馏,得到第一指示信息,由于边缘网元缺少第一指示信息,边缘网元不能使用第一指示信息蒸馏训练神经网络模型,进而,不能实现中心网元指导边缘网元蒸馏训练神经网络模型。本申请实施例以第一训练类型ID与第二训练类型ID一致为例。
步骤S403:边缘网元基于第一存储信息向中心网元发送第一询问信息。
具体地,所述第一询问信息包括第一蒸馏询问信息和第一蒸馏能力信息。所述第一蒸馏询问信息是边缘网元基于第一存储信息向中心网元发送的,具体为:当边缘网元的资源 存储等能力超过某一阈值,所述阈值比所述第一存储信息所指示的第一参考神经网络模型所占的存储空间大时,所述第一蒸馏询问信息用于询问中心网元是否支持以向边缘网元发送第一指示信息的方式支持边缘网元的蒸馏训练;当边缘网元的资源存储能力低于所述第一存储信息所指示的第一参考神经网络模型所占的存储空间时,所述第一蒸馏询问信息用于询问中心网元是否支持以中心网元蒸馏训练得到小网络,并将所述小网络下发给边缘网元的方式支持边缘网元的蒸馏训练。本实施例以当边缘网元的资源存储等能力超过某一阈值,能够进行小网络的设计为例。因此,所述第一蒸馏询问信息用于询问中心网元是否支持边缘网元的蒸馏训练,以及用何种方式支持训练。
第一蒸馏能力信息用于指示边缘网元蒸馏训练的能力,包括以下一项或多项:神经网络模型的设计能力、神经网络模型的蒸馏训练能力以及边缘网元是否具有蒸馏训练数据信息。神经网络模型的设计能力用于指示边缘网元是否具备独立设计小神经网络模型的能力。神经网络模型的蒸馏训练能力用于指示在边缘网元能够独立设计第一神经网络模型的情况下,是否具有蒸馏训练第一神经网络模型的能力。所述神经网络模型的蒸馏训练能力包括以下一项或多项:边缘网元是否支持算法训练的训练包、是否具有GPU芯片以及GPU芯片的张量核心、每秒浮点运算次数、显存带宽、CPU的主频大小、CPU的缓存大小、架构类别以及存储空间等硬件能力。蒸馏训练数据为边缘网元的第一数据信息,用于中心网元基于所述第一数据信息中的第一采样数据得到第一指示信息,指导边缘网元进行蒸馏训练。本申请实施例以边缘网元具备神经网络模型设计能力、神经网络模型的蒸馏训练能力和蒸馏训练数据信息为例。
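As an illustration only, the first distillation capability information described above could be represented along the following lines; every field name here is an assumption for the sketch and not a message field defined by the embodiment:

```python
from dataclasses import dataclass

@dataclass
class DistillationCapabilityInfo:
    can_design_model: bool          # ability to independently design a small neural network model
    has_training_package: bool      # whether an algorithm training package is available
    gpu_tensor_cores: int           # GPU tensor cores (0 if there is no GPU chip)
    flops_per_second: float         # floating-point operations per second
    memory_bandwidth_gbps: float    # video memory bandwidth
    cpu_frequency_ghz: float        # CPU main frequency
    cpu_cache_mb: float             # CPU cache size
    storage_space_mb: int           # available storage space
    has_training_data: bool         # whether distillation training data (first data information) exists

capability = DistillationCapabilityInfo(
    can_design_model=True, has_training_package=True, gpu_tensor_cores=256,
    flops_per_second=1.0e12, memory_bandwidth_gbps=300.0, cpu_frequency_ghz=2.4,
    cpu_cache_mb=8.0, storage_space_mb=512, has_training_data=True)
```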
步骤S404:中心网元基于第一蒸馏询问信息生成第二响应,并将第二响应发送给边缘网元。
具体地,中心网元接收到所述第一蒸馏询问信息后,根据自身的存储资源、硬件性能等参数向边缘网元发送第二响应,所述第二响应,用于指示中心网元支持对边缘网元的蒸馏训练,并以向边缘网元发送第一指示信息的方式支持边缘网元进行蒸馏训练。
步骤S405:边缘网元设计第一神经网络模型。
具体地,若边缘网元设计第一神经网络模型需要参考第一参考神经网络模型的结构信息且已加载第一参考神经网络模型的结构信息时,边缘网元以所述第一参考神经网络模型的结构信息为参考,设计小网络,所述小网络为第一神经网络模型。
在一种可能的实现方式中,若边缘网元设计第一神经网络模型需要参考第一参考神经网络模型的结构信息,未加载第一参神经网络模型的结构信息时,边缘网元向中心网元发送第一网络结构请求,所述第一网络结构请求用于边缘网元请求中心网元提供第一参考网络结构模型的结构信息。中心网元基于所述第一网络结构请求向边缘网元发送第一结构请求响应,所述第一结构请求响应包含第一参考神经网络模型的结构信息。边缘网元以第一参考神经网络模型的结构信息为参考,设计小网络,所述小网络为第一神经网络模型。
步骤S406:边缘网元向中心网元发送第一配置信息。
具体地,边缘网元向中心网元发送的第一配置信息用于中心网元配置第一参考神经网络模型,所述第一配置信息包括第一蒸馏温度信息,所述第一蒸馏温度信息为归一化指数函数Softmax的温度参数T,温度参数T是根据第一神经网络模型与第一参考神经网络模 型的大小差异所得到的,用于提高第一参考神经网络模型蒸馏后输出的信息量。第一配置信息还包括知识类型KnowledgeType和第一参考神经网络模型的ID,所述知识类型KnowledgeType用于指示边缘网元所需要的第一指示信息的类型,其中,第一指示信息的类型包括:第一参考神经网络模型输出的软目标、第一参考神经网络模型中间输出特征信息以及第一参考神经网络模型内网络层之间的关系等。第一参考神经网络模型的ID用于将所述第一配置信息与第一参考神经网络模型进行关联,避免与中心网元的其它神经网络模型混淆。
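As an illustration only, the first configuration information of step S406 could be represented as follows, with the three knowledge types mentioned above enumerated explicitly; the class, enum and field names are assumptions for this sketch:

```python
from dataclasses import dataclass
from enum import Enum

class KnowledgeType(Enum):
    SOFT_TARGET = "soft_target"          # soft targets output by the first reference neural network model
    INTERMEDIATE_FEATURE = "feature"     # intermediate output feature information of the reference model
    LAYER_RELATION = "layer_relation"    # relations between network layers of the reference model

@dataclass
class FirstConfigurationInfo:
    distillation_temperature: float      # temperature parameter T of the Softmax function
    knowledge_type: KnowledgeType        # which kind of first indication information is requested
    reference_model_id: str              # ID of the first reference neural network model

config = FirstConfigurationInfo(distillation_temperature=5.0,
                                knowledge_type=KnowledgeType.SOFT_TARGET,
                                reference_model_id="ref-model-001")
```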
步骤S407:边缘网元向中心网元发送第一数据信息。
具体地,边缘网元向中心网元发送的第一数据信息用于中心网元将所述第一数据信息通过第一参考神经网络模型进行处理,得到第一指示信息。所述第一数据信息包括第一样本数据,所述第一样本数据为第一参考神经网络模型用于蒸馏训练的样本数据;所述第一样本信息还包括样本数据索引Index和第一参考神经网络模型的ID。其中,所述样本数据索引Index为每条第一样本数据的标签,用于指示第一样本数据;第一参考神经网络模型的ID用于将所述第一配置信息与第一参考神经网络模型进行关联,避免与中心网元的其它神经网络模型混淆。
步骤S408:中心网元将第一样本数据通过第一参考神经网络模型进行计算处理,得到第一指示信息,并将所述第一指示信息发送给边缘网元。
具体地,第一指示信息包括蒸馏知识DistillationKnowledge和第一参考神经网络模型的ID,所述第一指示信息用于指导边缘网元训练第一神经网络模型。中心网元将第一样本数据按照Index逐条输入到第一参考神经网络模型中进行计算和处理,具体过程如下:输入的第一样本数据分别与第一参考神经网络模型中的预设类别进行对比分析,得到第一指示信息,并将所述样本数据索引Index与所述蒸馏知识DistillationKnowledge。然后,将第一指示信息发送给边缘网元。第一参考神经网络模型的ID用于将所述蒸馏知识DistillationKnowledge与第一参考神经网络模型进行关联,避免与中心网元的其它神经网络模型混淆。
步骤S409:边缘网元使用第一指示信息对第一神经网络模型进行蒸馏训练。
具体地,若边缘网元收到的第一指示信息的类型为第一参考神经网络模型输出的软目标时,即所述第一指示信息为第一样本数据与第一参考神经网络模型预设类别的相似度时,边缘网元使用第一指示信息对第一神经网络模型进行蒸馏训练的过程为:边缘网元将第一样本数据按照Index逐条输入到第一神经网络模型中,然后将第一样本数据分别与第一神经网络模型中的预设类别进行对比分析,得到第一样本数据与所述预设类别的相似度。然后,边缘网元将所述相似度通过Softmax函数(第一神经网络模型Softmax函数配置的温度参数T与第一参考神经网络模型Softmax配置的温度参数T是相同的)输出软目标,将所述软目标与第一参考神经网络模型输出的软目标进行对比分析,得到损失函数Fn1。此外,边缘网元将第一样本数据与所述预设类别的相似度通过另一个Softmax函数(配置的温度参数T=1)进行处理得到一个硬目标,将所述硬目标与第一样本数据的预设结果进行对比分析,得到损失函数Fn2。边缘网元基于Fn1和Fn2调整第一神经网络模型,以降低损失函数Fn1和Fn2的数值,增强第一神经网络模型的泛化能力,实现边缘网元对第一神 经网络模型的蒸馏训练。
在一种可能的实现方式中,若边缘网元收到的第一指示信息的类型为第一参考神经网络模型中间输出特征信息时,即所述第一指示信息为第一样本数据与第一参考神经网络模型预设类别进行对比分析时,第一参考神经网络模型的中间网络层输出的第一特征信息,边缘网元使用第一指示信息对第一神经网络模型进行蒸馏训练的过程为:边缘网元将第一样本数据按照Index逐条输入到第一神经网络模型中,得到所述第一样本数据与第一神经网络模型预设类别进行对比分析时,第一神经网络模型中间层输出的第二特征信息。然后将所述第二特征信息与第一特征信息进行对比分析,得到损失函数Fn3,边缘网元基于Fn3调整第一神经网络模型的参数,以降低损失函数Fn3的数值,增强第一神经网络模型的泛化能力,实现边缘网元对第一神经网络模型的蒸馏训练。
在一种可能的实现方式中,若边缘网元收到的第一指示信息的类型为第一参考神经网络模型网络层之间的关系时,边缘网元使用第一指示信息对第一神经网络模型进行蒸馏训练的过程为:边缘网元将第一样本数据按照Index逐条输入到第一神经网络模型中,然后将第一样本数据分别与第一神经网络模型中的预设类别进行对比分析,得到第一神经网络模型中各个网络层的关系,并将所述第一神经网络模型中各个网络层的关系与所述第一指示信息进行对比分析,得到损失函数Fn4。边缘网元基于Fn4调整第一神经网络模型的参数,以降低损失函数Fn4的数值,增强第一神经网络模型的泛化能力,实现边缘网元对第一神经网络模型的蒸馏训练。
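As an illustration of how the three cases of step S409 select different comparisons depending on the type of the first indication information, the following sketch uses cross-entropy for the soft-target case and mean squared error as a stand-in for the feature and layer-relation cases, neither of which is specified by the embodiment; all names are illustrative:

```python
import math

def cross_entropy(target, predicted, eps=1e-12):
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))

def mean_squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def indication_loss(knowledge_type, first_indication_info, student_output):
    """Pick the loss term used to adjust the first neural network model for one sample."""
    if knowledge_type == "soft_target":
        # Fn1-style term: student soft target vs. reference-model soft target (same temperature T).
        return cross_entropy(first_indication_info, student_output)
    if knowledge_type == "feature":
        # Fn3-style term: student intermediate features vs. reference intermediate features.
        return mean_squared_error(first_indication_info, student_output)
    if knowledge_type == "layer_relation":
        # Fn4-style term: student layer-relation statistics vs. reference layer-relation statistics.
        return mean_squared_error(first_indication_info, student_output)
    raise ValueError(f"unknown knowledge type: {knowledge_type}")
```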
步骤S410:边缘网元向中心网元发送第一训练结果通知。
具体地,边缘网元对第一神经网络模型的蒸馏训练完成后,向中心网元发送第一训练结果通知,所述第一训练结果通知用于告知中心网元蒸馏训练是否完成;在边缘网元的蒸馏训练已经完成的情况下,向中心网元反馈边缘网元对训练好的第一神经网络模型的信息(例如,边缘网元对蒸馏训练好的第一神经网络模型是否接收等)。
需要说明的是,由于对蒸馏训练的神经网络模型有精度、泛化能力的指标等其它要求,边缘网元对第一神经网络模型的训练不是训练一次就能完成的,往往需要重复步骤S407~S409。边缘网元的蒸馏训练是否完成是由边缘网元在蒸馏训练中的损失函数Fn的收敛情况或者达到预设最大的迭代次数所决定的。例如,当边缘网元在蒸馏训练中的损失函数Fn达到预设的数值y时,可认为边缘网元蒸馏训练的第一神经网络模型达到需求的精度,那么,边缘网元的蒸馏训练已经完成。或者,当蒸馏训练已经达到边缘网元的预设次数后,结束蒸馏训练。
步骤S411:中心网元释放与边缘网元之间的蒸馏训练通信连接的信道资源。
具体地,中心网元接收所述第一训练结果通知后,判断边缘网元的蒸馏训练是否已经完成;在边缘网元的蒸馏训练已经完成的情况下,释放中心网元与边缘网元之间蒸馏训练的通信连接资源,避免占用中心网元和边缘网元的其他信道资源。
本实施例,边缘网元蒸馏训练神经网络模型之前,要和中心网元建立蒸馏训练的通信连接,通过边缘网元和中心网元之间蒸馏训练信息的交互,中心网元使用边缘网元发送的第一配置信息,配置第一参考神经网络模型;其中,所述第一参考神经网络模型是已经训练好的神经网络模型。中心网元配置好第一参考神经网络模型后,使用第一数据信息,将 所述第一数据信息中的第一样本数据通过已配置好的第一参考神经网络模型进行运算处理后,得到第一指示信息,所述第一指示信息为第一参考神经网络模型蒸馏训练所得到的“知识”。中心网元将第一指示信息发送给边缘网元,边缘网元使用所述第一指示信息训练其设计的第一神经网络模型。通过这种方式,实现了中心网元通过迁移“知识”来指导边缘网元的蒸馏训练。该方法使得中心网元和边缘网元进行分离式蒸馏训练,令边缘网元可进行个性化小网络的设计和训练,实现神经网络模型大小和神经网络模型精度的平衡,提升了蒸馏训练的效率,节省了边缘网元的存储空间。
请参见图5,图5是本申请实施例提供的另外一种模型蒸馏训练的流程示意图。当第一设备向第二设备发送第一配置信息之前,要向第二设备发送第一关注类别列表,第一设备接收的第一指示信息为第二设备基于所述第一关注类别列表所筛选出的信息。在图5中,以第一设备为边缘网元,第二设备为中心网元为例,边缘网元对神经网络模型进行蒸馏训练的步骤如下:
步骤S501~步骤S505请参考步骤S401~步骤S405,此处不再赘述。
步骤S506:边缘网元向中心网元发送第一关注类别列表。
具体地,边缘网元向中心网元发送的第一关注类别列表包括边缘网元蒸馏训练所关注的类别的集合;其中,所述边缘网元蒸馏训练所关注的类别的集合是第一类别列表中类别集合的子集;其中,所述第一类别列表包括第一参考神经网络模型预设类别的集合。边缘网元通过向中心网元发送第一关注类别列表,可以获取与边缘网元蒸馏训练所关注的类别所对应的第一指示信息,有利于实现边缘网元个性化地训练第一神经网络模型。
步骤S507~步骤S508请参考步骤S406~步骤S407,此处不再赘述。
步骤S509:中心网元将第一样本数据通过第一参考神经网络模型进行计算处理,并将处理后的第一样本数据基于第一关注类别列表进行筛选,得到第一指示信息,并将第一指示信息发送给边缘网元。
具体地,与步骤S408类似,中心网元将第一样本数据按照Index逐条输入到第一参考神经网络模型中,所述第一样本数据与第一参考神经网络模型中的预设类别进行对比分析得到指示信息,中心网元基于第一关注类别列表中的类别,从所述指示信息中筛选出与所述第一关注类别列表中类别对应的指示信息,得到第一指示信息,然后,中心网元将所述第一指示信息发送给边缘网元。
步骤S510~步骤S512请参见步骤S409~步骤S411,此处不再赘述。
本实施例,边缘网元在中心网元发送第一指示信息之前,向中心网元发送第一关注类别列表,用于指示中心网元将第一样本数据经过第一参考神经网络模型所得到的指示信息基于第一关注类别列表中的关注类别进行筛选。通过向中心网元发送第一关注类别列表的方式,中心网元将第一样本数据与第一参考神经网络模型中的预设类别进行对比分析所得到指示信息进行筛选,所得到的第一指示信息更加符合边缘网元蒸馏训练的需求,在不影响第一神经网络模型蒸馏训练的效果下进一步减少通信开销,节省分离式蒸馏训练的网络资源,节省边缘网元的存储空间。
请参见图6,图6是本申请实施例提供的另外一种模型蒸馏训练的流程示意图。第二设备指导第一设备进行神经网络模型的蒸馏训练之前,要通过第三设备建立第一设备与第二设备之间的蒸馏训练通信连接。在图6中,以第一设备为边缘NWDAF,第二设备为中心NWDAF,第三设备为网络存储功能装置(Network Repository Function,NRF)为例。边缘NWDAF对神经网络模型进行蒸馏训练的步骤如下:
步骤S601:中心NWDAF向NRF发送注册信息。
具体地,所述注册信息包括第三训练类型ID,用于指示所述中心NWDAF支持蒸馏训练的神经网络模型的功能类型;第三神经网络模型ID,用于指示与所述第三训练类型ID相对应的第二参考神经网络模型,所述第二参考神经网络模型具有第三训练类型ID所指示的功能;第二存储信息,用于指示所述第二参考神经网络模型所占存储空间的大小;训练响应信息,用于指示所述中心NWDAF是否支持蒸馏训练神经网络模型以及在支持蒸馏训练神经网络模型的情况下,支持蒸馏训练神经网络模型的方式。所述注册信息还包括第二类别列表,所述第二类别列表包括中心NWDAF预设类别的集合;中心NWDAF的ID,用于指示所述中心NWDAF的设备号,避免多个中心NWDAF向NRF注册产生混淆。中心NWDAF向NRF发送注册信息,有利于NRF基于注册信息,匹配中心NWDAF和边缘NWDAF,进而建立起中心NWDAF和边缘NWDAF之间的蒸馏训练通信连接。
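注册信息所包含的各字段可粗略对应为如下的数据结构草图；字段命名为本示意的假设，仅用于说明注册信息的组成，并非对消息格式的限定。

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class RegistrationInfo:
    # 中心 NWDAF 向 NRF 发送的注册信息（字段名为假设命名）
    training_type_id_3: str            # 第三训练类型ID：支持蒸馏训练的模型功能类型
    nn_model_id_3: str                 # 第三神经网络模型ID：指示对应的第二参考神经网络模型
    storage_info_2: int                # 第二存储信息：第二参考神经网络模型所占存储空间（如字节数）
    training_response: Dict[str, str]  # 训练响应信息：是否支持蒸馏训练及支持的方式
    category_list_2: List[str] = field(default_factory=list)  # 第二类别列表：预设类别集合
    center_nwdaf_id: str = ""          # 中心NWDAF的ID，避免多个中心NWDAF注册产生混淆
```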
步骤S602:边缘NWDAF向网络NRF发送第二训练请求。
具体地,所述第二训练请求包括第四训练类型ID,用于指示边缘NWDAF待蒸馏训练的神经网络模型的功能类型;第二询问信息,用于边缘NWDAF询问NRF,中心NWDAF是否支持蒸馏训练,以及在支持蒸馏训练的情况下,支持蒸馏训练的方式;第二蒸馏能力信息,用于指示边缘NWDAF蒸馏训练的能力,所述第二蒸馏能力信息包括以下一项或多项:边缘NWDAF神经网络模型的设计能力、神经网络模型的训练能力、是否具有蒸馏训练数据。所述第二训练请求还包括排除集,所述排除集包括与边缘NWDAF建立蒸馏训练通信连接失败的中心NWDAF的ID,当NRF包含的注册信息中存在排除集中的中心NWDAF的ID时,NRF不再为所述中心NWDAF和边缘NWDAF建立蒸馏训练的通信连接。
步骤S603:NRF基于第二训练请求生成第三响应,并将第三响应发送给边缘NWDAF。
具体地,NRF接收到第二训练请求后,判断第三训练类型ID与第四训练类型ID是否一致,若一致则生成第三响应(本实施例以第三训练类型ID与第四训练类型ID一致为例),并将所述第三响应发送给边缘NWDAF。所述第三响应包括训练响应信息、第三神经网络模型ID、第二存储信息和第二类别列表,还包括中心NWDAF的ID。
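NRF判断第三训练类型ID与第四训练类型ID是否一致、并结合排除集生成第三响应的过程，可用如下示意性片段表示；这里用字典表示一条注册信息，字段命名沿用前文注册信息草图中的假设。

```python
def build_third_response(registrations, second_training_request):
    # NRF 侧的示意匹配逻辑：训练类型 ID 一致且中心 NWDAF 不在排除集中时，基于注册信息生成第三响应
    exclude = set(second_training_request.get("exclude_set", []))
    for reg in registrations:                 # reg 为一条注册信息（dict 表示）
        if reg["center_nwdaf_id"] in exclude:
            continue                          # 曾与边缘 NWDAF 建立连接失败的中心 NWDAF 不再参与匹配
        if reg["training_type_id_3"] == second_training_request["training_type_id_4"]:
            return {                          # 第三响应
                "training_response": reg["training_response"],
                "nn_model_id_3": reg["nn_model_id_3"],
                "storage_info_2": reg["storage_info_2"],
                "category_list_2": reg["category_list_2"],
                "center_nwdaf_id": reg["center_nwdaf_id"],
            }
    return None                               # 无匹配的中心 NWDAF 时不生成第三响应
```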
步骤S604:边缘NWDAF向NRF发送蒸馏通知。
具体地，所述边缘NWDAF接收到第三响应后，根据所述第三响应中的训练响应信息（本实施例以所述训练响应信息指示所述中心NWDAF支持边缘NWDAF蒸馏训练神经网络模型，并以向边缘NWDAF发送指示信息的方式指导边缘NWDAF蒸馏训练神经网络模型为例）获知所述中心NWDAF支持蒸馏训练以及蒸馏训练的方式；根据所述第三神经网络模型ID和第二存储信息获知第二参考神经网络模型的信息以及所述第二参考神经网络模型所占存储空间的大小。所述边缘NWDAF基于第三响应所获取的中心NWDAF的信息，根据中心NWDAF的ID与中心NWDAF建立蒸馏训练的通信连接，通信连接建立完成后，向NRF发送蒸馏通知。所述蒸馏通知用于告知NRF，边缘NWDAF与中心NWDAF连接成功。然后，NRF根据所述蒸馏通知释放NRF与边缘NWDAF和中心NWDAF的通信资源。
步骤S605~步骤S611请参见步骤S405~步骤S411,此处不再赘述。
本实施例，通过中心NWDAF在NRF上注册训练类型ID以及训练响应等信息，边缘NWDAF向NRF发送训练请求，NRF可以快速有效地建立边缘NWDAF和中心NWDAF之间的蒸馏训练通信连接，从而提高蒸馏训练的效率，有利于边缘NWDAF蒸馏训练神经网络模型，节省边缘NWDAF的存储空间。
请参见图7,图7是本申请实施例提供的另外一种模型蒸馏训练的流程示意图。当第一设备向第二设备发送第二配置信息之前,向第二设备发送第二关注类别列表,第一设备接收的第二指示信息为第二设备基于所述第二关注类别列表所筛选出的信息。在图7中,以第一设备为边缘NWDAF,第二设备为中心NWDAF,第三设备为NRF为例,边缘NWDAF对神经网络模型进行蒸馏训练的步骤如下:
步骤S701~步骤S705请参考步骤S601~步骤S605,此处不再赘述。
步骤S706:边缘NWDAF向中心NWDAF发送第二关注类别列表。
具体地,边缘NWDAF向中心NWDAF发送的第二关注类别列表包括边缘NWDAF蒸馏训练所关注的类别的集合;其中,所述边缘NWDAF蒸馏训练所关注的类别的集合是第二类别列表中类别集合的子集;其中,所述第二类别列表包括第二参考神经网络模型预设类别的集合。边缘NWDAF通过向中心NWDAF发送第二关注类别列表,可以获取与边缘NWDAF蒸馏训练所关注的类别所对应的第二指示信息,有利于实现边缘NWDAF个性化地训练第二神经网络模型。
步骤S707~步骤S708请参考步骤S606~步骤S607,此处不再赘述。
步骤S709:中心NWDAF将第二样本数据通过第二参考神经网络模型进行计算处理,并将处理后的第二样本数据基于第二关注类别列表进行筛选,得到第二指示信息,并将第二指示信息发送给边缘NWDAF。
中心NWDAF将第二数据信息中的第二样本数据按照Index逐条输入到第二参考神经网络模型中,所述第二样本数据与第二参考神经网络模型中的预设类别进行对比分析得到指示信息,中心NWDAF基于第二关注类别列表中的类别,从所述指示信息中筛选出与所述第二关注类别列表中类别相关的指示信息,得到第二指示信息,然后,中心NWDAF将所述第二指示信息发送给边缘NWDAF。
步骤S710~步骤S712请参考步骤S609~步骤S611,此处不再赘述。
本实施例，边缘NWDAF在中心NWDAF发送第二指示信息之前，向中心NWDAF发送第二关注类别列表，用于指示中心NWDAF将第二样本数据经过第二参考神经网络模型所得到的指示信息基于第二关注类别列表中的关注类别进行筛选。通过向中心NWDAF发送第二关注类别列表的方式，中心NWDAF对第二样本数据与第二参考神经网络模型中的预设类别进行对比分析所得到的指示信息进行筛选，所得到的第二指示信息更加符合边缘NWDAF蒸馏训练的需求，在不影响第二神经网络模型蒸馏训练的效果下进一步减少通信开销，节省分离式蒸馏训练的网络资源，进而节省边缘NWDAF的存储空间。
上述详细阐述了本申请实施例的方法,下面提供本实施例的相关装置。
请参见图8,图8是本申请实施例提供的一种模型蒸馏训练装置的结构示意图,所述模型蒸馏训练装置8可以为上述方法实施例中的第一设备,所述模型蒸馏训练装置8可以包括第一神经网络模型设计单元801、第一配置信息发送单元802、第一数据信息发送单元803、第一指示信息接收单元804以及第一蒸馏训练单元805,其中,各个单元的详细描述如下:
第一神经网络模型设计单元801,用于若第二设备支持第一设备蒸馏训练神经网络模型,设计第一神经网络模型;
第一配置信息发送单元802,用于向所述第二设备发送第一配置信息;
第一数据信息发送单元803,用于向所述第二设备发送第一数据信息;
第一指示信息接收单元804,用于接收所述第二设备返回的第一指示信息;
第一蒸馏训练单元805,用于使用所述第一指示信息训练所述第一神经网络模型。
在一种可能的实现方式中,所述模型蒸馏训练装置8还包括:
第一训练请求发送单元,用于所述第一神经网络模型设计单元801设计第一神经网络模型之前,向所述第二设备发送第一训练请求;
第一响应接收单元,用于第一训练请求发送单元向所述第二设备发送第一训练请求之后,若第二训练类型ID与所述第一训练类型ID一致,接收所述第二设备发送的第一响应;
第一询问信息发送单元,用于基于所述第一存储信息向所述第二设备发送第一询问信息;
第二响应接收单元,用于所述第一询问信息发送单元向所述第二设备发送第一询问信息之后,接收所述第二设备的第二响应。
在一种可能实现的方式中,所述模型蒸馏训练装置8,还包括:
第一关注类别列表发送单元,用于所述第一配置信息发送单元802向所述第二设备发送第一配置信息之前,向所述第二设备发送第一关注类别列表。
在一种可能实现的方式中,所述模型蒸馏训练装置8,还包括:
第一网络结构请求发送单元,用于所述第一神经网络模型设计单元801设计第一神经网络模型之前,向所述第二设备发送第一网络结构请求;
第一结构请求响应接收单元,用于所述第一网络结构请求发送单元向所述第二设备发送第一网络结构请求之后,接收所述第二设备发送的第一结构请求响应。
请参见图9，图9是本申请实施例提供的一种模型蒸馏训练装置的结构示意图，所述模型蒸馏训练装置9可以为上述方法实施例中的第二设备，所述模型蒸馏训练装置9可以包括第一配置信息接收单元901、第一参数配置单元902、第一数据信息接收单元903和第一指示信息发送单元904，其中，各个单元的详细描述如下：
第一配置信息接收单元901,用于接收第一设备发送的第一配置信息;
第一参数配置单元902,用于使用所述第一配置信息配置第一参考神经网络模型;
第一数据信息接收单元903,用于接收所述第一设备发送的第一数据信息;
第一指示信息发送单元904,用于向所述第一设备发送第一指示信息。
在一种可能实现的方式中,所述模型蒸馏训练装置9,还包括:
第一训练请求接收单元,用于所述第一参数配置单元902配置所述第一参考神经网络模型之前,接收所述第一设备发送的第一训练请求;
第一响应发送单元,用于若第二训练类型ID与第一训练类型ID一致,基于所述第一训练请求,生成第一响应,并将所述第一响应发送给所述第一设备;
第一询问信息接收单元,用于接收所述第一设备发送的第一询问信息;
第二响应发送单元,用于所述第一询问信息接收单元接收所述第一设备发送的第一询问信息后,将所述第二响应发送给所述第一设备。
在一种可能实现的方式中,所述模型蒸馏训练装置9,还包括:
第一关注类别列表接收单元,用于所述第一配置信息接收单元接收所述第一设备发送的第一配置信息之前,接收所述第一设备发送的第一关注类别列表。
在一种可能实现的方式中,所述模型蒸馏训练装置9,还包括:
第一网络结构请求接收单元,用于所述第一配置信息接收单元901接收所述第一设备发送的第一配置信息之前,接收所述第一设备发送的第一网络结构请求;
第一结构请求响应发送单元,用于向所述第一设备发送第一结构请求响应。
请参见图10,图10是本申请实施例提供的一种模型蒸馏训练装置的结构示意图,所述模型蒸馏训练装置10可以为上述方法实施例中的第三设备,所述模型蒸馏训练装置10可以包括注册信息接收单元1001、第二训练请求接收单元1002、第三响应发送单元1003和蒸馏通知接收单元1004,其中,各个单元的详细描述如下:
注册信息接收单元1001,用于接收第二设备发送的注册信息;
第二训练请求接收单元1002,用于接收所述第一设备发送的第二训练请求;
第三响应发送单元1003,用于若第四训练类型ID与第三训练类型ID一致,基于所述第二训练请求生成第三响应,并将所述第三响应发送给所述第一设备;
蒸馏通知接收单元1004,用于接收所述第一设备发送的蒸馏通知。
请参见图11，图11是本申请实施例提供的一种模型蒸馏训练装置的结构示意图，所述模型蒸馏训练装置11可以为上述方法实施例中的第一设备，所述模型蒸馏训练装置11可以包括第二训练请求发送单元1101、第三响应接收单元1102和蒸馏通知发送单元1103，其中，各个单元的详细描述如下：
第二训练请求发送单元1101,用于向第三设备发送第二训练请求;
第三响应接收单元1102,用于若第四训练类型ID与第三训练类型ID一致,接收所述第三设备发送的第三响应;
蒸馏通知发送单元1103,用于向所述第三设备发送蒸馏通知。
在一种可能实现的方式中,所述模型蒸馏训练装置11,还包括:
第二神经网络模型设计单元,用于所述蒸馏通知发送单元1103向所述第三设备发送蒸馏通知之后,基于第二参考神经网络模型的结构信息设计第二神经网络模型;
第二配置信息发送单元,用于向第二设备发送第二配置信息;
第二数据信息发送单元,用于向所述第二设备发送第二数据信息;
第二指示信息接收单元,用于接收所述第二设备返回的第二指示信息;
第二蒸馏训练单元,用于使用所述第二指示信息对所述第二神经网络模型进行蒸馏训练。
在一种可能实现的方式中,所述模型蒸馏训练装置11,还包括:
第二关注类别列表发送单元,用于所述第二配置信息发送单元向第二设备发送第二配置信息之后,向所述第二设备发送第二关注类别列表。
请参见图12,图12是本申请实施例提供的一种模型蒸馏训练装置的结构示意图,所述模型蒸馏训练装置12可以为上述方法实施例中的第二设备,所述模型蒸馏训练装置12可以包括注册信息发送单元1201、第二配置信息接收单元1202、第二数据信息接收单元1203和第二指示信息发送单元1204,其中,各个单元的详细描述如下:
注册信息发送单元1201,用于向第三设备发送注册信息;
第二配置信息接收单元1202,用于接收第一设备发送的第二配置信息;
第二数据信息接收单元1203,用于接收所述第一设备发送的第二数据信息;
第二指示信息发送单元1204,用于向所述第一设备发送第二指示信息。
在一种可能实现的方式中,所述模型蒸馏训练装置12,还包括:
第二关注类别列表接收单元,用于所述第二配置信息接收单元1202接收第一设备发送的第二配置信息之前,接收所述第一设备发送的第二关注类别列表。
在一种可能实现的方式中,所述模型蒸馏训练装置12,还包括:
第二网络结构请求接收单元,用于第二配置信息接收单元1202接收第一设备发送的第二配置信息之前,接收所述第一设备发送的第二网络结构请求;
第二结构请求响应发送单元,用于基于所述第二网络结构请求向所述第一设备发送第二结构请求响应。
请参见图13,图13是本申请实施例提供的一种模型蒸馏训练设备的结构示意图,所述模型蒸馏训练设备13可以为上述方法实施例中的第一设备,所述模型蒸馏训练设备13可以包括存储器1301、通信模块1302和处理器1303,其中,各个单元的详细描述如下:
存储器1301用于存储程序代码。
处理器1303用于调用存储器存储的程序代码执行如下步骤:
若所述第二设备支持所述第一设备蒸馏训练神经网络模型,设计第一神经网络模型;
通过通信模块1302向第二设备发送第一配置信息;
通过通信模块1302向第二设备发送第一数据信息;
通过通信模块1302接收所述第二设备返回的第一指示信息;
使用所述第一指示信息训练所述第一神经网络模型。
在一种可能的实现方式中,处理器1303还用于:在设计第一神经网络模型之前,通过通信模块1302向第二设备发送第一训练请求;
若第二训练类型ID与所述第一训练类型ID一致,通过通信模块1302接收所述第二设备发送的第一响应;
通过通信模块1302向所述第二设备发送第一询问信息;
通过通信模块1302接收所述第二设备的第二响应。
在一种可能的实现方式中,处理器1303还用于:通过通信模块1302向所述第二设备发送第一关注类别列表。
在一种可能的实现方式中,处理器1303还用于:在设计所述第一神经网络模型之前,通过通信模块1302向所述第二设备发送第一网络结构请求;
通过通信模块1302接收所述第二设备发送的第一结构请求响应。
请参见图14,图14是本申请实施例提供的一种模型蒸馏训练设备的结构示意图,所述模型蒸馏训练设备14可以为上述方法实施例中的第二设备,所述模型蒸馏训练设备14可以包括存储器1401、通信模块1402和处理器1403,其中,各个单元的详细描述如下:
存储器1401用于存储程序代码。
处理器1403用于调用存储器存储的程序代码执行如下步骤:
通过通信模块1402接收第一设备发送的第一配置信息;
通过通信模块1402接收第一设备发送的第一数据信息;
通过通信模块1402向所述第一设备发送第一指示信息。
在一种可能的实现方式中,处理器1403,还用于:通过通信模块1402接收所述第一设备发送的第一训练请求;
生成第一响应;
通过通信模块1402将所述第一响应发送给所述第一设备;
通过通信模块1402接收所述第一设备发送的第一询问信息;
生成第二响应;
通过通信模块1402将所述第二响应发送给所述第一设备。
在一种可能的实现方式中，处理器1403，还用于：通过通信模块1402接收所述第一设备发送的第一配置信息之前，通过通信模块1402接收所述第一设备发送的第一关注类别列表。
在一种可能的实现方式中,处理器1403,还用于:通过通信模块1402接收所述第一设备发送的第一配置信息之前,通过通信模块1402接收所述第一设备发送的第一网络结构请求;通过通信模块1402向所述第一设备发送第一结构请求响应。
请参见图15,图15是本申请实施例提供的一种模型蒸馏训练设备的结构示意图,所述模型蒸馏训练设备15可以为上述方法实施例中的第三设备,所述模型蒸馏训练设备15可以包括存储器1501、通信模块1502和处理器1503,其中,各个单元的详细描述如下:
存储器1501用于存储程序代码。
处理器1503用于调用存储器存储的程序代码执行如下步骤:
通过通信模块1502接收第二设备发送的注册信息;
通过通信模块1502接收第一设备发送的第二训练请求;
生成第三响应;
通过通信模块1502将所述第三响应发送给所述第一设备;
通过通信模块1502接收所述第一设备发送的蒸馏通知。
请参见图16,图16是本申请实施例提供的一种模型蒸馏训练设备的结构示意图,所述模型蒸馏训练设备16可以为上述方法实施例中的第一设备,所述模型蒸馏训练设备16可以包括存储器1601、通信模块1602和处理器1603,其中,各个单元的详细描述如下:
存储器1601用于存储程序代码。
处理器1603用于调用存储器存储的程序代码执行如下步骤:通过通信模块1602向第三设备发送第二训练请求;若第四训练类型ID与第三训练类型ID一致,通过通信模块1602接收所述第三设备发送的第三响应;
通过通信模块1602向所述第三设备发送蒸馏通知。
在一种可能的实现方式中,处理器1603还用于:通过通信模块1602向所述第三设备发送蒸馏通知之后,设计第二神经网络模型;
通过通信模块1602向第二设备发送第二配置信息;
通过通信模块1602向第二设备发送第二数据信息;
接收所述第二设备返回的第二指示信息;
使用所述第二指示信息训练所述第二神经网络模型。
在一种可能的实现方式中，处理器1603还用于：通过通信模块1602向第二设备发送第二配置信息之前，通过通信模块1602向所述第二设备发送第二关注类别列表；使用所述第二指示信息对所述第二神经网络模型进行蒸馏训练。
在一种可能的实现方式中,处理器1603还用于:设计第二神经网络模型之前,通过通信模块1602向所述第二设备发送第二网络结构请求;
通过通信模块1602接收所述第二设备发送的第二结构请求响应。
请参见图17,图17是本申请实施例提供的一种模型蒸馏训练设备的结构示意图,所述模型蒸馏训练设备17可以为上述方法实施例中的第二设备,所述模型蒸馏训练设备17可以包括存储器1701、通信模块1702和处理器1703,其中,各个单元的详细描述如下:
存储器1701用于存储程序代码。
处理器1703用于调用存储器存储的程序代码执行如下步骤:
通过通信模块1702向第三设备发送注册信息;
通过通信模块1702接收第一设备发送的第二配置信息;
通过通信模块1702接收所述第一设备发送的第二数据信息;
通过通信模块1702向所述第一设备发送第二指示信息。
在一种可能的实现方式中,处理器1703,还用于:通过通信模块1702接收第一设备发送的第二配置信息之前,通过通信模块1702接收所述第一设备发送的第二关注类别列表。
在一种可能的实现方式中,处理器1703,还用于:通过通信模块1702接收第一设备发送的第二配置信息之前,通过通信模块1702接收所述第一设备发送的第二网络结构请求;
通过通信模块1702向所述第一设备发送第二结构请求响应。
本申请实施例提供了一种计算机可读存储介质,所述计算机可读存储介质存储有计算机程序,该计算机程序被处理器执行时,实现上述实施例及其各种可能的实现方式中的模型蒸馏训练方法。
本申请实施例提供了一种计算机程序，该计算机程序包括指令，当所述计算机程序被计算机执行时，使得第一设备可以执行上述实施例及其各种可能的实现方式中第一设备所执行的流程、或者第二设备可以执行上述实施例及其各种可能的实现方式中第二设备所执行的流程、或者第三设备可以执行上述实施例及其各种可能的实现方式中第三设备所执行的流程。
本申请实施例提供了一种芯片系统,该芯片系统包括处理器,用于支持第一设备实现上述实施例及其各种可能方式中的方法所涉及的功能、或第二设备实现上述实施例及其各种可能方式中的方法所涉及的功能、或第三设备实现上述实施例及其各种可能方式中的方法所涉及的功能。
在一种可能的设计中,所述芯片系统还包括存储器,所述存储器用于保存第一设备或第二设备或第三设备必要的程序指令和数据。该芯片系统,可以由芯片构成,也可以包含芯片和其它分立器件。
可以理解,本领域普通技术人员可以意识到,结合本申请各个实施例中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
本领域技术人员能够领会,结合本申请各个实施例中公开描述的各种说明性逻辑框、模块和算法步骤所描述的功能可以硬件、软件、固件或其任何组合来实施。如果以软件来实施,那么各种说明性逻辑框、模块、和步骤描述的功能可作为一或多个指令或代码在计算机可读媒体上存储或传输,且由基于硬件的处理单元执行。计算机可读媒体可包含计算机可读存储媒体,其对应于有形媒体,例如数据存储媒体,或包括任何促进将计算机程序从一处传送到另一处的媒体(例如,根据通信协议)的通信媒体。以此方式,计算机可读媒体大体上可对应于(1)非暂时性的有形计算机可读存储媒体,或(2)通信媒体,例如信号或载波。数据存储媒体可为可由一或多个计算机或一或多个处理器存取以检索用于实施本申请中描述的技术的指令、代码和/或数据结构的任何可用媒体。计算机程序产品可包含计算机可读媒体。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。

Claims (27)

  1. 一种模型蒸馏训练的方法,其特征在于,包括:
    若第二设备支持第一设备蒸馏训练神经网络模型,所述第一设备设计第一神经网络模型;
    所述第一设备向所述第二设备发送第一配置信息;其中,所述第一配置信息用于所述第二设备配置所述第一参考神经网络模型;
    所述第一设备向所述第二设备发送第一数据信息;其中,所述第一数据信息包括所述第一参考神经网络模型用于蒸馏训练的第一样本数据;
    所述第一设备接收所述第二设备返回的第一指示信息,并使用所述第一指示信息训练所述第一神经网络模型;所述第一指示信息为所述第一样本数据输入到所述第一参考神经网络模型得到的信息。
  2. 如权利要求1所述的方法,其特征在于,所述第一设备设计第一神经网络模型之前,还包括:
    所述第一设备向所述第二设备发送第一训练请求;其中,所述第一训练请求包括第一训练类型ID,所述第一训练类型ID用于指示所述第一设备待蒸馏训练的神经网络模型的功能类型;
    若第二训练类型ID与所述第一训练类型ID一致,所述第一设备接收所述第二设备发送的第一响应;其中,所述第一响应包括第一存储信息;所述第一存储信息用于指示所述第一参考神经网络模型所占存储空间的大小;所述第二训练类型ID用于指示所述第二设备支持蒸馏训练的神经网络模型的功能类型;
    所述第一设备基于所述第一存储信息向所述第二设备发送第一询问信息;其中,所述第一询问信息包括第一蒸馏询问信息;所述第一蒸馏询问信息用于所述第一设备询问所述第二设备是否支持蒸馏训练神经网络模型以及在支持所述第一设备蒸馏训练神经网络模型的情况下,支持蒸馏训练神经网络模型的方式;
    所述第一设备接收所述第二设备的第二响应。
  3. 如权利要求1-2任一项所述的方法,其特征在于,所述第一设备向所述第二设备发送第一配置信息之前,还包括:
    所述第一设备向所述第二设备发送第一关注类别列表;其中,所述第一关注类别列表包括所述第一设备蒸馏训练所关注类别的集合;所述第一关注类别列表中第一设备蒸馏训练所关注类别的集合为第一类别列表中类别集合的子集;所述第一类别列表包括所述第一参考神经网络模型预设类别的集合。
  4. 如权利要求3所述的方法,其特征在于,所述第一指示信息为所述第二设备将所述第一样本数据通过所述第一参考神经网络模型计算处理,并将处理后的第一样本数据基于所述第一关注类别列表进行筛选所得到的。
  5. 如权利要求1-4任一项所述的方法,其特征在于,所述第一设备设计第一神经网络模型,包括:
    所述第一设备向所述第二设备发送第一网络结构请求;其中,所述第一网络结构请求用于所述第一设备向所述第二设备获取第一参考神经网络模型的结构信息;
    所述第一设备接收所述第二设备发送的第一结构请求响应;其中,所述第一结构请求响应包括所述第一参考神经网络模型的结构信息;
    所述第一设备基于所述第一参考神经网络模型的结构信息设计第一神经网络模型。
  6. 一种模型蒸馏训练的方法,其特征在于,包括:
    第二设备接收第一设备发送的第一配置信息,并使用所述第一配置信息配置第一参考神经网络模型;
    所述第二设备接收所述第一设备发送的第一数据信息;其中,所述第一数据信息包括所述第一参考神经网络模型用于蒸馏训练的第一样本数据;
    所述第二设备向所述第一设备发送第一指示信息;其中所述第一指示信息用于所述第一设备训练第一神经网络模型;所述第一指示信息为所述第一样本数据输入到所述第一参考神经网络模型得到的信息。
  7. 如权利要求6所述的方法,其特征在于,所述第二设备接收第一设备发送的第一配置信息,使用所述第一配置信息配置第一参考神经网络模型之前,还包括:
    所述第二设备接收所述第一设备发送的第一训练请求;其中,所述第一训练请求包括第一训练类型ID,所述第一训练类型ID用于指示第一设备待蒸馏训练的神经网络模型的功能类型;
    若第二训练类型ID与所述第一训练类型ID一致,所述第二设备基于所述第一训练请求,生成第一响应,并将所述第一响应发送给所述第一设备;其中,所述第二训练类型ID用于指示所述第二设备支持蒸馏训练神经网络模型的功能类型;所述第一响应包括第一存储信息;所述第一存储信息用于指示所述第一参考神经网络模型所占存储空间的大小;
    所述第二设备接收所述第一设备发送的第一询问信息;其中,所述第一询问信息包括第一蒸馏询问信息;所述第一蒸馏询问信息用于所述第一设备询问所述第二设备是否支持蒸馏训练神经网络模型以及在支持所述第一设备蒸馏训练神经网络模型的情况下,支持蒸馏训练神经网络模型的方式;
    所述第二设备基于所述第一蒸馏询问信息生成第二响应,并将所述第二响应发送给所述第一设备。
  8. 如权利要求6-7任一项所述的方法,其特征在于,所述第二设备接收所述第一设备发送的第一配置信息之前,还包括:
    所述第二设备接收所述第一设备发送的第一关注类别列表；其中，所述第一关注类别列表包括所述第一设备蒸馏训练所关注类别的集合；所述第一关注类别列表中，所述第一设备蒸馏训练所关注类别的集合为第一类别列表中类别集合的子集；所述第一类别列表包括所述第一参考神经网络模型预设类别的集合。
  9. 如权利要求8所述的方法,其特征在于,所述第一指示信息为所述第二设备将所述第一样本数据通过所述第一参考神经网络模型计算处理,并将处理后的第一样本数据基于所述第一关注类别列表进行筛选所得到的。
  10. 如权利要求6-9任一项所述的方法,其特征在于,所述第二设备接收所述第一设备发送的第一配置信息之前,还包括:
    所述第二设备接收所述第一设备发送的第一网络结构请求;其中,所述第一网络结构请求用于所述第一设备向所述第二设备获取所述第一参考神经网络模型的结构信息;
    所述第二设备基于所述第一网络结构请求向所述第一设备发送第一结构请求响应;其中,所述第一结构请求响应包括所述第一参考神经网络模型的结构信息。
  11. 一种模型蒸馏训练的方法,其特征在于,包括:
    第三设备接收第二设备发送的注册信息;其中,所述注册信息包括第三训练类型ID、第三神经网络模型ID、第二存储信息、第二类别列表、训练响应信息;其中,所述训练响应信息用于指示所述第二设备是否支持蒸馏训练神经网络模型以及在支持蒸馏训练神经网络模型的情况下,支持蒸馏训练神经网络模型的方式;所述第三训练类型ID用于指示所述第二设备支持蒸馏训练神经网络模型的功能类型;
    所述第三设备接收第一设备发送的第二训练请求;其中,所述第二训练请求包括第四训练类型ID、第二蒸馏询问信息和第二蒸馏能力信息;所述第四训练类型ID用于指示所述第一设备待蒸馏训练神经网络模型的功能类型;
    若第四训练类型ID与所述第三训练类型ID一致,所述第三设备基于所述第二训练请求生成第三响应,并将所述第三响应发送给所述第一设备;其中,所述第三响应包括训练响应信息、第三神经网络模型ID、第二存储信息和第二类别列表;
    所述第三设备接收所述第一设备发送的蒸馏通知；其中，所述蒸馏通知用于指示所述第一设备与所述第二设备是否匹配成功。
  12. 一种模型蒸馏训练的方法,其特征在于,包括:
    第一设备向第三设备发送第二训练请求;其中,所述第二训练请求包括第四训练类型ID、第二蒸馏询问信息和第二蒸馏能力信息;其中,所述第四训练类型ID用于指示第一设备待蒸馏训练的神经网络模型的功能类型;
    若所述第四训练类型ID与第三训练类型ID一致,所述第一设备接收所述第三设备发送的第三响应;其中,所述第三响应包括训练响应信息、第三神经网络模型ID、第二存储信息和第二类别列表;所述第三训练类型ID用于指示所述第二设备支持蒸馏训练神经网络模型的功能类型;
    所述第一设备向所述第三设备发送蒸馏通知；其中，所述蒸馏通知用于指示所述第一设备与所述第二设备是否匹配成功。
  13. 如权利要求12所述的方法,其特征在于,所述第一设备向所述第三设备发送蒸馏通知之后,还包括:
    所述第一设备设计第二神经网络模型;
    所述第一设备向第二设备发送第二配置信息;其中,所述第二配置信息用于所述第二设备配置所述第二参考神经网络模型;
    所述第一设备向所述第二设备发送第二数据信息;其中,所述第二数据信息包括所述第二参考神经网络模型用于蒸馏训练的第二样本数据;
    所述第一设备接收所述第二设备返回的第二指示信息,并使用所述第二指示信息训练所述第二神经网络模型;所述第二指示信息为所述第二样本数据输入到所述第二参考神经网络模型得到的信息。
  14. 如权利要求13所述的方法,其特征在于,所述第一设备向第二设备发送第二配置信息之前,还包括:
    所述第一设备向所述第二设备发送第二关注类别列表;其中,所述第二关注类别列表包括所述第一设备蒸馏训练所关注类别的集合;所述第二关注类别列表中,第一设备蒸馏训练所关注类别的集合为第二类别列表中类别集合的子集;所述第二类别列表包括所述第二参考神经网络模型预设类别的集合。
  15. 如权利要求14所述的方法,其特征在于,所述第二指示信息为所述第二设备将所述第二样本数据通过所述第二参考神经网络模型计算处理,并将处理后的第二样本数据基于所述第二关注类别列表进行筛选所得到的。
  16. 如权利要求13-15任一项所述的方法,其特征在于,所述第一设备设计第二神经网络模型,包括:
    所述第一设备向所述第二设备发送第二网络结构请求;其中,所述第二网络结构请求用于所述第一设备向所述第二设备获取所述第二参考神经网络模型的结构信息;
    所述第一设备接收所述第二设备发送的第二结构请求响应;其中,所述第二结构请求响应包括所述第二参考神经网络模型的结构信息;
    所述第一设备基于所述第二参考神经网络模型的结构信息设计第二神经网络模型。
  17. 一种模型蒸馏训练的方法,其特征在于,包括:
    第二设备向第三设备发送注册信息;其中,所述注册信息包括第三训练类型ID、第三神经网络模型ID、第二存储信息、第二类别列表、训练响应信息;所述训练响应信息用于指示所述第二设备是否支持蒸馏训练神经网络模型以及在支持蒸馏训练神经网络模型的情况下,支持蒸馏训练神经网络模型的方式;
    所述第二设备接收第一设备发送的第二配置信息；其中，所述第二配置信息用于所述第二设备配置所述第二参考神经网络模型；
    所述第二设备接收所述第一设备发送的第二数据信息;其中,所述第二数据信息包括所述第二参考神经网络模型用于蒸馏训练的第二样本数据;
    所述第二设备向所述第一设备发送第二指示信息；其中所述第二指示信息用于所述第一设备训练第二神经网络模型；所述第二指示信息为所述第二样本数据输入到所述第二参考神经网络模型得到的信息。
  18. 如权利要求17所述的方法,其特征在于,所述第二设备接收第一设备发送的第二配置信息之前,还包括:
    所述第二设备接收所述第一设备发送的第二关注类别列表;其中,所述第二关注类别列表包括所述第一设备蒸馏训练所关注类别的集合;所述第二关注类别列表中,第一设备蒸馏训练所关注类别的集合为第二类别列表中类别集合的子集;所述第二类别列表包括所述第二参考神经网络模型预设类别的集合。
  19. 如权利要求18所述的方法,其特征在于,所述第二指示信息为所述第二设备将所述第二样本数据通过所述第二参考神经网络模型计算处理,并将处理后的第二样本数据基于所述第二关注类别列表进行筛选所得到的。
  20. 如权利要求17-19任一项所述的方法,其特征在于,所述第二设备接收第一设备发送的第二配置信息之前,还包括:
    所述第二设备接收所述第一设备发送的第二网络结构请求;其中,所述第二网络结构请求用于所述第一设备向所述第二设备获取第二参考神经网络模型的结构信息;
    所述第二设备基于所述第二网络结构请求向所述第一设备发送第二结构请求响应;其中,所述第二结构请求响应包括所述第二参考神经网络模型的结构信息。
  21. 一种模型蒸馏训练装置,其特征在于,所述模型蒸馏训练装置为第一设备,包括执行如权利要求1-5或12-16任意一项所述方法的单元。
  22. 一种模型蒸馏训练装置,其特征在于,所述模型蒸馏训练装置为第二设备,包括执行如权利要求6-10或17-20任意一项所述方法的单元。
  23. 一种模型蒸馏训练装置,其特征在于,所述模型蒸馏训练装置为第三设备,包括执行如权利要求11所述方法的单元。
  24. 一种模型蒸馏训练设备,其特征在于,所述模型蒸馏训练设备为第一设备,包括处理器和存储器;
    其中，所述存储器用于存储程序代码，所述处理器用于调用所述存储器存储的程序代码，执行如权利要求1-5或12-16任意一项所述模型蒸馏训练的方法。
  25. 一种模型蒸馏训练设备,其特征在于,所述模型蒸馏训练设备为第二设备,包括处理器和存储器;
    其中,所述存储器用于存储程序代码,所述处理器用于调用所述存储器存储的程序代码,执行如权利要求6-10或17-20任意一项所述模型蒸馏训练的方法。
  26. 一种模型蒸馏训练设备,其特征在于,所述模型蒸馏训练设备为第三设备,包括处理器和存储器;
    其中,所述存储器用于存储程序代码,所述处理器用于调用所述存储器存储的程序代码,执行如权利要求11所述模型蒸馏训练的方法。
  27. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质存储有计算机程序,该计算机程序被处理器执行时,实现如权利要求1-20任意一项所述方法。
PCT/CN2020/129478 2020-11-17 2020-11-17 模型蒸馏训练的方法及相关装置和设备、可读存储介质 WO2022104550A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
CN202080106943.6A CN116438546A (zh) 2020-11-17 2020-11-17 模型蒸馏训练的方法及相关装置和设备、可读存储介质
EP20961841.2A EP4227857A4 (en) 2020-11-17 2020-11-17 MODEL DISTILLATION LEARNING METHOD AND ASSOCIATED APPARATUS, DEVICE AND READABLE STORAGE MEDIUM
PCT/CN2020/129478 WO2022104550A1 (zh) 2020-11-17 2020-11-17 模型蒸馏训练的方法及相关装置和设备、可读存储介质
US18/318,616 US20230289600A1 (en) 2020-11-17 2023-05-16 Model distillation training method, related apparatus and device, and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/129478 WO2022104550A1 (zh) 2020-11-17 2020-11-17 模型蒸馏训练的方法及相关装置和设备、可读存储介质

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/318,616 Continuation US20230289600A1 (en) 2020-11-17 2023-05-16 Model distillation training method, related apparatus and device, and readable storage medium

Publications (1)

Publication Number Publication Date
WO2022104550A1 true WO2022104550A1 (zh) 2022-05-27

Family

ID=81707980

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/129478 WO2022104550A1 (zh) 2020-11-17 2020-11-17 模型蒸馏训练的方法及相关装置和设备、可读存储介质

Country Status (4)

Country Link
US (1) US20230289600A1 (zh)
EP (1) EP4227857A4 (zh)
CN (1) CN116438546A (zh)
WO (1) WO2022104550A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115578613A (zh) * 2022-10-18 2023-01-06 北京百度网讯科技有限公司 目标再识别模型的训练方法和目标再识别方法

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180268292A1 (en) * 2017-03-17 2018-09-20 Nec Laboratories America, Inc. Learning efficient object detection models with knowledge distillation
CN109711544A (zh) * 2018-12-04 2019-05-03 北京市商汤科技开发有限公司 模型压缩的方法、装置、电子设备及计算机存储介质
CN110163236A (zh) * 2018-10-15 2019-08-23 腾讯科技(深圳)有限公司 模型的训练方法和装置、存储介质、电子装置
CN110311900A (zh) * 2019-06-19 2019-10-08 微梦创科网络科技(中国)有限公司 一种服务调用方法、装置、电子设备及存储介质
US20200134506A1 (en) * 2018-10-29 2020-04-30 Fujitsu Limited Model training method, data identification method and data identification device
CN111709476A (zh) * 2020-06-17 2020-09-25 浪潮集团有限公司 一种基于知识蒸馏的小分类模型训练方法及装置

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11410029B2 (en) * 2018-01-02 2022-08-09 International Business Machines Corporation Soft label generation for knowledge distillation

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180268292A1 (en) * 2017-03-17 2018-09-20 Nec Laboratories America, Inc. Learning efficient object detection models with knowledge distillation
CN110163236A (zh) * 2018-10-15 2019-08-23 腾讯科技(深圳)有限公司 模型的训练方法和装置、存储介质、电子装置
US20200134506A1 (en) * 2018-10-29 2020-04-30 Fujitsu Limited Model training method, data identification method and data identification device
CN109711544A (zh) * 2018-12-04 2019-05-03 北京市商汤科技开发有限公司 模型压缩的方法、装置、电子设备及计算机存储介质
CN110311900A (zh) * 2019-06-19 2019-10-08 微梦创科网络科技(中国)有限公司 一种服务调用方法、装置、电子设备及存储介质
CN111709476A (zh) * 2020-06-17 2020-09-25 浪潮集团有限公司 一种基于知识蒸馏的小分类模型训练方法及装置

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4227857A4 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115578613A (zh) * 2022-10-18 2023-01-06 北京百度网讯科技有限公司 目标再识别模型的训练方法和目标再识别方法
CN115578613B (zh) * 2022-10-18 2024-03-08 北京百度网讯科技有限公司 目标再识别模型的训练方法和目标再识别方法

Also Published As

Publication number Publication date
EP4227857A4 (en) 2023-12-20
EP4227857A1 (en) 2023-08-16
US20230289600A1 (en) 2023-09-14
CN116438546A (zh) 2023-07-14

Similar Documents

Publication Publication Date Title
CN112561078B (zh) 分布式的模型训练方法及相关装置
US10120820B2 (en) Direct memory access transmission control method and apparatus
US20180324106A1 (en) Dynamic resource allocation method and apparatus in software-defined network
WO2018167537A1 (en) Method and system for assisted automatic network service request and delivery in a network environment
EP4016398A1 (en) Apparatus and method for distributed training model, and computer program product
US20130091502A1 (en) System and method of providing virtual machine using device cloud
US11516628B2 (en) Media streaming with edge computing
US20220053348A1 (en) Method, apparatus, and computer program product for enhanced data analytics in multiple nwdaf deployments
WO2022104550A1 (zh) 模型蒸馏训练的方法及相关装置和设备、可读存储介质
WO2023020214A1 (zh) 检索模型的训练和检索方法、装置、设备及介质
WO2017128701A1 (zh) 存储数据的方法和装置
CN113468344B (zh) 实体关系抽取方法、装置、电子设备和计算机可读介质
US20220309371A1 (en) Automated quantum circuit job submission and status determination
US20210368428A1 (en) Endpoint computing device multi-network slice utilization system
CN114116842A (zh) 多维医疗数据实时获取方法、装置、电子设备及存储介质
CN106469168B (zh) 数据集成系统中多类型数据处理的方法及装置
CN115080771A (zh) 基于人工智能的数据处理方法及装置、介质、网关设备
CN115576661A (zh) 数据处理系统、方法和控制器
CN114090262A (zh) 对象处理方法及装置、电子设备、存储介质
US10958597B2 (en) General purpose ring buffer handling in a network controller
CN112866897B (zh) 一种定位测量方法、终端和网络节点
US20210294758A1 (en) Systems and methods for queuing device management configuration requests
WO2022076042A1 (en) Method and apparatus for cloud service
US20210103830A1 (en) Machine learning based clustering and patterning system and method for network traffic data and its application
CN109344836A (zh) 一种文字识别方法和设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20961841

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2020961841

Country of ref document: EP

Effective date: 20230511

NENP Non-entry into the national phase

Ref country code: DE