CN112862095A - Self-distillation learning method and device based on characteristic analysis and readable storage medium

Self-distillation learning method and device based on characteristic analysis and readable storage medium

Info

Publication number
CN112862095A
Authority
CN
China
Prior art keywords
layer
feature
neural network
convolutional neural
loss function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110146048.3A
Other languages
Chinese (zh)
Other versions
CN112862095B (en)
Inventor
袁雷 (Yuan Lei)
魏乃科 (Wei Naike)
潘华东 (Pan Huadong)
殷俊 (Yin Jun)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202110146048.3A
Publication of CN112862095A
Application granted
Publication of CN112862095B
Status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a feature analysis-based self-distillation learning method and device, and a readable storage medium. The self-distillation learning method based on feature analysis comprises the following steps: dividing the convolutional layers of a convolutional neural network into n partial feature layers at a set depth interval based on the depth and original structure of the convolutional neural network, wherein n is a positive integer and n is greater than or equal to 2; inputting a training set into the convolutional neural network for training to obtain a loss function for each partial feature layer; and optimizing the convolutional neural network based on the loss functions of all the feature layers to obtain a trained convolutional neural network. In this way, distillation learning can be performed using the loss functions of different partial feature layers, the structural information of the convolutional neural network is used effectively, and the self-distillation learning effect is improved.

Description

Self-distillation learning method and device based on characteristic analysis and readable storage medium
Technical Field
The present application relates to the field of convolutional neural network training technologies, and in particular, to a self-distillation learning method and device based on feature analysis, and a readable storage medium.
Background
Convolutional neural networks have been widely deployed in various application scenarios. To extend their range of application to areas where accuracy is critical, researchers have studied ways to increase accuracy through deeper or wider network structures; these, however, bring rapidly growing computation and storage costs and thus longer response times.
Applications such as image classification, object detection, and semantic segmentation are currently evolving at an unprecedented rate with the help of convolutional neural networks. However, in fault-intolerant applications such as autonomous driving and medical image analysis, prediction and analysis accuracy must be improved further while response times must be kept short, which poses a huge challenge for current convolutional neural networks. Prior art approaches have focused either on improving performance or on reducing the computational resources required, and thereby the response time. On the one hand, networks as deep as ResNet-150, or even ResNet-1000, have been proposed, improving performance by only a very limited margin at a large computational cost. On the other hand, various techniques have been proposed to reduce the amount of computation and memory to match the limits imposed by hardware, at a predefined performance penalty relative to the original network. Such techniques include lightweight network design, pruning, quantization, and so on, among which knowledge distillation is one possible way to achieve model compression.
Self-distillation learning methods in the prior art can be used for efficient training, but they do not consider the characteristics of feature-layer knowledge at different depths: self-learning is applied uniformly across layers rather than being tailored to each layer's aptitude, so the self-distillation learning effect is limited.
Disclosure of Invention
The present application provides a feature analysis-based self-distillation learning method and device, and a readable storage medium.
The technical solution provided by the present application is as follows: a feature analysis-based self-distillation learning method is provided, including:
dividing the convolutional layers of a convolutional neural network into n partial feature layers at a set depth interval based on the depth and original structure of the convolutional neural network, wherein n is a positive integer and n is greater than or equal to 2;
inputting a training set into the convolutional neural network for training to obtain a loss function for each partial feature layer;
and optimizing the convolutional neural network based on the loss functions of all the feature layers to obtain a trained convolutional neural network.
In some possible embodiments, the feature layers into which the convolutional neural network is divided include at least a shallow feature layer, a middle feature layer, and a deep feature layer; wherein the shallow feature layer, the middle feature layer, and the deep feature layer are connected in sequence;
the step of inputting a training set into the convolutional neural network for training includes:
inputting the training set into the shallow feature layer to obtain shallow feature knowledge;
inputting the shallow feature knowledge into the middle feature layer to obtain middle-layer feature knowledge;
and inputting the middle-layer feature knowledge into the deep feature layer to obtain deep feature knowledge.
In some possible embodiments, the self-distillation learning method further comprises:
inputting the training set into the shallow feature layer to obtain a loss factor of the shallow feature layer;
outputting a structure loss function of the shallow feature layer based on the loss factor of the shallow feature layer;
wherein the functional structure of the structure loss function is designed based on the specific characteristics of the shallow feature layer.
In some possible embodiments, the self-distillation learning method further comprises:
inputting the shallow feature knowledge into the middle feature layer to obtain a loss factor of the middle feature layer;
outputting a pairing loss function of the middle feature layer based on the loss factor of the middle feature layer;
wherein the function structure of the pairing loss function is designed based on the specific characteristics of the middle feature layer.
In some possible embodiments, the self-distillation learning method further comprises:
inputting the middle-layer feature knowledge into the deep feature layer to obtain a loss factor of the deep feature layer;
outputting a probability distribution loss function of the deep feature layer based on the loss factor of the deep feature layer;
wherein a function structure of the probability distribution loss function is designed based on the specific characteristics of the deep feature layer.
In some possible embodiments, the step of optimizing the convolutional neural network based on the loss functions of all feature layers to obtain a trained convolutional neural network includes:
obtaining an overall loss function output by the convolutional neural network;
weighting the overall loss function by the loss function of each partial feature layer according to preset weights to obtain a target loss function;
and optimizing the convolutional neural network based on the target loss function to obtain the trained convolutional neural network.
In some possible embodiments, the step of weighting the overall loss function by the loss function of each partial feature layer according to a preset weight to obtain a target loss function comprises:
comparing the values of the loss function of each partial feature layer;
setting the weight value of the loss function of each partial feature layer according to the comparison result;
and weighting the overall loss function by the loss function of each partial feature layer according to these weight values to obtain the target loss function.
Another technical solution provided by the present application is as follows: a terminal device is provided, wherein the terminal device comprises a dividing module, a training module, and an optimization module; wherein,
the dividing module is used for dividing the convolutional layers of the convolutional neural network into n partial feature layers at a set depth interval based on the depth and original structure of the convolutional neural network, wherein n is a positive integer and n is greater than or equal to 2;
the training module is used for inputting a training set into the convolutional neural network for training to obtain a loss function for each partial feature layer;
and the optimization module is used for optimizing the convolutional neural network based on the loss functions of all the feature layers to obtain the trained convolutional neural network.
Another technical solution provided by the present application is: there is provided another terminal device comprising a processor and a memory, the memory having stored therein a computer program, the processor being configured to execute the computer program to implement the steps of the above-described feature analysis based self-distillation learning method.
Another technical solution provided by the present application is as follows: there is provided a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program which, when executed, implements the steps of the above-described feature analysis-based self-distillation learning method.
In contrast to the prior art, the beneficial effects of the present application are as follows: the terminal device divides the convolutional layers of the convolutional neural network into n partial feature layers at a set depth interval based on the depth and original structure of the convolutional neural network, wherein n is a positive integer and n is greater than or equal to 2; inputs the training set into the convolutional neural network for training to obtain a loss function for each partial feature layer; and optimizes the convolutional neural network based on the loss functions of all the feature layers to obtain the trained convolutional neural network. In this way, distillation learning can be performed using the loss functions of different partial feature layers, the structural information of the convolutional neural network is used effectively, and the self-distillation learning effect is improved.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings based on these drawings without creative effort.
FIG. 1 is a schematic flow chart diagram illustrating one embodiment of a method for feature analysis-based self-distillation learning provided herein;
FIG. 2 is a schematic structural diagram of an embodiment of a convolutional neural network provided herein;
FIG. 3 illustrates the training process of the structure loss function provided in the present application;
FIG. 4 illustrates the training process of the pairing loss function provided in the present application;
FIG. 5 illustrates the training process of the probability distribution loss function provided in the present application;
FIG. 6 is a schematic flow chart of step S13 in the self-distillation learning method shown in FIG. 1;
FIG. 7 is a schematic structural diagram of an embodiment of a terminal device provided in the present application;
FIG. 8 is a schematic structural diagram of another embodiment of a terminal device provided in the present application;
FIG. 9 is a schematic structural diagram of an embodiment of a computer-readable storage medium provided in the present application.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on the embodiments given herein without creative effort fall within the protection scope of the present invention.
Human-body keypoint detection technology accurately estimates the n main keypoints of a human body in an image or video; the main keypoints include the left and right elbows, wrists, shoulders, head, neck, ankles, knees, hips, soles, and the like. Human-body keypoint detection can be applied to judging the state of the human body, the posture of the human body, and so on.
The convolutional neural network trained by the method of the present application can be used for human-body keypoint detection, and the training set required for training comprises a number of human-body images covering different scenes, different viewing angles, and different illumination conditions.
The dynamic combined distillation learning method based on feature analysis of the present application uses the features between different layers of a convolutional neural network to perform distillation learning, effectively uses the structural information of the convolutional neural network, and overcomes the limitation of distillation learning that relies directly on the output information alone. Referring to FIG. 1, FIG. 1 is a schematic flow chart of an embodiment of the feature analysis-based self-distillation learning method provided in the present application.
The executing entity of the self-distillation learning method of the present application may be a terminal device; for example, the method may be executed by a terminal device, a server, or another processing device, where the terminal device may be a User Equipment (UE), a mobile device, a user terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like. In some possible implementations, the self-distillation learning method may be implemented by a processor calling computer-readable instructions stored in a memory.
As shown in fig. 1, the self-distillation learning method based on feature analysis of the present embodiment specifically includes the following steps:
step S11: based on the depth and the original structure of the convolutional neural network, the convolutional layer of the convolutional neural network is divided into n partial feature layers by a set depth interval, wherein n is a positive integer and is more than or equal to 2.
The terminal equipment divides the convolutional layer of the convolutional neural network into at least two characteristic layers in a set depth interval based on the depth and the original structure of the convolutional neural network required by the human body key point detection technology.
FIG. 2 is a schematic structural diagram of an embodiment of a convolutional neural network provided in the present application. Since most existing convolutional neural networks are organized into stages, the convolutional neural network of FIG. 2 is likewise divided into four layers in units of stages in the embodiment of the present disclosure. It should be noted that the self-distillation learning method of the embodiment of the present disclosure is also applicable to convolutional neural networks with other structures, which are not described here again.

Stage1 in FIG. 2 may be called the shallow feature layer, and its output is called shallow feature knowledge; Stage2 in FIG. 2 may be called the middle feature layer, and its output is called middle-layer feature knowledge; Stage3 in FIG. 2 may be called the deep feature layer, and its output is called deep feature knowledge. In addition, Stage4 in FIG. 2 may be a feature layer deeper than the deep feature layer, or a feature layer at the same depth as the deep feature layer.
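As an illustration of this stage-wise division, the following minimal PyTorch sketch splits a ResNet-style backbone into the four stages of FIG. 2 and exposes each stage's feature knowledge. This is an assumed example: the class and attribute names are hypothetical, ResNet-18 is merely a convenient backbone, and the patent does not prescribe any particular network.

```python
import torch.nn as nn
import torchvision.models as models

class StagedBackbone(nn.Module):
    """Hypothetical sketch: divide a CNN's convolutional layers into the
    four stages of FIG. 2 and return each stage's output (its "feature
    knowledge") so that per-layer loss functions can be attached."""

    def __init__(self):
        super().__init__()
        resnet = models.resnet18(weights=None)  # assumed example backbone
        self.stem = nn.Sequential(resnet.conv1, resnet.bn1, resnet.relu, resnet.maxpool)
        self.stage1 = resnet.layer1  # shallow feature layer
        self.stage2 = resnet.layer2  # middle feature layer
        self.stage3 = resnet.layer3  # deep feature layer
        self.stage4 = resnet.layer4  # deeper than (or as deep as) the deep feature layer

    def forward(self, x):
        x = self.stem(x)
        f1 = self.stage1(x)   # shallow feature knowledge
        f2 = self.stage2(f1)  # middle-layer feature knowledge
        f3 = self.stage3(f2)  # deep feature knowledge
        f4 = self.stage4(f3)
        return f1, f2, f3, f4
```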
Step S12: and inputting the training set into a convolutional neural network for training to obtain the loss function of each part of the feature layer.
The terminal device inputs a training set prepared in advance into the convolutional neural network for training. Specifically, the training set is input into the Stage1 shallow feature layer for feature extraction and feature analysis to obtain the shallow feature knowledge and the loss factor of the shallow feature layer during training; the terminal device then inputs the shallow feature knowledge into the Stage2 middle feature layer for feature extraction and feature analysis to obtain the middle-layer feature knowledge and the loss factor of the middle feature layer during training; finally, the terminal device inputs the middle-layer feature knowledge into the Stage3 deep feature layer for feature extraction and feature analysis to obtain the deep feature knowledge and the loss factor of the deep feature layer during training.
Through the above training process, the terminal device can analyze the specificity of the features of the different feature layers from the shallow, middle-layer, and deep feature knowledge. For the convolutional neural network required by human-body keypoint detection, the shallow feature knowledge output by the Stage1 shallow feature layer contains relatively accurate position information but little semantic information; the middle-layer feature knowledge output by the Stage2 middle feature layer contains both relatively accurate position information and a certain amount of semantic information; and the deep feature knowledge output by the Stage3 deep feature layer carries relatively strong semantic information but little position information. The feature specificity output by feature layers of different depths therefore differs, and by effectively exploiting the feature specificity brought by the structural information of the convolutional neural network, a multi-layer feature analysis strategy can improve the efficiency and effect of self-distillation learning.
Further, the terminal device designs a targeted loss function for each partial feature layer based on that layer's specific characteristics. A loss function measures the prediction performance of the convolutional neural network and expresses the degree of gap between the predicted data and the actual data.
Specifically, according to the characteristics of the Stage1 shallow feature layer, the disclosed embodiment designs a StructureLoss (structure loss) function for constraining the structural information of the human body under severe occlusion. Referring to FIG. 3, FIG. 3 illustrates the training process of the structure loss function provided in the present application. The terminal device inputs a portrait image, i.e., Init person in FIG. 3, into the convolutional neural network; the convolutional neural network then obtains a loss factor from the degree of difference between the predicted data PTStructure and the actual data GTStructure, and finally outputs the structure loss function of the shallow feature layer according to the loss factor. Calculating the structure loss function requires comparing position information without semantic information, so it is better suited to shallow features than other types of loss function.
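The patent text does not give the formula of the structure loss. One plausible instantiation, sketched below under that caveat, compares the "bone" vectors between connected keypoints of the predicted structure (PTStructure) and the actual structure (GTStructure); the skeleton edge list and the function name are illustrative assumptions.

```python
import torch.nn.functional as F

# Hypothetical skeleton edges (index pairs of connected keypoints),
# e.g. shoulder->elbow, elbow->wrist; not specified by the patent.
SKELETON_EDGES = [(5, 7), (7, 9), (6, 8), (8, 10), (11, 13), (13, 15)]

def structure_loss(pt_coords, gt_coords, edges=SKELETON_EDGES):
    """Compare predicted and ground-truth bone vectors, i.e. pure position
    information with no semantic content, matching the shallow layer.
    pt_coords, gt_coords: tensors of shape (B, K, 2)."""
    loss = 0.0
    for i, j in edges:
        pt_bone = pt_coords[:, j] - pt_coords[:, i]  # predicted bone vector
        gt_bone = gt_coords[:, j] - gt_coords[:, i]  # ground-truth bone vector
        loss = loss + F.smooth_l1_loss(pt_bone, gt_bone)
    return loss / len(edges)
```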
According to the characteristics of the Stage2 middle feature layer, the disclosed embodiment designs a PairLoss (pairing loss) function to constrain the prediction confusion between paired similar keypoints of a human body under occlusion and crowding. Referring to FIG. 4, FIG. 4 illustrates the training process of the pairing loss function provided in the present application. Taking the paired similar keypoints of the left and right wrists as an example, the terminal device inputs a portrait image, i.e., Init person in FIG. 4, into the convolutional neural network; the convolutional neural network outputs a right-hand heat map (RightHand Heatmap) and a left-hand heat map (LeftHand Heatmap), the degree of difference between the two heat maps is compared to obtain a loss factor, and finally the pairing loss function of the middle feature layer is output according to the loss factor. Calculating the pairing loss function requires not only the corresponding position information but also a certain amount of semantic information, so it is better suited to middle-layer features than other types of loss function.
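Here, too, the patent only states that the difference between the two heat maps yields a loss factor. The sketch below is one assumed realization that penalizes excessive similarity between the left- and right-hand heat maps, so that the paired keypoints are not predicted at the same location; the margin value is an illustrative assumption.

```python
import torch.nn.functional as F

def pair_loss(left_heatmap, right_heatmap, margin=0.5):
    """Penalize confusion between paired similar keypoints: if the
    normalized left- and right-hand heat maps are too similar (cosine
    similarity above `margin`), push them apart. Shapes: (B, H, W)."""
    b = left_heatmap.size(0)
    left = F.normalize(left_heatmap.reshape(b, -1), dim=1)
    right = F.normalize(right_heatmap.reshape(b, -1), dim=1)
    similarity = (left * right).sum(dim=1)      # per-sample cosine similarity
    return F.relu(similarity - margin).mean()   # zero loss once the maps differ enough
```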
According to the characteristics of the Stage3 deep feature layer, the embodiment of the disclosure designs a probability distribution loss function to represent the feature distribution of human-body keypoints, so that the network converges better. Referring to FIG. 5, FIG. 5 illustrates the training process of the probability distribution loss function provided in the present application. The convolutional neural network identifies the positions of the feature points in the image and generates their probability distributions; a loss factor is obtained by comparing the probability distribution of an intermediate convolutional layer with that of the final convolutional layer, and finally the probability distribution loss function of the deep feature layer is output according to the loss factor. The probability distribution loss function requires stronger feature discrimination and semantic capability, so it is better suited to deep features than other types of loss function.
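A common way to realize such a probability distribution loss, and a plausible reading of FIG. 5, is a KL divergence between the spatial softmax of the intermediate layer's heat maps and that of the final layer, as in the sketch below; the temperature parameter tau is an assumption borrowed from standard knowledge distillation.

```python
import torch.nn.functional as F

def distribution_loss(mid_heatmap, final_heatmap, tau=1.0):
    """Treat each keypoint heat map as a spatial probability distribution
    (softmax over all pixels) and align the intermediate convolutional
    layer's distribution with the final layer's via KL divergence.
    Shapes: (B, K, H, W); the two inputs are assumed spatially aligned."""
    b, k = mid_heatmap.shape[:2]
    mid_logp = F.log_softmax(mid_heatmap.reshape(b, k, -1) / tau, dim=-1)
    final_p = F.softmax(final_heatmap.reshape(b, k, -1) / tau, dim=-1).detach()
    return F.kl_div(mid_logp, final_p, reduction="batchmean") * (tau * tau)
```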
The self-distillation learning method of the disclosed embodiment analyzes the properties of the feature knowledge at different layers and, by guiding the learning process layer by layer and step by step, makes learning easier, avoiding the difficulty that arises when a single, uniform learning process is used.
Step S13: and optimizing the convolutional neural network based on the loss functions of all the characteristic layers to obtain the trained convolutional neural network.
The terminal device obtains the specific loss functions of the three different levels according to the above steps. To better integrate the influence of all the loss functions on the final learning result, and considering that the specific loss functions of different levels influence learning to different degrees, the embodiment of the disclosure designs weights that are dynamically updated as learning progresses, which makes the whole learning process more stable and effective.
Referring to FIG. 6, FIG. 6 is a schematic flow chart of step S13 in the self-distillation learning method shown in FIG. 1. As shown in FIG. 6, step S13 may specifically include the following steps:
step S131: and acquiring the overall loss function of the output of the convolutional neural network.
The terminal device obtains, on the one hand, the loss function of each partial feature layer and, on the other hand, the overall loss function output by the convolutional neural network, which may specifically be a KL (Kullback-Leibler) divergence loss function and/or an MSE (mean squared error) loss function.
Step S132: and weighting the whole loss function by the loss function of each part of the characteristic layer according to a preset weight to obtain a target loss function.
The terminal device weights the loss function of each partial feature layer together with the overall loss function according to preset weights to obtain a target loss function. It should be noted that the preset weights may also be updated dynamically during learning, so as to balance the influence of each loss function on the final learning result.

Specifically, after the terminal device obtains the loss function of each partial feature layer and the specific value of the overall loss function, it calculates appropriate weights from the value of each loss function so that the weighted loss values are more balanced, thereby equalizing the influence of each loss function on the final learning result.
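Under the assumption that "more balanced" means inverse-magnitude weighting, the dynamic weighting of step S132 might be sketched as follows; the exact formula is not specified by the patent.

```python
import torch

def target_loss(overall_loss, layer_losses, eps=1e-8):
    """Sketch of step S132: dynamically weight the per-layer loss functions
    so that, after weighting, each term contributes comparably, then combine
    them with the overall loss to form the target loss."""
    values = torch.stack([l.detach() for l in layer_losses])
    weights = values.mean() / (values + eps)  # larger losses receive smaller weights
    weighted = sum(w * l for w, l in zip(weights, layer_losses))
    return overall_loss + weighted
```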
Step S133: and optimizing the convolutional neural network based on the target loss function to obtain the trained convolutional neural network.
The terminal device optimizes the convolutional neural network according to the target loss function to obtain the trained convolutional neural network. The embodiment of the disclosure analyzes the specific characteristics of the feature knowledge at different layers and sets a specific loss function for each according to those characteristics, thereby teaching each layer according to its aptitude, and constructs dynamic weights for the features between different layers based on their dynamic influence, solving the problem of a limited self-distillation learning effect caused by treating all layers uniformly without differentiation.
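Putting the pieces together, one hypothetical end-to-end training step could look like the sketch below. The auxiliary heads (coord_head, pair_head, mid_head, final_head) that map stage outputs to coordinates or heat maps are assumptions not described by the patent, and MSE is only an assumed choice for the overall loss function.

```python
import torch
import torch.nn.functional as F

def train_step(model, heads, batch, optimizer):
    """Hypothetical end-to-end step: per-stage specific losses plus the
    dynamically weighted target loss of steps S132/S133. `model` is a
    StagedBackbone; `heads` is an assumed dict of auxiliary head modules."""
    images, gt_coords, gt_heatmaps = batch
    f1, f2, f3, f4 = model(images)

    pair_maps = heads["pair_head"](f2)    # assumed (B, 2, H, W): right/left hand
    final_maps = heads["final_head"](f4)  # final keypoint heat maps

    l_struct = structure_loss(heads["coord_head"](f1), gt_coords)
    l_pair = pair_loss(pair_maps[:, 0], pair_maps[:, 1])
    l_dist = distribution_loss(heads["mid_head"](f3), final_maps)
    l_overall = F.mse_loss(final_maps, gt_heatmaps)  # overall loss (assumed MSE)

    loss = target_loss(l_overall, [l_struct, l_pair, l_dist])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```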
In the embodiment of the disclosure, the terminal device divides the convolutional layers of the convolutional neural network into n partial feature layers at a set depth interval based on the depth and original structure of the convolutional neural network, where n is a positive integer and n is greater than or equal to 2; inputs the training set into the convolutional neural network for training to obtain a loss function for each partial feature layer; and optimizes the convolutional neural network based on the loss functions of all the feature layers to obtain the trained convolutional neural network. In this way, distillation learning can be performed using the loss functions of different partial feature layers, the structural information of the convolutional neural network is used effectively, and the self-distillation learning effect is improved.
It will be understood by those skilled in the art that, in the method of the present invention, the order in which the steps are written does not imply a strict order of execution or any limitation on the implementation; the specific execution order of the steps should be determined by their functions and possible inherent logic.
In order to implement the feature analysis-based self-distillation learning method of the foregoing embodiment, the present application further provides a terminal device; see FIG. 7, which is a schematic structural diagram of an embodiment of the terminal device provided in the present application.

As shown in FIG. 7, the terminal device 400 of the present embodiment includes a dividing module 41, a training module 42, and an optimization module 43.

The dividing module 41 is configured to divide the convolutional layers of a convolutional neural network into n partial feature layers at a set depth interval based on the depth and original structure of the convolutional neural network, where n is a positive integer and n is greater than or equal to 2; the training module 42 is configured to input a training set into the convolutional neural network for training and obtain a loss function for each partial feature layer; and the optimization module 43 is configured to optimize the convolutional neural network based on the loss functions of all feature layers to obtain a trained convolutional neural network.
In order to implement the feature analysis-based self-distillation learning method of the foregoing embodiment, the present application further provides another terminal device; see FIG. 8, which is a schematic structural diagram of another embodiment of the terminal device provided in the present application.

As shown in FIG. 8, the terminal device 500 of the present embodiment includes a processor 51, a memory 52, an input-output device 53, and a bus 54.

The processor 51, the memory 52, and the input-output device 53 are each connected to the bus 54; the memory 52 stores a computer program, and the processor 51 is configured to execute the computer program to implement the feature analysis-based self-distillation learning method of the above embodiment.
In the present embodiment, the processor 51 may also be referred to as a CPU (Central Processing Unit). The processor 51 may be an integrated circuit chip having signal processing capability. The processor 51 may also be a general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The processor 51 may also be a GPU (Graphics Processing Unit), also called a display core, visual processor, or display chip: a microprocessor dedicated to image computation on personal computers, workstations, game consoles, and some mobile devices (such as tablet computers and smartphones). The GPU converts and drives the display information required by the computer system, provides line-scan signals to the display, and controls the display correctly; it is an important element connecting the display to the motherboard of a personal computer and one of the important devices for human-machine interaction. The graphics card containing it is an important component of the computer host, responsible for outputting display graphics, and is particularly important for users engaged in professional graphic design. A general-purpose processor may be a microprocessor, or the processor 51 may be any conventional processor or the like.
The present application also provides a computer-readable storage medium. As shown in FIG. 9, the computer-readable storage medium 600 stores a computer program 61 which, when executed by a processor, implements the method described in the embodiments of the feature analysis-based self-distillation learning method of the present application.
When implemented in the form of a software functional unit and sold or used as a stand-alone product, the method described in the embodiments of the feature analysis-based self-distillation learning method may be stored in a device such as a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part contributing over the prior art, or all or part of the technical solution, may be embodied in a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) or a processor to execute all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes performed by the present specification and drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A self-distillation learning method based on feature analysis, the self-distillation learning method comprising:
dividing the convolutional layers of a convolutional neural network into n partial feature layers at a set depth interval based on the depth and original structure of the convolutional neural network, wherein n is a positive integer and n is greater than or equal to 2;
inputting a training set into the convolutional neural network for training to obtain a loss function for each partial feature layer;
and optimizing the convolutional neural network based on the loss functions of all the feature layers to obtain a trained convolutional neural network.
2. The feature analysis-based self-distillation learning method according to claim 1, wherein the feature layers into which the convolutional neural network is divided at least comprise a shallow feature layer, a middle feature layer, and a deep feature layer; wherein the shallow feature layer, the middle feature layer, and the deep feature layer are connected in sequence;
the step of inputting a training set into the convolutional neural network for training includes:
inputting the training set into the shallow feature layer to obtain shallow feature knowledge;
inputting the shallow feature knowledge into the middle feature layer to obtain middle-layer feature knowledge;
and inputting the middle-layer feature knowledge into the deep feature layer to obtain deep feature knowledge.
3. The feature analysis based self-distillation learning method of claim 2, further comprising:
inputting the training set into the shallow feature layer to obtain a loss factor of the shallow feature layer;
outputting a structure loss function of the shallow feature layer based on the loss factor of the shallow feature layer;
wherein the functional structure of the structure loss function is designed based on the specific characteristics of the shallow feature layer.
4. The feature analysis based self-distillation learning method of claim 2, further comprising:
inputting the shallow feature knowledge into the middle feature layer to obtain a loss factor of the middle feature layer;
outputting a pairing loss function of the middle feature layer based on the loss factor of the middle feature layer;
wherein the function structure of the pairing loss function is designed based on the specific characteristics of the middle feature layer.
5. The feature analysis based self-distillation learning method of claim 2, further comprising:
inputting the middle-layer feature knowledge into the deep feature layer to obtain a loss factor of the deep feature layer;
outputting a probability distribution loss function of the deep feature layer based on the loss factor of the deep feature layer;
wherein a function structure of the probability distribution loss function is designed based on the specific characteristics of the deep feature layer.
6. The feature analysis-based self-distillation learning method according to claim 1, wherein the step of optimizing the convolutional neural network based on the loss functions of all feature layers to obtain a trained convolutional neural network comprises:
obtaining an overall loss function output by the convolutional neural network;
weighting the overall loss function by the loss function of each partial feature layer according to preset weights to obtain a target loss function;
and optimizing the convolutional neural network based on the target loss function to obtain the trained convolutional neural network.
7. The feature analysis-based self-distillation learning method according to claim 6, wherein the step of weighting the overall loss function by the loss function of each partial feature layer according to a preset weight to obtain the target loss function comprises:
comparing the values of the loss function of each partial feature layer;
setting the weight value of the loss function of each partial feature layer according to the comparison result;
and weighting the overall loss function by the loss function of each partial feature layer according to the weight values to obtain the target loss function.
8. A terminal device, characterized in that the terminal device comprises a dividing module, a training module, and an optimization module; wherein,
the dividing module is used for dividing the convolutional layers of the convolutional neural network into n partial feature layers at a set depth interval based on the depth and original structure of the convolutional neural network, wherein n is a positive integer and n is greater than or equal to 2;
the training module is used for inputting a training set into the convolutional neural network for training to obtain a loss function for each partial feature layer;
and the optimization module is used for optimizing the convolutional neural network based on the loss functions of all the feature layers to obtain a trained convolutional neural network.
9. A terminal device, characterized in that the terminal device comprises a processor and a memory; the memory is stored with a computer program, and the processor is used for executing the computer program to realize the steps of the feature analysis-based self-distillation learning method according to any one of claims 1-7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed, implements the steps of the feature analysis based self-distillation learning method according to any of claims 1-7.
CN202110146048.3A 2021-02-02 2021-02-02 Self-distillation learning method and device based on feature analysis and readable storage medium Active CN112862095B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110146048.3A CN112862095B (en) 2021-02-02 2021-02-02 Self-distillation learning method and device based on feature analysis and readable storage medium


Publications (2)

Publication Number Publication Date
CN112862095A true CN112862095A (en) 2021-05-28
CN112862095B CN112862095B (en) 2023-09-29

Family

ID=75986335

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110146048.3A Active CN112862095B (en) 2021-02-02 2021-02-02 Self-distillation learning method and device based on feature analysis and readable storage medium

Country Status (1)

Country Link
CN (1) CN112862095B (en)


Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170300811A1 (en) * 2016-04-14 2017-10-19 Linkedin Corporation Dynamic loss function based on statistics in loss layer of deep convolutional neural network
US20180268292A1 (en) * 2017-03-17 2018-09-20 Nec Laboratories America, Inc. Learning efficient object detection models with knowledge distillation
US20190325313A1 (en) * 2018-04-20 2019-10-24 Google Llc Systems and Methods for Regularizing Neural Networks
US20190354857A1 (en) * 2018-05-17 2019-11-21 Raytheon Company Machine learning using informed pseudolabels
WO2020143225A1 (en) * 2019-01-08 2020-07-16 南京人工智能高等研究院有限公司 Neural network training method and apparatus, and electronic device
CN109858466A (en) * 2019-03-01 2019-06-07 北京视甄智能科技有限公司 A kind of face critical point detection method and device based on convolutional neural networks
CN109948573A (en) * 2019-03-27 2019-06-28 厦门大学 A kind of noise robustness face identification method based on cascade deep convolutional neural networks
CN110232203A (en) * 2019-04-22 2019-09-13 山东大学 Knowledge distillation optimization RNN has a power failure prediction technique, storage medium and equipment in short term
WO2021012494A1 (en) * 2019-07-19 2021-01-28 平安科技(深圳)有限公司 Deep learning-based face recognition method and apparatus, and computer-readable storage medium
CN110472730A (en) * 2019-08-07 2019-11-19 交叉信息核心技术研究院(西安)有限公司 A kind of distillation training method and the scalable dynamic prediction method certainly of convolutional neural networks
CN111368673A (en) * 2020-02-26 2020-07-03 华南理工大学 Method for quickly extracting human body key points based on neural network
CN111598793A (en) * 2020-04-24 2020-08-28 云南电网有限责任公司电力科学研究院 Method and system for defogging image of power transmission line and storage medium
CN112016591A (en) * 2020-08-04 2020-12-01 杰创智能科技股份有限公司 Training method of image recognition model and image recognition method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JING, Yu; QI, Ruihua; LIU, Jianxin; LIU, Zhaoxia: "Gesture Recognition Algorithm Based on an Improved Multi-scale Deep Convolutional Network" (基于改进多尺度深度卷积网络的手势识别算法), Computer Science (计算机科学), no. 06 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113507466A (en) * 2021-07-07 2021-10-15 浙江大学 Method and system for defending backdoor attack by knowledge distillation based on attention mechanism
CN113486990A (en) * 2021-09-06 2021-10-08 北京字节跳动网络技术有限公司 Training method of endoscope image classification model, image classification method and device
CN113486990B (en) * 2021-09-06 2021-12-21 北京字节跳动网络技术有限公司 Training method of endoscope image classification model, image classification method and device

Also Published As

Publication number Publication date
CN112862095B (en) 2023-09-29

Similar Documents

Publication Publication Date Title
US11551333B2 (en) Image reconstruction method and device
US20220270207A1 (en) Image processing method, apparatus, device, and computer-readable storage medium
CN114186632B (en) Method, device, equipment and storage medium for training key point detection model
WO2020119527A1 (en) Human action recognition method and apparatus, and terminal device and storage medium
CN110689599A (en) 3D visual saliency prediction method for generating countermeasure network based on non-local enhancement
CN108921225A (en) A kind of image processing method and device, computer equipment and storage medium
KR20220038475A (en) Video content recognition method and apparatus, storage medium, and computer device
CN114092963B (en) Method, device, equipment and storage medium for key point detection and model training
CN116363261B (en) Training method of image editing model, image editing method and device
CN112508782A (en) Network model training method, face image super-resolution reconstruction method and equipment
CN113657397B (en) Training method for circularly generating network model, method and device for establishing word stock
CN112862095B (en) Self-distillation learning method and device based on feature analysis and readable storage medium
CN111860276B (en) Human body key point detection method, device, network equipment and storage medium
CN110807379B (en) Semantic recognition method, semantic recognition device and computer storage medium
KR102637342B1 (en) Method and apparatus of tracking target objects and electric device
WO2019001323A1 (en) Signal processing system and method
CN115456167B (en) Lightweight model training method, image processing device and electronic equipment
CN114049491A (en) Fingerprint segmentation model training method, fingerprint segmentation device, fingerprint segmentation equipment and fingerprint segmentation medium
CN111339315B (en) Knowledge graph construction method, system, computer readable medium and electronic equipment
WO2023116744A1 (en) Image processing method and apparatus, device, and medium
EP4318314A1 (en) Image acquisition model training method and apparatus, image detection method and apparatus, and device
CN114758130B (en) Image processing and model training method, device, equipment and storage medium
CN116245157A (en) Facial expression representation model training method, facial expression recognition method and facial expression recognition device
CN113055666B (en) Video quality evaluation method and device
CN114723933A (en) Region information generation method and device, electronic equipment and computer readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant