CN112862095B - Self-distillation learning method and device based on feature analysis and readable storage medium - Google Patents

Self-distillation learning method and device based on feature analysis and readable storage medium

Info

Publication number
CN112862095B
CN112862095B
Authority
CN
China
Prior art keywords
layer
feature
loss function
neural network
characteristic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110146048.3A
Other languages
Chinese (zh)
Other versions
CN112862095A (en)
Inventor
袁雷
魏乃科
潘华东
殷俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202110146048.3A priority Critical patent/CN112862095B/en
Publication of CN112862095A publication Critical patent/CN112862095A/en
Application granted granted Critical
Publication of CN112862095B publication Critical patent/CN112862095B/en


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a self-distillation learning method and device based on feature analysis, and a readable storage medium. The self-distillation learning method based on feature analysis comprises the following steps: based on the depth and the original structure of a convolutional neural network, dividing the convolutional layers of the convolutional neural network into n partial feature layers at a set depth interval, where n is a positive integer and n ≥ 2; inputting a training set into the convolutional neural network for training, and obtaining a loss function for each partial feature layer; and optimizing the convolutional neural network based on the loss functions of all the feature layers to obtain a trained convolutional neural network. In this way, distillation learning is performed with a dedicated loss function for each partial feature layer, which effectively exploits the structural information of the convolutional neural network and improves the self-distillation learning effect.

Description

Self-distillation learning method and device based on feature analysis and readable storage medium
Technical Field
The application relates to the technical field of convolutional neural network training, in particular to a self-distillation learning method and device based on feature analysis and a readable storage medium.
Background
Convolutional neural networks have been widely deployed in various application scenarios. To extend their range of application to fields where accuracy is critical, researchers have been investigating methods that increase accuracy through deeper or wider network structures; however, these can lead to an exponential increase in computational and storage costs and thus delay response times.
With the help of convolutional neural networks, applications such as image classification, object detection, and semantic segmentation are currently developing at unprecedented speed. However, some applications with zero fault tolerance, such as autonomous driving and medical image analysis, require further improvements in prediction and analysis accuracy together with shorter response times, which poses significant challenges for current convolutional neural networks. Prior-art approaches focus either on improving performance or on reducing computational resources so as to shorten response times. On the one hand, ResNet-150 and the even larger ResNet-1000 have been proposed, improving performance by only a very limited margin at a significant computational cost. On the other hand, accepting a predefined performance loss relative to the original neural network, various techniques have been proposed to reduce computation and memory so as to meet the limitations imposed by hardware implementations. Such techniques include lightweight network design, pruning, and quantization, among others; knowledge distillation is one of the possible ways to achieve such model compression.
In the prior art, self-distillation learning can be used for efficient training, but the distinct characteristics of the knowledge held by feature layers at different depths are not considered: self-distillation is applied uniformly to all layers rather than tailored to each layer's aptitude, so the self-distillation learning effect is limited.
Disclosure of Invention
The application provides a self-distillation learning method and device based on feature analysis and a readable storage medium.
The technical solution provided by the application is as follows: a self-distillation learning method based on feature analysis is provided, the self-distillation learning method including:
based on the depth and the original structure of a convolutional neural network, dividing the convolutional layers of the convolutional neural network into n partial feature layers at a set depth interval, where n is a positive integer and n ≥ 2;
inputting a training set into the convolutional neural network for training, and obtaining a loss function of each partial feature layer;
and optimizing the convolutional neural network based on the loss functions of all the feature layers to obtain a trained convolutional neural network.
In some possible embodiments, the feature layers of the convolutional neural network include at least a shallow feature layer, a middle feature layer, and a deep feature layer, the shallow feature layer, the middle feature layer, and the deep feature layer being connected in sequence;
the step of inputting the training set into the convolutional neural network for training includes the following steps:
inputting the training set into the shallow feature layer to obtain shallow feature knowledge;
inputting the shallow feature knowledge into the middle feature layer to obtain middle feature knowledge;
and inputting the middle feature knowledge into the deep feature layer to obtain deep feature knowledge.
In some possible embodiments, the self-distillation learning method further comprises:
inputting the training set into the shallow feature layer to obtain a loss factor of the shallow feature layer;
outputting a structural loss function of the shallow feature layer based on the loss factor of the shallow feature layer;
the functional structure of the structure loss function is designed based on the specific characteristics of the shallow characteristic layer.
In some possible embodiments, the self-distillation learning method further comprises:
inputting the shallow feature knowledge into the middle feature layer to obtain a loss factor of the middle feature layer;
outputting a pairing loss function of the middle feature layer based on the loss factor of the middle feature layer;
the functional structure of the pairing loss function is designed based on the specific characteristics of the middle feature layer.
In some possible embodiments, the self-distillation learning method further comprises:
inputting the middle feature knowledge into the deep feature layer to obtain a loss factor of the deep feature layer;
outputting a probability distribution loss function of the deep feature layer based on the loss factor of the deep feature layer;
the functional structure of the probability distribution loss function is designed based on the specific characteristics of the deep feature layer.
In some possible embodiments, the step of optimizing the convolutional neural network based on the loss functions of all feature layers to obtain a trained convolutional neural network includes:
acquiring the overall loss function output by the convolutional neural network;
weighting the overall loss function together with the loss function of each partial feature layer according to preset weights to obtain a target loss function;
and optimizing the convolutional neural network based on the target loss function to obtain a trained convolutional neural network.
In some possible embodiments, the step of weighting the overall loss function together with the loss function of each partial feature layer according to preset weights to obtain the target loss function includes the following steps:
comparing the values of the loss functions of the partial feature layers;
setting a weight value for the loss function of each partial feature layer according to the comparison result;
and weighting the overall loss function together with the loss function of each partial feature layer according to the weight values to obtain the target loss function.
Another technical solution provided by the application is as follows: a terminal device is provided, the terminal device comprising a dividing module, a training module, and an optimizing module, wherein:
the dividing module is used for dividing the convolutional layers of a convolutional neural network into n partial feature layers at a set depth interval based on the depth and the original structure of the convolutional neural network, wherein n is a positive integer and n ≥ 2;
the training module is used for inputting a training set into the convolutional neural network for training and obtaining a loss function of each partial feature layer;
and the optimizing module is used for optimizing the convolutional neural network based on the loss functions of all the feature layers to obtain a trained convolutional neural network.
Another technical solution provided by the application is as follows: another terminal device is provided, the terminal device comprising a processor and a memory, the memory storing a computer program, and the processor being configured to execute the computer program to implement the steps of the self-distillation learning method based on feature analysis described above.
The application adopts yet another technical solution: a computer-readable storage medium is provided, storing a computer program which, when executed, implements the steps of the self-distillation learning method based on feature analysis described above.
Compared with the prior art, the application has the following beneficial effects: based on the depth and the original structure of the convolutional neural network, the terminal device divides the convolutional layers of the convolutional neural network into n partial feature layers at a set depth interval, where n is a positive integer and n ≥ 2; a training set is input into the convolutional neural network for training, and a loss function is obtained for each partial feature layer; and the convolutional neural network is optimized based on the loss functions of all the feature layers to obtain a trained convolutional neural network. In this way, distillation learning is performed with a dedicated loss function for each partial feature layer, which effectively exploits the structural information of the convolutional neural network and improves the self-distillation learning effect.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for describing the embodiments are briefly introduced below. The drawings in the following description are obviously only some embodiments of the application; other drawings may be obtained from them by a person skilled in the art without inventive effort.
FIG. 1 is a schematic flow chart of an embodiment of a self-distillation learning method based on feature analysis according to the present application;
FIG. 2 is a schematic diagram of a convolutional neural network according to one embodiment of the present application;
FIG. 3 is a training process for the structure loss function provided by the present application;
FIG. 4 is a training process for the pairing loss function provided by the present application;
FIG. 5 is a training process for the probability distribution penalty function provided by the present application;
FIG. 6 is a schematic diagram showing a specific flow of step S13 in the self-distillation learning method shown in FIG. 1;
fig. 7 is a schematic structural diagram of an embodiment of a terminal device provided by the present application;
fig. 8 is a schematic structural diagram of another embodiment of a terminal device provided by the present application;
fig. 9 is a schematic structural diagram of an embodiment of a computer readable storage medium provided by the present application.
Detailed Description
The following clearly and completely describes the technical solutions in the embodiments of the present application with reference to the accompanying drawings. It is evident that the described embodiments are only some, not all, of the embodiments of the application. All other embodiments obtained by those skilled in the art based on the embodiments of the application without inventive effort shall fall within the protection scope of the application.
Human keypoint detection technology is used to accurately estimate the n main keypoints of a human body in a picture or video. The main keypoints of the human body include the left and right elbows, left and right wrists, left and right shoulders, head, neck, left and right ankles, left and right knees, left and right hips, soles of the feet, and so on. Human keypoint detection can be applied to judging the state of a human body, human posture, and the like.
The convolutional neural network trained by the present application can be used for human keypoint detection; the training set required for training comprises a plurality of human body images covering different scenes, different angles, and different illumination conditions.
The application provides a dynamic combined distillation learning method based on feature analysis, which performs distillation learning using features from different layers of a convolutional neural network, effectively exploits the structural information of the network, and overcomes the limitation of distillation learning that relies directly on output information alone. Referring specifically to FIG. 1, FIG. 1 is a schematic flow chart of an embodiment of the self-distillation learning method based on feature analysis according to the present application.
The self-distillation learning method of the present application may be executed by a terminal device, a server, or other processing device, where the terminal device may be user equipment (UE), a mobile device, a user terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, an in-vehicle device, a wearable device, or the like. In some possible implementations, the self-distillation learning method may be implemented by a processor invoking computer-readable instructions stored in a memory.
As shown in fig. 1, the self-distillation learning method based on feature analysis of the present embodiment specifically includes the following steps:
step S11: based on the depth and the original structure of the convolutional neural network, dividing the convolutional layer of the convolutional neural network into n partial characteristic layers in a set depth interval, wherein n is a positive integer and is more than or equal to 2.
The terminal equipment divides a convolution layer of the convolution neural network into at least two part of characteristic layers according to a set depth interval based on the depth and the original structure of the convolution neural network required by the human body key point detection technology.
Referring to FIG. 2, FIG. 2 is a schematic structural diagram of an embodiment of a convolutional neural network according to the present application. Since most existing convolutional neural networks are organized in stages, the embodiment of the present disclosure likewise divides the convolutional neural network of FIG. 2 into four parts in units of stages. It should be noted that the self-distillation learning method of the embodiment of the disclosure is also applicable to convolutional neural networks with other structures, which are not described herein again.
In FIG. 2, Stage1 may be referred to as the shallow feature layer, and its output is referred to as shallow feature knowledge; Stage2 may be referred to as the middle feature layer, whose output is referred to as middle feature knowledge; Stage3 may be referred to as the deep feature layer, whose output is referred to as deep feature knowledge. Further, Stage4 in FIG. 2 may be a feature layer deeper than the deep feature layer, or a feature layer of the same depth as the deep feature layer.
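For concreteness, the following is a minimal sketch of how such a stage-wise division, and the staged forward pass used in step S12 below, might be implemented. It assumes PyTorch with a torchvision ResNet-18 backbone; the class name StagedBackbone and the mapping of ResNet layers to stages are illustrative assumptions, not the patent's reference implementation.

```python
# A minimal sketch, assuming PyTorch and a torchvision ResNet-18 backbone
# (torchvision >= 0.13 for the weights argument). Names are illustrative.
import torch.nn as nn
from torchvision.models import resnet18

class StagedBackbone(nn.Module):
    """Divides a stage-structured network into partial feature layers."""
    def __init__(self):
        super().__init__()
        net = resnet18(weights=None)
        self.stem = nn.Sequential(net.conv1, net.bn1, net.relu, net.maxpool)
        self.stage1 = net.layer1  # shallow feature layer
        self.stage2 = net.layer2  # middle feature layer
        self.stage3 = net.layer3  # deep feature layer
        self.stage4 = net.layer4  # layer at (or deeper than) the deep feature layer

    def forward(self, x):
        x = self.stem(x)
        shallow = self.stage1(x)       # shallow feature knowledge
        middle = self.stage2(shallow)  # middle feature knowledge
        deep = self.stage3(middle)     # deep feature knowledge
        out = self.stage4(deep)        # final output features
        return shallow, middle, deep, out
```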
Step S12: input the training set into the convolutional neural network for training, and obtain the loss function of each partial feature layer.
The terminal device inputs a training set prepared in advance into the convolutional neural network for training. Specifically, the training set is input into the Stage1 shallow feature layer for feature extraction and feature analysis, obtaining the shallow feature knowledge and the loss factor of the shallow feature layer during training; the terminal device then inputs the shallow feature knowledge into the Stage2 middle feature layer for feature extraction and feature analysis, obtaining the middle feature knowledge and the loss factor of the middle feature layer; finally, the terminal device inputs the middle feature knowledge into the Stage3 deep feature layer for feature extraction and feature analysis, obtaining the deep feature knowledge and the loss factor of the deep feature layer.
Through the above training process, the terminal device can analyze the specificity of the features at different feature layers from the shallow, middle, and deep feature knowledge. For the convolutional neural network required by human keypoint detection, the shallow feature knowledge output by the Stage1 shallow feature layer contains relatively accurate position information but little semantic information; the middle feature knowledge output by the Stage2 middle feature layer contains both relatively accurate position information and a certain amount of semantic information; and the deep feature knowledge output by the Stage3 deep feature layer carries relatively strong semantic information but little position information. The features output at different depths therefore differ in their specificity. By effectively exploiting this specificity, which derives from the structural information of the convolutional neural network, a multi-layer feature analysis strategy can improve the efficiency and effect of self-distillation learning.
Further, the terminal device designs a targeted loss function for each partial feature layer based on that layer's specific characteristics. A loss function measures the prediction performance of the convolutional neural network and characterizes the degree of difference between predicted data and actual data.
Specifically, according to the characteristics of the Stage1 shallow feature layer, the embodiment of the disclosure designs a StructureLoss function to constrain the structural information of the human body under severe occlusion. Referring specifically to FIG. 3, FIG. 3 is the training process of the structure loss function provided by the present application. The terminal device inputs a portrait image (Init person in FIG. 3) to the convolutional neural network; the network obtains a loss factor from the degree of difference between the predicted data PTStructure and the actual data GTStructure, and finally outputs the structure loss function of the shallow feature layer according to the loss factor. Computing the structure loss function requires comparing position information but not semantic information, so it adapts better to shallow features than other types of loss functions.
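As a concrete illustration, the following sketch shows one possible form of such a structure loss. The patent does not give its formula; here we assume, purely as an example, that the "structure" being compared consists of the bone vectors between connected keypoints, with the SKELETON connectivity and the tensor shapes being hypothetical.

```python
# A hedged sketch of a possible StructureLoss: "structure" is assumed to mean
# the bone vectors between connected keypoints; SKELETON and shapes are
# illustrative assumptions, not the patent's definition.
import torch

SKELETON = [(0, 1), (1, 2), (2, 3)]  # assumed (parent, child) keypoint pairs

def structure_loss(pred_kpts: torch.Tensor, gt_kpts: torch.Tensor) -> torch.Tensor:
    """pred_kpts, gt_kpts: (batch, num_keypoints, 2) coordinate tensors."""
    loss = pred_kpts.new_zeros(())
    for parent, child in SKELETON:
        pred_bone = pred_kpts[:, child] - pred_kpts[:, parent]  # predicted bone vector
        gt_bone = gt_kpts[:, child] - gt_kpts[:, parent]        # ground-truth bone vector
        loss = loss + torch.mean((pred_bone - gt_bone) ** 2)    # MSE on the structure
    return loss / len(SKELETON)
```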
According to the characteristics of the Stage2 middle feature layer, the embodiment of the disclosure designs a PairLoss function to suppress prediction confusion between paired similar keypoints of the human body under occlusion and crowding. Referring specifically to FIG. 4, FIG. 4 is the training process of the pairing loss function provided by the present application. Taking the paired similar keypoints of the left and right wrists as an example, the terminal device inputs a portrait image (Init person in FIG. 4) to the convolutional neural network; the network outputs a right-hand heatmap (RightHand Heatmap) and a left-hand heatmap (LeftHand Heatmap), a loss factor is obtained by comparing the degree of difference between the two heatmaps, and finally the pairing loss function of the middle feature layer is output according to the loss factor. Computing the pairing loss function requires not only the corresponding position information but also a certain amount of semantic information, so it adapts better to middle-layer features than other types of loss functions.
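A hedged sketch of one possible pairing loss follows. The patent states only that a loss factor is obtained from the degree of difference between the paired heatmaps; the margin formulation below, which penalizes the network when the two heatmaps are too similar (i.e. when the paired keypoints are being confused), and the margin value are illustrative assumptions.

```python
# A hedged sketch of a possible PairLoss over paired keypoint heatmaps;
# the margin rule is an assumption, not the patent's formula.
import torch
import torch.nn.functional as F

def pair_loss(left_hm: torch.Tensor, right_hm: torch.Tensor,
              margin: float = 0.5) -> torch.Tensor:
    """left_hm, right_hm: (batch, H, W) heatmaps of a left/right keypoint pair."""
    # Mean absolute difference between the two heatmaps; a small value
    # indicates the paired keypoints are being confused with each other.
    diff = torch.mean(torch.abs(left_hm - right_hm), dim=(1, 2))
    # Penalize only when the difference falls below the margin.
    return F.relu(margin - diff).mean()
```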
According to the characteristics of the Stage3 deep feature layer, a Probability Distribution Loss function is designed to characterize the feature distribution of human keypoints so that the network converges better. Referring specifically to FIG. 5, FIG. 5 is the training process of the probability distribution loss function provided by the present application. The convolutional neural network identifies the positions where feature points appear in the image and generates a probability distribution of their occurrence; a loss factor is obtained by comparing the probability distribution of an intermediate convolutional layer with that of the final convolutional layer, and finally the probability distribution loss function of the deep feature layer is output according to the loss factor. The probability distribution loss function requires stronger feature-resolution and semantic capabilities, and therefore adapts better to deep features than other types of loss functions.
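The following sketch illustrates one plausible form of this probability distribution loss, assuming the intermediate layer's spatial distribution is trained to match that of the final layer via KL divergence over softmax-normalized, spatially flattened heatmaps; the shapes and the temperature-free normalization are assumptions.

```python
# A hedged sketch of a possible probability-distribution loss: KL divergence
# between an intermediate layer's distribution and the final layer's.
import torch
import torch.nn.functional as F

def distribution_loss(mid_hm: torch.Tensor, final_hm: torch.Tensor) -> torch.Tensor:
    """mid_hm, final_hm: (batch, num_keypoints, H, W) heatmap logits."""
    b, k, h, w = mid_hm.shape
    log_p_mid = F.log_softmax(mid_hm.view(b, k, h * w), dim=-1)
    # The final layer acts as the "teacher" side, so its gradient is detached.
    p_final = F.softmax(final_hm.view(b, k, h * w), dim=-1).detach()
    # KL(final || mid): the intermediate layer learns to match the final layer.
    return F.kl_div(log_p_mid, p_final, reduction="batchmean")
```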
The self-distillation learning method of the embodiment of the disclosure analyzes the properties of the feature knowledge at different levels and makes the learning process easier through step-by-step guidance, avoiding the difficulty that arises when a single, undifferentiated learning process is used.
Step S13: optimize the convolutional neural network based on the loss functions of all the feature layers to obtain a trained convolutional neural network.
Following the above steps, the terminal device obtains dedicated loss functions for the three different levels. Finally, to better integrate the influence of all the loss functions on the final learning result, and considering that the dedicated loss functions of different levels influence learning to different degrees, the embodiment of the disclosure further designs weights that are dynamically updated during the learning process, making the whole learning process more stable and effective.
Referring specifically to FIG. 6, FIG. 6 is a schematic flow chart of step S13 in the self-distillation learning method shown in FIG. 1. As shown in FIG. 6, step S13 may specifically include the following steps:
step S131: and obtaining the integral loss function of the convolutional neural network output.
The terminal device obtains a loss function of each part of feature layer on one hand, and obtains an overall loss function output by the convolutional neural network on the other hand, wherein the overall loss function can be a KL divergence (Kullback-Leibler divergence) loss function and/or an MSE mean square error loss function.
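A minimal sketch of such an overall loss is given below, assuming both of the forms mentioned above (KL divergence and MSE) are used and simply summed; the patent leaves open whether one or both terms apply, and the tensor shapes are illustrative.

```python
# A minimal sketch of the overall loss of step S131, assuming the KL and MSE
# terms named in the text are summed; shapes and pairing are assumptions.
import torch
import torch.nn.functional as F

def overall_loss(pred_logits: torch.Tensor, soft_targets: torch.Tensor,
                 pred_hm: torch.Tensor, gt_hm: torch.Tensor) -> torch.Tensor:
    # KL divergence between predicted and target probability distributions.
    kl = F.kl_div(F.log_softmax(pred_logits, dim=-1),
                  F.softmax(soft_targets, dim=-1), reduction="batchmean")
    # Mean squared error between predicted and ground-truth heatmaps.
    mse = F.mse_loss(pred_hm, gt_hm)
    return kl + mse
```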
Step S132: weight the overall loss function together with the loss function of each partial feature layer according to preset weights to obtain the target loss function.
The terminal device weights the loss function of each partial feature layer together with the overall loss function according to preset weights to obtain the target loss function. It should be noted that the preset weights may be dynamically updated during the learning process so as to balance the influence of each loss function on the final learning result.
Specifically, after obtaining the specific values of the loss function of each partial feature layer and of the overall loss function, the terminal device computes an appropriate weight from the value of each loss function, so that the weighted loss values become more even and the influence of each loss function on the final learning result is balanced.
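The sketch below illustrates one way this dynamic weighting could work, using an inverse-magnitude rule so that the weighted loss values come out even; this specific rule is an assumption, since the patent states only that the weights are dynamically updated to balance the losses.

```python
# A hedged sketch of the dynamic weighting in steps S131-S133; the
# inverse-magnitude rule is an illustrative assumption.
import torch

def target_loss(overall: torch.Tensor, layer_losses: list[torch.Tensor],
                eps: float = 1e-8) -> torch.Tensor:
    # Detach the current values so the weights themselves get no gradient.
    values = torch.stack([l.detach() for l in layer_losses])
    weights = values.sum() / (values + eps)  # smaller loss => larger weight
    weights = weights / weights.sum()        # normalize the weights to sum to 1
    weighted = sum(w * loss_i for w, loss_i in zip(weights, layer_losses))
    return overall + weighted
```

A training iteration would then back-propagate the returned target loss and step the optimizer, corresponding to step S133 below.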
Step S133: optimize the convolutional neural network based on the target loss function to obtain the trained convolutional neural network.
The terminal device optimizes the convolutional neural network according to the target loss function to obtain the trained convolutional neural network. The embodiment of the disclosure analyzes the specific characteristics of the feature knowledge at different levels, sets a dedicated loss function according to those characteristics so that each layer is taught according to its aptitude, and builds dynamic weights for the features of different layers according to their dynamic influence, thereby solving the problem that the self-distillation learning effect is limited when layers are not treated differently and learning is not tailored to each layer.
In the embodiment of the disclosure, based on the depth and the original structure of the convolutional neural network, the terminal device divides the convolutional layers of the convolutional neural network into n partial feature layers at a set depth interval, where n is a positive integer and n ≥ 2; a training set is input into the convolutional neural network for training, and a loss function is obtained for each partial feature layer; and the convolutional neural network is optimized based on the loss functions of all the feature layers to obtain a trained convolutional neural network. In this way, distillation learning is performed with a dedicated loss function for each partial feature layer, which effectively exploits the structural information of the convolutional neural network and improves the self-distillation learning effect.
It will be appreciated by those skilled in the art that, in the above methods of the specific embodiments, the written order of the steps does not imply a strict order of execution; the actual execution order should be determined by the functions of the steps and their possible internal logic.
In order to implement the self-distillation learning method based on feature analysis in the above embodiment, the present application further provides a terminal device, and referring specifically to fig. 7, fig. 7 is a schematic structural diagram of an embodiment of the terminal device provided by the present application.
As shown in fig. 7, the terminal device 400 of the present embodiment includes a dividing module 41, a training module 42, and an optimizing module 43.
The dividing module 41 is configured to divide the convolutional layers of the convolutional neural network into n partial feature layers at a set depth interval based on the depth and the original structure of the convolutional neural network, where n is a positive integer and n ≥ 2; the training module 42 is configured to input a training set into the convolutional neural network for training and to obtain a loss function for each partial feature layer; and the optimizing module 43 is configured to optimize the convolutional neural network based on the loss functions of all the feature layers to obtain a trained convolutional neural network.
In order to implement the self-distillation learning method based on feature analysis in the above embodiment, the present application further provides another terminal device, and referring specifically to fig. 8, fig. 8 is a schematic structural diagram of another embodiment of the terminal device provided by the present application.
As shown in fig. 8, the terminal device 500 of the present embodiment includes a processor 51, a memory 52, an input-output device 53, and a bus 54.
The processor 51, the memory 52, and the input/output device 53 are respectively connected to the bus 54, and the memory 52 stores a computer program, and the processor 51 is configured to execute the computer program to implement the self-distillation learning method based on feature analysis of the above embodiment.
In the present embodiment, the processor 51 may also be referred to as a CPU (Central Processing Unit). The processor 51 may be an integrated circuit chip with signal-processing capability. The processor 51 may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor 51 may also be a GPU (Graphics Processing Unit), also called a display core, visual processor, or display chip: a microprocessor dedicated to image computation on personal computers, workstations, game consoles, and some mobile devices (such as tablet computers and smartphones). The GPU converts and drives the display information required by the computer system, provides line-scanning signals to the display, and controls the display's correct output; it is an important element connecting the display to the personal computer mainboard and one of the important devices for human-machine interaction. The graphics card is an important component of the host computer; it takes on the task of outputting and displaying graphics, which is especially important for those engaged in professional graphic design. The general-purpose processor may be a microprocessor, or the processor 51 may be any conventional processor, or the like.
The present application also provides a computer-readable storage medium 600 for storing a computer program 61; as shown in FIG. 9, the computer program 61, when executed by a processor, implements the method described in the embodiments of the self-distillation learning method based on feature analysis of the present application.
When implemented in the form of a software functional unit and sold or used as a standalone product, the method of the embodiments of the self-distillation learning method based on feature analysis may be stored in a device such as a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, or in whole or in part, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The foregoing description covers only embodiments of the present application and is not intended to limit the scope of the application. All equivalent structures or equivalent process transformations made using the description and drawings of the present application, whether applied directly or indirectly in other related technical fields, are likewise included within the protection scope of the present application.

Claims (10)

1. A self-distillation learning method based on feature analysis, characterized in that the self-distillation learning method comprises:
based on the depth and the original structure of a convolutional neural network, dividing the convolutional layers of the convolutional neural network into n partial feature layers at a set depth interval, wherein n is a positive integer and n ≥ 2;
inputting a training set into the convolutional neural network for training, and obtaining a loss function of each partial feature layer;
optimizing the convolutional neural network based on the loss functions of all the feature layers to obtain a trained convolutional neural network;
wherein the feature layers into which the convolutional neural network is divided comprise at least a shallow feature layer, a middle feature layer, and a deep feature layer, the shallow feature layer, the middle feature layer, and the deep feature layer being connected in sequence;
and wherein the loss function of the shallow feature layer is used for training position information of the human body structure, the loss function of the middle feature layer is used for training pairing information of paired similar keypoints of the human body, and the loss function of the deep feature layer is used for training feature distribution information of human keypoints.
2. The self-distillation learning method based on feature analysis according to claim 1, wherein the step of inputting a training set into the convolutional neural network for training comprises:
inputting the training set into the shallow feature layer to obtain shallow feature knowledge;
inputting the shallow feature knowledge into the middle feature layer to obtain middle feature knowledge;
and inputting the middle feature knowledge into the deep feature layer to obtain deep feature knowledge.
3. The self-distillation learning method based on feature analysis according to claim 2, wherein the self-distillation learning method further comprises:
inputting the training set into the shallow feature layer to obtain a loss factor of the shallow feature layer;
outputting a structural loss function of the shallow feature layer based on the loss factor of the shallow feature layer;
the functional structure of the structure loss function is designed based on the specific characteristics of the shallow characteristic layer.
4. The self-distillation learning method based on feature analysis according to claim 2, wherein the self-distillation learning method further comprises:
inputting the shallow feature knowledge into the middle feature layer to obtain a loss factor of the middle feature layer;
outputting a pairing loss function of the middle feature layer based on the loss factor of the middle feature layer;
the functional structure of the pairing loss function is designed based on the specific characteristics of the middle feature layer.
5. The self-distillation learning method based on feature analysis according to claim 2, wherein the self-distillation learning method further comprises:
inputting the middle feature knowledge into the deep feature layer to obtain a loss factor of the deep feature layer;
outputting a probability distribution loss function of the deep feature layer based on the loss factor of the deep feature layer;
the functional structure of the probability distribution loss function is designed based on the specific characteristics of the deep feature layer.
6. The self-distillation learning method based on feature analysis according to claim 1, wherein the step of optimizing the convolutional neural network based on the loss function of all feature layers to obtain a trained convolutional neural network comprises:
acquiring the overall loss function output by the convolutional neural network;
weighting the overall loss function together with the loss function of each partial feature layer according to preset weights to obtain a target loss function;
and optimizing the convolutional neural network based on the target loss function to obtain a trained convolutional neural network.
7. The self-distillation learning method based on feature analysis according to claim 6, wherein the step of weighting the overall loss function together with the loss function of each partial feature layer according to preset weights to obtain the target loss function comprises the following steps:
comparing the values of the loss functions of the partial feature layers;
setting a weight value for the loss function of each partial feature layer according to the comparison result;
and weighting the overall loss function together with the loss function of each partial feature layer according to the weight values to obtain the target loss function.
8. A terminal device, characterized in that the terminal device comprises a dividing module, a training module, and an optimizing module, wherein:
the dividing module is used for dividing the convolutional layers of a convolutional neural network into n partial feature layers at a set depth interval based on the depth and the original structure of the convolutional neural network, wherein n is a positive integer and n ≥ 2;
the training module is used for inputting a training set into the convolutional neural network for training and obtaining a loss function of each partial feature layer;
the optimizing module is used for optimizing the convolutional neural network based on the loss functions of all the feature layers to obtain a trained convolutional neural network;
wherein the feature layers into which the convolutional neural network is divided comprise at least a shallow feature layer, a middle feature layer, and a deep feature layer, the shallow feature layer, the middle feature layer, and the deep feature layer being connected in sequence;
and wherein the loss function of the shallow feature layer is used for training position information of the human body structure, the loss function of the middle feature layer is used for training pairing information of paired similar keypoints of the human body, and the loss function of the deep feature layer is used for training feature distribution information of human keypoints.
9. A terminal device, characterized in that the terminal device comprises a processor and a memory; the memory stores a computer program, and the processor is configured to execute the computer program to implement the steps of the self-distillation learning method based on feature analysis according to any one of claims 1 to 7.
10. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program which, when executed, implements the steps of the self-distillation learning method based on feature analysis as claimed in any one of claims 1 to 7.
CN202110146048.3A 2021-02-02 2021-02-02 Self-distillation learning method and device based on feature analysis and readable storage medium Active CN112862095B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110146048.3A CN112862095B (en) 2021-02-02 2021-02-02 Self-distillation learning method and device based on feature analysis and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110146048.3A CN112862095B (en) 2021-02-02 2021-02-02 Self-distillation learning method and device based on feature analysis and readable storage medium

Publications (2)

Publication Number Publication Date
CN112862095A CN112862095A (en) 2021-05-28
CN112862095B (en) 2023-09-29

Family

ID=75986335

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110146048.3A Active CN112862095B (en) 2021-02-02 2021-02-02 Self-distillation learning method and device based on feature analysis and readable storage medium

Country Status (1)

Country Link
CN (1) CN112862095B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113507466A (en) * 2021-07-07 2021-10-15 浙江大学 Method and system for defending backdoor attack by knowledge distillation based on attention mechanism
CN113486990B (en) * 2021-09-06 2021-12-21 北京字节跳动网络技术有限公司 Training method of endoscope image classification model, image classification method and device

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109858466A (en) * 2019-03-01 2019-06-07 北京视甄智能科技有限公司 A kind of face critical point detection method and device based on convolutional neural networks
CN109948573A (en) * 2019-03-27 2019-06-28 厦门大学 A kind of noise robustness face identification method based on cascade deep convolutional neural networks
CN110232203A (en) * 2019-04-22 2019-09-13 山东大学 Knowledge distillation optimization RNN has a power failure prediction technique, storage medium and equipment in short term
CN110472730A (en) * 2019-08-07 2019-11-19 交叉信息核心技术研究院(西安)有限公司 A kind of distillation training method and the scalable dynamic prediction method certainly of convolutional neural networks
CN111368673A (en) * 2020-02-26 2020-07-03 华南理工大学 Method for quickly extracting human body key points based on neural network
WO2020143225A1 (en) * 2019-01-08 2020-07-16 南京人工智能高等研究院有限公司 Neural network training method and apparatus, and electronic device
CN111598793A (en) * 2020-04-24 2020-08-28 云南电网有限责任公司电力科学研究院 Method and system for defogging image of power transmission line and storage medium
CN112016591A (en) * 2020-08-04 2020-12-01 杰创智能科技股份有限公司 Training method of image recognition model and image recognition method
WO2021012494A1 (en) * 2019-07-19 2021-01-28 平安科技(深圳)有限公司 Deep learning-based face recognition method and apparatus, and computer-readable storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170300811A1 (en) * 2016-04-14 2017-10-19 Linkedin Corporation Dynamic loss function based on statistics in loss layer of deep convolutional neural network
US20180268292A1 (en) * 2017-03-17 2018-09-20 Nec Laboratories America, Inc. Learning efficient object detection models with knowledge distillation
US11436496B2 (en) * 2018-04-20 2022-09-06 Google Llc Systems and methods for regularizing neural networks
US11669724B2 (en) * 2018-05-17 2023-06-06 Raytheon Company Machine learning using informed pseudolabels

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020143225A1 (en) * 2019-01-08 2020-07-16 南京人工智能高等研究院有限公司 Neural network training method and apparatus, and electronic device
CN109858466A (en) * 2019-03-01 2019-06-07 北京视甄智能科技有限公司 A kind of face critical point detection method and device based on convolutional neural networks
CN109948573A (en) * 2019-03-27 2019-06-28 厦门大学 A kind of noise robustness face identification method based on cascade deep convolutional neural networks
CN110232203A (en) * 2019-04-22 2019-09-13 山东大学 Knowledge distillation optimization RNN has a power failure prediction technique, storage medium and equipment in short term
WO2021012494A1 (en) * 2019-07-19 2021-01-28 平安科技(深圳)有限公司 Deep learning-based face recognition method and apparatus, and computer-readable storage medium
CN110472730A (en) * 2019-08-07 2019-11-19 交叉信息核心技术研究院(西安)有限公司 A kind of distillation training method and the scalable dynamic prediction method certainly of convolutional neural networks
CN111368673A (en) * 2020-02-26 2020-07-03 华南理工大学 Method for quickly extracting human body key points based on neural network
CN111598793A (en) * 2020-04-24 2020-08-28 云南电网有限责任公司电力科学研究院 Method and system for defogging image of power transmission line and storage medium
CN112016591A (en) * 2020-08-04 2020-12-01 杰创智能科技股份有限公司 Training method of image recognition model and image recognition method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Gesture recognition algorithm based on an improved multi-scale deep convolutional network; 景雨; 祁瑞华; 刘建鑫; 刘朝霞; Computer Science (06); full text *

Also Published As

Publication number Publication date
CN112862095A (en) 2021-05-28

Similar Documents

Publication Publication Date Title
US11551333B2 (en) Image reconstruction method and device
US11361585B2 (en) Method and system for face recognition via deep learning
US11908244B2 (en) Human posture detection utilizing posture reference maps
EP3937124A1 (en) Image processing method, device and apparatus, and storage medium
CN114186632B (en) Method, device, equipment and storage medium for training key point detection model
KR20210110713A (en) Model training method and apparatus, and terminal and storage medium therefor
CN112862095B (en) Self-distillation learning method and device based on feature analysis and readable storage medium
CN108197592B (en) Information acquisition method and device
CN110688929B (en) Human skeleton joint point positioning method and device
CN112088393B (en) Image processing method, device and equipment
CN110136144B (en) Image segmentation method and device and terminal equipment
CN111860276B (en) Human body key point detection method, device, network equipment and storage medium
CN112561060B (en) Neural network training method and device, image recognition method and device and equipment
CN112733665B (en) Face recognition method and system based on lightweight network structure design
CN111027403A (en) Gesture estimation method, device, equipment and computer readable storage medium
CN110619334B (en) Portrait segmentation method based on deep learning, architecture and related device
CN112508782A (en) Network model training method, face image super-resolution reconstruction method and equipment
CN112084959B (en) Crowd image processing method and device
US20230153965A1 (en) Image processing method and related device
CN111310590A (en) Action recognition method and electronic equipment
CN116628507B (en) Data processing method, device, equipment and readable storage medium
CN114049491A (en) Fingerprint segmentation model training method, fingerprint segmentation device, fingerprint segmentation equipment and fingerprint segmentation medium
CN110414593B (en) Image processing method and device, processor, electronic device and storage medium
WO2020224244A1 (en) Method and apparatus for obtaining depth-of-field image
CN116310615A (en) Image processing method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant