CN112862095A - Self-distillation learning method and device based on characteristic analysis and readable storage medium

Self-distillation learning method and device based on characteristic analysis and readable storage medium

Info

Publication number
CN112862095A
Authority
CN
China
Prior art keywords
layer
feature
neural network
convolutional neural
loss function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110146048.3A
Other languages
Chinese (zh)
Other versions
CN112862095B (en)
Inventor
袁雷 (Yuan Lei)
魏乃科 (Wei Naike)
潘华东 (Pan Huadong)
殷俊 (Yin Jun)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202110146048.3A
Publication of CN112862095A
Application granted
Publication of CN112862095B
Status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a feature analysis-based self-distillation learning method and device, and a readable storage medium. The self-distillation learning method based on feature analysis comprises the following steps: dividing the convolutional layers of a convolutional neural network into n partial feature layers at a set depth interval based on the depth and original structure of the convolutional neural network, wherein n is a positive integer and n is greater than or equal to 2; inputting a training set into the convolutional neural network for training to obtain a loss function for each partial feature layer; and optimizing the convolutional neural network based on the loss functions of all the feature layers to obtain a trained convolutional neural network. In this way, distillation learning can be performed using the loss functions of different partial feature layers, the structural information of the convolutional neural network is used effectively, and the self-distillation learning effect is improved.

Description

Self-distillation learning method and device based on characteristic analysis and readable storage medium
Technical Field
The present application relates to the field of convolutional neural network training technologies, and in particular, to a self-distillation learning method and device based on feature analysis, and a readable storage medium.
Background
Convolutional neural networks have been widely deployed in various application scenarios. To extend their range of application to areas where accuracy is critical, researchers have studied ways to increase accuracy through deeper or wider network structures; these, however, bring rapidly growing computation and storage costs and thus longer response times.
Applications such as image classification, object detection, and semantic segmentation are currently evolving at an unprecedented rate with the help of convolutional neural networks. However, in fault-intolerant applications such as autonomous driving and medical image analysis, prediction and analysis accuracy must be improved further while response times must be kept short, which poses a huge challenge for current convolutional neural networks. Prior art approaches have focused either on improving performance or on reducing the computational resources required, and thereby the response time. On the one hand, networks as deep as ResNet-150, or even ResNet-1000, have been proposed, improving performance by only a very limited margin at a large computational cost. On the other hand, various techniques have been proposed to reduce the amount of computation and memory to match the limits imposed by hardware, at a predefined performance penalty relative to the original network. Such techniques include lightweight network design, pruning, quantization, and so on, among which knowledge distillation is one possible way to achieve model compression.
Self-distillation learning methods in the prior art can be used for efficient training, but they do not consider the characteristics of feature-layer knowledge at different depths: self-learning is applied uniformly across layers rather than being tailored to each layer's aptitude, so the self-distillation learning effect is limited.
Disclosure of Invention
The present application provides a feature analysis-based self-distillation learning method and device, and a readable storage medium.
The technical solution provided by the present application is as follows: a feature analysis-based self-distillation learning method is provided, including:
dividing the convolutional layers of a convolutional neural network into n partial feature layers at a set depth interval based on the depth and original structure of the convolutional neural network, wherein n is a positive integer and n is greater than or equal to 2;
inputting a training set into the convolutional neural network for training to obtain a loss function for each partial feature layer;
and optimizing the convolutional neural network based on the loss functions of all the feature layers to obtain a trained convolutional neural network.
In some possible embodiments, the feature layers into which the convolutional neural network is divided include at least a shallow feature layer, a middle feature layer, and a deep feature layer; wherein the shallow feature layer, the middle feature layer, and the deep feature layer are connected in sequence;
the step of inputting a training set into the convolutional neural network for training includes:
inputting the training set into the shallow feature layer to obtain shallow feature knowledge;
inputting the shallow feature knowledge into the middle feature layer to obtain middle-layer feature knowledge;
and inputting the middle-layer feature knowledge into the deep feature layer to obtain deep feature knowledge.
In some possible embodiments, the self-distillation learning method further comprises:
inputting the training set into the shallow feature layer to obtain a loss factor of the shallow feature layer;
outputting a structure loss function of the shallow feature layer based on the loss factor of the shallow feature layer;
wherein the functional structure of the structure loss function is designed based on the specific characteristics of the shallow feature layer.
In some possible embodiments, the self-distillation learning method further comprises:
inputting the shallow feature knowledge into the middle feature layer to obtain a loss factor of the middle feature layer;
outputting a pairing loss function of the middle feature layer based on the loss factor of the middle feature layer;
wherein the function structure of the pairing loss function is designed based on the specific characteristics of the middle feature layer.
In some possible embodiments, the self-distillation learning method further comprises:
inputting the middle-layer feature knowledge into the deep feature layer to obtain a loss factor of the deep feature layer;
outputting a probability distribution loss function of the deep feature layer based on the loss factor of the deep feature layer;
wherein a function structure of the probability distribution loss function is designed based on the specific characteristics of the deep feature layer.
In some possible embodiments, the step of optimizing the convolutional neural network based on the loss functions of all feature layers to obtain a trained convolutional neural network includes:
obtaining an overall loss function output by the convolutional neural network;
weighting the overall loss function by the loss function of each partial feature layer according to preset weights to obtain a target loss function;
and optimizing the convolutional neural network based on the target loss function to obtain the trained convolutional neural network.
In some possible embodiments, the step of weighting the overall loss function by the loss function of each partial feature layer according to a preset weight to obtain a target loss function comprises:
comparing the values of the loss function of each partial feature layer;
setting the weight value of the loss function of each partial feature layer according to the comparison result;
and weighting the overall loss function by the loss function of each partial feature layer according to these weight values to obtain the target loss function.
Another technical solution provided by the present application is as follows: a terminal device is provided, wherein the terminal device comprises a dividing module, a training module, and an optimization module; wherein,
the dividing module is used for dividing the convolutional layers of the convolutional neural network into n partial feature layers at a set depth interval based on the depth and original structure of the convolutional neural network, wherein n is a positive integer and n is greater than or equal to 2;
the training module is used for inputting a training set into the convolutional neural network for training to obtain a loss function for each partial feature layer;
and the optimization module is used for optimizing the convolutional neural network based on the loss functions of all the feature layers to obtain the trained convolutional neural network.
Another technical solution provided by the present application is: there is provided another terminal device comprising a processor and a memory, the memory having stored therein a computer program, the processor being configured to execute the computer program to implement the steps of the above-described feature analysis based self-distillation learning method.
Another technical solution provided by the present application is as follows: there is provided a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program which, when executed, implements the steps of the above-described feature analysis-based self-distillation learning method.
In contrast to the prior art, the beneficial effects of the present application are as follows: the terminal device divides the convolutional layers of the convolutional neural network into n partial feature layers at a set depth interval based on the depth and original structure of the convolutional neural network, wherein n is a positive integer and n is greater than or equal to 2; inputs the training set into the convolutional neural network for training to obtain a loss function for each partial feature layer; and optimizes the convolutional neural network based on the loss functions of all the feature layers to obtain the trained convolutional neural network. In this way, distillation learning can be performed using the loss functions of different partial feature layers, the structural information of the convolutional neural network is used effectively, and the self-distillation learning effect is improved.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings based on these drawings without creative effort.
FIG. 1 is a schematic flow chart diagram illustrating one embodiment of a method for feature analysis-based self-distillation learning provided herein;
FIG. 2 is a schematic structural diagram of an embodiment of a convolutional neural network provided herein;
FIG. 3 illustrates the training process of the structure loss function provided in the present application;
FIG. 4 illustrates the training process of the pairing loss function provided in the present application;
FIG. 5 illustrates the training process of the probability distribution loss function provided in the present application;
FIG. 6 is a schematic flow chart of step S13 in the self-distillation learning method shown in FIG. 1;
FIG. 7 is a schematic structural diagram of an embodiment of a terminal device provided in the present application;
FIG. 8 is a schematic structural diagram of another embodiment of a terminal device provided in the present application;
FIG. 9 is a schematic structural diagram of an embodiment of a computer-readable storage medium provided in the present application.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on the embodiments given herein without creative effort fall within the protection scope of the present invention.
Human-body keypoint detection technology accurately estimates the n main keypoints of a human body in an image or video; the main keypoints include the left and right elbows, wrists, shoulders, head, neck, ankles, knees, hips, soles, and the like. Human-body keypoint detection can be applied to judging the state of the human body, the posture of the human body, and so on.
The convolutional neural network trained by the method of the present application can be used for human-body keypoint detection, and the training set required for training comprises a number of human-body images covering different scenes, different viewing angles, and different illumination conditions.
The dynamic combined distillation learning method based on feature analysis of the present application uses the features between different layers of a convolutional neural network to perform distillation learning, effectively uses the structural information of the convolutional neural network, and overcomes the limitation of distillation learning that relies directly on the output information alone. Referring to FIG. 1, FIG. 1 is a schematic flow chart of an embodiment of the feature analysis-based self-distillation learning method provided in the present application.
The executing entity of the self-distillation learning method of the present application may be a terminal device; for example, the method may be executed by a terminal device, a server, or another processing device, where the terminal device may be a User Equipment (UE), a mobile device, a user terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like. In some possible implementations, the self-distillation learning method may be implemented by a processor calling computer-readable instructions stored in a memory.
As shown in fig. 1, the self-distillation learning method based on feature analysis of the present embodiment specifically includes the following steps:
step S11: based on the depth and the original structure of the convolutional neural network, the convolutional layer of the convolutional neural network is divided into n partial feature layers by a set depth interval, wherein n is a positive integer and is more than or equal to 2.
The terminal equipment divides the convolutional layer of the convolutional neural network into at least two characteristic layers in a set depth interval based on the depth and the original structure of the convolutional neural network required by the human body key point detection technology.
FIG. 2 is a schematic structural diagram of an embodiment of a convolutional neural network provided in the present application. Since most existing convolutional neural networks are organized into stages, the convolutional neural network of FIG. 2 is likewise divided into four layers in units of stages in the embodiment of the present disclosure. It should be noted that the self-distillation learning method of the embodiment of the present disclosure is also applicable to convolutional neural networks with other structures, which are not described here again.

Stage1 in FIG. 2 may be called the shallow feature layer, and its output is called shallow feature knowledge; Stage2 in FIG. 2 may be called the middle feature layer, and its output is called middle-layer feature knowledge; Stage3 in FIG. 2 may be called the deep feature layer, and its output is called deep feature knowledge. In addition, Stage4 in FIG. 2 may be a feature layer deeper than the deep feature layer, or a feature layer at the same depth as the deep feature layer.
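As an illustration of this stage-wise division, the following minimal PyTorch sketch splits a ResNet-style backbone into the four stages of FIG. 2 and exposes each stage's feature knowledge. This is an assumed example: the class and attribute names are hypothetical, ResNet-18 is merely a convenient backbone, and the patent does not prescribe any particular network.

```python
import torch.nn as nn
import torchvision.models as models

class StagedBackbone(nn.Module):
    """Hypothetical sketch: divide a CNN's convolutional layers into the
    four stages of FIG. 2 and return each stage's output (its "feature
    knowledge") so that per-layer loss functions can be attached."""

    def __init__(self):
        super().__init__()
        resnet = models.resnet18(weights=None)  # assumed example backbone
        self.stem = nn.Sequential(resnet.conv1, resnet.bn1, resnet.relu, resnet.maxpool)
        self.stage1 = resnet.layer1  # shallow feature layer
        self.stage2 = resnet.layer2  # middle feature layer
        self.stage3 = resnet.layer3  # deep feature layer
        self.stage4 = resnet.layer4  # deeper than (or as deep as) the deep feature layer

    def forward(self, x):
        x = self.stem(x)
        f1 = self.stage1(x)   # shallow feature knowledge
        f2 = self.stage2(f1)  # middle-layer feature knowledge
        f3 = self.stage3(f2)  # deep feature knowledge
        f4 = self.stage4(f3)
        return f1, f2, f3, f4
```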
Step S12: and inputting the training set into a convolutional neural network for training to obtain the loss function of each part of the feature layer.
The terminal device inputs a training set prepared in advance into the convolutional neural network for training. Specifically, the training set is input into the Stage1 shallow feature layer for feature extraction and feature analysis to obtain the shallow feature knowledge and the loss factor of the shallow feature layer during training; the terminal device then inputs the shallow feature knowledge into the Stage2 middle feature layer for feature extraction and feature analysis to obtain the middle-layer feature knowledge and the loss factor of the middle feature layer during training; finally, the terminal device inputs the middle-layer feature knowledge into the Stage3 deep feature layer for feature extraction and feature analysis to obtain the deep feature knowledge and the loss factor of the deep feature layer during training.
Through the above training process, the terminal device can analyze the specificity of the features of the different feature layers from the shallow, middle-layer, and deep feature knowledge. For the convolutional neural network required by human-body keypoint detection, the shallow feature knowledge output by the Stage1 shallow feature layer contains relatively accurate position information but little semantic information; the middle-layer feature knowledge output by the Stage2 middle feature layer contains both relatively accurate position information and a certain amount of semantic information; and the deep feature knowledge output by the Stage3 deep feature layer carries relatively strong semantic information but little position information. The feature specificity output by feature layers of different depths therefore differs, and by effectively exploiting the feature specificity brought by the structural information of the convolutional neural network, a multi-layer feature analysis strategy can improve the efficiency and effect of self-distillation learning.
Further, the terminal device designs a targeted loss function for each partial feature layer based on that layer's specific characteristics. A loss function measures the prediction performance of the convolutional neural network and expresses the degree of gap between the predicted data and the actual data.
Specifically, according to the characteristics of the Stage1 shallow feature layer, the disclosed embodiment designs a StructureLoss (structure loss) function for constraining the structural information of the human body under severe occlusion. Referring to FIG. 3, FIG. 3 illustrates the training process of the structure loss function provided in the present application. The terminal device inputs a portrait image, i.e., Init person in FIG. 3, into the convolutional neural network; the convolutional neural network then obtains a loss factor from the degree of difference between the predicted data PTStructure and the actual data GTStructure, and finally outputs the structure loss function of the shallow feature layer according to the loss factor. Calculating the structure loss function requires comparing position information without semantic information, so it is better suited to shallow features than other types of loss function.
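The patent text does not give the formula of the structure loss. One plausible instantiation, sketched below under that caveat, compares the "bone" vectors between connected keypoints of the predicted structure (PTStructure) and the actual structure (GTStructure); the skeleton edge list and the function name are illustrative assumptions.

```python
import torch.nn.functional as F

# Hypothetical skeleton edges (index pairs of connected keypoints),
# e.g. shoulder->elbow, elbow->wrist; not specified by the patent.
SKELETON_EDGES = [(5, 7), (7, 9), (6, 8), (8, 10), (11, 13), (13, 15)]

def structure_loss(pt_coords, gt_coords, edges=SKELETON_EDGES):
    """Compare predicted and ground-truth bone vectors, i.e. pure position
    information with no semantic content, matching the shallow layer.
    pt_coords, gt_coords: tensors of shape (B, K, 2)."""
    loss = 0.0
    for i, j in edges:
        pt_bone = pt_coords[:, j] - pt_coords[:, i]  # predicted bone vector
        gt_bone = gt_coords[:, j] - gt_coords[:, i]  # ground-truth bone vector
        loss = loss + F.smooth_l1_loss(pt_bone, gt_bone)
    return loss / len(edges)
```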
According to the characteristics of the Stage2 middle feature layer, the disclosed embodiment designs a PairLoss (pairing loss) function to constrain the prediction confusion between paired similar keypoints of a human body under occlusion and crowding. Referring to FIG. 4, FIG. 4 illustrates the training process of the pairing loss function provided in the present application. Taking the paired similar keypoints of the left and right wrists as an example, the terminal device inputs a portrait image, i.e., Init person in FIG. 4, into the convolutional neural network; the convolutional neural network outputs a right-hand heat map (RightHand Heatmap) and a left-hand heat map (LeftHand Heatmap), the degree of difference between the two heat maps is compared to obtain a loss factor, and finally the pairing loss function of the middle feature layer is output according to the loss factor. Calculating the pairing loss function requires not only the corresponding position information but also a certain amount of semantic information, so it is better suited to middle-layer features than other types of loss function.
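Here, too, the patent only states that the difference between the two heat maps yields a loss factor. The sketch below is one assumed realization that penalizes excessive similarity between the left- and right-hand heat maps, so that the paired keypoints are not predicted at the same location; the margin value is an illustrative assumption.

```python
import torch.nn.functional as F

def pair_loss(left_heatmap, right_heatmap, margin=0.5):
    """Penalize confusion between paired similar keypoints: if the
    normalized left- and right-hand heat maps are too similar (cosine
    similarity above `margin`), push them apart. Shapes: (B, H, W)."""
    b = left_heatmap.size(0)
    left = F.normalize(left_heatmap.reshape(b, -1), dim=1)
    right = F.normalize(right_heatmap.reshape(b, -1), dim=1)
    similarity = (left * right).sum(dim=1)      # per-sample cosine similarity
    return F.relu(similarity - margin).mean()   # zero loss once the maps differ enough
```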
According to the characteristics of the Stage3 deep feature layer, the embodiment of the disclosure designs a probability distribution loss function to represent the feature distribution of human-body keypoints, so that the network converges better. Referring to FIG. 5, FIG. 5 illustrates the training process of the probability distribution loss function provided in the present application. The convolutional neural network identifies the positions of the feature points in the image and generates their probability distributions; a loss factor is obtained by comparing the probability distribution of an intermediate convolutional layer with that of the final convolutional layer, and finally the probability distribution loss function of the deep feature layer is output according to the loss factor. The probability distribution loss function requires stronger feature discrimination and semantic capability, so it is better suited to deep features than other types of loss function.
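A common way to realize such a probability distribution loss, and a plausible reading of FIG. 5, is a KL divergence between the spatial softmax of the intermediate layer's heat maps and that of the final layer, as in the sketch below; the temperature parameter tau is an assumption borrowed from standard knowledge distillation.

```python
import torch.nn.functional as F

def distribution_loss(mid_heatmap, final_heatmap, tau=1.0):
    """Treat each keypoint heat map as a spatial probability distribution
    (softmax over all pixels) and align the intermediate convolutional
    layer's distribution with the final layer's via KL divergence.
    Shapes: (B, K, H, W); the two inputs are assumed spatially aligned."""
    b, k = mid_heatmap.shape[:2]
    mid_logp = F.log_softmax(mid_heatmap.reshape(b, k, -1) / tau, dim=-1)
    final_p = F.softmax(final_heatmap.reshape(b, k, -1) / tau, dim=-1).detach()
    return F.kl_div(mid_logp, final_p, reduction="batchmean") * (tau * tau)
```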
The self-distillation learning method of the disclosed embodiment analyzes the properties of the feature knowledge at different layers and, by guiding the learning process layer by layer and step by step, makes learning easier, avoiding the difficulty that arises when a single, uniform learning process is used.
Step S13: and optimizing the convolutional neural network based on the loss functions of all the characteristic layers to obtain the trained convolutional neural network.
The terminal device obtains the specific loss functions of the three different levels according to the above steps. To better integrate the influence of all the loss functions on the final learning result, and considering that the specific loss functions of different levels influence learning to different degrees, the embodiment of the disclosure designs weights that are dynamically updated as learning progresses, which makes the whole learning process more stable and effective.
Referring to FIG. 6, FIG. 6 is a schematic flow chart of step S13 in the self-distillation learning method shown in FIG. 1. As shown in FIG. 6, step S13 may specifically include the following steps:
step S131: and acquiring the overall loss function of the output of the convolutional neural network.
The terminal device obtains, on the one hand, the loss function of each partial feature layer and, on the other hand, the overall loss function output by the convolutional neural network, which may specifically be a KL (Kullback-Leibler) divergence loss function and/or an MSE (mean squared error) loss function.
Step S132: and weighting the whole loss function by the loss function of each part of the characteristic layer according to a preset weight to obtain a target loss function.
The terminal device weights the loss function of each partial feature layer together with the overall loss function according to preset weights to obtain a target loss function. It should be noted that the preset weights may also be updated dynamically during learning, so as to balance the influence of each loss function on the final learning result.

Specifically, after the terminal device obtains the loss function of each partial feature layer and the specific value of the overall loss function, it calculates appropriate weights from the value of each loss function so that the weighted loss values are more balanced, thereby equalizing the influence of each loss function on the final learning result.
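Under the assumption that "more balanced" means inverse-magnitude weighting, the dynamic weighting of step S132 might be sketched as follows; the exact formula is not specified by the patent.

```python
import torch

def target_loss(overall_loss, layer_losses, eps=1e-8):
    """Sketch of step S132: dynamically weight the per-layer loss functions
    so that, after weighting, each term contributes comparably, then combine
    them with the overall loss to form the target loss."""
    values = torch.stack([l.detach() for l in layer_losses])
    weights = values.mean() / (values + eps)  # larger losses receive smaller weights
    weighted = sum(w * l for w, l in zip(weights, layer_losses))
    return overall_loss + weighted
```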
Step S133: and optimizing the convolutional neural network based on the target loss function to obtain the trained convolutional neural network.
The terminal device optimizes the convolutional neural network according to the target loss function to obtain the trained convolutional neural network. The embodiment of the disclosure analyzes the specific characteristics of the feature knowledge at different layers and sets a specific loss function for each according to those characteristics, thereby teaching each layer according to its aptitude, and constructs dynamic weights for the features between different layers based on their dynamic influence, solving the problem of a limited self-distillation learning effect caused by treating all layers uniformly without differentiation.
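Putting the pieces together, one hypothetical end-to-end training step could look like the sketch below. The auxiliary heads (coord_head, pair_head, mid_head, final_head) that map stage outputs to coordinates or heat maps are assumptions not described by the patent, and MSE is only an assumed choice for the overall loss function.

```python
import torch
import torch.nn.functional as F

def train_step(model, heads, batch, optimizer):
    """Hypothetical end-to-end step: per-stage specific losses plus the
    dynamically weighted target loss of steps S132/S133. `model` is a
    StagedBackbone; `heads` is an assumed dict of auxiliary head modules."""
    images, gt_coords, gt_heatmaps = batch
    f1, f2, f3, f4 = model(images)

    pair_maps = heads["pair_head"](f2)    # assumed (B, 2, H, W): right/left hand
    final_maps = heads["final_head"](f4)  # final keypoint heat maps

    l_struct = structure_loss(heads["coord_head"](f1), gt_coords)
    l_pair = pair_loss(pair_maps[:, 0], pair_maps[:, 1])
    l_dist = distribution_loss(heads["mid_head"](f3), final_maps)
    l_overall = F.mse_loss(final_maps, gt_heatmaps)  # overall loss (assumed MSE)

    loss = target_loss(l_overall, [l_struct, l_pair, l_dist])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```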
In the embodiment of the disclosure, the terminal device divides the convolutional layers of the convolutional neural network into n partial feature layers at a set depth interval based on the depth and original structure of the convolutional neural network, where n is a positive integer and n is greater than or equal to 2; inputs the training set into the convolutional neural network for training to obtain a loss function for each partial feature layer; and optimizes the convolutional neural network based on the loss functions of all the feature layers to obtain the trained convolutional neural network. In this way, distillation learning can be performed using the loss functions of different partial feature layers, the structural information of the convolutional neural network is used effectively, and the self-distillation learning effect is improved.
It will be understood by those skilled in the art that, in the method of the present invention, the order in which the steps are written does not imply a strict order of execution or any limitation on the implementation; the specific execution order of the steps should be determined by their functions and possible inherent logic.
In order to implement the feature analysis-based self-distillation learning method of the foregoing embodiment, the present application further provides a terminal device; see FIG. 7, which is a schematic structural diagram of an embodiment of the terminal device provided in the present application.

As shown in FIG. 7, the terminal device 400 of the present embodiment includes a dividing module 41, a training module 42, and an optimization module 43.

The dividing module 41 is configured to divide the convolutional layers of a convolutional neural network into n partial feature layers at a set depth interval based on the depth and original structure of the convolutional neural network, where n is a positive integer and n is greater than or equal to 2; the training module 42 is configured to input a training set into the convolutional neural network for training and obtain a loss function for each partial feature layer; and the optimization module 43 is configured to optimize the convolutional neural network based on the loss functions of all feature layers to obtain a trained convolutional neural network.
In order to implement the feature analysis-based self-distillation learning method of the foregoing embodiment, the present application further provides another terminal device; see FIG. 8, which is a schematic structural diagram of another embodiment of the terminal device provided in the present application.

As shown in FIG. 8, the terminal device 500 of the present embodiment includes a processor 51, a memory 52, an input-output device 53, and a bus 54.

The processor 51, the memory 52, and the input-output device 53 are each connected to the bus 54; the memory 52 stores a computer program, and the processor 51 is configured to execute the computer program to implement the feature analysis-based self-distillation learning method of the above embodiment.
In the present embodiment, the processor 51 may also be referred to as a CPU (Central Processing Unit). The processor 51 may be an integrated circuit chip having signal processing capability. The processor 51 may also be a general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The processor 51 may also be a GPU (Graphics Processing Unit), also called a display core, visual processor, or display chip: a microprocessor dedicated to image computation on personal computers, workstations, game consoles, and some mobile devices (such as tablet computers and smartphones). The GPU converts and drives the display information required by the computer system, provides line-scan signals to the display, and controls the display correctly; it is an important element connecting the display to the motherboard of a personal computer and one of the important devices for human-machine interaction. The graphics card containing it is an important component of the computer host, responsible for outputting display graphics, and is particularly important for users engaged in professional graphic design. A general-purpose processor may be a microprocessor, or the processor 51 may be any conventional processor or the like.
The present application also provides a computer-readable storage medium. As shown in FIG. 9, the computer-readable storage medium 600 stores a computer program 61 which, when executed by a processor, implements the method described in the embodiments of the feature analysis-based self-distillation learning method of the present application.
When implemented in the form of a software functional unit and sold or used as a stand-alone product, the method described in the embodiments of the feature analysis-based self-distillation learning method may be stored in a device such as a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part contributing over the prior art, or all or part of the technical solution, may be embodied in a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) or a processor to execute all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes performed by the present specification and drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A self-distillation learning method based on feature analysis, the self-distillation learning method comprising:
dividing the convolutional layers of a convolutional neural network into n partial feature layers at a set depth interval based on the depth and original structure of the convolutional neural network, wherein n is a positive integer and n is greater than or equal to 2;
inputting a training set into the convolutional neural network for training to obtain a loss function for each partial feature layer;
and optimizing the convolutional neural network based on the loss functions of all the feature layers to obtain a trained convolutional neural network.
2. The feature analysis-based self-distillation learning method according to claim 1, wherein the feature layers into which the convolutional neural network is divided at least comprise a shallow feature layer, a middle feature layer, and a deep feature layer; wherein the shallow feature layer, the middle feature layer, and the deep feature layer are connected in sequence;
the step of inputting a training set into the convolutional neural network for training includes:
inputting the training set into the shallow feature layer to obtain shallow feature knowledge;
inputting the shallow feature knowledge into the middle feature layer to obtain middle-layer feature knowledge;
and inputting the middle-layer feature knowledge into the deep feature layer to obtain deep feature knowledge.
3. The feature analysis based self-distillation learning method of claim 2, further comprising:
inputting the training set into the shallow feature layer to obtain a loss factor of the shallow feature layer;
outputting a structure loss function of the shallow feature layer based on the loss factor of the shallow feature layer;
wherein the functional structure of the structure loss function is designed based on the specific characteristics of the shallow feature layer.
4. The feature analysis based self-distillation learning method of claim 2, further comprising:
inputting the shallow feature knowledge into the middle feature layer to obtain a loss factor of the middle feature layer;
outputting a pairing loss function of the middle feature layer based on the loss factor of the middle feature layer;
wherein the function structure of the pairing loss function is designed based on the specific characteristics of the middle feature layer.
5. The feature analysis based self-distillation learning method of claim 2, further comprising:
inputting the middle-layer feature knowledge into the deep feature layer to obtain a loss factor of the deep feature layer;
outputting a probability distribution loss function of the deep feature layer based on the loss factor of the deep feature layer;
wherein a function structure of the probability distribution loss function is designed based on the specific characteristics of the deep feature layer.
6. The feature analysis-based self-distillation learning method according to claim 1, wherein the step of optimizing the convolutional neural network based on the loss functions of all feature layers to obtain a trained convolutional neural network comprises:
obtaining an overall loss function output by the convolutional neural network;
weighting the overall loss function by the loss function of each partial feature layer according to preset weights to obtain a target loss function;
and optimizing the convolutional neural network based on the target loss function to obtain the trained convolutional neural network.
7. The feature analysis-based self-distillation learning method according to claim 6, wherein the step of weighting the overall loss function by the loss function of each partial feature layer according to a preset weight to obtain the target loss function comprises:
comparing the values of the loss function of each partial feature layer;
setting the weight value of the loss function of each partial feature layer according to the comparison result;
and weighting the overall loss function by the loss function of each partial feature layer according to the weight values to obtain the target loss function.
8. A terminal device, characterized in that the terminal device comprises a dividing module, a training module, and an optimization module; wherein,
the dividing module is used for dividing the convolutional layers of the convolutional neural network into n partial feature layers at a set depth interval based on the depth and original structure of the convolutional neural network, wherein n is a positive integer and n is greater than or equal to 2;
the training module is used for inputting a training set into the convolutional neural network for training to obtain a loss function for each partial feature layer;
and the optimization module is used for optimizing the convolutional neural network based on the loss functions of all the feature layers to obtain a trained convolutional neural network.
9. A terminal device, characterized in that the terminal device comprises a processor and a memory; the memory is stored with a computer program, and the processor is used for executing the computer program to realize the steps of the feature analysis-based self-distillation learning method according to any one of claims 1-7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed, implements the steps of the feature analysis based self-distillation learning method according to any of claims 1-7.
CN202110146048.3A 2021-02-02 2021-02-02 Self-distillation learning method and device based on feature analysis and readable storage medium Active CN112862095B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110146048.3A CN112862095B (en) 2021-02-02 2021-02-02 Self-distillation learning method and device based on feature analysis and readable storage medium


Publications (2)

Publication Number Publication Date
CN112862095A true CN112862095A (en) 2021-05-28
CN112862095B CN112862095B (en) 2023-09-29

Family

ID=75986335

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110146048.3A Active CN112862095B (en) 2021-02-02 2021-02-02 Self-distillation learning method and device based on feature analysis and readable storage medium

Country Status (1)

Country Link
CN (1) CN112862095B (en)


Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170300811A1 (en) * 2016-04-14 2017-10-19 Linkedin Corporation Dynamic loss function based on statistics in loss layer of deep convolutional neural network
US20180268292A1 (en) * 2017-03-17 2018-09-20 Nec Laboratories America, Inc. Learning efficient object detection models with knowledge distillation
US20190325313A1 (en) * 2018-04-20 2019-10-24 Google Llc Systems and Methods for Regularizing Neural Networks
US20190354857A1 (en) * 2018-05-17 2019-11-21 Raytheon Company Machine learning using informed pseudolabels
WO2020143225A1 (en) * 2019-01-08 2020-07-16 南京人工智能高等研究院有限公司 Neural network training method and apparatus, and electronic device
CN109858466A (en) * 2019-03-01 2019-06-07 北京视甄智能科技有限公司 A kind of face critical point detection method and device based on convolutional neural networks
CN109948573A (en) * 2019-03-27 2019-06-28 厦门大学 A kind of noise robustness face identification method based on cascade deep convolutional neural networks
CN110232203A (en) * 2019-04-22 2019-09-13 山东大学 Knowledge distillation optimization RNN has a power failure prediction technique, storage medium and equipment in short term
WO2021012494A1 (en) * 2019-07-19 2021-01-28 平安科技(深圳)有限公司 Deep learning-based face recognition method and apparatus, and computer-readable storage medium
CN110472730A (en) * 2019-08-07 2019-11-19 交叉信息核心技术研究院(西安)有限公司 A kind of distillation training method and the scalable dynamic prediction method certainly of convolutional neural networks
CN111368673A (en) * 2020-02-26 2020-07-03 华南理工大学 Method for quickly extracting human body key points based on neural network
CN111598793A (en) * 2020-04-24 2020-08-28 云南电网有限责任公司电力科学研究院 Method and system for defogging image of power transmission line and storage medium
CN112016591A (en) * 2020-08-04 2020-12-01 杰创智能科技股份有限公司 Training method of image recognition model and image recognition method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
JING, Yu; QI, Ruihua; LIU, Jianxin; LIU, Zhaoxia: "Gesture Recognition Algorithm Based on an Improved Multi-scale Deep Convolutional Network" (基于改进多尺度深度卷积网络的手势识别算法), Computer Science (计算机科学), no. 06 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113507466A (en) * 2021-07-07 2021-10-15 浙江大学 Method and system for defending backdoor attack by knowledge distillation based on attention mechanism
CN113486990A (en) * 2021-09-06 2021-10-08 北京字节跳动网络技术有限公司 Training method of endoscope image classification model, image classification method and device
CN113486990B (en) * 2021-09-06 2021-12-21 北京字节跳动网络技术有限公司 Training method of endoscope image classification model, image classification method and device

Also Published As

Publication number Publication date
CN112862095B (en) 2023-09-29

Similar Documents

Publication Publication Date Title
US11551333B2 (en) Image reconstruction method and device
US20220270207A1 (en) Image processing method, apparatus, device, and computer-readable storage medium
CN114186632B (en) Method, device, equipment and storage medium for training key point detection model
WO2020119527A1 (en) Human action recognition method and apparatus, and terminal device and storage medium
CN110689599A (en) 3D visual saliency prediction method for generating countermeasure network based on non-local enhancement
CN108921225A (en) A kind of image processing method and device, computer equipment and storage medium
KR20220038475A (en) Video content recognition method and apparatus, storage medium, and computer device
CN114092963B (en) Method, device, equipment and storage medium for key point detection and model training
CN116363261B (en) Training method of image editing model, image editing method and device
CN112508782A (en) Network model training method, face image super-resolution reconstruction method and equipment
CN113657397B (en) Training method for circularly generating network model, method and device for establishing word stock
CN112862095B (en) Self-distillation learning method and device based on feature analysis and readable storage medium
CN111860276B (en) Human body key point detection method, device, network equipment and storage medium
CN110807379B (en) Semantic recognition method, semantic recognition device and computer storage medium
KR102637342B1 (en) Method and apparatus of tracking target objects and electric device
WO2019001323A1 (en) Signal processing system and method
CN115456167B (en) Lightweight model training method, image processing device and electronic equipment
CN114049491A (en) Fingerprint segmentation model training method, fingerprint segmentation device, fingerprint segmentation equipment and fingerprint segmentation medium
CN111339315B (en) Knowledge graph construction method, system, computer readable medium and electronic equipment
WO2023116744A1 (en) Image processing method and apparatus, device, and medium
EP4318314A1 (en) Image acquisition model training method and apparatus, image detection method and apparatus, and device
CN114758130B (en) Image processing and model training method, device, equipment and storage medium
CN116245157A (en) Facial expression representation model training method, facial expression recognition method and facial expression recognition device
CN113055666B (en) Video quality evaluation method and device
CN114723933A (en) Region information generation method and device, electronic equipment and computer readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant