CN112862095A - Self-distillation learning method and device based on feature analysis and readable storage medium - Google Patents
Self-distillation learning method and device based on feature analysis and readable storage medium
- Publication number
- CN112862095A (application number CN202110146048.3A)
- Authority
- CN
- China
- Prior art keywords
- layer
- feature
- neural network
- convolutional neural
- loss function
- Prior art date
- Legal status (as listed by Google Patents; an assumption, not a legal conclusion)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The application discloses a self-distillation learning method, a device, and a readable storage medium based on feature analysis. The feature-analysis-based self-distillation learning method comprises the following steps: dividing the convolutional layers of a convolutional neural network into n partial feature layers at a set depth interval, based on the depth and original structure of the network, where n is a positive integer and n is greater than or equal to 2; inputting a training set into the convolutional neural network for training to obtain a loss function for each partial feature layer; and optimizing the convolutional neural network based on the loss functions of all feature layers to obtain the trained network. In this way, distillation learning can be performed with the loss functions of different partial feature layers, the structural information of the convolutional neural network is used effectively, and the self-distillation learning effect is improved.
Description
Technical Field
The present application relates to the field of convolutional neural network training technologies, and in particular, to a self-distillation learning method and device based on feature analysis, and a readable storage medium.
Background
Convolutional neural networks have been widely deployed in a variety of application scenarios. To extend their use to areas where accuracy is critical, researchers have studied raising accuracy through deeper or wider network structures, but such structures bring exponential growth in computation and storage costs and therefore longer response times.
With the help of convolutional neural networks, applications such as image classification, object detection, and semantic segmentation are evolving at an unprecedented rate. However, some fault-intolerant applications, such as autonomous driving and medical image analysis, demand further improvements in prediction and analysis accuracy together with shorter response times, which poses a huge challenge for current convolutional neural networks. Prior-art approaches focus either on improving performance or on reducing computational resources so as to shorten response times. On the one hand, ResNet-150 and even larger ResNet-1000 have been proposed, gaining very limited performance margins at a large computational cost. On the other hand, accepting a predefined performance penalty relative to the original network, various techniques have been proposed to reduce computation and memory so as to match the limits imposed by hardware implementations. Such techniques include lightweight network design, pruning, quantization, and so on, among which knowledge distillation is one possible way to achieve model compression.
In the prior art, self-distillation learning can be used for efficient training, but the characteristics of feature-layer knowledge at different depths are not considered: self-learning is applied uniformly, without teaching according to aptitude, so the self-distillation learning effect is limited.
Disclosure of Invention
The application provides a self-distillation learning method, a self-distillation learning device and a readable storage medium based on feature analysis.
The technical scheme provided by the application is as follows: provided is a feature analysis-based self-distillation learning method, including:
dividing the convolutional layer of the convolutional neural network into n partial feature layers by a set depth interval based on the depth and the original structure of the convolutional neural network, wherein n is a positive integer and is more than or equal to 2;
inputting a training set into the convolutional neural network for training to obtain a loss function of each part of feature layer;
and optimizing the convolutional neural network based on the loss functions of all the feature layers to obtain the trained convolutional neural network.
In some possible embodiments, the feature layers into which the convolutional neural network is divided include at least a shallow feature layer, a middle feature layer, and a deep feature layer; the shallow feature layer, the middle feature layer, and the deep feature layer are connected in sequence;
the step of inputting a training set into the convolutional neural network for training includes:
inputting the training set into the shallow feature layer to obtain shallow feature knowledge;
inputting the shallow layer feature knowledge into the middle layer feature layer to obtain middle layer feature knowledge;
and inputting the middle layer feature knowledge into the deep feature layer to obtain deep feature knowledge.
In some possible embodiments, the self-distillation learning method further comprises:
inputting the training set into the shallow feature layer to obtain a loss factor of the shallow feature layer;
outputting a structural loss function of the shallow feature layer based on the loss factor of the shallow feature layer;
wherein the functional structure of the structure loss function is designed based on the specific characteristics of the shallow feature layer.
In some possible embodiments, the self-distillation learning method further comprises:
inputting the shallow feature knowledge into the middle feature layer to obtain a loss factor of the middle feature layer;
outputting a pairing loss function of the middle layer feature layer based on the loss factor of the middle layer feature layer;
wherein the function structure of the pairing loss function is designed based on the specific characteristics of the middle feature layer.
In some possible embodiments, the self-distillation learning method further comprises:
inputting the middle layer feature knowledge into the deep layer feature layer to obtain a loss factor of the deep layer feature layer;
outputting a probability distribution loss function of the deep feature layer based on the loss factor of the deep feature layer;
wherein a function structure of the probability distribution loss function is designed based on the specific characteristics of the deep feature layer.
In some possible embodiments, the step of optimizing the convolutional neural network based on the loss functions of all feature layers to obtain a trained convolutional neural network includes:
obtaining an overall loss function output by the convolutional neural network;
weighting the overall loss function by the loss function of each partial feature layer according to a preset weight to obtain a target loss function;
and optimizing the convolutional neural network based on the target loss function to obtain the trained convolutional neural network.
In some possible embodiments, the step of weighting the overall loss function by the loss function of each partial feature layer according to a preset weight to obtain a target loss function includes:
comparing the values of the loss function of each partial feature layer;
setting the weight value of the loss function of each part of the feature layer according to the comparison result;
and weighting the overall loss function by the loss function of each partial feature layer according to the weight value of the loss function of each partial feature layer to obtain the target loss function.
Another technical solution provided by the present application is: providing a terminal device, wherein the terminal device comprises a dividing module, a training module and an optimizing module; wherein,
the dividing module is used for dividing the convolutional layer of the convolutional neural network into n partial feature layers at a set depth interval based on the depth and the original structure of the convolutional neural network, wherein n is a positive integer and n is greater than or equal to 2;
the training module is used for inputting a training set into the convolutional neural network for training to obtain a loss function of each part of the feature layer;
and the optimization module is used for optimizing the convolutional neural network based on the loss functions of all the feature layers to obtain the trained convolutional neural network.
Another technical solution provided by the present application is: there is provided another terminal device comprising a processor and a memory, the memory having stored therein a computer program, the processor being configured to execute the computer program to implement the steps of the above-described feature analysis based self-distillation learning method.
Another technical solution provided by the present application is: there is provided a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program which, when executed, implements the steps of the above feature analysis-based self-distillation learning method.
Different from the prior art, the beneficial effects of the present application are as follows: the terminal device divides the convolutional layers of the convolutional neural network into n partial feature layers at a set depth interval based on the depth and original structure of the network, where n is a positive integer and n is greater than or equal to 2; inputs the training set into the convolutional neural network for training to obtain a loss function for each partial feature layer; and optimizes the convolutional neural network based on the loss functions of all feature layers to obtain the trained network. In this way, distillation learning can be performed with the loss functions of different partial feature layers, the structural information of the convolutional neural network is used effectively, and the self-distillation learning effect is improved.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flow chart diagram illustrating one embodiment of a method for feature analysis-based self-distillation learning provided herein;
FIG. 2 is a schematic structural diagram of an embodiment of a convolutional neural network provided herein;
FIG. 3 illustrates the training process of the structure loss function provided herein;
FIG. 4 illustrates the training process of the pairing loss function provided herein;
FIG. 5 illustrates the training process of the probability distribution loss function provided herein;
FIG. 6 is a schematic flow chart of the specific process of step S13 in the self-distillation learning method shown in FIG. 1;
fig. 7 is a schematic structural diagram of an embodiment of a terminal device provided in the present application;
fig. 8 is a schematic structural diagram of another embodiment of a terminal device provided in the present application;
FIG. 9 is a schematic structural diagram of an embodiment of a computer-readable storage medium provided in the present application.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on the embodiments given herein without creative effort fall within the protection scope of the present invention.
Human body keypoint detection accurately estimates the positions of the n main keypoints of a human body in an image or video; the main keypoints include the left and right elbows, wrists, shoulders, the head, neck, ankles, knees, hips, soles of the feet, and so on. Human body keypoint detection can be applied to judging the state of a human body, the posture of a human body, and the like.
The convolutional neural network trained by the method of the present application can be used for human body keypoint detection; the training set required for training comprises a number of human body images covering different scenes, different viewing angles, and different illumination conditions.
The feature-analysis-based dynamic combined distillation learning method of the present application performs distillation learning using the features between different layers of a convolutional neural network, effectively using the network's structural information and overcoming the limitation of distillation methods that directly use only the output information. Referring specifically to fig. 1, fig. 1 is a schematic flow chart of an embodiment of the feature-analysis-based self-distillation learning method provided in the present application.
The execution subject of the self-distillation learning method of the present application may be a terminal device; for example, the method may be executed by a terminal device, a server, or other processing equipment, where the terminal device may be user equipment (UE), a mobile device, a user terminal, a cellular phone, a cordless phone, a personal digital assistant (PDA), a handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like. In some possible implementations, the self-distillation learning method may be implemented by a processor calling computer-readable instructions stored in a memory.
As shown in fig. 1, the self-distillation learning method based on feature analysis of the present embodiment specifically includes the following steps:
step S11: based on the depth and the original structure of the convolutional neural network, the convolutional layer of the convolutional neural network is divided into n partial feature layers by a set depth interval, wherein n is a positive integer and is more than or equal to 2.
Based on the depth and the original structure of the convolutional neural network required by the human body keypoint detection task, the terminal device divides the network's convolutional layers into at least two feature layers at a set depth interval.
Fig. 2 is a schematic structural diagram of an embodiment of a convolutional neural network provided in the present application. Since most existing convolutional neural networks are organized into stages, the convolutional neural network of fig. 2 is likewise divided into four layers in units of stages in the embodiment of the present disclosure. It should be noted that the self-distillation learning method of the disclosed embodiment also applies to convolutional neural networks with other structures, which are not described again here.
Stage1 in fig. 2 may be referred to as the shallow feature layer, and its output as shallow feature knowledge; Stage2 in fig. 2 may be referred to as the middle feature layer, and its output as middle feature knowledge; Stage3 in fig. 2 may be referred to as the deep feature layer, and its output as deep feature knowledge. In addition, the Stage4 feature layer in fig. 2 may be deeper than the deep feature layer, or may be at the same depth as the deep feature layer.
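To make the stage-wise division concrete, the following is a minimal sketch, not the patented implementation, of partitioning a backbone into the four stage feature layers of fig. 2. It assumes a torchvision ResNet, whose module names (conv1, bn1, relu, maxpool, layer1–layer4) come from torchvision rather than from the patent; the input size is arbitrary.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

backbone = resnet18(weights=None)

# Stage1 = shallow feature layer, Stage2 = middle, Stage3 = deep,
# Stage4 = the final (possibly equally deep) feature layer of fig. 2.
stem = nn.Sequential(backbone.conv1, backbone.bn1, backbone.relu, backbone.maxpool)
stages = nn.ModuleList([
    nn.Sequential(stem, backbone.layer1),  # Stage1: shallow feature layer
    backbone.layer2,                       # Stage2: middle feature layer
    backbone.layer3,                       # Stage3: deep feature layer
    backbone.layer4,                       # Stage4
])

def forward_with_stage_features(x):
    """Run the backbone stage by stage, collecting each stage's feature knowledge."""
    feats = []
    for stage in stages:
        x = stage(x)
        feats.append(x)
    return feats  # [shallow, middle, deep, final] feature maps

feats = forward_with_stage_features(torch.randn(1, 3, 256, 256))
print([tuple(f.shape) for f in feats])
```

Each collected feature map can then feed the layer-specific loss functions of step S12 below.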
Step S12: and inputting the training set into a convolutional neural network for training to obtain the loss function of each part of the feature layer.
The terminal device inputs a training set prepared in advance into the convolutional neural network for training. Specifically, the training set is input into the Stage1 shallow feature layer for feature extraction and feature analysis, obtaining the shallow feature knowledge and the loss factor of the shallow feature layer during training; the terminal device then inputs the shallow feature knowledge into the Stage2 middle feature layer for feature extraction and feature analysis, obtaining the middle feature knowledge and the loss factor of the middle feature layer; finally, the terminal device inputs the middle feature knowledge into the Stage3 deep feature layer for feature extraction and feature analysis, obtaining the deep feature knowledge and the loss factor of the deep feature layer.
Through this training process, the terminal device can analyze the specificity of the features of the different feature layers from the shallow, middle, and deep feature knowledge. For a convolutional neural network used in human body keypoint detection, the shallow feature knowledge output by the Stage1 shallow feature layer contains fairly accurate position information but little semantic information; the middle feature knowledge output by the Stage2 middle feature layer contains relatively accurate position information as well as a certain amount of semantic information; and the deep feature knowledge output by the Stage3 deep feature layer has relatively strong semantic information but less position information. The features output by layers of different depths therefore differ in their specificity; by effectively using the feature specificity brought by the network's structural information, a multi-layer feature analysis strategy can improve the efficiency and the effect of self-distillation learning.
Further, the terminal device designs a targeted loss function for each partial feature layer based on its specific characteristics. A loss function measures the prediction quality of the convolutional neural network and expresses the degree of difference between the predicted data and the actual data.
Specifically, according to the characteristics of the Stage1 shallow feature layer, the disclosed embodiment designs a structure loss (StructureLoss) function for constraining the structural information of the human body under severe occlusion. Referring specifically to fig. 3, fig. 3 illustrates the training process of the structure loss function provided in the present application. The terminal device inputs a portrait image (Init Person in fig. 3) into the convolutional neural network; the network derives a loss factor from the degree of difference between the predicted structure (PT Structure) and the ground-truth structure (GT Structure), and finally outputs the structure loss function of the shallow feature layer according to this loss factor. Computing the structure loss requires comparing position information without semantic information, so it is better suited to shallow features than other types of loss functions.
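The patent does not disclose the structure loss formula. As a hedged illustration only, the sketch below models the human structural information as "bone" vectors between keypoints and penalizes the deviation of the predicted vectors (PT Structure) from the ground-truth ones (GT Structure); the SKELETON_EDGES list is an assumed example, not taken from the patent.

```python
import torch

# Assumed example edge list: pairs of keypoint indices forming "bones".
SKELETON_EDGES = [(0, 1), (1, 2), (2, 3)]  # e.g. head-neck, neck-shoulder, shoulder-elbow

def structure_loss(pred_kpts: torch.Tensor, gt_kpts: torch.Tensor) -> torch.Tensor:
    """pred_kpts, gt_kpts: (B, K, 2) predicted / ground-truth keypoint coordinates."""
    loss = torch.zeros((), device=pred_kpts.device)
    for i, j in SKELETON_EDGES:
        pred_bone = pred_kpts[:, j] - pred_kpts[:, i]  # predicted bone vector
        gt_bone = gt_kpts[:, j] - gt_kpts[:, i]        # ground-truth bone vector
        # Squared deviation of the bone vector constrains relative positions
        # (pure position information, no semantics needed).
        loss = loss + (pred_bone - gt_bone).pow(2).sum(dim=-1).mean()
    return loss / len(SKELETON_EDGES)
```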
According to the characteristics of the Stage2 middle feature layer, a pairing loss (PairLoss) function is designed to constrain prediction confusion between paired similar keypoints of the human body under occlusion and crowding. Referring specifically to fig. 4, fig. 4 illustrates the training process of the pairing loss function provided in the present application. Taking the paired similar keypoints of the left and right wrists as an example, the terminal device inputs a portrait image (Init Person in fig. 4) into the convolutional neural network; the network outputs a right-hand heatmap (RightHand Heatmap) and a left-hand heatmap (LeftHand Heatmap), derives a loss factor by comparing the degree of difference between the two heatmaps, and finally outputs the pairing loss function of the middle feature layer according to this loss factor. Computing the pairing loss requires not only the corresponding position information but also a certain amount of semantic information, so it is better suited to middle-layer features than other types of loss functions.
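The exact pairing loss is likewise not disclosed; the sketch below implements one plausible reading, offered as an assumption rather than the patent's formula, in which the loss factor grows when the left- and right-hand heatmaps respond at the same pixels, discouraging the network from confusing the two paired keypoints.

```python
import torch

def pairing_loss(left_heatmap: torch.Tensor, right_heatmap: torch.Tensor,
                 eps: float = 1e-8) -> torch.Tensor:
    """left_heatmap, right_heatmap: (B, H, W) non-negative keypoint response maps."""
    b = left_heatmap.shape[0]
    left = left_heatmap.reshape(b, -1)
    right = right_heatmap.reshape(b, -1)
    left = left / (left.sum(dim=1, keepdim=True) + eps)    # normalize to distributions
    right = right / (right.sum(dim=1, keepdim=True) + eps)
    # Overlap is large when both keypoints fire at the same locations,
    # i.e. when the left/right predictions are being confused.
    return (left * right).sum(dim=1).mean()
```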
According to the characteristics of the Stage3 deep feature layer, the disclosed embodiment designs a probability distribution loss function that represents the feature distribution of human keypoints, helping the network converge better. Referring specifically to fig. 5, fig. 5 illustrates the training process of the probability distribution loss function provided in the present application. The convolutional neural network locates the feature points in the image and generates their probability distributions; a loss factor is derived by comparing the probability distribution of an intermediate convolutional layer with that of the final convolutional layer, and the probability distribution loss function of the deep feature layer is output according to this loss factor. The probability distribution loss requires stronger feature resolution and semantic capability, so it is better suited to deep features than other types of loss functions.
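A common way to realize such a distribution-matching term, offered here as a hedged sketch under the assumption of a KL-divergence form (the patent gives no formula), compares the keypoint distribution predicted from an intermediate convolutional layer against that of the final layer:

```python
import torch
import torch.nn.functional as F

def probability_distribution_loss(mid_heatmap: torch.Tensor,
                                  final_heatmap: torch.Tensor) -> torch.Tensor:
    """mid_heatmap, final_heatmap: (B, K, H, W) keypoint response maps."""
    b, k, h, w = mid_heatmap.shape
    log_p_mid = F.log_softmax(mid_heatmap.reshape(b, k, -1), dim=-1)
    # The final layer's distribution acts as the (detached) teacher signal.
    p_final = F.softmax(final_heatmap.reshape(b, k, -1), dim=-1).detach()
    # KL(final || mid): pulls the intermediate distribution toward the final one.
    return F.kl_div(log_p_mid, p_final, reduction="batchmean")
```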
The self-distillation learning method of the disclosed embodiment analyzes the properties of feature knowledge at different layers and guides the learning process layer by layer, step by step, making learning easier and avoiding the difficulty caused by a single, undifferentiated learning process.
Step S13: and optimizing the convolutional neural network based on the loss functions of all the characteristic layers to obtain the trained convolutional neural network.
The terminal device obtains the targeted loss functions of the three different levels through the above steps. To better integrate the influence of all loss functions on the final learning result, and considering that the loss functions of different levels influence learning to different degrees, the embodiment of the present disclosure designs weights that are dynamically updated with the learning process, making the whole learning process more stable and effective.
Referring to fig. 6, fig. 6 is a schematic flow chart of step S13 in the self-distillation learning method shown in fig. 1. As shown in fig. 6, step S13 may specifically include the following steps:
step S131: and acquiring the overall loss function of the output of the convolutional neural network.
On the one hand the terminal device acquires the loss function of each partial feature layer; on the other hand it acquires the overall loss function output by the convolutional neural network, which may specifically be a KL (Kullback-Leibler) divergence loss function and/or an MSE (mean squared error) loss function.
Step S132: and weighting the whole loss function by the loss function of each part of the characteristic layer according to a preset weight to obtain a target loss function.
The terminal device weights the overall loss function by the loss function of each partial feature layer according to preset weights to obtain the target loss function. It should be noted that the preset weights may also be dynamically updated during learning, so as to balance the influence of each loss function on the final learning result.
Specifically, after obtaining the value of each partial feature layer's loss function and the value of the overall loss function, the terminal device computes appropriate weights from these values so that the weighted loss values are more balanced, equalizing the influence of each loss function on the final learning result.
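One plausible realization of this dynamic weighting, sketched below under the assumption that "more balanced" means weights inversely proportional to each loss's current (detached) value, combines the overall loss with the per-layer losses of step S12:

```python
import torch

def combine_losses(overall_loss: torch.Tensor, layer_losses: list,
                   eps: float = 1e-8) -> torch.Tensor:
    """overall_loss: scalar tensor; layer_losses: list of scalar tensors."""
    with torch.no_grad():
        values = torch.stack([l.detach() for l in layer_losses])
        # Larger current loss -> smaller weight, so the weighted terms even out.
        weights = values.mean() / (values + eps)
        weights = weights * len(layer_losses) / weights.sum()  # keep overall scale
    target = overall_loss
    for w, l in zip(weights, layer_losses):
        target = target + w * l
    return target  # the target loss function of step S132
```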
Step S133: and optimizing the convolutional neural network based on the target loss function to obtain the trained convolutional neural network.
The terminal device optimizes the convolutional neural network according to the target loss function to obtain the trained convolutional neural network. The disclosed embodiment analyzes the specific characteristics of feature knowledge at different layers, sets a targeted loss function for each layer to realize teaching according to aptitude, and constructs dynamically updated weights for the losses of the different layers, thereby solving the limited self-distillation effect that results from treating all layers uniformly.
In the embodiment of the disclosure, the terminal device divides the convolutional layers of the convolutional neural network into n partial feature layers at a set depth interval based on the depth and original structure of the network, where n is a positive integer and n is greater than or equal to 2; inputs the training set into the convolutional neural network for training to obtain a loss function for each partial feature layer; and optimizes the network based on the loss functions of all feature layers to obtain the trained convolutional neural network. In this way, distillation learning can be performed with the loss functions of different partial feature layers, the structural information of the convolutional neural network is used effectively, and the self-distillation learning effect is improved.
It will be understood by those skilled in the art that in the method of the present invention, the order of writing the steps does not imply a strict order of execution and any limitations on the implementation, and the specific order of execution of the steps should be determined by their function and possible inherent logic.
In order to implement the feature analysis-based self-distillation learning method of the foregoing embodiment, the present application further provides a terminal device, and specifically please refer to fig. 7, where fig. 7 is a schematic structural diagram of an embodiment of the terminal device provided in the present application.
As shown in fig. 7, the terminal device 400 of the present embodiment includes a dividing module 41, a training module 42, and an optimizing module 43.
The dividing module 41 is configured to divide a convolutional layer of a convolutional neural network into n partial feature layers in a set depth interval based on a depth and an original structure of the convolutional neural network, where n is a positive integer and n is greater than or equal to 2; the training module 42 is configured to input a training set into the convolutional neural network for training, and obtain a loss function of each partial feature layer; and the optimization module 43 is configured to optimize the convolutional neural network based on the loss functions of all feature layers to obtain a trained convolutional neural network.
In order to implement the feature analysis-based self-distillation learning method of the foregoing embodiment, the present application further provides another terminal device, and specifically please refer to fig. 8, where fig. 8 is a schematic structural diagram of another embodiment of the terminal device provided by the present application.
As shown in fig. 8, the terminal device 500 of the present embodiment includes a processor 51, a memory 52, an input-output device 53, and a bus 54.
The processor 51, the memory 52, and the input/output device 53 are respectively connected to the bus 54, the memory 52 stores a computer program, and the processor 51 is configured to execute the computer program to implement the feature analysis-based self-distillation learning method according to the above embodiment.
In the present embodiment, the processor 51 may also be referred to as a CPU (Central Processing Unit). The processor 51 may be an integrated circuit chip with signal processing capability. It may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The processor 51 may also be a GPU (Graphics Processing Unit), also called a display core, visual processor, or display chip: a microprocessor dedicated to image computation on personal computers, workstations, game consoles, and some mobile devices (such as tablet computers and smartphones). The GPU converts and drives the display information required by the computer system, provides line-scanning signals to the display, and controls its correct operation; it is an important element connecting the display to the motherboard and one of the important devices for human-machine interaction. The graphics card is an important component of the host computer, responsible for outputting the displayed graphics, which is essential for professional graphic design work. A general-purpose processor may be a microprocessor, or the processor 51 may be any conventional processor, or the like.
The present application also provides a computer-readable storage medium, as shown in fig. 9, the computer-readable storage medium 600 is used for storing a computer program 61, and the computer program 61 is used for implementing the method as described in the embodiment of the self-distillation learning method based on feature analysis in the present application when being executed by a processor.
When the method described in the embodiments of the feature-analysis-based self-distillation learning method is implemented in the form of a software functional unit and sold or used as a stand-alone product, it may be stored in a device such as a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence or in the part contributing over the prior art, or in whole or in part, may be embodied in a software product stored in a storage medium, which includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) or a processor to execute all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above description is only an embodiment of the present invention and does not limit the scope of the present invention. All equivalent structural or process modifications made using the contents of this specification and the drawings, whether applied directly or indirectly in other related technical fields, are likewise included within the protection scope of the present invention.
Claims (10)
1. A self-distillation learning method based on feature analysis, the self-distillation learning method comprising:
dividing the convolutional layer of the convolutional neural network into n partial feature layers by a set depth interval based on the depth and the original structure of the convolutional neural network, wherein n is a positive integer and is more than or equal to 2;
inputting a training set into the convolutional neural network for training to obtain a loss function of each part of feature layer;
and optimizing the convolutional neural network based on the loss functions of all the feature layers to obtain the trained convolutional neural network.
2. The feature analysis-based self-distillation learning method according to claim 1, wherein the feature layers divided by the convolutional neural network at least comprise a shallow feature layer, a middle feature layer and a deep feature layer; wherein the shallow feature layer, the middle feature layer and the deep feature layer are connected in sequence;
the step of inputting a training set into the convolutional neural network for training includes:
inputting the training set into the shallow feature layer to obtain shallow feature knowledge;
inputting the shallow layer feature knowledge into the middle layer feature layer to obtain middle layer feature knowledge;
and inputting the middle layer feature knowledge into the deep feature layer to obtain deep feature knowledge.
3. The feature analysis based self-distillation learning method of claim 2, further comprising:
inputting the training set into the shallow feature layer to obtain a loss factor of the shallow feature layer;
outputting a structural loss function of the shallow feature layer based on the loss factor of the shallow feature layer;
wherein the functional structure of the structure loss function is designed based on the specific characteristics of the shallow feature layer.
4. The feature analysis based self-distillation learning method of claim 2, further comprising:
inputting the shallow feature knowledge into the middle feature layer to obtain a loss factor of the middle feature layer;
outputting a pairing loss function of the middle layer feature layer based on the loss factor of the middle layer feature layer;
wherein the function structure of the pairing loss function is designed based on the specific characteristics of the middle feature layer.
5. The feature analysis based self-distillation learning method of claim 2, further comprising:
inputting the middle layer feature knowledge into the deep layer feature layer to obtain a loss factor of the deep layer feature layer;
outputting a probability distribution loss function of the deep feature layer based on the loss factor of the deep feature layer;
wherein a function structure of the probability distribution loss function is designed based on the specific characteristics of the deep feature layer.
6. The feature analysis-based self-distillation learning method according to claim 1, wherein the step of optimizing the convolutional neural network based on the loss functions of all feature layers to obtain a trained convolutional neural network comprises:
obtaining an overall loss function output by the convolutional neural network;
weighting the overall loss function by the loss function of each partial feature layer according to a preset weight to obtain a target loss function;
and optimizing the convolutional neural network based on the target loss function to obtain the trained convolutional neural network.
7. The feature analysis-based self-distillation learning method according to claim 6, wherein the step of weighting the overall loss function by the loss function of each partial feature layer according to a preset weight to obtain a target loss function comprises:
comparing the values of the loss function of each partial feature layer;
setting the weight value of the loss function of each part of the feature layer according to the comparison result;
and weighting the overall loss function by the loss function of each partial feature layer according to the weight value of the loss function of each partial feature layer to obtain the target loss function.
8. A terminal device, characterized by comprising a dividing module, a training module, and an optimization module; wherein,
the dividing module is used for dividing the convolutional layer of the convolutional neural network into n partial feature layers at a set depth interval based on the depth and the original structure of the convolutional neural network, wherein n is a positive integer and n is greater than or equal to 2;
the training module is used for inputting a training set into the convolutional neural network for training to obtain a loss function of each partial feature layer;
and the optimization module is used for optimizing the convolutional neural network based on the loss functions of all the feature layers to obtain the trained convolutional neural network.
9. A terminal device, characterized in that the terminal device comprises a processor and a memory; the memory is stored with a computer program, and the processor is used for executing the computer program to realize the steps of the feature analysis-based self-distillation learning method according to any one of claims 1-7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed, implements the steps of the feature analysis based self-distillation learning method according to any of claims 1-7.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202110146048.3A (granted as CN112862095B) | 2021-02-02 | 2021-02-02 | Self-distillation learning method and device based on feature analysis and readable storage medium

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202110146048.3A (granted as CN112862095B) | 2021-02-02 | 2021-02-02 | Self-distillation learning method and device based on feature analysis and readable storage medium
Publications (2)

Publication Number | Publication Date
---|---
CN112862095A | 2021-05-28
CN112862095B | 2023-09-29
Family

ID=75986335

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN202110146048.3A (granted as CN112862095B, active) | Self-distillation learning method and device based on feature analysis and readable storage medium | 2021-02-02 | 2021-02-02

Country Status (1)

Country | Link
---|---
CN | CN112862095B
Patent Citations (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170300811A1 (en) * | 2016-04-14 | 2017-10-19 | Linkedin Corporation | Dynamic loss function based on statistics in loss layer of deep convolutional neural network |
US20180268292A1 (en) * | 2017-03-17 | 2018-09-20 | Nec Laboratories America, Inc. | Learning efficient object detection models with knowledge distillation |
US20190325313A1 (en) * | 2018-04-20 | 2019-10-24 | Google Llc | Systems and Methods for Regularizing Neural Networks |
US20190354857A1 (en) * | 2018-05-17 | 2019-11-21 | Raytheon Company | Machine learning using informed pseudolabels |
WO2020143225A1 (en) * | 2019-01-08 | 2020-07-16 | 南京人工智能高等研究院有限公司 | Neural network training method and apparatus, and electronic device |
CN109858466A (en) * | 2019-03-01 | 2019-06-07 | 北京视甄智能科技有限公司 | A kind of face critical point detection method and device based on convolutional neural networks |
CN109948573A (en) * | 2019-03-27 | 2019-06-28 | 厦门大学 | A kind of noise robustness face identification method based on cascade deep convolutional neural networks |
CN110232203A (en) * | 2019-04-22 | 2019-09-13 | 山东大学 | Knowledge distillation optimization RNN has a power failure prediction technique, storage medium and equipment in short term |
WO2021012494A1 (en) * | 2019-07-19 | 2021-01-28 | 平安科技(深圳)有限公司 | Deep learning-based face recognition method and apparatus, and computer-readable storage medium |
CN110472730A (en) * | 2019-08-07 | 2019-11-19 | 交叉信息核心技术研究院(西安)有限公司 | A kind of distillation training method and the scalable dynamic prediction method certainly of convolutional neural networks |
CN111368673A (en) * | 2020-02-26 | 2020-07-03 | 华南理工大学 | Method for quickly extracting human body key points based on neural network |
CN111598793A (en) * | 2020-04-24 | 2020-08-28 | 云南电网有限责任公司电力科学研究院 | Method and system for defogging image of power transmission line and storage medium |
CN112016591A (en) * | 2020-08-04 | 2020-12-01 | 杰创智能科技股份有限公司 | Training method of image recognition model and image recognition method |
Non-Patent Citations (1)

Title
---
JING Yu; QI Ruihua; LIU Jianxin; LIU Zhaoxia: "Gesture recognition algorithm based on improved multi-scale deep convolutional network" (基于改进多尺度深度卷积网络的手势识别算法), Computer Science (计算机科学), no. 06
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113507466A (en) * | 2021-07-07 | 2021-10-15 | 浙江大学 | Method and system for defending backdoor attack by knowledge distillation based on attention mechanism |
CN113486990A (en) * | 2021-09-06 | 2021-10-08 | 北京字节跳动网络技术有限公司 | Training method of endoscope image classification model, image classification method and device |
CN113486990B (en) * | 2021-09-06 | 2021-12-21 | 北京字节跳动网络技术有限公司 | Training method of endoscope image classification model, image classification method and device |
Also Published As
Publication number | Publication date |
---|---|
CN112862095B (en) | 2023-09-29 |
Legal Events

Date | Code | Title | Description
---|---|---|---
 | PB01 | Publication | 
 | SE01 | Entry into force of request for substantive examination | 
 | GR01 | Patent grant | 