CN113610069B - Knowledge distillation-based target detection model training method - Google Patents

Knowledge distillation-based target detection model training method Download PDF

Info

Publication number
CN113610069B
CN113610069B CN202111179182.XA CN202111179182A CN113610069B CN 113610069 B CN113610069 B CN 113610069B CN 202111179182 A CN202111179182 A CN 202111179182A CN 113610069 B CN113610069 B CN 113610069B
Authority
CN
China
Prior art keywords
target detection
detection frame
pixel position
label
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111179182.XA
Other languages
Chinese (zh)
Other versions
CN113610069A (en
Inventor
张志嵩
曹松
任必为
宋君
陶海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Vion Intelligent Technology Co ltd
Original Assignee
Beijing Vion Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Vion Intelligent Technology Co ltd filed Critical Beijing Vion Intelligent Technology Co ltd
Priority to CN202111179182.XA priority Critical patent/CN113610069B/en
Publication of CN113610069A publication Critical patent/CN113610069A/en
Application granted granted Critical
Publication of CN113610069B publication Critical patent/CN113610069B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a knowledge distillation-based target detection model training method, which comprises the following steps: training a target detection teacher model using a training sample image set, the training sample image having: a first label: a hard tag probability matrix of the pixel position of the central point of the target detection frame; a second label: width and height of the target detection frame; a third label: the pixel position offset of the center point of the target detection frame; the prediction output result of the target detection teacher model comprises the following steps: the pixel position probability thermodynamic diagram of the center point of the target detection frame, the width and the height of the target detection frame and the pixel position offset of the center point of the target detection frame; and after the loss function of the target detection student model is improved in a knowledge distillation mode, training to generate the target detection student model. The invention solves the problems that the target detection model obtained by training by using the existing knowledge distillation method cannot simultaneously ensure that the network structure is simple and meets the use requirement of terminal equipment, and the recognition rate of the target detection model is excellent so as to ensure the detection precision of the model.

Description

Knowledge distillation-based target detection model training method
Technical Field
The invention relates to the technical field of artificial intelligence model training, in particular to a knowledge distillation-based target detection model training method.
Background
Knowledge distillation is to guide the training of the network structure of a student model by introducing the network structure of a teacher model, thereby realizing knowledge migration. The method comprises the specific steps of firstly training a teacher model, and then training a student model by using the output of the teacher model and the real label of data, so that the knowledge of the network structure of the teacher model is transferred to the network structure of the student model, the network structure of the student model is enabled to be as small as possible and the parameter quantity is less while the network structure of the student model can obtain the performance close to the network structure of the teacher model, and therefore the method is more beneficial to reducing the calculation force requirement on the deployment model and improving the reasoning efficiency of the model.
The terminal device that performs the object detection task is generally a small device such as a video camera, a camera, or a monitor probe, and the size of the network structure of the object detection model is strictly limited because the computational effort of the chip mounted thereon is limited. Although the target detection model obtained by training by using the traditional knowledge distillation method can match the computational power requirement of the terminal equipment in the network structure size; but the accuracy of the obtained target detection model when the target detection task is implemented cannot be guaranteed.
The reason is that the traditional knowledge distillation method is usually used for training a model for implementing a single classification task, and the target detection task implemented by the target detection model adopting the centret network structure simultaneously comprises a classification task and a regression task, so that the network structure of the target detection model is relatively complex, the traditional knowledge distillation method directly replaces a real label part in a loss function of a student model with the output of a teacher model, and does not perform hierarchical classification guidance and optimization on the loss function of the target detection model, so that the finally trained target detection model has the problems of poor recognition effect and low detection precision.
Therefore, how to use the knowledge distillation method to train the target detection model while considering the simple network structure to meet the use requirements of the terminal device and simultaneously ensuring the excellent recognition rate of the target detection model to ensure the detection accuracy of the model becomes a problem to be solved urgently in the prior art.
Disclosure of Invention
The invention mainly aims to provide a knowledge distillation-based target detection model training method, which aims to solve the problems that a target detection model obtained by training with a knowledge distillation method in the prior art cannot simultaneously ensure that a network structure is simple and meets the use requirement of terminal equipment, and the recognition rate of the target detection model is excellent so as to ensure the detection precision of the model.
In order to achieve the above object, the present invention provides a knowledge-based training method for a target detection model, comprising: step S1, training a generation target detection teacher model using a training sample image set, each training sample image in the training sample image set having: a first label: a hard tag probability matrix of the pixel position of the central point of the target detection frame; a second label: width and height of the target detection frame; a third label: the pixel position offset of the center point of the target detection frame; the predicted output results of the target detection teacher model corresponding to the three types of labels include: the pixel position probability thermodynamic diagram of the center point of the target detection frame, the width and the height of the target detection frame and the pixel position offset of the center point of the target detection frame; and step S2, after the loss function of the target detection student model is improved through the target detection teacher model in a knowledge distillation mode, the training sample image set and the prediction output result are used for training to generate the target detection student model.
Further, Loss function Loss of the target detection student modeltotalIs defined as:
Figure 560595DEST_PATH_IMAGE001
…………………………(1),
therein, LosshmA loss function part corresponding to the target detection frame center point pixel position probability thermodynamic diagram output by the target detection student model prediction; losswhA loss function part corresponding to the width and height of a target detection frame output for the target detection student model prediction; lossregA loss function part corresponding to the pixel position offset of the center point of the target detection frame output by the target detection student model prediction; lambda [ alpha ]whWeighting proportion coefficients of loss function parts corresponding to the width and the height of the target detection frame; lambda [ alpha ]regAnd the weight proportion coefficient is the loss function part of the pixel position offset of the central point of the target detection frame.
Further, a Loss function part Loss corresponding to the target detection frame center point pixel position probability thermodynamic diagram output by the target detection student model predictionhmIs defined as:
Figure 766317DEST_PATH_IMAGE002
…………………………(2),
wherein the content of the first and second substances,
Figure 850948DEST_PATH_IMAGE003
converting a hard label probability matrix of a central point pixel position of a target detection frame corresponding to a first label to obtain a sub-loss function guided by a soft label probability matrix of the central point pixel position of the target detection frame;
Figure 567362DEST_PATH_IMAGE004
a sub-loss function guided by a target detection teacher model and a soft label probability matrix of the pixel position of the center point of a target detection frame corresponding to the first label together; lambda [ alpha ]hmAnd the weight proportion coefficient of the sub-loss function is guided by the target detection teacher model and the soft label probability matrix of the target detection frame central point pixel position corresponding to the first label.
Further, the air conditioner is provided with a fan,
Figure 421049DEST_PATH_IMAGE003
as focalloss loss function, sub-loss function
Figure 797672DEST_PATH_IMAGE003
Is defined as:
Figure 228653DEST_PATH_IMAGE005
…………………(3)
Figure 466868DEST_PATH_IMAGE004
as a loss function based on knowledge distillation, a sub-loss function
Figure 722531DEST_PATH_IMAGE004
Is defined as:
Figure 286368DEST_PATH_IMAGE006
………(4),
n is the number of pixel points in a target detection frame center point pixel position probability thermodynamic diagram output by target detection student model prediction;
Figure 735803DEST_PATH_IMAGE007
the probability value of a digital coordinate point (x, y) in the pixel position soft label probability matrix of the central point of the target detection frame is obtained after coordinate transformation is carried out on the pixel position hard label probability matrix of the central point of the target detection frame;
Figure 590758DEST_PATH_IMAGE008
predicting the probability value of a pixel point (x, y) in the pixel position probability thermodynamic diagram of the center point of the target detection frame output by the target detection teacher model;
Figure 684616DEST_PATH_IMAGE009
predicting the probability value of a pixel point (x, y) in a target detection frame center point pixel position probability thermodynamic diagram output by a target detection student model;
Figure 403042DEST_PATH_IMAGE010
and
Figure 683982DEST_PATH_IMAGE011
are all exponential constants.
Further, the hard label probability matrix of the pixel position of the center point of the target detection frame is transformed through a Gaussian kernel function coordinate to obtain a soft label probability matrix of the pixel position of the center point of the target detection frame; probability value of digital coordinate point (x, y) of target detection frame central point pixel position soft label probability matrix
Figure 3013DEST_PATH_IMAGE007
Is the result value G of the Gaussian kernel function; the Gaussian kernel function is:
Figure 951377DEST_PATH_IMAGE012
… … … … … … … … … … (5) where m and n are the pixel positions of the center point of the target detection frameThe abscissa and the ordinate of a digit coordinate point with the probability value of 1 in the hard tag probability matrix; x and y are respectively the abscissa and the ordinate of any one digital coordinate point in the soft label probability matrix of the pixel position of the central point of the target detection frame;
Figure 106284DEST_PATH_IMAGE013
is a scale constant corresponding to the target detection box.
Further, when a plurality of digital coordinate points with the probability value of 1 are arranged in the target detection frame central point pixel position hard label probability matrix, the probability value of each digital coordinate point (x, y) in the target detection frame central point pixel position soft label probability matrix
Figure 405678DEST_PATH_IMAGE007
The largest of the multiple gaussian kernel result values G is taken.
Further, the width and height of the target detection frame output by the target detection student model prediction corresponds to the Loss function part LosswhIs as follows;
Figure 540118DEST_PATH_IMAGE014
… … … … … … … … … … … … … … … … … (6), wherein,
Figure 342989DEST_PATH_IMAGE015
sub-loss functions for width and height guidance of a target detection frame corresponding to the second label;
Figure 668797DEST_PATH_IMAGE016
a sub-loss function which is used for jointly guiding the width and the height of the target detection frame corresponding to the target detection teacher model and the second label;
Figure 189909DEST_PATH_IMAGE017
and the weighting proportion coefficient of the sub-loss function is guided by the width and the height of the target detection frame corresponding to the target detection teacher model and the second label.
Further, a sub-loss function
Figure 393619DEST_PATH_IMAGE015
Is defined as:
Figure 910051DEST_PATH_IMAGE018
……………………………………………………(7)
function of sub-loss
Figure 891914DEST_PATH_IMAGE016
Is defined as:
Figure 884009DEST_PATH_IMAGE019
……………(8),
k is the number of the width and the height of the target detection frame corresponding to the second label in the training sample image; k refers to any one second label in the training sample image;
Figure 609520DEST_PATH_IMAGE020
the product of the width and the height of a target detection frame corresponding to a second label in the training sample image is obtained;
Figure 246037DEST_PATH_IMAGE021
predicting the product of the width and the height of an output target detection frame for the target detection student model;
Figure 149534DEST_PATH_IMAGE022
predicting the product of the width and the height of the output target detection box for the target detection teacher model;
Figure 379658DEST_PATH_IMAGE023
is composed of
Figure 423706DEST_PATH_IMAGE020
And
Figure 55676DEST_PATH_IMAGE021
the L1 distance therebetween;
Figure 926811DEST_PATH_IMAGE024
is composed of
Figure 34444DEST_PATH_IMAGE020
And
Figure 226391DEST_PATH_IMAGE021
the L2 distance therebetween;
Figure 712867DEST_PATH_IMAGE025
is composed of
Figure 958166DEST_PATH_IMAGE022
And
Figure 897303DEST_PATH_IMAGE021
the L2 distance therebetween;
Figure 283154DEST_PATH_IMAGE026
is a first spacing constant.
Further, a Loss function part Loss corresponding to the pixel position offset of the center point of the target detection frame output by the target detection student model predictionregComprises the following steps:
Figure 748770DEST_PATH_IMAGE027
…………………………………(9),
wherein the content of the first and second substances,
Figure 679817DEST_PATH_IMAGE028
a sub-loss function guided by the pixel position offset of the center point of the target detection frame corresponding to the third label;
Figure 122562DEST_PATH_IMAGE029
a sub-loss function which is guided by the target detection teacher model and the target detection frame central point pixel position offset corresponding to the third label together;
Figure 797257DEST_PATH_IMAGE030
and the weight proportion coefficient of the sub-loss function is guided by the target detection teacher model and the target detection frame center point pixel position offset corresponding to the third label.
Further, a sub-loss function
Figure 648539DEST_PATH_IMAGE028
Is defined as:
Figure 875121DEST_PATH_IMAGE031
……………………………………………………(10)
function of sub-loss
Figure 805162DEST_PATH_IMAGE029
Is defined as:
Figure 17968DEST_PATH_IMAGE032
……………(11)
z is the number of the pixel position offsets of the center point of the target detection frame corresponding to the third label in the training sample image; z refers to any one third label in the training sample image;
Figure 582811DEST_PATH_IMAGE033
multiplying the horizontal axis offset and the vertical axis offset of the pixel position offset of the center point of the target detection frame corresponding to the third label in the training sample image;
Figure 590081DEST_PATH_IMAGE034
predicting the product of the horizontal axis offset and the vertical axis offset of the pixel position offset of the center point of the target detection frame output by the target detection student model;
Figure 19137DEST_PATH_IMAGE035
target detection frame center point pixel location for target detection teacher model prediction outputThe product of the horizontal axis offset and the vertical axis offset of the offset;
Figure 770055DEST_PATH_IMAGE036
is composed of
Figure 64770DEST_PATH_IMAGE033
And
Figure 649466DEST_PATH_IMAGE034
the L1 distance therebetween;
Figure 315284DEST_PATH_IMAGE037
is composed of
Figure 728948DEST_PATH_IMAGE033
And
Figure 753536DEST_PATH_IMAGE034
the L2 distance therebetween;
Figure 351876DEST_PATH_IMAGE038
is composed of
Figure 586549DEST_PATH_IMAGE035
And
Figure 679270DEST_PATH_IMAGE034
the L2 distance therebetween;
Figure 309096DEST_PATH_IMAGE039
is a second spacing constant.
By applying the technical scheme of the invention, label classification is carried out on the training sample images of the training sample image set, the target detection tasks of the trained target detection teacher model can be clearly distinguished according to the classified labels, specifically, the probability thermodynamic diagram of the pixel position of the center point of the target detection frame is obtained from the prediction output result of the target detection teacher model, the classification tasks belong to, and the width and the height of the target detection frame and the offset of the pixel position of the center point of the target detection frame are regression tasks. Therefore, in the process of guiding the training of the target detection student model by using the target detection teacher model, the loss function of the target detection student model can be pertinently classified, improved and optimized according to the task type of the target detection task, so that the network structure of the obtained target detection student model is ensured to be simple enough to meet the use requirement of terminal equipment by relying on knowledge distillation, the target detection student model can be better ensured to migrate and acquire the knowledge of the target detection teacher model, the performance of the target detection teacher model is inherited, the target detection student model has excellent identification effect and detection precision, and the target detection student model has good practicability.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the invention and, together with the description, serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 illustrates a flow chart of steps of a knowledge-based distillation target detection model training method according to the present invention;
FIG. 2 is a schematic diagram of training sample images of an alternative embodiment of a set of training sample images showing a target pedestrian whose head is the detection target, selected using a target detection frame, when implementing the knowledge-based distillation target detection model training method of the present invention;
FIG. 3 illustrates a first label, i.e., a hard label probability matrix of the pixel position of the center point of the target detection frame, of the training sample image in FIG. 2;
fig. 4 shows the soft label probability matrix of the pixel position of the center point of the target detection frame after the hard label probability matrix of the pixel position of the center point of the target detection frame in fig. 3 is transformed.
Wherein the figures include the following reference numerals:
A. a target pedestrian; B. a head of the target pedestrian; C. and (5) detecting a target.
Detailed Description
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present invention will be described in detail below with reference to the embodiments with reference to the attached drawings.
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged under appropriate circumstances in order to facilitate the description of the embodiments of the invention herein. Furthermore, the terms "comprises," "comprising," "includes," "including," "has," "having," and any variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements explicitly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The invention provides a knowledge distillation-based target detection model training method, aiming at solving the problems that a target detection model obtained by training by using a knowledge distillation method in the prior art cannot simultaneously ensure that a network structure is simple and meets the use requirement of terminal equipment, and the recognition rate of the target detection model is excellent so as to ensure the detection precision of the model.
FIG. 1 is a flow chart of the steps of a knowledge-based distillation target detection model training method according to an alternative embodiment of the invention. As shown in fig. 1, the target detection model training method includes: step S1, training a generation target detection teacher model using a training sample image set, each training sample image in the training sample image set having: a first label: a hard tag probability matrix of the pixel position of the central point of the target detection frame; a second label: width and height of the target detection frame; a third label: the pixel position offset of the center point of the target detection frame; the predicted output results of the target detection teacher model corresponding to the three types of labels include: the pixel position probability thermodynamic diagram of the center point of the target detection frame, the width and the height of the target detection frame and the pixel position offset of the center point of the target detection frame; and step S2, after the loss function of the target detection student model is improved through the target detection teacher model in a knowledge distillation mode, the training sample image set and the prediction output result are used for training to generate the target detection student model.
The label classification is carried out on the training sample images of the training sample image set, the target detection tasks of the trained target detection teacher model can be clearly distinguished according to the classified labels, specifically, the thermodynamic diagram for obtaining the pixel position probability of the center point of the target detection frame belongs to the classification tasks in the prediction output result of the target detection teacher model, and the regression tasks are obtained for obtaining the width and the height of the target detection frame and obtaining the pixel position offset of the center point of the target detection frame. Therefore, in the process of guiding the training of the target detection student model by using the target detection teacher model, the loss function of the target detection student model can be pertinently classified, improved and optimized according to the task type of the target detection task, so that the network structure of the target detection student model obtained by training is ensured to be simple enough to meet the use requirement of terminal equipment by relying on knowledge distillation, the target detection student model can be better ensured to migrate and acquire the knowledge of the target detection teacher model, the performance of the target detection teacher model is inherited, the target detection student model has excellent identification effect and detection precision, and the target detection student model has good practicability.
Optionally, the thermodynamic diagram for obtaining the pixel position probability of the center point of the target detection frame of the target detection task belongs to a binary task.
It should be explained that before training a target detection teacher model or a target detection student model by using training sample images in a training sample image set, three types of labels need to be labeled on all training sample images, and as shown in fig. 2, for example, only one target pedestrian a exists in a training sample image, and the head B of the target pedestrian is selected by using a target detection frame C in a manual labeling manner.
And then labeling the training sample image by using a preset program, wherein a labeled first label is a hard label probability matrix (shown in figure 3) of the pixel position of the center point of the target detection frame, the numerical probability values of the hard label probability matrix of the pixel position of the center point of the target detection frame correspond to the probability values of the pixel points of the training sample image as the center point of the target detection frame one by one, the numerical probability values are 0 or 1, the numerical coordinate point with the numerical probability value of 1 is the geometric center point of the frame C of the target detection frame, and the other numerical probability values are 0. Of course, when there are a plurality of target pedestrians in the training sample image, the number of the digit coordinate points with the digit probability value of 1 is also a corresponding plurality. In order to ensure that the target detection teacher model and the target detection student model can better learn the characteristic information of the first label in the training sample image so as to improve the detection precision of the models, the hard label probability matrix of the pixel position of the center point of the target detection frame needs to be converted to obtain the soft label probability matrix of the pixel position of the center point of the target detection frame; this is because, although there is only one center point of each target detection frame in the training sample image, the pixel points around the center point still represent the head characteristics of the target pedestrian and should be truly different from the pixel points outside the head, and therefore, the target detection teacher model and the target detection student model can learn more realistic characteristic information in the training sample image by using the target detection frame center point pixel position soft label probability matrix. FIG. 4 is a soft label probability matrix of the pixel position of the center point of the target detection frame obtained after the hard label probability matrix of the pixel position of the center point of the target detection frame in FIG. 3 is transformed; in the figure, the bit probability values of the bit coordinate points adjacent to the bit coordinate point having the bit probability value of 1 are closer to 1 (not shown), and the bit probability values of the bit coordinate points adjacent to the bit coordinate point having the bit probability value of 1 are closer to 0.
In this example, the transformation method of both is: target detection frameThe central point pixel position hard label probability matrix is transformed through a Gaussian kernel function coordinate to obtain a target detection frame central point pixel position soft label probability matrix; probability value of digital coordinate point (x, y) of target detection frame central point pixel position soft label probability matrix
Figure 219283DEST_PATH_IMAGE007
Is the result value G of the Gaussian kernel function; the Gaussian kernel function is:
Figure 551039DEST_PATH_IMAGE012
…………………………(5)
wherein m and n are respectively an abscissa and an ordinate of a digital coordinate point with a probability value of 1 in the hard tag probability matrix of the pixel position of the center point of the target detection frame; the mth column and the nth row of the hard tag probability matrix of the pixel position of the central point of the target detection frame; x and y are respectively the abscissa and the ordinate of any one digital coordinate point in the soft label probability matrix of the pixel position of the central point of the target detection frame; namely, the xth column and the yth row of the soft label probability matrix of the pixel position of the central point of the target detection frame;
Figure 696718DEST_PATH_IMAGE013
is a scale constant corresponding to the target detection box. Optionally, a scale constant of the target detection box
Figure 289374DEST_PATH_IMAGE013
Is in the range of 10 pixels to 80 pixels.
Of course, when there are a plurality of digital coordinate points with a probability value of 1 in the target detection box center point pixel position hard label probability matrix, that is, when there are a plurality of target detection boxes C in fig. 2, the probability value of each digital coordinate point (x, y) in the target detection box center point pixel position soft label probability matrix
Figure 245828DEST_PATH_IMAGE007
The largest of the multiple gaussian kernel result values G is taken.
The second label labeled on the training sample image is the width and height of the target detection frame (not shown), and the third label labeled on the training sample image is the pixel position offset of the center point of the target detection frame (not shown).
In this embodiment, the Loss function Loss of the object detection student modeltotalIs defined as:
Figure 189514DEST_PATH_IMAGE001
… … … … … … … … … … (1), wherein,
Losshma loss function part corresponding to the target detection frame center point pixel position probability thermodynamic diagram output by the target detection student model prediction; losswhA loss function part corresponding to the width and height of a target detection frame output for the target detection student model prediction; lossregA loss function part corresponding to the pixel position offset of the center point of the target detection frame output by the target detection student model prediction; lambda [ alpha ]whWeighting proportion coefficients of loss function parts corresponding to the width and the height of the target detection frame; lambda [ alpha ]regAnd the weight proportion coefficient is the loss function part of the pixel position offset of the central point of the target detection frame.
Optionally, the weight scale factor λ of the loss function part corresponding to the width and height of the target detection framewhAnd weight proportion coefficient lambda of loss function part of central point pixel position offset of target detection frameregThe value ranges of the target detection frames are [0.5,1 ], which shows that the probability thermodynamic diagram of the pixel position of the center point of the target detection frame output by the target detection student model in prediction occupies the largest weight and is the most key factor influencing the later detection precision of the target detection student model.
Optionally, the weight scale factor λ of the loss function part corresponding to the width and height of the target detection framewhWeight proportion coefficient lambda of loss function part larger than pixel position offset of central point of target detection framereg. This is because the target detection student model post-detection accuracy is more heavily influenced by the width and height of the target detection frame than the target detection frame center point pixel position offset amount.
In particular, Loss function Loss of the object detection student modeltotalThe first part of the grading is a Loss function part Loss corresponding to a target detection frame center point pixel position probability thermodynamic diagram output by target detection student model predictionhmThe Loss function of the classification task is optimized and improved through knowledge distillation, and the corresponding Loss function part Loss ishmIs defined as:
Figure 640349DEST_PATH_IMAGE002
…………………………(2),
wherein the content of the first and second substances,
Figure 494035DEST_PATH_IMAGE003
converting a hard label probability matrix of a central point pixel position of a target detection frame corresponding to a first label to obtain a sub-loss function guided by a soft label probability matrix of the central point pixel position of the target detection frame;
Figure 214867DEST_PATH_IMAGE004
a sub-loss function guided by a target detection teacher model and a soft label probability matrix of the pixel position of the center point of a target detection frame corresponding to the first label together; lambda [ alpha ]hmAnd the weight proportion coefficient of the sub-loss function is guided by the target detection teacher model and the soft label probability matrix of the target detection frame central point pixel position corresponding to the first label.
Optionally, a sub-loss function
Figure 36061DEST_PATH_IMAGE004
Is a weight scale factor ofhmThe value range of (1) is [0.5 ], so that the value range of the target detection frame is ensured not to exceed a sub-loss function guided by a soft label probability matrix of the pixel position of the central point of the target detection frame
Figure 352904DEST_PATH_IMAGE003
The weight of (c).
It should be noted that, in this embodiment, the target detection teacher model does not show the target detection frame center point pixel position probability thermodynamic diagrams of the target detection student model prediction output and the target detection teacher model prediction output, but the ideal training state of the model is to expect that the target detection frame center point pixel position probability matrixes corresponding to the target detection frame center point pixel position probability thermodynamic diagrams of both prediction outputs learn the soft label probability matrix close to the target detection frame center point pixel position in fig. 4, thereby ensuring that both the target detection teacher model and the target detection student model have good detection accuracy.
Sub-loss function guided by soft label probability matrix of pixel position of center point of target detection frame
Figure 857835DEST_PATH_IMAGE003
And the method is used for evaluating the difference between a target detection frame central point pixel position probability matrix corresponding to the target detection frame central point pixel position probability thermodynamic diagram output by the target detection student model prediction and a target detection frame central point pixel position soft label probability matrix.
In the present embodiment, the first and second electrodes are,
Figure 563334DEST_PATH_IMAGE003
is focalloss loss function, which is mainly used for balancing the problems of unbalance between positive and negative samples and difficult sample occurrence in the detection task, and sub-loss function
Figure 153715DEST_PATH_IMAGE003
Is defined as:
Figure 54675DEST_PATH_IMAGE005
…………………(3)
Figure 368107DEST_PATH_IMAGE004
a loss function based on knowledge distillation for evaluating a difference in distribution between a predicted output of the object detection student model and a predicted output of the object detection teacher model, compared to a soft scale of a pixel position of a center point of the object detection frameSub-loss function guided by probability matrix of signatures
Figure 837265DEST_PATH_IMAGE003
Function of sub-loss
Figure 977260DEST_PATH_IMAGE004
Increase is provided with
Figure 72124DEST_PATH_IMAGE040
And
Figure 754909DEST_PATH_IMAGE041
output distribution, sub-loss function after network structure of teacher model is detected for guiding network structure of student model to be detected
Figure 87977DEST_PATH_IMAGE004
Is defined as:
Figure 839901DEST_PATH_IMAGE006
………(4)
n is the number of pixel points in a target detection frame center point pixel position probability thermodynamic diagram output by target detection student model prediction;
Figure 974342DEST_PATH_IMAGE007
the probability value of a digital coordinate point (x, y) in the pixel position soft label probability matrix of the central point of the target detection frame is obtained after coordinate transformation is carried out on the pixel position hard label probability matrix of the central point of the target detection frame;
Figure 511633DEST_PATH_IMAGE008
predicting the probability value of a pixel point (x, y) in the pixel position probability thermodynamic diagram of the center point of the target detection frame output by the target detection teacher model;
Figure 729119DEST_PATH_IMAGE009
target detection frame center point pixel location for target detection student model prediction outputProbability values of the pixel points (x, y) in the probability thermodynamic diagram;
Figure 250230DEST_PATH_IMAGE010
and
Figure 421318DEST_PATH_IMAGE011
are all exponential constants.
In the above formula (3) and formula (4),
Figure 78695DEST_PATH_IMAGE042
and
Figure 14552DEST_PATH_IMAGE043
the weight coefficients of the difficult samples are increased, and the larger the deviation of the prediction output of the target detection student model is, the larger the two weight coefficients are.
Figure 882014DEST_PATH_IMAGE044
Is a weighting factor used to adjust the fraction of negative samples lost, the more negative samples deviate from the target, the greater the weighting factor. Alternatively,
Figure 856792DEST_PATH_IMAGE010
and
Figure 634255DEST_PATH_IMAGE011
has a value range of [2,4 ]]。
Loss function Loss of target detection student modeltotalThe second part of the hierarchy is a Loss function part Loss corresponding to the width and height of an object detection box output by the object detection student model predictionwhThe Loss function of the part of the regression task is optimized and improved through knowledge distillation, and the corresponding Loss function part Loss iswhThe combined L1 and L2 loss functions are defined as:
Figure 85222DEST_PATH_IMAGE045
……………………………………………(6),
wherein the content of the first and second substances,
Figure 174400DEST_PATH_IMAGE015
sub-loss functions for width and height guidance of a target detection frame corresponding to the second label;
Figure 454334DEST_PATH_IMAGE016
a sub-loss function which is used for jointly guiding the width and the height of the target detection frame corresponding to the target detection teacher model and the second label;
Figure 555145DEST_PATH_IMAGE017
and the weighting proportion coefficient of the sub-loss function is guided by the width and the height of the target detection frame corresponding to the target detection teacher model and the second label.
Optionally, a sub-loss function
Figure 331340DEST_PATH_IMAGE016
Weight scaling factor of
Figure 173395DEST_PATH_IMAGE017
Is in a value range of [0.5, 1), so as to ensure that the value does not exceed the wide and high sub-loss functions of the target detection frame corresponding to the second label
Figure 3159DEST_PATH_IMAGE015
The weight of (c).
Further, a sub-loss function
Figure 958476DEST_PATH_IMAGE015
As part of the L1 loss function, its calculation formula is defined as:
Figure 843256DEST_PATH_IMAGE046
……………………………………………………(7)
function of sub-loss
Figure 31661DEST_PATH_IMAGE016
As a loss function of L2Part of the number, the calculation formula of which is defined as:
Figure 902665DEST_PATH_IMAGE019
……………(8)
k is the number of the width and the height of the target detection frame corresponding to the second label in the training sample image; k refers to any one second label in the training sample image;
Figure 728800DEST_PATH_IMAGE020
the product of the width and the height of a target detection frame corresponding to a second label in the training sample image is obtained;
Figure 784481DEST_PATH_IMAGE021
predicting the product of the width and the height of an output target detection frame for the target detection student model;
Figure 210914DEST_PATH_IMAGE022
predicting the product of the width and the height of the output target detection box for the target detection teacher model;
Figure 869298DEST_PATH_IMAGE023
is composed of
Figure 923841DEST_PATH_IMAGE020
And
Figure 25790DEST_PATH_IMAGE021
the L1 distance therebetween;
Figure 424672DEST_PATH_IMAGE024
is composed of
Figure 637479DEST_PATH_IMAGE020
And
Figure 812108DEST_PATH_IMAGE021
the L2 distance therebetween;
Figure 68646DEST_PATH_IMAGE025
is composed of
Figure 469672DEST_PATH_IMAGE022
And
Figure 79644DEST_PATH_IMAGE021
the L2 distance therebetween;
Figure 458DEST_PATH_IMAGE026
is a first spacing constant.
By determining the difference between the predicted output of the target detection student model and the second label of the originally input training sample image
Figure 37684DEST_PATH_IMAGE047
Greater than the difference between the predicted output of the target detection student model and the predicted output of the target detection teacher model
Figure 926006DEST_PATH_IMAGE048
And exceeds the first spacing constant
Figure 729883DEST_PATH_IMAGE026
Then the loss of L2 that would add a second label to the object detection student model.
Optionally, a first spacing constant
Figure 754471DEST_PATH_IMAGE026
Has a value range of [10,20 ]]。
Loss function Loss of target detection student modeltotalThe third part of the hierarchy is a Loss function part Loss corresponding to the pixel position offset of the center point of the target detection frame output by the prediction of the target detection student modelregThe Loss function of the part of the regression task is optimized and improved through knowledge distillation, and the corresponding Loss function part Loss isregThe combined L1 and L2 loss functions are defined as:
Figure 493756DEST_PATH_IMAGE027
……………………………………………(9),
wherein the content of the first and second substances,
Figure 354527DEST_PATH_IMAGE049
a sub-loss function guided by the pixel position offset of the center point of the target detection frame corresponding to the third label;
Figure 447248DEST_PATH_IMAGE029
a sub-loss function which is guided by the target detection teacher model and the target detection frame central point pixel position offset corresponding to the third label together;
Figure 450976DEST_PATH_IMAGE030
and the weight proportion coefficient of the sub-loss function is guided by the target detection teacher model and the target detection frame center point pixel position offset corresponding to the third label.
Optionally, a sub-loss function
Figure 220218DEST_PATH_IMAGE029
Weight scaling factor of
Figure 817553DEST_PATH_IMAGE030
Is [0.5, 1), ensures that the sub-loss function does not exceed the sub-loss function guided by the pixel position offset of the center point of the target detection frame corresponding to the third label
Figure 573019DEST_PATH_IMAGE028
The weight of (c). It should be noted that the offset of the pixel position of the center point of the target detection frame is the difference between the pixel coordinate position of the center point of the target detection frame output by the target detection student model in prediction and the actual position in the training sample image.
Further, a sub-loss function
Figure 322931DEST_PATH_IMAGE028
As part of the L1 loss function, its calculation formula is defined as:
Figure 13807DEST_PATH_IMAGE050
…………………………………………………(10)
function of sub-loss
Figure 223071DEST_PATH_IMAGE029
As part of the L2 loss function, its calculation formula is defined as:
Figure 641283DEST_PATH_IMAGE051
……………(11)
z is the number of the pixel position offsets of the center point of the target detection frame corresponding to the third label in the training sample image; z refers to any one third label in the training sample image;
Figure 494970DEST_PATH_IMAGE033
multiplying the horizontal axis offset and the vertical axis offset of the pixel position offset of the center point of the target detection frame corresponding to the third label in the training sample image;
Figure 746960DEST_PATH_IMAGE034
predicting the product of the horizontal axis offset and the vertical axis offset of the pixel position offset of the center point of the target detection frame output by the target detection student model;
Figure 804040DEST_PATH_IMAGE035
the product of the horizontal axis offset and the vertical axis offset of the pixel position offset of the center point of the target detection frame output by the target detection teacher model in a prediction mode;
Figure 42254DEST_PATH_IMAGE052
is composed of
Figure 875081DEST_PATH_IMAGE033
And
Figure 157026DEST_PATH_IMAGE034
the L1 distance therebetween;
Figure 216249DEST_PATH_IMAGE037
is composed of
Figure 117209DEST_PATH_IMAGE033
And
Figure 430641DEST_PATH_IMAGE034
the L2 distance therebetween;
Figure 758854DEST_PATH_IMAGE053
is composed of
Figure 39794DEST_PATH_IMAGE035
And
Figure 869079DEST_PATH_IMAGE034
the L2 distance therebetween;
Figure 410918DEST_PATH_IMAGE039
is a second spacing constant.
Predicting a gap between an output and a third label of an original input training sample image by judging a target detection student model
Figure 785399DEST_PATH_IMAGE054
Greater than the difference between the predicted output of the target detection student model and the predicted output of the target detection teacher model
Figure 678269DEST_PATH_IMAGE055
And exceeds the second spacing constant
Figure 269832DEST_PATH_IMAGE039
Then the loss of L2 that would add a third label to the object detection student model.
Optionally, a second spacing constant
Figure 72703DEST_PATH_IMAGE056
Has a value range of [0.01,0.05 ]]。
It should be noted that, the network structure of the target detection teacher model and the network structure of the target detection student model both adopt hourglass network structures, and the difference is that the network depth and width of the network structure of the target detection teacher model are both greater than those of the network structure of the target detection student model, and the number of parameters of the network structure of the target detection teacher model is 5-10 times that of the network structure of the target detection student model. The recall rate and the detection precision of the target detection student model trained by the knowledge distillation-based target detection model training method provided by the invention are superior to those of the target detection student model trained by a general knowledge distillation training mode.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
The integrated unit in the above embodiments, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in the above computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing one or more computer devices (which may be personal computers, servers, network devices, etc.) to execute all or part of the steps of the method according to the embodiments of the present invention.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed client may be implemented in other manners. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (8)

1. A knowledge distillation-based target detection model training method is characterized by comprising the following steps:
step S1, training a generation target detection teacher model using a training sample image set, each training sample image in the training sample image set having: a first label: a hard tag probability matrix of the pixel position of the central point of the target detection frame; a second label: width and height of the target detection frame; a third label: the pixel position offset of the center point of the target detection frame; the prediction output results of the target detection teacher model corresponding to the three types of labels include: the pixel position probability thermodynamic diagram of the center point of the target detection frame, the width and the height of the target detection frame and the pixel position offset of the center point of the target detection frame;
step S2, after a loss function of the target detection student model is improved through the target detection teacher model in a knowledge distillation mode, the training sample image set and the prediction output result are used for training to generate a target detection student model;
loss function Loss of the target detection student modeltotalIs defined as:
Losstotal=LosshmwhLosswhregLossrega.9... (1), wherein,
Losshma loss function part corresponding to the target detection frame center point pixel position probability thermodynamic diagram output by the target detection student model prediction;
Losswha loss function part corresponding to the width and height of the target detection frame output by the target detection student model prediction;
Lossrega loss function part corresponding to the pixel position offset of the center point of the target detection frame output by the target detection student model prediction;
λwhweighting proportion coefficients of loss function parts corresponding to the width and the height of the target detection frame;
λregthe weight proportion coefficient is a loss function part of the pixel position offset of the central point of the target detection frame;
loss function part Loss corresponding to target detection frame center point pixel position probability thermodynamic diagram output by target detection student model predictionhmIs defined as:
Figure FDA0003363315850000011
wherein the content of the first and second substances,
Figure FDA0003363315850000012
converting the hard tag probability matrix of the target detection frame central point pixel position corresponding to the first tag to obtain the target detection frame central point pixel positionA soft label probability matrix guided sub-loss function;
Figure FDA0003363315850000013
a sub-loss function guided by a target detection teacher model and a soft label probability matrix of the pixel position of the center point of a target detection frame corresponding to the first label together;
λhmand the weight proportion coefficient of the sub-loss function is guided by the target detection teacher model and the soft label probability matrix of the target detection frame central point pixel position corresponding to the first label.
2. The knowledge-based distillation target detection model training method according to claim 1,
Figure FDA0003363315850000021
is focalloss loss function, said sub-loss function
Figure FDA0003363315850000022
Is defined as:
Figure FDA0003363315850000023
Figure FDA0003363315850000024
the sub-loss functions are loss functions based on knowledge distillation
Figure FDA0003363315850000025
Is defined as:
Figure FDA0003363315850000026
n is the number of pixel points in a target detection frame center point pixel position probability thermodynamic diagram output by the target detection student model prediction;
Hxythe probability value of a digital coordinate point (x, y) in the target detection frame central point pixel position soft label probability matrix is obtained after coordinate transformation is carried out on the target detection frame central point pixel position hard label probability matrix;
Figure FDA0003363315850000027
predicting the probability value of a pixel point (x, y) in the pixel position probability thermodynamic diagram of the center point of the target detection frame output by the target detection teacher model;
Figure FDA0003363315850000028
predicting the probability value of a pixel point (x, y) in a target detection frame center point pixel position probability thermodynamic diagram output by a target detection student model;
both α and β are exponential constants.
3. The knowledge distillation-based target detection model training method as claimed in claim 1, wherein the hard label probability matrix of the pixel position of the center point of the target detection frame is transformed by a gaussian kernel function coordinate to obtain a soft label probability matrix of the pixel position of the center point of the target detection frame; probability value H of digital coordinate point (x, y) of soft label probability matrix of pixel position of center point of target detection framexyIs the result value G of the Gaussian kernel function; the Gaussian kernel function is:
Figure FDA0003363315850000029
wherein m and n are respectively an abscissa and an ordinate of a digital coordinate point with a probability value of 1 in the hard tag probability matrix of the pixel position of the center point of the target detection frame;
x and y are respectively the abscissa and the ordinate of any one digital coordinate point in the soft label probability matrix of the pixel position of the central point of the target detection frame;
σpis a scale constant corresponding to the target detection box.
4. The knowledge-based distillation target detection model training method according to claim 3,
when a plurality of digital coordinate points with the probability value of 1 in the target detection frame central point pixel position hard label probability matrix are provided, the probability value H of each digital coordinate point (x, y) in the target detection frame central point pixel position soft label probability matrixxyThe largest of the multiple gaussian kernel result values G is taken.
5. The knowledge distillation-based target detection model training method as claimed in claim 1, wherein the target detection student model predicts the Loss function part Loss corresponding to the width and height of the output target detection boxwhIs as follows;
Figure FDA0003363315850000031
wherein the content of the first and second substances,
Figure FDA0003363315850000032
sub-loss functions for width and height guidance of a target detection frame corresponding to the second label;
Figure FDA0003363315850000033
a sub-loss function which is used for jointly guiding the width and the height of the target detection frame corresponding to the target detection teacher model and the second label;
λrand the weighting proportion coefficient of the sub-loss function is guided by the width and the height of the target detection frame corresponding to the target detection teacher model and the second label.
6. The knowledge-based distillation target detection model training method according to claim 5,
said sub-loss function
Figure FDA0003363315850000034
Is defined as:
Figure FDA0003363315850000035
said sub-loss function
Figure FDA0003363315850000036
Is defined as:
Figure FDA0003363315850000037
k is the number of the width and the height of a target detection frame corresponding to a second label in the training sample image;
k refers to any one of the second labels in the training sample image;
Skthe product of the width and the height of a target detection frame corresponding to a second label in the training sample image is obtained;
Figure FDA0003363315850000038
predicting a product of a width and a height of an output object detection box for the object detection student model;
Figure FDA0003363315850000039
predicting a product of a width and a height of an output target detection box for the target detection teacher model;
Figure FDA00033633158500000310
is SkAnd
Figure FDA00033633158500000311
the L1 distance therebetween;
Figure FDA00033633158500000312
is SkAnd
Figure FDA00033633158500000313
the L2 distance therebetween;
Figure FDA00033633158500000314
is composed of
Figure FDA00033633158500000315
And
Figure FDA00033633158500000316
the L2 distance therebetween;
η is a first spacing constant.
7. The knowledge distillation-based target detection model training method as claimed in claim 1, wherein the target detection student model predicts the Loss function part Loss corresponding to the target detection frame center point pixel position offset outputregComprises the following steps:
Figure FDA0003363315850000041
wherein the content of the first and second substances,
Figure FDA0003363315850000042
a sub-loss function guided by the pixel position offset of the center point of the target detection frame corresponding to the third label;
Figure FDA0003363315850000043
a sub-loss function which is guided by the target detection teacher model and the target detection frame central point pixel position offset corresponding to the third label together;
λqand the weight proportion coefficient of the sub-loss function is guided by the target detection teacher model and the target detection frame center point pixel position offset corresponding to the third label.
8. The knowledge-based distillation target detection model training method according to claim 7,
said sub-loss function
Figure FDA0003363315850000044
Is defined as:
Figure FDA0003363315850000045
said sub-loss function
Figure FDA0003363315850000046
Is defined as:
Figure FDA0003363315850000047
z is the number of the pixel position offsets of the center point of the target detection frame corresponding to the third label in the training sample image;
z refers to any one of the third labels in the training sample image;
Tzmultiplying the horizontal axis offset and the vertical axis offset of the pixel position offset of the center point of the target detection frame corresponding to the third label in the training sample image;
Figure FDA0003363315850000048
predicting the product of the horizontal axis offset and the vertical axis offset of the pixel position offset of the center point of the target detection frame output by the target detection student model;
Figure FDA0003363315850000049
the product of the horizontal axis offset and the vertical axis offset of the pixel position offset of the center point of the target detection frame output by the target detection teacher model in a prediction mode;
Figure FDA00033633158500000410
is TzAnd
Figure FDA00033633158500000411
the L1 distance therebetween;
Figure FDA00033633158500000412
is TzAnd
Figure FDA00033633158500000413
the L2 distance therebetween;
Figure FDA0003363315850000051
is composed of
Figure FDA0003363315850000052
And
Figure FDA0003363315850000053
the L2 distance therebetween;
and omega is a second spacing constant.
CN202111179182.XA 2021-10-11 2021-10-11 Knowledge distillation-based target detection model training method Active CN113610069B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111179182.XA CN113610069B (en) 2021-10-11 2021-10-11 Knowledge distillation-based target detection model training method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111179182.XA CN113610069B (en) 2021-10-11 2021-10-11 Knowledge distillation-based target detection model training method

Publications (2)

Publication Number Publication Date
CN113610069A CN113610069A (en) 2021-11-05
CN113610069B true CN113610069B (en) 2022-02-08

Family

ID=78343524

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111179182.XA Active CN113610069B (en) 2021-10-11 2021-10-11 Knowledge distillation-based target detection model training method

Country Status (1)

Country Link
CN (1) CN113610069B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114119959A (en) * 2021-11-09 2022-03-01 盛视科技股份有限公司 Vision-based garbage can overflow detection method and device
CN115512131B (en) * 2022-10-11 2024-02-13 北京百度网讯科技有限公司 Image detection method and training method of image detection model
CN115496666A (en) * 2022-11-02 2022-12-20 清智汽车科技(苏州)有限公司 Heatmap generation method and apparatus for target detection
CN115984640B (en) * 2022-11-28 2023-06-23 北京数美时代科技有限公司 Target detection method, system and storage medium based on combined distillation technology
CN118154992A (en) * 2024-05-09 2024-06-07 中国科学技术大学 Medical image classification method, device and storage medium based on knowledge distillation

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021189912A1 (en) * 2020-09-25 2021-09-30 平安科技(深圳)有限公司 Method and apparatus for detecting target object in image, and electronic device and storage medium

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180268292A1 (en) * 2017-03-17 2018-09-20 Nec Laboratories America, Inc. Learning efficient object detection models with knowledge distillation
CN110674688B (en) * 2019-08-19 2023-10-31 深圳力维智联技术有限公司 Face recognition model acquisition method, system and medium for video monitoring scene
CN110991556B (en) * 2019-12-16 2023-08-15 浙江大学 Efficient image classification method, device, equipment and medium based on multi-student cooperative distillation
CN112418268B (en) * 2020-10-22 2024-07-12 北京迈格威科技有限公司 Target detection method and device and electronic equipment
CN112367273B (en) * 2020-10-30 2023-10-31 上海瀚讯信息技术股份有限公司 Flow classification method and device of deep neural network model based on knowledge distillation
CN112508169A (en) * 2020-11-13 2021-03-16 华为技术有限公司 Knowledge distillation method and system
CN112257815A (en) * 2020-12-03 2021-01-22 北京沃东天骏信息技术有限公司 Model generation method, target detection method, device, electronic device, and medium
CN112990198B (en) * 2021-03-22 2023-04-07 华南理工大学 Detection and identification method and system for water meter reading and storage medium
CN113011356A (en) * 2021-03-26 2021-06-22 杭州朗和科技有限公司 Face feature detection method, device, medium and electronic equipment
CN113139500B (en) * 2021-05-10 2023-10-20 重庆中科云从科技有限公司 Smoke detection method, system, medium and equipment
CN113361384A (en) * 2021-06-03 2021-09-07 深圳前海微众银行股份有限公司 Face recognition model compression method, device, medium, and computer program product
CN113326852A (en) * 2021-06-11 2021-08-31 北京百度网讯科技有限公司 Model training method, device, equipment, storage medium and program product

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021189912A1 (en) * 2020-09-25 2021-09-30 平安科技(深圳)有限公司 Method and apparatus for detecting target object in image, and electronic device and storage medium

Also Published As

Publication number Publication date
CN113610069A (en) 2021-11-05

Similar Documents

Publication Publication Date Title
CN113610069B (en) Knowledge distillation-based target detection model training method
CN109086811B (en) Multi-label image classification method and device and electronic equipment
CN114241282A (en) Knowledge distillation-based edge equipment scene identification method and device
CN109978893A (en) Training method, device, equipment and the storage medium of image, semantic segmentation network
CN110490136B (en) Knowledge distillation-based human behavior prediction method
CN108133172A (en) Method, the analysis method of vehicle flowrate and the device that Moving Objects are classified in video
CN103942749B (en) A kind of based on revising cluster hypothesis and the EO-1 hyperion terrain classification method of semi-supervised very fast learning machine
CN112180471B (en) Weather forecasting method, device, equipment and storage medium
CN111368634B (en) Human head detection method, system and storage medium based on neural network
CN110210625A (en) Modeling method, device, computer equipment and storage medium based on transfer learning
CN110175657B (en) Image multi-label marking method, device, equipment and readable storage medium
CN113128478A (en) Model training method, pedestrian analysis method, device, equipment and storage medium
CN110969200A (en) Image target detection model training method and device based on consistency negative sample
CN112541639A (en) Recommendation system scoring prediction method based on graph neural network and attention mechanism
CN110781970A (en) Method, device and equipment for generating classifier and storage medium
CN110457523A (en) The choosing method of cover picture, the training method of model, device and medium
CN114782752B (en) Small sample image integrated classification method and device based on self-training
CN110263808B (en) Image emotion classification method based on LSTM network and attention mechanism
CN115439192A (en) Medical commodity information pushing method and device, storage medium and computer equipment
CN113065533B (en) Feature extraction model generation method and device, electronic equipment and storage medium
CN110399813A (en) A kind of age recognition methods, device, electronic equipment and storage medium
CN114332457A (en) Image instance segmentation model training method, image instance segmentation method and device
CN116151479B (en) Flight delay prediction method and prediction system
CN116894593A (en) Photovoltaic power generation power prediction method and device, electronic equipment and storage medium
CN110909645A (en) Crowd counting method based on semi-supervised manifold embedding

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant