CN111639744B

CN111639744B - Training method and device for student model and electronic equipment

Info

Publication number: CN111639744B
Application number: CN202010297966.1A
Authority: CN
Inventors: 曾凡高; 张有才; 危夷晨
Original assignee: Beijing Megvii Technology Co Ltd
Current assignee: Beijing Megvii Technology Co Ltd
Priority date: 2020-04-15
Filing date: 2020-04-15
Publication date: 2023-09-22
Anticipated expiration: 2040-04-15
Also published as: CN111639744A

Abstract

The invention provides a training method and device for a student model and electronic equipment, which relate to the field of artificial intelligence, wherein the student model learns to a trained teacher model in a knowledge distillation mode, and the student model and the teacher model are both object detection models, and the method comprises the following steps: acquiring a candidate sample area of a training sample; extracting features of candidate sample areas of the training sample through the student model and the teacher model respectively to obtain first features extracted by the student model and second features extracted by the teacher model; acquiring the confidence coefficient of the first feature; determining a distillation loss between the student model and the teacher model based on the first feature, the second feature, and the confidence level of the first feature; parameters of the student model are updated based on the distillation loss. According to the invention, the student model can update parameters to different degrees according to different samples, so that the trained student model has more excellent performance, and the object detection effect is improved.

Description

Training method and device for student model and electronic equipment

Technical Field

The invention relates to the field of artificial intelligence, in particular to a training method and device for a student model and electronic equipment.

Background

Knowledge distillation is a common model compression method, and in a teacher-student framework, the characteristic representation 'knowledge' learned by a teacher model with high learning ability is distilled out and transmitted to a student network with small parameters and weak learning ability. Since the number of samples is usually large in knowledge distillation of object detection, but the quality of the samples is poor, such as the samples may include dirty samples or too difficult samples, if one taste requires that the student model be imitated on all the samples, the performance of the student model is seriously affected, and the distillation effect of the student model in the training process is poor, so that the detection effect of the student model in the object detection is poor.

Disclosure of Invention

Accordingly, the invention aims to provide a training method, a training device and electronic equipment for a student model, which enable the student model to update parameters to different degrees according to different samples, and enable the trained student model to have more excellent performance, so that the object detection effect is improved.

In order to achieve the above object, the technical scheme adopted by the embodiment of the invention is as follows:

in a first aspect, an embodiment of the present invention provides a training method for a student model, where the student model learns to a trained teacher model in a knowledge distillation manner, and the student model and the teacher model are both object detection models, where the method includes: acquiring a candidate sample area of a training sample; extracting features of candidate sample areas of the training sample through the student model and the teacher model respectively to obtain first features extracted by the student model and second features extracted by the teacher model; acquiring the confidence coefficient of the first feature; determining a distillation loss between the student model and the teacher model based on the first feature, the second feature, and the confidence level of the first feature; parameters of the student model are updated based on the distillation loss.

Further, the step of obtaining the confidence level of the first feature includes: inputting the first features into a variance generating network to obtain variances of the first features output by the variance generating network, and representing the confidence of the first features through the variances; wherein the variance generating network comprises a convolution layer and/or a full connection layer, and the variance is inversely related to the confidence.

Further, the step of determining a distillation loss between the student model and the teacher model based on the first feature, the second feature, and the confidence level of the first feature, comprises: determining the distillation loss between the student model and the teacher model according to the following formula:

wherein d is the feature dimension, and N is the number of samples;is a first feature; />Is a second feature;/>is the variance.

Further, the step of updating parameters of the student model based on the distillation loss includes: acquiring task loss of a student model for executing an object detection task; and updating parameters of the student model according to the task loss and the distillation loss.

Further, the step of obtaining a candidate sample region of the training sample includes: and inputting the training sample into a candidate region extraction network to obtain a candidate sample region.

Further, the step of obtaining a candidate sample region of the training sample includes: and determining candidate sample areas of the training samples according to the labeling information carrying the truth boxes.

Further, the method also comprises the following steps: inputting the image to be detected into a trained student model, and carrying out object detection on the image to be detected based on the trained student model to obtain an object detection result.

In a second aspect, an embodiment of the present invention further provides a training device for a student model, where the student model learns to a trained teacher model in a knowledge distillation manner, and the student model and the teacher model are both object detection models, where the device includes: the acquisition module is used for acquiring candidate sample areas of the training samples; the feature extraction module is used for extracting features of candidate sample areas of the training sample through the student model and the teacher model respectively to obtain first features extracted by the student model and second features extracted by the teacher model; the confidence coefficient acquisition module is used for acquiring the confidence coefficient of the first feature; a distillation loss determination module for determining a distillation loss between the student model and the teacher model based on the first feature, the second feature, and the confidence level of the first feature; and the parameter updating module is used for updating the parameters of the student model based on distillation loss.

In a third aspect, an embodiment of the present invention provides an electronic device, including: a processor and a storage device; the storage means has stored thereon a computer program which, when executed by a processor, performs a method according to any of the above embodiments.

In a fourth aspect, embodiments of the present invention provide a computer-readable storage medium having a computer program stored thereon, which when executed by a processor performs the steps of the method according to any of the above embodiments.

The embodiment of the invention provides a training method, a training device and electronic equipment for a student model, wherein the student model learns from a trained teacher model in a knowledge distillation mode, and the student model and the teacher model are object detection models. According to the method, the influence of sample quality on the student model can be determined by acquiring the confidence coefficient of the first feature extracted by the student model aiming at the training sample before training, and the determination of the distillation loss is related to the confidence coefficient of the feature extracted by the student model aiming at the training sample, so that the student model is trained based on the distillation loss, the confidence coefficient of the feature extracted by the student model is different, the distillation loss is different, the corresponding model parameter updating effect is also different, namely, the knowledge migration intensity aiming at different training samples in the training process of the student model is different, the student model can carry out self-adaptive knowledge migration aiming at different samples, and the method enables the student model to carry out parameter updating of different degrees aiming at different samples, so that the trained student model has more excellent performance, and the object detection effect is improved.

Additional features and advantages of embodiments of the invention will be set forth in the description which follows, or in part will be obvious from the description, or may be learned by practice of the embodiments of the invention.

In order to make the above objects, features and advantages of the present invention more comprehensible, preferred embodiments accompanied with figures are described in detail below.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.

Fig. 1 shows a block diagram of an electronic device according to an embodiment of the present invention;

FIG. 2 is a flowchart of a training method for a student model according to an embodiment of the present invention;

FIG. 3 is a flowchart of another method for training a student model according to an embodiment of the invention;

FIG. 4 is a schematic view showing a structure of a distillation frame according to an embodiment of the present invention;

fig. 5 shows a block diagram of a training device for a student model according to an embodiment of the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the present invention will be described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments.

Currently, in the scheme of applying knowledge distillation to object detection, the location of distillation (such as the location of a feature map, the area on the feature map, the probability of classification, etc.) is usually explored, but all distilled samples are at the same time. Considering that the sample quality of the object detection task has a large difference, the number of low-quality samples (such as blurred pictures and the like) is far more than that of other visual tasks (such as face recognition), and the performance of the student model is seriously affected by one student model which is required to be imitated on all the samples, so that the distillation method of the object detection is poor. In order to solve the problem, the embodiment of the invention provides a training method and device for a student model and electronic equipment, and the embodiment of the invention is described in detail below.

Embodiment one:

first, an example electronic device 100 of an electronic device, an apparatus, and a training method of a student model for implementing an embodiment of the present invention are described with reference to fig. 1.

As shown in fig. 1, an electronic device 100 includes one or more processors 102, one or more storage devices 104, an input device 106, an output device 108, and an image capture device 110, which are interconnected by a bus system 112 and/or other forms of connection mechanisms (not shown). It should be noted that the components and structures of the electronic device 100 shown in fig. 1 are exemplary only and not limiting, as the electronic device may have other components and structures as desired.

The processor 102 may be implemented in at least one hardware form of a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), the processor 102 may be one or a combination of several of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), or other form of processing unit with data processing and/or instruction execution capabilities, and may control other components in the electronic device 100 to perform desired functions.

The storage 104 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random Access Memory (RAM) and/or cache memory (cache), and the like. The non-volatile memory may include, for example, read Only Memory (ROM), hard disk, flash memory, and the like. One or more computer program instructions may be stored on the computer readable storage medium that can be executed by the processor 102 to implement client functions and/or other desired functions in embodiments of the present invention as described below. Various applications and various data, such as various data used and/or generated by the applications, may also be stored in the computer readable storage medium.

The input device 106 may be a device used by a user to input instructions and may include one or more of a keyboard, mouse, microphone, touch screen, and the like.

The output device 108 may output various information (e.g., images or sounds) to the outside (e.g., a user), and may include one or more of a display, a speaker, and the like.

The image capture device 110 may capture images (e.g., photographs, videos, etc.) desired by the user and store the captured images in the storage device 104 for use by other components.

For example, the example electronic device for implementing the training method, apparatus and electronic device for a student model according to the embodiments of the present invention may be implemented as a smart terminal such as a robot, a smart phone, a tablet computer, a computer, or the like.

Embodiment two:

the embodiment provides a training method for a student model, wherein the student model and a teacher model are both object detection models, and the object detection models can adopt a single-stage detector or a two-stage detector for executing an object detection task. In order to facilitate understanding of the training method of the student model in object detection distillation provided in this embodiment, referring to a flowchart of a training method of a student model shown in fig. 2, the student model learns to a trained teacher model by a knowledge distillation method, the method mainly includes steps S202 to S210 as follows:

step S202, a candidate sample area of a training sample is acquired.

In one embodiment, candidate sample regions for the training sample may be generated by a model, such as by outputting a number of candidate sample regions for the training sample by a detector, and in practice, the training sample may be input to a candidate region extraction network for the candidate sample regions. In another embodiment, the candidate sample area of the training sample may also be determined according to the labeling information carrying the truth box, where the truth box is a box for displaying the real position of the target image in the training sample, and the position of the feature to be extracted in the training sample may be determined according to the labeling information carrying the truth box, so as to determine the candidate sample area to be extracted in the training sample. The candidate sample area of each training sample can be one or a plurality of the candidate sample areas. In the teacher-student framework of knowledge distillation, candidate sample regions include features of the student model to be distilled, i.e., features of the student model to be learned with the teacher model.

Step S204, feature extraction is performed on candidate sample areas of the training samples through the student model and the teacher model respectively, and first features extracted by the student model and second features extracted by the teacher model are obtained.

The teacher model may be a complex model or a combined model, and the student model is mostly a relatively simple model compared to the teacher model, and the student model and the teacher model are both object detection models, may be single-stage (one-stage) detectors such as SSD (Single Shot MultiBox Detector, single-stage multi-frame detector), YOLO (You Only Look Once: unified, real-Time Object Detection, target detection system based on a single neural network), or the like, and may be two-stage (two-stage) detectors such as CNN (Convolutional Neural Network ), fast-RCNN (Fast Region-based Convolutional Neural Network, ultrafast neural network), or the like. And extracting the characteristics of the candidate sample areas of the training samples through the student models to obtain first characteristics extracted by the student models, and extracting the characteristics of the candidate sample areas of the training samples through the teacher models to obtain second characteristics extracted by the teacher models.

Step S206, obtaining the confidence of the first feature.

The confidence level of the first feature may be understood as the degree of confidence of the first feature, such as if the training sample is a dirty sample or too difficult sample, the degree of confidence of the first feature extracted by the student model for such training sample is not high, i.e. the confidence level is not high. In one embodiment, the confidence level is obtained by taking the variance corresponding to the first feature. Such as inputting the first feature extracted by the student model into a variance generating network, resulting in a variance of the first feature output by the variance generating network, which may characterize the confidence of the first feature, wherein the variance generating network may comprise convolution layers and/or fully connected layers, i.e. be implemented by two convolution layers, or two fully connected layers, or one convolution layer and one fully connected layer. The variance obtained by the variance generating network is inversely related to the confidence, i.e., the larger the variance, the smaller the confidence. It can be understood that: before distillation, the student model can determine the intensity of knowledge migration according to a variance result obtained by a variance generating network, so that knowledge distillation can be selectively performed, for example, training can be added to a high-quality sample with smaller variance (namely, larger confidence coefficient), and the performance of the model is improved; because the variance of the dirty sample or the too-difficult sample is larger (the confidence is smaller), training can be correspondingly reduced on the sample, and the adverse effect of the sample on the student model is reduced. In practice, training may be increased or decreased by setting the distillation loss function.

Step S208, determining distillation loss between the student model and the teacher model according to the first feature, the second feature and the confidence of the first feature.

In one embodiment, the distillation loss between the student model and the teacher model may be determined as follows:

wherein d is the feature dimension, and N is the number of samples;is a first feature; />Is a second feature; />Is the variance. The dimensions of the first feature extracted by the student model, the second feature extracted by the teacher model and the variance of the first feature are all kept consistent. As can be seen from the above formula of distillation loss, the variance of the sample is inversely proportional to the loss function, i.e. the larger the variance of the sample is, the smaller the loss function is, which means that the parameters of the back-propagation-adjusted student model are not large. For example, since the variance of the features of the extracted dirty sample and the too-difficult sample is larger, the confidence of the features can be determined to be smaller according to the inverse relation between the variance and the confidence, and the loss function is smaller, so that the influence of the dirty sample/the too-difficult sample on the parameters of the student model is less.

Step S210, updating parameters of the student model based on the distillation loss.

The model distillation learning loss function can be divided into two parts, the distillation loss (which may also be referred to as adaptive migration loss) between the student model and the teacher model, and the task loss for the student model to perform the object detection task, see the following formula:

L＝L _task +L _distill

wherein L is the whole loss function in model distillation learning; l (L) _task Is a loss function associated with a task.

According to task loss L _task And distillation loss L _distill And determining the whole loss function L in the model distillation process, and updating parameters of the student model according to the L, namely training the student model until the loss function converges or reaches a preset training stopping condition, so as to obtain the trained student model.

According to the training method for the student model, provided by the embodiment of the invention, the influence of the sample quality on the parameters of the student model can be determined by acquiring the confidence coefficient of the first feature extracted by the student model before training, so that the knowledge migration intensity of different training samples during training can be accurately adjusted, the training error caused by dirty samples or excessively difficult samples is reduced, the distillation effect of the student model and the teacher model on object detection is improved, and the performance of the object detection model is improved.

In one embodiment, after the student model is trained by the training method of the student model, the image to be detected can be input into the trained student model, so that object detection can be performed on the image to be detected based on the trained student model, and an object detection result is obtained. It should be noted that the variance generating branch can be used only in the training process, after the training is finished, the student model does not need to pass through the variance generating branch during the actual test, and the trained model parameters are directly used. In addition, the parameters of the teacher model do not need to be modified or adjusted in the whole process.

In summary, by acquiring the confidence coefficient of the first feature extracted by the student model for the training sample before training, the influence of the sample quality on the student model can be determined, and the determination of the distillation loss is also related to the confidence coefficient of the feature extracted by the student model for the training sample, so that the confidence coefficient of the feature of different samples is different, the distillation loss is different, the corresponding model parameter updating effect is also different, that is, the knowledge migration strength for different training samples is different in the student model training process, the student model can carry out self-adaptive knowledge migration for different samples, and the mode enables the student model to carry out parameter updating of different degrees for different samples, so that the trained student model has more excellent performance.

Embodiment III:

on the basis of the foregoing embodiment, this embodiment provides a specific example of a training method using the foregoing student model, referring to a flowchart of another training method for a student model shown in fig. 3, the method may be applied to a knowledge distillation framework shown in fig. 4, and the method mainly includes the following steps S302 to S308:

in step S302, a candidate sample region of the sample image is acquired.

In one embodiment, candidate sample regions (not shown) of the sample image may be output by a detector, which may be a single-stage detector or a two-stage detector. For example, in a distillation frame based on a two-stage detector Faster R-CNN, the candidate sample region is the candidate region obtained by the first rough detection by the two-stage detector, i.e. the candidate sample region is obtained after inputting the sample image into the RPN (Region Proposal Network, region-generating network) in the frame.

In another embodiment, the sample candidate region may also be obtained according to labeling information that is labeled in advance, for example, in a distillation frame based on a single-stage detector RetinaNet, a plurality of distillation regions (i.e., candidate sample regions) may be generated by a truth box in the labeling information, and the candidate sample regions for obtaining the sample image may be obtained by resetting the weights of pixels in the distillation regions to 1 and the weights of pixels in regions other than the distillation regions to 0 (the weights of 0 indicate that no loss function calculation is actually involved).

Step S304, for each sample candidate region, the characteristics of the teacher model, the characteristics of the student model, and the variances of the characteristics of the student model are obtained respectively.

For each candidate sample region, a teacher model and a student model are input, respectively, and a feature extracted by the teacher model (i.e., the second feature mentioned in the foregoing embodiment) and a feature extracted by the student model (i.e., the first feature mentioned in the foregoing embodiment) are output. In addition, for the sample candidate region, a previously established variance generating network may be input, such that a variance of the student model is determined by the variance generating network output, and a confidence of the first feature of the student model is determined according to the variance, the confidence being inversely related to the variance. The variance generating network may be implemented by two convolution layers and/or fully connected layers, which are exemplified by the two fully connected layers in fig. 3.

Alternatively, the variance-confidence conversion process may also be categorized in a variance generating network (which may also be referred to as a variance generating branch), that is, the variance is generated by a variance generating branch and converted into confidence, where the branch may also be referred to as a confidence generating branch.

In one embodiment, when the distillation frame is the distillation frame based on the two-stage detector fast R-CNN, the features extracted by the student model (may also be referred to as a student network) are input into RPN (Region Proposal Network) in the frame to obtain a sample candidate region, and the sample candidate region is input into the variance generating branch or the confidence generating branch to obtain the confidence.

In another embodiment, when the distillation frame is the distillation frame based on the single-stage detector RetinaNet, the features of the distillation region with the weight of 1 are extracted and input to the variance generating branch or the confidence generating branch as the confidence of each pixel in the distillation region.

Step S306, determining the self-adaptive migration loss according to the characteristics of the student model, the variance of the characteristics of the student model and the characteristics of the teacher model.

In one embodiment, the adaptive migration loss (i.e., the distillation loss of the above example) may be determined according to the following formula:

wherein d is the feature dimension, and N is the number of samples;is a feature of the student model; />Is a characteristic of a teacher model; />Variance of features for student model, +.>And->The dimension d of the output remains consistent.

Step S308, determining task loss, and updating parameters of the student model based on the adaptive migration loss and the task loss.

Task loss is the loss of a model when performing object detection tasks, and the loss function when the model performs distillation learning generally includes task loss and adaptive migration loss, that is:

L＝L _task +L _distill

wherein L is the whole loss function in model distillation learning; l (L) _task Loss for a task; l (L) _distill Is an adaptive migration penalty.

And updating parameters of the student model according to the determined whole loss function L in model distillation learning, wherein the parameters of the teacher model remain unchanged until convergence. After training, when the picture to be detected is tested, the picture to be detected is directly input, and the picture to be detected is directly detected according to an undistilled detector without distillation branches.

According to the training method for the student model, the confidence coefficient of the sample level in the detection distillation is obtained (by generating a plurality of candidate areas of the sample, respectively extracting features on the candidate areas and generating the confidence coefficient according to the features), the confidence coefficient of the features of different samples extracted by the student model is different, the distillation loss is different, the corresponding model parameter updating effects are also different, namely the knowledge migration intensity aiming at different training samples in the student model training process is different, and the knowledge migration intensity in the detection task can be regulated according to the generated confidence coefficient in the student model training process, so that a better knowledge migration effect is obtained. In addition, the method does not increase the calculation amount in the test process, so that the knowledge migration effect can be improved under the condition of not reducing the calculation efficiency.

Embodiment four:

for the training method of the student model provided in the second embodiment, the embodiment of the invention provides a training device of the student model, referring to a structural block diagram of the training device of the student model shown in fig. 5, the device comprises the following modules:

an obtaining module 502, configured to obtain a candidate sample area of a training sample;

the feature extraction module 504 is configured to perform feature extraction on candidate sample areas of the training sample through the student model and the teacher model, so as to obtain a first feature extracted by the student model and a second feature extracted by the teacher model;

a confidence coefficient obtaining module 506, configured to obtain a confidence coefficient of the first feature;

a distillation loss determination module 508 for determining a distillation loss between the student model and the teacher model based on the first feature, the second feature, and the confidence level of the first feature;

a parameter updating module 510 for updating parameters of the student model based on the distillation loss.

According to the training device for the student model, provided by the embodiment of the invention, the influence of sample quality on the student model can be determined by acquiring the confidence coefficient of the first characteristic extracted by the student model for the training sample before training, the determination of the distillation loss is also related to the confidence coefficient of the characteristic extracted by the student model for the training sample, so that the student model is trained based on the distillation loss, the confidence coefficient of the characteristic extracted by the student model is different, the distillation loss is different, the corresponding model parameter updating effect is also different, namely, the knowledge migration intensity for different training samples in the training process of the student model is different, and the student model can carry out self-adaptive knowledge migration for different samples.

In one embodiment, the confidence coefficient obtaining module 506 is further configured to input the first feature into a variance generating network, obtain a variance of the first feature output by the variance generating network, and characterize the confidence coefficient of the first feature by the variance; wherein the variance generating network comprises a convolution layer and/or a full connection layer, and the variance is inversely related to the confidence.

In one embodiment, the distillation loss determination module 508 is further configured to determine the distillation loss between the student model and the teacher model according to the following formula:

wherein d is the feature dimension, and N is the number of samples;is a first feature; />Is a second feature; />Is the variance.

In one embodiment, the parameter updating module 510 is further configured to obtain a task loss of the student model in performing the object detection task; and updating parameters of the student model according to the task loss and the distillation loss.

In one embodiment, the obtaining module 502 is further configured to input the training sample into the candidate region extraction network to obtain a candidate sample region.

In one embodiment, the obtaining module 502 is further configured to determine the candidate sample area of the training sample according to the labeling information carrying the truth box.

In one embodiment, the apparatus further comprises: the detection module is used for inputting the image to be detected into the trained student model, and carrying out object detection on the image to be detected based on the trained student model to obtain an object detection result.

The device provided in this embodiment has the same implementation principle and technical effects as those of the foregoing embodiment, and for brevity, reference may be made to the corresponding content in the foregoing method embodiment for a part of the description of the device embodiment that is not mentioned.

In summary, according to the training method, the device and the electronic equipment for the student model provided by the embodiment of the invention, the influence of sample quality on the student model can be determined by acquiring the confidence coefficient of the first feature extracted by the student model for the training sample before training, the confidence coefficient of the feature of different samples extracted by the student model is different, the distillation loss is different, the corresponding model parameter updating effect is also different, namely, the knowledge migration strength for different training samples in the training process of the student model is different, the student model can carry out self-adaptive knowledge migration for different samples, and the mode enables the student model to carry out parameter updating of different degrees for different samples, so that the trained student model has more excellent performance.

It will be clear to those skilled in the art that, for convenience and brevity of description, the specific working process of the system described above may refer to the corresponding process in the foregoing embodiment, which is not described in detail herein.

The training method and apparatus for a student model and the computer program product of the electronic device provided in the embodiments of the present invention include a computer readable storage medium storing program codes, where the instructions included in the program codes may be used to execute the method described in the foregoing method embodiment, and specific implementation may refer to the method embodiment and will not be repeated herein.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.

In the description of the present invention, it should be noted that the directions or positional relationships indicated by the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc. are based on the directions or positional relationships shown in the drawings, are merely for convenience of describing the present invention and simplifying the description, and do not indicate or imply that the devices or elements referred to must have a specific orientation, be configured and operated in a specific orientation, and thus should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.

Finally, it should be noted that: the above examples are only specific embodiments of the present invention, and are not intended to limit the scope of the present invention, but it should be understood by those skilled in the art that the present invention is not limited thereto, and that the present invention is described in detail with reference to the foregoing examples: any person skilled in the art may modify or easily conceive of the technical solution described in the foregoing embodiments, or perform equivalent substitution of some of the technical features, while remaining within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and are intended to be included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. The training method of the student model is characterized in that the student model learns to a trained teacher model in a knowledge distillation mode, and the student model and the teacher model are both object detection models, and the method comprises the following steps:

acquiring a candidate sample area of a training sample;

extracting features of candidate sample areas of the training sample through the student model and the teacher model respectively to obtain first features extracted by the student model and second features extracted by the teacher model;

obtaining the confidence coefficient of the first feature by solving the variance of the first feature;

determining a distillation loss between the student model and the teacher model based on the first feature, the second feature, and the confidence level of the first feature;

updating parameters of the student model based on the distillation loss;

wherein determining a distillation loss between the student model and the teacher model based on the first feature, the second feature, and the confidence level of the first feature comprises:

determining a distillation loss between the student model and the teacher model according to the following formula:

wherein d is the feature dimension, and N is the number of samples;is the first feature; />Is the second feature; />Is the variance.

2. The method of claim 1, wherein the step of obtaining the confidence level of the first feature by solving for the variance of the first feature comprises:

inputting the first feature into a variance generating network to obtain a variance of the first feature output by the variance generating network, and representing the confidence coefficient of the first feature through the variance; wherein the variance generating network comprises a convolution layer and/or a full connection layer, and the variance is inversely related to the confidence.

3. The method of claim 1, wherein the step of updating parameters of the student model based on the distillation loss comprises:

acquiring task loss of the student model in executing an object detection task;

and updating parameters of the student model according to the task loss and the distillation loss.

4. The method of claim 1, wherein the step of obtaining candidate sample areas for training samples comprises:

and inputting the training sample into a candidate region extraction network to obtain a candidate sample region.

5. The method of claim 1, wherein the step of obtaining candidate sample areas for training samples comprises:

and determining candidate sample areas of the training samples according to the labeling information carrying the truth boxes.

6. The method according to any one of claims 1 to 5, further comprising:

inputting an image to be detected into a trained student model, and carrying out object detection on the image to be detected based on the trained student model to obtain an object detection result.

7. A training device for a student model, wherein the student model learns to a trained teacher model through a knowledge distillation mode, and the student model and the teacher model are both object detection models, the device comprising:

the acquisition module is used for acquiring candidate sample areas of the training samples;

the feature extraction module is used for extracting features of candidate sample areas of the training sample through the student model and the teacher model respectively to obtain first features extracted by the student model and second features extracted by the teacher model;

the confidence coefficient acquisition module is used for acquiring the confidence coefficient of the first feature by solving the variance of the first feature;

a distillation loss determination module for determining a distillation loss between the student model and the teacher model based on the first feature, the second feature, and a confidence level of the first feature;

a parameter updating module for updating parameters of the student model based on the distillation loss;

when determining the distillation loss between the student model and the teacher model according to the first feature, the second feature and the confidence coefficient of the first feature, the distillation loss determination module is specifically configured to:

8. An electronic device, comprising: a processor and a storage device;

the storage means has stored thereon a computer program which, when executed by the processor, performs the method of any of claims 1 to 6.

9. A computer readable storage medium having stored thereon a computer program, characterized in that the computer program when executed by a processor performs the steps of the method of any of the preceding claims 1 to 6.