CN115661560A

CN115661560A - Method for detecting face in cockpit, method and device for training target detection model

Info

Publication number: CN115661560A
Application number: CN202210763848.4A
Authority: CN
Inventors: 周野; 王琪
Original assignee: Zebred Network Technology Co Ltd
Current assignee: Zebred Network Technology Co Ltd
Priority date: 2022-06-30
Filing date: 2022-06-30
Publication date: 2023-01-31

Abstract

This specification discloses a method for detecting a face in a cockpit, and a method and a device for training a target detection model, which belong to the technical field of computers. The knowledge distillation-based target detection model training method comprises the following steps: determining the confidence with which a teacher network and a student network each output a prediction box for the same target object in the current batch of sample images; determining the loss weight of the regression branch corresponding to the target object according to those confidences; and adjusting the loss value of a regression loss function according to the loss weight of the regression branch corresponding to each target object in the current batch of sample images, wherein the regression loss function is used for training the student network based on knowledge distillation.

Description

Method for detecting face in cockpit, method and device for training target detection model

Technical Field

The specification relates to the technical field of computers, in particular to a face detection method in a cockpit, a target detection model training method and a target detection model training device.

Background

Deep neural networks have achieved great success in AI fields such as computer vision, natural language processing and speech recognition, but they incur a large computational overhead, which seriously hinders their use in resource-constrained scenarios such as intelligent vehicle cockpits. At the same time, with the rapid growth of the smart car field, more and more car makers are eager to bring AI capabilities into the smart cockpit. To run a deep neural network model on an in-vehicle unit with limited computing resources while preserving its real-time performance and accuracy, model compression has become a vital technology. Common model compression techniques include model quantization, model pruning and knowledge distillation. Knowledge Distillation (KD) transfers the knowledge learned by a complex model to a simple model, thereby improving the precision of the simple model.

In the prior art, how well the teacher network and the student network predict a target box is judged by comparing the distances from their respective prediction boxes to the ground-truth box. The teacher network is better than the student network overall, but that does not mean that it predicts every sample better than the student network. For a sample on which the teacher network predicts worse than the student network, knowledge distillation may cause the student model to overfit. Existing knowledge distillation-based target detection methods do not consider how the teacher network (complex network) and the student network (simple network) perform on each individual sample, which weakens the effect of knowledge distillation, so the student network obtained through knowledge distillation has low precision.

Disclosure of Invention

In view of the above technical problems in the prior art, embodiments of the present specification provide a method for detecting a face in a cockpit, a method and an apparatus for training a target detection model.

In a first aspect, the present specification provides a knowledge distillation-based target detection model training method, including:

determining the confidence of a teacher network and a student network for outputting a prediction frame aiming at the same target object in the current batch of sample images;

determining the loss weight of the regression branch corresponding to the target object according to the confidence coefficient of the prediction box output by the teacher network and the student network aiming at the same target object;

and adjusting the loss value of a regression loss function according to the loss weight of each target object corresponding to the regression branch in the current batch of sample images, wherein the regression loss function is used for training the student network based on knowledge distillation.

Optionally, the determining, according to the confidence of the teacher network and the confidence of the student network in outputting a prediction box for the same target object, the loss weight of the regression branch corresponding to the target object includes:

respectively taking each object to be detected in the current batch of sample images as a target object, wherein the current batch of sample images comprise a plurality of sample images, and at least one object to be detected exists in each sample image;

performing forward calculation through the teacher network to obtain a first confidence score of a prediction frame output by the teacher network for the target object;

performing forward calculation through the student network to obtain a second confidence score of a prediction frame output by the student network for the target object;

and obtaining the loss weight of the regression branch corresponding to the target object according to the first confidence score and the second confidence score.

Optionally, the obtaining, according to the first confidence score and the second confidence score, a loss weight of a regression branch corresponding to the target object includes:

determining a confidence ratio of the first confidence score to the second confidence score;

and according to the magnitude relation between the confidence coefficient ratio and a reference weight threshold, constraining the loss weight of the regression branch corresponding to the target object in a preset weight range.

Optionally, the reference weight threshold is set to 3 to 7.

Optionally, the adjusting the loss value of the regression loss function according to the loss weight of the regression branch corresponding to each target object in the current batch of sample images includes:

for each target object in the current batch of sample images, carrying out weighted calculation on the original loss value of the regression branch corresponding to the target object according to the loss weight of the regression branch corresponding to the target object, and obtaining an adjusted loss value of the regression branch corresponding to the target object;

and summing the adjusted loss values of the regression branches corresponding to each target object in the current batch of sample images, and determining the current loss value of the regression loss function according to the result of the summation.

Optionally, after determining the current loss value of the regression loss function according to the result of the summation calculation, the method further includes:

judging whether the regression loss function meets a convergence condition or not according to the current loss value;

and if the convergence condition is met, finishing the training of the student network based on knowledge distillation to obtain a target detection model after the knowledge distillation, and otherwise, continuing the training of the student network based on the knowledge distillation by adopting the next batch of sample images.

Optionally, the method further comprises: a complex network model and a simple network model are constructed in advance; pre-training the complex network model to generate the teacher network; and taking the simple network model as the student network, wherein the prediction accuracy of the teacher network is higher than that of the student network.

In a second aspect, the present specification provides a method for detecting a human face in a cockpit, which is applied to a vehicle, and includes:

a face detection model is adopted to detect the face in a cab of a vehicle, wherein the face detection model is obtained by training through a knowledge distillation-based target detection model training method in any embodiment of the first aspect;

and executing, for the driver, one or more of the following tasks according to the face detection result: a face recognition task, a face key point detection task, a living body detection task and a fatigue monitoring task.

In a third aspect, the present specification provides a knowledge distillation-based target detection model training apparatus, including:

the confidence coefficient determining unit is used for determining the confidence coefficient of a teacher network and the confidence coefficient of a student network for outputting a prediction frame aiming at the same target object in the current batch of sample images;

the weight adjusting unit is used for determining the loss weight of the regression branch corresponding to the target object according to the confidence coefficient of the prediction frame output by the teacher network and the student network aiming at the same target object;

and the loss calculation unit is used for adjusting the loss value of a regression loss function according to the loss weight of the regression branch corresponding to each target object in the current batch of sample images, and the regression loss function is used for training the student network based on knowledge distillation.

In a fourth aspect, the present specification provides an in-cabin human face detection apparatus, which is applied to a vehicle, and includes:

a face detection unit, configured to perform face detection on a face in a cockpit of a vehicle by using a face detection model, where the face detection model is obtained by training according to the knowledge distillation-based target detection model training method as claimed in any one of claims 1 to 7;

a task execution unit, configured to execute, for the driver, one or more of the following tasks according to the face detection result: a face recognition task, a face key point detection task, a living body detection task and a fatigue monitoring task.

In a fifth aspect, an embodiment of the present specification provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the method according to any one of the first aspect or the second aspect when executing the program.

In a sixth aspect, the present specification provides a computer readable storage medium, on which a computer program is stored, the program, when executed by a processor, implementing the steps of the method according to any one of the embodiments of the first aspect, or implementing the steps of the method according to any one of the embodiments of the second aspect.

One or more technical solutions provided in the embodiments of the present description at least achieve the following technical effects or advantages:

The loss weight of the regression branch corresponding to a target object is determined according to the confidence with which the teacher network and the student network each output a prediction box for that target object, and the loss value of the regression loss function used for training the student network based on knowledge distillation is adjusted according to the loss weight of the regression branch corresponding to each target object in the current batch of sample images. In this way, how the teacher network and the student network predict each object in each sample image is taken into account while the student network is trained based on knowledge distillation: the loss weight of the regression branch corresponding to each object is dynamically adjusted and participates in the calculation of the regression loss function, so that the result of the regression loss function reflects how well the teacher network and the student network predict each object in each sample image. The prediction accuracy of the target detection model trained based on knowledge distillation can therefore be improved.

Drawings

Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the specification. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:

FIG. 1 is a flowchart of a knowledge distillation-based target detection model training method provided in an embodiment of the present specification;

FIG. 2 is a flowchart of a method for detecting a face in a cockpit provided in an embodiment of the present specification;

FIG. 3 is a schematic structural diagram of a knowledge distillation-based target detection model training apparatus provided in an embodiment of the present specification;

FIG. 4 is a schematic structural diagram of a device for detecting a face in a cockpit provided in an embodiment of the present specification;

FIG. 5 is a schematic structural diagram of an electronic device provided in an embodiment of the present specification.

Detailed Description

The embodiment of the specification provides a face detection method in a cockpit, a target detection model training method and a target detection model training device, and the general idea is as follows:

determining the confidence of a teacher network and a student network for outputting a prediction frame aiming at the same target object in the current batch of sample images; determining the loss weight of the regression branch corresponding to the target object according to the confidence coefficient of the prediction box output by the teacher network and the student network aiming at the same target object; and adjusting the loss value of a regression loss function according to the loss weight of each target object corresponding to the regression branch in the current batch of sample images, wherein the regression loss function is used for training the student network based on knowledge distillation. The prediction accuracy of the target detection model trained based on knowledge distillation can be improved through the embodiment of the invention.

In order to better understand the technical solutions, the technical solutions of the embodiments of the present specification are described in detail below with reference to the drawings and specific embodiments, and it should be understood that the specific features of the embodiments and embodiments of the present specification are detailed descriptions of the technical solutions of the embodiments of the present specification, and are not limitations of the technical solutions of the present specification, and the technical features of the embodiments and embodiments of the present specification may be combined with each other without conflict.

Training of a knowledge distillation-based target detection model includes: 1. knowledge distillation at the backbone level; 2. knowledge distillation at the RPN (region proposal network) level; 3. knowledge distillation at the RCN (region classification network) level.

Knowledge distillation at the backbone level: a feature transfer method can be used, in which an adaptation layer raises or lowers the dimensionality of the student network's features so that they have the same dimensions as the teacher network's features. Knowledge distillation at the RPN level and at the RCN level is carried out on both the classification task and the regression task. The knowledge distillation-based target detection model training method provided by the embodiments of this specification is applied to knowledge distillation of the regression task; for knowledge distillation in the other aspects, reference may be made to the prior art, which is not limited herein.
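As an illustration of the adaptation layer mentioned above, the following is a minimal sketch in plain Python. It assumes the adaptation layer is a simple linear projection and that the feature-transfer loss is a mean squared difference; the names `adapt` and `feature_loss`, the channel counts and the random weights are hypothetical and are not taken from this specification.

```python
import random

def adapt(student_feat, weight, bias):
    """Project a student feature vector of length c_s to length c_t.

    weight is a c_t x c_s matrix (list of rows); bias has length c_t.
    This linear projection is an assumed form of the adaptation layer.
    """
    return [sum(w * x for w, x in zip(row, student_feat)) + b
            for row, b in zip(weight, bias)]

def feature_loss(adapted, teacher_feat):
    """Mean squared difference between adapted student and teacher features."""
    return sum((a - t) ** 2 for a, t in zip(adapted, teacher_feat)) / len(teacher_feat)

random.seed(0)
c_s, c_t = 4, 8  # hypothetical student / teacher feature dimensions
weight = [[random.uniform(-0.1, 0.1) for _ in range(c_s)] for _ in range(c_t)]
bias = [0.0] * c_t

student_feat = [0.5, -1.0, 0.25, 2.0]
teacher_feat = [0.1] * c_t

adapted = adapt(student_feat, weight, bias)  # now has the teacher's dimension
loss = feature_loss(adapted, teacher_feat)
```

In a real detector the projection would more likely act on feature maps (for example a 1×1 convolution), and its weights would be trained together with the student network.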

In a first aspect, an embodiment of the present specification provides a knowledge distillation-based target detection model training method, which is applied to knowledge distillation of the regression task. As shown in FIG. 1, the method specifically includes the following steps:

S101, determining the confidence with which a teacher network and a student network each output a prediction box for the same target object in the current batch of sample images.

A first sample image set is divided into a plurality of batches of sample images, and the student network is trained based on knowledge distillation batch by batch. Each batch includes a plurality of sample images, each sample image contains at least one object to be detected, and each object to be detected is treated as a target object. Forward prediction is performed on each sample image in the current batch through the teacher network and the student network respectively, to obtain the confidence with which the teacher network and the student network each output a prediction box for the same target object on the same sample image.
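The batching and paired forward prediction described above can be sketched as follows. This is only an illustration: `Prediction`, `split_into_batches`, `paired_confidences` and the two toy detectors are hypothetical names, and the sketch assumes that both networks return their prediction boxes in the same target-object order, which the specification does not state.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

@dataclass
class Prediction:
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2)
    score: float                            # confidence of the prediction box

Detector = Callable[[str], List[Prediction]]

def split_into_batches(samples: Sequence[str], batch_size: int) -> List[List[str]]:
    """Divide the sample image set into consecutive batches."""
    return [list(samples[i:i + batch_size])
            for i in range(0, len(samples), batch_size)]

def paired_confidences(batch: Sequence[str], teacher: Detector,
                       student: Detector) -> List[Tuple[float, float]]:
    """Collect (teacher score, student score) for every target object in a batch.

    Assumption: both detectors list their boxes in the same object order.
    """
    pairs = []
    for image in batch:
        for t_pred, s_pred in zip(teacher(image), student(image)):
            pairs.append((t_pred.score, s_pred.score))
    return pairs

# Hypothetical stand-ins for the real teacher and student networks.
def toy_teacher(image: str) -> List[Prediction]:
    return [Prediction((0, 0, 10, 10), 0.9), Prediction((20, 20, 30, 30), 0.8)]

def toy_student(image: str) -> List[Prediction]:
    return [Prediction((1, 1, 10, 10), 0.6), Prediction((20, 21, 30, 30), 0.85)]

first_sample_set = [f"img_{i}.jpg" for i in range(5)]
batches = split_into_batches(first_sample_set, batch_size=2)
pairs = paired_confidences(batches[0], toy_teacher, toy_student)
```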

The target object can be a human face area in the sample image or other specific image areas needing to be detected. In the case of training a face detection model using a teacher network and a student network, at least one face region exists on each sample image.

Because one or more target objects exist on each sample image, the teacher network outputs one or more prediction boxes when it performs forward prediction on a sample image, and so does the student network. For example, if three faces exist in a sample image, forward prediction of that sample image through the teacher network and through the student network each outputs three prediction boxes. A prediction box is a box computed and output by the teacher network or by the student network, whereas a real box (ground truth box) is a position manually annotated on the sample image and stored in an annotation file.
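The specification does not state how a prediction box is associated with a particular target object. One common choice, shown here purely as an assumption, is to pair each ground truth box with the prediction box that has the highest intersection over union (IoU). The function names below are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def match_predictions(gt_boxes, pred_boxes):
    """For each ground truth box, return the index of the best-overlapping
    prediction box, or None when no prediction box overlaps it."""
    matches = []
    for gt in gt_boxes:
        scored = [(iou(gt, p), idx) for idx, p in enumerate(pred_boxes)]
        best_iou, best_idx = max(scored, default=(0.0, None))
        matches.append(best_idx if best_iou > 0.0 else None)
    return matches

gt_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
pred_boxes = [(21, 19, 31, 29), (1, 1, 11, 11)]  # listed in a different order
matches = match_predictions(gt_boxes, pred_boxes)
```

Running the same matching on the teacher's and the student's outputs would give, for each target object, the pair of prediction boxes whose confidences are compared in the following steps.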

Before step S101, a complex network model and a simple network model need to be constructed in advance; and taking the simple network model as a student network, and pre-training the complex network model by utilizing a second sample image set to generate a teacher network, wherein the prediction precision of the teacher network is greater than that of the student network. It should be noted that the second sample image set may be the same as or different from the first sample image set.

S102, determining the loss weight of the regression branch corresponding to the target object according to the confidence with which the teacher network and the student network each output a prediction box for the same target object.

In an embodiment of the present specification, the confidence with which the teacher network and the student network output a prediction box for the same target object includes: a first confidence score of the prediction box output by the teacher network for the target object, and a second confidence score of the prediction box output by the student network for the target object.

Aiming at the same target object on the same sample image, performing forward calculation through the teacher network to obtain a first confidence score of the teacher network for outputting a prediction frame to the target object, and performing forward calculation through the student network to obtain a second confidence score of the student network for outputting the prediction frame to the target object; and obtaining the loss weight of the regression branch corresponding to the target object according to the first confidence coefficient score and the second confidence coefficient score.

It should be understood that the current batch of sample images corresponds to M × N regression branches, one regression branch per target object, and the loss weight of each of the M × N regression branches needs to be adjusted, where M is the number of sample images in the current batch and N is the number of target objects in a sample image.

In some embodiments, obtaining the loss weight corresponding to the target object according to the first confidence score and the second confidence score includes: determining a confidence ratio of the first confidence score to the second confidence score; and according to the magnitude relation between the confidence coefficient ratio and the reference weight threshold, constraining the loss weight of the regression branch corresponding to the target object within a preset weight range.

Specifically, the teacher network performs forward calculation to obtain a first confidence score of the prediction box output by the teacher network for the target object; the student network performs forward calculation to obtain a second confidence score of the prediction box output by the student network for the target object; the first confidence score is divided by the second confidence score to obtain a confidence ratio; and the confidence ratio is processed by the following function so that the loss weight of the regression branch corresponding to the target object is constrained within a preset weight range:

$$\mathrm{weight} = \mathrm{clip}\left(\frac{\mathrm{RSt}}{\mathrm{RSs}},\ 0,\ m\right)$$

where RSt is the first confidence score, RSs is the second confidence score, RSt/RSs is the confidence ratio, weight is the calculated loss weight of the regression branch corresponding to the target object, and [0, m] is the preset weight range. The clip function processes the confidence ratio RSt/RSs so that the loss weight of the regression branch corresponding to the target object is constrained to [0, m]: if the confidence ratio RSt/RSs is greater than the reference weight threshold m, the reference weight threshold m is taken as the loss weight of the regression branch corresponding to the target object; if the confidence ratio is not greater than the reference weight threshold m, the confidence ratio RSt/RSs is taken as the loss weight of the regression branch corresponding to the target object.

In a specific implementation, the reference weight threshold m may be set to a value from 3 to 7. Taking m = 3 as an example, if the confidence ratio RSt/RSs calculated for a certain target object equals 1.9, the loss weight of the regression branch corresponding to that target object is weight = RSt/RSs = 1.9; if the confidence ratio RSt/RSs equals 3.1, then weight = m = 3.
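The clipping rule and the m = 3 example can be checked with a short sketch. The function name `loss_weight` and the small `eps` guard against division by zero are assumptions added for illustration; the specification only describes the ratio and the [0, m] range.

```python
def loss_weight(teacher_score: float, student_score: float,
                m: float = 3.0, eps: float = 1e-8) -> float:
    """Clip the confidence ratio RSt / RSs to the range [0, m].

    eps is an assumed safeguard for a student score of zero; it is not
    part of the described method.
    """
    ratio = teacher_score / max(student_score, eps)
    return min(max(ratio, 0.0), m)

# Ratio 0.95 / 0.5 = 1.9 is below m = 3, so it is used as the weight.
w_low = loss_weight(0.95, 0.5, m=3.0)

# Ratio 0.93 / 0.3 = 3.1 exceeds m = 3, so the weight is capped at m.
w_high = loss_weight(0.93, 0.3, m=3.0)

# Student more confident than teacher: the weight drops below 1.
w_student_better = loss_weight(0.6, 0.8, m=3.0)
```

Capping at m keeps a very small student confidence from producing an arbitrarily large weight that would dominate the batch loss.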

S103, adjusting the loss value of a regression loss function according to the loss weight of each target object corresponding to the regression branch in the current batch of sample images, wherein the regression loss function is used for training the student network based on knowledge distillation.

For any target object, if its loss weight is greater than 1, the student network has learned that target object less well than the teacher network, and the loss value of the regression branch corresponding to the target object needs to be increased; if its loss weight is less than 1, the student network has learned that target object better than the teacher network, and the loss value of the regression branch corresponding to the target object needs to be reduced.

Therefore, for each target object on each sample image in the current batch of sample images, the loss value of the regression branch corresponding to the target object is adjusted according to the loss weight of that regression branch. Specifically, the original loss value of the regression branch corresponding to the target object may be weighted by the loss weight corresponding to the target object to obtain the adjusted loss value of that regression branch, as in the following formula:

$$\mathrm{Loss2}_i = \mathrm{Loss1}_i \times \mathrm{weight}_i$$

where Loss1_i is the original loss value of the student network's regression branch corresponding to the i-th target object in the current batch of sample images, weight_i is the loss weight of the regression branch corresponding to the i-th target object in the current batch of sample images, and Loss2_i is the adjusted loss value of the student network's regression branch corresponding to the i-th target object in the current batch of sample images.

It should be noted that, because student networks may differ in network structure, the loss functions used to calculate the original loss value of each regression branch may also differ, so the specific form of that loss function is not limited here. For example, loss functions such as the mean squared error (the mean of the squared differences between the real box and the prediction box) or the mean absolute error (the mean of the absolute differences between the real box and the prediction box) can be used to calculate the original loss value of each regression branch. With this technical solution, the original loss value of the regression branch corresponding to each target object in the current batch of sample images can be dynamically adjusted to obtain the adjusted loss value.
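A minimal sketch of the per-object adjustment follows, using the two example loss functions named above. Boxes are assumed to be given as four coordinates (x1, y1, x2, y2); that encoding, the function names and the numbers are illustrative assumptions only.

```python
def mse(pred_box, gt_box):
    """Mean squared error over the box coordinates."""
    return sum((p - g) ** 2 for p, g in zip(pred_box, gt_box)) / len(gt_box)

def mae(pred_box, gt_box):
    """Mean absolute error over the box coordinates."""
    return sum(abs(p - g) for p, g in zip(pred_box, gt_box)) / len(gt_box)

def adjusted_loss(original_loss, weight):
    """Loss2_i = Loss1_i * weight_i for one regression branch."""
    return original_loss * weight

student_box = (10.0, 10.0, 50.0, 50.0)  # prediction box from the student network
real_box = (12.0, 8.0, 48.0, 52.0)      # manually annotated real box

loss1_mse = mse(student_box, real_box)  # every coordinate is off by 2, so 4.0
loss1_mae = mae(student_box, real_box)  # every coordinate is off by 2, so 2.0

# With a loss weight of 1.9 (teacher more confident), the branch loss grows.
loss2 = adjusted_loss(loss1_mse, 1.9)
```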

After the original loss value of the regression branch corresponding to each target object in the current batch of sample images is dynamically adjusted, the adjusted loss values of the regression branches corresponding to each target object in the current batch of sample images are summed, and the current loss value of the regression loss function is determined according to the summed result.

$$\mathrm{Loss} = \sum_{i=1}^{K} \mathrm{Loss2}_i$$

where Loss2_i is the adjusted loss value of the regression branch corresponding to the i-th target object in the current batch of sample images, Loss is the current loss value of the regression loss function, and K = M × N is the number of prediction boxes output by the student network for the current batch of sample images.
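The summation over the K regression branches of a batch can be illustrated as below. The values are made up for a batch with M = 2 sample images and N = 2 target objects per image, and `batch_regression_loss` is a hypothetical name.

```python
def batch_regression_loss(original_losses, weights):
    """Sum the weighted per-object losses over all K = M x N regression branches."""
    if len(original_losses) != len(weights):
        raise ValueError("each regression branch needs exactly one loss weight")
    return sum(l1 * w for l1, w in zip(original_losses, weights))

# M = 2 images, N = 2 objects per image, so K = 4 regression branches.
original_losses = [4.0, 1.0, 2.0, 0.5]  # Loss1_i for each target object
weights = [1.9, 3.0, 0.8, 1.0]          # weight_i, each already clipped to [0, m]

current_loss = batch_regression_loss(original_losses, weights)
# 4.0*1.9 + 1.0*3.0 + 2.0*0.8 + 0.5*1.0 = 12.7
```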

Judging whether the regression loss function meets a convergence condition or not according to the current loss value of the regression loss function, if so, finishing training the student network based on knowledge distillation to obtain a target detection model after the knowledge distillation; otherwise, continuing training the student network based on knowledge distillation by adopting the next batch of sample images.
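The batch-by-batch loop with a convergence check might look like the following sketch. The specification does not define the convergence condition, so a loss threshold is assumed here; `train_step` is a hypothetical callable that performs one distillation update on a batch and returns the current loss value.

```python
def train_until_converged(batches, train_step, loss_threshold=0.05, max_epochs=100):
    """Train batch by batch and stop once the loss meets the assumed condition.

    Returns (number of batches processed, last loss value).
    """
    steps = 0
    loss = float("inf")
    for _ in range(max_epochs):
        for batch in batches:
            loss = train_step(batch)   # one knowledge-distillation update
            steps += 1
            if loss < loss_threshold:  # assumed convergence condition
                return steps, loss
    return steps, loss

# Hypothetical stand-in for a real update: the loss halves on every batch.
state = {"loss": 1.0}

def fake_train_step(batch):
    state["loss"] *= 0.5
    return state["loss"]

batches = [["a.jpg"], ["b.jpg"], ["c.jpg"]]
steps, final_loss = train_until_converged(batches, fake_train_step)
```

With the toy update above the loss sequence is 0.5, 0.25, 0.125, 0.0625, 0.03125, so training stops after the fifth batch.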

According to the technical scheme, the loss weight of the regression branch corresponding to each object is dynamically adjusted, and the loss weight participates in the calculation of the regression loss function, so that the calculation result of the regression loss function fully reflects the prediction condition of a teacher network and a student network on each object in each sample image, and therefore the prediction precision of the target detection model trained based on knowledge distillation can be improved.

It should be understood that the knowledge distillation-based target detection model training method provided by the embodiments of the present specification needs to be executed on a device with abundant computing resources, while the target detection model obtained through training can be applied in scenarios with limited computing resources, such as intelligent terminals and intelligent vehicle cockpits.

If the target detection model after knowledge distillation is a human face detection model, the method can be applied to an intelligent cab of a vehicle. Therefore, based on the same inventive concept, the embodiment of the specification further provides a method for detecting the face in the cab, which is applied to a vehicle.

Referring to fig. 2, a method for detecting a face in a cockpit according to an embodiment of the present invention includes the following steps S201 to S202:

S201: performing face detection on a face in the cockpit of the vehicle by using a face detection model, wherein the face detection model is obtained by training through the knowledge distillation-based target detection model training method in any one of the embodiments of the first aspect.

Since the training method of the target detection model based on knowledge distillation has been described in detail in the foregoing, reference may be made to the foregoing for further implementation details in this section, and further description is omitted here.

S202: executing, for the driver, one or more of the following tasks according to the face detection result: a face recognition task, a face key point detection task, a living body detection task and a fatigue monitoring task.

Face detection is performed on a face in the cockpit of the vehicle by using the face detection model to obtain a face prediction box. In a specific implementation, executing the face recognition task for the driver according to the face detection result may be: identifying, according to the face prediction box obtained by face detection, whether the person currently sitting in the driving position is the target driver, so that service items corresponding to the target driver can be provided. Executing the face key point detection task for the driver according to the face prediction box may be: locating the key regions of the face within the face prediction box, including the eyebrows, eyes, nose, mouth, facial contour and so on. The living body detection task for the driver is executed according to the face prediction box to verify the identity of the person currently sitting in the driving position, so as to identify more accurately whether that person is the target driver. Executing the fatigue monitoring task for the driver according to the face prediction box may be: locating facial feature points (such as the eye regions, head pose and the like) within the face prediction box, and predicting whether the driver is fatigued according to those feature points.
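How steps S201 and S202 fit together can be sketched as a small dispatcher. Everything here is hypothetical: the task handlers are placeholders, and treating the largest detected face as the driver's is an assumption made only for this example, since the specification does not say how the driver's face is selected.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)

def box_area(box: Box) -> int:
    return max(0, box[2] - box[0]) * max(0, box[3] - box[1])

def run_driver_tasks(face_boxes: Sequence[Box],
                     tasks: Dict[str, Callable[[Box], str]],
                     enabled: List[str]) -> Dict[str, str]:
    """Run the enabled downstream tasks on the driver's face prediction box."""
    if not face_boxes:
        return {}  # no face detected, so nothing to run
    # Assumption for illustration: the largest face belongs to the driver.
    driver_box = max(face_boxes, key=box_area)
    return {name: tasks[name](driver_box) for name in enabled}

# Placeholder handlers; real ones would call recognition or landmark models.
tasks = {
    "face_recognition": lambda box: f"recognize face in {box}",
    "key_point_detection": lambda box: f"locate key points in {box}",
    "living_body_detection": lambda box: f"check liveness in {box}",
    "fatigue_monitoring": lambda box: f"estimate fatigue from {box}",
}

detected = [(40, 30, 80, 70), (100, 20, 220, 160)]  # output of the face detector
results = run_driver_tasks(detected, tasks,
                           enabled=["face_recognition", "fatigue_monitoring"])
```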

Because the face detection model used by the method for detecting the face in the cockpit is obtained by training by adopting the training method for the target detection model based on knowledge distillation in the technical scheme, the prediction precision of the face detection model is higher, the accuracy of face detection in the cockpit can be improved, and the method is favorable for better completing a face recognition task, a face key point detection task, a living body detection task and a fatigue monitoring task for a driver.

Based on the same inventive concept, the present specification provides a knowledge distillation-based target detection model training apparatus, which is shown in FIG. 3 and includes:

the confidence determining unit 301 is configured to determine confidence of the teacher network and the student network outputting the prediction box for the same target object in the current batch of sample images;

a weight adjusting unit 302, configured to determine a loss weight of a regression branch corresponding to a target object according to a confidence that the teacher network and the student network output prediction frames for the same target object;

a loss calculating unit 303, configured to adjust a loss value of a regression loss function according to a loss weight of a regression branch corresponding to each target object in the current batch of sample images, where the regression loss function is used to train the student network based on knowledge distillation.

In some embodiments, the weight adjusting unit 302 includes: an object determining unit, configured to take each object to be detected in the current batch of sample images as a target object in turn, where the current batch of sample images includes a plurality of sample images and each sample image includes at least one object to be detected; a first calculation unit, configured to perform forward calculation through the teacher network to obtain a first confidence score of the prediction box output by the teacher network for the target object; a second calculation unit, configured to perform forward calculation through the student network to obtain a second confidence score of the prediction box output by the student network for the target object; and a weight calculation unit, configured to obtain the loss weight of the regression branch corresponding to the target object according to the first confidence score and the second confidence score.

In some embodiments, the weight calculation unit is specifically configured to: determine a confidence ratio of the first confidence score to the second confidence score; and, according to the magnitude relation between the confidence ratio and a reference weight threshold, constrain the loss weight of the regression branch corresponding to the target object within a preset weight range.

In some embodiments, the reference weight threshold is set to 3 to 7.
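The weight calculation described above can be sketched as follows. This is only an illustration under stated assumptions: the text says that the confidence ratio is compared with a reference weight threshold and that the weight is constrained to a preset range, but this passage does not give the exact mapping. The function name `regression_loss_weight`, the lower bound of 1.0, the use of the threshold as the upper bound, and the small `eps` guard against division by zero are all hypothetical choices.

```python
def regression_loss_weight(teacher_conf, student_conf,
                           ref_threshold=5.0, min_weight=1.0, eps=1e-6):
    """Hypothetical loss weight for one target object's regression branch.

    teacher_conf:  first confidence score (teacher network prediction box).
    student_conf:  second confidence score (student network prediction box).
    ref_threshold: reference weight threshold; 5.0 is an illustrative value
                   inside the 3 to 7 range mentioned in the text.
    """
    # Confidence ratio of the first score to the second score.
    ratio = teacher_conf / max(student_conf, eps)
    # Assumed rule: keep the weight inside [min_weight, ref_threshold].
    return min(max(ratio, min_weight), ref_threshold)
```

Under this assumed rule, an object on which the teacher is much more confident than the student receives a larger regression weight, capped at the threshold, while an object on which the student is already at least as confident keeps the baseline weight.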

In some embodiments, the loss calculating unit 303 is specifically configured to: for each target object in the current batch of sample images, weight the original loss value of the regression branch corresponding to the target object by the loss weight of that regression branch to obtain an adjusted loss value of the regression branch corresponding to the target object; and sum the adjusted loss values of the regression branches corresponding to all target objects in the current batch of sample images, and determine the current loss value of the regression loss function according to the result of the summation.
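The weighting and summation performed by the loss calculating unit can be illustrated with a short sketch. The function name `weighted_regression_loss` is hypothetical, and the sketch assumes that the current loss value is the weighted sum itself; the text only says the loss value is determined "according to the result of the summation", so a normalized variant (for example, a mean) would be equally compatible.

```python
def weighted_regression_loss(original_losses, loss_weights):
    """Sum per-object regression losses after applying per-object weights.

    original_losses: original regression loss value for each target object.
    loss_weights:    loss weight of the regression branch for each target object.
    """
    if len(original_losses) != len(loss_weights):
        raise ValueError("one loss weight is required per target object")
    # Adjusted loss value for each target object.
    adjusted = [w * loss for w, loss in zip(loss_weights, original_losses)]
    # Assumed: the current loss value is the plain sum of the adjusted values.
    return sum(adjusted)
```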

In some embodiments, the apparatus further includes a training end determining unit, configured to: determine whether the regression loss function satisfies a convergence condition according to the current loss value; if the convergence condition is satisfied, end the knowledge distillation-based training of the student network to obtain the target detection model after knowledge distillation; otherwise, continue the knowledge distillation-based training of the student network with the next batch of sample images.
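The batch-by-batch convergence check can be sketched as a simple loop. The convergence condition itself is not specified in this passage, so the sketch assumes "current loss value below a fixed threshold"; the names `train_until_converged`, `train_step`, and `loss_threshold` are hypothetical.

```python
def train_until_converged(batches, train_step, loss_threshold=1e-3):
    """Train on successive batches until the assumed convergence condition holds.

    batches:    iterable of sample-image batches.
    train_step: callable taking one batch and returning the current loss
                value of the regression loss function after that step.
    Returns (converged, last_loss).
    """
    current_loss = None
    for batch in batches:
        current_loss = train_step(batch)
        # Assumed convergence condition: loss falls below the threshold.
        if current_loss < loss_threshold:
            return True, current_loss
        # Otherwise continue with the next batch of sample images.
    return False, current_loss
```

In practice `train_step` would run the teacher and student forward passes, compute the weighted regression loss, and update the student network; here it is left abstract.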

In some embodiments, the apparatus further includes a pre-training unit, configured to: construct a complex network model and a simple network model in advance; pre-train the complex network model to generate the teacher network; and take the simple network model as the student network, where the prediction accuracy of the teacher network is higher than that of the student network.

The specific functions of each unit of the above knowledge distillation-based target detection model training apparatus have been described in detail in the embodiments of the knowledge distillation-based target detection model training method provided in this specification, and are not repeated here.

Based on the same inventive concept, an embodiment of the present specification provides an in-cockpit face detection device applied to a vehicle. Referring to fig. 4, the device includes:

a face detection unit 401, configured to perform face detection on a face in a cabin of a vehicle by using a face detection model, where the face detection model is obtained by training according to the knowledge distillation-based target detection model training method of any one of claims 1 to 7;

a task execution unit 402, configured to execute one or more of the following tasks for the driver according to the face detection result: a face recognition task, a face key point detection task, a living body detection task, and a fatigue monitoring task.

The specific functions of each unit of the above device have been described in detail in the embodiments of the in-cockpit face detection method provided in this specification, and are not repeated here.

Based on the same inventive concept, embodiments of the present specification further provide an electronic device, as shown in fig. 5, including a memory 504, a processor 502, and a computer program stored in the memory 504 and executable on the processor 502, where the processor 502, when executing the program, implements the steps of any one of the embodiments of the knowledge distillation-based target detection model training method described above, or the steps of any one of the embodiments of the in-cockpit face detection method.

Fig. 5 shows a bus architecture (represented by bus 500). Bus 500 may include any number of interconnected buses and bridges, and links together various circuits including one or more processors, represented by processor 502, and memory, represented by memory 504. Bus 500 may also link together various other circuits such as peripherals, voltage regulators, and power management circuits, which are well known in the art and are therefore not described further here. A bus interface 506 provides an interface between bus 500 and the receiver 501 and transmitter 503. The receiver 501 and the transmitter 503 may be the same element, i.e. a transceiver, providing a means for communicating with various other apparatus over a transmission medium. The processor 502 is responsible for managing bus 500 and for general processing, and the memory 504 may be used to store data used by the processor 502 in performing operations.

Based on the same inventive concept as the foregoing embodiments, the present specification further provides a computer-readable storage medium on which a computer program is stored, where the program, when executed by a processor, implements the steps of any one of the embodiments of the knowledge distillation-based target detection model training method described above, or the steps of any one of the embodiments of the in-cockpit face detection method.

The present specification is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the specification. It will be understood that each flow and/or block of the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While the preferred embodiments of the present specification have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all changes and modifications that fall within the scope of the specification.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present specification without departing from the spirit and scope of the specification. Thus, if such modifications and variations of the present specification fall within the scope of the claims of the present specification and their equivalents, then such modifications and variations are also intended to be included in the present specification.

Claims

1. A knowledge distillation-based target detection model training method is characterized by comprising the following steps:

2. The method of claim 1, wherein determining the loss weight of the target object for the regression branch based on the confidence level that the teacher network and the student network output prediction boxes for the same target object comprises:

performing forward calculation through the teacher network to obtain a first confidence score of a prediction box output by the teacher network for the target object;

3. The method of claim 2, wherein said deriving a loss weight for a regression branch corresponding to the target object based on the first confidence score and the second confidence score comprises:

4. The method of claim 3, wherein the reference weight threshold is set to 3 to 7.

5. The method of claim 1, wherein adjusting the loss value of the regression loss function according to the loss weight of the regression branch corresponding to each target object in the current batch of sample images comprises:

6. The method of claim 1 or 5, after determining the current loss value of the regression loss function from the summation computation, further comprising:

7. The method of claim 1, further comprising:

a complex network model and a simple network model are constructed in advance;

pre-training the complex network model to generate the teacher network;

and taking the simple network model as the student network, wherein the prediction accuracy of the teacher network is higher than that of the student network.

8. A method for detecting a face in a cockpit, applied to a vehicle, the method comprising:

carrying out face detection in a cockpit of a vehicle by adopting a face detection model, wherein the face detection model is obtained by training through a knowledge distillation-based target detection model training method according to any one of claims 1 to 7;

9. A knowledge distillation-based target detection model training device is characterized by comprising:

10. An in-cockpit face detection device for use in a vehicle, the device comprising:

a face detection unit, configured to perform face detection on a face in a cabin of a vehicle by using a face detection model, where the face detection model is obtained by training according to the knowledge distillation-based target detection model training method of any one of claims 1 to 7;

11. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method of any one of claims 1 to 7 or the steps of the method of claim 8 when executing the program.

12. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7 or carries out the steps of the method of claim 8.