CN108664893B - Face detection method and storage medium - Google Patents


Info

Publication number
CN108664893B
CN108664893B (application CN201810290187.1A)
Authority
CN
China
Prior art keywords
network
loss function
classification
lightweight
face detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810290187.1A
Other languages
Chinese (zh)
Other versions
CN108664893A (en)
Inventor
黄海清
王金桥
陈盈盈
刘智勇
郑碎武
杨旭
黄志明
谢德坤
田�健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujian Haijing Technology Development Co ltd
Original Assignee
Fujian Haijing Technology Development Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujian Haijing Technology Development Co ltd filed Critical Fujian Haijing Technology Development Co ltd
Priority to CN201810290187.1A priority Critical patent/CN108664893B/en
Publication of CN108664893A publication Critical patent/CN108664893A/en
Application granted granted Critical
Publication of CN108664893B publication Critical patent/CN108664893B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

A face detection method and a storage medium. The method includes the following steps: step 102, inputting the same batch of training images to a lightweight network and a complex network, respectively; step 104, filtering the output results of the classification maps of the lightweight network and the complex network with a hard-sample mining method; step 106, constructing a comprehensive loss function comprising a knowledge distillation loss function and a label-based face detection loss function, the knowledge distillation loss function being obtained from the output results of the classification maps of the lightweight network and the complex network; step 108, updating the parameters of the lightweight network based on the loss function while not updating the parameters of the complex network; and step 110, repeating the above steps until the lightweight network converges. A neural network algorithm that quickly adjusts parameters is provided, solving the face detection problem of lightweight neural networks.

Description

Face detection method and storage medium
Technical Field
The invention belongs to the technical field of image processing and pattern recognition, and particularly relates to a human face detection method which can be applied to the fields of safety monitoring, human-computer interaction and the like.
Background
Face detection is an important technology that is required in many computer vision applications, such as face tracking, face alignment, face recognition, etc. In recent years, due to the development of convolutional neural networks, the performance of face detection is obviously improved. However, existing face detection models are typically slow to compute because they require a relatively large neural network to maintain good face detection performance. Although detection frameworks based on one-step methods are also proposed to speed up detection (e.g. SSD, YOLO), they are still not fast enough for practical application scenarios, especially in CPU-based environments. On the other hand, if the speed requirement is met by reducing the parameters of the convolutional network, the performance of the detector will be significantly degraded. Therefore, it is a very challenging task to obtain a light-weight face detector with good performance.
Knowledge distillation is a technique that lets a small network imitate a large network, thereby improving the small network's performance. Its effectiveness has been validated on classification and metric-learning tasks. For the detection task, the original knowledge distillation technique cannot be used directly, because the detector's output suffers from class imbalance (the background class far outnumbers the others); simply imitating all outputs, as in a classification task, does not yield good performance. Most lightweight detectors are based on the one-step approach rather than the two-step approach because of the former's speed advantage. Compared with the two-step approach, the one-step approach lacks a region proposal network for eliminating negative samples, so the class imbalance problem is more severe.
Disclosure of Invention
Therefore, a novel neural network algorithm is needed that suits the one-step approach, improves the performance of lightweight detection models, and quickly adjusts parameters, thereby solving the face detection problem of lightweight neural networks. The inventors accordingly provide a face detection method, which includes the following steps:
Step 102: input the same batch of training images to a lightweight network and a complex network, respectively;
Step 104: filter the output results of the classification maps of the lightweight network and the complex network with a hard-sample mining method;
Step 106: construct a comprehensive loss function comprising a knowledge distillation loss function and a label-based face detection loss function, where the knowledge distillation loss function is computed from the output results of the classification maps of the lightweight network and the complex network;
Step 108: update the parameters of the lightweight network based on the loss function, without updating the parameters of the complex network;
Step 110: repeat the above steps until the lightweight network converges.
Preferably, the hard-sample mining filter is as follows:
set a threshold T that judges whether a probability in the classification map is confident enough; T is a hyper-parameter in the range 0 to 1. Traverse every index in the classification map: when the probability at the index is greater than T in the lightweight network and less than T in the complex network, add the index to the set S_m; likewise, when the probability at the index is less than T in the lightweight network and greater than T in the complex network, also add the index to S_m.
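As an illustrative sketch (not the patent's own code), the mining rule above can be written as follows; the function name and the use of NumPy are assumptions, while T and S_m follow the text:

```python
import numpy as np

def mine_hard_samples(p_complex, q_light, T=0.5):
    """Collect indices where exactly one network's probability crosses the
    confidence threshold T, i.e. where the two networks disagree."""
    p = np.asarray(p_complex).ravel()   # classification-map probs, complex network
    q = np.asarray(q_light).ravel()     # classification-map probs, lightweight network
    disagree = ((q > T) & (p < T)) | ((q < T) & (p > T))
    return np.flatnonzero(disagree)     # the mined set S_m as an index array
```

For example, with T = 0.5 an index where the complex network predicts 0.9 but the lightweight network predicts 0.2 is kept for distillation, while indices where both networks agree on the same side of T are filtered out.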
Optionally, the knowledge distillation loss function is:
L_KD = -(1/|S_m|) Σ_{i∈S_m} [ p^(i) log q^(i) + (1 - p^(i)) log(1 - q^(i)) ]
where p^(i) is the i-th probability score in the classification map of the complex network, and q^(i) is the i-th probability score in the classification map of the lightweight network.
Further,
the label-based face detection loss function is as follows:
L_G = L_cls + L_reg
where L_cls is a two-class Softmax loss function for classification, and L_reg is a robust regression loss function for regression;
the comprehensive loss function is a weighted sum of the knowledge distillation loss function and the label-based face detection loss function:
L = L_G + c·L_KD
where c is the balancing coefficient.
Specifically, the method further includes steps of constructing the lightweight network and the complex network:
constructing a face detection model based on a convolutional neural network as the complex network, and training the complex network to convergence;
and constructing a face detection model with the same convolutional neural network architecture as the complex network to serve as the lightweight network, where the number of filters in each layer of the lightweight network is smaller than in the complex network.
A knowledge-distillation-based face detection storage medium stores a computer program which, when executed, performs the following steps:
Step 102: input the same batch of training images to a lightweight network and a complex network, respectively;
Step 104: filter the output results of the classification maps of the lightweight network and the complex network with a hard-sample mining method;
Step 106: construct a comprehensive loss function comprising a knowledge distillation loss function and a label-based face detection loss function, where the knowledge distillation loss function is computed from the output results of the classification maps of the lightweight network and the complex network;
Step 108: update the parameters of the lightweight network based on the loss function, without updating the parameters of the complex network;
Step 110: repeat the above steps until the lightweight network converges.
Specifically, the hard-sample mining filter is as follows:
set a threshold T that judges whether a probability in the classification map is confident enough; T is a hyper-parameter in the range 0 to 1. Traverse every index in the classification map: when the probability at the index is greater than T in the lightweight network and less than T in the complex network, add the index to the set S_m; likewise, when the probability at the index is less than T in the lightweight network and greater than T in the complex network, also add the index to S_m.
Optionally, the knowledge distillation loss function is:
L_KD = -(1/|S_m|) Σ_{i∈S_m} [ p^(i) log q^(i) + (1 - p^(i)) log(1 - q^(i)) ]
where p^(i) is the i-th probability score in the classification map of the complex network, and q^(i) is the i-th probability score in the classification map of the lightweight network.
Further,
the label-based face detection loss function is as follows:
L_G = L_cls + L_reg
where L_cls is a two-class Softmax loss function for classification, and L_reg is a robust regression loss function for regression;
the comprehensive loss function is a weighted sum of the knowledge distillation loss function and the label-based face detection loss function:
L = L_G + c·L_KD
where c is the balancing coefficient.
Specifically, when executed the computer program further performs steps of constructing the lightweight network and the complex network:
constructing a face detection model based on a convolutional neural network as the complex network, and training the complex network to convergence;
and constructing a face detection model with the same convolutional neural network architecture as the complex network to serve as the lightweight network, where the number of filters in each layer of the lightweight network is smaller than in the complex network.
Different from the prior art, this technique transfers the knowledge of a trained complex network to a lightweight network through classification-map distillation combined with hard-sample mining, so that a well-performing lightweight face detector can be trained quickly; it therefore solves the face detection problem of lightweight neural networks.
Drawings
Fig. 1 is a flowchart of a face detection method according to an embodiment.
Detailed Description
To explain technical contents, structural features, and objects and effects of the technical solutions in detail, the following detailed description is given with reference to the accompanying drawings in conjunction with the embodiments.
The embodiment shown in Fig. 1 illustrates a face detection method, which includes the following steps:
and step 100, constructing a face detection model based on a convolutional neural network as a teacher network, and training the model until convergence.
The architecture of the teacher network is usually the same as that of the student network, but the number of filters per layer is several times larger, so its performance is better. To simplify terminology, the teacher network is hereinafter also called the complex network, and the student network the lightweight network; the lightweight network is a face detection model with the same convolutional neural network architecture as the complex network, but the number of filters in each layer of the lightweight network is smaller than that of the complex network. The teacher network is trained like a conventional detection model; in this invention, its loss function is L_G = L_cls + L_reg, where L_cls is a two-class Softmax loss function for classification and L_reg is a robust regression loss function (smooth L1) for regression. This step constructs both the lightweight network and the complex network.
the student network is the detection model to be obtained finally, and the parameters of the student network are initialized randomly by using the Xavier method.
In other specific embodiments, the above preparation steps may be performed in advance, so the method starts directly at step 102: input the same batch of training images to the lightweight network and the complex network, respectively.
the training image may not be processed, or a data augmentation technique may be performed therein, specifically as follows:
for each training image input, a data augmentation technique is used, thereby increasing the generalization performance of the model. Taking the present invention as an example, data augmentation comprises the steps of:
(1) Color jittering: the brightness, contrast, and saturation of the training image are each randomly adjusted with probability 0.5.
(2) Random cropping: 5 square sub-images are randomly cropped from the training image. One is the largest square sub-image; the side lengths of the other 4 are 0.3 to 1.0 times the short side of the training image. One of these 5 square sub-images is randomly selected as the final training sample.
(3) Horizontal flipping: the selected training sample is horizontally flipped with probability 0.5.
(4) Scaling: the training sample obtained from the above operations is scaled to 1024 × 1024 and fed into the network for training.
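The four augmentation steps can be sketched as below; this is an illustrative NumPy-only approximation, not the patent's implementation — the brightness-only jitter, the single crop instead of five candidates, and the nearest-neighbour resize are simplifications:

```python
import numpy as np

def augment(img, rng):
    """Sketch of the four-step augmentation: jitter, crop, flip, resize."""
    h, w = img.shape[:2]
    # (1) color jitter with probability 0.5 (brightness scaling as a stand-in)
    if rng.random() < 0.5:
        img = np.clip(img * rng.uniform(0.7, 1.3), 0, 255)
    # (2) random square crop, side 0.3-1.0 of the short side
    side = max(1, int(min(h, w) * rng.uniform(0.3, 1.0)))
    y = rng.integers(0, h - side + 1)
    x = rng.integers(0, w - side + 1)
    img = img[y:y + side, x:x + side]
    # (3) horizontal flip with probability 0.5
    if rng.random() < 0.5:
        img = img[:, ::-1]
    # (4) nearest-neighbour resize to 1024 x 1024
    ys = np.arange(1024) * img.shape[0] // 1024
    xs = np.arange(1024) * img.shape[1] // 1024
    return img[ys][:, xs]
```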
Step 104: filter the output results of the classification maps of the lightweight network and the complex network with a hard-sample mining method, thereby alleviating class imbalance and low fitting efficiency.
Knowledge distillation aims to make the student network imitate the teacher network so that its outputs match the teacher's as closely as possible. In a neural network, the later layers are more closely related to the final prediction and provide better supervision for imitation learning; the last layer is therefore the most suitable target for the student to imitate. In a one-step face detection framework, the last layer has two modules: a classification map and a regression map. Knowledge distillation is effective because the soft labels learned by the teacher network carry more accurate and smoother information than the original labels, which benefits learning. In face detection, the regression-box labels are real numbers and already fairly accurate, whereas the classification labels are only 0 and 1 and far coarser; the classification map is therefore the better medium for knowledge distillation.
A typical one-step classification map has output size 2N × H × W, where N is the number of anchor boxes, 2 indicates that each anchor box predicts probabilities for the positive and the negative class, H is the height of the classification map, and W its width. Since the positive and negative probabilities are normalized and always sum to 1, knowledge distillation only needs the positive-class probability, so the classification-map output can be simplified to N × H × W. During training, the teacher network and the student network each output a classification map; for these two results, it must be decided which indices of the classification map should be filtered out and which can be used for knowledge distillation.
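The 2N × H × W to N × H × W simplification can be sketched as follows; the (negative, positive) channel ordering per anchor is an assumption:

```python
import numpy as np

def positive_probs(cls_map, n_anchors):
    """Reduce a (2N, H, W) classification map to the (N, H, W) positive-class
    probabilities; per anchor the two channels sum to 1, so one suffices."""
    two_n, h, w = cls_map.shape
    assert two_n == 2 * n_anchors
    pairs = cls_map.reshape(n_anchors, 2, h, w)  # (anchor, [neg, pos], H, W)
    return pairs[:, 1]                           # keep the positive channel
```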
Step 106 then constructs the comprehensive loss function, which comprises a knowledge distillation loss function and a label-based face detection loss function; the knowledge distillation loss function is computed from the output results of the classification maps of the lightweight network and the complex network. In some optional embodiments, the knowledge distillation loss function is constructed so that the classification-map result of the student network stays as close as possible to that of the teacher network. As a specific embodiment, the knowledge distillation loss function is:
L_KD = -(1/|S_m|) Σ_{i∈S_m} [ p^(i) log q^(i) + (1 - p^(i)) log(1 - q^(i)) ]
where p^(i) is the i-th probability score in the classification map of the complex network, and q^(i) is the i-th probability score in the classification map of the lightweight network.
Further, during training there is, besides the knowledge distillation loss function, a conventional label-based face detection loss function, consistent with the region proposal network in the classical detection framework Faster R-CNN:
the label-based face detection loss function is as follows:
L_G = L_cls + L_reg
where L_cls is a two-class Softmax loss function for classification, and L_reg is a robust regression loss function for regression. During training, the knowledge-distillation loss function and the label-based loss function are added to form the final comprehensive loss function.
The comprehensive loss function is the weighted sum of the label-based loss and the knowledge distillation loss:
L = L_G + c·L_KD
where c is a coefficient balancing the two loss functions; it is fixed at 50 in the present invention, although the optimal value should be determined by the specific scenario.
As shown in Fig. 1, step 108 is then performed: update the parameters of the lightweight network based on the loss function, without updating the parameters of the teacher network. In this step, the student network's parameters are updated by back-propagating the obtained comprehensive loss, completing one training iteration. The teacher network's parameters do not need updating and are therefore frozen during training.
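The frozen-teacher update pattern can be illustrated with a toy one-parameter "network" standing in for each detector; this is a sketch of the control flow only, and all names are illustrative:

```python
def distill_step(student_w, teacher_w, x, lr=0.1):
    """One update: the gradient of the squared distillation gap flows only
    into the student's weight; the teacher's weight is frozen and never
    modified."""
    q = student_w * x            # student forward pass
    p = teacher_w * x            # teacher forward pass (no gradient needed)
    grad = 2.0 * (q - p) * x     # d/d(student_w) of (q - p)^2
    return student_w - lr * grad # updated student; teacher_w untouched
```

Repeated calls move the student toward the frozen teacher, mirroring how the lightweight network converges while the complex network stays fixed.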
Step 110: repeat the above steps 102-108 until the lightweight network converges.
With the above steps, the precision of the lightweight face detector can be effectively improved, so that face detection achieves a satisfactory effect even on devices with limited computing resources. Because detection models differ in network structure from classification and metric-learning models, knowledge distillation cannot be used directly for the detection task. The inventors found that the regression map of a one-step face detection model does not carry enough effective information for imitation learning, whereas the classification map provides effective soft-label information; the classification map is therefore used as the medium through which the student network and the teacher network transfer knowledge. In addition, the classification-map output contains a large number of negative-class samples, causing class imbalance. The proposed hard-sample mining method filters out simple negative samples so that the classes are balanced, and also filters out simple positive samples so that knowledge distillation is more efficient. During training, the knowledge-distillation loss function and the label-based loss function are added in a suitable proportion to form the complete loss function. In a specific implementation, the method further includes inputting a test image into the trained student network model and outputting detection result boxes. Since the number of output boxes is very large, they must be screened and merged: first, most detection boxes are discarded with the confidence threshold T = 0.05, and the top N_a = 400 detection boxes are then selected by confidence.
Non-maximum suppression is then applied to remove duplicate detection boxes, and the top N_b = 200 boxes are selected by confidence to obtain the final detection result.
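The test-time filtering just described can be sketched as follows; the (x1, y1, x2, y2, score) box format and greedy-NMS formulation are assumptions, while the thresholds follow the text:

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2, score) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def postprocess(boxes, t=0.05, n_a=400, n_b=200, nms_thr=0.5):
    """Threshold at t, keep top n_a by confidence, greedy NMS, keep top n_b."""
    boxes = sorted((b for b in boxes if b[4] > t), key=lambda b: -b[4])[:n_a]
    kept = []
    for b in boxes:
        if all(iou(b, k) < nms_thr for k in kept):
            kept.append(b)
    return kept[:n_b]
```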
Finally, the knowledge-distillation-based training method provided by the invention effectively improves the detection capability of lightweight face detection models.
A knowledge-distillation-based face detection storage medium stores a computer program which, when executed, performs the following steps:
Step 102: input the same batch of training images to a lightweight network and a complex network, respectively;
Step 104: filter the output results of the classification maps of the lightweight network and the complex network with a hard-sample mining method;
Step 106: construct a comprehensive loss function comprising a knowledge distillation loss function and a label-based face detection loss function, where the knowledge distillation loss function is computed from the output results of the classification maps of the lightweight network and the complex network;
Step 108: update the parameters of the lightweight network based on the loss function, without updating the parameters of the complex network;
Step 110: repeat the above steps until the lightweight network converges.
Specifically, the hard-sample mining filter is as follows:
set a threshold T that judges whether a probability in the classification map is confident enough; T is a hyper-parameter in the range 0 to 1. Traverse every index in the classification map: when the probability at the index is greater than T in the lightweight network and less than T in the complex network, add the index to the set S_m; likewise, when the probability at the index is less than T in the lightweight network and greater than T in the complex network, also add the index to S_m.
Optionally, the knowledge distillation loss function is:
L_KD = -(1/|S_m|) Σ_{i∈S_m} [ p^(i) log q^(i) + (1 - p^(i)) log(1 - q^(i)) ]
where p^(i) is the i-th probability score in the classification map of the complex network, and q^(i) is the i-th probability score in the classification map of the lightweight network.
Further,
the label-based face detection loss function is as follows:
L_G = L_cls + L_reg
where L_cls is a two-class Softmax loss function for classification, and L_reg is a robust regression loss function for regression;
the comprehensive loss function is a weighted sum of the knowledge distillation loss function and the label-based face detection loss function:
L = L_G + c·L_KD
where c is the balancing coefficient.
Specifically, when executed the computer program further performs steps of constructing the lightweight network and the complex network:
constructing a face detection model based on a convolutional neural network as the complex network, and training the complex network to convergence;
and constructing a face detection model with the same convolutional neural network architecture as the complex network to serve as the lightweight network, where the number of filters in each layer of the lightweight network is smaller than in the complex network.
It is noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another without necessarily requiring or implying any actual such relationship or order between such entities or actions. The terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of additional such elements in the process, method, article, or terminal that comprises the element. Further, herein, "greater than," "less than," "more than," and the like are understood to exclude the stated number, while "above," "below," "within," and the like are understood to include it.
As will be appreciated by one skilled in the art, the above-described embodiments may be provided as a method, apparatus, or computer program product. These embodiments may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. All or part of the steps in the methods according to the embodiments may be implemented by a program instructing associated hardware, where the program may be stored in a storage medium readable by a computer device and used to execute all or part of the steps in the methods according to the embodiments. The computer devices, including but not limited to: personal computers, servers, general-purpose computers, special-purpose computers, network devices, embedded devices, programmable devices, intelligent mobile terminals, intelligent home devices, wearable intelligent devices, vehicle-mounted intelligent devices, and the like; the storage medium includes but is not limited to: RAM, ROM, magnetic disk, magnetic tape, optical disk, flash memory, U disk, removable hard disk, memory card, memory stick, network server storage, network cloud storage, etc.
The various embodiments described above are described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a computer apparatus to produce a machine, such that the instructions, which execute via the processor of the computer apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer device to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer apparatus to cause a series of operational steps to be performed on the computer apparatus to produce a computer implemented process such that the instructions which execute on the computer apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Although the embodiments have been described, once the basic inventive concept is obtained, other variations and modifications of these embodiments can be made by those skilled in the art, so that the above embodiments are only examples of the present invention, and not intended to limit the scope of the present invention, and all equivalent structures or equivalent processes using the contents of the present specification and drawings, or any other related technical fields, which are directly or indirectly applied thereto, are included in the scope of the present invention.

Claims (8)

1. A face detection method is characterized by comprising the following steps:
step 100, constructing a face detection model based on a convolutional neural network as a teacher network, and training the model until convergence;
the framework of the teacher network is the same as that of the student network, but the number of the filters on each layer is several times that of the student network, the teacher network and the complex network can be replaced mutually, the student network can also be replaced with a light weight network, the light weight network is characterized in that the light weight network is a face detection model of a convolutional neural network with the same framework as the complex network, and the number of the filters on each layer in the framework of the light weight network is smaller than that of the complex network;
step 102, respectively inputting the same batch of training images into the lightweight network and the complex network; step 104, filtering the output results of the classification maps of the lightweight network and the complex network by a hard sample mining method, thereby alleviating the problems of class imbalance and low fitting efficiency;
the output size of a typical classification map based on a single-step method is 2 NxHxW, wherein N is the number of anchor points, 2 represents the probability that each anchor point needs to predict a positive class and a negative class, H is the height of the classification map, and W is the width of the classification map, because the probabilities of the positive class and the negative class are standardized and are always added to be 1, only the probability of the positive class is concerned during knowledge distillation, so the output of the classification map is simplified to be NxHxW, in the training process, a teacher network and a student network respectively output the result of one classification map, and for the two results, it needs to be decided which indexes of the classification map should be filtered and which indexes are used for knowledge distillation;
step 106, constructing a comprehensive loss function, wherein the comprehensive loss function comprises a knowledge distillation loss function and a label-based face detection loss function, the knowledge distillation loss function being obtained from the output results of the classification maps of the lightweight network and the complex network;
step 108, updating the parameters of the lightweight network based on the loss function, without updating the parameters of the complex network;
step 110, repeating the above steps until the lightweight network is trained to convergence;
through the above steps, the precision of a lightweight face detector can be effectively improved, so that face detection achieves a satisfactory detection effect even on equipment with limited computing resources; because the detection model differs in network structure from classification and metric learning models, the knowledge distillation method cannot be used directly for detection tasks; the classification map, however, can provide effective soft-label information, and it is therefore used as the medium for transferring knowledge between the student network and the teacher network; a large number of negative samples exist in the output result of the classification map, and the hard sample mining method is used to filter out simple negative samples so that the classes are balanced, while simple positive samples are also filtered out, making knowledge distillation more efficient;
during training, the knowledge-distillation-based loss function and the label-based loss function are added in an appropriate proportion to form the complete loss function; a test image is input into the trained student network model, and detection result boxes are output; because the number of output detection boxes is very large, the boxes are screened and merged: most detection boxes are first screened out by a confidence threshold T of 0.05, the top N_a = 400 detection boxes are selected by confidence, duplicate detection boxes are then removed by non-maximum suppression, and the top N_b = 200 detection boxes are selected by confidence to obtain the final detection result; in this way, the knowledge-distillation-based training method can effectively improve the detection capability of the lightweight face detection model.
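The 2N×H×W to N×H×W simplification of the classification map described above can be sketched as follows; the per-anchor channel layout (negative class first, positive class second) and the function name are illustrative assumptions, not taken from the patent:

```python
import numpy as np

def positive_class_map(cls_output, num_anchors):
    """Keep only the positive-class probabilities of a 2N x H x W
    single-step classification map, yielding an N x H x W map."""
    two_n, h, w = cls_output.shape
    assert two_n == 2 * num_anchors
    # View the channels as (anchor, class): per anchor, index 0 is the
    # negative-class probability and index 1 the positive-class one.
    per_anchor = cls_output.reshape(num_anchors, 2, h, w)
    return per_anchor[:, 1, :, :]
```

Since the two probabilities per anchor sum to 1, dropping the negative channel loses no information, which is why distillation can operate on the N×H×W map alone.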
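The test-time box screening in the claim (confidence threshold 0.05, top N_a = 400 by confidence, non-maximum suppression, then top N_b = 200) might be sketched like this; the IoU threshold of 0.5 for NMS and the function names are assumptions the claim does not specify:

```python
import numpy as np

CONF_T = 0.05   # confidence threshold from the claim
N_A = 400       # boxes kept before NMS
N_B = 200       # boxes kept after NMS

def nms(boxes, scores, iou_t=0.5):
    """Greedy non-maximum suppression; boxes are (x1, y1, x2, y2)."""
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        if order.size == 1:
            break
        rest = order[1:]
        # Intersection-over-union of box i with the remaining boxes.
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_t]
    return np.array(keep, dtype=int)

def postprocess(boxes, scores):
    """Confidence threshold, top-N_a, NMS, then top-N_b."""
    keep = scores > CONF_T
    boxes, scores = boxes[keep], scores[keep]
    top = scores.argsort()[::-1][:N_A]
    boxes, scores = boxes[top], scores[top]
    keep = nms(boxes, scores)[:N_B]
    return boxes[keep], scores[keep]
```

The early confidence cut is what keeps the NMS step cheap: IoU is only ever computed among the N_a highest-scoring candidates.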
2. The face detection method of claim 1, wherein the filtering by the hard sample mining method specifically comprises:
setting a threshold T for judging whether a given probability in the classification map has sufficient confidence; T is a hyperparameter with a value range of 0 to 1; each index in the classification map is traversed, and when the probability at an index is greater than T in the lightweight network and less than T in the complex network, the index is added to the set S_m; or, when the probability at an index is less than T in the lightweight network and greater than T in the complex network, the index is likewise added to S_m.
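The construction of S_m in claim 2 amounts to collecting the indices where the two networks fall on opposite sides of T. A minimal sketch (the default t=0.5 and the function name are illustrative; the claim only requires T in the range 0 to 1):

```python
import numpy as np

def mine_hard_indices(student_probs, teacher_probs, t=0.5):
    """Collect the set S_m of classification-map indices on which the
    lightweight (student) and complex (teacher) networks disagree
    about the confidence threshold t."""
    s = np.asarray(student_probs).ravel()
    p = np.asarray(teacher_probs).ravel()
    disagree = ((s > t) & (p < t)) | ((s < t) & (p > t))
    return set(np.flatnonzero(disagree))
```

Indices where both networks agree (both above or both below T) are exactly the simple positives and simple negatives that the claim filters out.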
3. The face detection method according to claim 2,
the knowledge distillation loss function is:
Figure FDA0003537923470000021
wherein p^(i) is the i-th probability score in the classification map of the complex network, and q^(i) is the i-th probability score in the classification map of the lightweight network.
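The exact distillation formula appears in this text only as an equation-image reference, so the functional form below — a binary cross-entropy between the teacher's positive-class scores p^(i) and the student's scores q^(i), averaged over the mined set S_m — is an assumed, common choice for illustration, not necessarily the patented formula:

```python
import numpy as np

def kd_loss(teacher_p, student_q, s_m):
    """Distillation loss over the mined index set S_m: the teacher's
    positive-class probabilities act as soft labels for the student."""
    idx = np.array(sorted(s_m), dtype=int)
    p = np.asarray(teacher_p).ravel()[idx]
    # Clip the student scores to avoid log(0).
    q = np.clip(np.asarray(student_q).ravel()[idx], 1e-7, 1 - 1e-7)
    return float(-np.mean(p * np.log(q) + (1 - p) * np.log(1 - q)))
```

Restricting the average to S_m is what couples the loss to the hard sample mining of claim 2: only the indices where teacher and student disagree contribute gradient.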
4. The face detection method according to claim 2 or 3,
the label-based face detection loss function is as follows:
L_G = L_cls + L_reg
wherein L_cls is a two-class Softmax loss function used for classification, and L_reg is a robust regression loss function used for regression;
the comprehensive loss function is a weighted sum of the knowledge distillation loss function and the label-based face detection loss function:
L = L_G + c·L_KD
wherein c is the balance coefficient.
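The weighting in claim 4 can be written directly; the value of the balance coefficient c is a hyperparameter the patent leaves open, so the default below is only a placeholder:

```python
def total_loss(l_cls, l_reg, l_kd, c=1.0):
    """Comprehensive loss L = L_G + c * L_KD, with L_G = L_cls + L_reg."""
    l_g = l_cls + l_reg  # label-based face detection loss
    return l_g + c * l_kd
```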
5. A face detection storage medium, storing a computer program that, when executed, performs the steps of:
step 100, constructing a face detection model based on a convolutional neural network as a teacher network, and training the model until convergence;
the framework of the teacher network is the same as that of the student network, but the number of filters in each layer is several times that of the student network; the terms "teacher network" and "complex network" are interchangeable, as are "student network" and "lightweight network"; the lightweight network is characterized in that it is a convolutional-neural-network face detection model with the same framework as the complex network, and the number of filters in each layer of the lightweight network framework is smaller than that of the complex network;
step 102, respectively inputting the same batch of training images into the lightweight network and the complex network; step 104, filtering the output results of the classification maps of the lightweight network and the complex network by a hard sample mining method, thereby alleviating the problems of class imbalance and low fitting efficiency;
the output size of a typical classification map based on a single-step method is 2 NxHxW, wherein N is the number of anchor points, 2 represents the probability that each anchor point needs to predict a positive class and a negative class, H is the height of the classification map, and W is the width of the classification map, because the probabilities of the positive class and the negative class are standardized and are always added to be 1, only the probability of the positive class is concerned during knowledge distillation, so the output of the classification map is simplified to be NxHxW, in the training process, a teacher network and a student network respectively output the result of one classification map, and for the two results, it needs to be decided which indexes of the classification map should be filtered and which indexes are used for knowledge distillation;
step 106, constructing a comprehensive loss function, wherein the comprehensive loss function comprises a knowledge distillation loss function and a label-based face detection loss function, the knowledge distillation loss function being obtained from the output results of the classification maps of the lightweight network and the complex network;
step 108, updating the parameters of the lightweight network based on the loss function, without updating the parameters of the complex network;
step 110, repeating the above steps until the lightweight network is trained to convergence;
through the above steps, the precision of a lightweight face detector can be effectively improved, so that face detection achieves a satisfactory detection effect even on equipment with limited computing resources; because the detection model differs in network structure from classification and metric learning models, the knowledge distillation method cannot be used directly for detection tasks; the classification map, however, can provide effective soft-label information, and it is therefore used as the medium for transferring knowledge between the student network and the teacher network; a large number of negative samples exist in the output result of the classification map, and the hard sample mining method is used to filter out simple negative samples so that the classes are balanced, while simple positive samples are also filtered out, making knowledge distillation more efficient;
during training, the knowledge-distillation-based loss function and the label-based loss function are added in an appropriate proportion to form the complete loss function; a test image is input into the trained student network model, and detection result boxes are output; because the number of output detection boxes is very large, the boxes are screened and merged: most detection boxes are first screened out by a confidence threshold T of 0.05, the top N_a = 400 detection boxes are selected by confidence, duplicate detection boxes are then removed by non-maximum suppression, and the top N_b = 200 detection boxes are selected by confidence to obtain the final detection result; in this way, the knowledge-distillation-based training method can effectively improve the detection capability of the lightweight face detection model.
6. The storage medium of claim 5, wherein the filtering by the hard sample mining method is specifically:
setting a threshold T for judging whether a given probability in the classification map has sufficient confidence; T is a hyperparameter with a value range of 0 to 1; each index in the classification map is traversed, and when the probability at an index is greater than T in the lightweight network and less than T in the complex network, the index is added to the set S_m; or, when the probability at an index is less than T in the lightweight network and greater than T in the complex network, the index is likewise added to S_m.
7. The face detection storage medium of claim 6,
the knowledge distillation loss function is:
Figure FDA0003537923470000051
wherein p^(i) is the i-th probability score in the classification map of the complex network, and q^(i) is the i-th probability score in the classification map of the lightweight network.
8. The face detection storage medium of claim 6 or 7,
the label-based face detection loss function is as follows:
L_G = L_cls + L_reg
wherein L_cls is a two-class Softmax loss function used for classification, and L_reg is a robust regression loss function used for regression;
the comprehensive loss function is a weighted sum of the knowledge distillation loss function and the label-based face detection loss function:
L = L_G + c·L_KD
wherein c is the balance coefficient.
CN201810290187.1A 2018-04-03 2018-04-03 Face detection method and storage medium Active CN108664893B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810290187.1A CN108664893B (en) 2018-04-03 2018-04-03 Face detection method and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810290187.1A CN108664893B (en) 2018-04-03 2018-04-03 Face detection method and storage medium

Publications (2)

Publication Number Publication Date
CN108664893A (en) 2018-10-16
CN108664893B (en) 2022-04-29

Family

ID=63782947

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810290187.1A Active CN108664893B (en) 2018-04-03 2018-04-03 Face detection method and storage medium

Country Status (1)

Country Link
CN (1) CN108664893B (en)

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109300170B (en) * 2018-10-18 2022-10-28 云南大学 Method for transmitting shadow of portrait photo
CN109472360B (en) 2018-10-30 2020-09-04 北京地平线机器人技术研发有限公司 Neural network updating method and updating device and electronic equipment
CN111414987B (en) * 2019-01-08 2023-08-29 南京人工智能高等研究院有限公司 Training method and training device of neural network and electronic equipment
CN110059747B (en) * 2019-04-18 2021-12-14 清华大学深圳研究生院 Network traffic classification method
CN110263731B (en) * 2019-06-24 2021-03-16 电子科技大学 Single step human face detection system
CN110598603A (en) * 2019-09-02 2019-12-20 深圳力维智联技术有限公司 Face recognition model acquisition method, device, equipment and medium
CN110674714B (en) * 2019-09-13 2022-06-14 东南大学 Human face and human face key point joint detection method based on transfer learning
CN110956255B (en) * 2019-11-26 2023-04-07 中国医学科学院肿瘤医院 Difficult sample mining method and device, electronic equipment and computer readable storage medium
CN111178370B (en) * 2019-12-16 2023-10-17 深圳市华尊科技股份有限公司 Vehicle searching method and related device
CN111027551B (en) * 2019-12-17 2023-07-07 腾讯科技(深圳)有限公司 Image processing method, apparatus and medium
CN111368634B (en) * 2020-02-05 2023-06-20 中国人民解放军国防科技大学 Human head detection method, system and storage medium based on neural network
CN111639744B (en) * 2020-04-15 2023-09-22 北京迈格威科技有限公司 Training method and device for student model and electronic equipment
CN111553227A (en) * 2020-04-21 2020-08-18 东南大学 Lightweight face detection method based on task guidance
CN112232397A (en) * 2020-09-30 2021-01-15 上海眼控科技股份有限公司 Knowledge distillation method and device of image classification model and computer equipment
CN112348167B (en) * 2020-10-20 2022-10-11 华东交通大学 Knowledge distillation-based ore sorting method and computer-readable storage medium
CN112270379B (en) * 2020-11-13 2023-09-19 北京百度网讯科技有限公司 Training method of classification model, sample classification method, device and equipment
CN112633406A (en) * 2020-12-31 2021-04-09 天津大学 Knowledge distillation-based few-sample target detection method
CN113723238B (en) * 2021-08-18 2024-02-09 厦门瑞为信息技术有限公司 Face lightweight network model construction method and face recognition method
CN113657411A (en) * 2021-08-23 2021-11-16 北京达佳互联信息技术有限公司 Neural network model training method, image feature extraction method and related device
CN117542085B (en) * 2024-01-10 2024-05-03 湖南工商大学 Park scene pedestrian detection method, device and equipment based on knowledge distillation

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2584381A1 (en) * 2011-10-21 2013-04-24 ENI S.p.A. Method for predicting the properties of crude oils by the application of neural networks
CN107239736A (en) * 2017-04-28 2017-10-10 北京智慧眼科技股份有限公司 Method for detecting human face and detection means based on multitask concatenated convolutional neutral net
EP3276540A2 (en) * 2016-07-28 2018-01-31 Samsung Electronics Co., Ltd. Neural network method and apparatus
CN107818314A (en) * 2017-11-22 2018-03-20 北京达佳互联信息技术有限公司 Face image processing method, device and server

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7672935B2 (en) * 2006-11-29 2010-03-02 Red Hat, Inc. Automatic index creation based on unindexed search evaluation
US9606594B2 (en) * 2012-03-19 2017-03-28 Saudi Arabian Oil Company Methods for simultaneous process and utility systems synthesis in partially and fully decentralized environments
CN107220618B (en) * 2017-05-25 2019-12-24 中国科学院自动化研究所 Face detection method and device, computer readable storage medium and equipment
CN107247989B (en) * 2017-06-15 2020-11-24 北京图森智途科技有限公司 Real-time computer vision processing method and device
CN110674880B (en) * 2019-09-27 2022-11-11 北京迈格威科技有限公司 Network training method, device, medium and electronic equipment for knowledge distillation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Survey on deep network model compression; Lei Jie et al.; Journal of Software; 2017-12-04; Vol. 29, No. 2; pp. 251-266 *

Also Published As

Publication number Publication date
CN108664893A (en) 2018-10-16

Similar Documents

Publication Publication Date Title
CN108664893B (en) Face detection method and storage medium
CN107909101B (en) Semi-supervised transfer learning character identifying method and system based on convolutional neural networks
CN110633745B (en) Image classification training method and device based on artificial intelligence and storage medium
US11557123B2 (en) Scene change method and system combining instance segmentation and cycle generative adversarial networks
WO2021238262A1 (en) Vehicle recognition method and apparatus, device, and storage medium
CN112257815A (en) Model generation method, target detection method, device, electronic device, and medium
CN109359539B (en) Attention assessment method and device, terminal equipment and computer readable storage medium
KR20200022739A (en) Method and device to recognize image and method and device to train recognition model based on data augmentation
CN111259738B (en) Face recognition model construction method, face recognition method and related device
CN111401516A (en) Neural network channel parameter searching method and related equipment
CN111639744A (en) Student model training method and device and electronic equipment
CN110458765A (en) The method for enhancing image quality of convolutional network is kept based on perception
CN106203625A (en) A kind of deep-neural-network training method based on multiple pre-training
CN109472193A (en) Method for detecting human face and device
CN107945210B (en) Target tracking method based on deep learning and environment self-adaption
CN114332578A (en) Image anomaly detection model training method, image anomaly detection method and device
CN107247952B (en) Deep supervision-based visual saliency detection method for cyclic convolution neural network
CN113642431A (en) Training method and device of target detection model, electronic equipment and storage medium
CN113221787A (en) Pedestrian multi-target tracking method based on multivariate difference fusion
CN111400452A (en) Text information classification processing method, electronic device and computer readable storage medium
CN108875587A (en) Target distribution detection method and equipment
CN112446331A (en) Knowledge distillation-based space-time double-flow segmented network behavior identification method and system
CN112748941A (en) Feedback information-based target application program updating method and device
CN115187772A (en) Training method, device and equipment of target detection network and target detection method, device and equipment
CN111242176B (en) Method and device for processing computer vision task and electronic system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 350003 21 floors, No. 1 Building, G District, 89 Software Avenue, Gulou District, Fuzhou City, Fujian Province

Applicant after: FUJIAN HAIJING TECHNOLOGY DEVELOPMENT Co.,Ltd.

Address before: 350003 1-2 floors of Building C, Building 10, Fuzhou Software Park B, 89 Software Avenue, Gulou District, Fuzhou City, Fujian Province

Applicant before: FUZHOU HAIJING SCIENCE & TECHNOLOGY DEVELOPMENT CO.,LTD.

CB03 Change of inventor or designer information

Inventor after: Huang Haiqing

Inventor after: Wang Jinqiao

Inventor after: Chen Yingying

Inventor after: Liu Zhiyong

Inventor after: Zheng Suiwu

Inventor after: Yang Xu

Inventor after: Huang Zhiming

Inventor after: Xie Dekun

Inventor after: Tian Jian

Inventor before: Huang Haiqing

Inventor before: Liu Zhiyong

Inventor before: Zheng Suiwu

Inventor before: Yang Xu

Inventor before: Huang Zhiming

Inventor before: Xie Dekun

Inventor before: Tian Jian

GR01 Patent grant
PP01 Preservation of patent right

Effective date of registration: 20231212

Granted publication date: 20220429
