CN106485230B - Neural network-based face detection model training method and system, and face detection method and system - Google Patents

Info

Publication number
CN106485230B
Authority
CN
China
Prior art keywords
face
face frame
predicted
default
network layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610906338.2A
Other languages
Chinese (zh)
Other versions
CN106485230A (en)
Inventor
邵枭虎
吕江靖
覃勋辉
周祥东
石宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Institute of Green and Intelligent Technology of CAS
Original Assignee
Chongqing Institute of Green and Intelligent Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Chongqing Institute of Green and Intelligent Technology of CAS
Priority to CN201610906338.2A
Publication of CN106485230A
Application granted
Publication of CN106485230B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The present invention provides a training method, a face detection method, and corresponding systems for a neural network-based face detection model. Training method: calculate the loss function of the network layer for predicting face frame bias according to the offset information of the predicted face frame relative to the default face frame and the offset information of the real face frame relative to the default face frame; calculate the loss function of the network layer for predicting face frame confidence according to the confidence of the default face frame; calculate the error of the two loss functions and feed the error back into the neural network to adjust the weights in the neural network; repeat the training iteratively until convergence to obtain a face detection model, so that the predicted face frame contains the face more accurately. Detection method: input the face image to be detected into the trained face detection model, which outputs offset information and confidences; calculate the corresponding predicted face frames according to the offset information; select, as the face detection result, the predicted face frames whose confidence is greater than a preset confidence threshold, or the predicted face frame with the highest confidence.

Description

Neural network-based face detection model training method and system and face detection method and system
Technical Field
The invention relates to the technical field of image processing, and in particular to a method and system for training a neural network-based face detection model and a method and system for face detection.
Background
The main task of face detection is to judge whether a face exists in a given image and, if a face exists, to give the position and size of the face. The commonly adopted face detection process mainly comprises the following three steps: (1) selecting a rectangular area from the image as an observation window; (2) extracting features from the observation window to describe its content; (3) performing classification to judge whether the window contains a face. These three steps are repeated until all observation windows in the image have been traversed. If an observation window is judged to contain a face, the position and size of that window are taken as the position and size of the detected face; conversely, if no window contains a face, the given image is considered to contain no face.
Currently, the commonly adopted face detectors are all based on the Viola-Jones face detector designed by Paul Viola and Michael Jones in 2001. First, an integral image is constructed so that Haar features can be calculated rapidly; second, an efficient feature classification method, e.g., the AdaBoost algorithm, is used; finally, windows are judged from coarse to fine in a cascade. Because the face is a non-rigid structure and the external environment is complex and changeable, face detection technology suffers from problems such as a high false detection rate and a low detection rate. To solve these problems, a great deal of subsequent work has improved the Viola-Jones face detector, for example by selecting more descriptive features (LBP, HOG) or improving the classifier algorithm and the cascade structure, so that face detection performance is improved.
In recent years, with the development of deep learning, face detection methods based on deep neural networks have gradually appeared, for example CascadeCNN, Faceness, and Fast RCNN. Compared with traditional face detection methods, the features extracted by a deep neural network are more robust and more descriptive, so these methods achieve a higher detection rate and a lower false detection rate.
Although face detectors have made considerable progress, several problems remain:
(1) At present, most face detectors select observation windows with a sliding window, so traversing one face image requires calculating and classifying a large number of observation windows, and the amount of computation is large; in addition, to handle faces of different sizes in the image, an image pyramid must be constructed or observation windows of different scales must be used, so face detection is slow.
(2) Most face detection algorithms consist of many relatively independent steps, and a problem in any one step affects the final face detection result.
(3) Although face detection methods based on deep learning work well, the input face image needs to be scaled to a fixed size, which stretches, distorts, or deforms the faces in the image and affects the final face detection result.
Disclosure of Invention
In view of the above-mentioned shortcomings in the prior art, an object of the present invention is to provide a method and a system for training a face detection model based on a neural network, which can implement multi-scale detection of a face.
To achieve the above and other related objects, the present invention provides a method for training a face detection model based on a neural network, the neural network comprising: a network layer for face detection, a network layer for predicting face frame bias and a network layer for predicting face frame confidence; the network layer of the face detection is selected according to the receptive fields of face images in training sets corresponding to different network layers in the neural network, each cell element in the network layer of the face detection is bound with six default face frames, and the default face frames are set according to the scale of the network layer of the corresponding face detection; each layer of network layer for face detection is connected with a layer of network layer for predicting face frame bias and a layer of network layer for predicting face frame confidence; the method further comprises the following steps:
when a model training instruction is received, inputting the face images in the training set into the neural network for training;
calculating the offset information of the predicted face frame relative to the corresponding default face frame and calculating the offset information of the real face frame relative to the corresponding default face frame through the network layer of the offset of the predicted face frame; calculating the confidence coefficient of each default face frame including the face through a network layer for predicting the confidence coefficient of the face frame;
calculating a loss function of a network layer of the offset of the predicted face frame according to the offset information of the predicted face frame relative to the corresponding default face frame and the offset information of the real face frame relative to the corresponding default face frame; calculating a loss function of a network layer for predicting the confidence of the face frame according to the confidence of the face contained in the default face frame;
calculating an error between a loss function of the network layer of the predicted face frame bias and a loss function of the network layer of the predicted face frame confidence, feeding the error back to the neural network through back propagation, updating a network weight parameter of the neural network according to the error and adjusting the predicted face frame according to the updated network weight parameter;
and repeating iterative training until the error between the adjusted predicted face frame and the actual face frame is within a preset error range, and outputting a face detection model.
Preferably, the calculating, through the network layer of the predicted face frame bias, of the bias information of the predicted face frame relative to the corresponding default face frame includes:
calculating the bias information of the predicted face frame relative to the corresponding default face frame according to the following formula:
$$t_x = (x - x_a)/w_a, \quad t_y = (y - y_a)/h_a$$
$$t_w = \log(w/w_a), \quad t_h = \log(h/h_a)$$
wherein $(x, y, w, h)$ are the center-point coordinates, width and height of the predicted face frame; $(x_a, y_a, w_a, h_a)$ are the center-point coordinates, width and height of the default face frame; and $(t_x, t_y, t_w, t_h)$ is the bias information of the predicted face frame relative to the corresponding default face frame;
the calculating of the bias information of the real face frame relative to the corresponding default face frame includes:
according to each default face frame and the corresponding real face frame, calculating the bias information of the real face frame relative to the corresponding default face frame according to the following formula:
$$t_x^* = (x^* - x_a)/w_a, \quad t_y^* = (y^* - y_a)/h_a$$
$$t_w^* = \log(w^*/w_a), \quad t_h^* = \log(h^*/h_a)$$
wherein $(x_a, y_a, w_a, h_a)$ are the center-point coordinates, width and height of the default face frame; $(x^*, y^*, w^*, h^*)$ are the center-point coordinates, width and height of the real face frame; and $(t_x^*, t_y^*, t_w^*, t_h^*)$ is the bias information of the real face frame relative to the corresponding default face frame.
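The bias calculation above can be illustrated with a minimal sketch. The function name `encode_offsets` and the example frame values are hypothetical and serve only as an illustration; the same function applies to a predicted face frame and to a real face frame, since both are expressed relative to the same default face frame.

```python
import math

def encode_offsets(box, default_box):
    """Bias (tx, ty, tw, th) of `box` relative to `default_box`.

    Both frames are (center_x, center_y, width, height) tuples.
    """
    x, y, w, h = box
    xa, ya, wa, ha = default_box
    tx = (x - xa) / wa        # horizontal shift, normalized by default width
    ty = (y - ya) / ha        # vertical shift, normalized by default height
    tw = math.log(w / wa)     # log ratio of widths
    th = math.log(h / ha)     # log ratio of heights
    return tx, ty, tw, th

# Hypothetical frames, chosen only to make the arithmetic easy to follow.
default_frame = (50.0, 50.0, 20.0, 20.0)
real_frame = (54.0, 48.0, 40.0, 10.0)
print(encode_offsets(real_frame, default_frame))
```

The center shift is normalized by the default frame size and the size change is expressed as a log ratio, so a frame identical to its default frame has a bias of all zeros.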
Preferably, the calculating a loss function of a network layer of the predicted face frame bias according to the bias information of the predicted face frame relative to the corresponding default face frame and the bias information of the real face frame relative to the corresponding default face frame includes:
selecting the default face frame corresponding to the situation that the relative area of the default face frame relative to the corresponding real face frame is larger than a preset first relative area threshold value as a sampling default face frame; the relative area is the area of the intersection area of the default face frame and the real face frame divided by the area of the union area of the default face frame and the real face frame;
calculating the loss function $Loss_1$ of the network layer of the predicted face frame bias according to the following formula, based on the bias information of the predicted face frame relative to the corresponding sampled default face frame and the bias information of the real face frame relative to the corresponding sampled default face frame:
$$Loss_1 = \frac{1}{N_{reg}} \sum_{k=1}^{N_{reg}} L_{reg}(T_k, T_k^*)$$
$$L_{reg}(T, T^*) = \sum_{z \in \{x, y, w, h\}} smooth_{L1}(t_z - t_z^*)$$
$$smooth_{L1}(l) = \begin{cases} 0.5\,l^2, & |l| < 1 \\ |l| - 0.5, & \text{otherwise} \end{cases}$$
wherein $N_{reg}$ is the number of sampled default face frames; $L_{reg}(T_k, T_k^*)$ is the regression loss function for the spatial bias of the k-th sampled default face frame (k is a positive integer between 1 and $N_{reg}$); $T = \{t_x, t_y, t_w, t_h\}$ is the bias information of the predicted face frame relative to the corresponding sampled default face frame; z is an element of the set $\{x, y, w, h\}$; $T^* = \{t_x^*, t_y^*, t_w^*, t_h^*\}$ is the bias information of the real face frame relative to the corresponding sampled default face frame; $smooth_{L1}$ denotes the smooth L1 loss function, which is a variant of the L1 norm loss function; and $l$ denotes the input variable of the $smooth_{L1}$ function.
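The regression loss described above can be sketched as follows. The sketch assumes that the sampled default face frames have already been chosen by the relative-area threshold, that the smooth L1 function takes its commonly used form (quadratic below 1, linear otherwise), and that the loss is averaged over the sampled default face frames; the function names are hypothetical.

```python
def smooth_l1(l):
    """Smooth L1: quadratic near zero, linear for large |l| (assumed standard form)."""
    return 0.5 * l * l if abs(l) < 1.0 else abs(l) - 0.5

def regression_loss(pred_offsets, real_offsets):
    """Average smooth L1 loss over the sampled default face frames.

    Each argument is a list of (tx, ty, tw, th) tuples, one entry per
    sampled default face frame, in the same order.
    """
    n_reg = len(pred_offsets)
    if n_reg == 0:
        return 0.0  # no sampled frames: nothing to regress
    total = 0.0
    for t, t_star in zip(pred_offsets, real_offsets):
        # Sum the per-coordinate losses over x, y, w, h.
        total += sum(smooth_l1(tz - tz_star) for tz, tz_star in zip(t, t_star))
    return total / n_reg

# Hypothetical offsets for two sampled default face frames.
pred = [(0.5, 0.0, 0.0, 0.0), (2.0, 0.0, 0.0, 0.0)]
real = [(0.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0)]
print(regression_loss(pred, real))
```

Compared with a plain L1 loss, the quadratic region keeps the gradient small when the predicted bias is already close to the real bias.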
The calculating a loss function of a network layer for predicting the confidence of the face frame according to the confidence that the default face frame contains the face comprises:
taking the default face frame corresponding to the situation that the relative area of the default face frame relative to the corresponding real face frame is larger than a preset first relative area threshold value as a positive sample, and taking the default face frame corresponding to the situation that the relative area of the default face frame relative to the corresponding real face frame is smaller than or equal to the preset first relative area threshold value as a negative sample;
selecting all positive samples and part of negative samples according to a preset positive-negative sample proportion;
calculating the loss function $Loss_2$ of the network layer for predicting the confidence of the face frame according to the following formula:
$$Loss_2 = \frac{1}{N_{cls}} \sum_{i=1}^{N_{cls}} L_{cls}(p_i, p_i^*)$$
$$L_{cls}(p, p^*) = -[p^* \log p + (1 - p^*) \log(1 - p)]$$
wherein $N_{cls}$ is the total number of selected positive and negative samples; $L_{cls}(p_i, p_i^*)$ is the classification loss function of the i-th selected sample (i is a positive integer between 1 and $N_{cls}$); $p$ is the confidence that the selected positive or negative sample contains a face; and $p^*$ is the true probability that the selected positive or negative sample contains a face, where $p^*$ is 1 for a positive sample and 0 for a negative sample.
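The sample selection and the confidence loss described above can be sketched as follows. The threshold value, the 3:1 negative-to-positive ratio, and the random choice of negative samples are assumptions made for illustration (the text only states that part of the negative samples is selected according to a preset proportion), and averaging over the selected samples is likewise assumed; all function names are hypothetical.

```python
import math
import random

def choose_training_samples(relative_areas, threshold=0.5, neg_per_pos=3, seed=0):
    """Return (positive indices, selected negative indices).

    `relative_areas[i]` is the intersection-over-union of default face
    frame i with its real face frame. All positives are kept; negatives
    are drawn at random up to `neg_per_pos` per positive (an assumption).
    """
    pos = [i for i, a in enumerate(relative_areas) if a > threshold]
    neg = [i for i, a in enumerate(relative_areas) if a <= threshold]
    k = min(len(neg), neg_per_pos * len(pos))
    return pos, random.Random(seed).sample(neg, k)

def confidence_loss(confidences, labels, eps=1e-12):
    """Average binary cross-entropy over the selected samples."""
    n_cls = len(confidences)
    if n_cls == 0:
        return 0.0
    total = 0.0
    for p, p_star in zip(confidences, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(p_star * math.log(p) + (1 - p_star) * math.log(1 - p))
    return total / n_cls

pos, neg = choose_training_samples([0.7, 0.2, 0.5, 0.1, 0.3], neg_per_pos=1)
print(pos, neg)
print(confidence_loss([0.9, 0.2], [1, 0]))
```

Limiting the number of negative samples keeps the many background default face frames from dominating the loss.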
Preferably, the calculating an error between the loss function of the network layer of the predicted face frame bias and the loss function of the network layer of the predicted face frame confidence includes:
calculating the gradient of the loss function of the network layer of the predicted face frame bias and the gradient of the loss function of the network layer of the predicted face frame confidence by adopting a stochastic gradient descent method;
and taking the obtained gradient value as the error between the loss function of the network layer of the predicted face frame bias and the loss function of the network layer of the confidence coefficient of the predicted face frame.
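A single weight update driven by the two gradients can be sketched as follows. The sketch assumes that the gradients of the two loss functions are simply added with equal weight and that a fixed learning rate is used; neither choice is stated in the text, and the function name `sgd_step` is hypothetical.

```python
def sgd_step(weights, grad_loss1, grad_loss2, learning_rate=0.01):
    """One gradient descent update of a flat list of weights.

    `grad_loss1` and `grad_loss2` are the gradients of the bias loss and
    the confidence loss with respect to each weight. Adding them with
    equal weight is an assumption made for this sketch.
    """
    return [
        w - learning_rate * (g1 + g2)
        for w, g1, g2 in zip(weights, grad_loss1, grad_loss2)
    ]

# Hypothetical weights and gradients for a two-parameter model.
weights = [1.0, 2.0]
print(sgd_step(weights, [0.5, -1.0], [0.5, 0.0], learning_rate=0.1))
```

In the described training, this update is repeated over mini-batches of face images until the error falls within the preset range.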
Another object of the present invention is to provide a face detection method based on a neural network, wherein the neural network comprises: a network layer for face detection, a network layer for predicting face frame bias and a network layer for predicting face frame confidence; the network layer of the face detection is selected according to the receptive fields of face images in training sets corresponding to different network layers in the neural network, each cell element in the network layer of the face detection is bound with six default face frames, and the default face frames are set according to the scale of the network layer of the corresponding face detection; each layer of network layer for face detection is connected with a layer of network layer for predicting face frame bias and a layer of network layer for predicting face frame confidence; the method comprises the following steps:
when a face detection instruction is received, inputting a face image to be detected into a trained face detection model for face detection;
outputting the offset information of the predicted face frame relative to the corresponding default face frame by a network layer for predicting the offset of the face frame in the trained face detection model aiming at the face image to be detected, and outputting the confidence coefficient of each default face frame including the face by a network layer for predicting the confidence coefficient of the face frame in the trained face detection model;
calculating corresponding predicted face frames according to each default face frame and the bias information of the predicted face frames relative to each default face frame;
and selecting the predicted face frame corresponding to the confidence coefficient greater than a preset confidence coefficient threshold value from the predicted face frames as a final face detection result, or selecting the predicted face frame corresponding to the highest confidence coefficient from the predicted face frames as the final face detection result.
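The two selection alternatives in the last step can be sketched as follows. The function names and the example values are hypothetical; frames are (center_x, center_y, width, height) tuples paired with one confidence each.

```python
def select_by_threshold(frames, confidences, threshold):
    """Keep every predicted face frame whose confidence exceeds the threshold."""
    return [f for f, c in zip(frames, confidences) if c > threshold]

def select_highest(frames, confidences):
    """Return the single predicted face frame with the highest confidence."""
    if not frames:
        return None  # nothing was predicted
    best = max(range(len(frames)), key=lambda i: confidences[i])
    return frames[best]

# Hypothetical predicted face frames and confidences.
frames = [(30.0, 30.0, 20.0, 24.0), (80.0, 40.0, 18.0, 22.0), (55.0, 60.0, 10.0, 12.0)]
confidences = [0.9, 0.4, 0.6]
print(select_by_threshold(frames, confidences, threshold=0.5))
print(select_highest(frames, confidences))
```

The threshold variant suits images that may contain several faces, while the highest-confidence variant suits images known to contain exactly one face.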
Preferably, the calculating the corresponding predicted face frame according to each default face frame and the bias information of the corresponding predicted face frame relative to each default face frame includes:
according to each default face frame and the bias information of the corresponding predicted face frame relative to each default face frame, calculating the corresponding predicted face frame according to the following formula:
$$x = t_x w_a + x_a, \quad y = t_y h_a + y_a$$
$$w = w_a \exp(t_w), \quad h = h_a \exp(t_h)$$
wherein $(x_a, y_a, w_a, h_a)$ are the center-point coordinates, width and height of each default face frame; $(x, y, w, h)$ are the center-point coordinates, width and height of the predicted face frame corresponding to each default face frame; and $(t_x, t_y, t_w, t_h)$ is the bias information of the corresponding predicted face frame relative to each default face frame.
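The recovery of a predicted face frame from its bias information can be sketched as follows. The width and height expressions are obtained here by inverting the log-ratio definitions of $t_w$ and $t_h$ used in training, which is an inference from those definitions; the function name `decode_offsets` and the example values are hypothetical.

```python
import math

def decode_offsets(offsets, default_box):
    """Predicted face frame (x, y, w, h) from bias (tx, ty, tw, th).

    `default_box` is (center_x, center_y, width, height).
    """
    tx, ty, tw, th = offsets
    xa, ya, wa, ha = default_box
    x = tx * wa + xa          # undo the normalized horizontal shift
    y = ty * ha + ya          # undo the normalized vertical shift
    w = wa * math.exp(tw)     # undo the log width ratio
    h = ha * math.exp(th)     # undo the log height ratio
    return x, y, w, h

# Hypothetical bias output for one default face frame.
default_frame = (50.0, 50.0, 20.0, 20.0)
print(decode_offsets((0.2, -0.1, math.log(2.0), math.log(0.5)), default_frame))
```

A bias of all zeros returns the default face frame unchanged, which makes the default face frame the natural starting point of the regression.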
Preferably, after the predicting the offset information of the face frame relative to the corresponding default face frame is output by the network layer for predicting the offset of the face frame in the trained face detection model for the face image to be detected, and the confidence that each default face frame contains the face is output by the network layer for predicting the confidence of the face frame in the trained face detection model, the method further includes:
filtering out a default face frame with the confidence coefficient smaller than or equal to a preset confidence coefficient threshold value;
calculating corresponding predicted face frames according to the rest default face frames and the bias information of the corresponding predicted face frames relative to the rest default face frames;
and taking the predicted face frames corresponding to the rest default face frames as final face detection results.
Preferably, after the corresponding predicted face frame is calculated according to each default face frame and the bias information of the corresponding predicted face frame relative to each default face frame, the method further includes:
calculating the relative area of every two predicted face frames;
if the relative area of every two predicted face frames is larger than a preset second relative area threshold value, taking the two predicted face frames as sampling predicted face frames; wherein the relative area is the area of the intersection region of the two predicted face frames divided by the area of the union region of the two predicted face frames;
and selecting the sampling prediction face frame corresponding to the confidence coefficient which is greater than a preset confidence coefficient threshold value from the sampling prediction face frames as a final face detection result, or selecting the sampling prediction face frame corresponding to the highest confidence coefficient from the sampling prediction face frames as the final face detection result.
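The relative area of two predicted face frames, and one common way of using it to resolve overlapping frames, can be sketched as follows. The suppression routine is greedy non-maximum suppression, a widely used technique that is related to, but not necessarily identical with, the selection steps described above; the 0.5 threshold and the function names are hypothetical.

```python
def relative_area(a, b):
    """Intersection area divided by union area of two (cx, cy, w, h) frames."""
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def greedy_nms(frames, confidences, area_threshold=0.5):
    """Keep the most confident frame of each group of heavily overlapping frames."""
    order = sorted(range(len(frames)), key=lambda i: confidences[i], reverse=True)
    kept = []
    for i in order:
        # Keep frame i only if it does not overlap a more confident kept frame.
        if all(relative_area(frames[i], frames[j]) <= area_threshold for j in kept):
            kept.append(i)
    return [frames[i] for i in kept]

# Two hypothetical frames on the same face and one on a different face.
frames = [(10.0, 10.0, 10.0, 10.0), (11.0, 10.0, 10.0, 10.0), (50.0, 50.0, 10.0, 10.0)]
print(greedy_nms(frames, [0.9, 0.8, 0.7]))
```

Frames whose relative area exceeds the threshold are treated as covering the same face, so only the most confident of them is reported.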
Another object of the present invention is to provide a training system for a face detection model based on a neural network, the neural network comprising: a network layer for face detection, a network layer for predicting face frame bias and a network layer for predicting face frame confidence; the network layer of the face detection is selected according to the receptive fields of face images in training sets corresponding to different network layers in the neural network, each cell element in the network layer of the face detection is bound with six default face frames, and the default face frames are set according to the scale of the network layer of the corresponding face detection; each layer of network layer for face detection is connected with a layer of network layer for predicting face frame bias and a layer of network layer for predicting face frame confidence; the system comprises: the device comprises an input module, a first calculation module, a second calculation module, a third calculation module, a feedback and update module and an iteration output module; wherein,
the input module is used for inputting the face images in the training set into the neural network for training when receiving a model training instruction;
the first calculation module is used for calculating the offset information of the predicted face frame relative to the corresponding default face frame and calculating the offset information of the real face frame relative to the corresponding default face frame through the network layer of the offset of the predicted face frame; calculating the confidence coefficient of each default face frame including the face through a network layer for predicting the confidence coefficient of the face frame;
the second calculation module is used for calculating a loss function of a network layer of the offset of the predicted face frame according to the offset information of the predicted face frame relative to the corresponding default face frame and the offset information of the real face frame relative to the corresponding default face frame; calculating a loss function of a network layer for predicting the confidence of the face frame according to the confidence of the face contained in the default face frame;
the third calculation module is used for calculating the error between the loss function of the network layer of the predicted face frame bias and the loss function of the network layer of the predicted face frame confidence;
the feedback and updating module is used for feeding the error back to the neural network through back propagation, updating the network weight parameters of the neural network according to the error and adjusting the predicted face frame according to the updated network weight parameters;
and the iteration output module is used for repeating iterative training until the error between the adjusted predicted face frame and the real face frame is within a preset error range, and outputting a face detection model.
Another object of the present invention is to provide a face detection system based on a neural network, the neural network comprising: a network layer for face detection, a network layer for predicting face frame bias and a network layer for predicting face frame confidence; the network layer of the face detection is selected according to the receptive fields of face images in training sets corresponding to different network layers in the neural network, each cell element in the network layer of the face detection is bound with six default face frames, and the default face frames are set according to the scale of the network layer of the corresponding face detection; each layer of network layer for face detection is connected with a layer of network layer for predicting face frame bias and a layer of network layer for predicting face frame confidence; the system comprises: the device comprises an input module, an output module, a calculation module and a selection module; wherein,
the input module is used for inputting a face image to be detected into a trained face detection model for face detection when receiving a face detection instruction;
the output module is used for outputting the offset information of the predicted face frame relative to the corresponding default face frame through a network layer for predicting the offset of the face frame in the trained face detection model aiming at the face image to be detected, and outputting the confidence coefficient of each default face frame including the face through a network layer for predicting the confidence coefficient of the face frame in the trained face detection model;
the calculation module is used for calculating corresponding predicted face frames according to each default face frame and the bias information of the predicted face frames relative to each default face frame;
and the selection module is used for selecting the predicted face frame corresponding to the confidence coefficient which is greater than a preset confidence coefficient threshold value from the predicted face frames as a final face detection result, or selecting the predicted face frame corresponding to the highest confidence coefficient from the predicted face frames as the final face detection result.
As described above, the training of the neural network-based face detection model, the face detection method and the system of the present invention have the following advantages over the prior art:
(1) In the embodiment of the invention, observation windows are not selected with a sliding window, and there is no need to construct an image pyramid or to use multi-scale observation windows and then calculate and classify a large number of observation windows. Instead, the network layers for face detection are selected according to the receptive fields of different sizes that different network layers in the neural network have on the original face image: the higher the network layer, the larger its receptive field on the original face image, and the lower the network layer, the smaller its receptive field. Face detection is carried out directly through the network layers for face detection, so the amount of computation is smaller than in the prior art; lower network layers can be selected to detect small faces and higher network layers to detect large faces, so that multi-scale detection and bias regression of faces are realized. Compared with the prior art, the embodiment of the invention therefore detects faces more accurately and faster.
(2) In the embodiment of the invention, the model training directly adopts an end-to-end mode, and the position and the size of the corresponding predicted face frame are directly output by inputting the face images in the training set, so that the method is simpler, more convenient and quicker compared with the conventional multi-step method; in addition, an end-to-end training mode is adopted, and the network weight parameters of the neural network are directly fed back and adjusted according to the error between the loss function of the network layer of the predicted face frame bias and the loss function of the network layer of the predicted face frame confidence coefficient, so that the predicted face frame is closer to the real face frame, and the predicted face frame more accurately contains the face; therefore, the method has higher detection rate compared with the method based on a plurality of independent sub-steps in the prior art.
(3) In the embodiment of the invention, the input face image is not required to be scaled, but the face image in the training set is directly input into the neural network to train the face detection model, and the face image to be detected is directly input into the trained face detection model to carry out face detection, so that the influence of factors such as face image stretching, distortion, deformation and the like on the face detection result can be avoided.
Drawings
FIG. 1 is a flowchart illustrating an implementation of a training method for a neural network-based face detection model according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating an implementation of a neural network-based face detection method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating a structure of a training system of a neural network-based face detection model according to an embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating a structure of a neural network-based face detection system according to an embodiment of the present invention.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention.
Please refer to the attached drawings. It should be noted that the drawings provided in the embodiments of the present invention are only for illustrating the basic idea of the present invention, and the components related to the present invention are only shown in the drawings rather than drawn according to the number, shape and size of the components in actual implementation, and the type, quantity and proportion of the components in actual implementation may be changed freely, and the layout of the components may be more complicated.
In an embodiment of the present invention, the neural network includes: a network layer for face detection, a network layer for predicting face frame bias and a network layer for predicting face frame confidence; the network layer of the face detection is selected according to the receptive fields of face images in training sets corresponding to different network layers in the neural network, each cell element in the network layer of the face detection is bound with six default face frames, and the default face frames are set according to the scale of the network layer of the corresponding face detection; and the network layer of each layer of face detection is connected with a network layer of face frame bias prediction and a network layer of face frame confidence coefficient prediction.
VGG-16 is a currently popular fully convolutional neural network structure with good abstract feature expression capability, and it performs well in object classification and target recognition, for example in the ImageNet competition and in face recognition. VGG-16 is therefore selected as the basic network structure of the fully convolutional neural network. The network parameter configuration of the VGG-16 fully convolutional neural network structure is shown in Table 1:
TABLE 1
Because receptive fields of different sizes help detect face images at different scales, network layers can be selected as network layers for face detection according to the receptive fields of the face images in the training set corresponding to the different network layers of the neural network. Here, the three network layers conv3_3, conv4_3 and conv5_3 are selected for multi-scale face detection; the receptive fields of the face images in the training set corresponding to these three network layers are shown in Table 2:
TABLE 2
Wherein, a network layer for predicting human face frame bias and a network layer for predicting human face frame confidence can be additionally added behind each layer of human face detection network layer; namely: each layer of the network layer for face detection is simultaneously connected with a layer of network layer for predicting face frame bias and a layer of network layer for predicting face frame confidence, and the network layer for predicting face frame bias and the network layer for predicting face frame confidence are parallel.
Wherein, six default face frames are respectively bound for each cell (feature map cell) of the three network layers conv3_3, conv4_3 and conv5_3, the size of the six default face frames bound by each cell in each layer can be set according to the size of each layer of the three network layers conv3_3, conv4_3 and conv5_3, and the size of each layer and the suggested size of the default face frame are shown in table 3:
TABLE 3
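As an illustration of binding six default face frames to every cell of a detection layer, the following is a minimal Python sketch. The function name `default_face_frames`, the square frame shape, the stride value and the six sizes are all assumptions made for this example; they are not the values of Table 3.

```python
from typing import List, Tuple

# A frame is (center_x, center_y, width, height) in input-image pixels.
Box = Tuple[float, float, float, float]


def default_face_frames(fmap_h: int, fmap_w: int, stride: int,
                        sizes: List[float]) -> List[Box]:
    """Bind len(sizes) square default frames to every feature-map cell."""
    frames = []
    for row in range(fmap_h):
        for col in range(fmap_w):
            # Center of the cell, mapped back to input-image coordinates.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for s in sizes:
                frames.append((cx, cy, s, s))
    return frames


# Hypothetical values: a tiny 2x3 feature map with stride 4 and six made-up sizes.
frames = default_face_frames(fmap_h=2, fmap_w=3, stride=4,
                             sizes=[16, 20, 24, 28, 32, 36])
print(len(frames))  # 2 * 3 cells * 6 frames per cell = 36
print(frames[0])    # (2.0, 2.0, 16, 16)
```

Each detection layer would call such a routine with its own stride and its own six sizes, so that deeper layers with larger receptive fields carry larger default frames.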
The invention is described in further detail below with reference to the figures and the embodiments.
The embodiment of the invention provides a training method of a face detection model based on a neural network, which comprises the following steps of:
step S100: and when a model training instruction is received, inputting the face images in the training set into the neural network for training.
In this step, a model obtained by training the neural network VGG-16 on the ImageNet image library (1000 categories, about 1.2 million training images in total) can be used as the initialized face detection model. The face detection model is then trained on two image libraries: the CelebFaces image library (202,599 images, each containing one face) and the AFLW image library (21,080 images containing 24,386 faces). The training image data can also be augmented, for example by scaling, blurring, adding noise and changing contrast, so as to enrich the training samples.
Step S101: calculating the offset information of the predicted face frame relative to the corresponding default face frame and calculating the offset information of the real face frame relative to the corresponding default face frame through the network layer of the offset of the predicted face frame; and calculating the confidence degree that each default face frame contains the face through the network layer of the confidence degree of the predicted face frame.
In this step, each default face frame corresponds to one predicted face frame and one real face frame, that is: the default face frame, the predicted face frame and the real face frame are in one-to-one correspondence.
In this step, the offset information $(t_x, t_y, t_w, t_h)$ of the output predicted face frame with respect to the corresponding default face frame is calculated according to the following formula:
$$t_x = (x - x_a)/w_a, \quad t_y = (y - y_a)/h_a$$
$$t_w = \log(w/w_a), \quad t_h = \log(h/h_a)$$
Wherein, $(x, y, w, h)$ are the center-point coordinates, width and height of the predicted face frame; $(x_a, y_a, w_a, h_a)$ are the center-point coordinates, width and height of the default face frame; $(t_x, t_y, t_w, t_h)$ is the bias information of the predicted face frame relative to the corresponding default face frame;
the calculating of the bias information of the real face frame relative to the corresponding default face frame includes:
calculating the bias information of the real face frame relative to the corresponding default face frame according to the following formula:
$$t_x^* = (x^* - x_a)/w_a, \quad t_y^* = (y^* - y_a)/h_a$$
$$t_w^* = \log(w^*/w_a), \quad t_h^* = \log(h^*/h_a)$$
wherein $(x_a, y_a, w_a, h_a)$ are the center-point coordinates, width and height of the default face frame; $(x^*, y^*, w^*, h^*)$ are the center-point coordinates, width and height of the real face frame; and $(t_x^*, t_y^*, t_w^*, t_h^*)$ is the offset information of the real face frame relative to the corresponding default face frame.
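The offset encoding of step S101 can be sketched in a few lines of Python. The function name `encode_offsets` and the sample numbers are illustrative assumptions; the same routine serves the predicted face frame and the real face frame, since both are encoded relative to the same default face frame.

```python
import math


def encode_offsets(box, default):
    """Encode a (cx, cy, w, h) frame relative to a default frame."""
    x, y, w, h = box
    xa, ya, wa, ha = default
    tx = (x - xa) / wa        # horizontal shift in units of default width
    ty = (y - ya) / ha        # vertical shift in units of default height
    tw = math.log(w / wa)     # log of the width ratio
    th = math.log(h / ha)     # log of the height ratio
    return (tx, ty, tw, th)


# Made-up frames for illustration only.
default = (50.0, 50.0, 20.0, 20.0)
real = (54.0, 48.0, 40.0, 10.0)
t_star = encode_offsets(real, default)
print(t_star)  # roughly (0.2, -0.1, 0.693, -0.693)
```

The logarithm keeps the width and height offsets symmetric: a frame twice as wide and a frame half as wide produce offsets of equal magnitude and opposite sign.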
Step S102: calculating a loss function of a network layer of the offset of the predicted face frame according to the offset information of the predicted face frame relative to the corresponding default face frame and the offset information of the real face frame relative to the corresponding default face frame; and calculating a loss function of a network layer for predicting the confidence of the face frame according to the confidence of the face contained in the default face frame.
In this step, only the bias information corresponding to the default face frame whose relative area IOU of the default face frame to the real face frame is greater than the preset first relative area threshold may be selected for the regression calculation, and the loss function of the predicted face frame regression is calculated as follows:
firstly, selecting a default face frame corresponding to the default face frame when the relative area IOU of the default face frame relative to a corresponding real face frame is larger than a preset first relative area threshold value as a sampling default face frame; the relative area is the area of the intersection area of the default face frame and the real face frame divided by the area of the union area of the default face frame and the real face frame;
calculating a loss function of the network layer of the predicted face frame bias according to the bias information of the predicted face frame relative to the corresponding sampling default face frame, the bias information of the real face frame relative to the corresponding sampling default face frame, and the following formula:
$$Loss_{reg} = \frac{1}{N_{reg}} \sum_{k=1}^{N_{reg}} L_{reg}(T_k, T_k^*)$$
$$L_{reg}(T, T^*) = \sum_{z \in \{x, y, w, h\}} smooth_{L1}(t_z - t_z^*)$$
$$smooth_{L1}(l) = \begin{cases} 0.5\,l^2, & |l| < 1 \\ |l| - 0.5, & \text{otherwise} \end{cases}$$
wherein $N_{reg}$ is the number of sampling default face frames; $L_{reg}(T_k, T_k^*)$ is the regression loss function for the spatial bias of the $k$-th sampling default face frame ($k$ is a positive integer between 1 and $N_{reg}$); $T = \{t_x, t_y, t_w, t_h\}$ is the offset information of the predicted face frame relative to the corresponding sampling default face frame; $z$ is an element of the set $\{x, y, w, h\}$; $T^* = \{t_x^*, t_y^*, t_w^*, t_h^*\}$ is the offset information of the real face frame relative to the corresponding sampling default face frame; $smooth_{L1}$ represents the smooth L1 loss function, which is a variant of the L1 norm loss function; and $l$ represents the input variable of the $smooth_{L1}$ function;
then, taking the default face frame corresponding to the situation that the relative area of the default face frame relative to the corresponding real face frame is larger than a preset first relative area threshold value as a positive sample, and taking the default face frame corresponding to the situation that the relative area of the default face frame relative to the corresponding real face frame is smaller than or equal to the preset first relative area threshold value as a negative sample;
selecting all positive samples and part of negative samples according to a preset positive-negative sample proportion;
calculating a loss function of the network layer for predicting the confidence of the face frame according to the following formula:
$$Loss_{cls} = \frac{1}{N_{cls}} \sum_{i=1}^{N_{cls}} L_{cls}(p_i, p_i^*)$$
$$L_{cls}(p, p^*) = -[p^* \log p + (1 - p^*) \log(1 - p)]$$
wherein $N_{cls}$ is the total number of selected positive and negative samples; $L_{cls}(p_i, p_i^*)$ is the classification loss function of the $i$-th selected sample ($i$ is a positive integer between 1 and $N_{cls}$); $p$ is the confidence that the selected positive or negative sample contains a face; $p^*$ is the true probability that the selected positive or negative sample contains a face; $p^*$ of a positive sample is 1, and $p^*$ of a negative sample is 0.
In the embodiment of the present invention, the preset first relative area threshold may be set according to actual requirements and is not specifically limited here; preferably, the preset first relative area threshold is 0.5. For the calculation of the face classification confidence, the relative area IOU between most default face frames and the corresponding real face frames is less than 0.5, so the number of negative samples is far higher than the number of positive samples. To balance the positive and negative samples and avoid false detections or missed detections caused by their imbalance, the classification loss can be calculated with a positive-to-negative sample ratio of 1:3, where the negative samples with the highest confidence are selected for training the face detection model.
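The sampling rule and the regression loss of step S102 can be sketched as follows. This is a minimal illustration: the function names are hypothetical, the smooth L1 function is written in its commonly used form (quadratic below 1, linear above), which is an assumption here, and the IOU values are assumed to be precomputed per default face frame.

```python
def iou(a, b):
    """Relative area (intersection over union) of two (cx, cy, w, h) frames."""
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def smooth_l1(l):
    """Common smooth L1 form (assumed): quadratic near zero, linear elsewhere."""
    return 0.5 * l * l if abs(l) < 1.0 else abs(l) - 0.5


def regression_loss(pred_offsets, real_offsets, ious, threshold=0.5):
    """Average smooth L1 loss over default frames whose IOU exceeds the threshold."""
    sampled = [k for k, v in enumerate(ious) if v > threshold]
    if not sampled:
        return 0.0
    total = 0.0
    for k in sampled:
        for t_pred, t_real in zip(pred_offsets[k], real_offsets[k]):
            total += smooth_l1(t_pred - t_real)
    return total / len(sampled)


# Made-up numbers for illustration.
print(iou((0, 0, 2, 2), (1, 0, 2, 2)))           # about 0.333
pred = [(0.5, 0.0, 0.0, 0.0), (2.0, 0.0, 0.0, 0.0)]
real = [(0.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0)]
print(regression_loss(pred, real, [0.6, 0.7]))   # 0.8125
print(regression_loss(pred, real, [0.6, 0.4]))   # 0.125, second frame not sampled
```

Restricting the regression to frames above the threshold means the bias layer only learns from default frames that already overlap a real face reasonably well.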
Finally, the two loss functions can be fused to obtain the following total loss function:
$$Loss = Loss_1 + \lambda\,Loss_2$$
where $\lambda$ is used to balance the two loss functions and is set to 1 by default. $\lambda$ can also be set according to actual requirements.
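The confidence loss with the 1:3 positive-to-negative selection, and the fusion of the two losses, can be sketched as below. All function and parameter names are hypothetical, the cross-entropy is averaged over the selected samples as an assumption, and the small `eps` guard against `log(0)` is an addition for numerical safety rather than part of the described method.

```python
import math


def select_samples(confidences, ious, threshold=0.5, neg_per_pos=3):
    """Keep all positives and the highest-confidence negatives at a 1:3 ratio."""
    pos = [i for i, v in enumerate(ious) if v > threshold]
    neg = [i for i, v in enumerate(ious) if v <= threshold]
    # Negatives that the model most strongly mistakes for faces come first.
    neg.sort(key=lambda i: confidences[i], reverse=True)
    return pos, neg[:neg_per_pos * len(pos)]


def classification_loss(confidences, pos, neg, eps=1e-12):
    """Mean binary cross-entropy with p* = 1 for positives and 0 for negatives."""
    total = 0.0
    for i in pos:
        total += -math.log(max(confidences[i], eps))
    for i in neg:
        total += -math.log(max(1.0 - confidences[i], eps))
    n = len(pos) + len(neg)
    return total / n if n else 0.0


def total_loss(first, second, lam=1.0):
    """Fuse two losses; lam weights the second term and defaults to 1."""
    return first + lam * second


# Made-up confidences and IOU values for six default frames.
conf = [0.9, 0.8, 0.1, 0.3, 0.05, 0.6]
ious = [0.7, 0.2, 0.1, 0.0, 0.3, 0.4]
pos, neg = select_samples(conf, ious)
print(pos, neg)  # [0] [1, 5, 3]
cls = classification_loss(conf, pos, neg)
print(total_loss(cls, 0.25))
```

Sorting the negatives by confidence before truncating is what turns a plain ratio cap into hard negative selection: the easy background frames are discarded first.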
Step S103: and calculating the error of the loss function of the network layer of the predicted face frame bias and the loss function of the network layer of the predicted face frame confidence, feeding the error back to the neural network through back propagation, updating the network weight parameters of the neural network according to the error and adjusting the predicted face frame according to the updated network weight parameters.
In this step, a stochastic gradient descent method may be adopted to calculate the gradient of the loss function of the network layer of the predicted face frame bias and the gradient of the loss function of the network layer of the predicted face frame confidence; the obtained gradient values are taken as the error of the loss function of the network layer of the predicted face frame bias and the loss function of the network layer of the predicted face frame confidence.
In this step, the formula for updating the network weight parameter is as follows:
wherein, Wi lIs the weight value of the parameter of the l-th layer after i iterations,the updated weight for the (i + 1) th iteration,the mean value is 0 and the variance is 1, and a is the learning rate, m is the momentum, and lambda is the weight attenuation coefficient.
In this step, a stochastic gradient descent method is used to update the weights from the back-propagated gradients. The parameters used when learning the network weight parameters need to be preset, including the initial learning rate, the momentum and the weight decay; here, the initial learning rate is set to 0.001, the initial momentum is set to 0.9, and the initial weight decay is set to 0.0005.
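For orientation, one widely used form of a stochastic gradient descent update with momentum and weight decay is sketched below, using the hyperparameter values stated in this step. This is an assumption about the general shape of such an update, not a reproduction of the formula of this embodiment; the function name `sgd_step` and the flat list representation of the weights are hypothetical.

```python
def sgd_step(weights, grads, velocity,
             lr=0.001, momentum=0.9, weight_decay=0.0005):
    """One SGD step with momentum; weight decay is added to the gradient."""
    new_weights, new_velocity = [], []
    for w, g, v in zip(weights, grads, velocity):
        # The velocity accumulates a decaying history of past update directions.
        v_next = momentum * v - lr * (g + weight_decay * w)
        new_velocity.append(v_next)
        new_weights.append(w + v_next)
    return new_weights, new_velocity


# Made-up single-parameter example.
w, v = sgd_step(weights=[1.0], grads=[2.0], velocity=[0.0])
print(w, v)  # weight moves slightly below 1.0, velocity is slightly negative
```

With zero initial velocity the first step reduces to plain gradient descent with weight decay; the momentum term only matters from the second step onward.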
Step S104: judging whether the error between the adjusted predicted face frame and the real face frame is within a preset error range; if the error is within the preset error range, outputting a face detection model and ending the processing; if the error is not within the preset error range, returning to step S101 and continuing with the adjusted predicted face frame.
In this step, the preset error range may be set according to actual requirements, on the principle that the adjusted predicted face frame converges to the real face frame over the iterations; that is, the weight parameters of the neural network are updated so that the predicted face frame converges to the real face frame. The predicted face frame can converge after 360,000 iterations.
The embodiment of the invention provides a face detection method based on a neural network, and as shown in figure 2, the method comprises the following steps:
step S200: and when a face detection instruction is received, inputting a face image to be detected into the trained face detection model for face detection.
In this step, the trained face detection model is obtained by repeating iterative training in steps S100 to S104 until the error between the adjusted predicted face frame and the actual face frame is within the preset error range.
Step S201: and outputting the offset information of the predicted face frame relative to the corresponding default face frame by a network layer for predicting the offset of the face frame in the trained face detection model aiming at the face image to be detected, and outputting the confidence coefficient of each default face frame including the face by a network layer for predicting the confidence coefficient of the face frame in the trained face detection model.
In this step, the confidence that a default face frame contains a face is the probability that the region covered by the default face frame belongs to a face rather than to the background.
Step S202: and calculating the corresponding predicted face frame according to each default face frame and the bias information of the predicted face frame relative to each default face frame.
Specifically, according to each default face frame and the bias information of the corresponding predicted face frame relative to each default face frame, the corresponding predicted face frame is calculated according to the following formula:
$$x = t_x w_a + x_a, \quad y = t_y h_a + y_a$$
$$w = w_a \exp(t_w), \quad h = h_a \exp(t_h)$$
wherein $(x_a, y_a, w_a, h_a)$ are the center-point coordinates, width and height of each default face frame; $(x, y, w, h)$ are the center-point coordinates, width and height of the predicted face frame corresponding to each default face frame; and $(t_x, t_y, t_w, t_h)$ is the bias information of the corresponding predicted face frame relative to each default face frame.
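Step S202 inverts the offset encoding used in training, which can be sketched as follows. The function name `decode_offsets` is hypothetical; the width and height are recovered by inverting the logarithmic offsets $t_w = \log(w/w_a)$ and $t_h = \log(h/h_a)$ defined in step S101.

```python
import math


def decode_offsets(offsets, default):
    """Recover a (cx, cy, w, h) predicted frame from offsets and a default frame."""
    tx, ty, tw, th = offsets
    xa, ya, wa, ha = default
    x = tx * wa + xa
    y = ty * ha + ya
    w = wa * math.exp(tw)   # inverse of tw = log(w / wa)
    h = ha * math.exp(th)   # inverse of th = log(h / ha)
    return (x, y, w, h)


# Made-up default frame and offsets.
default = (50.0, 50.0, 20.0, 20.0)
offsets = (0.2, -0.1, math.log(2.0), math.log(0.5))
box = decode_offsets(offsets, default)
print(box)  # roughly (54.0, 48.0, 40.0, 10.0)
```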
Step S203: and selecting the predicted face frame corresponding to the confidence coefficient greater than a preset confidence coefficient threshold value from the predicted face frames as a final face detection result, or selecting the predicted face frame corresponding to the highest confidence coefficient as the final face detection result.
In this step, since the default face frames and the predicted face frames are in one-to-one correspondence, the predicted face frame corresponding to a confidence greater than the preset confidence threshold (or to the highest confidence) is the predicted face frame of the default face frame that has that confidence.
In this step, the confidence threshold may be preset according to actual requirements, and the confidence threshold is not specifically limited herein.
In a preferred embodiment of the present invention, the outputting, by a network layer for predicting a face frame bias in a trained face detection model, bias information of a predicted face frame relative to a corresponding default face frame for a face image to be detected, and outputting, by the network layer for predicting a face frame confidence in the trained face detection model, a confidence that each default face frame contains a face, further includes:
firstly, filtering out a default face frame with the confidence coefficient smaller than or equal to a preset confidence coefficient threshold;
then, calculating corresponding predicted face frames according to the rest default face frames and the bias information of the corresponding predicted face frames relative to the rest default face frames;
and finally, taking the predicted face frames corresponding to the rest default face frames as final face detection results.
In a preferred embodiment of the present invention, before calculating the corresponding predicted face frame, the default face frame with the confidence level less than or equal to the preset confidence level threshold is filtered, and then the corresponding predicted face frame is calculated according to the remaining default face frames and the offset information of the corresponding predicted face frame relative to the remaining default face frames, so that the calculation amount of calculating the predicted face frame can be reduced, thereby increasing the calculation speed and reducing the calculation time.
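The filter-then-decode order of this preferred embodiment can be sketched as follows. The function name `detect` and the threshold value 0.5 are assumptions for the example; the embodiment leaves the confidence threshold open.

```python
import math


def detect(defaults, offsets, confidences, conf_threshold=0.5):
    """Decode predicted frames only for default frames that pass the threshold."""
    results = []
    for (xa, ya, wa, ha), (tx, ty, tw, th), p in zip(defaults, offsets, confidences):
        if p <= conf_threshold:
            continue  # filtered out before any decoding work is done
        box = (tx * wa + xa, ty * ha + ya, wa * math.exp(tw), ha * math.exp(th))
        results.append((box, p))
    return results


# Made-up inputs: two default frames, only the first is confident enough.
defaults = [(50.0, 50.0, 20.0, 20.0), (100.0, 100.0, 20.0, 20.0)]
offsets = [(0.0, 0.0, 0.0, 0.0), (0.0, 0.0, 0.0, 0.0)]
confidences = [0.9, 0.2]
detections = detect(defaults, offsets, confidences)
print(detections)  # [((50.0, 50.0, 20.0, 20.0), 0.9)]
```

Because most default face frames cover background, skipping them before decoding avoids the exponentials and multiplications for the bulk of the frames.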
In another preferred embodiment of the present invention, one face may correspond to multiple overlapping predicted face frames, which introduces a large amount of redundancy, so a non-maximum suppression method may be used to remove the redundant face frames. Specifically, when the relative area IOU of two predicted face frames is greater than a preset second relative area threshold, the two frames are treated as duplicates and only the one with the higher confidence is retained. Therefore, after calculating the corresponding predicted face frame according to each default face frame and the bias information of the corresponding predicted face frame relative to each default face frame, the method further includes:
calculating the relative area of every two predicted face frames;
if the relative area of every two predicted face frames is larger than a preset second relative area threshold value, taking the two predicted face frames as sampling predicted face frames; wherein the relative area is the area of the intersection region of the two predicted face frames divided by the area of the union region of the two predicted face frames;
and selecting the sampling prediction face frame corresponding to the confidence coefficient which is greater than a preset confidence coefficient threshold value from the sampling prediction face frames as a final face detection result, or selecting the sampling prediction face frame corresponding to the highest confidence coefficient from the sampling prediction face frames as the final face detection result.
In this preferred embodiment, the relative area of every two predicted face frames is calculated. If the relative area of two predicted face frames is greater than the preset second relative area threshold, the two frames are considered to contain the same face and are taken as sampling predicted face frames. Among the sampling predicted face frames, the frame whose confidence is greater than the preset confidence threshold, or the frame with the highest confidence, is selected as the final face detection result, so that the final face detection result better contains the face.
The second relative area threshold may be preset according to an actual requirement, where the second relative area threshold is not specifically limited, and is preferably preset to be 0.3.
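The redundancy removal can be illustrated with a greedy non-maximum suppression sketch. The greedy, confidence-ordered procedure shown here is the commonly used variant and is an assumption; the embodiment describes the pairwise comparison without fixing an order of processing. Function names are hypothetical, and the threshold defaults to the preferred value 0.3.

```python
def iou(a, b):
    """Relative area (intersection over union) of two (cx, cy, w, h) frames."""
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.3):
    """Return indices of kept frames, visiting frames from highest confidence down."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a frame only if it does not overlap any already kept frame too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Made-up frames: the first two overlap heavily, the third is far away.
boxes = [(10.0, 10.0, 10.0, 10.0), (11.0, 10.0, 10.0, 10.0), (40.0, 40.0, 10.0, 10.0)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
print(kept)  # [0, 2]
```

A lower threshold such as 0.3 suppresses more aggressively than the 0.5 used for training samples, which suits faces because two distinct faces rarely overlap by a large fraction.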
In order to implement the above method, an embodiment of the present invention provides a training system for a face detection model based on a neural network. Because the system solves the problem on a principle similar to that of the method, the implementation process and implementation principle of the system can be understood with reference to those of the method, and repeated details are not described again.
The embodiment of the invention provides a training system of a face detection model based on a neural network, wherein the neural network comprises: a network layer for face detection, a network layer for predicting face frame bias and a network layer for predicting face frame confidence; the network layer of the face detection is selected according to the receptive fields of face images in training sets corresponding to different network layers in the neural network, each cell element in the network layer of the face detection is bound with six default face frames, and the default face frames are set according to the scale of the network layer of the corresponding face detection; each layer of network layer for face detection is connected with a layer of network layer for predicting face frame bias and a layer of network layer for predicting face frame confidence; as shown in fig. 3, the system includes: an input module 300, a first calculation module 301, a second calculation module 302, a third calculation module 303, a feedback and update module 304, and an iteration output module 305; wherein,
the input module 300 is configured to, when receiving a model training instruction, input the face images in the training set into the neural network for training;
the first calculating module 301 is configured to calculate, through a network layer of the predicted face frame offset, offset information of the predicted face frame relative to a corresponding default face frame, and calculate offset information of a real face frame relative to a corresponding default face frame; calculating the confidence coefficient of each default face frame including the face through a network layer for predicting the confidence coefficient of the face frame;
the second calculating module 302 is configured to calculate a loss function of a network layer biased by the predicted face frame according to the bias information of the predicted face frame relative to the corresponding default face frame and the bias information of the real face frame relative to the corresponding default face frame; calculating a loss function of a network layer for predicting the confidence of the face frame according to the confidence of the face contained in the default face frame;
the third calculating module 303 is configured to calculate an error between a loss function of the network layer of the predicted face frame bias and a loss function of the network layer of the predicted face frame confidence;
the feedback and update module 304 is configured to feed back the error to the neural network through back propagation, update a network weight parameter of the neural network according to the error, and adjust the predicted face frame according to the updated network weight parameter;
the iterative output module 305 is configured to repeat iterative training until the error between the adjusted predicted face frame and the actual face frame is within a preset error range, and output a face detection model.
In a specific implementation, the first calculating module 301 is specifically configured to:
calculating the bias information of the predicted face frame relative to the corresponding default face frame according to the following formula:
$$t_x = (x - x_a)/w_a, \quad t_y = (y - y_a)/h_a$$
$$t_w = \log(w/w_a), \quad t_h = \log(h/h_a)$$
wherein $(x, y, w, h)$ are the center-point coordinates, width and height of the predicted face frame; $(x_a, y_a, w_a, h_a)$ are the center-point coordinates, width and height of the default face frame; $(t_x, t_y, t_w, t_h)$ is the bias information of the predicted face frame relative to the corresponding default face frame;
calculating the bias information of the real face frame relative to the corresponding default face frame according to the following formula:
$$t_x^* = (x^* - x_a)/w_a, \quad t_y^* = (y^* - y_a)/h_a$$
$$t_w^* = \log(w^*/w_a), \quad t_h^* = \log(h^*/h_a)$$
wherein $(x_a, y_a, w_a, h_a)$ are the center-point coordinates, width and height of the default face frame; $(x^*, y^*, w^*, h^*)$ are the center-point coordinates, width and height of the real face frame; and $(t_x^*, t_y^*, t_w^*, t_h^*)$ is the offset information of the real face frame relative to the corresponding default face frame.
In a specific implementation, the second calculating module 302 is specifically configured to:
selecting the default face frame corresponding to the situation that the relative area of the default face frame relative to the corresponding real face frame is larger than a preset first relative area threshold value as a sampling default face frame; the relative area is the area of the intersection area of the default face frame and the real face frame divided by the area of the union area of the default face frame and the real face frame;
calculating a loss function of the network layer of the predicted face frame bias according to the bias information of the predicted face frame relative to the corresponding sampling default face frame, the bias information of the real face frame relative to the corresponding sampling default face frame, and the following formula:
$$Loss_{reg} = \frac{1}{N_{reg}} \sum_{k=1}^{N_{reg}} L_{reg}(T_k, T_k^*)$$
$$L_{reg}(T, T^*) = \sum_{z \in \{x, y, w, h\}} smooth_{L1}(t_z - t_z^*)$$
$$smooth_{L1}(l) = \begin{cases} 0.5\,l^2, & |l| < 1 \\ |l| - 0.5, & \text{otherwise} \end{cases}$$
wherein $N_{reg}$ is the number of sampling default face frames; $L_{reg}(T_k, T_k^*)$ is the regression loss function for the spatial bias of the $k$-th sampling default face frame ($k$ is a positive integer between 1 and $N_{reg}$); $T = \{t_x, t_y, t_w, t_h\}$ is the offset information of the predicted face frame relative to the corresponding sampling default face frame; $z$ is an element of the set $\{x, y, w, h\}$; $T^* = \{t_x^*, t_y^*, t_w^*, t_h^*\}$ is the offset information of the real face frame relative to the corresponding sampling default face frame; $smooth_{L1}$ represents the smooth L1 loss function, which is a variant of the L1 norm loss function; and $l$ represents the input variable of the $smooth_{L1}$ function;
the calculating a loss function of a network layer for predicting the confidence of the face frame according to the confidence that the default face frame contains the face comprises:
taking the default face frame corresponding to the situation that the relative area of the default face frame relative to the corresponding real face frame is larger than a preset first relative area threshold value as a positive sample, and taking the default face frame corresponding to the situation that the relative area of the default face frame relative to the corresponding real face frame is smaller than or equal to the preset first relative area threshold value as a negative sample;
selecting all positive samples and part of negative samples according to a preset positive-negative sample proportion;
calculating a loss function of the network layer for predicting the confidence of the face frame according to the following formula:
$$Loss_{cls} = \frac{1}{N_{cls}} \sum_{i=1}^{N_{cls}} L_{cls}(p_i, p_i^*)$$
$$L_{cls}(p, p^*) = -[p^* \log p + (1 - p^*) \log(1 - p)]$$
wherein $N_{cls}$ is the total number of selected positive and negative samples; $L_{cls}(p_i, p_i^*)$ is the classification loss function of the $i$-th selected sample ($i$ is a positive integer between 1 and $N_{cls}$); $p$ is the confidence that the selected positive or negative sample contains a face; $p^*$ is the true probability that the selected positive or negative sample contains a face; $p^*$ of a positive sample is 1, and $p^*$ of a negative sample is 0.
In a specific implementation, the third calculating module 303 is specifically configured to:
calculating the gradient of the loss function of the network layer of the predicted face frame bias and the gradient of the loss function of the network layer of the predicted face frame confidence by adopting a stochastic gradient descent method;
and taking the obtained gradient value as the error between the loss function of the network layer of the predicted face frame bias and the loss function of the network layer of the confidence coefficient of the predicted face frame.
The above division manner of the functional modules is only one preferred implementation manner given in the embodiment of the present invention, and the division manner of the functional modules does not limit the present invention. For convenience of description, the parts of the system described above are separately described as functionally divided into various modules or units. The system can be a distributed system or a centralized system, if the system is the distributed system, the functional modules can be respectively realized by hardware equipment, and the hardware equipment is interacted with each other through a communication network; in case of a centralized system, the functional modules may be integrated into one hardware device.
In practical applications, when the input module 300, the first calculation module 301, the second calculation module 302, the third calculation module 303, the feedback and update module 304, and the iteration output module 305 are integrated into a hardware device, the input module 300, the first calculation module 301, the second calculation module 302, the third calculation module 303, the feedback and update module 304, and the iteration output module 305 may be implemented by a Central Processing Unit (CPU), a Microprocessor (MPU), a Digital Signal Processor (DSP), or a Field Programmable Gate Array (FPGA) located in the hardware device.
In order to implement the above method, an embodiment of the present invention further provides a face detection system based on a neural network. Because the system solves the problem on a principle similar to that of the method, the implementation process and implementation principle of the system can be understood with reference to those of the method, and repeated details are not described again.
The embodiment of the invention provides a face detection system based on a neural network, wherein the neural network comprises: a network layer for face detection, a network layer for predicting face frame bias and a network layer for predicting face frame confidence; the network layer of the face detection is selected according to the receptive fields of face images in training sets corresponding to different network layers in the neural network, each cell element in the network layer of the face detection is bound with six default face frames, and the default face frames are set according to the scale of the network layer of the corresponding face detection; each layer of network layer for face detection is connected with a layer of network layer for predicting face frame bias and a layer of network layer for predicting face frame confidence; as shown in fig. 4, the system includes: an input module 400, an output module 401, a calculation module 402 and a selection module 403; wherein,
the input module 400 is configured to, when receiving a face detection instruction, input a face image to be detected into a trained face detection model for face detection;
the output module 401 is configured to, for a face image to be detected, output the offset information of the predicted face frame relative to the corresponding default face frame through the network layer for predicting face frame bias in the trained face detection model, and output the confidence that each default face frame contains a face through the network layer for predicting face frame confidence in the trained face detection model;
the calculating module 402 is configured to calculate a corresponding predicted face frame according to each default face frame and the offset information of the predicted face frame relative to each default face frame;
the selecting module 403 is configured to select, as a final face detection result, a predicted face frame corresponding to a confidence level greater than a preset confidence level threshold from the predicted face frames, or select, as a final face detection result, a predicted face frame corresponding to a highest confidence level from the predicted face frames.
In a specific implementation, the calculating module 402 is specifically configured to:
according to each default face frame and the bias information of the corresponding predicted face frame relative to each default face frame, calculating the corresponding predicted face frame according to the following formula:
x = t_x * w_a + x_a, y = t_y * h_a + y_a
w = w_a * exp(t_w), h = h_a * exp(t_h)
wherein (x_a, y_a, w_a, h_a) are the center point coordinates, width and height of each default face frame; (x, y, w, h) are the center point coordinates, width and height of the predicted face frame corresponding to each default face frame; and (t_x, t_y, t_w, t_h) is the bias information of the corresponding predicted face frame relative to each default face frame.
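As an illustration of the calculation performed by the calculating module 402, the following minimal Python sketch recovers a predicted face frame from a default face frame and its bias information. The names `Box` and `decode_box` are hypothetical and do not appear in the embodiment; the width and height are recovered by inverting t_w = log(w / w_a) and t_h = log(h / h_a) as defined for the training method.

```python
import math
from typing import NamedTuple, Tuple


class Box(NamedTuple):
    """A face frame given by its center point (x, y), width w and height h."""
    x: float
    y: float
    w: float
    h: float


def decode_box(default: Box, bias: Tuple[float, float, float, float]) -> Box:
    """Recover the predicted face frame from a default face frame and its bias."""
    t_x, t_y, t_w, t_h = bias
    return Box(
        x=t_x * default.w + default.x,   # x = t_x * w_a + x_a
        y=t_y * default.h + default.y,   # y = t_y * h_a + y_a
        w=default.w * math.exp(t_w),     # inverse of t_w = log(w / w_a)
        h=default.h * math.exp(t_h),     # inverse of t_h = log(h / h_a)
    )


if __name__ == "__main__":
    default_frame = Box(x=50.0, y=40.0, w=20.0, h=30.0)
    print(decode_box(default_frame, (0.1, -0.2, 0.0, 0.0)))
```

A zero bias returns the default face frame unchanged, which is a quick sanity check for any implementation of this step.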
Further, the system further comprises:
a filtering module 404, configured to filter out a default face frame with the confidence level smaller than or equal to a preset confidence level threshold;
the calculating module 402 is further configured to calculate a corresponding predicted face frame according to each of the remaining default face frames and offset information of the corresponding predicted face frame with respect to each of the remaining default face frames;
the selecting module 403 is further configured to use the predicted face frames corresponding to the remaining default face frames as final face detection results.
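The filtering performed by the filtering module 404 can be sketched as follows. This is only an illustrative assumption of how the step might be coded; the function name `filter_default_frames` is hypothetical.

```python
from typing import List, Sequence


def filter_default_frames(confidences: Sequence[float], threshold: float) -> List[int]:
    """Return the indices of the default face frames that are kept.

    A default face frame is filtered out when its confidence is less than
    or equal to the preset confidence threshold, so only frames with a
    confidence strictly greater than the threshold remain.
    """
    return [i for i, c in enumerate(confidences) if c > threshold]


if __name__ == "__main__":
    kept = filter_default_frames([0.9, 0.5, 0.2, 0.51], threshold=0.5)
    print(kept)
```

Filtering before the predicted face frames are calculated means that the calculation is carried out only for the remaining default face frames, which reduces the work per image when most default face frames have low confidence.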
Further, the system further comprises:
a judging module 405, configured to calculate the relative area of every two predicted face frames and, when the relative area of two predicted face frames is greater than a preset second relative area threshold, to use the two predicted face frames as sampling predicted face frames; wherein the relative area is the area of the intersection region of the two predicted face frames divided by the area of the union region of the two predicted face frames;
the selecting module 403 is further configured to select, from the sampling predicted face frames, those whose confidence is greater than a preset confidence threshold as the final face detection result, or to select the sampling predicted face frame with the highest confidence as the final face detection result.
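The relative area and the selection of sampling predicted face frames can be sketched in Python as follows. This is a literal reading of the text above under stated assumptions: face frames are tuples (x, y, w, h) with (x, y) the center point, and the names `relative_area`, `sampling_frames` and `best_frame` are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Set, Tuple

Frame = Tuple[float, float, float, float]  # (x, y, w, h), center-based


def relative_area(a: Frame, b: Frame) -> float:
    """Intersection area divided by union area of two face frames."""
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def sampling_frames(frames: Sequence[Frame], threshold: float) -> List[int]:
    """Indices of frames that overlap some other frame by more than threshold."""
    picked: Set[int] = set()
    for i, j in combinations(range(len(frames)), 2):
        if relative_area(frames[i], frames[j]) > threshold:
            picked.update((i, j))
    return sorted(picked)


def best_frame(confidences: Sequence[float], candidates: Sequence[int]) -> int:
    """Index of the candidate frame with the highest confidence."""
    return max(candidates, key=lambda i: confidences[i])
```

For example, with `frames = [(10, 10, 10, 10), (11, 10, 10, 10), (100, 100, 10, 10)]` and a threshold of 0.5, the first two frames overlap strongly and become the sampling predicted face frames, while the third frame does not overlap either of them.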
The above division of the functional modules is only one preferred implementation given in the embodiment of the present invention and does not limit the present invention. For convenience of description, the parts of the system are described separately as modules or units divided by function. The system may be a distributed system or a centralized system. In a distributed system, the functional modules may each be implemented by a separate hardware device, and the hardware devices interact with one another through a communication network; in a centralized system, the functional modules may be integrated into one hardware device.
In practical applications, when the input module 400, the output module 401, the calculating module 402, the selecting module 403, the filtering module 404, and the judging module 405 are integrated into one hardware device, each of them may be implemented by a Central Processing Unit (CPU), a microprocessor unit (MPU), a Digital Signal Processor (DSP), or a Field Programmable Gate Array (FPGA) in that hardware device.
In order to more clearly explain the embodiments of the present invention, the following describes in detail the training process and the detection process of the face detection model based on the neural network with specific embodiments.
Example one
Step S1: when a model training instruction is received, inputting the face images in the training set into the neural network for training.
Step S2: calculating, through the network layer for predicting face frame bias, the bias information of the predicted face frame relative to the corresponding default face frame, and calculating the bias information of the real face frame relative to the corresponding default face frame; and calculating, through the network layer for predicting face frame confidence, the confidence that each default face frame contains a face.
Step S3: calculating the loss function of the network layer for predicting face frame bias according to the bias information of the predicted face frame relative to the corresponding default face frame and the bias information of the real face frame relative to the corresponding default face frame; and calculating the loss function of the network layer for predicting face frame confidence according to the confidence that each default face frame contains a face.
Step S4: calculating the error of the loss function of the network layer for predicting face frame bias and of the loss function of the network layer for predicting face frame confidence, feeding the error back to the neural network through back propagation, updating the network weight parameters of the neural network according to the error, and adjusting the predicted face frame according to the updated network weight parameters.
Step S5: judging whether the error between the adjusted predicted face frame and the real face frame is within a preset error range; if so, outputting the face detection model; if not, returning to step S1 and continuing execution with the adjusted predicted face frame.
Step S6: when a face detection instruction is received, inputting the face image to be detected into the trained face detection model for face detection.
Step S7: for the face image to be detected, outputting the bias information of the predicted face frame relative to the corresponding default face frame through the network layer for predicting face frame bias in the trained face detection model, and outputting the confidence that each default face frame contains a face through the network layer for predicting face frame confidence in the trained face detection model.
Step S8: calculating the corresponding predicted face frame according to each default face frame and the bias information of the predicted face frame relative to that default face frame.
Step S9: selecting, from the predicted face frames, those whose confidence is greater than a preset confidence threshold as the final face detection result, or selecting the predicted face frame with the highest confidence as the final face detection result.
Example two
Step S1: when a model training instruction is received, inputting the face images in the training set into the neural network for training.
Step S2: calculating, through the network layer for predicting face frame bias, the bias information of the predicted face frame relative to the corresponding default face frame, and calculating the bias information of the real face frame relative to the corresponding default face frame; and calculating, through the network layer for predicting face frame confidence, the confidence that each default face frame contains a face.
Step S3: calculating the loss function of the network layer for predicting face frame bias according to the bias information of the predicted face frame relative to the corresponding default face frame and the bias information of the real face frame relative to the corresponding default face frame; and calculating the loss function of the network layer for predicting face frame confidence according to the confidence that each default face frame contains a face.
Step S4: calculating the error of the loss function of the network layer for predicting face frame bias and of the loss function of the network layer for predicting face frame confidence, feeding the error back to the neural network through back propagation, updating the network weight parameters of the neural network according to the error, and adjusting the predicted face frame according to the updated network weight parameters.
Step S5: judging whether the error between the adjusted predicted face frame and the real face frame is within a preset error range; if so, outputting the face detection model; if not, returning to step S1 and continuing execution with the adjusted predicted face frame.
Step S6: when a face detection instruction is received, inputting the face image to be detected into the trained face detection model for face detection.
Step S7: for the face image to be detected, outputting the bias information of the predicted face frame relative to the corresponding default face frame through the network layer for predicting face frame bias in the trained face detection model, and outputting the confidence that each default face frame contains a face through the network layer for predicting face frame confidence in the trained face detection model.
Step S8: filtering out the default face frames whose confidence is less than or equal to a preset confidence threshold.
Step S9: calculating the corresponding predicted face frames according to the remaining default face frames and the bias information of the corresponding predicted face frames relative to the remaining default face frames.
Step S10: taking the predicted face frames corresponding to the remaining default face frames as the final face detection result.
Example three
Step S1: when a model training instruction is received, inputting the face images in the training set into the neural network for training.
Step S2: calculating, through the network layer for predicting face frame bias, the bias information of the predicted face frame relative to the corresponding default face frame, and calculating the bias information of the real face frame relative to the corresponding default face frame; and calculating, through the network layer for predicting face frame confidence, the confidence that each default face frame contains a face.
Step S3: calculating the loss function of the network layer for predicting face frame bias according to the bias information of the predicted face frame relative to the corresponding default face frame and the bias information of the real face frame relative to the corresponding default face frame; and calculating the loss function of the network layer for predicting face frame confidence according to the confidence that each default face frame contains a face.
Step S4: calculating the error of the loss function of the network layer for predicting face frame bias and of the loss function of the network layer for predicting face frame confidence, feeding the error back to the neural network through back propagation, updating the network weight parameters of the neural network according to the error, and adjusting the predicted face frame according to the updated network weight parameters.
Step S5: judging whether the error between the adjusted predicted face frame and the real face frame is within a preset error range; if so, outputting the face detection model; if not, returning to step S1 and continuing execution with the adjusted predicted face frame.
Step S6: when a face detection instruction is received, inputting the face image to be detected into the trained face detection model for face detection.
Step S7: for the face image to be detected, outputting the bias information of the predicted face frame relative to the corresponding default face frame through the network layer for predicting face frame bias in the trained face detection model, and outputting the confidence that each default face frame contains a face through the network layer for predicting face frame confidence in the trained face detection model.
Step S8: calculating the corresponding predicted face frame according to each default face frame and the bias information of the predicted face frame relative to that default face frame.
Step S9: calculating the relative area of every two predicted face frames; if the relative area of two predicted face frames is greater than a preset second relative area threshold, taking the two predicted face frames as sampling predicted face frames.
Step S10: selecting, from the sampling predicted face frames, those whose confidence is greater than a preset confidence threshold as the final face detection result, or selecting the sampling predicted face frame with the highest confidence as the final face detection result.
Example four
Step S1: when a model training instruction is received, inputting the face images in the training set into the neural network for training.
Step S2: calculating, through the network layer for predicting face frame bias, the bias information of the predicted face frame relative to the corresponding default face frame, and calculating the bias information of the real face frame relative to the corresponding default face frame; and calculating, through the network layer for predicting face frame confidence, the confidence that each default face frame contains a face.
Step S3: calculating the loss function of the network layer for predicting face frame bias according to the bias information of the predicted face frame relative to the corresponding default face frame and the bias information of the real face frame relative to the corresponding default face frame; and calculating the loss function of the network layer for predicting face frame confidence according to the confidence that each default face frame contains a face.
Step S4: calculating the error of the loss function of the network layer for predicting face frame bias and of the loss function of the network layer for predicting face frame confidence, feeding the error back to the neural network through back propagation, updating the network weight parameters of the neural network according to the error, and adjusting the predicted face frame according to the updated network weight parameters.
Step S5: judging whether the error between the adjusted predicted face frame and the real face frame is within a preset error range; if so, outputting the face detection model; if not, returning to step S1 and continuing execution with the adjusted predicted face frame.
Step S6: when a face detection instruction is received, inputting the face image to be detected into the trained face detection model for face detection.
Step S7: for the face image to be detected, outputting the bias information of the predicted face frame relative to the corresponding default face frame through the network layer for predicting face frame bias in the trained face detection model, and outputting the confidence that each default face frame contains a face through the network layer for predicting face frame confidence in the trained face detection model.
Step S8: filtering out the default face frames whose confidence is less than or equal to a preset confidence threshold.
Step S9: calculating the corresponding predicted face frames according to the remaining default face frames and the bias information of the corresponding predicted face frames relative to the remaining default face frames.
Step S10: calculating the relative area of every two predicted face frames; if the relative area of two predicted face frames is greater than a preset second relative area threshold, taking the two predicted face frames as sampling predicted face frames.
Step S11: taking the sampling predicted face frames as the final face detection result, or selecting the sampling predicted face frame with the highest confidence as the final face detection result.
In summary, the training method and system for the face detection model based on the neural network, the face detection method and system of the present invention have the following beneficial effects compared with the prior art:
(1) In the embodiment of the invention, observation windows are not selected by a sliding window, no image pyramid needs to be constructed, no multi-scale observation windows need to be used, and a large number of observation windows need not be calculated and judged. Instead, the network layers for face detection are selected according to the receptive fields of different sizes in the original face image that correspond to different network layers of the neural network: the higher the network layer, the larger its receptive field in the original face image, and the lower the network layer, the smaller its receptive field. Face detection is carried out directly through the network layers for face detection, so the amount of calculation is smaller than in the prior art. Lower network layers can be selected to detect small faces and higher network layers to detect large faces, so that multi-scale detection and bias regression of the face are achieved. Compared with the prior art, the embodiment of the invention therefore detects faces more accurately and at a higher speed.
(2) In the embodiment of the invention, the model training directly adopts an end-to-end mode, and the position and the size of the corresponding predicted face frame are directly output by inputting the face images in the training set, so that the method is simpler, more convenient and quicker compared with the conventional multi-step method; in addition, an end-to-end training mode is adopted, and the network weight parameters of the neural network are directly fed back and adjusted according to the error between the loss function of the network layer of the predicted face frame bias and the loss function of the network layer of the predicted face frame confidence coefficient, so that the predicted face frame is closer to the real face frame, and the predicted face frame more accurately contains the face; therefore, the method has higher detection rate compared with the method based on a plurality of independent sub-steps in the prior art.
(3) In the embodiment of the invention, the input face image is not required to be scaled, but the face image in the training set is directly input into the neural network to train the face detection model, and the face image to be detected is directly input into the trained face detection model to carry out face detection, so that the influence of factors such as face image stretching, distortion, deformation and the like on the face detection result can be avoided.
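The relationship stated in point (1), that a higher network layer corresponds to a larger receptive field in the original face image, can be illustrated with the standard receptive field recurrence for stacked convolution and pooling layers. The sketch below is an assumption for illustration only: the function name `receptive_fields` and the example layer stack are hypothetical and are not the network of the embodiment.

```python
from typing import List, Sequence, Tuple


def receptive_fields(layers: Sequence[Tuple[int, int]]) -> List[int]:
    """Receptive field size (in input pixels) after each layer.

    Each layer is given as (kernel_size, stride). The recurrence is
    rf <- rf + (kernel_size - 1) * jump and jump <- jump * stride,
    where jump is the distance in input pixels between adjacent cells.
    """
    rf, jump = 1, 1
    sizes = []
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        jump *= stride
        sizes.append(rf)
    return sizes


if __name__ == "__main__":
    # Hypothetical stack: two 3x3 convolutions and a 2x2 pooling, repeated twice.
    stack = [(3, 1), (3, 1), (2, 2), (3, 1), (3, 1), (2, 2)]
    print(receptive_fields(stack))
```

Because the receptive field never shrinks from one layer to the next, a layer chosen lower in the stack sees a smaller region of the input and is the natural place to bind small default face frames, while a higher layer suits large ones.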
The foregoing embodiments are merely illustrative of the principles and utilities of the present invention and are not intended to limit the invention. Any person skilled in the art can modify or change the above-mentioned embodiments without departing from the spirit and scope of the present invention. Accordingly, it is intended that all equivalent modifications or changes which can be made by those skilled in the art without departing from the spirit and technical spirit of the present invention be covered by the claims of the present invention.

Claims (10)

1. A training method for a face detection model based on a neural network, characterized in that the neural network comprises: network layers for face detection, network layers for predicting face frame bias, and network layers for predicting face frame confidence; the network layers for face detection are selected according to the receptive fields, in the face images of the training set, that correspond to different network layers of the neural network; each cell in a network layer for face detection is bound to six default face frames, and the default face frames are set according to the scale of the corresponding network layer for face detection; each network layer for face detection is connected to one network layer for predicting face frame bias and one network layer for predicting face frame confidence; the method comprises the following steps:
when a model training instruction is received, inputting the face images in the training set into the neural network for training;
calculating, through the network layer for predicting face frame bias, the bias information of the predicted face frame relative to the corresponding default face frame, and calculating the bias information of the real face frame relative to the corresponding default face frame; calculating, through the network layer for predicting face frame confidence, the confidence that each default face frame contains a face;
calculating the loss function of the network layer for predicting face frame bias according to the bias information of the predicted face frame relative to the corresponding default face frame and the bias information of the real face frame relative to the corresponding default face frame; calculating the loss function of the network layer for predicting face frame confidence according to the confidence that each default face frame contains a face;
calculating the error of the loss function of the network layer for predicting face frame bias and of the loss function of the network layer for predicting face frame confidence, feeding the error back to the neural network through back propagation, updating the network weight parameters of the neural network according to the error, and adjusting the predicted face frame according to the updated network weight parameters;
and repeating the iterative training until the error between the adjusted predicted face frame and the real face frame is within a preset error range, and outputting the face detection model.
2. The training method according to claim 1, wherein the calculating, by the network layer of the predicted face frame bias, bias information of the predicted face frame with respect to a corresponding default face frame includes:
calculating the bias information of the predicted face frame relative to the corresponding default face frame according to the following formula:
t_x = (x - x_a) / w_a, t_y = (y - y_a) / h_a
t_w = log(w / w_a), t_h = log(h / h_a)
wherein (x, y, w, h) are the center point coordinates, width and height of the predicted face frame; (x_a, y_a, w_a, h_a) are the center point coordinates, width and height of the default face frame; and (t_x, t_y, t_w, t_h) is the bias information of the predicted face frame relative to the corresponding default face frame;
the calculating of the bias information of the real face frame relative to the corresponding default face frame includes:
calculating the bias information of the real face frame relative to the corresponding default face frame according to the following formula:
t*_x = (x* - x_a) / w_a, t*_y = (y* - y_a) / h_a
t*_w = log(w* / w_a), t*_h = log(h* / h_a)
wherein (x_a, y_a, w_a, h_a) are the center point coordinates, width and height of the default face frame; (x*, y*, w*, h*) are the center point coordinates, width and height of the real face frame; and (t*_x, t*_y, t*_w, t*_h) is the bias information of the real face frame relative to the corresponding default face frame.
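The bias calculation of this claim can be sketched in Python as follows. The same function serves for the predicted face frame and for the real face frame, since both are expressed relative to the same default face frame. The name `encode_bias` is hypothetical, and face frames are assumed to be tuples (x, y, w, h) with (x, y) the center point.

```python
import math
from typing import Tuple

Frame = Tuple[float, float, float, float]  # (x, y, w, h), center-based


def encode_bias(frame: Frame, default: Frame) -> Tuple[float, float, float, float]:
    """Bias information (t_x, t_y, t_w, t_h) of a face frame relative to a default face frame."""
    x, y, w, h = frame
    x_a, y_a, w_a, h_a = default
    return (
        (x - x_a) / w_a,      # t_x: horizontal shift in units of the default width
        (y - y_a) / h_a,      # t_y: vertical shift in units of the default height
        math.log(w / w_a),    # t_w: log ratio of widths
        math.log(h / h_a),    # t_h: log ratio of heights
    )


if __name__ == "__main__":
    default_frame = (50.0, 40.0, 20.0, 30.0)
    real_frame = (52.0, 34.0, 20.0, 30.0)
    print(encode_bias(real_frame, default_frame))
```

Dividing the shifts by the default width and height, and taking the logarithm of the size ratios, keeps the regression targets on a similar scale for small and large default face frames.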
3. The training method according to claim 2, wherein the calculating a loss function of a network layer of the predicted face frame bias according to the bias information of the predicted face frame relative to the corresponding default face frame and the bias information of the real face frame relative to the corresponding default face frame comprises:
selecting the default face frame corresponding to the situation that the relative area of the default face frame relative to the corresponding real face frame is larger than a preset first relative area threshold value as a sampling default face frame; the relative area is the area of the intersection area of the default face frame and the real face frame divided by the area of the union area of the default face frame and the real face frame;
calculating the loss function Loss_1 of the network layer of the predicted face frame bias, based on the bias information of the predicted face frame relative to the corresponding sampling default face frame and the bias information of the real face frame relative to the corresponding sampling default face frame, according to the following formula:
wherein N_reg is the number of sampling default face frames; L_reg is the regression loss function of the spatial bias corresponding to the k-th sampling default face frame, k being a positive integer between 1 and N_reg; T = {t_x, t_y, t_w, t_h} is the bias information of the predicted face frame relative to the corresponding sampling default face frame; z is an element of the set {x, y, w, h}; T* = {t*_x, t*_y, t*_w, t*_h} is the bias information of the real face frame relative to the corresponding sampling default face frame; smooth_L1 denotes the smooth L1 loss function, which is a variant of the L1 norm loss function; and l denotes the input variable of the smooth_L1 function;
the calculating a loss function of a network layer for predicting the confidence of the face frame according to the confidence that the default face frame contains the face comprises:
taking the default face frame corresponding to the situation that the relative area of the default face frame relative to the corresponding real face frame is larger than a preset first relative area threshold value as a positive sample, and taking the default face frame corresponding to the situation that the relative area of the default face frame relative to the corresponding real face frame is smaller than or equal to the preset first relative area threshold value as a negative sample;
selecting all positive samples and part of negative samples according to a preset positive-negative sample proportion;
calculating the loss function Loss_2 of the network layer for predicting the confidence of the face frame according to the following formula:
L_cls(p, p*) = -[p* log p + (1 - p*) log(1 - p)]
wherein N_cls is the total number of positive and negative samples selected; L_cls is the classification loss function corresponding to the i-th selected sample, i being a positive integer between 1 and N_cls; p is the confidence that the selected positive or negative sample contains a face; and p* is the true probability that the selected positive or negative sample contains a face, p* being 1 for a positive sample and 0 for a negative sample.
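The two loss functions of this claim can be sketched in Python as follows. The formulas for Loss_1 and for the aggregation of L_cls into Loss_2 are not reproduced in this text, so the sketch rests on assumptions: the smooth L1 function is taken in its commonly used form (0.5 l^2 when |l| < 1, and |l| - 0.5 otherwise), L_reg is taken as the sum of smooth_L1(t_z - t*_z) over z in {x, y, w, h}, and each loss is averaged over N_reg or N_cls. All function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Bias = Tuple[float, float, float, float]  # (t_x, t_y, t_w, t_h)


def smooth_l1(l: float) -> float:
    """Commonly used smooth L1 function (assumed form)."""
    return 0.5 * l * l if abs(l) < 1.0 else abs(l) - 0.5


def bias_loss(predicted: Sequence[Bias], real: Sequence[Bias]) -> float:
    """Assumed Loss_1: mean over the N_reg sampling default face frames
    of the summed smooth L1 differences between T and T*."""
    n_reg = len(predicted)
    total = sum(
        smooth_l1(t - t_star)
        for T, T_star in zip(predicted, real)
        for t, t_star in zip(T, T_star)
    )
    return total / n_reg


def confidence_loss(p: Sequence[float], p_star: Sequence[int], eps: float = 1e-12) -> float:
    """Assumed Loss_2: mean over the N_cls selected samples of
    L_cls(p, p*) = -[p* log p + (1 - p*) log(1 - p)]."""
    total = 0.0
    for p_i, p_star_i in zip(p, p_star):
        p_i = min(max(p_i, eps), 1.0 - eps)  # avoid log(0)
        total += -(p_star_i * math.log(p_i) + (1 - p_star_i) * math.log(1.0 - p_i))
    return total / len(p)
```

The selection of all positive samples and part of the negative samples according to the preset proportion is assumed to have been done before `confidence_loss` is called; the text does not state which negative samples are kept, so that step is left out of the sketch.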
4. A training method as claimed in any one of claims 1 to 3, wherein said calculating an error between the loss function of the network layer for the predicted face frame bias and the loss function of the network layer for the predicted face frame confidence comprises:
calculating the gradient of the loss function of the network layer of the predicted face frame bias and the gradient of the loss function of the network layer of the predicted face frame confidence by a stochastic gradient descent method;
and taking the obtained gradient value as the error between the loss function of the network layer of the predicted face frame bias and the loss function of the network layer of the confidence coefficient of the predicted face frame.
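A single stochastic gradient descent update can be sketched as follows. This is a minimal illustration under assumptions: the gradients of the two loss functions with respect to the network weight parameters are taken as already computed by back propagation, they are combined by simple addition (the text does not state how the two are combined), and the names `combine_gradients`, `sgd_step` and the learning rate `lr` are hypothetical.

```python
from typing import List, Sequence


def combine_gradients(grad_bias: Sequence[float], grad_conf: Sequence[float]) -> List[float]:
    """Assumed combination: element-wise sum of the two loss gradients."""
    return [gb + gc for gb, gc in zip(grad_bias, grad_conf)]


def sgd_step(weights: Sequence[float], grads: Sequence[float], lr: float) -> List[float]:
    """One gradient descent update: each weight moves against its gradient."""
    return [w - lr * g for w, g in zip(weights, grads)]


if __name__ == "__main__":
    weights = [1.0, -2.0]
    grads = combine_gradients([0.3, -0.4], [0.2, -0.6])
    print(sgd_step(weights, grads, lr=0.1))
```

In stochastic gradient descent the gradients are computed on a small batch of training images at each iteration rather than on the whole training set, so the update above is repeated many times with different batches.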
5. A face detection method based on a neural network, characterized in that the neural network comprises: network layers for face detection, network layers for predicting face frame bias, and network layers for predicting face frame confidence; the network layers for face detection are selected according to the receptive fields, in the face images of the training set, that correspond to different network layers of the neural network; each cell in a network layer for face detection is bound to six default face frames, and the default face frames are set according to the scale of the corresponding network layer for face detection; each network layer for face detection is connected to one network layer for predicting face frame bias and one network layer for predicting face frame confidence; the method comprises the following steps:
when a face detection instruction is received, inputting a face image to be detected into the face detection model according to any one of claims 1 to 4 for face detection;
for the face image to be detected, outputting the offset information of the predicted face frame relative to the corresponding default face frame through the network layer for predicting the offset of the face frame in the face detection model, and outputting the confidence coefficient of each default face frame including the face through the network layer for predicting the confidence coefficient of the face frame in the face detection model;
calculating corresponding predicted face frames according to each default face frame and the bias information of the corresponding predicted face frame relative to each default face frame;
and selecting the predicted face frame corresponding to the confidence coefficient greater than a preset confidence coefficient threshold value from the predicted face frames as a final face detection result, or selecting the predicted face frame corresponding to the highest confidence coefficient from the predicted face frames as the final face detection result.
6. The detection method according to claim 5, wherein the calculating the corresponding predicted face frame according to each default face frame and the offset information of the corresponding predicted face frame relative to each default face frame comprises:
according to each default face frame and the bias information of the corresponding predicted face frame relative to each default face frame, calculating the corresponding predicted face frame according to the following formula:
x = t_x * w_a + x_a, y = t_y * h_a + y_a
w = w_a * exp(t_w), h = h_a * exp(t_h)
wherein (x_a, y_a, w_a, h_a) are the center point coordinates, width and height of each default face frame; (x, y, w, h) are the center point coordinates, width and height of the predicted face frame corresponding to each default face frame; and (t_x, t_y, t_w, t_h) is the bias information of the corresponding predicted face frame relative to each default face frame.
7. The detection method according to claim 5 or 6, wherein after outputting, for the face image to be detected, the offset information of the predicted face frame relative to the corresponding default face frame through the network layer for predicting the offset of the face frame in the trained face detection model, and outputting the confidence that each default face frame contains a face through the network layer for predicting the confidence of the face frame in the trained face detection model, the method further comprises:
filtering out each default face frame whose confidence is less than or equal to a preset confidence threshold;
calculating the corresponding predicted face frames from the remaining default face frames and the offset information of the corresponding predicted face frames relative to the remaining default face frames;
and taking the predicted face frames corresponding to the remaining default face frames as the final face detection results.
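The filtering order in claim 7 (discard low-confidence default face frames first, then decode only the remainder) can be sketched as below. The function name `filter_by_confidence` and the tuple layout are hypothetical choices made for this illustration.

```python
from typing import List, Sequence, Tuple

# (center x, center y, width, height); the layout is an assumption of this sketch.
Frame = Tuple[float, float, float, float]
Offset = Tuple[float, float, float, float]


def filter_by_confidence(
    default_frames: Sequence[Frame],
    offsets: Sequence[Offset],
    confidences: Sequence[float],
    threshold: float,
) -> List[Tuple[Frame, Offset, float]]:
    """Keep only default frames whose confidence is strictly greater than the threshold.

    Frames with confidence <= threshold are dropped, as in the claim, so the
    later decoding step runs on fewer frames.
    """
    return [
        (frame, offset, conf)
        for frame, offset, conf in zip(default_frames, offsets, confidences)
        if conf > threshold
    ]
```

Filtering before decoding means the offset arithmetic is only performed for default face frames that can still contribute to the final result.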
8. The method of claim 7, wherein after computing the corresponding predicted face frame according to each default face frame and the offset information of the corresponding predicted face frame relative to each default face frame, the method further comprises:
calculating the relative area of every pair of predicted face frames;
if the relative area of a pair of predicted face frames is larger than a preset second relative area threshold, taking the two predicted face frames as sampled predicted face frames; wherein the relative area is the area of the intersection region of the two predicted face frames divided by the area of their union region;
and selecting, from the sampled predicted face frames, each sampled predicted face frame whose confidence is greater than a preset confidence threshold as a final face detection result, or selecting the sampled predicted face frame with the highest confidence as the final face detection result.
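The relative area in claim 8 is the intersection-over-union of two frames. The sketch below computes it for frames in (center x, center y, width, height) form, collects every frame that overlaps another frame above the threshold, and picks the most confident of them. The function names and the choice to return a single index are assumptions made for illustration; the claim also allows keeping all sampled frames above a confidence threshold.

```python
import itertools
from typing import List, Optional, Sequence, Tuple

Frame = Tuple[float, float, float, float]  # (center x, center y, width, height)


def relative_area(a: Frame, b: Frame) -> float:
    """Intersection area divided by union area of two center-format frames."""
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def sampled_indices(frames: Sequence[Frame], area_threshold: float) -> List[int]:
    """Indices of frames that overlap at least one other frame above the threshold."""
    sampled = set()
    for i, j in itertools.combinations(range(len(frames)), 2):
        if relative_area(frames[i], frames[j]) > area_threshold:
            sampled.update((i, j))
    return sorted(sampled)


def best_sampled(frames: Sequence[Frame], confidences: Sequence[float],
                 area_threshold: float) -> Optional[int]:
    """Index of the sampled frame with the highest confidence, or None if none overlap."""
    candidates = sampled_indices(frames, area_threshold)
    if not candidates:
        return None
    return max(candidates, key=lambda i: confidences[i])
```

Note that an isolated frame is never sampled under this reading, even if its confidence is high, because it has no partner whose relative area exceeds the threshold.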
9. A training system for a face detection model based on a neural network, the neural network comprising: network layers for face detection, a network layer for predicting face frame offset and a network layer for predicting face frame confidence; the network layers for face detection are selected according to the receptive fields that different network layers in the neural network have over the face images in the training set; each cell in a network layer for face detection is bound to six default face frames, and the default face frames are set according to the scale of the corresponding network layer for face detection; each network layer for face detection is connected to one network layer for predicting face frame offset and one network layer for predicting face frame confidence; the system comprises: an input module, a first calculation module, a second calculation module, a third calculation module, a feedback and update module and an iteration output module; wherein,
the input module is used for inputting the face images in the training set into the neural network for training when receiving a model training instruction;
the first calculation module is used for calculating, through the network layer for predicting face frame offset, the offset information of the predicted face frame relative to the corresponding default face frame and the offset information of the real face frame relative to the corresponding default face frame; and for calculating, through the network layer for predicting face frame confidence, the confidence that each default face frame contains a face;
the second calculation module is used for calculating a loss function of the network layer for predicting face frame offset according to the offset information of the predicted face frame relative to the corresponding default face frame and the offset information of the real face frame relative to the corresponding default face frame; and for calculating a loss function of the network layer for predicting face frame confidence according to the confidence that the default face frame contains a face;
the third calculation module is used for calculating the error between the loss function of the network layer for predicting face frame offset and the loss function of the network layer for predicting face frame confidence;
the feedback and update module is used for feeding the error back to the neural network through back propagation, updating the network weight parameters of the neural network according to the error, and adjusting the predicted face frame according to the updated network weight parameters;
and the iteration output module is used for repeating the iterative training until the error between the adjusted predicted face frame and the adjusted real face frame is within a preset error range, and outputting the face detection model.
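The quantities handled by the first three calculation modules can be sketched for a single default face frame as below. Several points are assumptions, since the claim does not fix them: the real-frame offsets are encoded as the inverse of the decoding formula (with a logarithmic width and height term in the style of Faster R-CNN), the offset loss is smooth L1, the confidence loss is binary cross-entropy, and the two losses are combined by a plain sum. All function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Frame = Tuple[float, float, float, float]  # (center x, center y, width, height)


def encode_offset(default: Frame, real: Frame) -> Tuple[float, float, float, float]:
    """Offset of a real face frame relative to a default face frame."""
    xa, ya, wa, ha = default
    x, y, w, h = real
    tx = (x - xa) / wa  # inverse of x = tx * wa + xa
    ty = (y - ya) / ha  # inverse of y = ty * ha + ya
    # ASSUMPTION: logarithmic scale terms, not stated in the claim.
    return tx, ty, math.log(w / wa), math.log(h / ha)


def smooth_l1(diff: float) -> float:
    """Smooth L1 penalty (assumed form of the offset loss)."""
    d = abs(diff)
    return 0.5 * d * d if d < 1.0 else d - 0.5


def offset_loss(predicted: Sequence[float], target: Sequence[float]) -> float:
    return sum(smooth_l1(p - t) for p, t in zip(predicted, target))


def confidence_loss(confidence: float, is_face: bool) -> float:
    """Binary cross-entropy (assumed form of the confidence loss)."""
    p = min(max(confidence, 1e-12), 1.0 - 1e-12)  # avoid log(0)
    return -math.log(p) if is_face else -math.log(1.0 - p)


def total_error(predicted: Sequence[float], target: Sequence[float],
                confidence: float, is_face: bool) -> float:
    """ASSUMPTION: the two losses are summed; the offset loss counts only for faces."""
    loc = offset_loss(predicted, target) if is_face else 0.0
    return loc + confidence_loss(confidence, is_face)
```

In a real training loop this error would be averaged over all matched default face frames and back-propagated by a deep learning framework; the scalar version here only shows how the pieces relate.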
10. A face detection system based on a neural network, the neural network comprising: network layers for face detection, a network layer for predicting face frame offset and a network layer for predicting face frame confidence; the network layers for face detection are selected according to the receptive fields that different network layers in the neural network have over the face images in the training set; each cell in a network layer for face detection is bound to six default face frames, and the default face frames are set according to the scale of the corresponding network layer for face detection; each network layer for face detection is connected to one network layer for predicting face frame offset and one network layer for predicting face frame confidence; the system comprises: an input module, an output module, a calculation module and a selection module; wherein,
the input module is used for inputting a face image to be detected into a trained face detection model for face detection when receiving a face detection instruction;
the output module is used for outputting, for the face image to be detected, the offset information of the predicted face frame relative to the corresponding default face frame through the network layer for predicting face frame offset of the face detection model according to any one of claims 1 to 4, and outputting the confidence that each default face frame contains a face through the network layer for predicting face frame confidence of the face detection model;
the calculation module is used for calculating the corresponding predicted face frames according to each default face frame and the offset information of the predicted face frames relative to each default face frame;
and the selection module is used for selecting, from the predicted face frames, each predicted face frame whose confidence is greater than a preset confidence threshold as a final face detection result, or selecting the predicted face frame with the highest confidence as the final face detection result.
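Claims 9 and 10 bind six default face frames to each cell of a face detection layer, sized according to that layer's scale. One possible way to generate such frames is sketched below. The claims do not say how the six frames differ from one another, so the two size factors and three aspect ratios used here (2 x 3 = 6 frames per cell) are purely hypothetical, as is the function name `default_frames_for_layer`.

```python
import math
from typing import List, Sequence, Tuple

Frame = Tuple[float, float, float, float]  # (center x, center y, width, height)


def default_frames_for_layer(
    rows: int,
    cols: int,
    image_w: float,
    image_h: float,
    scale: float,
    size_factors: Sequence[float] = (1.0, 1.4),        # hypothetical values
    aspect_ratios: Sequence[float] = (1.0, 0.8, 1.25),  # hypothetical values
) -> List[Frame]:
    """Generate len(size_factors) * len(aspect_ratios) default frames per cell."""
    cell_w = image_w / cols
    cell_h = image_h / rows
    frames: List[Frame] = []
    for r in range(rows):
        for c in range(cols):
            # Each default frame is centered on its cell.
            cx = (c + 0.5) * cell_w
            cy = (r + 0.5) * cell_h
            for factor in size_factors:
                for ratio in aspect_ratios:
                    w = scale * factor * math.sqrt(ratio)
                    h = scale * factor / math.sqrt(ratio)
                    frames.append((cx, cy, w, h))
    return frames
```

A layer with a coarser grid would typically be given a larger `scale`, so that layers with larger receptive fields are responsible for larger faces.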

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610906338.2A CN106485230B (en) 2016-10-18 2016-10-18 Training, method for detecting human face and the system of Face datection model neural network based

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610906338.2A CN106485230B (en) 2016-10-18 2016-10-18 Training, method for detecting human face and the system of Face datection model neural network based

Publications (2)

Publication Number Publication Date
CN106485230A (en) 2017-03-08
CN106485230B (en) 2019-10-25

Family

ID=58270094

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610906338.2A Active CN106485230B (en) 2016-10-18 2016-10-18 Training, method for detecting human face and the system of Face datection model neural network based

Country Status (1)

Country Link
CN (1) CN106485230B (en)

Families Citing this family (45)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108230292B (en) * 2017-04-11 2021-04-02 北京市商汤科技开发有限公司 Object detection method, neural network training method, device and electronic equipment
CN107229929A (en) * 2017-04-12 2017-10-03 西安电子科技大学 A kind of license plate locating method based on R CNN
CN106991408A (en) * 2017-04-14 2017-07-28 电子科技大学 The generation method and method for detecting human face of a kind of candidate frame generation network
CN107220618B (en) * 2017-05-25 2019-12-24 中国科学院自动化研究所 Face detection method and device, computer readable storage medium and equipment
CN110490177A (en) * 2017-06-02 2019-11-22 腾讯科技(深圳)有限公司 A kind of human-face detector training method and device
CN107247944B (en) * 2017-06-28 2020-11-10 智慧眼科技股份有限公司 Face detection speed optimization method and device based on deep learning
CN107403141B (en) * 2017-07-05 2020-01-10 中国科学院自动化研究所 Face detection method and device, computer readable storage medium and equipment
CN107464261B (en) * 2017-07-07 2020-10-23 广州市百果园网络科技有限公司 Image data calibration training method and device, storage medium and server thereof
CN107358223B (en) * 2017-08-16 2021-06-22 上海荷福人工智能科技(集团)有限公司 Face detection and face alignment method based on yolo
CN107784270A (en) * 2017-09-08 2018-03-09 四川云图睿视科技有限公司 A kind of method for detecting human face and system based on convolutional neural networks
CN107679460B (en) * 2017-09-11 2020-08-11 Oppo广东移动通信有限公司 Face self-learning method, intelligent terminal and storage medium
CN107665336A (en) * 2017-09-20 2018-02-06 厦门理工学院 Multi-target detection method based on Faster RCNN in intelligent refrigerator
CN108875488B (en) * 2017-09-29 2021-08-06 北京旷视科技有限公司 Object tracking method, object tracking apparatus, and computer-readable storage medium
CN109697441B (en) * 2017-10-23 2021-02-12 杭州海康威视数字技术股份有限公司 Target detection method and device and computer equipment
CN108875504B (en) * 2017-11-10 2021-07-23 北京旷视科技有限公司 Image detection method and image detection device based on neural network
CN108229308A (en) * 2017-11-23 2018-06-29 北京市商汤科技开发有限公司 Recongnition of objects method, apparatus, storage medium and electronic equipment
CN108182394B (en) * 2017-12-22 2021-02-02 浙江大华技术股份有限公司 Convolutional neural network training method, face recognition method and face recognition device
CN108427939B (en) * 2018-03-30 2022-09-23 百度在线网络技术(北京)有限公司 Model generation method and device
CN108510084B (en) * 2018-04-04 2022-08-23 百度在线网络技术(北京)有限公司 Method and apparatus for generating information
CN108197618B (en) * 2018-04-08 2021-10-22 百度在线网络技术(北京)有限公司 Method and device for generating human face detection model
CN108596082A (en) * 2018-04-20 2018-09-28 重庆邮电大学 Human face in-vivo detection method based on image diffusion velocity model and color character
CN108960148A (en) * 2018-07-05 2018-12-07 济南东朔微电子有限公司 A kind of single three segment encode recognition methods in the express delivery face based on video image
CN109131843B (en) * 2018-08-22 2022-04-26 王桥生 Long-term visual tracking active separation type undercarriage
CN109101947B (en) * 2018-08-27 2021-03-26 Oppo广东移动通信有限公司 Portrait identification method, portrait identification device and terminal equipment
CN109241968B (en) * 2018-09-25 2022-04-19 广东工业大学 Image content inclination angle prediction network training method and correction method and system
CN109271970A (en) * 2018-10-30 2019-01-25 北京旷视科技有限公司 Face datection model training method and device
CN109447156B (en) * 2018-10-30 2022-05-17 北京字节跳动网络技术有限公司 Method and apparatus for generating a model
CN109493296A (en) * 2018-10-31 2019-03-19 泰康保险集团股份有限公司 Image enchancing method, device, electronic equipment and computer-readable medium
CN109508678B (en) 2018-11-16 2021-03-30 广州市百果园信息技术有限公司 Training method of face detection model, and detection method and device of face key points
CN109934184A (en) * 2019-03-19 2019-06-25 网易(杭州)网络有限公司 Gesture identification method and device, storage medium, processor
US11783221B2 (en) 2019-05-31 2023-10-10 International Business Machines Corporation Data exposure for transparency in artificial intelligence
WO2021016932A1 (en) * 2019-07-31 2021-02-04 深圳市大疆创新科技有限公司 Data processing method and apparatus, and computer-readable storage medium
CN110717403B (en) * 2019-09-16 2023-10-24 国网江西省电力有限公司电力科学研究院 Face multi-target tracking method
CN110610575B (en) * 2019-09-20 2021-09-07 北京百度网讯科技有限公司 Coin identification method and device and cash register
CN110991305B (en) * 2019-11-27 2023-04-07 厦门大学 Airplane detection method under remote sensing image and storage medium
CN111144220B (en) * 2019-11-29 2023-03-24 福建省星云大数据应用服务有限公司 Personnel detection method, device, equipment and medium suitable for big data
CN111189201A (en) * 2020-01-15 2020-05-22 西安建筑科技大学 Air conditioner prediction control method based on machine vision
CN113642592B (en) * 2020-04-27 2024-07-05 武汉Tcl集团工业研究院有限公司 Training method of training model, scene recognition method and computer equipment
CN112115789A (en) * 2020-08-18 2020-12-22 北京嘀嘀无限科技发展有限公司 Face detection model determining method and device and electronic equipment
CN112084992B (en) * 2020-09-18 2021-04-13 北京中电兴发科技有限公司 Face frame selection method in face key point detection module
CN112232215B (en) * 2020-10-16 2021-04-06 哈尔滨市科佳通用机电股份有限公司 Railway wagon coupler yoke key joist falling fault detection method
CN112712068B (en) * 2021-03-19 2021-07-06 腾讯科技(深圳)有限公司 Key point detection method and device, electronic equipment and storage medium
CN114644276B (en) * 2022-04-11 2022-12-02 伊萨电梯有限公司 Intelligent elevator control method under mixed scene condition
CN114801632A (en) * 2022-06-14 2022-07-29 中国第一汽车股份有限公司 Suspension height adjusting method, device, equipment and storage medium
WO2024007189A1 (en) * 2022-07-06 2024-01-11 Nokia Shanghai Bell Co., Ltd. Scalable and quick waveform learning in multi-user communication system

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks; Shaoqing Ren et al.; 《arXiv:1506.01497v3》; 2016-01-06; pp. 1-14 *
Hierarchical Convolutional Neural Network for Face Detection; Dong Wang et al.; 《Springer International Publishing Switzerland 2015》; 2015-12-31; pp. 373-384 *
An End-to-End System for Unconstrained Face Verification with Deep Convolutional Neural Networks; Jun-Cheng Chen et al.; 《2015 IEEE International Conference on Computer Vision Workshop》; 2015; pp. 360-368 *
Landmark perturbation-based data augmentation for unconstrained face recognition; Jiang-Jing Lv et al.; 《Signal Processing: Image Communication》; 2016-04-01; pp. 465-475 *
ParseNet: Looking Wider to See Better; Wei Liu et al.; 《arXiv:1506.04579v2》; 2015-11-19; pp. 1-11 *

Also Published As

Publication number Publication date
CN106485230A (en) 2017-03-08

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant