CN113869271A - Face detection method and device and electronic equipment - Google Patents

Face detection method and device and electronic equipment

Info

Publication number
CN113869271A
Authority
CN
China
Prior art keywords
vector
classification
color image
depth image
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111191551.7A
Other languages
Chinese (zh)
Inventor
封洁轩
李骊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Huajie Imi Technology Co ltd
Beijing HJIMI Technology Co Ltd
Original Assignee
Nanjing Huajie Imi Technology Co ltd
Beijing HJIMI Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Huajie Imi Technology Co ltd, Beijing HJIMI Technology Co Ltd filed Critical Nanjing Huajie Imi Technology Co ltd
Priority to CN202111191551.7A priority Critical patent/CN113869271A/en
Publication of CN113869271A publication Critical patent/CN113869271A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a face detection method, a face detection device and electronic equipment. The features of a color image to be detected and of a depth image to be detected are extracted separately by a face detection model, and the features of each image are analyzed to obtain a classification vector and a position vector of the color image and a classification vector and a position vector of the depth image. The two classification vectors are fused and the two position vectors are fused to output a final classification vector and a final position vector; candidate face frames are determined through the relationship between the classification vector and a threshold value, and a target face frame representing the detection result is determined based on the candidate face frames. The features of the color image and the depth image are thus fully utilized for face detection, and only the detection results of the two branches are fused, which improves both the data processing efficiency and the accuracy of face detection.

Description

Face detection method and device and electronic equipment
Technical Field
The present invention relates to the field of image processing, and in particular, to a method and an apparatus for detecting a human face, and an electronic device.
Background
Human face detection is an important step in the face recognition process. Its purpose is to find the position coordinates of the smallest bounding box enclosing each human face in an input image or image group and to pass them to the next stage, laying the foundation for the subsequent detection and recognition work.
To improve the accuracy of face detection, some methods jointly use color images and depth images, usually by fusing the intermediate results produced while a convolutional neural network processes the color image and the depth image. However, these intermediate results have a large number of channels, so many parameters need to be fused, which reduces the processing efficiency of the data; moreover, the channels of the color image features and of the depth image features have different meanings, so fusing the two directly makes it difficult to achieve the goal of improving detection accuracy.
Disclosure of Invention
In view of this, the embodiment of the invention discloses a face detection method, a face detection device and an electronic device, which not only improve the processing efficiency, but also improve the accuracy of face detection.
The invention discloses a face detection method, which comprises the following steps:
inputting a color image to be detected, a depth image to be detected and a preset anchor point frame into a pre-trained face detection model;
the face detection model is used for respectively extracting the features of the color image and the depth image, outputting a first classification vector and a first position vector based on the features of a preset anchor point frame and the color image, outputting a second classification vector and a second position vector based on the features of the preset anchor point frame and the depth image, fusing the first classification vector and the second classification vector, and fusing the first position vector and the second position vector to obtain a third classification vector and a third position vector; the third classification vector and the third position vector have a one-to-one corresponding relation, and the anchor point frame has a one-to-one corresponding relation with the characteristics of the color image and the characteristics of the depth image respectively;
comparing a preset threshold value with the third classification vector, screening out the third classification vector larger than the preset threshold value, and determining a candidate face frame based on the mapping relation between the third classification vector and the third position vector;
And determining a target face frame representing the detection result based on the candidate face frame.
Optionally, the color image and the depth image to be detected are obtained by preprocessing an original color image and an original depth image, and the preprocessing process includes:
respectively carrying out normalization processing on each channel of the color image;
and carrying out normalization processing on the depth image.
Optionally, the training process of the face detection model includes:
obtaining a training sample; the training sample comprises a pre-registered color image and a depth image;
inputting the training sample into a human face detection model to be trained;
respectively extracting features from the color image and the depth image through a face detection model, analyzing the features extracted from the color image to obtain a fourth classification vector and a fourth position vector of the color image, analyzing the features extracted from the depth image to obtain a fifth classification vector and a fifth position vector of the depth image, fusing the fourth classification vector and the fifth classification vector by adopting preset classification parameters, fusing the fourth position vector and the fifth position vector by adopting preset regression parameters, and updating the classification parameters and the regression parameters respectively through preset classification losses and regression losses.
Optionally, extracting features from the color image and the depth image respectively includes:
extracting features of different scales from the color image, and fusing the features of different scales;
extracting features of different scales from the depth image, and fusing the features of different scales;
the scale is consistent with the step length of the preset anchor point frame.
Optionally, the generating process of the preset anchor frame includes:
determining the step length of the anchor point frame, the scaling of each layer of anchor point frame and the length-width ratio of the anchor point frame;
traversing each anchor point step length along the transverse direction and the longitudinal direction of the reference image, and determining the center coordinate of each anchor point frame;
traversing each scaling corresponding to each anchor point step length along the transverse direction and the longitudinal direction of the reference image;
traversing the length-width ratio of each anchor point frame corresponding to each scaling along the transverse direction and the longitudinal direction of the reference image;
the reference image is the same in scale as the color image to be detected and the depth image to be detected.
The embodiment of the invention discloses a face detection device, which comprises:
the input unit is used for inputting the color image to be detected, the depth image and the preset anchor point frame into a pre-trained face detection model;
the face detection model is used for respectively extracting the features of the color image and the depth image, outputting a first classification vector and a first position vector based on the features of a preset anchor point frame and the color image, outputting a second classification vector and a second position vector based on the features of the preset anchor point frame and the depth image, fusing the first classification vector and the second classification vector, and fusing the first position vector and the second position vector to obtain a third classification vector and a third position vector; the third classification vector and the third position vector have a one-to-one corresponding relation, and the anchor point frame has a one-to-one corresponding relation with the characteristics of the color image and the characteristics of the depth image respectively;
the first determining unit is used for comparing a preset threshold value with the third classification vector, screening out the third classification vector larger than the preset threshold value, and determining a candidate face frame based on the mapping relation between the third classification vector and the third position vector;
and the second determining unit is used for determining a target face frame representing the detection result based on the candidate face frame.
Optionally, the method further includes:
the first normalization processing subunit is used for respectively carrying out normalization processing on each channel of the color image;
and the second normalization processing subunit is used for performing normalization processing on the depth image.
Optionally, the method further includes:
a face detection model training unit for:
obtaining a training sample; the training sample comprises a pre-registered color image and a depth image;
inputting the training sample into a human face detection model to be trained;
the face detection model is used for respectively extracting features from the color image and the depth image, analyzing the features extracted from the color image to obtain a fourth classification vector and a fourth position vector of the color image, analyzing the features extracted from the depth image to obtain a fifth classification vector and a fifth position vector of the depth image, fusing the fourth classification vector and the fifth classification vector by adopting preset classification parameters, fusing the fourth position vector and the fifth position vector by adopting preset regression parameters, and updating the classification parameters and the regression parameters respectively through preset classification loss and regression loss.
Optionally, the face detection model training unit further includes:
the first feature extraction unit is used for extracting features of different scales from the color image and fusing the features of different scales corresponding to the color image;
the second feature extraction unit is used for extracting features of different scales from the depth image and fusing the features of different scales corresponding to the depth image;
the scale is consistent with the step length of the preset anchor point frame.
The embodiment of the invention discloses an electronic device, which comprises:
a memory and a processor;
the memory is used for storing programs, and the processor is used for executing the above face detection method when executing the programs in the memory.
The embodiment of the invention discloses a face detection method, a face detection device and electronic equipment. In the method, the features of a color image to be detected and of a depth image to be detected are extracted separately by a face detection model, and the features of each image are analyzed to obtain a classification vector and a position vector of the color image and a classification vector and a position vector of the depth image. The two classification vectors are fused and the two position vectors are fused to output a final classification vector and a final position vector; candidate face frames are determined through the relationship between the classification vector and a threshold value, and a target face frame representing the detection result is determined based on the candidate face frames. In this way, the features of the color image and the depth image are fully utilized for face detection, and the respective detection results are fused, which improves both the data processing efficiency and the accuracy of face detection.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained from the provided drawings without creative effort.
FIG. 1 is a flow chart of a face detection method disclosed in the embodiment of the invention;
FIG. 2 is a flow chart illustrating a training process of a face detection model according to an embodiment of the present invention;
fig. 3 is a schematic diagram of a processing procedure of a face detection model according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a face detection apparatus according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of an electronic device disclosed in an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, a schematic flow chart of a method for detecting a face according to an embodiment of the present invention is shown, where in the embodiment, the method includes:
s101: inputting a color image, a depth image and a preset anchor point frame to be detected into a pre-trained face detection model;
in this embodiment, the color image and the depth image to be detected may be obtained by preprocessing, and preferably, the preprocessing includes:
respectively carrying out normalization processing on each channel of the original color image;
and carrying out normalization processing on the original depth image.
In this embodiment, in order to avoid an unbalanced distribution of image pixel values, the pixel values of each channel of the color image are limited to [-1, 1] in advance, and each channel of the color image may then be normalized, for example, by the following formula 1):
1)
R_norm = (R - μ_R) / σ_R,  G_norm = (G - μ_G) / σ_G,  B_norm = (B - μ_B) / σ_B
wherein R_norm denotes the normalized value of the R channel, G_norm denotes the normalized value of the G channel, B_norm denotes the normalized value of the B channel, and (μ_R, μ_G, μ_B) and (σ_R, σ_G, σ_B) respectively represent the mean and variance of the training data set over the R, G and B channel distributions.
In this embodiment, the normalization of the depth image is related to the parameters of the depth camera that captured it.
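As a minimal sketch of this preprocessing (NumPy is an assumed implementation choice; the per-channel statistics and the depth divisor are placeholders to be replaced by the training set's values and the depth camera's parameters, and are not values from the patent):

```python
import numpy as np

# Placeholders: the per-channel mean/variance of the training data set
# and the camera-dependent depth divisor are assumed, not from the patent.
MU = np.zeros(3, dtype=np.float32)     # (mu_R, mu_G, mu_B)
SIGMA = np.ones(3, dtype=np.float32)   # (sigma_R, sigma_G, sigma_B)

def normalize_color(img_rgb: np.ndarray) -> np.ndarray:
    """Limit each channel of an HxWx3 uint8 image to [-1, 1], then
    normalize each channel separately with the training-set statistics."""
    x = img_rgb.astype(np.float32) / 127.5 - 1.0
    return (x - MU) / SIGMA

def normalize_depth(depth: np.ndarray, max_depth: float = 10000.0) -> np.ndarray:
    """Scale the depth map by a constant tied to the depth camera's
    parameters (an assumed 10 m range, in millimetres, is used here)."""
    return depth.astype(np.float32) / max_depth
```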
The process of preprocessing the original color image and the original depth image may further include normalizing the sizes of the original color image and the original depth image, so that the color image to be detected and the depth image to be detected have the same size, which is also the same as the size of the training samples used to train the face detection model.
In this embodiment, through the above preprocessing process, the convergence rate and the detection effect of the face detection model are further improved.
The face detection model is used for respectively extracting the features of a color image to be detected and a depth image to be detected, outputting a first classification vector and a first position vector based on the features of a preset anchor point frame and the color image to be detected, outputting a second classification vector and a second position vector based on the features of the preset anchor point frame and the depth image to be detected, fusing the first classification vector and the second classification vector, and fusing the first position vector and the second position vector to obtain a third classification vector and a third position vector; the third classification vector and the third position vector have a one-to-one corresponding relation, and the anchor point frame has a one-to-one corresponding relation with the characteristics of the color image and the characteristics of the depth image respectively;
the face detection model is obtained by training a color image and a depth image which are registered in advance, the trained face detection model can carry out face detection through the color image and the depth image respectively to obtain a detection result of the color image: a classification vector and a location vector; obtaining a detection result of the depth image: and the classification vector and the position vector are fused, the classification vector of the color image and the classification vector of the depth image are fused, and the position vector of the color image and the position vector of the depth image are fused, so that the final classification vector and the final position vector are obtained.
The training process of the face detection model will be described in detail below, and is not described in detail in this embodiment.
In this embodiment, the elements from which the anchor point frames are generated include: the step length, the scaling and the length-width ratio of the anchor point frame, wherein each step length corresponds to a plurality of scalings, and each scaling corresponds to a plurality of length-width ratios. Specifically, the generation process of the anchor point frames includes:
determining the anchor point step lengths, the scaling of the anchor point frame corresponding to each step length, and the length-width ratio of the anchor point frame corresponding to each scaling;
traversing each anchor point step length along the transverse direction and the longitudinal direction of the reference image, and determining the center coordinate of each anchor point frame;
traversing each scaling corresponding to each anchor point step length along the transverse direction and the longitudinal direction of the reference image;
and traversing the length-width ratio of each anchor point frame corresponding to each scaling along the transverse direction and the longitudinal direction of the reference image.
For example: firstly, the anchor point frame step length types strides, the anchor point frame scalings scales of each layer and the anchor point frame length-width ratios ratios are defined. strides contains five values, 8, 16, 32, 64 and 128, representing the step lengths between the centers of the five kinds of anchor point frames in pixels, which determine the coordinate positions of the anchor point frame centers; scales contains two values, 0.6 and 0.848, representing the box size categories located at the center of each anchor point frame; ratios represents the kinds of length-width ratios of the anchor point frames of each size at each center point, and since the detection target is only the human face, there is a single ratio of 1:1. Then, a triple cycle is performed: the first iteration traverses each step length in strides in both the transverse and longitudinal directions on the image, finding a total of 40×40 + 20×20 + 10×10 + 5×5 + 3×3 = 2134 anchor point frame centers (320/8 = 40, 320/16 = 20, and so on); the second iteration traverses each scaling and the third iteration traverses each length-width ratio, forming two square anchor point frames of different sizes with a length-width ratio of 1:1 at the center of each anchor point frame.
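A sketch of this triple cycle follows (the strides, scales and ratios are the values from the example above; the mapping from step length to base box size, `stride * 4`, is an assumption, since the patent does not state how a scale maps to a pixel size):

```python
import numpy as np

def generate_anchors(image_size=320,
                     strides=(8, 16, 32, 64, 128),
                     scales=(0.6, 0.848),
                     ratios=(1.0,)):
    """Triple cycle: centers -> scales -> aspect ratios.
    Returns an (N, 4) array of anchors as (x_ctr, y_ctr, w, h)."""
    anchors = []
    for stride in strides:
        cells = int(np.ceil(image_size / stride))  # 40, 20, 10, 5, 3
        base = stride * 4  # assumed stride-to-size mapping
        for cy in range(cells):
            for cx in range(cells):
                x_ctr = (cx + 0.5) * stride
                y_ctr = (cy + 0.5) * stride
                for scale in scales:
                    side = base * scale
                    for ratio in ratios:  # only 1:1, since faces are the sole target
                        w = side * np.sqrt(ratio)
                        h = side / np.sqrt(ratio)
                        anchors.append((x_ctr, y_ctr, w, h))
    return np.asarray(anchors, dtype=np.float32)

# 2134 centers x 2 scales x 1 ratio = 4268 anchors on a 320x320 image
print(generate_anchors().shape)  # (4268, 4)
```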
It should be noted that the output position vectors and classification vectors of the color image are in one-to-one correspondence, and each pair can be expressed as a classification result for a certain position area of the color image; the output position vectors and classification vectors of the depth image are likewise in one-to-one correspondence, and each pair can be expressed as a classification result for a certain position area of the depth image.
In this embodiment, the manner of fusing the first classification vector of the color image and the second classification vector of the depth image may include multiple manners, for example, directly fusing the first classification vector and the second classification vector, or fusing the first classification vector and the second classification vector by using a weighted fusion manner.
In this embodiment, the manner of fusing the first position vector of the color image and the second position vector of the depth image may also include multiple manners, for example, directly fusing the first position vector and the second position vector, or fusing the first position vector and the second position vector by using a weighted fusion manner.
S102: comparing a preset threshold value with the third classification vector, screening out a third classification vector larger than the preset threshold value, and determining a candidate face frame based on the mapping relation between the third classification vector and the third position vector;
in this embodiment, the threshold may be determined empirically, may also be obtained through calculation, or may also be determined according to requirements, and is not limited in this embodiment.
Since the position vector contains a plurality of groups of coordinates, the classification vector may be compared with the threshold element by element: if the ith value in the classification vector is greater than the threshold, the ith group of coordinates in the position vector is retained. After one traversal of the classification vector, if M positions exceed the threshold in total, the corresponding M groups of coordinates in the position vector are retained; these M groups of coordinates are the screened result and represent the candidate face frames.
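A minimal sketch of this screening step (assuming the fused outputs are flat NumPy arrays; the names and the 0.5 threshold are illustrative):

```python
import numpy as np

def screen_candidates(cls_scores: np.ndarray,
                      loc_vectors: np.ndarray,
                      threshold: float = 0.5):
    """Keep the i-th group of coordinates whenever the i-th classification
    score exceeds the threshold; returns the M retained groups."""
    keep = cls_scores > threshold
    return loc_vectors[keep], cls_scores[keep]
```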
Each group of coordinates in the obtained candidate face frames represents one face frame and contains 4 numbers, representing the coordinates of the center point of the face frame in the two directions, the width of the face frame and the height of the face frame. These numbers are not the coordinates themselves; the final face frame coordinates are obtained through a group of conversion formulas. The position coordinates of a candidate face frame on the image can be obtained by conversion calculation according to the following formula 2):
2)
x_ctr = x_a + dx · w_a,  y_ctr = y_a + dy · h_a,  w = w_a · exp(dw),  h = h_a · exp(dh)
wherein the parameters in the formula are respectively expressed as:
w denotes the width of the final face frame, h denotes the height of the final face frame, x_ctr denotes the abscissa of the center point of the final face frame, y_ctr denotes the ordinate of the center point of the final face frame, (dx, dy, dw, dh) = rgr[i] denotes the ith group of position vectors, and (x_a, y_a, w_a, h_a) denote the center coordinates, width and height of the ith anchor point frame Anchor[i].
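A sketch of this conversion under the parameterization assumed in formula 2) (the function name is illustrative):

```python
import numpy as np

def decode(loc: np.ndarray, anchors: np.ndarray) -> np.ndarray:
    """loc: (M, 4) offsets (dx, dy, dw, dh); anchors: (M, 4) as
    (x_ctr, y_ctr, w, h). Returns (M, 4) face frames (x_ctr, y_ctr, w, h)."""
    ax, ay, aw, ah = anchors.T
    dx, dy, dw, dh = loc.T
    x_ctr = ax + dx * aw          # shift the anchor center
    y_ctr = ay + dy * ah
    w = aw * np.exp(dw)           # rescale the anchor size
    h = ah * np.exp(dh)
    return np.stack([x_ctr, y_ctr, w, h], axis=1)
```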
It is to be appreciated that the candidate face frames represent the positions of the detected regions of the image that contain human faces.
S103, determining a target face frame representing the detection result based on the candidate face frame;
in this embodiment, there may be a plurality of candidate face frames, and in order to obtain a more accurate target face frame, the final target face frame needs to be determined based on the candidate face frames. The screening can be done in several ways: for example, the single best candidate face frame may be selected from the plurality of candidate face frames as the target face frame, or a Non-Maximum Suppression (NMS) algorithm may be used to merge the candidate face frames into the final target face frame.
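A sketch of the NMS merging option (a common greedy variant; the 0.4 IoU threshold is an assumed example, not a value from the patent):

```python
import numpy as np

def nms(boxes: np.ndarray, scores: np.ndarray, iou_thresh: float = 0.4):
    """Greedy NMS over (x_ctr, y_ctr, w, h) boxes; returns kept indices."""
    x1 = boxes[:, 0] - boxes[:, 2] / 2
    y1 = boxes[:, 1] - boxes[:, 3] / 2
    x2 = boxes[:, 0] + boxes[:, 2] / 2
    y2 = boxes[:, 1] + boxes[:, 3] / 2
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]   # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_thresh]  # drop overlapping boxes
    return keep
```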
In this embodiment, the dimensions of the obtained classification vectors and position vectors may be high; in order to eliminate useless information in the vectors, their dimensionality may be reduced so that only the useful information is retained.
In this embodiment, the features of the color image to be detected and the depth image to be detected are extracted separately by the face detection model, and the features of each image are analyzed to obtain a classification vector and a position vector of the color image and a classification vector and a position vector of the depth image. The two classification vectors are fused, the two position vectors are fused, and a final classification vector and a final position vector are output; candidate face frames are determined through the relationship between the classification vector and a threshold value, and a target face frame representing the detection result is then determined based on the candidate face frames. In this way, the features of the color image and the depth image are fully utilized for face detection, and the respective detection results are fused, which improves both the data processing efficiency and the accuracy of face detection.
Referring to fig. 2, a flow chart of a training process of a face detection model according to an embodiment of the present invention is shown, in this embodiment, the method includes:
s201, acquiring a training sample; the training sample comprises a pre-registered color image and a depth image;
in this embodiment, the images included in the training samples are color images and depth images registered in advance. In order to further improve the convergence rate and the detection effect of the face detection model, the color images and the depth images in the training samples are preprocessed, for example by:
carrying out normalization processing on each channel of the color image;
and carrying out normalization processing on the depth image.
Specifically, the normalization method has already been described above, and is not described in detail in this embodiment.
S202: inputting the training sample into a human face detection model to be trained;
S203: respectively extracting features from the color image and the depth image through the face detection model, analyzing the features extracted from the color image to obtain a fourth classification vector and a fourth position vector of the color image, analyzing the features extracted from the depth image to obtain a fifth classification vector and a fifth position vector of the depth image, fusing the fourth classification vector and the fifth classification vector by adopting preset classification parameters, fusing the fourth position vector and the fifth position vector by adopting preset regression parameters, and updating the classification parameters and the regression parameters respectively through preset classification losses and regression losses.
In this embodiment, the method for extracting features from a color image and a depth image may include:
extracting features of different scales from the color image, and fusing the features of different scales;
extracting features of different scales from the depth image, and fusing the features of different scales;
the scale is consistent with the step length of the preset anchor point frame.
In this embodiment, the features extracted from the color image and the depth image may be features of multiple scales, where the extraction manner of the features of multiple scales may include:
the first implementation mode comprises the following steps: by the convolutional neural network, in the case of adopting different receptive fields, features of different scales are extracted, for example, features of 5 scales can be extracted.
The second embodiment: by the convolutional neural network, under the condition of adopting different receptive fields, a plurality of features with different scales are extracted, and at least one down-sampling is carried out on the extracted features, so that the features with other scales are obtained.
For example, through a convolutional neural network, under the condition of adopting different receptive fields, the method extracts a first feature, a second feature and a third feature, wherein the scales of the first feature, the second feature and the third feature are different, and downsampling is performed on the third feature twice to obtain a fourth feature and a fifth feature.
The features of multiple scales may be fused in multiple ways, which are not limited in this embodiment; preferably, the BiFPN algorithm may be used.
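As a minimal sketch of the second implementation mode above (PyTorch is an assumed framework here, and using max pooling for the two downsampling steps is an illustrative choice; the function name is hypothetical):

```python
import torch.nn.functional as F

def five_scale_features(f1, f2, f3):
    """Take the three backbone features of different scales and produce
    two further scales by downsampling the coarsest feature twice."""
    f4 = F.max_pool2d(f3, kernel_size=2)  # first downsampling -> fourth feature
    f5 = F.max_pool2d(f4, kernel_size=2)  # second downsampling -> fifth feature
    return [f1, f2, f3, f4, f5]
```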
In this embodiment, referring to fig. 3, which shows the processing procedure of the face detection model: the color image and the depth image are input into the face detection model. Model A in the face detection model processes the color image, extracts the features of the color image and inputs the extracted features into classifier A and regressor A; at the same time, the anchor point frames generated in advance by the anchor point generator are also input into classifier A and regressor A. Model B in the face detection model processes the depth image, extracts the features of the depth image and inputs them into classifier B and regressor B; the anchor point frames generated in advance by the anchor point generator are likewise input into classifier B and regressor B. Classifier A and regressor A respectively analyze the features of the color image together with the anchor point frames to obtain a classification vector and a position vector; classifier B and regressor B respectively analyze the features of the depth image together with the anchor point frames to obtain a classification vector and a position vector. The classification vector of the color image and the classification vector of the depth image are weighted and fused in combination with preset classification parameters, the position vector of the color image and the position vector of the depth image are weighted and fused in combination with preset regression parameters, and the final classification information and position information are output. The classification parameters and the regression parameters are then updated through the classification loss and the regression loss to realize the training of the face detection model.
The features of the color image and the features of the depth image can each be analyzed by a classifier to obtain a classification vector, and can each be analyzed by a regressor, which corrects the anchor point frames, to obtain a position vector.
In this embodiment, when fusing the classification vectors of the color image and the depth image and the position vectors of the color image and the depth image, the classification vectors and the position vectors are fused through the classification parameters and the regression parameters respectively, which can be expressed by the following formula 3):
3)
cls = (β_clsA · cls_A + β_clsB · cls_B) / (β_clsA + β_clsB + ε),  rgr = (β_rgrA · rgr_A + β_rgrB · rgr_B) / (β_rgrA + β_rgrB + ε)
wherein β_clsA and β_clsB denote the classification parameters, β_rgrA and β_rgrB denote the regression parameters, cls_A and rgr_A denote the classification vector and position vector of the color image branch, cls_B and rgr_B denote those of the depth image branch, and ε is a minimum value set to prevent the denominator from being 0, taking the value ε = 1e-8.
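A sketch of this fusion in the normalized weighted form written above (the names are illustrative; the β weights would be learned parameters of the model):

```python
EPS = 1e-8  # prevents the denominator from being 0, as in formula 3)

def weighted_fuse(vec_a, vec_b, beta_a, beta_b):
    """Fuse the color-branch and depth-branch outputs with learned weights."""
    return (beta_a * vec_a + beta_b * vec_b) / (beta_a + beta_b + EPS)

# cls = weighted_fuse(cls_color, cls_depth, beta_clsA, beta_clsB)
# rgr = weighted_fuse(rgr_color, rgr_depth, beta_rgrA, beta_rgrB)
```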
In this embodiment, the classification loss L_cls and the regression loss L_rgr are calculated from the pre-labeled information for the classification vectors and the position vectors, and the final total loss is Loss = L_cls + L_rgr · γ, where γ is an empirical parameter that balances the orders of magnitude of the two partial losses and takes the value 10. The classification loss can use, for example, the Focal Loss algorithm, and the regression loss can use, for example, the DIoU Loss algorithm.
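A one-line sketch of this combination (the helper name is illustrative; the Focal Loss and DIoU Loss computations themselves are not reproduced here):

```python
def total_loss(l_cls: float, l_rgr: float, gamma: float = 10.0) -> float:
    """Total loss = L_cls + gamma * L_rgr, with gamma = 10 balancing
    the orders of magnitude of the two partial losses."""
    return l_cls + gamma * l_rgr
```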
Referring to fig. 4, a schematic structural diagram of a face detection apparatus according to an embodiment of the present invention is shown, in this embodiment, the apparatus includes:
an input unit 401, configured to input a color image to be detected, a depth image, and a preset anchor point frame into a pre-trained face detection model;
the face detection model is used for respectively extracting the features of the color image and the depth image, outputting a first classification vector and a first position vector based on the features of a preset anchor point frame and the color image, outputting a second classification vector and a second position vector based on the features of the preset anchor point frame and the depth image, fusing the first classification vector and the second classification vector, and fusing the first position vector and the second position vector to obtain a third classification vector and a third position vector; the third classification vector and the third position vector have a one-to-one corresponding relation, and the anchor point frame has a one-to-one corresponding relation with the characteristics of the color image and the characteristics of the depth image respectively;
a first determining unit 402, configured to compare a preset threshold with the third classification vector, screen out a third classification vector larger than the preset threshold, and determine a candidate face frame based on a mapping relationship between the third classification vector and the third position vector;
a second determining unit 403, configured to determine a target face frame representing the detection result based on the candidate face frame.
Optionally, the method further includes:
the first normalization processing subunit is used for respectively carrying out normalization processing on each channel of the color image;
and the second normalization processing subunit is used for performing normalization processing on the depth image.
Optionally, the method further includes:
a face detection model training unit for:
obtaining a training sample; the training sample comprises a pre-registered color image and a depth image;
inputting the training sample into a human face detection model to be trained;
respectively extracting features from the color image and the depth image through the face detection model, analyzing the features extracted from the color image to obtain a fourth classification vector and a fourth position vector of the color image, analyzing the features extracted from the depth image to obtain a fifth classification vector and a fifth position vector of the depth image, fusing the fourth classification vector and the fifth classification vector by adopting preset classification parameters, fusing the fourth position vector and the fifth position vector by adopting preset regression parameters, and updating the classification parameters and the regression parameters respectively through preset classification losses and regression losses.
Optionally, the face detection model training unit further includes:
the first feature extraction unit is used for extracting features of different scales from the color image and fusing the features of different scales;
the second feature extraction unit is used for extracting features of different scales from the depth image and fusing the features of different scales;
the scale is consistent with the step length of the preset anchor point frame.
Optionally, the method further includes:
a third determining unit, configured to determine a step size of the anchor point frame, a scaling ratio of each layer of anchor point frame, and a length-width ratio of the anchor point frame;
the first traversal unit is used for traversing each anchor point step length along the transverse direction and the longitudinal direction of the reference image and determining the central coordinate of each anchor point frame;
the second traversal unit is used for traversing each scaling corresponding to each anchor point step length along the transverse direction and the longitudinal direction of the reference image;
and the third traversing unit is used for traversing the length-width ratio of each anchor point frame corresponding to each scaling along the transverse direction and the longitudinal direction of the reference image.
With the device of this embodiment, the features of the color image to be detected and the depth image to be detected are extracted separately by the face detection model, and the features of each image are analyzed to obtain a classification vector and a position vector of the color image and a classification vector and a position vector of the depth image. The two classification vectors are fused, the two position vectors are fused, and a final classification vector and a final position vector are output; candidate face frames are determined through the relationship between the classification vector and a threshold value, and a target face frame representing the detection result is determined based on the candidate face frames. In this way, the features of the color image and the depth image are fully utilized for face detection, and the respective detection results are fused, which improves both the data processing efficiency and the accuracy of face detection.
Referring to fig. 5, a schematic structural diagram of an electronic device disclosed in an embodiment of the present invention is shown, where the electronic device includes:
a memory 501 and a processor 502;
the memory is used for storing programs, and the processor is used for executing the following face detection method when executing the programs in the memory:
inputting a color image to be detected, a depth image to be detected and a preset anchor point frame into a pre-trained face detection model;
the face detection model is used for respectively extracting the features of the color image and the depth image, outputting a first classification vector and a first position vector based on the features of a preset anchor point frame and the color image, outputting a second classification vector and a second position vector based on the features of the preset anchor point frame and the depth image, fusing the first classification vector and the second classification vector, and fusing the first position vector and the second position vector to obtain a third classification vector and a third position vector; the third classification vector and the third position vector have a one-to-one corresponding relation, and the anchor point frame has a one-to-one corresponding relation with the characteristics of the color image and the characteristics of the depth image respectively;
comparing a preset threshold value with the third classification vector, screening out a third classification vector larger than the preset threshold value, and determining a candidate face frame based on the mapping relation between the third classification vector and the third position vector;
and determining a target face frame representing the detection result based on the candidate face frame.
Optionally, the color image and the depth image to be detected are obtained by preprocessing an original color image and an original depth image, and the preprocessing process includes:
respectively carrying out normalization processing on each channel of the color image;
and carrying out normalization processing on the depth image.
Optionally, the training process of the face detection model includes:
obtaining a training sample; the training sample comprises a pre-registered color image and a depth image;
inputting the training sample into a human face detection model to be trained;
respectively extracting features from the color image and the depth image through a face detection model, analyzing the features extracted from the color image to obtain a fourth classification vector and a fourth position vector of the color image, analyzing the features extracted from the depth image to obtain a fifth classification vector and a fifth position vector of the depth image, fusing the fourth classification vector and the fifth classification vector by adopting preset classification parameters, fusing the fourth position vector and the fifth position vector by adopting preset regression parameters, and updating the classification parameters and the regression parameters respectively through preset classification losses and regression losses.
Optionally, extracting features from the color image and the depth image respectively includes:
extracting features of different scales from the color image, and fusing the features of different scales;
extracting features of different scales from the depth image, and fusing the features of different scales;
the scale is consistent with the step length of the preset anchor point frame.
Optionally, the generating process of the preset anchor frame includes:
determining the step length of the anchor point frame, the scaling of each layer of anchor point frame and the length-width ratio of the anchor point frame;
traversing each anchor point step length along the transverse direction and the longitudinal direction of the reference image, and determining the center coordinate of each anchor point frame;
traversing each scaling corresponding to each anchor point step length along the transverse direction and the longitudinal direction of the reference image;
traversing the length-width ratio of each anchor point frame corresponding to each scaling along the transverse direction and the longitudinal direction of the reference image;
the reference image is the same in scale as the color image to be detected and the depth image to be detected.
It should be noted that, in the present specification, the embodiments are all described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A face detection method, comprising:
inputting a color image to be detected, a depth image to be detected and a preset anchor point frame into a pre-trained face detection model;
the face detection model is used for respectively extracting the features of a color image to be detected and a depth image to be detected, outputting a first classification vector and a first position vector based on the features of a preset anchor point frame and the color image to be detected, outputting a second classification vector and a second position vector based on the features of the preset anchor point frame and the depth image to be detected, fusing the first classification vector and the second classification vector, and fusing the first position vector and the second position vector to obtain a third classification vector and a third position vector; the third classification vector and the third position vector have a one-to-one corresponding relation, and the anchor point frame has a one-to-one corresponding relation with the characteristics of the color image and the characteristics of the depth image respectively;
comparing a preset threshold value with the third classification vector, screening out a third classification vector larger than the preset threshold value, and determining a candidate face frame based on the mapping relation between the third classification vector and the third position vector;
and determining a target face frame representing the detection result based on the candidate face frame.
2. The method according to claim 1, wherein the color image and the depth image to be detected are obtained by preprocessing an original color image and an original depth image, and the preprocessing comprises:
respectively carrying out normalization processing on each channel of the color image;
and carrying out normalization processing on the depth image.
3. The method of claim 1, wherein the training process of the face detection model comprises:
obtaining a training sample; the training sample comprises a pre-registered color image and a depth image;
inputting the training sample into a human face detection model to be trained;
respectively extracting features from the color image and the depth image through a face detection model, analyzing the features extracted from the color image to obtain a fourth classification vector and a fourth position vector of the color image, analyzing the features extracted from the depth image to obtain a fifth classification vector and a fifth position vector of the depth image, fusing the fourth classification vector and the fifth classification vector by adopting preset classification parameters, fusing the fourth position vector and the fifth position vector by adopting preset regression parameters, and updating the classification parameters and the regression parameters respectively through preset classification losses and regression losses.
4. The method of claim 3, wherein extracting features from the color image and the depth image, respectively, comprises:
extracting features of different scales from the color image, and fusing the features of different scales;
extracting features of different scales from the depth image, and fusing the features of different scales;
the scale is consistent with the step length of the preset anchor point frame.
5. The method of claim 1, wherein the generating of the pre-defined anchor block comprises:
determining the step length of the anchor point frame, the scaling of each layer of anchor point frame and the length-width ratio of the anchor point frame;
traversing each anchor point step length along the transverse direction and the longitudinal direction of the reference image, and determining the center coordinate of each anchor point frame;
traversing each scaling corresponding to each anchor point step length along the transverse direction and the longitudinal direction of the reference image;
traversing the length-width ratio of each anchor point frame corresponding to each scaling along the transverse direction and the longitudinal direction of the reference image;
the reference image is the same in scale as the color image to be detected and the depth image to be detected.
6. A face detection apparatus, comprising:
the input unit is used for inputting the color image to be detected, the depth image and the preset anchor point frame into a pre-trained face detection model;
the face detection model is used for respectively extracting the features of the color image and the depth image, outputting a first classification vector and a first position vector based on the features of a preset anchor point frame and the color image, outputting a second classification vector and a second position vector based on the features of the preset anchor point frame and the depth image, fusing the first classification vector and the second classification vector, and fusing the first position vector and the second position vector to obtain a third classification vector and a third position vector; the third classification vector and the third position vector have a one-to-one corresponding relation, and the anchor point frame has a one-to-one corresponding relation with the characteristics of the color image and the characteristics of the depth image respectively;
the first determining unit is used for comparing a preset threshold value with the third classification vector, screening out the third classification vector larger than the preset threshold value, and determining a candidate face frame based on the mapping relation between the third classification vector and the third position vector;
and the second determining unit is used for determining a target face frame representing the detection result based on the candidate face frame.
7. The apparatus of claim 6, further comprising:
the first normalization processing subunit is used for respectively carrying out normalization processing on each channel of the color image;
and the second normalization processing subunit is used for performing normalization processing on the depth image.
8. The apparatus of claim 6, further comprising:
a face detection model training unit for:
obtaining a training sample; the training sample comprises a pre-registered color image and a depth image;
inputting the training sample into a human face detection model to be trained;
the face detection model is used for respectively extracting features from the color image and the depth image, analyzing the features extracted from the color image to obtain a fourth classification vector and a fourth position vector of the color image, analyzing the features extracted from the depth image to obtain a fifth classification vector and a fifth position vector of the depth image, fusing the fourth classification vector and the fifth classification vector by adopting preset classification parameters, fusing the fourth position vector and the fifth position vector by adopting preset regression parameters, and updating the classification parameters and the regression parameters respectively through preset classification loss and regression loss.
9. The apparatus of claim 6, wherein the face detection model training unit further comprises:
the first feature extraction unit is used for extracting features of different scales from the color image and fusing the features of different scales corresponding to the color image;
the second feature extraction unit is used for extracting features of different scales from the depth image and fusing the features of different scales corresponding to the depth image;
the scale is consistent with the step length of the preset anchor point frame.
10. An electronic device, comprising:
a memory and a processor;
the memory is used for storing programs, and the processor is used for executing the face detection method of any one of claims 1 to 5 when executing the programs in the memory.
CN202111191551.7A 2021-10-13 2021-10-13 Face detection method and device and electronic equipment Pending CN113869271A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111191551.7A CN113869271A (en) 2021-10-13 2021-10-13 Face detection method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111191551.7A CN113869271A (en) 2021-10-13 2021-10-13 Face detection method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN113869271A (en) 2021-12-31

Family

ID=78998911

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111191551.7A Pending CN113869271A (en) 2021-10-13 2021-10-13 Face detection method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN113869271A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109034102A (en) * 2018-08-14 2018-12-18 腾讯科技(深圳)有限公司 Human face in-vivo detection method, device, equipment and storage medium
CN109766856A (en) * 2019-01-16 2019-05-17 华南农业大学 A kind of method of double fluid RGB-D Faster R-CNN identification milking sow posture
WO2020181685A1 (en) * 2019-03-12 2020-09-17 南京邮电大学 Vehicle-mounted video target detection method based on deep learning
CN111242097A (en) * 2020-02-27 2020-06-05 腾讯科技(深圳)有限公司 Face recognition method and device, computer readable medium and electronic equipment
CN111611934A (en) * 2020-05-22 2020-09-01 北京华捷艾米科技有限公司 Face detection model generation and face detection method, device and equipment
CN112465880A (en) * 2020-11-26 2021-03-09 西安电子科技大学 Target detection method based on multi-source heterogeneous data cognitive fusion

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
高伟; 张菱珂; 王嶺; 苗鹏: "Construction of a next-generation virtual studio based on an RGB-D depth camera" (基于RGB-D深度相机的下一代虚拟演播室构建), 工业控制计算机 (Industrial Control Computer), no. 01, 25 January 2018 (2018-01-25) *

Similar Documents

Publication Publication Date Title
CN108537743B (en) Face image enhancement method based on generation countermeasure network
CN109118479B (en) Capsule network-based insulator defect identification and positioning device and method
CN106683048B (en) Image super-resolution method and device
EP0363828B1 (en) Method and apparatus for adaptive learning type general purpose image measurement and recognition
CN109903331B (en) Convolutional neural network target detection method based on RGB-D camera
US20190147283A1 (en) Deep convolutional neural networks for crack detection from image data
CN106228528B (en) A kind of multi-focus image fusing method based on decision diagram and rarefaction representation
CN112818764B (en) Low-resolution image facial expression recognition method based on feature reconstruction model
CN109711268B (en) Face image screening method and device
CN112215157B (en) Multi-model fusion-based face feature dimension reduction extraction method
CN111931686B (en) Video satellite target tracking method based on background knowledge enhancement
CN111583148A (en) Rock core image reconstruction method based on generation countermeasure network
CN113065431B (en) Human body violation prediction method based on hidden Markov model and recurrent neural network
CN112633221A (en) Face direction detection method and related device
CN112861785A (en) Shielded pedestrian re-identification method based on example segmentation and image restoration
CN115239672A (en) Defect detection method and device, equipment and storage medium
CN112766419A (en) Image quality evaluation method and device based on multitask learning
CN113421223B (en) Industrial product surface defect detection method based on deep learning and Gaussian mixture
CN111985488B (en) Target detection segmentation method and system based on offline Gaussian model
CN116503398B (en) Insulator pollution flashover detection method and device, electronic equipment and storage medium
CN117789081A (en) Dual-attention mechanism small object identification method based on self-information
CN117437691A (en) Real-time multi-person abnormal behavior identification method and system based on lightweight network
CN113869271A (en) Face detection method and device and electronic equipment
CN111723688A (en) Human body action recognition result evaluation method and device and electronic equipment
CN108256578B (en) Gray level image identification method, device, equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination