CN112560584A - Face detection method and device, storage medium and terminal - Google Patents

Face detection method and device, storage medium and terminal

Info

Publication number
CN112560584A
Authority
CN
China
Prior art keywords
face
detected
key point
network model
face image
Prior art date: 2020-11-27
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011360224.5A
Other languages
Chinese (zh)
Inventor
郭峰
单增光
叶云
黄冠
都大龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Xinyi Intelligent Technology Co., Ltd.
Original Assignee
Beijing Xinyi Intelligent Information Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.): 2020-11-27
Filing date: 2020-11-27
Publication date: 2021-03-26
Application filed by Beijing Xinyi Intelligent Information Technology Co., Ltd.
Priority to CN202011360224.5A
Publication of CN112560584A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G06V40/165 Detection; Localisation; Normalisation using facial parts and geometric relationships
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G06V40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Human Computer Interaction (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Geometry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

A face detection method and device, a storage medium and a terminal are provided. The method comprises the following steps: acquiring training data, wherein the training data comprises a plurality of sample face images and face key point data of each sample face image, and the face key point data comprises the positions and the shielding states of all face key points; training a single detection network model by using the training data to obtain a trained detection network model; and detecting the face image to be detected by adopting the trained detection network model so as to determine the positions and the shielding states of the face key points in the face image to be detected. With the scheme of the invention, a single detection network model can detect both the positions of the face key points and whether each face key point is shielded.

Description

Face detection method and device, storage medium and terminal
Technical Field
The invention relates to the field of computer vision, in particular to a face detection method and device, a storage medium and a terminal.
Background
At present, face detection is widely applied in many scenes such as financial payment, access control and attendance, and e-commerce, bringing great convenience to people's lives. Face detection mainly comprises locating and analyzing the faces in an image: specifically, after the face region and the face key points in an image are detected, face image analysis tasks such as face recognition and facial expression recognition are performed. The detection of the face key points (e.g., left eye, right eye, nose tip, left mouth corner and right mouth corner) is a crucial link in face detection, and the face key points in an image need to be detected accurately so that the face in the image can be analyzed accurately.
In practical applications, face key points are often blocked (for example, by objects such as sunglasses, masks and hats), but the prior art does not determine whether a face key point is blocked when locating it.
Therefore, a face detection method capable of determining the occlusion state of key points of a face in an image is needed.
Disclosure of Invention
The technical problem solved by the invention is how to determine the shielding state of the key points of the human face in the image.
In order to solve the above technical problem, an embodiment of the present invention provides a face detection method, where the method includes: acquiring training data, wherein the training data comprises a plurality of sample face images and face key point data of each sample face image, and the face key point data comprises the positions and the shielding states of all face key points; training a single detection network model by using the training data to obtain a trained detection network model; and detecting the face image to be detected by adopting the trained detection network model so as to determine the position and the shielding state of the key point of the face in the face image to be detected.
Optionally, the training data further includes a position of a face region of each sample face image, and the method further includes: and when the trained detection network model is adopted to detect the face image to be detected, determining the face area of the face image to be detected.
Optionally, each face key point has a corresponding anchor point, and the position of each face key point is determined in the following manner: for each anchor point, determining the position of the anchor point according to the positions of corresponding face key points in the plurality of sample face images; and calculating the offset of each face key point relative to the corresponding anchor point to obtain the position of the face key point.
Optionally, the detecting the face image to be detected by using the trained detection network model to determine the position and the shielding state of the face key point in the face image to be detected includes: determining key point data of a face to be detected by adopting the trained detection network model, wherein the key point data of the face to be detected comprises the offset of the face key point in the face image to be detected relative to the anchor point corresponding to the face key point; and superposing the offset of the face key point in the face image to be detected relative to the anchor point corresponding to the face key point on the position of the anchor point so as to determine the position of the face key point in the face image to be detected.
Optionally, the facial key point data to be detected further includes a regression value of the occlusion state of the facial key point in the facial image to be detected, and detecting the facial image to be detected by using the trained detection network model further includes: and comparing the occlusion state regression value with a preset threshold value, and if the occlusion state regression value is larger than the preset threshold value, determining that the key point of the face is not occluded.
Optionally, before comparing the occlusion state regression value with a preset threshold, the method further includes: determining the preset threshold value; determining the preset threshold comprises: the method comprises the following steps: setting an initial value of the preset threshold value and acquiring a plurality of verification face images; step two: for each verification face image, detecting whether the face key points in the verification face image are shielded by adopting the trained detection network model, and judging whether the detection result is accurate; step three: counting the detection accuracy of the detection network model; step four: and comparing the accuracy with a preset accuracy threshold, if the accuracy is smaller than the preset accuracy threshold, adjusting the preset threshold, and returning to the second step until the accuracy is not lower than the preset accuracy threshold.
Optionally, training a single detection network model by using the training data to obtain a trained detection network model includes: constructing a loss function, wherein the loss function is used for calculating regression loss values of face key points in each sample face image, and the regression loss values of the face key points comprise position regression loss values and occlusion state regression loss values; and training the detection network model according to the loss function and the training data until the position regression loss value and the shielding state regression loss value are both smaller than a preset loss value.
Optionally, the detection network model includes a backbone network, a feature pyramid network, and a prediction network, where the backbone network, the feature pyramid network, and the prediction network all include a preset number of levels, and detecting the face image to be detected using the trained detection network model includes: fusing the features of the face image to be detected extracted from the ith layer of the backbone network and the up-sampling features of the ith layer of the feature pyramid network on the ith layer of the feature pyramid network to obtain the fusion features of the ith layer of the feature pyramid network; detecting the position and the shielding state of a face key point in the face image to be detected according to the fusion features of the ith layer of the feature pyramid network on the ith layer of the prediction network; wherein 1 ≤ i ≤ N, i is a positive integer, and N is a preset positive integer.
In order to solve the above technical problem, an embodiment of the present invention further provides a face detection apparatus, where the apparatus includes: the system comprises an acquisition module, a storage module and a processing module, wherein the acquisition module is used for acquiring training data, the training data comprises a plurality of sample face images and face key point data of each sample face image, and the face key point data comprises the position and the shielding state of each face key point; the training module is used for training a single detection network model by adopting the training data to obtain a trained detection network model; and the detection module is used for detecting the face image to be detected by adopting the trained detection network model so as to determine the position and the shielding state of the key point of the face in the face image to be detected.
The embodiment of the present invention further provides a storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the steps of the above-mentioned face detection method.
The embodiment of the invention also provides a terminal, which comprises a memory and a processor, wherein the memory is stored with a computer program capable of running on the processor, and the processor executes the steps of the human face detection method when running the computer program.
Compared with the prior art, the technical scheme of the embodiment of the invention has the following beneficial effects: the embodiment of the invention provides a face detection method, which comprises the following steps: acquiring training data, wherein the training data comprises a plurality of sample face images and face key point data of each sample face image, and the face key point data comprises the positions and the shielding states of all face key points; training a single detection network model by using the training data to obtain a trained detection network model; and detecting the face image to be detected by adopting the trained detection network model so as to determine the position and the shielding state of the key point of the face in the face image to be detected. In the embodiment of the invention, the face key point data in the training data comprises the position and the shielding state of each face key point, and the training data is adopted to train a single detection network model, so that the single detection network model can simultaneously learn the position features and the shielding state features of the face key points. Therefore, when the trained detection network model is used to detect the face image to be detected, both the position and the shielding state of each face key point can be detected, so that face image analysis can be carried out subsequently.
Furthermore, in the embodiment of the invention, the corresponding anchor point is determined according to the positions of the face key points in the plurality of sample face images, so that the obtained positions of the anchor points can embody the characteristics of the positions of the face key points corresponding to the anchor points in the sample face images, the determined positions of the anchor points are more optimized, and the accuracy of positioning the face key points can be improved.
Furthermore, the training data in the embodiment of the present invention further includes the position of the face region of each sample face image, when the training data is used to train the detection network model, the detection network model can also learn the characteristics of the position of the face region, and when the trained detection network model is used to detect the face key points in the face image to be detected, the face region of the face image to be detected can be determined together, so that the positioning of the face region, the positioning of the face key points, and the judgment of the shielding state can be performed synchronously, and the detection speed is increased.
Drawings
Fig. 1 is a schematic flow chart of a face detection method according to an embodiment of the present invention.
Fig. 2 is a scene schematic diagram of a face detection method in an embodiment of the present invention.
Fig. 3 is a schematic structural diagram of a detection network model according to an embodiment of the present invention.
Fig. 4 is a schematic diagram illustrating an effect of a face detection method according to an embodiment of the present invention.
Fig. 5 is a schematic structural diagram of a face detection apparatus according to an embodiment of the present invention.
Detailed Description
As described above, a face detection method capable of determining the occlusion state of key points of a face in an image is needed.
The inventors have found through research that, in the prior art, face key point positioning and shielding-state detection are performed step by step and are completed by two separate network models. Specifically, a face key point positioning model first determines the positions of the face key points in the face image to be detected; a plurality of corresponding face region images are then cropped out according to those positions and input into a face shielding detection model to detect face shielding. This approach ignores the internal relation between the two tasks of face key point positioning and shielding-state judgment, which results in low overall detection efficiency; especially when processing large-scale face scenes (that is, when the face image to be detected contains a large number of faces), the detection efficiency is low and the performance bottleneck is obvious.
In order to solve the above technical problem, an embodiment of the present invention provides a face detection method, where the method includes: acquiring training data, wherein the training data comprises a plurality of sample face images and face key point data of each sample face image, and the face key point data comprises the positions and the shielding states of all face key points; training a single detection network model by using the training data to obtain a trained detection network model; and detecting the face image to be detected by adopting the trained detection network model so as to determine the position and the shielding state of the key point of the face in the face image to be detected. In the embodiment of the invention, the face key point data in the training data comprises the position and the shielding state of each face key point, and the training data is adopted to train a single detection network model, so that the single detection network model can simultaneously learn the position features and the shielding state features of the face key points. Therefore, when the trained detection network model is used to detect the face image to be detected, both the position and the shielding state of each face key point can be detected, so that face image analysis can be carried out subsequently.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
Referring to fig. 1, fig. 1 is a schematic flow chart of a face detection method according to an embodiment of the present invention. The face detection method may be executed by a terminal, and the terminal may be any appropriate terminal, such as a mobile phone, a computer, an internet of things device, and the like, but is not limited thereto. The method may be used to detect the positions and the shielding states of the face key points in a face image to be detected; for example, the terminal may detect the positions of the face key points (e.g., left eye, right eye, nose tip, left mouth corner and right mouth corner) in the face image to be detected and determine whether each face key point is shielded, but is not limited thereto. The face image to be detected may be an image acquired by the terminal in real time, an image pre-stored in the terminal, or an image received by the terminal from the outside, but is not limited thereto.
It should be noted that the face key points may be predetermined specific parts of a face; specifically, the face key points to be detected may be the left eye and the right eye, or may be the left eye, right eye, nose tip, left mouth corner and right mouth corner.
With continuing reference to fig. 1, the face detection method shown in fig. 1 may specifically include the following steps:
step S101: acquiring training data, wherein the training data comprises a plurality of sample face images and face key point data of each sample face image, and the face key point data comprises the positions and the shielding states of all face key points;
step S102: training a single detection network model by using the training data to obtain a trained detection network model;
step S103: and detecting the face image to be detected by adopting the trained detection network model so as to determine the position and the shielding state of the key point of the face in the face image to be detected.
In the specific implementation of step S101, when the terminal acquires the training data, it may first acquire a plurality of sample face images, and then determine face key point data in each sample face image to obtain the training data, where the face key point data includes the position and the shielding state of each face key point.
Specifically, the terminal may acquire a plurality of sample face images from the outside, or may select at least a part of the sample face images from a training data set stored locally as the sample face images. The sample face image can be provided with an identification graph, and the identification graph is used for indicating the positions of all face key points in the sample face image. The identification graph may also be used to indicate an occlusion state of each face key point in the sample face image, for example, if the identification graph is rectangular, the face key point is not occluded, and if the identification graph is circular, the face key point is occluded, that is, the occlusion state of the face key point may be represented by different shapes of the identification graph, and different occlusion states of the key points may also be distinguished according to different colors of the identification graph, but the present invention is not limited thereto.
It should be noted that the identification pattern may be pre-marked on the sample face image, or may be obtained by operating the sample face image after the terminal acquires the sample face image, for example, after the terminal acquires the sample face image, the identification pattern may be manually marked on the sample face image, or the identification pattern may be automatically marked in the sample face image by the terminal.
Further, the terminal can determine the position and the shielding state of each face key point by searching and identifying the identification graph in each sample face image, so as to obtain the face key point data of each sample face image. For example: after the terminal acquires a plurality of sample face images, the positions of the face key points in the sample face images are obtained by searching the identification graphs in the sample face images, and the shielding state of the face key points can be determined according to the shapes of the identification graphs.
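For illustration only, the per-image annotation described above could be organized as in the following minimal Python sketch; the class, the field names and the rectangle/circle convention mirror the example in the preceding paragraphs but are not mandated by the method.

    from dataclasses import dataclass
    from typing import Tuple

    @dataclass
    class KeypointAnnotation:
        name: str                       # e.g. "left_eye", "nose_tip"
        position: Tuple[float, float]   # (x, y) pixel coordinates of the identification graph
        marker_shape: str               # "rectangle" or "circle", per the example above

    def occlusion_state(ann: KeypointAnnotation) -> int:
        # Rectangle marker -> key point not shielded (state 1);
        # circle marker -> key point shielded (state 0).
        return 1 if ann.marker_shape == "rectangle" else 0

    # Face key point data of one sample face image:
    sample = [
        KeypointAnnotation("left_eye", (112.0, 98.0), "rectangle"),
        KeypointAnnotation("nose_tip", (130.0, 150.0), "circle"),
    ]
    print([(a.name, occlusion_state(a)) for a in sample])
    # [('left_eye', 1), ('nose_tip', 0)]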
It should be noted that each face key point has a corresponding anchor point, for example, the face key points to be detected are a left eye, a right eye, a nose tip, a left mouth corner and a right mouth corner, the left eyes in the multiple sample face images all correspond to the first anchor point, the right eyes in the multiple sample face images all correspond to the second anchor point, the nose tips in the multiple sample face images all correspond to the third anchor point, the left mouth corners in the multiple sample face images all correspond to the fourth anchor point, and the right mouth corners in the multiple sample face images all correspond to the fifth anchor point, but not limited thereto.
Further, for each anchor point, the position of the anchor point can be determined according to the positions of the face key points corresponding to the anchor point in the plurality of sample face images. Specifically, the positions of the key points corresponding to the anchor points in the plurality of sample face images may be counted, and the result obtained by the counting may be used as the position of the anchor point. For example, according to the positions of the left eyes in the multiple sample face images, the coordinate values of the left eyes in the multiple sample face images are averaged to obtain the position of the first anchor point corresponding to the left eye. Compared with a method for directly setting the anchor point position, the method for determining the anchor point position by adopting the statistical method can enable the anchor point position to embody the characteristics of the corresponding human face key point position, so that the detection network model can learn the characteristics of the human face key point position more easily, and the detection result is more accurate.
Further, after the positions of the anchor points are determined, the offset of each face key point in each sample face image relative to its corresponding anchor point can be calculated and added into the training data; that is, the offset of a face key point relative to its corresponding anchor point serves as the position of that face key point in the training data.
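A minimal sketch of this statistical anchor construction and offset computation, assuming the key point coordinates are stacked into a NumPy array (the array names and the use of the mean follow the averaging example above):

    import numpy as np

    # positions: shape (num_samples, num_keypoints, 2) - the (x, y) position of
    # each face key point in each sample face image.
    positions = np.array([
        [[112., 98.], [168., 97.], [140., 150.]],   # sample 1: left eye, right eye, nose tip
        [[110., 95.], [170., 99.], [138., 148.]],   # sample 2
    ])

    # One anchor per key point: the mean position over all sample face images.
    anchors = positions.mean(axis=0)                # shape (num_keypoints, 2)

    # Regression targets: offset of each key point relative to its anchor.
    offsets = positions - anchors                   # shape (num_samples, num_keypoints, 2)
    print(anchors)      # [[111.  96.5] [169.  98. ] [139. 149. ]]
    print(offsets[0])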
Furthermore, the terminal can also identify the shielding state of each face key point according to the identification graph in each sample face image, and different shielding state values can be set for the face key points in different shielding states. For example: if the face key point is identified to be not shielded, the shielding state value of the face key point can be determined to be 1, if the face key point is identified to be shielded, the shielding state value of the face key point can be determined to be 0, and the shielding state value of each face key point in each sample face image is added into the training data.
In one non-limiting embodiment of the invention, each face key point can be represented by a three-dimensional vector (x, y, z), and the training data comprises the three-dimensional vectors corresponding to the face key points. Therefore, when the training data is adopted to train a single detection network model, the single detection network model can simultaneously learn the position feature and the shielding state feature of the face key points.
Specifically, in the three-dimensional vector, x represents the component of the offset of the face key point relative to its corresponding anchor point along the first coordinate axis, y represents the component of the offset along the second coordinate axis, and z represents the shielding state value of the face key point. The value of z is 0 or 1: when z is 0, the face key point is occluded; when z is 1, the face key point is not occluded. However, this is not limited thereto.
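Continuing the sketch above, the offset components and the shielding state value can be packed into the per-keypoint (x, y, z) target vector; the array names are illustrative only.

    import numpy as np

    # offsets: (num_samples, num_keypoints, 2), as computed in the previous sketch;
    # occlusion: (num_samples, num_keypoints), 1 = not occluded, 0 = occluded.
    offsets = np.zeros((2, 3, 2))
    occlusion = np.array([[1, 1, 0],
                          [1, 0, 1]], dtype=float)

    # Per-keypoint target vector (x, y, z): x and y are the offset components
    # along the two coordinate axes, z is the shielding state value.
    targets = np.concatenate([offsets, occlusion[..., None]], axis=-1)
    print(targets.shape)   # (2, 3, 3) - one (x, y, z) vector per face key point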
Further, the training data may further include the position of the face region in each sample face image. Specifically, the sample face image may further include a specific identification pattern for indicating the face region. The terminal can determine the position of the face region in the sample face image by searching and recognizing a specific identification pattern for indicating the face region.
In a specific implementation of step S102, a single detection network model may be trained using the training data to obtain a trained detection network model. That is, the detection network model for detecting the position of the key point of the face in the image to be detected and the detection network model for detecting the shielding state of the key point of the face are the same detection network model. Because the face key point data in the training data comprises the position and the shielding state of each face key point, when the training data is adopted to train the detection network model, the detection network model can simultaneously learn the position characteristics and the shielding state characteristics of the face key points, so that the position of the face key points and the shielding state of the face key points can be detected when the trained detection network model is adopted to detect the face image to be detected.
Further, when the training data includes the position of the face region in the sample face image, and the training data is adopted for training the detection network model, the detection network model can also learn the position characteristics of the face region, and therefore the detection network model after training can also be adopted for detecting the face region in the face image to be detected.
In particular, the detection network model may comprise one or more detection network elements. The detection network model can comprise a first detection network unit, the first detection network unit is used for detecting the position and the shielding state of key points of the face in the face image to be detected, the detection network model can also comprise a second detection network unit, and the second detection network unit can be used for detecting the face area in the image to be detected.
Further, a loss function may be constructed, and the detection network model may be trained using the training data and the loss function. The loss function is used for calculating the regression loss value of the face key points in each sample face image, where the regression loss value of a face key point comprises a position regression loss value and an occlusion state regression loss value: the position regression loss value may be the difference between the offset of the face key point relative to its corresponding anchor point as calculated by the detection network model and the actual offset in the training data, and the occlusion state regression loss value may be the difference between the occlusion state value of the face key point as calculated by the detection network model and the actual occlusion state value in the training data. The detection network model is then trained according to the loss function and the training data; that is, after the regression loss value of the face key points is calculated, the parameters of the detection network model are updated by stochastic gradient descent and back-propagation of the error, and the regression loss value is calculated again, until both the position regression loss value and the occlusion state regression loss value are smaller than a preset loss value. In other words, when both the position regression loss value and the occlusion state regression loss value are smaller than the preset loss value, the trained detection network model is obtained. The preset loss value can be preset manually or determined by the terminal through calculation.
As a non-limiting example, the loss function may be a smooth L1 loss function, a binary softmax loss function, or any other appropriate loss function, which is not limited herein.
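A hedged PyTorch sketch of such a joint loss: smooth L1 for the position offsets, and binary cross-entropy on a logit as a stand-in for the two-class softmax over occluded/not-occluded (the loss weighting, the reductions and the tensor shapes are assumptions, not specified by the patent):

    import torch
    import torch.nn as nn

    class KeypointLoss(nn.Module):
        """Joint regression loss: position offsets plus occlusion state."""
        def __init__(self, occlusion_weight: float = 1.0):
            super().__init__()
            self.pos_loss = nn.SmoothL1Loss()
            # BCE on a sigmoid logit is equivalent in effect to a two-class
            # softmax over occluded / not occluded.
            self.occ_loss = nn.BCEWithLogitsLoss()
            self.occlusion_weight = occlusion_weight

        def forward(self, pred_offsets, pred_occ_logits, gt_offsets, gt_occ):
            position_loss = self.pos_loss(pred_offsets, gt_offsets)
            occlusion_loss = self.occ_loss(pred_occ_logits, gt_occ)
            total = position_loss + self.occlusion_weight * occlusion_loss
            return total, position_loss, occlusion_loss

    # Usage: repeat forward, loss and backward steps, updating parameters by
    # stochastic gradient descent until both partial loss values fall below
    # the preset loss value, as described above.
    criterion = KeypointLoss()
    pred_off = torch.randn(4, 5, 2, requires_grad=True)    # batch of 4, 5 key points
    pred_occ = torch.randn(4, 5, requires_grad=True)
    gt_off = torch.randn(4, 5, 2)
    gt_occ = torch.randint(0, 2, (4, 5)).float()
    total, pos_l, occ_l = criterion(pred_off, pred_occ, gt_off, gt_occ)
    total.backward()    # back-propagate the error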
Further, a first loss function may be constructed, and the first detection network unit is trained by using the face key point data in the training data and the first loss function, that is, the same loss function is used to train the part of the detection network model for detecting the position of the face key point and the shielding state of the face key point. A second loss function may also be constructed, and the second detection network element is trained using the position of the face region in the training data and the second loss function.
In the specific implementation of step S103, the face image to be detected is input into the trained detection network model to obtain the face key point data to be detected. The face key point data to be detected may include the offset of each face key point in the face image to be detected relative to its corresponding anchor point; this offset is superimposed on the position of the anchor point to determine the position of the face key point in the face image to be detected.
Referring to fig. 2, fig. 2 is a scene schematic diagram of a face detection method in an embodiment of the present invention.
Specifically, after the face image 21 to be detected is input into the trained detection network model, the position of the anchor point 22 is determined on the face image 21 to be detected. It should be noted that, when the trained detection network model is used to detect the face image to be detected, the positions of the anchor points 22 corresponding to the key points of each face are the same as the positions of the anchor points determined when the training data is obtained. In particular, the location of the anchor point 22 may be determined by the detection network model when the training data is acquired; the locations of the anchor points 22 may also be determined by other network models when the training data is acquired, and the locations of the anchor points 22 are given to the detection network model after the training data is acquired.
Further, the trained detection network model calculates the offset 23 of the face key point 24 in the face image 21 to be detected relative to its corresponding anchor point, and superimposes the offset 23 on the position of the anchor point 22, so that the position of the face key point 24 in the face image 21 to be detected is determined.
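A minimal sketch of this decoding step, together with the shielding-state decision described below (the sigmoid and the 0.5 threshold anticipate the preferred threshold mentioned later; all array values are made up for illustration):

    import numpy as np

    def sigmoid(v):
        return 1.0 / (1.0 + np.exp(-v))

    # Anchor positions determined during training (see the anchor sketch above).
    anchors = np.array([[111.0, 96.5], [169.0, 98.0], [139.0, 149.0]])
    # Model outputs for the face image to be detected: per-keypoint (x, y)
    # offset and an occlusion-state logit z (names and the use of a sigmoid
    # are assumptions, not specified by the patent).
    pred_offsets = np.array([[1.2, -0.5], [-0.8, 0.3], [0.1, 1.0]])
    pred_z = np.array([2.1, 0.4, -1.7])

    keypoints = anchors + pred_offsets      # superimpose the offset on the anchor
    occluded = sigmoid(pred_z) <= 0.5       # regression value <= threshold -> occluded
    print(keypoints)
    print(occluded)                         # [False False  True]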
With continued reference to fig. 1, the trained detection network model may be used to determine the occlusion state of the face key point while obtaining the position of the face key point. Specifically, after the face image to be detected is input into the trained detection network model, the trained detection network model can also calculate an occlusion state regression value of each face key point, and the occlusion state regression value is used for indicating the occlusion state of each face key point in the face image to be detected.
Further, the occlusion state regression value of each face key point is compared with a preset threshold value, if the occlusion state regression value is greater than the preset threshold value, the corresponding face key point is determined to be not occluded, and if the occlusion state regression value does not exceed the preset threshold value, the corresponding face key point is determined to be occluded.
It should be noted that the preset threshold may be received from the outside by the terminal, may be predetermined manually, or may be determined by calculation performed by the terminal, but is not limited thereto. Preferably, the preset threshold is 0.5, when the regression value of the occlusion state calculated by the trained detection network model is greater than 0.5, it is determined that the corresponding key point of the face is not occluded, and if the regression value of the occlusion state is not greater than 0.5, it is determined that the corresponding key point of the face is occluded.
In a non-limiting embodiment of the present invention, the preset threshold is determined by a terminal, that is, before comparing the occlusion state regression value with the preset threshold, the method may further include determining the preset threshold. The preset threshold may be determined by the detection network model.
Specifically, determining the preset threshold may include: the method comprises the following steps: setting an initial value of the preset threshold value and acquiring a plurality of verification face images, wherein the number of face key points in the verification face images is the same as the number of corresponding face key points in the sample face images; step two: for each verification face image, adopting the trained detection network model to sequentially detect whether each face key point is shielded or not, and judging whether the detection result of each face key point is accurate or not; step three: counting the detection accuracy of the detection network model; step four: and comparing the accuracy with a preset accuracy threshold, if the accuracy is smaller than the preset accuracy threshold, adjusting the preset threshold, and returning to the second step until the accuracy is not lower than the preset accuracy threshold. Wherein the preset accuracy threshold may be previously received by the terminal from the outside.
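A hedged sketch of the four-step threshold search: the patent specifies the loop structure, while the accuracy function, the step size and the upward search direction used here are assumptions.

    def calibrate_threshold(scores, labels, acc_target=0.9,
                            init_threshold=0.5, step=0.05):
        """scores: occlusion-state regression values on the verification face
        images; labels: ground truth, 1 = not occluded, 0 = occluded."""
        threshold = init_threshold                              # step 1
        while True:
            # Step 2: regression value > threshold -> predict "not occluded".
            preds = [1 if s > threshold else 0 for s in scores]
            # Step 3: detection accuracy of the model at this threshold.
            acc = sum(p == l for p, l in zip(preds, labels)) / len(labels)
            # Step 4: done once accuracy is not lower than the preset target
            # (guard against the threshold leaving the valid (0, 1) range).
            if acc >= acc_target or not (0.0 < threshold < 1.0):
                return threshold, acc
            threshold += step                                   # adjust and retry

    scores = [0.9, 0.8, 0.3, 0.6, 0.1]
    labels = [1, 1, 0, 0, 0]
    print(calibrate_threshold(scores, labels, acc_target=0.8))  # (0.5, 0.8)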
Further, when the training data further includes the position of the face region of each sample face image, the trained detection network model is adopted to detect the face image to be detected, and the face region in the face image to be detected can also be determined together.
Specifically, when the detection network model is trained by using the training data, the second detection network unit in the detection network model may be trained by using the positions of the face regions of the sample face images in the training data. After the face image to be detected is input into the trained detection network model, the second detection network unit can be adopted to determine the face region in the face image to be detected.
Referring to fig. 3, fig. 3 is a schematic structural diagram of a detection network model according to an embodiment of the present invention, where the detection network model 31 shown in fig. 3 includes a backbone network 32, a feature pyramid network 33, and a prediction network 34, and the backbone network 32, the feature pyramid network 33, and the prediction network 34 all include a preset number of levels. The backbone network 32 and the feature pyramid network 33 are used for extracting features in the face image to be detected, and the prediction network 34 detects the positions and the shielding states of key points of the face in the face image to be detected based on the features extracted by the backbone network 32 and the feature pyramid network 33.
Further, when the trained detection network model 31 is used for detecting a face image to be detected, the features of the face image to be detected extracted from the ith layer of the backbone network 32 and the up-sampling features of the ith layer of the feature pyramid network 33 are fused on the ith layer of the feature pyramid network to obtain the fusion features of the ith layer of the feature pyramid network 33; the position and the shielding state of the face key points in the face image to be detected are detected on the ith layer of the prediction network 34 according to the fusion features of the ith layer of the feature pyramid network 33; wherein 1 ≤ i ≤ N, i is a positive integer, and N is a preset positive integer.
Specifically, different levels correspond to different resolutions, with higher levels (i.e., larger i) having lower resolution but richer semantic information, and lower levels (i.e., smaller i) having higher resolution but less semantic information.
In the embodiment of the invention, the backbone network 32 is adopted to extract the features of the face image to be detected at different levels, specifically, the backbone network 32 performs downsampling on the face image to be detected according to a path from bottom to top (the resolution of the face image to be detected is decreased progressively), and extracts the features of the face image to be detected at different resolutions. The feature pyramid network 33 may perform upsampling according to a top-down path (resolution of the face image to be detected is increased) to obtain an upsampled feature of each level. At each level of the feature pyramid network 33, the features extracted from the corresponding level of the backbone network 32 may be combined with the upsampled features of the level in the feature pyramid network 33 to obtain the fused features of the level, where the upsampled features of the level in the feature pyramid network 33 are obtained by sampling the fused features of the previous level. Therefore, each level of the feature pyramid network 33 can combine features with low resolution and strong semantics with features with high resolution and weak semantics, so that each level of the feature pyramid network 33 has features with high resolution and strong semantics, and the accuracy of the detection result is improved.
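A minimal PyTorch sketch of the lateral fusion at one level of the feature pyramid network, where the up-sampled input at level i comes from the fused feature of the level above, as described in the preceding paragraph (the 1x1 lateral convolution, nearest-neighbour up-sampling and channel count follow the common FPN recipe and are assumptions here):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FPNLevelFusion(nn.Module):
        """Fuses the backbone feature at level i with the up-sampled fused
        feature coming down from level i+1."""
        def __init__(self, backbone_channels: int, fpn_channels: int = 256):
            super().__init__()
            self.lateral = nn.Conv2d(backbone_channels, fpn_channels, kernel_size=1)

        def forward(self, backbone_feat_i, fused_feat_above):
            lateral = self.lateral(backbone_feat_i)      # high resolution, weak semantics
            top_down = F.interpolate(fused_feat_above,   # low resolution, strong semantics
                                     size=lateral.shape[-2:], mode="nearest")
            return lateral + top_down                    # fused feature of level i

    fuse = FPNLevelFusion(backbone_channels=512)
    c_i = torch.randn(1, 512, 40, 40)       # backbone feature at level i
    p_above = torch.randn(1, 256, 20, 20)   # fused feature from level i+1
    print(fuse(c_i, p_above).shape)         # torch.Size([1, 256, 40, 40])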
Further, the positions and the shielding states of the face key points in the face image to be detected are detected in each level in the prediction network 34 according to the fusion features of the corresponding levels in the feature pyramid network 33, and finally the positions and the shielding states of the face key points in the face image to be detected are determined by adopting a non-maximum suppression method according to the positions and the shielding states obtained in each level.
Further, the position of the face region in the face image to be detected may also be detected at each level in the prediction network 34 according to the fusion features of the corresponding level in the feature pyramid network 33, and finally the face region in the face image to be detected is determined by using a non-maximum suppression method according to the results obtained at all levels.
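Merging the per-level results with non-maximum suppression could, for instance, use torchvision's NMS operator; in this method the boxes would be the candidate face regions, with the key point positions and shielding states carried alongside (a sketch with made-up boxes and scores):

    import torch
    from torchvision.ops import nms

    # Candidate face boxes gathered from all prediction-network levels,
    # in (x1, y1, x2, y2) format, with their confidence scores.
    boxes = torch.tensor([[10., 10., 110., 110.],
                          [12., 11., 112., 108.],
                          [200., 50., 280., 140.]])
    scores = torch.tensor([0.92, 0.85, 0.77])

    keep = nms(boxes, scores, iou_threshold=0.5)  # indices of kept detections
    print(keep)   # tensor([0, 2]) - the overlapping duplicate is suppressed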
With reference to fig. 1, after the trained detection network model is used to determine the positions and the shielding states of the face key points in the face image to be detected, a first identification pattern may be used to mark each shielded face key point in the face image to be detected, a second identification pattern may be used to mark each non-shielded face key point, and, after the face region of the face image to be detected is determined, a third identification pattern may be used to mark the face region in the face image to be detected.
Referring to fig. 4, fig. 4 is a schematic diagram illustrating an effect of a face detection method according to an embodiment of the present invention. After the terminal detects the face image 41 to be detected by using the method shown in fig. 1, the positions and the shielding states of 5 key points (a left eye 42, a right eye 43, a nose tip 44, a left mouth corner 45 and a right mouth corner 46) in the face image 41 to be detected can be obtained and respectively marked in the face image to be detected, wherein the nose tip 44, the left mouth corner 45 and the right mouth corner 46 are shielded and are marked with the first identification graph, while the left eye 42 and the right eye 43 are not shielded and are marked with the second identification graph. In addition, the third identification graph is used to mark the face region 47 in the face image to be detected.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a face detection apparatus according to an embodiment of the present invention. The face detection device in the embodiment of the present invention may include an obtaining module 51, a training module 52, and a detection module 53.
The obtaining module 51 is configured to obtain training data, where the training data includes a plurality of sample face images and face key point data of each sample face image, and the face key point data includes positions and shielding states of each face key point; the training module 52 is configured to train a single detection network model by using the training data to obtain a trained detection network model; the detection module 53 is configured to detect a to-be-detected face image by using the trained detection network model, so as to determine the position and the shielding state of a face key point in the to-be-detected face image.
For more details of the working principle and the working mode of the face detection apparatus, reference may be made to the related descriptions in fig. 1 to fig. 4, which are not repeated herein.
The embodiment of the present invention further provides a storage medium, on which a computer program is stored, where the computer program, when executed by a processor, performs the steps of the above-mentioned face detection method. The storage medium may include a ROM, a RAM, a magnetic disk or an optical disk, etc. The storage medium may further include a non-volatile memory or a non-transitory memory, and the like.
The embodiment of the invention also discloses a terminal which can comprise a memory and a processor, wherein the memory is stored with a computer program which can run on the processor. The processor, when running the computer program, may perform the steps of the face detection method shown in fig. 1. The terminal includes, but is not limited to, a mobile phone, a computer, a tablet computer and other terminal devices.
It should be understood that the term "and/or" herein merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" in this document indicates that the former and latter associated objects are in an "or" relationship.
The "plurality" appearing in the embodiments of the present application means two or more.
The descriptions of the first, second, etc. appearing in the embodiments of the present application are only for illustrating and differentiating the objects, and do not represent the order or the particular limitation of the number of the devices in the embodiments of the present application, and do not constitute any limitation to the embodiments of the present application.
Although the present invention is disclosed above, the present invention is not limited thereto. Various changes and modifications may be effected therein by one skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (11)

1. A face detection method, comprising:
acquiring training data, wherein the training data comprises a plurality of sample face images and face key point data of each sample face image, and the face key point data comprises the positions and the shielding states of all face key points;
training a single detection network model by using the training data to obtain a trained detection network model;
and detecting the face image to be detected by adopting the trained detection network model so as to determine the position and the shielding state of the key point of the face in the face image to be detected.
2. The method of claim 1, wherein the training data further includes a location of a face region of each sample face image, the method further comprising:
and when the trained detection network model is adopted to detect the face image to be detected, determining the face area of the face image to be detected.
3. The face detection method of claim 1, wherein each face keypoint has a corresponding anchor point, and the position of each face keypoint is determined as follows:
for each anchor point, determining the position of the anchor point according to the positions of corresponding face key points in the plurality of sample face images;
and calculating the offset of each face key point relative to the corresponding anchor point to obtain the position of the face key point.
4. The face detection method according to claim 3, wherein detecting the face image to be detected by using the trained detection network model to determine the position and the shielding state of the face key point in the face image to be detected comprises:
determining key point data of a face to be detected by adopting the trained detection network model, wherein the key point data of the face to be detected comprises the offset of the key point of the face in the image of the face to be detected relative to the anchor point corresponding to the key point;
and superposing the offset of the face key point in the face image to be detected relative to the anchor point corresponding to the face key point on the position of the anchor point so as to determine the position of the face key point in the face image to be detected.
5. The face detection method according to claim 4, wherein the face key point data to be detected further includes an occlusion state regression value of a face key point in the face image to be detected, and the detecting the face image to be detected by using the trained detection network model further includes:
and comparing the occlusion state regression value with a preset threshold value, and if the occlusion state regression value is larger than the preset threshold value, determining that the key point of the face is not occluded.
6. The method according to claim 5, wherein before comparing the occlusion state regression value with a preset threshold, the method further comprises: determining the preset threshold value;
determining the preset threshold comprises:
the method comprises the following steps: setting an initial value of the preset threshold value and acquiring a plurality of verification face images;
step two: for each verification face image, detecting whether the face key points in the verification face image are shielded by adopting the trained detection network model, and judging whether the detection result is accurate;
step three: counting the detection accuracy of the detection network model;
step four: and comparing the accuracy with a preset accuracy threshold, if the accuracy is smaller than the preset accuracy threshold, adjusting the preset threshold, and returning to the second step until the accuracy is not lower than the preset accuracy threshold.
7. The method of claim 1, wherein training a single detection network model using the training data to obtain a trained detection network model comprises:
constructing a loss function, wherein the loss function is used for calculating regression loss values of face key points in each sample face image, and the regression loss values of the face key points comprise position regression loss values and occlusion state regression loss values;
and training the detection network model according to the loss function and the training data until the position regression loss value and the shielding state regression loss value are both smaller than a preset loss value.
8. The method of claim 1, wherein the detection network model comprises a backbone network, a feature pyramid network and a prediction network, the backbone network, the feature pyramid network and the prediction network each comprise a preset number of levels, and the detecting the image of the face to be detected by using the trained detection network model comprises:
fusing the features of the face image to be detected extracted from the ith layer of the backbone network and the up-sampling features of the ith layer of the feature pyramid network on the ith layer of the feature pyramid network to obtain the fusion features of the ith layer of the feature pyramid network;
detecting the position and the shielding state of a face key point in the face image to be detected according to the fusion feature of the ith layer of the feature pyramid network on the ith layer of the prediction network;
wherein 1 ≤ i ≤ N, i is a positive integer, and N is a preset positive integer.
9. An apparatus for face detection, the apparatus comprising:
the system comprises an acquisition module, a storage module and a processing module, wherein the acquisition module is used for acquiring training data, the training data comprises a plurality of sample face images and face key point data of each sample face image, and the face key point data comprises the position and the shielding state of each face key point;
the training module is used for training a single detection network model by adopting the training data to obtain a trained detection network model;
and the detection module is used for detecting the face image to be detected by adopting the trained detection network model so as to determine the position and the shielding state of the key point of the face in the face image to be detected.
10. A storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, performs the steps of the face detection method according to any one of claims 1 to 8.
11. A terminal comprising a memory and a processor, the memory having stored thereon a computer program operable on the processor, wherein the processor, when executing the computer program, performs the steps of the face detection method according to any of claims 1 to 8.
Application CN202011360224.5A, priority date 2020-11-27, filing date 2020-11-27: Face detection method and device, storage medium and terminal. Status: Pending. Published as CN112560584A.

Priority Applications (1)

Application Number: CN202011360224.5A; Priority Date: 2020-11-27; Filing Date: 2020-11-27; Title: Face detection method and device, storage medium and terminal

Applications Claiming Priority (1)

Application Number: CN202011360224.5A; Priority Date: 2020-11-27; Filing Date: 2020-11-27; Title: Face detection method and device, storage medium and terminal

Publications (1)

Publication Number: CN112560584A; Publication Date: 2021-03-26

Family ID: 75046414

Family Applications (1)

Application Number: CN202011360224.5A; Title: Face detection method and device, storage medium and terminal; Priority Date: 2020-11-27; Filing Date: 2020-11-27; Status: Pending

Country Status (1): CN (CN112560584A)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113065535A (en) * 2021-06-03 2021-07-02 北京的卢深视科技有限公司 Method for detecting key point and detecting network training, electronic equipment and storage medium
CN113366491A (en) * 2021-04-26 2021-09-07 华为技术有限公司 Eyeball tracking method, device and storage medium
CN114093012A (en) * 2022-01-18 2022-02-25 荣耀终端有限公司 Face shielding detection method and detection device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109063695A (en) * 2018-09-18 2018-12-21 图普科技(广州)有限公司 A kind of face critical point detection method, apparatus and its computer storage medium
CN109299658A (en) * 2018-08-21 2019-02-01 腾讯科技(深圳)有限公司 Face area detecting method, face image rendering method, device and storage medium
CN109960974A (en) * 2017-12-22 2019-07-02 北京市商汤科技开发有限公司 Face critical point detection method, apparatus, electronic equipment and storage medium
CN111027504A (en) * 2019-12-18 2020-04-17 上海眼控科技股份有限公司 Face key point detection method, device, equipment and storage medium
CN111860300A (en) * 2020-07-17 2020-10-30 广州视源电子科技股份有限公司 Key point detection method and device, terminal equipment and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109960974A (en) * 2017-12-22 2019-07-02 北京市商汤科技开发有限公司 Face critical point detection method, apparatus, electronic equipment and storage medium
CN109299658A (en) * 2018-08-21 2019-02-01 腾讯科技(深圳)有限公司 Face area detecting method, face image rendering method, device and storage medium
CN109063695A (en) * 2018-09-18 2018-12-21 图普科技(广州)有限公司 A kind of face critical point detection method, apparatus and its computer storage medium
CN111027504A (en) * 2019-12-18 2020-04-17 上海眼控科技股份有限公司 Face key point detection method, device, equipment and storage medium
CN111860300A (en) * 2020-07-17 2020-10-30 广州视源电子科技股份有限公司 Key point detection method and device, terminal equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
董瑞霞 (Dong Ruixia): "Research on face feature point localization combined with face detection" (结合人脸检测的人脸特征点定位方法研究), China Master's Theses Full-text Database, Information Science and Technology, 15 January 2018 (2018-01-15), pages 49-60 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113366491A (en) * 2021-04-26 2021-09-07 华为技术有限公司 Eyeball tracking method, device and storage medium
WO2022226747A1 (en) * 2021-04-26 2022-11-03 华为技术有限公司 Eyeball tracking method and apparatus and storage medium
CN113065535A (en) * 2021-06-03 2021-07-02 北京的卢深视科技有限公司 Method for detecting key point and detecting network training, electronic equipment and storage medium
CN113065535B (en) * 2021-06-03 2021-08-17 北京的卢深视科技有限公司 Method for detecting key point and detecting network training, electronic equipment and storage medium
CN114093012A (en) * 2022-01-18 2022-02-25 荣耀终端有限公司 Face shielding detection method and detection device
CN114093012B (en) * 2022-01-18 2022-06-10 荣耀终端有限公司 Face shielding detection method and detection device

Similar Documents

Publication Publication Date Title
EP3916627A1 (en) Living body detection method based on facial recognition, and electronic device and storage medium
CN107358149B (en) Human body posture detection method and device
CN109284733B (en) Shopping guide negative behavior monitoring method based on yolo and multitask convolutional neural network
CN101142584B (en) Method for facial features detection
CN112560584A (en) Face detection method and device, storage medium and terminal
JP7246104B2 (en) License plate identification method based on text line identification
CN112633144A (en) Face occlusion detection method, system, device and storage medium
CN109934847B (en) Method and device for estimating posture of weak texture three-dimensional object
US8019164B2 (en) Apparatus, method and program product for matching with a template
CN112381775A (en) Image tampering detection method, terminal device and storage medium
CN110378254B (en) Method and system for identifying vehicle damage image modification trace, electronic device and storage medium
CN109117746A (en) Hand detection method and machine readable storage medium
CN112836625A (en) Face living body detection method and device and electronic equipment
CN112200056A (en) Face living body detection method and device, electronic equipment and storage medium
CN111325107A (en) Detection model training method and device, electronic equipment and readable storage medium
CN113487610A (en) Herpes image recognition method and device, computer equipment and storage medium
CN110909685A (en) Posture estimation method, device, equipment and storage medium
CN111784658A (en) Quality analysis method and system for face image
CN112766028A (en) Face fuzzy processing method and device, electronic equipment and storage medium
CN110930384A (en) Crowd counting method, device, equipment and medium based on density information
US20170309040A1 (en) Method and device for positioning human eyes
CN113486715A (en) Image reproduction identification method, intelligent terminal and computer storage medium
CN112464827A (en) Mask wearing identification method, device, equipment and storage medium
CN110765898A (en) Method and device for determining object and key point thereof in image
CN115937991A (en) Human body tumbling identification method and device, computer equipment and storage medium

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
TA01: Transfer of patent application right
  Effective date of registration: 2021-08-19
  Address after: 200080 7th floor, No. 137 Haining Road, Hongkou District, Shanghai
  Applicant after: Shanghai Xinyi Intelligent Technology Co., Ltd.
  Address before: 100190 1008, 10th floor, building 51, 63 Zhichun Road, Haidian District, Beijing
  Applicant before: Beijing Xinyi Intelligent Information Technology Co., Ltd.