CN112560584A - Face detection method and device, storage medium and terminal - Google Patents

Face detection method and device, storage medium and terminal

Info

Publication number
CN112560584A
Authority
CN
China
Prior art keywords
face
detected
key point
network model
face image
Prior art date: 2020-11-27
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011360224.5A
Other languages
Chinese (zh)
Inventor
郭峰
单增光
叶云
黄冠
都大龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Xinyi Intelligent Technology Co., Ltd.
Original Assignee
Beijing Xinyi Intelligent Information Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.): 2020-11-27
Filing date: 2020-11-27
Publication date: 2021-03-26
Application filed by Beijing Xinyi Intelligent Information Technology Co., Ltd.
Priority to CN202011360224.5A
Publication of CN112560584A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G06V40/165 Detection; Localisation; Normalisation using facial parts and geometric relationships
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G06V40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Human Computer Interaction (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Geometry (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

A face detection method and device, a storage medium and a terminal are provided. The method comprises the following steps: acquiring training data, wherein the training data comprises a plurality of sample face images and face key point data of each sample face image, and the face key point data comprises the positions and the shielding states of all face key points; training a single detection network model by using the training data to obtain a trained detection network model; and detecting the face image to be detected by adopting the trained detection network model so as to determine the positions and the shielding states of the face key points in the face image to be detected. With the scheme of the invention, a single detection network model can detect both the positions of the face key points and whether each face key point is shielded.

Description

Face detection method and device, storage medium and terminal
Technical Field
The invention relates to the field of computer vision, in particular to a face detection method and device, a storage medium and a terminal.
Background
At present, face detection is widely applied in many scenes such as financial payment, access control and attendance, and e-commerce, bringing great convenience to people's lives. Face detection mainly comprises locating and analyzing the faces in an image: specifically, after the face region and the face key points in an image are detected, face image analysis tasks such as face recognition and facial expression recognition are performed. The detection of the face key points (e.g., left eye, right eye, nose tip, left mouth corner and right mouth corner) is a crucial link in face detection, and the face key points in an image need to be detected accurately so that the face in the image can be analyzed accurately.
In practical applications, face key points are often blocked (for example, by objects such as sunglasses, masks and hats), but the prior art does not determine whether a face key point is blocked when locating it.
Therefore, a face detection method capable of determining the occlusion state of key points of a face in an image is needed.
Disclosure of Invention
The technical problem solved by the invention is how to determine the shielding state of the key points of the human face in the image.
In order to solve the above technical problem, an embodiment of the present invention provides a face detection method, where the method includes: acquiring training data, wherein the training data comprises a plurality of sample face images and face key point data of each sample face image, and the face key point data comprises the positions and the shielding states of all face key points; training a single detection network model by using the training data to obtain a trained detection network model; and detecting the face image to be detected by adopting the trained detection network model so as to determine the position and the shielding state of the key point of the face in the face image to be detected.
Optionally, the training data further includes a position of a face region of each sample face image, and the method further includes: and when the trained detection network model is adopted to detect the face image to be detected, determining the face area of the face image to be detected.
Optionally, each face key point has a corresponding anchor point, and the position of each face key point is determined in the following manner: for each anchor point, determining the position of the anchor point according to the positions of corresponding face key points in the plurality of sample face images; and calculating the offset of each face key point relative to the corresponding anchor point to obtain the position of the face key point.
Optionally, the detecting the face image to be detected by using the trained detection network model to determine the position and the shielding state of the face key point in the face image to be detected includes: determining key point data of a face to be detected by adopting the trained detection network model, wherein the key point data of the face to be detected comprises the offset of the face key point in the face image to be detected relative to the anchor point corresponding to the face key point; and superposing the offset of the face key point in the face image to be detected relative to the anchor point corresponding to the face key point on the position of the anchor point so as to determine the position of the face key point in the face image to be detected.
Optionally, the facial key point data to be detected further includes a regression value of the occlusion state of the facial key point in the facial image to be detected, and detecting the facial image to be detected by using the trained detection network model further includes: and comparing the occlusion state regression value with a preset threshold value, and if the occlusion state regression value is larger than the preset threshold value, determining that the key point of the face is not occluded.
Optionally, before comparing the occlusion state regression value with a preset threshold, the method further includes: determining the preset threshold value; determining the preset threshold comprises: the method comprises the following steps: setting an initial value of the preset threshold value and acquiring a plurality of verification face images; step two: for each verification face image, detecting whether the face key points in the verification face image are shielded by adopting the trained detection network model, and judging whether the detection result is accurate; step three: counting the detection accuracy of the detection network model; step four: and comparing the accuracy with a preset accuracy threshold, if the accuracy is smaller than the preset accuracy threshold, adjusting the preset threshold, and returning to the second step until the accuracy is not lower than the preset accuracy threshold.
Optionally, training a single detection network model by using the training data to obtain a trained detection network model includes: constructing a loss function, wherein the loss function is used for calculating regression loss values of face key points in each sample face image, and the regression loss values of the face key points comprise position regression loss values and occlusion state regression loss values; and training the detection network model according to the loss function and the training data until the position regression loss value and the shielding state regression loss value are both smaller than a preset loss value.
Optionally, the detection network model includes a backbone network, a feature pyramid network, and a prediction network, where the backbone network, the feature pyramid network, and the prediction network all include a preset number of levels, and detecting the face image to be detected using the trained detection network model includes: fusing the features of the face image to be detected extracted from the ith layer of the backbone network and the up-sampling features of the ith layer of the feature pyramid network on the ith layer of the feature pyramid network to obtain the fusion features of the ith layer of the feature pyramid network; detecting the position and the shielding state of a face key point in the face image to be detected according to the fusion features of the ith layer of the feature pyramid network on the ith layer of the prediction network; wherein 1 ≤ i ≤ N, i is a positive integer, and N is a preset positive integer.
In order to solve the above technical problem, an embodiment of the present invention further provides a face detection apparatus, where the apparatus includes: the system comprises an acquisition module, a storage module and a processing module, wherein the acquisition module is used for acquiring training data, the training data comprises a plurality of sample face images and face key point data of each sample face image, and the face key point data comprises the position and the shielding state of each face key point; the training module is used for training a single detection network model by adopting the training data to obtain a trained detection network model; and the detection module is used for detecting the face image to be detected by adopting the trained detection network model so as to determine the position and the shielding state of the key point of the face in the face image to be detected.
The embodiment of the present invention further provides a storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the steps of the above-mentioned face detection method.
The embodiment of the invention also provides a terminal, which comprises a memory and a processor, wherein the memory is stored with a computer program capable of running on the processor, and the processor executes the steps of the human face detection method when running the computer program.
Compared with the prior art, the technical scheme of the embodiment of the invention has the following beneficial effects: the embodiment of the invention provides a face detection method, which comprises the following steps: acquiring training data, wherein the training data comprises a plurality of sample face images and face key point data of each sample face image, and the face key point data comprises the positions and the shielding states of all face key points; training a single detection network model by using the training data to obtain a trained detection network model; and detecting the face image to be detected by adopting the trained detection network model so as to determine the position and the shielding state of the key point of the face in the face image to be detected. In the embodiment of the invention, the face key point data in the training data comprises the position and the shielding state of each face key point, and the training data is adopted to train a single detection network model, so that the single detection network model can simultaneously learn the position features and the shielding state features of the face key points. Therefore, when the trained detection network model is used to detect the face image to be detected, both the position and the shielding state of each face key point can be detected, so that face image analysis can be carried out subsequently.
Furthermore, in the embodiment of the invention, the corresponding anchor point is determined according to the positions of the face key points in the plurality of sample face images, so that the obtained positions of the anchor points can embody the characteristics of the positions of the face key points corresponding to the anchor points in the sample face images, the determined positions of the anchor points are more optimized, and the accuracy of positioning the face key points can be improved.
Furthermore, the training data in the embodiment of the present invention further includes the position of the face region of each sample face image, when the training data is used to train the detection network model, the detection network model can also learn the characteristics of the position of the face region, and when the trained detection network model is used to detect the face key points in the face image to be detected, the face region of the face image to be detected can be determined together, so that the positioning of the face region, the positioning of the face key points, and the judgment of the shielding state can be performed synchronously, and the detection speed is increased.
Drawings
Fig. 1 is a schematic flow chart of a face detection method according to an embodiment of the present invention.
Fig. 2 is a scene schematic diagram of a face detection method in an embodiment of the present invention.
Fig. 3 is a schematic structural diagram of a detection network model according to an embodiment of the present invention.
Fig. 4 is a schematic diagram illustrating an effect of a face detection method according to an embodiment of the present invention.
Fig. 5 is a schematic structural diagram of a face detection apparatus according to an embodiment of the present invention.
Detailed Description
As described above, a face detection method capable of determining the occlusion state of key points of a face in an image is needed.
The inventors have found through research that, in the prior art, face key point positioning and shielding-state detection are performed step by step and are completed by two separate network models. Specifically, a face key point positioning model first determines the positions of the face key points in the face image to be detected; a plurality of corresponding face region images are then cropped out according to those positions and input into a face shielding detection model to detect face shielding. This approach ignores the internal relation between the two tasks of face key point positioning and shielding-state judgment, which results in low overall detection efficiency; especially when processing large-scale face scenes (that is, when the face image to be detected contains a large number of faces), the detection efficiency is low and the performance bottleneck is obvious.
In order to solve the above technical problem, an embodiment of the present invention provides a face detection method, where the method includes: acquiring training data, wherein the training data comprises a plurality of sample face images and face key point data of each sample face image, and the face key point data comprises the positions and the shielding states of all face key points; training a single detection network model by using the training data to obtain a trained detection network model; and detecting the face image to be detected by adopting the trained detection network model so as to determine the position and the shielding state of the key point of the face in the face image to be detected. In the embodiment of the invention, the face key point data in the training data comprises the position and the shielding state of each face key point, and the training data is adopted to train a single detection network model, so that the single detection network model can simultaneously learn the position features and the shielding state features of the face key points. Therefore, when the trained detection network model is used to detect the face image to be detected, both the position and the shielding state of each face key point can be detected, so that face image analysis can be carried out subsequently.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
Referring to fig. 1, fig. 1 is a schematic flow chart of a face detection method according to an embodiment of the present invention. The face detection method may be executed by a terminal, and the terminal may be any appropriate terminal, such as a mobile phone, a computer, an internet of things device, and the like, but is not limited thereto. The method may be used to detect the positions and the shielding states of the face key points in a face image to be detected; for example, the terminal may detect the positions of the face key points (e.g., left eye, right eye, nose tip, left mouth corner and right mouth corner) in the face image to be detected and determine whether each face key point is shielded, but is not limited thereto. The face image to be detected may be an image acquired by the terminal in real time, an image pre-stored in the terminal, or an image received by the terminal from the outside, but is not limited thereto.
It should be noted that the face key points may be predetermined specific parts of a face; specifically, the face key points to be detected may be the left eye and the right eye, or may be the left eye, right eye, nose tip, left mouth corner and right mouth corner.
With continuing reference to fig. 1, the face detection method shown in fig. 1 may specifically include the following steps:
step S101: acquiring training data, wherein the training data comprises a plurality of sample face images and face key point data of each sample face image, and the face key point data comprises the positions and the shielding states of all face key points;
step S102: training a single detection network model by using the training data to obtain a trained detection network model;
step S103: and detecting the face image to be detected by adopting the trained detection network model so as to determine the position and the shielding state of the key point of the face in the face image to be detected.
In the specific implementation of step S101, when the terminal acquires the training data, it may first acquire a plurality of sample face images, and then determine face key point data in each sample face image to obtain the training data, where the face key point data includes the position and the shielding state of each face key point.
Specifically, the terminal may acquire a plurality of sample face images from the outside, or may select at least a part of the sample face images from a training data set stored locally as the sample face images. The sample face image can be provided with an identification graph, and the identification graph is used for indicating the positions of all face key points in the sample face image. The identification graph may also be used to indicate an occlusion state of each face key point in the sample face image, for example, if the identification graph is rectangular, the face key point is not occluded, and if the identification graph is circular, the face key point is occluded, that is, the occlusion state of the face key point may be represented by different shapes of the identification graph, and different occlusion states of the key points may also be distinguished according to different colors of the identification graph, but the present invention is not limited thereto.
It should be noted that the identification pattern may be pre-marked on the sample face image, or may be obtained by operating the sample face image after the terminal acquires the sample face image, for example, after the terminal acquires the sample face image, the identification pattern may be manually marked on the sample face image, or the identification pattern may be automatically marked in the sample face image by the terminal.
Further, the terminal can determine the position and the shielding state of each face key point by searching and identifying the identification graph in each sample face image, so as to obtain the face key point data of each sample face image. For example: after the terminal acquires a plurality of sample face images, the positions of the face key points in the sample face images are obtained by searching the identification graphs in the sample face images, and the shielding state of the face key points can be determined according to the shapes of the identification graphs.
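For illustration only, the per-image annotation described above could be organized as in the following minimal Python sketch; the class, the field names and the rectangle/circle convention mirror the example in the preceding paragraphs but are not mandated by the method.

    from dataclasses import dataclass
    from typing import Tuple

    @dataclass
    class KeypointAnnotation:
        name: str                       # e.g. "left_eye", "nose_tip"
        position: Tuple[float, float]   # (x, y) pixel coordinates of the identification graph
        marker_shape: str               # "rectangle" or "circle", per the example above

    def occlusion_state(ann: KeypointAnnotation) -> int:
        # Rectangle marker -> key point not shielded (state 1);
        # circle marker -> key point shielded (state 0).
        return 1 if ann.marker_shape == "rectangle" else 0

    # Face key point data of one sample face image:
    sample = [
        KeypointAnnotation("left_eye", (112.0, 98.0), "rectangle"),
        KeypointAnnotation("nose_tip", (130.0, 150.0), "circle"),
    ]
    print([(a.name, occlusion_state(a)) for a in sample])
    # [('left_eye', 1), ('nose_tip', 0)]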
It should be noted that each face key point has a corresponding anchor point, for example, the face key points to be detected are a left eye, a right eye, a nose tip, a left mouth corner and a right mouth corner, the left eyes in the multiple sample face images all correspond to the first anchor point, the right eyes in the multiple sample face images all correspond to the second anchor point, the nose tips in the multiple sample face images all correspond to the third anchor point, the left mouth corners in the multiple sample face images all correspond to the fourth anchor point, and the right mouth corners in the multiple sample face images all correspond to the fifth anchor point, but not limited thereto.
Further, for each anchor point, the position of the anchor point can be determined according to the positions of the face key points corresponding to the anchor point in the plurality of sample face images. Specifically, the positions of the key points corresponding to the anchor points in the plurality of sample face images may be counted, and the result obtained by the counting may be used as the position of the anchor point. For example, according to the positions of the left eyes in the multiple sample face images, the coordinate values of the left eyes in the multiple sample face images are averaged to obtain the position of the first anchor point corresponding to the left eye. Compared with a method for directly setting the anchor point position, the method for determining the anchor point position by adopting the statistical method can enable the anchor point position to embody the characteristics of the corresponding human face key point position, so that the detection network model can learn the characteristics of the human face key point position more easily, and the detection result is more accurate.
Further, after the positions of the anchor points are determined, the offset of each face key point in each sample face image relative to its corresponding anchor point can be calculated and added into the training data; that is, the offset of a face key point relative to its corresponding anchor point serves as the position of that face key point in the training data.
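A minimal sketch of this statistical anchor construction and offset computation, assuming the key point coordinates are stacked into a NumPy array (the array names and the use of the mean follow the averaging example above):

    import numpy as np

    # positions: shape (num_samples, num_keypoints, 2) - the (x, y) position of
    # each face key point in each sample face image.
    positions = np.array([
        [[112., 98.], [168., 97.], [140., 150.]],   # sample 1: left eye, right eye, nose tip
        [[110., 95.], [170., 99.], [138., 148.]],   # sample 2
    ])

    # One anchor per key point: the mean position over all sample face images.
    anchors = positions.mean(axis=0)                # shape (num_keypoints, 2)

    # Regression targets: offset of each key point relative to its anchor.
    offsets = positions - anchors                   # shape (num_samples, num_keypoints, 2)
    print(anchors)      # [[111.  96.5] [169.  98. ] [139. 149. ]]
    print(offsets[0])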
Furthermore, the terminal can also identify the shielding state of each face key point according to the identification graph in each sample face image, and different shielding state values can be set for the face key points in different shielding states. For example: if the face key point is identified to be not shielded, the shielding state value of the face key point can be determined to be 1, if the face key point is identified to be shielded, the shielding state value of the face key point can be determined to be 0, and the shielding state value of each face key point in each sample face image is added into the training data.
In one non-limiting embodiment of the invention, each face key point can be represented by a three-dimensional vector (x, y, z), and the training data comprises the three-dimensional vectors corresponding to the face key points. Therefore, when the training data is adopted to train a single detection network model, the single detection network model can simultaneously learn the position feature and the shielding state feature of the face key points.
Specifically, in the three-dimensional vector, x represents the component of the offset of the face key point relative to its corresponding anchor point along the first coordinate axis, y represents the component of the offset along the second coordinate axis, and z represents the shielding state value of the face key point. The value of z is 0 or 1: when z is 0, the face key point is occluded; when z is 1, the face key point is not occluded. However, this is not limited thereto.
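Continuing the sketch above, the offset components and the shielding state value can be packed into the per-keypoint (x, y, z) target vector; the array names are illustrative only.

    import numpy as np

    # offsets: (num_samples, num_keypoints, 2), as computed in the previous sketch;
    # occlusion: (num_samples, num_keypoints), 1 = not occluded, 0 = occluded.
    offsets = np.zeros((2, 3, 2))
    occlusion = np.array([[1, 1, 0],
                          [1, 0, 1]], dtype=float)

    # Per-keypoint target vector (x, y, z): x and y are the offset components
    # along the two coordinate axes, z is the shielding state value.
    targets = np.concatenate([offsets, occlusion[..., None]], axis=-1)
    print(targets.shape)   # (2, 3, 3) - one (x, y, z) vector per face key point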
Further, the training data may further include the position of the face region in each sample face image. Specifically, the sample face image may further include a specific identification pattern for indicating the face region. The terminal can determine the position of the face region in the sample face image by searching and recognizing a specific identification pattern for indicating the face region.
In a specific implementation of step S102, a single detection network model may be trained using the training data to obtain a trained detection network model. That is, the detection network model for detecting the position of the key point of the face in the image to be detected and the detection network model for detecting the shielding state of the key point of the face are the same detection network model. Because the face key point data in the training data comprises the position and the shielding state of each face key point, when the training data is adopted to train the detection network model, the detection network model can simultaneously learn the position characteristics and the shielding state characteristics of the face key points, so that the position of the face key points and the shielding state of the face key points can be detected when the trained detection network model is adopted to detect the face image to be detected.
Further, when the training data includes the position of the face region in the sample face image, and the training data is adopted for training the detection network model, the detection network model can also learn the position characteristics of the face region, and therefore the detection network model after training can also be adopted for detecting the face region in the face image to be detected.
In particular, the detection network model may comprise one or more detection network elements. The detection network model can comprise a first detection network unit, the first detection network unit is used for detecting the position and the shielding state of key points of the face in the face image to be detected, the detection network model can also comprise a second detection network unit, and the second detection network unit can be used for detecting the face area in the image to be detected.
Further, a loss function may be constructed, and the detection network model may be trained using the training data and the loss function. The loss function is used for calculating the regression loss value of the face key points in each sample face image, where the regression loss value of a face key point comprises a position regression loss value and an occlusion state regression loss value: the position regression loss value may be the difference between the offset of the face key point relative to its corresponding anchor point as calculated by the detection network model and the actual offset in the training data, and the occlusion state regression loss value may be the difference between the occlusion state value of the face key point as calculated by the detection network model and the actual occlusion state value in the training data. The detection network model is then trained according to the loss function and the training data; that is, after the regression loss value of the face key points is calculated, the parameters of the detection network model are updated by stochastic gradient descent and back-propagation of the error, and the regression loss value is calculated again, until both the position regression loss value and the occlusion state regression loss value are smaller than a preset loss value. In other words, when both the position regression loss value and the occlusion state regression loss value are smaller than the preset loss value, the trained detection network model is obtained. The preset loss value can be preset manually or determined by the terminal through calculation.
As a non-limiting example, the loss function may be a smooth L1 loss function, a binary softmax loss function, or any other appropriate loss function, which is not limited herein.
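A hedged PyTorch sketch of such a joint loss: smooth L1 for the position offsets, and binary cross-entropy on a logit as a stand-in for the two-class softmax over occluded/not-occluded (the loss weighting, the reductions and the tensor shapes are assumptions, not specified by the patent):

    import torch
    import torch.nn as nn

    class KeypointLoss(nn.Module):
        """Joint regression loss: position offsets plus occlusion state."""
        def __init__(self, occlusion_weight: float = 1.0):
            super().__init__()
            self.pos_loss = nn.SmoothL1Loss()
            # BCE on a sigmoid logit is equivalent in effect to a two-class
            # softmax over occluded / not occluded.
            self.occ_loss = nn.BCEWithLogitsLoss()
            self.occlusion_weight = occlusion_weight

        def forward(self, pred_offsets, pred_occ_logits, gt_offsets, gt_occ):
            position_loss = self.pos_loss(pred_offsets, gt_offsets)
            occlusion_loss = self.occ_loss(pred_occ_logits, gt_occ)
            total = position_loss + self.occlusion_weight * occlusion_loss
            return total, position_loss, occlusion_loss

    # Usage: repeat forward, loss and backward steps, updating parameters by
    # stochastic gradient descent until both partial loss values fall below
    # the preset loss value, as described above.
    criterion = KeypointLoss()
    pred_off = torch.randn(4, 5, 2, requires_grad=True)    # batch of 4, 5 key points
    pred_occ = torch.randn(4, 5, requires_grad=True)
    gt_off = torch.randn(4, 5, 2)
    gt_occ = torch.randint(0, 2, (4, 5)).float()
    total, pos_l, occ_l = criterion(pred_off, pred_occ, gt_off, gt_occ)
    total.backward()    # back-propagate the error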
Further, a first loss function may be constructed, and the first detection network unit is trained by using the face key point data in the training data and the first loss function, that is, the same loss function is used to train the part of the detection network model for detecting the position of the face key point and the shielding state of the face key point. A second loss function may also be constructed, and the second detection network element is trained using the position of the face region in the training data and the second loss function.
In the specific implementation of step S103, the face image to be detected is input into the trained detection network model to obtain the face key point data to be detected. The face key point data to be detected may include the offset of each face key point in the face image to be detected relative to its corresponding anchor point; this offset is superimposed on the position of the anchor point to determine the position of the face key point in the face image to be detected.
Referring to fig. 2, fig. 2 is a scene schematic diagram of a face detection method in an embodiment of the present invention.
Specifically, after the face image 21 to be detected is input into the trained detection network model, the position of the anchor point 22 is determined on the face image 21 to be detected. It should be noted that, when the trained detection network model is used to detect the face image to be detected, the positions of the anchor points 22 corresponding to the key points of each face are the same as the positions of the anchor points determined when the training data is obtained. In particular, the location of the anchor point 22 may be determined by the detection network model when the training data is acquired; the locations of the anchor points 22 may also be determined by other network models when the training data is acquired, and the locations of the anchor points 22 are given to the detection network model after the training data is acquired.
Further, the trained detection network model calculates the offset 23 of the face key point 24 in the face image 21 to be detected relative to its corresponding anchor point, and superimposes the offset 23 on the position of the anchor point 22, so that the position of the face key point 24 in the face image 21 to be detected is determined.
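A minimal sketch of this decoding step, together with the shielding-state decision described below (the sigmoid and the 0.5 threshold anticipate the preferred threshold mentioned later; all array values are made up for illustration):

    import numpy as np

    def sigmoid(v):
        return 1.0 / (1.0 + np.exp(-v))

    # Anchor positions determined during training (see the anchor sketch above).
    anchors = np.array([[111.0, 96.5], [169.0, 98.0], [139.0, 149.0]])
    # Model outputs for the face image to be detected: per-keypoint (x, y)
    # offset and an occlusion-state logit z (names and the use of a sigmoid
    # are assumptions, not specified by the patent).
    pred_offsets = np.array([[1.2, -0.5], [-0.8, 0.3], [0.1, 1.0]])
    pred_z = np.array([2.1, 0.4, -1.7])

    keypoints = anchors + pred_offsets      # superimpose the offset on the anchor
    occluded = sigmoid(pred_z) <= 0.5       # regression value <= threshold -> occluded
    print(keypoints)
    print(occluded)                         # [False False  True]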
With continued reference to fig. 1, the trained detection network model may be used to determine the occlusion state of the face key point while obtaining the position of the face key point. Specifically, after the face image to be detected is input into the trained detection network model, the trained detection network model can also calculate an occlusion state regression value of each face key point, and the occlusion state regression value is used for indicating the occlusion state of each face key point in the face image to be detected.
Further, the occlusion state regression value of each face key point is compared with a preset threshold value, if the occlusion state regression value is greater than the preset threshold value, the corresponding face key point is determined to be not occluded, and if the occlusion state regression value does not exceed the preset threshold value, the corresponding face key point is determined to be occluded.
It should be noted that the preset threshold may be received from the outside by the terminal, may be predetermined manually, or may be determined by calculation performed by the terminal, but is not limited thereto. Preferably, the preset threshold is 0.5, when the regression value of the occlusion state calculated by the trained detection network model is greater than 0.5, it is determined that the corresponding key point of the face is not occluded, and if the regression value of the occlusion state is not greater than 0.5, it is determined that the corresponding key point of the face is occluded.
In a non-limiting embodiment of the present invention, the preset threshold is determined by a terminal, that is, before comparing the occlusion state regression value with the preset threshold, the method may further include determining the preset threshold. The preset threshold may be determined by the detection network model.
Specifically, determining the preset threshold may include: the method comprises the following steps: setting an initial value of the preset threshold value and acquiring a plurality of verification face images, wherein the number of face key points in the verification face images is the same as the number of corresponding face key points in the sample face images; step two: for each verification face image, adopting the trained detection network model to sequentially detect whether each face key point is shielded or not, and judging whether the detection result of each face key point is accurate or not; step three: counting the detection accuracy of the detection network model; step four: and comparing the accuracy with a preset accuracy threshold, if the accuracy is smaller than the preset accuracy threshold, adjusting the preset threshold, and returning to the second step until the accuracy is not lower than the preset accuracy threshold. Wherein the preset accuracy threshold may be previously received by the terminal from the outside.
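A hedged sketch of the four-step threshold search: the patent specifies the loop structure, while the accuracy function, the step size and the upward search direction used here are assumptions.

    def calibrate_threshold(scores, labels, acc_target=0.9,
                            init_threshold=0.5, step=0.05):
        """scores: occlusion-state regression values on the verification face
        images; labels: ground truth, 1 = not occluded, 0 = occluded."""
        threshold = init_threshold                              # step 1
        while True:
            # Step 2: regression value > threshold -> predict "not occluded".
            preds = [1 if s > threshold else 0 for s in scores]
            # Step 3: detection accuracy of the model at this threshold.
            acc = sum(p == l for p, l in zip(preds, labels)) / len(labels)
            # Step 4: done once accuracy is not lower than the preset target
            # (guard against the threshold leaving the valid (0, 1) range).
            if acc >= acc_target or not (0.0 < threshold < 1.0):
                return threshold, acc
            threshold += step                                   # adjust and retry

    scores = [0.9, 0.8, 0.3, 0.6, 0.1]
    labels = [1, 1, 0, 0, 0]
    print(calibrate_threshold(scores, labels, acc_target=0.8))  # (0.5, 0.8)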
Further, when the training data further includes the position of the face region of each sample face image, the trained detection network model is adopted to detect the face image to be detected, and the face region in the face image to be detected can also be determined together.
Specifically, when the detection network model is trained by using the training data, the second detection network unit in the detection network model may be trained by using the positions of the face regions of the sample face images in the training data. After the face image to be detected is input into the trained detection network model, the second detection network unit can be adopted to determine the face region in the face image to be detected.
Referring to fig. 3, fig. 3 is a schematic structural diagram of a detection network model according to an embodiment of the present invention, where the detection network model 31 shown in fig. 3 includes a backbone network 32, a feature pyramid network 33, and a prediction network 34, and the backbone network 32, the feature pyramid network 33, and the prediction network 34 all include a preset number of levels. The backbone network 32 and the feature pyramid network 33 are used for extracting features in the face image to be detected, and the prediction network 34 detects the positions and the shielding states of key points of the face in the face image to be detected based on the features extracted by the backbone network 32 and the feature pyramid network 33.
Further, when the trained detection network model 31 is used for detecting a face image to be detected, the features of the face image to be detected extracted from the ith layer of the backbone network 32 and the up-sampling features of the ith layer of the feature pyramid network 33 are fused on the ith layer of the feature pyramid network to obtain the fusion features of the ith layer of the feature pyramid network 33; the position and the shielding state of the face key points in the face image to be detected are detected on the ith layer of the prediction network 34 according to the fusion features of the ith layer of the feature pyramid network 33; wherein 1 ≤ i ≤ N, i is a positive integer, and N is a preset positive integer.
Specifically, different levels correspond to different resolutions, with higher levels (i.e., larger i) having lower resolution but richer semantic information, and lower levels (i.e., smaller i) having higher resolution but less semantic information.
In the embodiment of the invention, the backbone network 32 is adopted to extract the features of the face image to be detected at different levels, specifically, the backbone network 32 performs downsampling on the face image to be detected according to a path from bottom to top (the resolution of the face image to be detected is decreased progressively), and extracts the features of the face image to be detected at different resolutions. The feature pyramid network 33 may perform upsampling according to a top-down path (resolution of the face image to be detected is increased) to obtain an upsampled feature of each level. At each level of the feature pyramid network 33, the features extracted from the corresponding level of the backbone network 32 may be combined with the upsampled features of the level in the feature pyramid network 33 to obtain the fused features of the level, where the upsampled features of the level in the feature pyramid network 33 are obtained by sampling the fused features of the previous level. Therefore, each level of the feature pyramid network 33 can combine features with low resolution and strong semantics with features with high resolution and weak semantics, so that each level of the feature pyramid network 33 has features with high resolution and strong semantics, and the accuracy of the detection result is improved.
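A minimal PyTorch sketch of the lateral fusion at one level of the feature pyramid network, where the up-sampled input at level i comes from the fused feature of the level above, as described in the preceding paragraph (the 1x1 lateral convolution, nearest-neighbour up-sampling and channel count follow the common FPN recipe and are assumptions here):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FPNLevelFusion(nn.Module):
        """Fuses the backbone feature at level i with the up-sampled fused
        feature coming down from level i+1."""
        def __init__(self, backbone_channels: int, fpn_channels: int = 256):
            super().__init__()
            self.lateral = nn.Conv2d(backbone_channels, fpn_channels, kernel_size=1)

        def forward(self, backbone_feat_i, fused_feat_above):
            lateral = self.lateral(backbone_feat_i)      # high resolution, weak semantics
            top_down = F.interpolate(fused_feat_above,   # low resolution, strong semantics
                                     size=lateral.shape[-2:], mode="nearest")
            return lateral + top_down                    # fused feature of level i

    fuse = FPNLevelFusion(backbone_channels=512)
    c_i = torch.randn(1, 512, 40, 40)       # backbone feature at level i
    p_above = torch.randn(1, 256, 20, 20)   # fused feature from level i+1
    print(fuse(c_i, p_above).shape)         # torch.Size([1, 256, 40, 40])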
Further, the positions and the shielding states of the face key points in the face image to be detected are detected in each level in the prediction network 34 according to the fusion features of the corresponding levels in the feature pyramid network 33, and finally the positions and the shielding states of the face key points in the face image to be detected are determined by adopting a non-maximum suppression method according to the positions and the shielding states obtained in each level.
Further, the position of the face region in the face image to be detected may also be detected at each level in the prediction network 34 according to the fusion features of the corresponding level in the feature pyramid network 33, and finally the face region in the face image to be detected is determined by using a non-maximum suppression method according to the results obtained at all levels.
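Merging the per-level results with non-maximum suppression could, for instance, use torchvision's NMS operator; in this method the boxes would be the candidate face regions, with the key point positions and shielding states carried alongside (a sketch with made-up boxes and scores):

    import torch
    from torchvision.ops import nms

    # Candidate face boxes gathered from all prediction-network levels,
    # in (x1, y1, x2, y2) format, with their confidence scores.
    boxes = torch.tensor([[10., 10., 110., 110.],
                          [12., 11., 112., 108.],
                          [200., 50., 280., 140.]])
    scores = torch.tensor([0.92, 0.85, 0.77])

    keep = nms(boxes, scores, iou_threshold=0.5)  # indices of kept detections
    print(keep)   # tensor([0, 2]) - the overlapping duplicate is suppressed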
With reference to fig. 1, after the trained detection network model is used to determine the positions and the shielding states of the face key points in the face image to be detected, a first identification pattern may be used to mark each shielded face key point in the face image to be detected, a second identification pattern may be used to mark each non-shielded face key point, and, after the face region of the face image to be detected is determined, a third identification pattern may be used to mark the face region in the face image to be detected.
Referring to fig. 4, fig. 4 is a schematic diagram illustrating an effect of a face detection method according to an embodiment of the present invention. After the terminal detects the face image 41 to be detected by using the method shown in fig. 1, the positions and the shielding states of 5 key points (a left eye 42, a right eye 43, a nose tip 44, a left mouth corner 45 and a right mouth corner 46) in the face image 41 to be detected can be obtained and respectively marked in the face image to be detected, wherein the nose tip 44, the left mouth corner 45 and the right mouth corner 46 are shielded and are marked with the first identification graph, while the left eye 42 and the right eye 43 are not shielded and are marked with the second identification graph. In addition, the third identification graph is used to mark the face region 47 in the face image to be detected.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a face detection apparatus according to an embodiment of the present invention. The face detection device in the embodiment of the present invention may include an obtaining module 51, a training module 52, and a detection module 53.
The obtaining module 51 is configured to obtain training data, where the training data includes a plurality of sample face images and face key point data of each sample face image, and the face key point data includes positions and shielding states of each face key point; the training module 52 is configured to train a single detection network model by using the training data to obtain a trained detection network model; the detection module 53 is configured to detect a to-be-detected face image by using the trained detection network model, so as to determine the position and the shielding state of a face key point in the to-be-detected face image.
For more details of the working principle and the working mode of the face detection apparatus, reference may be made to the related descriptions in fig. 1 to fig. 4, which are not repeated herein.
The embodiment of the present invention further provides a storage medium, on which a computer program is stored, where the computer program, when executed by a processor, performs the steps of the above-mentioned face detection method. The storage medium may include a ROM, a RAM, a magnetic disk or an optical disk, etc. The storage medium may further include a non-volatile memory or a non-transitory memory, and the like.
The embodiment of the invention also discloses a terminal which can comprise a memory and a processor, wherein the memory is stored with a computer program which can run on the processor. The processor, when running the computer program, may perform the steps of the face detection method shown in fig. 1. The terminal includes, but is not limited to, a mobile phone, a computer, a tablet computer and other terminal devices.
It should be understood that the term "and/or" herein merely describes an association relationship between associated objects, indicating that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" in this document indicates that the former and latter associated objects are in an "or" relationship.
The "plurality" appearing in the embodiments of the present application means two or more.
The descriptions of the first, second, etc. appearing in the embodiments of the present application are only for illustrating and differentiating the objects, and do not represent the order or the particular limitation of the number of the devices in the embodiments of the present application, and do not constitute any limitation to the embodiments of the present application.
Although the present invention is disclosed above, the present invention is not limited thereto. Various changes and modifications may be effected therein by one skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (11)

1. A face detection method, comprising:
acquiring training data, wherein the training data comprises a plurality of sample face images and face key point data of each sample face image, and the face key point data comprises the positions and the shielding states of all face key points;
training a single detection network model by using the training data to obtain a trained detection network model;
and detecting the face image to be detected by adopting the trained detection network model so as to determine the position and the shielding state of the key point of the face in the face image to be detected.
2. The method of claim 1, wherein the training data further includes a location of a face region of each sample face image, the method further comprising:
and when the trained detection network model is adopted to detect the face image to be detected, determining the face area of the face image to be detected.
3. The face detection method of claim 1, wherein each face keypoint has a corresponding anchor point, and the position of each face keypoint is determined as follows:
for each anchor point, determining the position of the anchor point according to the positions of corresponding face key points in the plurality of sample face images;
and calculating the offset of each face key point relative to the corresponding anchor point to obtain the position of the face key point.
4. The face detection method according to claim 3, wherein detecting the face image to be detected by using the trained detection network model to determine the position and the shielding state of the face key point in the face image to be detected comprises:
determining key point data of a face to be detected by adopting the trained detection network model, wherein the key point data of the face to be detected comprises the offset of the key point of the face in the image of the face to be detected relative to the anchor point corresponding to the key point;
and superposing the offset of the face key point in the face image to be detected relative to the anchor point corresponding to the face key point on the position of the anchor point so as to determine the position of the face key point in the face image to be detected.
5. The face detection method according to claim 4, wherein the face key point data to be detected further includes an occlusion state regression value of a face key point in the face image to be detected, and the detecting the face image to be detected by using the trained detection network model further includes:
and comparing the occlusion state regression value with a preset threshold value, and if the occlusion state regression value is larger than the preset threshold value, determining that the key point of the face is not occluded.
6. The method according to claim 5, wherein before comparing the occlusion state regression value with a preset threshold, the method further comprises: determining the preset threshold value;
determining the preset threshold comprises:
the method comprises the following steps: setting an initial value of the preset threshold value and acquiring a plurality of verification face images;
step two: for each verification face image, detecting whether the face key points in the verification face image are shielded by adopting the trained detection network model, and judging whether the detection result is accurate;
step three: counting the detection accuracy of the detection network model;
step four: and comparing the accuracy with a preset accuracy threshold, if the accuracy is smaller than the preset accuracy threshold, adjusting the preset threshold, and returning to the second step until the accuracy is not lower than the preset accuracy threshold.
7. The method of claim 1, wherein training a single detection network model using the training data to obtain a trained detection network model comprises:
constructing a loss function, wherein the loss function is used for calculating regression loss values of face key points in each sample face image, and the regression loss values of the face key points comprise position regression loss values and occlusion state regression loss values;
and training the detection network model according to the loss function and the training data until the position regression loss value and the shielding state regression loss value are both smaller than a preset loss value.
8. The method of claim 1, wherein the detection network model comprises a backbone network, a feature pyramid network and a prediction network, the backbone network, the feature pyramid network and the prediction network each comprise a preset number of levels, and the detecting the image of the face to be detected by using the trained detection network model comprises:
fusing the features of the face image to be detected extracted from the ith layer of the backbone network and the up-sampling features of the ith layer of the feature pyramid network on the ith layer of the feature pyramid network to obtain the fusion features of the ith layer of the feature pyramid network;
detecting the position and the shielding state of a face key point in the face image to be detected according to the fusion feature of the ith layer of the feature pyramid network on the ith layer of the prediction network;
wherein 1 ≤ i ≤ N, i is a positive integer, and N is a preset positive integer.
9. An apparatus for face detection, the apparatus comprising:
the system comprises an acquisition module, a storage module and a processing module, wherein the acquisition module is used for acquiring training data, the training data comprises a plurality of sample face images and face key point data of each sample face image, and the face key point data comprises the position and the shielding state of each face key point;
the training module is used for training a single detection network model by adopting the training data to obtain a trained detection network model;
and the detection module is used for detecting the face image to be detected by adopting the trained detection network model so as to determine the position and the shielding state of the key point of the face in the face image to be detected.
10. A storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, performs the steps of the face detection method according to any one of claims 1 to 8.
11. A terminal comprising a memory and a processor, the memory having stored thereon a computer program operable on the processor, wherein the processor, when executing the computer program, performs the steps of the face detection method according to any of claims 1 to 8.
Application CN202011360224.5A, priority date 2020-11-27, filing date 2020-11-27: Face detection method and device, storage medium and terminal. Status: Pending. Published as CN112560584A.

Priority Applications (1)

Application Number: CN202011360224.5A; Priority Date: 2020-11-27; Filing Date: 2020-11-27; Title: Face detection method and device, storage medium and terminal

Applications Claiming Priority (1)

Application Number: CN202011360224.5A; Priority Date: 2020-11-27; Filing Date: 2020-11-27; Title: Face detection method and device, storage medium and terminal

Publications (1)

Publication Number: CN112560584A; Publication Date: 2021-03-26

Family ID: 75046414

Family Applications (1)

Application Number: CN202011360224.5A; Title: Face detection method and device, storage medium and terminal; Priority Date: 2020-11-27; Filing Date: 2020-11-27; Status: Pending

Country Status (1): CN (CN112560584A)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113065535A (en) * 2021-06-03 2021-07-02 北京的卢深视科技有限公司 Method for detecting key point and detecting network training, electronic equipment and storage medium
CN113366491A (en) * 2021-04-26 2021-09-07 华为技术有限公司 Eyeball tracking method, device and storage medium
CN114093012A (en) * 2022-01-18 2022-02-25 荣耀终端有限公司 Face shielding detection method and detection device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109063695A (en) * 2018-09-18 2018-12-21 图普科技(广州)有限公司 A kind of face critical point detection method, apparatus and its computer storage medium
CN109299658A (en) * 2018-08-21 2019-02-01 腾讯科技(深圳)有限公司 Face area detecting method, face image rendering method, device and storage medium
CN109960974A (en) * 2017-12-22 2019-07-02 北京市商汤科技开发有限公司 Face critical point detection method, apparatus, electronic equipment and storage medium
CN111027504A (en) * 2019-12-18 2020-04-17 上海眼控科技股份有限公司 Face key point detection method, device, equipment and storage medium
CN111860300A (en) * 2020-07-17 2020-10-30 广州视源电子科技股份有限公司 Key point detection method and device, terminal equipment and storage medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109960974A (en) * 2017-12-22 2019-07-02 北京市商汤科技开发有限公司 Face critical point detection method, apparatus, electronic equipment and storage medium
CN109299658A (en) * 2018-08-21 2019-02-01 腾讯科技(深圳)有限公司 Face area detecting method, face image rendering method, device and storage medium
CN109063695A (en) * 2018-09-18 2018-12-21 图普科技(广州)有限公司 A kind of face critical point detection method, apparatus and its computer storage medium
CN111027504A (en) * 2019-12-18 2020-04-17 上海眼控科技股份有限公司 Face key point detection method, device, equipment and storage medium
CN111860300A (en) * 2020-07-17 2020-10-30 广州视源电子科技股份有限公司 Key point detection method and device, terminal equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
董瑞霞 (Dong Ruixia): "Research on face feature point localization combined with face detection" (结合人脸检测的人脸特征点定位方法研究), China Master's Theses Full-text Database, Information Science and Technology, 15 January 2018 (2018-01-15), pages 49-60 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113366491A (en) * 2021-04-26 2021-09-07 华为技术有限公司 Eyeball tracking method, device and storage medium
WO2022226747A1 (en) * 2021-04-26 2022-11-03 华为技术有限公司 Eyeball tracking method and apparatus and storage medium
CN113065535A (en) * 2021-06-03 2021-07-02 北京的卢深视科技有限公司 Method for detecting key point and detecting network training, electronic equipment and storage medium
CN113065535B (en) * 2021-06-03 2021-08-17 北京的卢深视科技有限公司 Method for detecting key point and detecting network training, electronic equipment and storage medium
CN114093012A (en) * 2022-01-18 2022-02-25 荣耀终端有限公司 Face shielding detection method and detection device
CN114093012B (en) * 2022-01-18 2022-06-10 荣耀终端有限公司 Face shielding detection method and detection device

Similar Documents

Publication Publication Date Title
EP3916627A1 (en) Living body detection method based on facial recognition, and electronic device and storage medium
CN107358149B (en) Human body posture detection method and device
CN109284733B (en) Shopping guide negative behavior monitoring method based on yolo and multitask convolutional neural network
CN101142584B (en) Method for facial features detection
CN112560584A (en) Face detection method and device, storage medium and terminal
JP7246104B2 (en) License plate identification method based on text line identification
CN112633144A (en) Face occlusion detection method, system, device and storage medium
CN109934847B (en) Method and device for estimating posture of weak texture three-dimensional object
US8019164B2 (en) Apparatus, method and program product for matching with a template
CN112381775A (en) Image tampering detection method, terminal device and storage medium
CN110378254B (en) Method and system for identifying vehicle damage image modification trace, electronic device and storage medium
CN109117746A (en) Hand detection method and machine readable storage medium
CN112836625A (en) Face living body detection method and device and electronic equipment
CN112200056A (en) Face living body detection method and device, electronic equipment and storage medium
CN111325107A (en) Detection model training method and device, electronic equipment and readable storage medium
CN113487610A (en) Herpes image recognition method and device, computer equipment and storage medium
CN110909685A (en) Posture estimation method, device, equipment and storage medium
CN111784658A (en) Quality analysis method and system for face image
CN112766028A (en) Face fuzzy processing method and device, electronic equipment and storage medium
CN110930384A (en) Crowd counting method, device, equipment and medium based on density information
US20170309040A1 (en) Method and device for positioning human eyes
CN113486715A (en) Image reproduction identification method, intelligent terminal and computer storage medium
CN112464827A (en) Mask wearing identification method, device, equipment and storage medium
CN110765898A (en) Method and device for determining object and key point thereof in image
CN115937991A (en) Human body tumbling identification method and device, computer equipment and storage medium

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
TA01: Transfer of patent application right
  Effective date of registration: 2021-08-19
  Address after: 200080 7th floor, No. 137 Haining Road, Hongkou District, Shanghai
  Applicant after: Shanghai Xinyi Intelligent Technology Co., Ltd.
  Address before: 100190 1008, 10th floor, building 51, 63 Zhichun Road, Haidian District, Beijing
  Applicant before: Beijing Xinyi Intelligent Information Technology Co., Ltd.