CN111160169B - Face detection method, device, equipment and computer readable storage medium - Google Patents


Info

Publication number
CN111160169B
CN111160169B (application CN201911313661.9A)
Authority
CN
China
Prior art keywords
human head
head
face detection
face
network model
Prior art date
Legal status
Active
Application number
CN201911313661.9A
Other languages
Chinese (zh)
Other versions
CN111160169A (en
Inventor
熊军
Current Assignee
Ping An Life Insurance Company of China Ltd
Original Assignee
Ping An Life Insurance Company of China Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Life Insurance Company of China Ltd filed Critical Ping An Life Insurance Company of China Ltd
Priority to CN201911313661.9A priority Critical patent/CN111160169B/en
Publication of CN111160169A publication Critical patent/CN111160169A/en
Application granted granted Critical
Publication of CN111160169B publication Critical patent/CN111160169B/en


Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 — Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 — Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 — Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 — Detection; Localisation; Normalisation
    • G06V40/166 — Detection; Localisation; Normalisation using acquisition arrangements
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 — Pattern recognition
    • G06F18/20 — Analysing
    • G06F18/23 — Clustering techniques
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/04 — Architecture, e.g. interconnection topology
    • G06N3/045 — Combinations of networks
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/08 — Learning methods
    • G06N3/084 — Backpropagation, e.g. using gradient descent
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 — Arrangements for image or video recognition or understanding
    • G06V10/20 — Image preprocessing
    • G06V10/26 — Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 — Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 — Arrangements for image or video recognition or understanding
    • G06V10/40 — Extraction of image or video features
    • G06V10/56 — Extraction of image or video features relating to colour
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 — Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 — Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 — Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 — Feature extraction; Face representation

Abstract

The invention provides a face detection method, apparatus, device, and computer-readable storage medium. The face detection method comprises the following steps: acquiring sample images in real time; training on the sample images with a target object detection algorithm to obtain a human head detection network model; inputting a target image to be detected into the human head detection network model to obtain head bounding-box information, and cropping a sub-image containing the head from the target image according to the bounding-box information; and performing skin-color clustering on the sub-image and computing its skin ratio to complete head detection. The method detects human heads in real time with the target object detection algorithm, then applies a skin-color clustering model to each detected head image to obtain the skin area and decide whether it shows a face or a non-face. It can therefore serve both head-counting and face-recognition scenarios, and improves detection speed and accuracy.

Description

Face detection method, device, equipment and computer readable storage medium
Technical Field
The present invention relates to the field of face recognition technology, and in particular to a face detection method, apparatus, device, and computer-readable storage medium.
Background
Detection of human activity plays a key role in many applications, such as automatic video surveillance and human-computer interaction. In computer-based automatic recognition systems, detecting a person's head is considered one of the most effective means of detecting human activity. Head detection is now applied in many fields, such as pedestrian detection and passenger-flow detection. For passenger-flow detection, a camera is usually installed above head height and shoots from top to bottom. The main task of head detection is to determine the size and position of the head region: real-time images captured by the camera serve as input, and the automatic recognition system processes them and outputs a mathematical description of whether a head is present in the image, how many heads there are, where they are, and so on.
When the camera shoots from top to bottom, the captured head appears as a quasi-circular target. Head detection methods for this setting have been developed extensively, but each has its limitations. One approach randomly samples four edge points and decides whether a candidate circular target exists from the distances between them; in environments with a high noise ratio its hit rate is low, which hurts detection speed. Fuzzy C-means clustering requires the number of clusters to be fixed in advance, so it is hard to apply when the number of clusters varies. The most common method for detecting circle-like targets is the Hough Transform (HT), which is widely used because it is insensitive to noise and robust to discontinuous edges. However, the Hough transform occupies a large amount of memory, and detecting circular targets in its three-dimensional parameter space raises a pronounced real-time problem. Many algorithms were later developed to speed up HT and reduce its memory requirements, such as the Randomized Hough Transform (RHT) and gradient-based variants of the Generalized Hough Transform (GHT). Even these improved transforms struggle to run in real time when the noise ratio is large.
Based on the above-mentioned drawbacks of the prior art methods, it is necessary to provide a new face detection method, apparatus, device and computer readable storage medium to meet the requirements of efficient and accurate face detection and recognition.
Disclosure of Invention
The invention provides a face detection method, apparatus, device, and computer-readable storage medium, with the main aim of offering a face detection method that is simple and highly accurate, as a replacement for existing face detection methods.
To achieve the above object, the present invention provides a face detection method comprising the following steps:
acquiring a sample image in real time;
training on the sample images with a target object detection algorithm to obtain a human head detection network model, where the model is used to detect human heads in a target image to be detected;
inputting a target image to be detected into the human head detection network model to obtain head bounding-box information, and cropping a sub-image containing the head from the target image according to the bounding-box information;
performing skin-color clustering on the sub-image and joining the parts containing skin color into a connected region;
calculating the skin ratio of the sub-image and, if the skin ratio is larger than a first preset threshold, examining the connected region;
and judging whether the distribution of the connected region in the sub-image satisfies a preset rule; if so, the target image is judged to contain a head, otherwise a non-head.
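As a rough illustration, the claimed steps might be sketched as follows; `crop`, `skin_ratio`, and `classify_head` are hypothetical helpers, and the threshold value 0.4 is an assumption standing in for the "first preset threshold":

```python
# Minimal sketch of the claimed pipeline (helper names and the 0.4
# threshold are assumptions; the patent fixes neither).

def crop(image, box):
    """Crop a head sub-image given a bounding box (x, y, w, h)."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]

def skin_ratio(mask):
    """Fraction of pixels marked as skin in a binary 0/1 mask."""
    total = sum(len(row) for row in mask)
    return sum(map(sum, mask)) / total

def classify_head(mask, threshold=0.4):
    """Head if the skin ratio exceeds the (assumed) preset threshold;
    the patent additionally checks a distribution rule on the
    connected region, omitted here."""
    return skin_ratio(mask) > threshold
```

In the full method, the mask would come from the skin-color clustering step and the final decision would also test the preset distribution rule.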
Preferably, the method of training on the sample images with the target object detection algorithm to obtain the human head detection network model includes:
preprocessing the sample images acquired in real time to obtain a set of processed sample images that form a pedestrian sample set;
randomly dividing the pedestrian sample set into a training set and a verification set in a given proportion;
obtaining the pre-annotated head labeling information in the verification set, where heads are annotated manually with a labeling tool to generate the corresponding label files;
performing target prediction on the training set;
and comparing the target predictions on the training set with the label files of the verification set to carry out parameter verification and model calibration.
Preferably, the method for predicting the target through the training set comprises the following steps:
dividing the sample image into a number of grid cells that predict probability values and regression values to determine bounding boxes;
filtering the bounding boxes using confidence scores and non-maximum suppression: each bounding box has a confidence score; non-maximum suppression is performed by setting a confidence threshold, bounding boxes whose confidence falls below the threshold are removed, and the union of the bounding boxes above the threshold is taken as the prediction result.
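A literal sketch of this filtering step as described — discard boxes below a confidence threshold, then take the union of the survivors as the prediction. The threshold value is an assumption, and note that classic non-maximum suppression also suppresses mutually overlapping boxes, which the text does not spell out:

```python
# Confidence thresholding followed by the union of surviving boxes,
# as the text describes (threshold value 0.5 is an assumption).

def filter_boxes(boxes, conf_threshold=0.5):
    """boxes: list of (x1, y1, x2, y2, confidence) tuples."""
    kept = [b for b in boxes if b[4] >= conf_threshold]
    if not kept:
        return None
    # Union = smallest box enclosing all surviving boxes.
    return (min(b[0] for b in kept), min(b[1] for b in kept),
            max(b[2] for b in kept), max(b[3] for b in kept))
```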
Preferably, the method of parameter verification and model calibration on the verification set includes:
detecting heads with the human head detection network model, and comparing the predicted head bounding-box information with the reference head bounding-box information to obtain the mean-square error between the two;
and optimizing the parameters of the human head detection network model using this mean-square error as the loss function.
Preferably, the parameters output by the optimized human head detection network model are compared with the annotated parameters in the verification set to check whether heads are detected correctly;
if the precision and recall of the human head detection network model reach the preset targets, a target image to be detected is input into the model for head detection;
if they do not, a new training set is added and the model is retrained until its precision and recall reach the preset targets.
Preferably, the method of skin-color clustering on the sub-image includes:
converting the sub-image into the HSV color space;
computing the histograms H1, H2, and H3 of the H, S, and V channels of a color sample, and normalizing H1, H2, and H3;
dividing the sub-image into small regions, computing the H, S, and V channel histograms h1, h2, and h3 for each small region, and normalizing h1, h2, and h3;
comparing the normalized histograms H1, H2, and H3 one by one with the corresponding histograms of each small region for similarity, using the Euclidean-distance criterion;
and obtaining a preset similarity threshold, setting pixels in regions whose similarity exceeds the threshold to 255 and pixels in regions below it to 0.
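The histogram comparison above might look like the following minimal sketch; the bin count, the conversion of the Euclidean distance into a similarity score, and the similarity threshold are all assumptions, since the patent fixes none of them:

```python
# Per-channel histogram similarity with Euclidean distance, as in the
# text; bin count, similarity transform, and threshold are assumptions.
import math

def normalized_hist(values, bins=8, vmax=256):
    """Normalized histogram of channel values in [0, vmax)."""
    h = [0] * bins
    for v in values:
        h[v * bins // vmax] += 1
    n = len(values)
    return [c / n for c in h]

def similarity(h_ref, h_region):
    """Higher = more similar; 1/(1+d) maps Euclidean distance d to
    (0, 1] (this transform is an assumption)."""
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(h_ref, h_region)))
    return 1.0 / (1.0 + d)

def label_region(sim, threshold=0.6):
    """255 for skin-like regions, 0 otherwise, as in the text."""
    return 255 if sim > threshold else 0
```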
Preferably, the method further comprises judging the orientation of the head from the skin ratio inside the connected region, to identify whether it shows the face or the back of the head.
To achieve the above object, the present invention also provides a face detection device comprising a memory and a processor, the memory storing a face detection program executable on the processor which, when executed by the processor, implements the steps of the face detection method described above.
To achieve the above object, the present invention also provides face detection equipment comprising the face detection device described above.
In addition, to achieve the above object, the present invention also provides a computer-readable storage medium having a face detection program stored thereon, the face detection program being executable by one or more processors to implement the steps of the face detection method as described above.
The face detection method provided by the invention uses the target object detection algorithm YOLOv3 for real-time head detection, then applies a skin-color clustering model to obtain the skin area and decide face versus non-face. It can therefore be used in head-counting and face-recognition scenarios, and greatly improves detection speed and accuracy.
The technical effects are described as follows:
1. based on a deep learning network of a target object detection algorithm YOLOv3, the network weight of a YOLOv3 model is used for labeling a large amount of non-manually labeled image data, and intelligent labeling of an image recognition item data set is completed through image data labeling and training processes for a plurality of times, and meanwhile, an enhanced model corresponding to the image recognition item is obtained.
2. By means of skin color clustering, the skin color duty ratio is calculated to judge whether the face is a non-face, the complex mathematical formula and the abstract space conversion concept in Hough conversion are avoided, the accuracy is improved, and the method has the characteristics of being small in memory occupation and high in instantaneity.
3. The accurate detection of the face orientation is realized through skin color clustering and the construction of the face orientation filtering.
Drawings
Fig. 1 is a flowchart of a face detection method according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of an internal structure of a face detection apparatus according to an embodiment of the present invention;
fig. 3 is a schematic block diagram of a face detection program in a face detection apparatus according to an embodiment of the present invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The invention provides a face detection method that uses the target object detection algorithm YOLOv3 for real-time head detection, then applies a skin-color clustering model to each detected head picture to obtain the skin area and decide whether it shows a face or a non-face. The method is described below with reference to specific embodiments:
Embodiment One:
referring to fig. 1, a flow chart of a face detection method according to an embodiment of the invention is shown. The method may be performed by an apparatus, which may be implemented in software and/or hardware.
In this embodiment, the face detection method includes:
step S1: acquiring a sample image of a human head picture in real time, and training the sample image based on a target object detection algorithm yolov3 to obtain a trained human head detection network model;
the method specifically comprises the following steps:
1) Creating a sample set and a tag file
1.1) Sample collection: preprocess the acquired sample images and build a pedestrian sample set;
1.2) Resize the collected sample images to 2048 × 2048, and randomly divide the resulting sample set into a training set and a verification set in a given proportion;
1.3) Obtain the pre-annotated head labeling information in the verification set and generate the corresponding label files. The head annotations are made by an expert in advance and are used to train the human head detection network model; each label records the center coordinates of the bounding box, the width and height of the bounding box, and a class field indicating the target type, where class = 0 denotes background and class = 1 denotes a human head;
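The label content described above (class, bounding-box center, width, height) matches the darknet/YOLO annotation convention, so one label line could plausibly be produced as below; the normalization to image size and the example values are illustrative assumptions:

```python
# One darknet-style label line: "class cx cy w h", with coordinates
# normalized to the image size (convention assumed; values illustrative).

def make_label(cls, cx, cy, w, h, img_w=2048, img_h=2048):
    """Format a single annotation line for a bounding box given in
    pixel coordinates (center cx, cy and size w, h)."""
    return (f"{cls} {cx / img_w:.6f} {cy / img_h:.6f} "
            f"{w / img_w:.6f} {h / img_h:.6f}")

# class = 1 (human head), a 256x256 box centred at (1024, 512):
line = make_label(1, 1024, 512, 256, 256)
```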
2) The method comprises the following specific steps of creating and training a human head detection network model based on a target object detection algorithm YOLOv 3:
2.1) Create the five layer types of the YOLOv3 network model — convolutional layers, shortcut (skip-connection) layers, upsampling layers, route layers, and YOLOv3 detection layers — and set the network parameters;
2.2) Input the training set for target prediction. The specific process comprises:
a1) Input the training set and divide each sample image into a 16 × 16 grid. Each grid cell predicts a probability value and 3 bounding boxes, and each bounding box predicts five regression values: the coordinates of its center point, its width and height, and a confidence combining the probability that it contains a head with the accuracy of its position;
a2) Filter the bounding boxes using confidence and non-maximum suppression. The confidence Pr(Object) ∈ (0, 1) is the probability that the bounding box contains a head; a1 is the area of the overlap between the predicted bounding box and the manually annotated bounding box, and a2 is the area of their union. Step a1) predicts several bounding boxes, each with a confidence score; non-maximum suppression is performed by setting a confidence threshold, bounding boxes below the threshold are removed, and the union of the bounding boxes above the threshold is taken as the prediction result;
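The quantities a1 (overlap area) and a2 (union area) define the standard intersection-over-union; a minimal sketch:

```python
# Intersection-over-union of two axis-aligned boxes, matching the
# a1 (overlap) / a2 (union) quantities in the text.

def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)        # a1
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter                       # a2
    return inter / union if union else 0.0
```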
2.3) Update the YOLOv3 network model parameters: obtain the reference head bounding-box information from the manual annotations;
detect heads with the YOLOv3 network model, and compare the predicted head bounding-box information with the reference information to obtain the mean-square error between them;
optimize the YOLOv3 network parameters using this mean-square error as the loss function;
the loss function is defined as follows:
where (xi, yi), wi, hi, Ci, and pi(c) denote, respectively, the bounding-box center coordinates, width, height, IoU value, and the probability that the object in the box is a head, as predicted by the YOLOv3 network, each compared against the corresponding manually annotated value; by convention the annotated value is 1 for a head and 0 for background. λcoord is the coordinate-error weight and λnoobj the IoU-error weight. An indicator determines whether the j-th bounding box of grid cell i is responsible for predicting the target object, i.e. whether the object's center falls in grid cell i. The error computed by the loss function is back-propagated, completing one training pass; the network parameters are adjusted and step 2.3) is repeated until the network converges;
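The loss-function equation itself did not survive extraction (it was rendered as an image in the original publication). The terms enumerated above match the standard YOLO sum-squared-error loss, so a plausible reconstruction, offered only as a sketch, is:

```latex
% Reconstruction of the missing loss (standard YOLO form; offered as a
% sketch consistent with the terms named in the text).
\begin{aligned}
L ={}& \lambda_{\text{coord}} \sum_{i=0}^{S^2}\sum_{j=0}^{B}
      \mathbb{1}_{ij}^{\text{obj}}
      \left[(x_i-\hat{x}_i)^2+(y_i-\hat{y}_i)^2\right] \\
   &+ \lambda_{\text{coord}} \sum_{i=0}^{S^2}\sum_{j=0}^{B}
      \mathbb{1}_{ij}^{\text{obj}}
      \left[\left(\sqrt{w_i}-\sqrt{\hat{w}_i}\right)^2
          +\left(\sqrt{h_i}-\sqrt{\hat{h}_i}\right)^2\right] \\
   &+ \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}}
      \left(C_i-\hat{C}_i\right)^2
    + \lambda_{\text{noobj}} \sum_{i=0}^{S^2}\sum_{j=0}^{B}
      \mathbb{1}_{ij}^{\text{noobj}} \left(C_i-\hat{C}_i\right)^2 \\
   &+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}}
      \sum_{c} \left(p_i(c)-\hat{p}_i(c)\right)^2
\end{aligned}
```

Here hats denote the manually annotated values, S² is the number of grid cells, and B the number of boxes per cell.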
2.4) Input the verification set into the trained YOLOv3 network model and compare the output with the parameters annotated in the verification-set label files to judge whether heads are detected correctly, yielding the precision and recall of the YOLOv3 network model. If the precision and recall reach the preset targets, proceed to the subsequent head-detection and feature-extraction steps (step S2); if not, add a new training set and retrain the YOLOv3 network model until its precision and recall reach the preset targets.
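The precision and recall definitions referenced here did not survive extraction; presumably they are the standard detection metrics, sketched below (how true/false positives are counted against the verification-set labels is assumed):

```python
# Standard precision/recall from detection counts (assumed to match
# the definitions lost from the original text).

def precision_recall(tp, fp, fn):
    """tp: correct head detections, fp: false detections,
    fn: missed heads."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```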
Step S2: extract and input a target image to be detected, detect heads with the trained YOLOv3 network model to obtain head bounding-box information, and crop the bounding box containing the head from the target image to obtain a sub-image.
Step S3: perform skin-color clustering on the sub-image and extract head features. In this embodiment the skin-color clustering comprises the following steps:
3.1) Convert the sub-image into the HSV color space (Hue, Saturation, Value), where H denotes hue, S saturation, and V brightness;
3.2) Threshold-segment the sub-image by color-histogram comparison, as follows: b1) compute the histograms H1, H2, and H3 of the H, S, and V channels of the color sample and normalize them so that they can be compared with the sub-image histograms; b2) divide the sub-image into small regions and, for each region, compute the H, S, and V channel histograms h1, h2, and h3 and normalize them; b3) compare the normalized histograms pairwise using the Euclidean-distance criterion — H1 with h1, H2 with h2, H3 with h3 — where a larger similarity value means greater similarity; b4) set a similarity threshold, set pixels in regions whose similarity exceeds the threshold to 255, and set pixels in regions below it to 0;
3.3) Perform a dilation operation on the image, joining the parts containing skin color into connected regions.
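Steps 3.2–3.3 end with a binary skin mask that is dilated and grouped into connected regions; a minimal pure-Python sketch follows (a 3 × 3 dilation kernel and 4-connectivity are assumptions, since the patent specifies neither):

```python
# Binary dilation and connected-region extraction on a 0/1 skin mask.
# Kernel size and connectivity are assumptions.

def dilate(mask):
    """3x3 binary dilation."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            if any(mask[i + di][j + dj]
                   for di in (-1, 0, 1) for dj in (-1, 0, 1)
                   if 0 <= i + di < h and 0 <= j + dj < w):
                out[i][j] = 1
    return out

def connected_regions(mask):
    """Label 4-connected regions; returns a list of pixel-index sets."""
    h, w = len(mask), len(mask[0])
    seen, regions = set(), []
    for i in range(h):
        for j in range(w):
            if mask[i][j] and (i, j) not in seen:
                stack, region = [(i, j)], set()
                while stack:
                    y, x = stack.pop()
                    if (y, x) in seen or not mask[y][x]:
                        continue
                    seen.add((y, x))
                    region.add((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        if 0 <= y + dy < h and 0 <= x + dx < w:
                            stack.append((y + dy, x + dx))
                regions.append(region)
    return regions
```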
Step S4: from the head bounding-box information and the head features, compute the skin ratio of the sub-image, judge face versus non-face from the computed ratio, and feed the decision back to obtain the face detection result. Specifically:
4.1) Detect the skin region in the target image based on the head bounding-box information; if the ratio of the skin region to the whole area of the target image is larger than a first preset threshold, examine the connected region in the target image;
4.2) Judge from the head features whether the distribution of the connected region in the sub-image satisfies a preset rule; if so, the target image is judged to contain a head, otherwise a non-head. In this embodiment the preset rule means that the skin position and skin ratio in the connected region conform to the distribution of a face on a head.
4.3) Judge the orientation of the head from the skin ratio inside the connected region: if the ratio is larger than a preset value, the region is judged to be a face, otherwise the back of the head. Head and face detection are thus performed together, and the decision is fed back to verify the face detection result.
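Step 4.3's orientation decision reduces to a one-line rule; in this sketch the cut-off of 0.5 is an assumption standing in for the "preset value":

```python
# Face vs. back-of-head decision from the skin ratio inside the
# connected region (the 0.5 cut-off is an assumption).

def head_orientation(skin_pixels, region_pixels, preset=0.5):
    """Return 'face' if the skin ratio in the connected region
    exceeds the preset value, else 'back of head'."""
    ratio = skin_pixels / region_pixels
    return "face" if ratio > preset else "back of head"
```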
Step S5: further judge the head orientation from the face detection result, as follows:
5.1) To model the orientation offset between the head and the camera caused by their relative motion, a motion-blur filter f is constructed. The construction procedure is:
construct a two-dimensional matrix f just large enough to contain a line segment l of length len and slope tanθ; on the premise that the matrix size is a×b, a = len·cosθ + 1 and b = len·sinθ. Motion blur with blur angle θ means motion over len pixels in a direction at angle θ to the horizontal.
For a position (i, j) in the matrix f, compute the minimum distance N_D from the position to the line segment l:
N_D = |j·cosθ − i·sinθ|
and compute the coefficient value at (i, j) from the minimum distance N_D:
f(i, j) = max(1 − N_D, 0);
5.2) Normalize f;
5.3) Filter the image:
construct different filters for several angles, each with a blur length of several pixels, and filter the image with them; add the resulting multi-scale sampled images to the dataset, with all labels identical to those of the original image.
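Steps 5.1–5.2 can be sketched as follows; the absolute value in the distance and the +1 in the vertical extent are assumptions (without them the coefficient could exceed 1, and the matrix would be empty at θ = 0):

```python
# Motion-blur filter construction per steps 5.1-5.2. The text gives
# a = len*cos(theta)+1; the +1 on b and the |.| in the distance are
# assumptions needed for a well-formed filter.
import math

def motion_blur_filter(length, theta):
    """Matrix holding a line of `length` pixels at angle `theta`;
    the coefficient at (i, j) falls off with distance from the line,
    then the matrix is normalized to sum to 1."""
    a = int(length * math.cos(theta)) + 1   # horizontal extent
    b = int(length * math.sin(theta)) + 1   # vertical extent
    f = [[0.0] * a for _ in range(b)]
    for j in range(b):          # j: vertical index
        for i in range(a):      # i: horizontal index
            n_d = abs(j * math.cos(theta) - i * math.sin(theta))
            f[j][i] = max(1.0 - n_d, 0.0)
    s = sum(map(sum, f))        # normalization (step 5.2)
    return [[v / s for v in row] for row in f]
```

At θ = 0 this yields a 1 × (length + 1) row of equal coefficients, i.e. a horizontal blur.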
5.4) Repeat the YOLOv3 model training process so that face orientation can be identified accurately.
Embodiment Two
This embodiment is substantially the same as Embodiment One: a face detection method comprising the following steps:
step S1: performing head detection in real time based on a target object detection algorithm YOLOv3 to obtain a head picture;
step S2: detecting a human head by using the trained YOLOv3 network model to obtain human head boundary frame information, and intercepting a boundary frame where the human head is positioned on the human head picture by using the human head boundary frame information to obtain a sub-image;
step S3: clustering the skin colors of the sub-images, and extracting head features;
step S4: according to human head boundary frame information and human head characteristics, simulating and calculating to obtain the skin occupation area; and according to the judgment of the human face or the non-human face obtained by the calculated skin occupation area, feedback of detection is carried out to obtain a human face detection result.
The difference is that in this embodiment, the head feature extracted in step S3 is the RGB value of the head's color sample, and the skin area obtained by an RGB-value skin-color clustering model determines whether the head shows a face.
The face detection method provided by the invention uses the target object detection algorithm YOLOv3 to detect heads in real time, obtains the skin area of each detected head picture with a skin-color clustering model, and judges whether it shows a face; it can be used in head-counting and face-recognition scenarios and improves detection speed and accuracy.
The invention also provides a face detection device. Referring to fig. 2, an internal structure of a face detection apparatus according to an embodiment of the present invention is shown.
In this embodiment, the face detection device may be a PC (Personal Computer), or a terminal device such as a smartphone, tablet computer, or portable computer. The face detection device comprises at least a memory 11, a processor 12, a network interface 13, and a communication bus 14.
The memory 11 includes at least one type of computer-readable storage medium including flash memory, a hard disk, a multimedia card, a card memory (e.g., SD or DX memory, etc.), magnetic memory, magnetic disk, optical disk, etc. The memory 11 may in some embodiments be an internal storage unit of the face detection apparatus, such as a hard disk of the face detection apparatus. The memory 11 may in other embodiments also be an external storage device of the face detection apparatus, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the face detection apparatus. Further, the memory 11 may also include both an internal storage unit and an external storage device of the face detection apparatus. The memory 11 may be used not only for storing application software installed in the face detection apparatus and various types of data, such as codes of face detection programs, but also for temporarily storing data that has been output or is to be output.
The processor 12 may in some embodiments be a central processing unit (Central Processing Unit, CPU), controller, microcontroller, microprocessor or other data processing chip for running program code or processing data stored in the memory 11, such as executing a face detection program or the like.
The network interface 13 may optionally comprise a standard wired interface, a wireless interface (e.g. WI-FI interface), typically used to establish a communication connection between the face detection apparatus and other electronic devices.
The communication bus 14 is used to enable connection communications between these components.
Fig. 2 shows only a face detection device having components 11 to 14 and a face detection program; those skilled in the art will understand that the structure shown in Fig. 2 does not limit the face detection device, which may include fewer or more components than shown, combine certain components, or arrange the components differently.
In the face detection apparatus embodiment shown in fig. 2, the memory 11 stores a face detection program; the processor 12 performs the following steps when executing the face detection program stored in the memory 11:
performing head detection in real time based on the target object detection algorithm YOLOv3 to obtain a head picture;
detecting the head with the trained YOLOv3 network model to obtain head bounding box information, and cropping the region enclosed by the bounding box from the head picture to obtain a sub-image;
performing skin color clustering on the sub-image to extract head features;
computing the skin-covered area from the head bounding box information and the head features;
determining whether a face is present according to the skin-covered area, to obtain a face detection result.
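The steps above can be sketched as a small pipeline. This is a minimal illustration, not the patented implementation: the (x, y, w, h) box format, the 0.4 skin-ratio threshold, and the helper names are assumptions, and the binary skin mask is taken as given (the patent builds it via skin color clustering, described later).

```python
import numpy as np

def crop_head(frame, box):
    """Crop the sub-image enclosed by a head bounding box (x, y, w, h)."""
    x, y, w, h = box
    return frame[y:y + h, x:x + w]

def skin_ratio(skin_mask):
    """Fraction of pixels marked as skin (value 255) in a binary mask."""
    return float(np.count_nonzero(skin_mask)) / skin_mask.size

def classify_head(skin_mask, threshold=0.4):
    """Decide face vs. non-face from the skin-covered area ratio.

    The 0.4 threshold is illustrative; the text only states that the
    ratio is compared against a preset threshold."""
    return "face" if skin_ratio(skin_mask) > threshold else "non-face"
```

For example, cropping a 30x40 box from a frame yields a 40x30 sub-image, and an all-skin mask is classified as "face" while an empty mask is "non-face".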
In this embodiment, the face detection program comprises a series of functional modules formed from computer program instructions executable by a processor, and is stored in the memory 11 shown in fig. 2 to implement the face detection method of the present invention. For example, fig. 3 shows a schematic diagram of the program modules of the face detection program in an embodiment of the face detection apparatus of the present invention. In this embodiment, the face detection program may be divided, by way of example, into a head detection module 10, a head feature extraction module 20, and a face detection module 30:
the head detection module 10 is configured to perform head detection in real time based on the target object detection algorithm YOLOv3 to obtain a head picture;
the head detection module 10 is further configured to detect the head with the trained YOLOv3 network model to obtain head bounding box information;
the head feature extraction module 20 is configured to perform skin color clustering on the sub-image to extract head features;
the face detection module 30 is configured to compute the skin-covered area and determine, according to that area, whether the head is a face or a non-face, obtaining a face detection result.
The functions or operation steps implemented when the head detection module 10, the head feature extraction module 20, the face detection module 30 and the other program modules are executed are substantially the same as those of the above embodiments and are not repeated here.
The specific implementation of the face detection steps performed by the face detection device provided by the invention is substantially the same as in the embodiments of the face detection apparatus and method above, and is not repeated here.
In addition, an embodiment of the present invention further proposes a computer-readable storage medium, on which a face detection program is stored, the face detection program being executable by one or more processors to implement the following operations:
performing head detection in real time based on the target object detection algorithm YOLOv3 to obtain a head picture;
detecting the head with the trained YOLOv3 network model to obtain head bounding box information, and cropping the region enclosed by the bounding box from the head picture to obtain a sub-image;
performing skin color clustering on the sub-image to extract head features;
computing the skin-covered area;
determining whether a face is present according to the skin-covered area, to obtain a face detection result.
The specific implementation of the face detection operation steps implemented by the face detection program stored in the computer readable storage medium according to the present invention is substantially the same as the above-mentioned embodiments of the face detection apparatus and method, and will not be described here.
The invention provides a face detection method, a face detection apparatus, and a computer-readable medium storing a face detection program, with the following technical effects:
1. A deep learning network based on YOLOv3 annotates large amounts of unlabeled image data using the network weights of the YOLOv3 model; through several rounds of image annotation and training, intelligent annotation of the image recognition dataset is completed and, at the same time, a reinforced model for the image recognition task is obtained.
2. By means of skin color clustering, the skin color ratio is computed to decide whether a region is a face or a non-face. This avoids the complex mathematical formulas and abstract space-conversion concepts of the Hough transform, improves accuracy, and has a small memory footprint and strong real-time performance.
3. Accurate detection of face orientation is achieved through skin color clustering and face orientation filtering.
It should be noted that the foregoing numbering of the embodiments of the present invention is for description only and does not imply any ranking of the embodiments. The terms "comprises," "comprising," and any variation thereof are intended to cover a non-exclusive inclusion, so that a process, apparatus, article, or method that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a process, apparatus, article, or method. Without further limitation, an element introduced by the phrase "comprising a … …" does not exclude the presence of other identical elements in the process, apparatus, article, or method that comprises it.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by hardware alone, though in many cases the former is preferred. Based on this understanding, the technical solution of the present invention, or the part of it that contributes over the prior art, may be embodied as a software product stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above, comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods of the embodiments of the present invention.
The foregoing description covers only the preferred embodiments of the present invention and is not intended to limit its scope; any equivalent structure or equivalent process based on the disclosure herein, whether employed directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims (8)

1. A face detection method, the method comprising the steps of:
acquiring sample images in real time and training on them based on a target object detection algorithm to obtain a head detection network model, wherein the head detection network model is used for detecting human heads in a target image;
inputting a target image to be detected into the head detection network model to obtain head bounding box information, and cropping a sub-image containing the head from the target image according to the head bounding box information;
performing skin color clustering on the sub-image and joining the skin-colored parts into a connected region;
calculating the skin ratio of the sub-image, and if the skin ratio is greater than a first preset threshold, examining the connected region;
judging whether the distribution of the connected region in the sub-image satisfies a preset rule: if so, the target image is judged to contain a face, otherwise a non-face;
wherein the method for performing skin color clustering on the sub-image comprises: converting the sub-image to the HSV color space; computing histograms H1, H2 and H3 of the H, S and V channels of the color space respectively and normalizing them; dividing the sub-image into small regions and, for each small region, computing histograms h1, h2 and h3 of the H, S and V channels and normalizing them; comparing, under the Euclidean distance criterion, the normalized sub-image histograms H1, H2 and H3 with the corresponding histograms of each small region one by one; and obtaining a preset similarity threshold, setting the pixels of regions whose similarity is above the threshold to 255 and the pixels of regions whose similarity is below the threshold to 0.
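The clustering step of claim 1 can be sketched as follows. This is an illustrative reading of the claim, not the reference implementation: the bin count, the 8x8 region size, the similarity threshold, and the mapping of Euclidean distance to a similarity score (here 1 / (1 + distance)) are all assumptions, since the claim only names the Euclidean distance criterion and a preset threshold.

```python
import numpy as np

def channel_hists(img_hsv, bins=16):
    """Normalized histograms of the H, S and V channels."""
    hists = []
    for c in range(3):
        h, _ = np.histogram(img_hsv[..., c], bins=bins, range=(0, 256))
        h = h.astype(float)
        n = h.sum()
        hists.append(h / n if n else h)
    return hists

def skin_cluster_mask(img_hsv, region=8, sim_thresh=0.8):
    """Binary mask per the claimed method: compare each small region's
    normalized H/S/V histograms with the whole sub-image's histograms
    under the Euclidean distance, and set similar regions to 255."""
    H = channel_hists(img_hsv)
    mask = np.zeros(img_hsv.shape[:2], np.uint8)
    rows, cols = img_hsv.shape[:2]
    for r in range(0, rows, region):
        for c in range(0, cols, region):
            patch = img_hsv[r:r + region, c:c + region]
            h = channel_hists(patch)
            dist = sum(np.linalg.norm(H[i] - h[i]) for i in range(3))
            sim = 1.0 / (1.0 + dist)  # assumed similarity mapping
            mask[r:r + region, c:c + region] = 255 if sim > sim_thresh else 0
    return mask
```

On a uniformly colored sub-image every region's histograms match the global ones exactly, so the whole mask is set to 255.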
2. The face detection method according to claim 1, wherein training on the sample images based on the target object detection algorithm to obtain the head detection network model comprises:
sequentially preprocessing the sample images acquired in real time to obtain a plurality of processed sample images, forming a pedestrian sample set;
randomly dividing the pedestrian sample set into a training set and a verification set according to a certain proportion;
acquiring the pre-annotated head labeling information in the verification set and generating corresponding label files;
performing target prediction on the training set;
and comparing the result of the training set's target prediction with the label files of the verification set to carry out parameter verification and model calibration.
3. The face detection method according to claim 2, wherein the method for performing target prediction on the training set is:
dividing the sample image into a plurality of grid cells for predicting probability values and regression values and determining bounding boxes;
screening the bounding boxes using confidence scores and non-maximum suppression;
and taking the union of the bounding boxes whose confidence is above a threshold as the prediction result.
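The screening step of claim 3 — confidence filtering followed by non-maximum suppression — can be sketched as below. The threshold values and the (x1, y1, x2, y2) box format are assumptions for illustration, not values from the patent.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def nms(boxes, scores, conf_thresh=0.5, iou_thresh=0.45):
    """Drop low-confidence boxes, then greedily keep the highest-scoring
    box and suppress its heavy overlaps (thresholds are illustrative)."""
    order = [i for i in np.argsort(scores)[::-1] if scores[i] >= conf_thresh]
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep
```

For two heavily overlapping head boxes and one distant box, only the higher-scoring overlap and the distant box survive.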
4. The face detection method according to claim 3, wherein the method for comparing the result of the training set's target prediction with the label files of the verification set to carry out parameter verification and model calibration is:
detecting heads with the head detection network model, and comparing the obtained predicted head bounding box information with the reference head bounding box information to obtain the mean square error between the two;
and optimizing the parameters of the head detection network model using the mean square error as the loss function.
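The loss of claim 4 reduces to a mean squared error between predicted and reference boxes, which can be sketched as below; the (x, y, w, h) row layout is an assumption, and in practice this value would feed an optimizer rather than be used standalone.

```python
import numpy as np

def bbox_mse_loss(pred, ref):
    """Mean squared error between predicted and reference head bounding
    boxes (each row assumed to be x, y, w, h), used as the regression
    loss when calibrating the head detection network."""
    pred = np.asarray(pred, dtype=float)
    ref = np.asarray(ref, dtype=float)
    return float(np.mean((pred - ref) ** 2))
```

A perfect prediction gives a loss of 0; a box off by 1 in every coordinate gives a loss of 1.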
5. The face detection method according to claim 4, wherein comparing the result of the training set's target prediction with the label files of the verification set to carry out parameter verification and model calibration further comprises:
comparing the output of the optimized head detection network model with the annotations in the verification set, and checking whether heads are detected correctly;
if the precision and recall of the head detection network model reach the preset indices, inputting the target image to be detected into the head detection network model for head detection;
if the precision and recall of the head detection network model do not reach the preset indices, adding a new training set to train the head detection network model until its precision and recall reach the preset indices.
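The acceptance check of claim 5 can be sketched as below. The 0.9 targets are illustrative stand-ins for the patent's unspecified "preset indices", and the true/false-positive counts are assumed to come from matching detections against the verification-set annotations.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from counts of true positives, false
    positives and false negatives on the verification set."""
    precision = tp / float(tp + fp) if tp + fp else 0.0
    recall = tp / float(tp + fn) if tp + fn else 0.0
    return precision, recall

def model_ready(tp, fp, fn, p_target=0.9, r_target=0.9):
    """True when both metrics reach the preset indices; otherwise the
    model needs further training on additional data."""
    p, r = precision_recall(tp, fp, fn)
    return p >= p_target and r >= r_target
```

For example, 9 correct detections with 1 false positive and 1 missed head gives precision = recall = 0.9, which just meets the illustrative targets.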
6. The face detection method of claim 1, further comprising: judging the orientation of the head according to the skin ratio within the connected region, and identifying whether the head shows a face region or a back-of-head region.
7. A face detection apparatus comprising a memory and a processor, the memory storing a face detection program executable on the processor, the face detection program when executed by the processor implementing the steps of the face detection method according to any one of claims 1 to 6.
8. A computer-readable storage medium, having stored thereon a face detection program executable by one or more processors to implement the steps of the face detection method of any of claims 1 to 6.
CN201911313661.9A 2019-12-18 2019-12-18 Face detection method, device, equipment and computer readable storage medium Active CN111160169B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911313661.9A CN111160169B (en) 2019-12-18 2019-12-18 Face detection method, device, equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN111160169A CN111160169A (en) 2020-05-15
CN111160169B true CN111160169B (en) 2024-03-15

Family

ID=70557274

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911313661.9A Active CN111160169B (en) 2019-12-18 2019-12-18 Face detection method, device, equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111160169B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111708907B (en) * 2020-06-11 2023-07-18 中国建设银行股份有限公司 Target person query method, device, equipment and storage medium
CN111931670A (en) * 2020-08-14 2020-11-13 成都数城科技有限公司 Depth image head detection and positioning method and system based on convolutional neural network
CN112347843A (en) * 2020-09-18 2021-02-09 深圳数联天下智能科技有限公司 Method and related device for training wrinkle detection model
CN112528265A (en) * 2020-12-18 2021-03-19 平安银行股份有限公司 Identity recognition method, device, equipment and medium based on online conference
CN113436273A (en) * 2021-06-28 2021-09-24 南京冲浪智行科技有限公司 3D scene calibration method, calibration device and calibration application thereof

Citations (8)

Publication number Priority date Publication date Assignee Title
CN105469356A (en) * 2015-11-23 2016-04-06 小米科技有限责任公司 Human face image processing method and apparatus thereof
CN106446958A (en) * 2016-10-09 2017-02-22 湖南穗富眼电子科技有限公司 Reliable detection method for going away of human bodies
CN106909883A (en) * 2017-01-17 2017-06-30 北京航空航天大学 A kind of modularization hand region detection method and device based on ROS
CN109033935A (en) * 2018-05-31 2018-12-18 深圳和而泰数据资源与云技术有限公司 Wrinkles on one's forehead detection method and device
CN109165592A (en) * 2018-08-16 2019-01-08 新智数字科技有限公司 A kind of real-time rotatable method for detecting human face based on PICO algorithm
CN109446977A (en) * 2018-10-25 2019-03-08 平安科技(深圳)有限公司 Image processing method, device, storage medium and terminal based on recognition of face
CN110084173A (en) * 2019-04-23 2019-08-02 精伦电子股份有限公司 Number of people detection method and device
WO2019232862A1 (en) * 2018-06-06 2019-12-12 平安科技(深圳)有限公司 Mouth model training method and apparatus, mouth recognition method and apparatus, device, and medium

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
TWI255429B (en) * 2003-12-29 2006-05-21 Ind Tech Res Inst Method for adjusting image acquisition parameters to optimize objection extraction


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant