CN111160169A - Face detection method, device, equipment and computer readable storage medium - Google Patents


Info

Publication number
CN111160169A
CN111160169A (application CN201911313661.9A); granted publication CN111160169B
Authority
CN
China
Prior art keywords
human head; human; face detection; face; head
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911313661.9A
Other languages
Chinese (zh)
Other versions
CN111160169B (en)
Inventor
熊军 (Xiong Jun)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Life Insurance Company of China Ltd
Original Assignee
Ping An Life Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Life Insurance Company of China Ltd
Priority to CN201911313661.9A
Publication of CN111160169A
Application granted; publication of CN111160169B
Legal status: Active

Classifications

    • G06V 40/166: Human faces; detection, localisation, normalisation using acquisition arrangements
    • G06F 18/23: Pattern recognition; clustering techniques
    • G06N 3/045: Neural networks; combinations of networks
    • G06N 3/084: Neural-network learning methods; backpropagation, e.g. using gradient descent
    • G06V 10/267: Segmentation of patterns in the image field by operations on regions, e.g. growing, shrinking or watersheds
    • G06V 10/56: Extraction of image or video features relating to colour
    • G06V 40/168: Human faces; feature extraction, face representation


Abstract

The invention provides a face detection method, device, equipment, and computer-readable storage medium. The face detection method comprises the following steps: acquiring sample images in real time; training on the sample images with a target-object detection algorithm to obtain a human-head detection network model; inputting a target image to be detected into the head detection network model to obtain head bounding-box information, and cropping a sub-image containing the head from the target image according to that bounding-box information; and performing skin-color clustering on the sub-image and calculating its skin ratio to complete head detection. The method detects heads in real time with a target-object detection algorithm and, using a skin-color clustering model, extracts the skin area from each detected head picture to judge whether it is a face or a non-face. It can therefore be used in both head-counting and face-recognition scenarios, and improves detection speed and detection accuracy.

Description

Face detection method, device, equipment and computer readable storage medium
Technical Field
The present invention relates to the field of face recognition technologies, and in particular, to a face detection method, apparatus, device, and computer-readable storage medium.
Background
Detection of human activity plays a key role in many applications, such as automatic video surveillance and human-computer interaction. In computer-based automatic recognition systems, detecting a person's head is considered one of the most effective means of detecting human activity. Head detection is already applied in many fields, such as pedestrian detection and passenger-flow counting. For passenger-flow counting, a camera is usually installed above head height and shoots from top to bottom. The main task of head detection is to determine the size and position of the head. The real-time image captured by the camera is taken as input; the automatic recognition system processes it and outputs a mathematical description of whether heads are present in the image, together with their number, positions, and so on.
Shot from top to bottom, a human head appears as an approximately circular target, and many head detection methods have been developed for this setting, each with its limitations. For example, one method randomly samples four edge points and uses their mutual distances to decide whether a circular object may be present; it has a low hit rate in environments with heavy noise interference, which in turn hurts detection speed. Fuzzy C-means clustering requires the number of clusters to be fixed in advance and is therefore hard to use when that number varies randomly. The most common method for detecting circle-like objects is the Hough Transform (HT), widely used because it is insensitive to noise and robust to discontinuous edges. However, the Hough transform occupies a large amount of memory, and for a circular target with a three-dimensional parameter space its real-time performance is a prominent problem. Many algorithms were later developed to speed up HT and reduce its memory requirements, such as the Randomized Hough Transform (RHT) and the gradient-based Generalized Hough Transform (GHT), but these improved transforms still struggle to run in real time when noise interference is large.
Based on the above-mentioned drawbacks of the existing methods, it is necessary to provide a new face detection method, device, apparatus and computer-readable storage medium to meet the requirement of efficient and accurate face detection and recognition.
Disclosure of Invention
The invention provides a face detection method, device, equipment, and computer-readable storage medium, with the main aim of offering a simple, high-precision face detection method to replace existing ones.
To achieve the above object, the present invention provides a face detection method comprising the following steps:
acquiring a sample image in real time;
training the sample image based on a target object detection algorithm to obtain a human head detection network model, wherein the human head detection network model is used for detecting the human head in the target image to be detected;
inputting a target image to be detected into the human head detection network model to obtain human head boundary frame information, and intercepting a subimage containing the human head on the target image according to the human head boundary frame information;
performing skin-color clustering on the sub-image and connecting the parts containing skin color into connected regions;
calculating the skin ratio of the sub-image and, if the ratio is larger than a first preset threshold, detecting the connected regions;
and judging whether the distribution of the connected regions in the sub-image satisfies a preset rule: if so, the target image is judged to contain a human head; otherwise it is judged not to.
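The claimed steps can be sketched as follows. This is a minimal illustration, not the patented implementation: the head detector is assumed to have already produced bounding boxes, skin_mask_fn stands in for the skin-color clustering step (returning a 2-D mask with 255 for skin pixels), and the threshold value is an assumption.

```python
def classify_heads(head_boxes, skin_mask_fn, skin_ratio_threshold=0.3):
    """Split detected head boxes into faces and non-faces by skin ratio."""
    faces, non_faces = [], []
    for box in head_boxes:
        mask = skin_mask_fn(box)                        # skin-color mask of the cropped sub-image
        total = sum(len(row) for row in mask)           # pixel count of the sub-image
        skin = sum(1 for row in mask for v in row if v) # pixels marked as skin (255)
        if total and skin / total > skin_ratio_threshold:
            faces.append(box)                           # skin ratio above the first preset threshold
        else:
            non_faces.append(box)
    return faces, non_faces
```

A usage example: with a box whose mask is three-quarters skin and another with no skin, only the first is kept as a face.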
Preferably, the method for training the sample image based on the target object detection algorithm to obtain the human head detection network model comprises:
sequentially preprocessing the sample images acquired in real time to obtain a plurality of processed sample images so as to form a pedestrian sample set;
randomly dividing the pedestrian sample set into a training set and a verification set according to a certain proportion;
acquiring the head annotation information pre-labeled on the verification set (the heads are marked manually with an annotation tool) and generating the corresponding label files;
target prediction is carried out through a training set;
and comparing the result of the target prediction of the training set with the label file of a verification set to realize the parameter verification and the model calibration of the training set.
Preferably, the method for target prediction by a training set comprises:
dividing the sample image into a number of grid cells for predicting probability values and regression values and determining bounding boxes;
screening the bounding boxes by confidence and non-maximum suppression: each bounding box carries a confidence score; non-maximum suppression is performed with a confidence threshold, boxes whose confidence is below the threshold are discarded, and the union of the boxes above the threshold is taken as the prediction result.
Preferably, the method for performing parameter verification and model calibration through the verification set comprises the following steps:
comparing the predicted head bounding-box information produced by the head detection network model with the reference head bounding-box information to obtain the mean-square error between them;
and optimizing the parameters of the head detection network model with this mean-square error as the loss function.
Preferably, the output of the optimized head detection network model is compared with the parameters labeled in the verification set to check whether heads are detected correctly;
if the accuracy rate and the recall ratio of the human head detection network model reach preset indexes, inputting a target image to be detected into the human head detection network model to carry out human head detection;
and if the accuracy rate and the recall ratio of the human head detection network model do not reach the preset indexes, adding a new training set to train the human head detection network model until the accuracy rate and the recall ratio of the human head detection network model reach the preset indexes.
Preferably, the method for skin color clustering of the sub-images comprises the following steps:
converting the subimages into an HSV color space;
respectively calculating histograms H1, H2 and H3 of H, S, V three channels in the color sample, and carrying out normalization processing on the histograms H1, H2 and H3;
dividing the sub-image into small areas, respectively calculating H, S, V three-channel histograms h1, h2 and h3 for each small area, and normalizing h1, h2 and h 3;
comparing the normalized histograms H1, H2 and H3 of the color sample with the corresponding histograms of each small region one by one under the Euclidean-distance criterion;
and acquiring a preset similarity threshold, setting the pixels of regions whose similarity is above the threshold to 255, and setting those below it to 0.
Preferably, the method further comprises: judging the orientation of the head from the skin ratio within the connected region, and identifying the region as a face region or a back-of-head region.
In order to achieve the above object, the present invention further provides a face detection apparatus, which includes a memory and a processor, wherein the memory stores a face detection program operable on the processor, and the face detection program implements the steps of the face detection method when executed by the processor.
In order to achieve the above object, the present invention further provides a face detection device, which includes the face detection apparatus as described above.
In addition, to achieve the above object, the present invention also provides a computer-readable storage medium having a face detection program stored thereon, the face detection program being executable by one or more processors to implement the steps of the face detection method as described above.
The face detection method provided by the invention uses YOLOv3, a target-object detection algorithm, for real-time head detection, and obtains the skin area of each detected head picture with a skin-color clustering model to judge whether it is a face or a non-face. It can therefore be used in head-counting and face-recognition scenarios, and greatly improves detection speed and detection accuracy.
Description of technical effects:
the method comprises the steps of firstly, carrying out deep learning network based on a target object detection algorithm YOLOv3, using network weights of a YOLOv3 model to label a large amount of image data which are not labeled artificially, completing intelligent labeling of an image identification project data set through image data labeling and training processes for a plurality of times, and simultaneously obtaining a reinforced model corresponding to the image identification project.
And secondly, calculating the skin color ratio to judge whether the face is a non-face or not through skin color clustering, avoiding complex mathematical formulas and abstract space conversion concepts in Hough transformation, improving the accuracy rate and having the characteristics of small memory occupation and high real-time property.
And thirdly, realizing accurate detection of the face orientation through skin color clustering and filtering for constructing the face orientation.
Drawings
Fig. 1 is a schematic flow chart of a face detection method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of an internal structure of a face detection apparatus according to an embodiment of the present invention;
fig. 3 is a schematic block diagram of a face detection program in the face detection apparatus according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention provides a face detection method that detects heads in real time with the target-object detection algorithm YOLOv3 and judges face versus non-face by extracting the skin area of each detected head picture with a skin-color clustering model. It is described below with reference to specific embodiments:
the first embodiment is as follows:
fig. 1 is a schematic flow chart of a face detection method according to an embodiment of the present invention. The method may be performed by an apparatus, which may be implemented by software and/or hardware.
In this embodiment, the face detection method includes:
step S1: acquiring a sample image of a human head picture in real time, training the sample image based on a target object detection algorithm yolov3 to obtain a trained human head detection network model;
the method specifically comprises the following steps:
1) creating a sample set and a label file
1.1) sample collection: preprocessing the collected sample image and making a pedestrian sample set;
1.2) resizing the collected sample images to 2048×2048, and randomly dividing the resulting sample set into a training set and a verification set in a certain proportion;
1.3) acquiring the pre-labeled head annotation information for the verification set and generating the corresponding label files. The head annotations are made in advance by an expert so that a head detection network model can be generated; each label records the bounding box's center coordinates, the box's width and height, and a class field denoting the target-object type, where 0 represents background and 1 represents a head;
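A label line of the kind described in 1.3) might be produced as below. The exact file layout is not given in the text, so this assumes the common YOLO-style convention of "class cx cy w h" with center and size normalized by the image dimensions; the helper name is hypothetical.

```python
def to_label_line(cls, box, img_w, img_h):
    """Convert a pixel-space box (x, y, w, h) to an assumed YOLO-style label line."""
    x, y, w, h = box                       # top-left corner plus width/height, in pixels
    cx = (x + w / 2) / img_w               # normalized box-center x
    cy = (y + h / 2) / img_h               # normalized box-center y
    return f"{cls} {cx:.6f} {cy:.6f} {w / img_w:.6f} {h / img_h:.6f}"
```

For a head box occupying the center quarter of a 2048×2048 sample, the line would read "1 0.500000 0.500000 0.500000 0.500000".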
2) a human head detection network model is created and trained based on a target object detection algorithm YOLOv3, and the method comprises the following specific steps:
2.1) creating the 5 layer types of the YOLOv3 network model, namely: convolutional layers, skip-connection layers, upsampling layers, routing layers, and the YOLOv3 detection layer, and setting the network parameters;
2.2) inputting the training set to carry out target prediction; the specific process comprises the following steps:
a1) inputting the training set and dividing each sample image into 16×16 grid cells; each cell predicts a probability value and 3 bounding boxes, and each bounding box predicts five regression values: the center coordinates of the box, its width and height, the probability that the box contains a head, and the accuracy of the box position;
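The cell-assignment rule of step a1), where the grid cell containing an object's center is responsible for predicting it, can be sketched like this (the 16×16 grid comes from the text; the function name is an assumption):

```python
def owning_cell(cx, cy, img_w, img_h, grid=16):
    """Return (row, col) of the grid cell containing the object center."""
    col = min(int(cx / img_w * grid), grid - 1)  # clamp so a center on the far edge stays in-grid
    row = min(int(cy / img_h * grid), grid - 1)
    return row, col
```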
a2) screening bounding boxes with confidence and non-maximum suppression: the confidence Pr(Object) ∈ (0, 1) measures the probability that a bounding box contains a head; with A1 the area of overlap between the predicted box and the manually labeled box and A2 the area of their union, the box's IoU is A1/A2. Step a1) yields several bounding boxes, each with a confidence score; non-maximum suppression is performed with a confidence threshold, boxes whose confidence is below the threshold are discarded, and the union of the boxes above the threshold is taken as the prediction result;
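Step a2) can be sketched as standard confidence thresholding plus greedy non-maximum suppression. The IoU follows the A1/A2 definition above (overlap area over union area); the threshold values are assumptions, and boxes are taken as (x1, y1, x2, y2) corners.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    ix = max(0.0, min(ax2, bx2) - max(ax1, bx1))   # overlap width
    iy = max(0.0, min(ay2, by2) - max(ay1, by1))   # overlap height
    inter = ix * iy                                # A1: area of overlap
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1) - inter)  # A2: area of union
    return inter / union if union else 0.0

def nms(boxes, scores, conf_thresh=0.5, iou_thresh=0.45):
    """Discard low-confidence boxes, then suppress overlaps greedily."""
    order = sorted((i for i, s in enumerate(scores) if s >= conf_thresh),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)                        # highest-scoring remaining box
        keep.append(best)
        order = [i for i in order
                 if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep
```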
2.3) updating the YOLOv3 network model parameters: reference head bounding-box information is obtained through manual labeling;
the predicted head bounding-box information produced by the YOLOv3 network is compared with the reference bounding-box information to obtain the mean-square error between them;
and the parameters of the YOLOv3 network model are optimized with this mean-square error as the loss function;
the loss function is defined as follows:
Figure BDA0002324577680000061
wherein, (xi, yi), wi, hi, Ci, pi (c) respectively represent the center coordinate, width, height, IoU value of the bounding box obtained through the yelov 3 network prediction, and the probability that the target object in the bounding box is the head, respectively correspond to the manually labeled values. Specifically, the default human head value of the manual labeling is 1, and the background value is 0; λ coord is the coordinate error weight, λ noobj is the IoU error weight; judging whether the jth bounding box of the grid i is responsible for predicting the target object, and judging whether the center of the target object is in the grid i, wherein the specific definition is as follows: performing back propagation on the error obtained by calculating the loss function to finish one-time training; adjusting network parameters, and repeating the step 2.3) until the network converges;
2.4) inputting the verification set into the trained YOLOv3 network model and comparing the output with the parameters labeled in the verification set's label files to judge whether heads are detected correctly. The accuracy (precision) and recall of the resulting model are defined in the standard way: precision = TP/(TP + FP) and recall = TP/(TP + FN), where TP, FP and FN count correct head detections, false detections and missed heads, respectively. If the precision and recall of the YOLOv3 network model reach the preset indices, proceed to the head detection and feature extraction of step S2; if not, add a new training set and train the YOLOv3 network model further until the preset indices are reached.
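The precision and recall check of step 2.4) reduces to the standard definitions; a small sketch (the guard against zero denominators is an implementation choice, not part of the text):

```python
def precision_recall(tp, fp, fn):
    """Standard precision = TP/(TP+FP) and recall = TP/(TP+FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of detections that are real heads
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of real heads that were detected
    return precision, recall
```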
Step S2: inputting the target image to be detected, detecting heads with the trained YOLOv3 network model to obtain head bounding-box information, and cropping the bounding box where each head lies from the target image according to that information to obtain a sub-image.
Step S3: performing skin-color clustering on the sub-image. Specifically, in this embodiment, the skin-color clustering comprises the following steps:
3.1) converting the sub-image into HSV color space (Hue, Saturation, Value), wherein H represents Hue, S represents Saturation, and V represents lightness;
3.2) performing threshold segmentation on the sub-image by color-histogram comparison, specifically: b1) computing the histograms H1, H2 and H3 of the H, S and V channels of the skin-color sample, and normalizing H1, H2 and H3 so that they can be compared with the sub-image's histograms; b2) dividing the sub-image into small regions and, for each region, computing the three-channel H, S, V histograms h1, h2 and h3 and normalizing them; b3) comparing the normalized sample histograms with each region's histograms one by one under the Euclidean-distance criterion, i.e. H1 with h1, H2 with h2, and H3 with h3; the larger the resulting similarity value, the more similar the region is to skin; b4) setting a similarity threshold, setting the pixels of regions whose similarity exceeds the threshold to 255, and setting those below it to 0;
3.3) performing a dilation operation on the image to connect parts containing skin color into connected areas.
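Steps 3.1) and 3.2) can be sketched with plain histograms. The bin count, the mapping of Euclidean distance to a similarity in (0, 1], and the threshold value are all assumptions; only the overall scheme (normalized per-channel histograms compared per small region, then binarized to 255/0) follows the text.

```python
import math

def histogram(values, bins, vmax):
    """Normalized histogram of channel values in [0, vmax]."""
    h = [0] * bins
    for v in values:
        h[min(int(v / vmax * bins), bins - 1)] += 1
    total = sum(h) or 1
    return [c / total for c in h]

def similarity(h_ref, h_region):
    """Euclidean-distance comparison: smaller distance, higher similarity."""
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(h_ref, h_region)))
    return 1.0 / (1.0 + d)

def skin_mask(region_hists, ref_hists, threshold=0.8):
    """Mark each small region 255 (skin) or 0 by comparing its H, S, V
    histograms against the skin sample's; assumed per-region decision."""
    mask = []
    for hists in region_hists:
        sim = min(similarity(r, h) for r, h in zip(ref_hists, hists))
        mask.append(255 if sim >= threshold else 0)
    return mask
```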
Step S4: calculating the skin ratio of the sub-image from the head bounding-box information and the head features, judging whether the sub-image is a face or a non-face according to the calculated ratio, and feeding the detection back to obtain a face detection result, specifically comprising:
4.1) detecting the skin area in the target image according to the head bounding-box information; if the ratio of the skin area to the whole target-image area is larger than a first preset threshold, detecting the connected regions in the target image;
4.2) judging, from the head features, whether the distribution of the connected regions in the sub-image satisfies a preset rule: if so, the target image is judged to contain a head; otherwise it is judged not to. In this embodiment, the preset rule means that the skin position and skin ratio within the connected region conform to the distribution of a face on a human head.
4.3) judging the head orientation from the skin ratio within the connected region: if the ratio is larger than a preset value, the region is judged to be a face region; otherwise it is the back of the head. Head and face detection are thus achieved together, and the detection is further fed back to verify the face detection result.
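The orientation decision of step 4.3) is a single threshold comparison; a sketch, with the preset value and return labels as assumptions:

```python
def head_orientation(skin_pixels, region_pixels, face_threshold=0.4):
    """Judge face vs. back of head from the skin share in the connected region."""
    ratio = skin_pixels / region_pixels if region_pixels else 0.0
    return "face" if ratio > face_threshold else "back_of_head"
```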
Step S5: further judging the orientation of the human head according to the human face detection result, which specifically comprises the following steps:
5.1) to model the orientation offset between the head and the camera caused by the relative motion of pedestrian and camera, a motion-blur filter f is constructed. The construction proceeds as follows:
construct a two-dimensional matrix f just large enough to contain a line segment l of length len and slope tan θ; if the matrix size is a×b, then a = len·cos θ + 1 and b = len·sin θ. Motion blur with blur angle θ corresponds to motion over len pixels in the direction at angle θ to the horizontal.
For a position (i, j) in the matrix f, compute the minimum distance N_D from that position to the segment l:
N_D = |j·cos θ - i·sin θ|
and derive the coefficient at (i, j) from the minimum distance N_D:
f(i, j) = max(1 - N_D, 0);
5.2) normalize f so that its coefficients sum to one:
f(i, j) ← f(i, j) / \sum_{i,j} f(i, j);
5.3) filter the image:
taking several pixel counts as blur lengths, construct different filters at several angles and filter the image with each; the resulting multi-scale sampled images are added to the data set with all labels identical to those of the original image.
5.4) repeat the YOLOv3 model training process to achieve accurate recognition of face orientation.
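The filter construction of steps 5.1) and 5.2) can be sketched directly from the formulas above. The "+1" on both matrix dimensions and the absolute value in the distance are assumptions made so the kernel stays well defined at θ = 0 (the text gives b = len·sin θ, which would vanish for horizontal motion).

```python
import math

def motion_blur_kernel(length, theta):
    """Build the 2-D motion-blur filter f for blur length `length` and angle theta."""
    a = int(length * abs(math.cos(theta))) + 1   # rows: segment extent plus one
    b = int(length * abs(math.sin(theta))) + 1   # cols: same guard for theta = 0
    f = [[0.0] * b for _ in range(a)]
    for i in range(a):
        for j in range(b):
            nd = abs(j * math.cos(theta) - i * math.sin(theta))  # distance to the segment
            f[i][j] = max(1.0 - nd, 0.0)
    s = sum(sum(row) for row in f)               # normalize so coefficients sum to one
    return [[v / s for v in row] for row in f]
```

For θ = 0 and length 4 this yields a 5×1 column of equal weights, i.e. a pure vertical-axis blur under this indexing convention.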
Example two
This embodiment is basically the same as the first embodiment. The face detection method comprises the following steps:
step S1: detecting the human head in real time based on a target object detection algorithm YOLOv3 to obtain a human head picture;
step S2: detecting the head by using the trained YOLOv3 network model to obtain head bounding box information, and intercepting a bounding box where the head is located on the head picture by the head bounding box information to obtain a subimage;
step S3: carrying out skin color clustering on the subimages, and extracting head features;
step S4: calculating the skin area according to the human head bounding box information and human head characteristic simulation; and judging whether the human face or the non-human face is obtained according to the calculated skin area, and feeding back the detection to obtain a human face detection result.
The difference is that in this embodiment, the head-feature extraction of step S3 selects the RGB values of a head color sample; the skin area obtained by feeding those RGB values to the skin-color clustering model then determines whether the image is a face.
The face detection method provided by the invention adopts the target-object detection algorithm YOLOv3 to detect heads in real time, obtains the skin area of each detected head picture with a skin-color clustering model, and judges face versus non-face. It can be used in head-counting and face-recognition scenarios and improves both detection speed and accuracy.
The invention also provides a face detection device. Fig. 2 is a schematic diagram of an internal structure of a face detection apparatus according to an embodiment of the present invention.
In this embodiment, the face detection device may be a PC (Personal Computer), or may be a terminal device such as a smart phone, a tablet Computer, or a portable Computer. The face detection means comprises at least a memory 11, a processor 12, a network interface 13 and a communication bus 14.
The memory 11 includes at least one type of computer-readable storage medium, including flash memory, hard disks, multimedia cards, card-type memory (e.g., SD or DX memory, etc.), magnetic memory, magnetic disks, optical disks, and the like. The memory 11 may in some embodiments be an internal storage unit of the face detection apparatus, for example a hard disk of the face detection apparatus. The memory 11 may also be an external storage device of the face detection apparatus in other embodiments, such as a plug-in hard disk provided on the face detection apparatus, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like. Further, the memory 11 may also include both an internal storage unit of the face detection apparatus and an external storage device. The memory 11 may be used not only to store application software installed in the face detection apparatus and various types of data, such as codes of a face detection program, etc., but also to temporarily store data that has been output or is to be output.
The processor 12 may in some embodiments be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor or another data processing chip, and is used for running program code stored in the memory 11 or processing data, such as executing the face detection program.
The network interface 13 may optionally comprise a standard wired interface and a wireless interface (e.g., a WI-FI interface), and is typically used for establishing a communication link between the face detection apparatus and other electronic devices.
The communication bus 14 is used to enable connection communication between these components.
Fig. 2 shows only the face detection apparatus with the components 11 to 14 and the face detection program, and it will be understood by those skilled in the art that the structure shown in fig. 2 does not constitute a limitation of the face detection apparatus, and may include fewer or more components than those shown, or some components may be combined, or a different arrangement of components.
In the embodiment of the face detection apparatus shown in fig. 2, the memory 11 stores a face detection program; the processor 12 implements the following steps when executing the face detection program stored in the memory 11:
detecting the human head in real time based on a target object detection algorithm YOLOv3 to obtain a human head picture;
detecting the human head by using the trained YOLOv3 network model to obtain human head bounding box information, and cropping, according to the bounding box information, the region where the head is located from the head picture to obtain a sub-image;
performing skin color clustering on the subimages to extract human head features;
calculating the skin area by simulation according to the human head bounding box information and the human head features;
and judging whether the head is a human face or a non-human face according to the skin area to obtain a face detection result.
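The steps above can be sketched as follows. This is a minimal illustration only: the `is_skin` RGB rule and the 0.3 ratio threshold are placeholder assumptions standing in for the patent's skin color clustering model, and no trained YOLOv3 detector is invoked here.

```python
# Sketch of the pipeline: crop the detected head bounding box, estimate the
# skin-pixel ratio, and label the crop face / non-face by a ratio threshold.
# The skin rule and threshold are illustrative assumptions.

def crop_bbox(image, box):
    """Crop a head sub-image given an (x, y, w, h) bounding box."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]

def is_skin(pixel):
    """Very rough RGB skin heuristic (placeholder for skin color clustering)."""
    r, g, b = pixel
    return r > 95 and g > 40 and b > 20 and r > g and r > b

def classify_face(subimage, threshold=0.3):
    """Label the crop a face when the skin-pixel ratio exceeds the threshold."""
    pixels = [p for row in subimage for p in row]
    skin = sum(1 for p in pixels if is_skin(p))
    return "face" if skin / len(pixels) > threshold else "non-face"

# Toy 2x2 "image" of RGB pixels: two skin-like pixels, two dark pixels.
image = [[(200, 120, 90), (10, 10, 10)],
         [(210, 130, 100), (20, 20, 20)]]
print(classify_face(crop_bbox(image, (0, 0, 2, 2))))  # face (ratio 0.5 > 0.3)
```

In practice the bounding box would come from the trained detector and the skin decision from the clustering described below; only the control flow is meant to carry over.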
In this embodiment, the face detection program comprises functional modules formed by a series of computer program instructions executable by the processor, which implement the face detection method according to the present invention; the face detection program is stored in the memory 11 shown in fig. 2. For example, fig. 3 is a schematic diagram of the program modules of the face detection program in an embodiment of the face detection apparatus of the present invention. In this embodiment, the face detection program may be divided into a human head detection module 10, a human head feature extraction module 20, and a face detection module 30, exemplarily:
the human head detection module 10 is configured to perform human head detection in real time based on a target object detection algorithm YOLOv3 to obtain a human head picture;
the human head detection module 10 is further configured to detect a human head by using the trained YOLOv3 network model to obtain human head bounding box information;
the human head feature extraction module 20 is configured to perform skin color clustering on the subimages to extract human head features;
the face detection module 30 is configured to obtain the skin area through simulation calculation, and determine whether the head is a face or a non-face according to the skin area, so as to obtain a face detection result.
The functions or operation steps of the above-mentioned program modules such as the human head detection module 10, the human head feature extraction module 20, and the human face detection module 30 when executed are substantially the same as those of the above-mentioned embodiments, and are not described herein again.
The specific implementation of the face detection operation steps implemented by the face detection device provided by the invention is substantially the same as that of the above embodiments of the face detection apparatus and method, and is not repeated here.
Furthermore, an embodiment of the present invention further provides a computer-readable storage medium, where a face detection program is stored on the computer-readable storage medium, where the face detection program is executable by one or more processors to implement the following operations:
detecting the human head in real time based on a target object detection algorithm YOLOv3 to obtain a human head picture;
detecting the human head by using the trained YOLOv3 network model to obtain human head bounding box information, and cropping, according to the bounding box information, the region where the head is located from the head picture to obtain a sub-image;
performing skin color clustering on the subimages to extract human head features;
calculating to obtain the skin area;
and judging whether the head is a human face or a non-human face according to the skin area to obtain a face detection result.
The specific implementation of the method of the face detection operation steps implemented by the face detection program stored in the computer-readable storage medium provided by the present invention is basically the same as that of the above-mentioned embodiments of the face detection apparatus and method, and will not be described in detail herein.
The technical effects that can be achieved by the face detection method, the face detection device and the computer-readable medium storing the face detection program provided by the invention are as follows:
the method comprises the steps of firstly, carrying out deep learning network based on YOLOv3, using network weights of a YOLOv3 model for labeling a large amount of image data which are not labeled artificially, completing intelligent labeling of an image recognition project data set through image data labeling and training processes for several times, and simultaneously obtaining a reinforced model corresponding to the image recognition project.
Secondly, the skin color ratio is calculated through skin color clustering to judge whether the head is a face or a non-face, which avoids the complex mathematical formulas and abstract space-transformation concepts of the Hough transform, improves the accuracy, and features a small memory footprint and high real-time performance.
Thirdly, accurate detection of the face orientation is achieved through skin color clustering and filtering constructed for the face orientation.
It should be noted that the above numbering of the embodiments of the present invention is merely for description and does not represent the merits of the embodiments. The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a(n) …" does not exclude the presence of other like elements in a process, apparatus, article, or method that includes the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A face detection method is characterized by comprising the following steps:
acquiring a sample image in real time, training the sample image based on a target object detection algorithm to obtain a human head detection network model, wherein the human head detection network model is used for detecting a human head in the target image;
inputting a target image to be detected into the human head detection network model to obtain human head boundary frame information, and intercepting a subimage containing the human head on the target image according to the human head boundary frame information;
carrying out skin color clustering on the sub-image, and connecting the parts containing skin color into a connected region;
calculating the skin proportion of the sub-image, and if the skin proportion is larger than a first preset threshold value, detecting the connected region;
and judging whether the distribution of the connected region in the sub-image meets a preset rule; if so, judging that the target image contains a human face, and otherwise, judging that it contains a non-human face.
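The connected-region step of claim 1 can be illustrated with a simple 4-connected flood fill over the binary skin mask. The `connected_regions` helper and the toy mask are illustrative assumptions; the preset distribution rule checked against the labels is application-specific and not reproduced here.

```python
# Label 4-connected regions of skin pixels (value 255) in a binary mask,
# as produced by the skin color clustering of claim 6 (255 = skin, 0 = non-skin).

def connected_regions(mask):
    """Return (labels, count) for the 4-connected regions of 255-pixels."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for sy in range(h):
        for sx in range(w):
            if mask[sy][sx] == 255 and labels[sy][sx] == 0:
                count += 1
                stack = [(sy, sx)]  # iterative flood fill from this seed
                while stack:
                    y, x = stack.pop()
                    if (0 <= y < h and 0 <= x < w
                            and mask[y][x] == 255 and labels[y][x] == 0):
                        labels[y][x] = count
                        stack += [(y + 1, x), (y - 1, x),
                                  (y, x + 1), (y, x - 1)]
    return labels, count

mask = [[255, 255, 0, 0],
        [255,   0, 0, 255],
        [  0,   0, 0, 255]]
_, n = connected_regions(mask)
print(n)  # 2: the top-left cluster and the right-hand column
```

A production implementation would typically use a library routine such as OpenCV's connected-components analysis instead of this hand-rolled fill.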
2. The method of detecting a human face according to claim 1, wherein the method of training the sample image to obtain the human head detection network model based on the target object detection algorithm comprises:
sequentially preprocessing the sample images acquired in real time to obtain a plurality of processed sample images so as to form a pedestrian sample set;
randomly dividing the pedestrian sample set into a training set and a verification set according to a certain proportion;
acquiring head labeling information pre-labeled in the verification set, and generating a corresponding label file;
performing target prediction through the training set;
and comparing the result of the target prediction on the training set with the label file of the verification set, so as to realize parameter verification and model calibration of the human head detection network model.
3. The face detection method of claim 2, wherein the method for predicting the target through the training set comprises:
dividing the sample image into a plurality of grids for predicting a probability value and a regression value and determining a bounding box;
screening the bounding box with confidence and non-maximum suppression;
and taking the union of the bounding boxes with the confidence level higher than the threshold value as a prediction result.
4. The face detection method of claim 3, wherein the method for performing parameter verification and model calibration through the verification set comprises:
comparing the predicted human head bounding box information obtained by detecting the human head through the human head detection network model with the reference human head bounding box information to obtain the mean square error between the two;
and optimizing the parameters of the human head detection network model by taking the mean square error as the loss function.
5. The method of claim 4, wherein the method of parameter verification and model calibration via a verification set further comprises:
comparing the parameters of the optimized human head detection network model with the parameters marked in the verification set, and observing whether the human head is correctly detected;
if the accuracy rate and the recall ratio of the human head detection network model reach preset indexes, inputting a target image to be detected into the human head detection network model to carry out human head detection;
and if the accuracy rate and the recall ratio of the human head detection network model do not reach the preset indexes, adding a new training set to train the human head detection network model until the accuracy rate and the recall ratio of the human head detection network model reach the preset indexes.
6. The face detection method of claim 1, wherein the method of skin color clustering the sub-images comprises:
converting the subimages into an HSV color space;
calculating histograms H1, H2 and H3 of the three channels H, S and V in the color space respectively, and normalizing H1, H2 and H3;
dividing the sub-image into small regions, calculating the three-channel histograms h1, h2 and h3 of each small region respectively, and normalizing h1, h2 and h3;
comparing the normalized histograms H1, H2 and H3 of the sub-image one by one with the corresponding histograms of each small region according to the Euclidean distance criterion;
and acquiring a preset similarity threshold, setting the pixel points in the area with the similarity higher than the threshold as 255, and setting the pixel points in the area with the similarity lower than the threshold as 0.
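The histogram comparison of claim 6 can be sketched for a single channel. A full implementation would convert the sub-image to HSV and repeat this for each of the H, S and V channels; the bin count and distance threshold below are assumptions.

```python
# Single-channel sketch of the skin color clustering: compare each small
# region's normalized histogram to the whole sub-image's histogram by
# Euclidean distance, and emit a 255/0 mask for the region's pixels.

def normalized_hist(values, bins=8, vmax=256):
    """Normalized histogram of one channel's pixel values (0..vmax-1)."""
    hist = [0] * bins
    for v in values:
        hist[v * bins // vmax] += 1
    return [c / len(values) for c in hist]

def euclidean(h1, h2):
    """Euclidean distance between two histograms of equal length."""
    return sum((a - b) ** 2 for a, b in zip(h1, h2)) ** 0.5

def region_mask(subimage_vals, region_vals, dist_thresh=0.5):
    """Set a region's pixels to 255 when its histogram is similar (small
    Euclidean distance) to the whole sub-image's histogram, else to 0."""
    d = euclidean(normalized_hist(subimage_vals), normalized_hist(region_vals))
    return [255 if d < dist_thresh else 0 for _ in region_vals]
```

For example, a region whose pixel-value distribution matches the sub-image's (distance 0) maps to all-255, while a region concentrated in a single bin that the sub-image spreads across two bins exceeds the threshold and maps to all-0.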
7. The method of claim 1, further comprising: and judging the orientation of the human head according to the skin proportion in the connected region, and identifying the human face region or the back head region.
8. A face detection apparatus, characterized in that the apparatus comprises a memory and a processor, the memory storing a face detection program operable on the processor, the face detection program, when executed by the processor, implementing the steps of the face detection method according to any one of claims 1 to 7.
9. A computer-readable storage medium having stored thereon a face detection program executable by one or more processors to perform the steps of the face detection method according to any one of claims 1 to 7.
10. Face detection equipment, characterized by comprising the face detection apparatus of claim 8.
CN201911313661.9A 2019-12-18 2019-12-18 Face detection method, device, equipment and computer readable storage medium Active CN111160169B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911313661.9A CN111160169B (en) 2019-12-18 2019-12-18 Face detection method, device, equipment and computer readable storage medium


Publications (2)

Publication Number Publication Date
CN111160169A true CN111160169A (en) 2020-05-15
CN111160169B CN111160169B (en) 2024-03-15

Family

ID=70557274

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911313661.9A Active CN111160169B (en) 2019-12-18 2019-12-18 Face detection method, device, equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111160169B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111209845A (en) * 2020-01-03 2020-05-29 平安科技(深圳)有限公司 Face recognition method and device, computer equipment and storage medium
CN111708907A (en) * 2020-06-11 2020-09-25 中国建设银行股份有限公司 Target person query method, device, equipment and storage medium
CN111931670A (en) * 2020-08-14 2020-11-13 成都数城科技有限公司 Depth image head detection and positioning method and system based on convolutional neural network
CN112347843A (en) * 2020-09-18 2021-02-09 深圳数联天下智能科技有限公司 Method and related device for training wrinkle detection model
CN112528265A (en) * 2020-12-18 2021-03-19 平安银行股份有限公司 Identity recognition method, device, equipment and medium based on online conference
CN113436273A (en) * 2021-06-28 2021-09-24 南京冲浪智行科技有限公司 3D scene calibration method, calibration device and calibration application thereof

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050141762A1 (en) * 2003-12-29 2005-06-30 Industrial Technology Research Institute Method for adjusting image acquisition parameters to optimize object extraction
CN105469356A (en) * 2015-11-23 2016-04-06 小米科技有限责任公司 Human face image processing method and apparatus thereof
CN106446958A (en) * 2016-10-09 2017-02-22 湖南穗富眼电子科技有限公司 Reliable detection method for going away of human bodies
CN106909883A (en) * 2017-01-17 2017-06-30 北京航空航天大学 A kind of modularization hand region detection method and device based on ROS
CN109033935A (en) * 2018-05-31 2018-12-18 深圳和而泰数据资源与云技术有限公司 Wrinkles on one's forehead detection method and device
CN109165592A (en) * 2018-08-16 2019-01-08 新智数字科技有限公司 A kind of real-time rotatable method for detecting human face based on PICO algorithm
CN109446977A (en) * 2018-10-25 2019-03-08 平安科技(深圳)有限公司 Image processing method, device, storage medium and terminal based on recognition of face
CN110084173A (en) * 2019-04-23 2019-08-02 精伦电子股份有限公司 Number of people detection method and device
WO2019232862A1 (en) * 2018-06-06 2019-12-12 平安科技(深圳)有限公司 Mouth model training method and apparatus, mouth recognition method and apparatus, device, and medium



Also Published As

Publication number Publication date
CN111160169B (en) 2024-03-15

Similar Documents

Publication Publication Date Title
CN111160169B (en) Face detection method, device, equipment and computer readable storage medium
Xie et al. Multilevel cloud detection in remote sensing images based on deep learning
US10534957B2 (en) Eyeball movement analysis method and device, and storage medium
US9824294B2 (en) Saliency information acquisition device and saliency information acquisition method
Sirmacek et al. A probabilistic framework to detect buildings in aerial and satellite images
US8792722B2 (en) Hand gesture detection
US8750573B2 (en) Hand gesture detection
WO2016150240A1 (en) Identity authentication method and apparatus
JP6351240B2 (en) Image processing apparatus, image processing method, and program
WO2015161776A1 (en) Hand motion identification method and apparatus
US10489636B2 (en) Lip movement capturing method and device, and storage medium
CN110909618B (en) Method and device for identifying identity of pet
EP2879080B1 (en) Image processing device and method, and computer readable medium
CN104166841A (en) Rapid detection identification method for specified pedestrian or vehicle in video monitoring network
US10650234B2 (en) Eyeball movement capturing method and device, and storage medium
CN113361495A (en) Face image similarity calculation method, device, equipment and storage medium
WO2019033570A1 (en) Lip movement analysis method, apparatus and storage medium
US8094971B2 (en) Method and system for automatically determining the orientation of a digital image
CN107464245B (en) Image structure edge positioning method and device
JP2013206458A (en) Object classification based on external appearance and context in image
CN109784171A (en) Car damage identification method for screening images, device, readable storage medium storing program for executing and server
CN108875500B (en) Pedestrian re-identification method, device and system and storage medium
CN108960246B (en) Binarization processing device and method for image recognition
CN114399729A (en) Monitoring object movement identification method, system, terminal and storage medium
JP5755516B2 (en) Object shape estimation device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant