CN108171196B - Face detection method and device


Info

Publication number
CN108171196B
Authority
CN
China
Prior art keywords
face
area
moving target
detection
face detection
Prior art date
Legal status
Active
Application number
CN201810018060.4A
Other languages
Chinese (zh)
Other versions
CN108171196A (en)
Inventor
李志国
刘洋
曾建平
朱明
李党
Current Assignee
Shenzhen Qianhai Intellidata Technology Co ltd
Beijing Icetech Science & Technology Co ltd
Original Assignee
Shenzhen Qianhai Intellidata Technology Co ltd
Beijing Icetech Science & Technology Co ltd
Application filed by Shenzhen Qianhai Intellidata Technology Co ltd and Beijing Icetech Science & Technology Co ltd
Priority to CN201810018060.4A
Publication of CN108171196A
Application granted
Publication of CN108171196B
Legal status: Active

Classifications

    • G06V40/161 Human faces: Detection; Localisation; Normalisation
    • G06N3/045 Neural networks: Combinations of networks
    • G06N3/08 Neural networks: Learning methods
    • G06T7/215 Analysis of motion: Motion-based segmentation
    • G06T2207/10016 Image acquisition modality: Video; Image sequence
    • G06T2207/30201 Subject of image: Human being; Face

Abstract

The invention provides a face detection method comprising the following steps: inputting a video image to be detected; performing moving target detection on the video image to obtain moving target areas; performing blob clustering on the moving target areas to obtain target detection areas; selecting face and non-face sample images and training a cascaded convolutional neural network to obtain a trained face detection model; and performing face detection on the target detection areas with the face detection model and outputting the detection result. Compared with the prior art, the method detects faces quickly in complex scenes and with high accuracy.

Description

Face detection method and device
Technical Field
The invention relates to image processing, video surveillance and security, and in particular to a face detection method and a face detection device.
Background
In recent years, face recognition has been a popular research direction in pattern recognition, owing to its broad range of applications and strong market prospects.
Face detection is a key link in any face recognition system: it consumes most of the system resources, and its performance directly affects the final recognition accuracy, so it is critically important.
At present, common face detection methods based on videos/images include a template matching method, a skin color model method, a neural network method, an SVM classifier method, an Adaboost classifier method and the like. However, these methods have low detection accuracy and are difficult to adapt to complex environments.
In recent years, with the rapid development of artificial intelligence, deep learning has increasingly been introduced into face recognition. Deep learning builds high-level features by composing low-level ones and is comparatively insensitive to environmental change. The convolutional neural network (CNN), a representative deep learning model, processes image information through the alternating operations of convolutional and sub-sampling layers; it is relatively insensitive to deformations such as translation, scaling, tilt and rotation, making it well suited to face detection in unconstrained environments.
However, existing convolutional-neural-network face detection methods generally use many network layers, so their computational complexity is high.
In summary, a face detection method with low computational complexity and high detection accuracy is needed.
Disclosure of Invention
In view of the above, the main objective of the present invention is to perform face detection rapidly and with high detection accuracy.
To achieve the above object, according to a first aspect of the present invention, there is provided a face detection method, including:
firstly, inputting a video image to be detected;
secondly, performing moving target detection on the video image to obtain moving target areas;
thirdly, performing blob clustering on the moving target areas to obtain target detection areas;
fourthly, selecting face and non-face sample images and training a cascaded convolutional neural network to obtain a trained face detection model;
and fifthly, performing face detection on the target detection areas with the face detection model and outputting the detection result.
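Viewed end to end, the five steps form a detect-then-verify pipeline in which the expensive neural-network classifier only ever sees the clustered motion regions. The following is a minimal Python sketch of that control flow; the three step implementations are passed in as callables, and all names here are illustrative rather than taken from the patent.

```python
# Minimal sketch of the five-step pipeline. The callables stand in for
# steps two, three and five; step four (training) happens offline and
# produces the model wrapped inside cascade_detect.
def detect_faces_in_video(frames, detect_moving_targets, cluster_blobs, cascade_detect):
    results = []
    for frame in frames:                          # first step: input video image
        moving = detect_moving_targets(frame)     # second step: moving target areas
        regions = cluster_blobs(moving)           # third step: blob clustering
        faces = []
        for region in regions:                    # fifth step: detect only inside regions
            faces.extend(cascade_detect(frame, region))
        results.append(faces)
    return results
```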
Further, the third step includes:
a small-blob merging step: in the current frame video image, judging, for any two moving target areas, whether to merge them according to their widths, heights and mutual distance;
an isolated-small-blob filtering step: in the current frame video image, filtering out moving target areas that are isolated small blobs, according to the area of each moving target area and its distance from the remaining moving target areas;
and an oversized-blob segmentation step: in the current frame video image, segmenting moving target areas that are oversized blobs, according to the ratio of the area of the moving target area to the area of the video image.
Further, the fourth step includes:
a training sample selection step: selecting annotated face images as positive sample images, annotated non-face images as negative sample images, and annotated partial-face sample images;
a first-stage convolutional neural network training step: training the first-stage convolutional neural network with the positive, partial-face and negative sample images to obtain a trained first-stage face detection model, and mining hard negative sample images with the first-stage face detection model;
a second-stage convolutional neural network training step: training the second-stage convolutional neural network with the positive sample images, the partial-face sample images and the hard negative sample images output by the first-stage training step to obtain a trained second-stage face detection model, and mining hard negative sample images with the second-stage face detection model;
and a third-stage convolutional neural network training step: training the third-stage convolutional neural network with the positive sample images, the partial-face sample images and the hard negative sample images output by the second-stage training step to obtain a trained third-stage face detection model.
A hard negative sample image is a negative sample image falsely detected as a face by a face detection model.
Further, the fifth step includes:
a to-be-detected area input step: inputting a target detection area to be detected;
a multi-scale scaling step: scaling the target detection area to multiple scales;
a first-stage face detection model detection step: performing face detection on the target detection area at each scale with the first-stage face detection model; if a face is detected, obtaining its relative position in the target detection area from the detected face position, marking it as a first face candidate frame, and proceeding to the second-stage face detection model detection step; otherwise proceeding to the face detection result output step;
a second-stage face detection model detection step: performing face detection on the first face candidate frames with the second-stage face detection model; if a face is detected, obtaining its relative position in the target detection area, marking it as a second face candidate frame, and proceeding to the third-stage face detection model detection step; otherwise proceeding to the face detection result output step;
a third-stage face detection model detection step: performing face detection on the second face candidate frames with the third-stage face detection model; if a face is detected, obtaining its relative position in the target detection area and marking it as a face frame;
and a face detection result output step: if face frames exist in the target detection area, outputting the positions of the face frames; if no face frame exists in the target detection area, outputting that there is no face.
A face detection method according to the present invention may further include: a sixth step of tracking the face detection results by an optical flow method and outputting the face tracking results.
According to another aspect of the present invention, there is provided a face detection apparatus, including:
a video image input module, used for inputting a video image to be detected;
a moving target detection module, used for performing moving target detection on the video image to obtain moving target areas;
a blob clustering processing module, used for performing blob clustering on the moving target areas to obtain target detection areas;
a cascaded convolutional neural network training module, used for selecting face and non-face sample images and training the cascaded convolutional neural network to obtain a trained face detection model;
and a face detection module, used for performing face detection on the target detection areas with the face detection model and outputting the detection result.
Further, the blob clustering processing module comprises:
the small-blob merging module, used for judging, for any two moving target areas in the current frame video image, whether to merge them according to their widths, heights and mutual distance;
the isolated-small-blob filtering module, used for filtering out, in the current frame video image, moving target areas that are isolated small blobs, according to the area of each moving target area and its distance from the remaining moving target areas;
and the oversized-blob segmentation module, used for segmenting, in the current frame video image, moving target areas that are oversized blobs, according to the ratio of the area of the moving target area to the area of the video image.
Further, the cascaded convolutional neural network training module comprises:
the training sample selection module, used for selecting annotated face images as positive sample images, annotated non-face images as negative sample images, and annotated partial-face sample images;
the first-stage convolutional neural network training module, used for training the first-stage convolutional neural network with the positive, partial-face and negative sample images, obtaining a trained first-stage face detection model, and mining hard negative sample images with the first-stage face detection model;
the second-stage convolutional neural network training module, used for training the second-stage convolutional neural network with the positive sample images, the partial-face sample images and the hard negative sample images output by the first-stage training module, obtaining a trained second-stage face detection model, and mining hard negative sample images with the second-stage face detection model;
and the third-stage convolutional neural network training module, used for training the third-stage convolutional neural network with the positive sample images, the partial-face sample images and the hard negative sample images output by the second-stage training module, to obtain a trained third-stage face detection model.
Further, the face detection module comprises:
the to-be-detected area input module, used for inputting a target detection area to be detected;
the multi-scale scaling module, used for scaling the target detection area to multiple scales;
the first-stage face detection model detection module, used for performing face detection on the target detection area at each scale with the first-stage face detection model; if a face is detected, it obtains the relative position in the target detection area from the detected face position, marks it as a first face candidate frame, and passes it to the second-stage face detection model detection module; otherwise control passes to the face detection result output module;
the second-stage face detection model detection module, used for performing face detection on the first face candidate frames with the second-stage face detection model; if a face is detected, it obtains the relative position in the target detection area, marks it as a second face candidate frame, and passes it to the third-stage face detection model detection module; otherwise control passes to the face detection result output module;
the third-stage face detection model detection module, used for performing face detection on the second face candidate frames with the third-stage face detection model; if a face is detected, it obtains the relative position in the target detection area from the detected face position and marks it as a face frame;
and the face detection result output module, used for outputting the positions of the face frames if face frames exist in the target detection area, and outputting that there is no face if no face frame exists.
A face detection apparatus according to the present invention may further include: a face tracking module, used for tracking the face detection results by an optical flow method and outputting the face tracking results.
Compared with existing face detection techniques, the face detection method and device of the present invention, on the one hand, detect faces with a cascaded convolutional neural network and effectively improve detection accuracy through training on partial-face samples and mining of hard negative samples; on the other hand, before the cascaded network performs face detection, target detection areas are obtained through moving target detection and blob clustering, and face detection is performed only within those areas, which greatly reduces the amount of computation.
Drawings
Fig. 1 shows a flow chart of a first embodiment of a face detection method according to the invention.
Fig. 2 shows a flow chart of a second embodiment of a face detection method according to the invention.
Fig. 3 shows a block diagram of a first embodiment of a face detection apparatus according to the invention.
Fig. 4 shows a block diagram of a second embodiment of a face detection apparatus according to the invention.
Detailed Description
To further clarify the structure, features and other objects of the present invention, preferred embodiments are described in detail below with reference to the accompanying drawings. The drawings illustrate the technical solution of the present invention and are not intended to limit it.
Fig. 1 shows a flow chart of a first embodiment of a face detection method according to the invention. As shown in fig. 1, a face detection method according to the present invention includes:
a first step S1 of inputting a video image to be detected;
a second step S2 of performing moving target detection on the video image to obtain moving target areas;
a third step S3 of performing blob clustering on the moving target areas to obtain target detection areas;
a fourth step S4 of selecting face and non-face sample images and training a cascaded convolutional neural network to obtain a trained face detection model;
and a fifth step S5 of performing face detection on the target detection areas with the face detection model and outputting the detection result.
The second step S2 can be implemented with existing moving object detection methods, including but not limited to the static difference method, Gaussian background modeling, CodeBook background modeling, and the ViBe method.
In this embodiment, the second step S2 uses Gaussian mixture background modeling, for example the moving object detection algorithm of "Improved Gaussian mixture model moving object detection algorithm based on entropy theory" (Liu Wanjun et al., Application Research of Computers, 2015, 32(7): 2226–).
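As a concrete illustration of this step (the patent only requires some moving-target detector, not this particular one), the sketch below uses OpenCV's Gaussian-mixture background subtractor and takes the bounding boxes of foreground contours as moving target areas; all parameter values are assumptions.

```python
import cv2

# Gaussian-mixture background modelling with OpenCV's MOG2 subtractor;
# bounding boxes of the foreground contours serve as moving target areas.
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16)

def moving_target_areas(frame):
    mask = subtractor.apply(frame)
    mask = cv2.medianBlur(mask, 5)                       # suppress speckle noise
    _, mask = cv2.threshold(mask, 127, 255, cv2.THRESH_BINARY)  # drop shadow pixels
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours]       # (x, y, w, h) boxes
```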
Further, the third step S3 includes:
a small-blob merging step S31: in the current frame video image, judging, for any two moving target areas, whether to merge them according to their widths, heights and mutual distance;
an isolated-small-blob filtering step S32: in the current frame video image, filtering out moving target areas that are isolated small blobs, according to the area of each moving target area and its distance from the remaining moving target areas;
and an oversized-blob segmentation step S33: in the current frame video image, segmenting moving target areas that are oversized blobs, according to the ratio of the area of the moving target area to the area of the video image.
Further, the small-blob merging step S31 includes: for any two moving target areas B_i and B_j in the current frame video image, calculating the width WB_i and height HB_i of B_i and the width WB_j and height HB_j of B_j, and calculating the distances Xd_ij and Yd_ij between B_i and B_j in the x and y directions; if the two threshold inequalities on Xd_ij and Yd_ij [given only as formula images in the original, parameterized by W, H and T_1] are both satisfied, merging B_i and B_j into one moving target area, where i, j ∈ {1, 2, …, BNum1}, i ≠ j, BNum1 is the number of moving target areas in the current frame video image, and W and H are the width and height of the current frame video image.
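Since the two inequalities survive only as images, the sketch below assumes one plausible reading: the x and y gaps between the boxes must each be small relative to the frame dimensions divided by T_1. The (x, y, w, h) box representation is also an assumption.

```python
# Sketch of the small-blob merging test. The real inequalities are image-only
# in the original; the gap-versus-frame-size test here is an assumed reading.
def gap_1d(a0, a1, b0, b1):
    """Gap between intervals [a0, a1] and [b0, b1]; zero when they overlap."""
    return max(0, max(a0, b0) - min(a1, b1))

def merge_if_close(bi, bj, W, H, T1=3):
    xi, yi, wi, hi = bi
    xj, yj, wj, hj = bj
    xd = gap_1d(xi, xi + wi, xj, xj + wj)        # Xd_ij
    yd = gap_1d(yi, yi + hi, yj, yj + hj)        # Yd_ij
    if xd < W / T1 and yd < H / T1:              # assumed merging condition
        x0, y0 = min(xi, xj), min(yi, yj)
        x1, y1 = max(xi + wi, xj + wj), max(yi + hi, yj + hj)
        return (x0, y0, x1 - x0, y1 - y0)        # bounding box of the union
    return None
```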
Further, the isolated-small-blob filtering step S32 includes: for the current frame video image, calculating the area of every moving target area and obtaining the maximum area SB_max; for any moving target area B_p, if the threshold condition on SB_p and SB_max [given only as a formula image in the original, parameterized by T_2] is satisfied, B_p is considered an isolated small blob and is filtered out, where p ∈ {1, 2, …, BNum2}, BNum2 is the number of moving target areas in the current frame video image, and SB_p is the area of B_p.
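A natural reading of the image-only condition, assumed in the sketch below, is that a region counts as an isolated small blob when its area falls below SB_max / T_2; the distance criterion mentioned in the step summary is omitted here for brevity.

```python
# Sketch of the isolated-small-blob filter under the assumed area test
# SB_p < SB_max / T2. Boxes are (x, y, w, h) tuples.
def filter_isolated_blobs(boxes, T2=5):
    if not boxes:
        return boxes
    areas = [w * h for (_, _, w, h) in boxes]
    sb_max = max(areas)
    return [b for b, a in zip(boxes, areas) if a >= sb_max / T2]
```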
Further, the oversized-blob segmentation step S33 includes: for any moving target area B_q in the current frame video image, calculating the ratio of the area SB_q of B_q to the area S of the current frame video image; if this ratio exceeds the threshold [given only as a formula image in the original, parameterized by T_3], B_q is considered an oversized blob and is segmented into one or more moving target areas by the foreground density distribution method, where q ∈ {1, 2, …, BNum3} and BNum3 is the number of moving target areas in the current frame video image.
Further, the foreground density distribution method of the oversized-blob segmentation step S33 includes: for a moving target area B_q considered an oversized blob, constructing a sliding window whose width and height are derived from the dimensions of B_q [the two size formulas, parameterized by T_4, appear only as images in the original]; sliding the window over B_q sequentially, counting the number of foreground points inside the window at each position, and obtaining the maximum foreground-point count WFNum_max; taking the window area whose foreground-point count equals WFNum_max as a segmented moving target area B_q1; taking the parts of B_q outside B_q1 as segmentation sub-areas and calculating the area of each; if the ratio of a sub-area's area to the area of B_q1 is below the lower threshold [formula image, parameterized by T_5], filtering that sub-area out; if the ratio lies in the intermediate range [formula images, parameterized by T_5 and T_6], merging the sub-area with B_q1; and if the ratio exceeds the upper threshold [formula image, parameterized by T_6], taking the sub-area as a segmented moving target area in its own right.
T_1 ranges from 2 to 8, T_2 from 3 to 8, T_3 from 2 to 8, T_4 from 2 to 8, T_5 from 8 to 12, and T_6 from 2 to 6. Further, T_1 may range from 2 to 4, T_2 from 4 to 6, T_3 from 2 to 4, T_4 from 2 to 4, T_5 from 9 to 11, and T_6 from 3 to 5.
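The core of the foreground density distribution method, finding the densest window B_q1 inside an oversized blob, can be sketched as follows. The window size (region size divided by T_4) and the one-window stride are assumed readings of the image-only formulas; the subsequent filter/merge/split handling of the remaining sub-areas by their area ratio to B_q1 is omitted.

```python
import numpy as np

# Sketch of the densest-window search inside an oversized blob. fg_mask is
# the binary foreground mask of the whole frame; box is B_q as (x, y, w, h).
# The w//T4 x h//T4 window and one-window stride are assumptions.
def densest_window(fg_mask, box, T4=3):
    x, y, w, h = box
    fg = (fg_mask[y:y + h, x:x + w] > 0).astype(np.uint8)
    ww, wh = max(1, w // T4), max(1, h // T4)
    best_xy, best_count = (0, 0), -1
    for wy in range(0, h - wh + 1, wh):          # slide the window over B_q
        for wx in range(0, w - ww + 1, ww):
            count = int(fg[wy:wy + wh, wx:wx + ww].sum())
            if count > best_count:               # track WFNum_max
                best_xy, best_count = (wx, wy), count
    bx, by = best_xy
    return (x + bx, y + by, ww, wh)              # segmented region B_q1
```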
Further, the fourth step S4 includes:
a training sample selection step S41: selecting annotated face images as positive sample images, annotated non-face images as negative sample images, and annotated partial-face sample images;
a first-stage convolutional neural network training step S42: training the first-stage convolutional neural network with the positive, partial-face and negative sample images to obtain a trained first-stage face detection model, and mining hard negative sample images with the first-stage face detection model;
a second-stage convolutional neural network training step S43: training the second-stage convolutional neural network with the positive sample images, the partial-face sample images and the hard negative sample images output by step S42 to obtain a trained second-stage face detection model, and mining hard negative sample images with the second-stage face detection model;
and a third-stage convolutional neural network training step S44: training the third-stage convolutional neural network with the positive sample images, the partial-face sample images and the hard negative sample images output by step S43 to obtain a trained third-stage face detection model.
The annotated face images are images cropped to contain only a face, under varying scenes, illumination, poses, skin colors and the like; the annotated non-face images are images containing no face, under varying scenes, illumination and the like; and the annotated partial-face images are images containing only part of a face.
Further, the annotated non-face images include images of body parts with skin-color characteristics, backgrounds similar to skin color, and other such regions.
A hard negative sample image is a negative sample image falsely detected by a face detection model. In this embodiment, if a negative sample image annotated as non-face is classified as a face by the first-stage or second-stage face detection model, it is regarded as a hard negative sample image.
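The training loop described above can be condensed into a few lines: each stage is trained, and the negatives it falsely accepts become the negative set of the next stage. A minimal sketch, with the stage trainers and the "face" label convention supplied from outside:

```python
# Sketch of cascade training with hard-negative mining. Each element of
# stage_trainers is a callable returning a trained model exposing
# predict(sample) -> "face" / "non-face" / "partial face".
def train_cascade(stage_trainers, positives, partial_faces, negatives):
    models = []
    for train_stage in stage_trainers:
        model = train_stage(positives, partial_faces, negatives)
        # hard negatives: non-face samples the new stage falsely accepts
        negatives = [n for n in negatives if model.predict(n) == "face"]
        models.append(model)
    return models
```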
Further, the first-stage, second-stage and third-stage convolutional neural networks (CNNs) are existing convolutional neural networks, and may include but are not limited to general CNN, R-CNN, SPP-Net, Fast R-CNN, YOLO, SSD and the like, or combinations of the above.
In a first embodiment, the first-stage convolutional neural network comprises 4 convolutional layers and 1 pooling layer; the second-stage convolutional neural network comprises 3 convolutional layers, 3 pooling layers and 3 fully connected layers; and the third-stage convolutional neural network comprises 4 convolutional layers, 3 pooling layers and 3 fully connected layers, where the last fully connected layer has 3 outputs, representing 3 classification results: face, non-face and partial face.
In a second embodiment, the first-stage, second-stage and third-stage convolutional neural networks adopt the network structure of the thesis "Research on the Application of Convolutional Neural Networks in Face Detection" (Liu Xiuqing, Shanxi University, 2015), with the last fully connected layer again having 3 outputs: face, non-face and partial face.
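To make the first embodiment's layer counts concrete, here is a PyTorch sketch of the third-stage network (4 convolutional layers, 3 pooling layers, 3 fully connected layers, 3 output classes). Channel widths, kernel sizes and the 48x48 input resolution are not given in the text and are assumptions.

```python
import torch.nn as nn

# Third-stage network of the first embodiment: 4 conv, 3 pool, 3 fully
# connected layers; the last layer has 3 outputs (face, non-face, partial
# face). Only the layer counts come from the text; every hyperparameter
# below is assumed. Expects 3x48x48 input.
class StageThreeNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3), nn.ReLU(), nn.MaxPool2d(2),   # 48 -> 46 -> 23
            nn.Conv2d(32, 64, 3), nn.ReLU(), nn.MaxPool2d(2),  # 23 -> 21 -> 10
            nn.Conv2d(64, 64, 3), nn.ReLU(), nn.MaxPool2d(2),  # 10 -> 8 -> 4
            nn.Conv2d(64, 128, 2), nn.ReLU(),                  # 4 -> 3
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 3 * 3, 256), nn.ReLU(),
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, 3),        # face / non-face / partial face
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```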
Further, the fifth step S5 includes:
a to-be-detected area input step S51: inputting a target detection area to be detected;
a multi-scale scaling step S52: scaling the target detection area to multiple scales;
a first-stage face detection model detection step S53: performing face detection on the target detection area at each scale with the first-stage face detection model; if a face is detected, obtaining its relative position in the target detection area from the detected face position, marking it as a first face candidate frame, and proceeding to step S54; otherwise proceeding to step S56;
a second-stage face detection model detection step S54: performing face detection on the first face candidate frames with the second-stage face detection model; if a face is detected, obtaining its relative position in the target detection area, marking it as a second face candidate frame, and proceeding to step S55; otherwise proceeding to step S56;
a third-stage face detection model detection step S55: performing face detection on the second face candidate frames with the third-stage face detection model; if a face is detected, obtaining its relative position in the target detection area and marking it as a face frame;
and a face detection result output step S56: if face frames exist in the target detection area, outputting the positions of the face frames; if no face frame exists, outputting that there is no face.
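The funnel structure of the fifth step reduces to three progressively stricter filters over candidate windows. A minimal sketch, with window generation and the per-stage face predicates supplied as callables (all names are assumptions, not patent text):

```python
# Sketch of the three-stage detection funnel. propose_windows yields
# candidate windows for one scale; each stage maps a window to True when
# it judges the window a face.
def cascade_faces(region_img, scales, propose_windows, stage1, stage2, stage3):
    candidates = []
    for s in scales:                                   # multi-scale scaling step
        for win in propose_windows(region_img, s):     # first-stage scan per scale
            if stage1(win):
                candidates.append(win)                 # first face candidate frames
    candidates = [w for w in candidates if stage2(w)]  # second face candidate frames
    return [w for w in candidates if stage3(w)]        # final face frames
```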
Fig. 2 shows a flow chart of a second embodiment of a face detection method according to the invention. As shown in fig. 2, a face detection method according to the present invention includes:
a first step S1 of inputting a video image to be detected;
a second step S2 of performing moving target detection on the video image to obtain moving target areas;
a third step S3 of performing blob clustering on the moving target areas to obtain target detection areas;
a fourth step S4 of selecting face and non-face sample images and training a cascaded convolutional neural network to obtain a trained face detection model;
a fifth step S5 of performing face detection on the target detection areas with the face detection model and outputting the face detection results;
and a sixth step S6 of tracking the face detection results by an optical flow method and outputting the face tracking results.
The sixth step S6 can be implemented by an existing tracking method or algorithm based on an optical flow method.
Further, the sixth step S6 includes:
a first-frame face-frame ID assignment step S61: for the video to be detected, taking the video image in which a face frame is first detected as the first frame image, and assigning a unique ID number to every face frame in that first frame;
a current-frame face-frame ID acquisition step S62: from the position of each ID-numbered face frame in the previous frame, calculating that face frame's position in the current frame by the optical flow method, and comparing the previous-frame and current-frame face frames for that ID number with a face comparison method; if the comparison succeeds, assigning the ID number to the current-frame face frame at that position; if the comparison fails, assigning a new ID number to the current-frame face frame at that position; and if the current frame has no face frame at that position, using the tracking result as the face frame position for that ID number until the face frame with that ID number disappears;
and a face tracking result output step S63: outputting the tracking results of the face frames of all ID numbers in the video to be detected.
In an embodiment, the sixth step S6 tracks all face frames of the video images to be detected with a pyramid optical flow method, such as the method of "Video tracking algorithm based on online Boosting and LK optical flow" (Journal of Southwest University of Science and Technology, 2012, 27(4): 80-84).
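As an illustration of the step S62 propagation (a simplification of the full ID bookkeeping above), the sketch below pushes each ID-numbered box from the previous frame into the current one with OpenCV's pyramidal Lucas-Kanade optical flow, tracking only the box centre:

```python
import cv2
import numpy as np

# Sketch of face-frame propagation with pyramidal LK optical flow.
# boxes_by_id maps ID number -> (x, y, w, h) in the previous frame; tracking
# a single centre point per box is a simplification of the patent's scheme.
def propagate_face_frames(prev_gray, cur_gray, boxes_by_id):
    tracked = {}
    for face_id, (x, y, w, h) in boxes_by_id.items():
        centre = np.array([[[x + w / 2.0, y + h / 2.0]]], dtype=np.float32)
        new_pt, status, _err = cv2.calcOpticalFlowPyrLK(prev_gray, cur_gray,
                                                        centre, None)
        if status[0][0] == 1:                     # flow found for this centre
            cx, cy = new_pt[0, 0]
            tracked[face_id] = (int(cx - w / 2), int(cy - h / 2), w, h)
    return tracked
```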
Fig. 3 is a block diagram showing a first embodiment of a face detection apparatus according to the present invention. As shown in fig. 3, a face detection apparatus according to the present invention includes:
a video image input module 1, used for inputting a video image to be detected;
a moving target detection module 2, used for performing moving target detection on the video image to obtain moving target areas;
a blob clustering processing module 3, used for performing blob clustering on the moving target areas to obtain target detection areas;
a cascaded convolutional neural network training module 4, used for selecting face and non-face sample images and training the cascaded convolutional neural network to obtain a trained face detection model;
and a face detection module 5, used for performing face detection on the target detection areas with the face detection model and outputting the detection result.
The moving object detection module 2 may be implemented with existing moving object detection methods, devices or apparatuses, including but not limited to those based on the static difference method, Gaussian background modeling, CodeBook background modeling, or the ViBe algorithm.
Further, the blob clustering processing module 3 includes:
a small-blob merging module 31, used for judging, for any two moving target areas in the current frame video image, whether to merge them according to their widths, heights and mutual distance;
an isolated-small-blob filtering module 32, used for filtering out, in the current frame video image, moving target areas that are isolated small blobs, according to the area of each moving target area and its distance from the remaining moving target areas;
and an oversized-blob segmentation module 33, used for segmenting, in the current frame video image, moving target areas that are oversized blobs, according to the ratio of the area of the moving target area to the area of the video image.
Further, the small-blob merging module 31 is configured to: for any two moving target areas B_i and B_j in the current frame video image, calculate the width WB_i and height HB_i of B_i and the width WB_j and height HB_j of B_j, and calculate the distances Xd_ij and Yd_ij between B_i and B_j in the x and y directions; if the two threshold inequalities on Xd_ij and Yd_ij [given only as formula images in the original, parameterized by W, H and T_1] are both satisfied, merge B_i and B_j into one moving target area, where i, j ∈ {1, 2, …, BNum1}, i ≠ j, BNum1 is the number of moving target areas in the current frame video image, and W and H are the width and height of the current frame video image.
Further, the isolated-small-blob filtering module 32 is configured to: for the current frame video image, calculate the area of every moving target area and obtain the maximum area SB_max; for any moving target area B_p, if the threshold condition on SB_p and SB_max [given only as a formula image in the original, parameterized by T_2] is satisfied, B_p is considered an isolated small blob and is filtered out, where p ∈ {1, 2, …, BNum2}, BNum2 is the number of moving target areas in the current frame video image, and SB_p is the area of B_p.
Further, the oversized-blob segmentation module 33 is configured to: for any moving target area B_q in the current frame video image, calculate the ratio of the area SB_q of B_q to the area S of the current frame video image; if this ratio exceeds the threshold [given only as a formula image in the original, parameterized by T_3], B_q is considered an oversized blob and is segmented into one or more moving target areas by the foreground density distribution segmentation module, where q ∈ {1, 2, …, BNum3} and BNum3 is the number of moving target areas in the current frame video image.
Further, the foreground density distribution segmentation module within the oversized-blob segmentation module 33 is configured to: for a moving target area B_q considered an oversized blob, construct a sliding window whose width and height are derived from the dimensions of B_q [the two size formulas, parameterized by T_4, appear only as images in the original]; slide the window over B_q sequentially, count the number of foreground points inside the window at each position, and obtain the maximum foreground-point count WFNum_max; take the window area whose foreground-point count equals WFNum_max as a segmented moving target area B_q1; take the parts of B_q outside B_q1 as segmentation sub-areas and calculate the area of each; if the ratio of a sub-area's area to the area of B_q1 is below the lower threshold [formula image, parameterized by T_5], filter that sub-area out; if the ratio lies in the intermediate range [formula images, parameterized by T_5 and T_6], merge the sub-area with B_q1; and if the ratio exceeds the upper threshold [formula image, parameterized by T_6], take the sub-area as a segmented moving target area in its own right.
T_1 ranges from 2 to 8, T_2 from 3 to 8, T_3 from 2 to 8, T_4 from 2 to 8, T_5 from 8 to 12, and T_6 from 2 to 6. Further, T_1 may range from 2 to 4, T_2 from 4 to 6, T_3 from 2 to 4, T_4 from 2 to 4, T_5 from 9 to 11, and T_6 from 3 to 5.
Further, the cascaded convolutional neural network training module 4 includes:
a training sample selection module 41, used for selecting annotated face images as positive sample images, annotated non-face images as negative sample images, and annotated partial-face sample images;
a first-stage convolutional neural network training module 42, used for training the first-stage convolutional neural network with the positive, partial-face and negative sample images, obtaining a trained first-stage face detection model, and mining hard negative sample images with the first-stage face detection model;
a second-stage convolutional neural network training module 43, used for training the second-stage convolutional neural network with the positive sample images, the partial-face sample images and the hard negative sample images output by module 42, obtaining a trained second-stage face detection model, and mining hard negative sample images with the second-stage face detection model;
and a third-stage convolutional neural network training module 44, used for training the third-stage convolutional neural network with the positive sample images, the partial-face sample images and the hard negative sample images output by module 43, to obtain a trained third-stage face detection model.
The annotated face images are images cropped to contain only a face, under varying scenes, illumination, poses, skin colors and the like; the annotated non-face images are images containing no face, under varying scenes, illumination and the like; and the annotated partial-face images are images containing only part of a face.
Further, the annotated non-face images include images of body parts with skin-color characteristics, backgrounds similar to skin color, and other such regions.
A hard negative sample image is a negative sample image falsely detected by a face detection model. In this embodiment, if a negative sample image annotated as non-face is classified as a face by the first-stage or second-stage face detection model, it is regarded as a hard negative sample image.
Further, the first-stage, second-stage and third-stage convolutional neural networks (CNNs) are existing convolutional neural networks, and may include but are not limited to general CNN, R-CNN, SPP-Net, Fast R-CNN, YOLO, SSD and the like, or combinations of the above.
In an embodiment, the first-stage convolutional neural network comprises 4 convolutional layers and 1 pooling layer; the second-stage convolutional neural network comprises 3 convolutional layers, 3 pooling layers and 3 fully connected layers; and the third-stage convolutional neural network comprises 4 convolutional layers, 3 pooling layers and 3 fully connected layers.
Further, the face detection module 5 includes:
a to-be-detected area input module 51, used for inputting a target detection area to be detected;
a multi-scale scaling module 52, used for scaling the target detection area to multiple scales;
a first-stage face detection model detection module 53, used for performing face detection on the target detection area at each scale with the first-stage face detection model; if a face is detected, it obtains the relative position in the target detection area from the detected face position, marks it as a first face candidate frame, and passes it to module 54; otherwise control passes to module 56;
a second-stage face detection model detection module 54, which performs face detection on the first face candidate frames with the second-stage face detection model; if a face is detected, it obtains the relative position in the target detection area, marks it as a second face candidate frame, and passes it to module 55; otherwise control passes to module 56;
a third-stage face detection model detection module 55, which performs face detection on the second face candidate frames with the third-stage face detection model; if a face is detected, it obtains the relative position in the target detection area from the detected face position and marks it as a face frame;
and a face detection result output module 56, which outputs the positions of the face frames if face frames exist in the target detection area, and outputs that there is no face if no face frame exists.
Fig. 4 is a block diagram showing a second embodiment of a face detection apparatus according to the present invention. As shown in fig. 4, a face detection apparatus according to the present invention includes:
a video image input module 1, used for inputting a video image to be detected;
a moving target detection module 2, used for performing moving target detection on the video image to obtain moving target areas;
a blob clustering processing module 3, used for performing blob clustering on the moving target areas to obtain target detection areas;
a cascaded convolutional neural network training module 4, used for selecting face and non-face sample images and training the cascaded convolutional neural network to obtain a trained face detection model;
a face detection module 5, used for performing face detection on the target detection areas with the face detection model and outputting the detection results;
and a face tracking module 6, used for tracking the face detection results by an optical flow method and outputting the face tracking results.
The face tracking module 6 can be implemented by an existing tracking method, device or apparatus based on an optical flow method.
Further, the face tracking module 6 includes:
a first-frame face-frame ID assignment module 61, used for taking, for the video to be detected, the video image in which a face frame is first detected as the first frame image, and assigning a unique ID number to every face frame in that first frame;
a current-frame face-frame ID acquisition module 62, used for calculating, from the position of each ID-numbered face frame in the previous frame, that face frame's position in the current frame by the optical flow method, and comparing the previous-frame and current-frame face frames for that ID number with a face comparison method; if the comparison succeeds, the ID number is assigned to the current-frame face frame at that position; if the comparison fails, a new ID number is assigned to the current-frame face frame at that position; and if the current frame has no face frame at that position, the tracking result is used as the face frame position for that ID number until the face frame with that ID number disappears;
and a face tracking result output module 63, used for outputting the tracking results of the face frames of all ID numbers in the video to be detected.
Compared with existing face detection techniques, the face detection method and device of the present invention, on the one hand, detect faces with a cascaded convolutional neural network and effectively improve detection accuracy through training on partial-face samples and mining of hard negative samples; on the other hand, before the cascaded network performs face detection, target detection areas are obtained through moving target detection and blob clustering, and face detection is performed only within those areas, which greatly reduces the amount of computation.
While the foregoing describes preferred embodiments of the present invention, it is not intended to limit the scope of the invention; the embodiments are described to assist those skilled in the art in practicing the invention. Further modifications and improvements may readily occur to those skilled in the art without departing from the spirit and scope of the invention, and the scope of the invention is defined by the appended claims, including all alternatives and equivalents that fall within their spirit and scope.

Claims (12)

1. A face detection method, characterized by comprising:
firstly, inputting a video image to be detected;
secondly, performing moving target detection on the video image to obtain moving target areas;
thirdly, performing blob clustering on the moving target areas to obtain target detection areas;
fourthly, selecting face and non-face sample images and training a cascaded convolutional neural network to obtain a trained face detection model;
fifthly, performing face detection on the target detection areas with the face detection model and outputting the detection result;
wherein the third step includes:
a small-blob merging step: in the current frame video image, judging, for any two moving target areas, whether to merge them according to their widths, heights and mutual distance;
an isolated-small-blob filtering step: in the current frame video image, filtering out moving target areas that are isolated small blobs, according to the area of each moving target area and its distance from the remaining moving target areas;
and an oversized-blob segmentation step: in the current frame video image, segmenting moving target areas that are oversized blobs, according to the ratio of the area of the moving target area to the area of the video image.
2. The method of claim 1, wherein the small-blob merging step comprises: for any two moving target areas B_i and B_j in the current frame video image, calculating the width WB_i and height HB_i of B_i and the width WB_j and height HB_j of B_j, and calculating the distances Xd_ij and Yd_ij between B_i and B_j in the x and y directions; if the two threshold inequalities on Xd_ij and Yd_ij [given only as formula images in the original, parameterized by W, H and T_1] are both satisfied, merging B_i and B_j into one moving target area, wherein i, j ∈ {1, 2, …, BNum1}, i ≠ j, BNum1 is the number of moving target areas in the current frame video image, and W and H are the width and height of the current frame video image;
T_1 ranges from 2 to 8.
3. The method of claim 1, wherein the isolated-small-blob filtering step comprises: for the current frame video image, calculating the area of every moving target area and obtaining the maximum area SB_max; for any moving target area B_p, if the threshold condition on SB_p and SB_max [given only as a formula image in the original, parameterized by T_2] is satisfied, considering B_p an isolated small blob and filtering it out, wherein p ∈ {1, 2, …, BNum2}, BNum2 is the number of moving target areas in the current frame video image, and SB_p is the area of B_p;
T_2 ranges from 3 to 8.
4. The method of claim 1, wherein the oversized-blob segmentation step comprises: for any moving target area B_q in the current frame video image, calculating the ratio of the area SB_q of B_q to the area S of the current frame video image; if this ratio exceeds the threshold [given only as a formula image in the original, parameterized by T_3], considering B_q an oversized blob and segmenting it into one or more moving target areas by a foreground density distribution method, wherein q ∈ {1, 2, …, BNum3} and BNum3 is the number of moving target areas in the current frame video image;
T_3 ranges from 2 to 8.
5. The method of claim 4, wherein the foreground density distribution method comprises: for a moving target area B_q considered an oversized blob, constructing a sliding window whose width and height are derived from the dimensions of B_q [the two size formulas, parameterized by T_4, appear only as images in the original]; sliding the window over B_q sequentially, counting the number of foreground points inside the window at each position, and obtaining the maximum foreground-point count WFNum_max; taking the window area whose foreground-point count equals WFNum_max as a segmented moving target area B_q1; taking the parts of B_q outside B_q1 as segmentation sub-areas and calculating the area of each; if the ratio of a sub-area's area to the area of B_q1 is below the lower threshold [formula image, parameterized by T_5], filtering that sub-area out; if the ratio lies in the intermediate range [formula images, parameterized by T_5 and T_6], merging the sub-area with B_q1; and if the ratio exceeds the upper threshold [formula image, parameterized by T_6], taking the sub-area as a segmented moving target area;
T_4 ranges from 2 to 8, T_5 from 8 to 12, and T_6 from 2 to 6.
6. The method of claim 1, wherein the fourth step comprises:
a training sample selection step: selecting annotated face images as positive sample images, annotated non-face images as negative sample images, and annotated partial-face sample images;
a first-stage convolutional neural network training step: training the first-stage convolutional neural network with the positive, partial-face and negative sample images to obtain a trained first-stage face detection model, and mining hard negative sample images with the first-stage face detection model;
a second-stage convolutional neural network training step: training the second-stage convolutional neural network with the positive sample images, the partial-face sample images and the hard negative sample images output by the first-stage training step to obtain a trained second-stage face detection model, and mining hard negative sample images with the second-stage face detection model;
a third-stage convolutional neural network training step: training the third-stage convolutional neural network with the positive sample images, the partial-face sample images and the hard negative sample images output by the second-stage training step to obtain a trained third-stage face detection model;
wherein a hard negative sample image is a negative sample image falsely detected by a face detection model.
7. The method of claim 1, wherein the fifth step comprises:
a to-be-detected area input step: inputting a target detection area to be detected;
a multi-scale scaling step: scaling the target detection area to multiple scales;
a first-stage face detection model detection step: performing face detection on the target detection area at each scale with the first-stage face detection model; if a face is detected, obtaining its relative position in the target detection area from the detected face position, marking it as a first face candidate frame, and proceeding to the second-stage face detection model detection step; otherwise proceeding to the face detection result output step;
a second-stage face detection model detection step: performing face detection on the first face candidate frames with the second-stage face detection model; if a face is detected, obtaining its relative position in the target detection area, marking it as a second face candidate frame, and proceeding to the third-stage face detection model detection step; otherwise proceeding to the face detection result output step;
a third-stage face detection model detection step: performing face detection on the second face candidate frames with the third-stage face detection model; if a face is detected, obtaining its relative position in the target detection area and marking it as a face frame;
and a face detection result output step: if face frames exist in the target detection area, outputting the positions of the face frames; if no face frame exists in the target detection area, outputting that there is no face.
8. The method of claim 1, further comprising: a sixth step of tracking the face detection results by an optical flow method and outputting the face tracking results.
9. A face detection apparatus, comprising:
a video image input module, configured to input a video image to be detected;
a moving target detection module, configured to perform moving target detection on the video image to obtain moving target areas;
a blob clustering processing module, configured to perform blob clustering processing on the moving target areas to obtain target detection areas;
a cascaded convolutional neural network training module, configured to select face and non-face sample images and train a cascaded convolutional neural network to obtain a trained face detection model; and
a face detection module, configured to perform face detection on the target detection areas using the face detection model and output a detection result;
wherein the blob clustering processing module comprises:
a small-blob merging module, configured to decide, from the widths, heights, and distance of any two moving target areas in the current video frame, whether the two moving target areas should be merged;
an isolated-blob filtering module, configured to filter out moving target areas that are isolated small blobs in the current video frame, according to the area of each moving target area and its distance to the remaining moving target areas; and
an oversized-blob segmentation module, configured to segment any moving target area that is an oversized blob in the current video frame, according to the ratio of the area of the moving target area to the area of the video image.
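The three sub-modules of the blob clustering processing amount to plain rectangle arithmetic over the moving target areas. The following sketch works on (x, y, w, h) boxes; every threshold (`merge_gap`, `min_area`, `isolation_dist`, `oversize_ratio`) is an assumed illustrative value, since the claims leave the concrete criteria open.

```python
def box_distance(a, b):
    """Gap between two (x, y, w, h) boxes; 0 if they touch or overlap."""
    dx = max(0, max(a[0], b[0]) - min(a[0] + a[2], b[0] + b[2]))
    dy = max(0, max(a[1], b[1]) - min(a[1] + a[3], b[1] + b[3]))
    return max(dx, dy)

def union(a, b):
    """Smallest box enclosing both a and b."""
    x, y = min(a[0], b[0]), min(a[1], b[1])
    return (x, y,
            max(a[0] + a[2], b[0] + b[2]) - x,
            max(a[1] + a[3], b[1] + b[3]) - y)

def process_blobs(boxes, frame_w, frame_h, merge_gap=10,
                  min_area=100, isolation_dist=50, oversize_ratio=0.5):
    boxes = list(boxes)
    # Small-blob merging: fuse any two nearby moving target areas.
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if box_distance(boxes[i], boxes[j]) < merge_gap:
                    boxes[i] = union(boxes[i], boxes[j])
                    del boxes[j]
                    merged = True
                    break
            if merged:
                break
    # Isolated-blob filtering: drop small areas far from every other area.
    boxes = [b for b in boxes
             if b[2] * b[3] >= min_area
             or any(box_distance(b, o) < isolation_dist
                    for o in boxes if o is not b)]
    # Oversized-blob segmentation: split areas covering too much of the frame.
    result = []
    for (x, y, w, h) in boxes:
        if w * h > oversize_ratio * frame_w * frame_h:
            result += [(x, y, w // 2, h), (x + w // 2, y, w - w // 2, h)]
        else:
            result.append((x, y, w, h))
    return result
```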
10. The apparatus of claim 9, wherein the cascaded convolutional neural network training module comprises:
a training sample selection module, configured to select labeled face images as positive sample images, labeled non-face images as negative sample images, and partial-face sample images;
a first-stage convolutional neural network training module, configured to train the first-stage convolutional neural network with the positive sample images, the partial-face sample images, and the negative sample images, obtain a trained first-stage face detection model, and mine hard negative sample images using the first-stage face detection model;
a second-stage convolutional neural network training module, configured to train the second-stage convolutional neural network with the positive sample images, the partial-face sample images, and the hard negative sample images output by the first-stage convolutional neural network training module, obtain a trained second-stage face detection model, and mine hard negative sample images using the second-stage face detection model; and
a third-stage convolutional neural network training module, configured to train the third-stage convolutional neural network with the positive sample images, the partial-face sample images, and the hard negative sample images output by the second-stage convolutional neural network training module, to obtain a trained third-stage face detection model;
wherein a hard negative sample image is a negative sample image that is falsely detected as a face by a face detection model.
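The claims fix the training procedure but not the network architectures. For orientation only, here is a PyTorch sketch in the spirit of MTCNN-style cascades (compare the cited CN107239736A), where each stage takes a larger input crop and emits a face / non-face score, so early stages stay cheap and the final stage is the most discriminative; all layer and input sizes below are assumptions.

```python
import torch.nn as nn

def make_stage(input_size, channels):
    """Small conv net: each (conv, PReLU, pool) block halves the input."""
    layers, in_ch = [], 3
    for out_ch in channels:
        layers += [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                   nn.PReLU(), nn.MaxPool2d(2)]
        in_ch = out_ch
    feat = input_size // (2 ** len(channels))
    layers += [nn.Flatten(), nn.Linear(in_ch * feat * feat, 2)]
    return nn.Sequential(*layers)

stage1 = make_stage(12, [10])          # coarse and fast: rejects most windows
stage2 = make_stage(24, [28, 48])      # refines stage-one candidates
stage3 = make_stage(48, [32, 64, 64])  # final verification of face frames
```

With `import torch`, calling `stage1(torch.randn(1, 3, 12, 12))` yields a (1, 2) tensor of non-face/face scores.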
11. The apparatus of claim 9, wherein the face detection module comprises:
a to-be-detected area input module, configured to input a target detection area to be detected;
a multi-scale scaling module, configured to scale the target detection area to multiple scales;
a first-stage face detection model detection module, configured to perform face detection on the target detection area at each scale using the first-stage face detection model; if a face is detected, the relative position within the target detection area is obtained from the detected face position and marked as a first face candidate frame, and control passes to the second-stage face detection model detection module; otherwise, control passes to the face detection result output module;
a second-stage face detection model detection module, configured to perform face detection on the first face candidate frame using the second-stage face detection model; if a face is detected, the relative position within the target detection area is obtained from the detected face position and marked as a second face candidate frame, and control passes to the third-stage face detection model detection module; otherwise, control passes to the face detection result output module;
a third-stage face detection model detection module, configured to perform face detection on the second face candidate frame using the third-stage face detection model; if a face is detected, the relative position within the target detection area is obtained from the detected face position and marked as a face frame; and
a face detection result output module, configured to output the position of the face frame if a face frame exists in the target detection area, and to output "no face" if no face frame exists in the target detection area.
12. The apparatus of claim 9, further comprising a face tracking module configured to track the face detection result by an optical flow method and output a face tracking result.
CN201810018060.4A 2018-01-09 2018-01-09 Face detection method and device Active CN108171196B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810018060.4A CN108171196B (en) 2018-01-09 2018-01-09 Face detection method and device

Publications (2)

Publication Number Publication Date
CN108171196A (en) 2018-06-15
CN108171196B (en) 2021-10-26

Family

ID=62517837

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810018060.4A Active CN108171196B (en) 2018-01-09 2018-01-09 Face detection method and device

Country Status (1)

Country Link
CN (1) CN108171196B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109063593A (en) * 2018-07-13 2018-12-21 北京智芯原动科技有限公司 A kind of face tracking method and device
CN110738079A (en) * 2018-07-19 2020-01-31 杭州海康威视数字技术股份有限公司 Method and device for detecting abnormal number of front row personnel of motor vehicle and computer equipment
CN109271877A (en) * 2018-08-24 2019-01-25 北京智芯原动科技有限公司 A kind of human figure identification method and device
CN109359555A (en) * 2018-09-21 2019-02-19 江苏安凰领御科技有限公司 A kind of high-precision human face quick detection method
CN109740486B (en) * 2018-12-27 2021-07-27 北京铁路信号有限公司 Method and system for identifying number of human beings contained in image
CN111753598A (en) * 2019-03-29 2020-10-09 中国联合网络通信集团有限公司 Face detection method and device
CN110490073A (en) * 2019-07-15 2019-11-22 浙江省北大信息技术高等研究院 Object detection method, device, equipment and storage medium
CN110555867B (en) * 2019-09-05 2023-07-07 杭州智爱时刻科技有限公司 Multi-target object tracking method integrating object capturing and identifying technology
CN111695522B (en) * 2020-06-15 2022-10-18 重庆邮电大学 In-plane rotation invariant face detection method and device and storage medium
CN111931598A (en) * 2020-07-20 2020-11-13 湖北美和易思教育科技有限公司 Intelligent classroom real-time analysis method and system based on face recognition
CN112580449B (en) * 2020-12-06 2022-10-21 江苏集萃未来城市应用技术研究所有限公司 Method for judging abnormal behaviors of people on intelligent construction site
CN113780262A (en) * 2021-08-28 2021-12-10 特斯联科技集团有限公司 Intelligent illumination compensation method, device, equipment and system for face recognition

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7903870B1 (en) * 2006-02-24 2011-03-08 Texas Instruments Incorporated Digital camera and method
CN104021570A (en) * 2014-06-04 2014-09-03 电子科技大学 Method for directional cross-border detection and mixing line detection in video
CN104301669A (en) * 2014-09-12 2015-01-21 重庆大学 Suspicious target detection tracking and recognition method based on dual-camera cooperation
CN104573613A (en) * 2013-10-16 2015-04-29 深圳市捷顺科技实业股份有限公司 Video security anti-smashing method and device based on blob tracking
CN107239736A (en) * 2017-04-28 2017-10-10 北京智慧眼科技股份有限公司 Method for detecting human face and detection means based on multitask concatenated convolutional neutral net

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8744122B2 (en) * 2008-10-22 2014-06-03 Sri International System and method for object detection from a moving platform

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant