CN109271848B - Face detection method, face detection device and storage medium - Google Patents

Face detection method, face detection device and storage medium

Info

Publication number
CN109271848B
CN109271848B
Authority
CN
China
Prior art keywords
face
image
detected
region
processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810866324.1A
Other languages
Chinese (zh)
Other versions
CN109271848A (en)
Inventor
孙晓航
袁誉乐
曾强
高飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Tian'a Intelligent Technology Co ltd
Original Assignee
Shenzhen Tian'a Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Tian'a Intelligent Technology Co ltd filed Critical Shenzhen Tian'a Intelligent Technology Co ltd
Priority to CN201810866324.1A priority Critical patent/CN109271848B/en
Publication of CN109271848A publication Critical patent/CN109271848A/en
Application granted granted Critical
Publication of CN109271848B publication Critical patent/CN109271848B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/24Aligning, centring, orientation detection or correction of the image
    • G06V10/245Aligning, centring, orientation detection or correction of the image by locating a pattern; Special marks for positioning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

A face detection method, a face detection device and a storage medium. In the first aspect, the face detection method adopts a selection mechanism that chooses the more suitable of face recognition processing and face tracking processing according to the previous detection result, which helps to enhance the practical effect of the face detection method. In the second aspect, because a lightweight deep neural network for face recognition is introduced into the face recognition processing, the face region can be effectively recognized and located, which helps to improve detection accuracy. In the third aspect, a face confidence is introduced, which solves the drift problem arising in the face tracking stage, corrects the tracking deviation, and improves the output accuracy of the face region. In the fourth aspect, an ROI prediction method is added on top of the lightweight deep neural network, which avoids the long processing time of running face recognition over the whole image, raises the execution speed of face recognition processing, and reduces system overhead.

Description

Face detection method, face detection device and storage medium
Technical Field
The present invention relates to a face detection technology, and in particular, to a face detection method, a face detection apparatus, and a storage medium.
Background
With the development of electronic technology, face detection and recognition has become one of the most promising means of biometric authentication. An automatic face recognition system is required to have a certain recognition capability for general images, and the series of problems such a system faces has made face detection an important research subject. Currently, face detection is a key link in an automatic face recognition system; its application background extends far beyond the scope of face recognition systems, and it has important application value in content-based retrieval, digital video processing, video detection and other aspects.
Face detection is a necessary preprocessing step in fields such as face beautification, face special effects, face recognition, face attribute analysis and fatigue driving detection, and therefore has high commercial and application value. However, in practical applications, owing to factors such as facial expression change, hair occlusion, ornament occlusion, ambient light change, body angle change and imaging conditions, face detection still faces great technical challenges, and the related face detection algorithms can guarantee practical application effects only after further improvement.
Currently, face detection algorithms can be roughly divided into: face detection based on skin color, face detection based on geometric features, face detection based on statistical learning, and face detection based on deep learning. Face detection based on deep learning mostly achieves detection by means of a deep neural network; in comparison, such methods have high detection accuracy, an obvious optimization effect and broad development prospects. For example, face detection based on the R-CNN series abandons the sliding-window approach to generating candidate regions in favor of a region-proposal method; although this can obtain a high detection rate, it suffers from a complex deep neural network structure and a low detection speed. In face detection based on cascaded CNNs, face feature extraction and classification are usually completed by the CNNs together: six CNNs need to be arranged in the cascade structure, of which three are used to classify faces versus non-faces, which increases the time consumed by classification judgment and is not conducive to a fast implementation of face detection.
Disclosure of Invention
The invention mainly solves the technical problem of how to improve the detection speed and the detection precision of the face detection based on deep learning. In order to solve the above technical problems, the present application provides a face detection method and a device thereof.
According to a first aspect, an embodiment provides a face detection method, including the following steps:
acquiring an image to be detected in an image sequence;
selecting a processing mode of the image to be detected according to a face detection result in the image sequence at the previous time, carrying out face tracking processing on the image to be detected when the face is detected in the image sequence at the previous time, and carrying out face recognition processing on the image to be detected if the face is not detected in the image sequence at the previous time;
and outputting the human face area in the image to be detected according to the processing result.
The method for selecting the processing mode of the image to be detected according to the face detection result in the image sequence at the previous time comprises the following steps:
and sequentially processing each frame of image in the image sequence, taking a face detection result of a previous frame of image of the image to be detected as a previous face detection result, and selecting a processing mode of the image to be detected according to the previous face detection result.
The taking of the face detection result of the previous frame of image of the image to be detected as the previous face detection result comprises the following steps:
acquiring a face region output in a previous frame of image of the image to be detected;
inputting the face region output by the previous frame of image into a deep neural network for face confidence calculation to obtain a face confidence;
and comparing the face confidence with a preset threshold, wherein when the face confidence exceeds the preset threshold, the face detection result of the previous frame of image is the detected face, otherwise, the face detection result of the previous frame of image is the undetected face.
The step of inputting the face region of the previous frame of image into a deep neural network for face confidence calculation to obtain a face confidence includes:
zooming the face area of the previous frame of image to obtain a zoomed image;
inputting the zoomed image into a deep neural network for face confidence calculation to obtain a face confidence; the deep neural network for face confidence calculation comprises one or more bottleneck convolution units, and the bottleneck convolution units are used for performing convolution processing operation on input images.
The performing of face recognition processing on the image to be detected includes:
performing down-sampling processing on the image to be detected to obtain a plurality of images with different sizes;
and inputting the images with different sizes into a lightweight deep neural network for face recognition so as to detect a face region from the image to be detected.
The performing of face tracking processing on the image to be detected includes:
acquiring a face region detected in the image sequence at the previous time;
and performing KCF target tracking processing in the image to be detected on the face region detected in the image sequence at the previous time, so as to obtain the face region in the image to be detected.
Before the face tracking processing is performed on the image to be detected, the method further comprises a frame number judging step, which comprises:
counting each detected frame image when a face region is detected in the image sequence; performing ROI region calculation on the image to be detected when the counting result exceeds a preset frame number; and otherwise performing face tracking processing on the image to be detected and clearing the counting result for the next round of counting.
The ROI area calculation of the image to be detected comprises the following steps:
performing ROI (region of interest) region calculation on the image to be detected according to the face region detected in the image sequence at the previous time to obtain an estimated region of a face in the image to be detected;
inputting the estimated region of the face in the image to be detected into a lightweight deep neural network for face recognition so as to detect the face region from the image to be detected.
The lightweight deep neural network for face recognition comprises:
the BP-Net network is used for obtaining a candidate region of a human face in an input image;
the BR-Net network is used for training the candidate region of the face and removing a non-face region from the candidate region;
and the BO-Net network is used for positioning key parts of the human face in the candidate region without the non-human face region and obtaining the human face region according to the positioning result of the key parts of the human face.
A network cascade structure is formed among the BP-Net network, the BR-Net network and the BO-Net network; each network comprises one or more bottleneck convolution units, and the bottleneck convolution units are used for performing convolution processing operations on an input image.
According to a second aspect, an embodiment provides a face detection apparatus, comprising:
the image acquisition unit is used for acquiring an image to be detected in an image sequence;
the judging unit is used for selecting the processing mode of the image to be detected according to the face detection result in the image sequence at the previous time;
the face recognition processing unit is used for carrying out face recognition processing on the image to be detected when no face region is detected in the image sequence at the previous time;
the face tracking processing unit is used for carrying out face tracking processing on the image to be detected when a face region is detected in the image sequence at the previous time;
and the output unit is used for outputting the human face area in the image to be detected according to the processing result.
According to a third aspect, an embodiment provides a computer-readable storage medium, characterized in that it comprises a program executable by a processor to implement the method according to the first aspect.
The beneficial effect of this application is:
a face detection method, a face detection apparatus, and a storage medium according to the above embodiments. On the first hand, a selection mechanism is added in the proposed face detection method, and a better processing method is selected from face recognition processing and face tracking processing according to the previous detection result, so that the practical effect of the face detection method is enhanced; in the second aspect, because the lightweight deep neural network for face recognition is introduced in the face recognition processing process, the face region is effectively recognized and positioned through the BP-Net network, the BR-Net network and the BO-Net network, and the face detection accuracy is favorably improved; in the third aspect, a KCF target tracking algorithm is introduced in the face tracking process, so that the detection process of the face area is rapid, and the output efficiency of the face area is improved; in the fourth aspect, because a face confidence detection method is introduced in the face region output process, the drift problem generated in the face tracking stage is solved, and the face region tracking deviation is corrected, so that the output accuracy of the face region is improved; in the fifth aspect, the method for predicting the ROI is added on the basis of the lightweight deep neural network construction, and the possible area of the face in the next frame of image can be predicted according to the position of the face in the previous frame of image, so that the face identification can be carried out on the possible area. In addition, the face detection device has the advantages of simple structure and stable algorithm, and is favorable for face detection operation combined with an embedded hardware platform.
Drawings
FIG. 1 is a block diagram of a face detection apparatus according to an embodiment;
FIG. 2 is a flow chart of a face detection method according to an embodiment;
FIG. 3 is a flow chart of face confidence determination;
FIG. 4 is a flow chart of a face recognition process;
FIG. 5 is a flow chart of a face tracking process;
FIG. 6 is a block diagram of a face detection apparatus according to another embodiment;
FIG. 7 is a flow chart of a face detection method according to another embodiment;
FIG. 8 is a structure of a deep neural network for face confidence calculation;
FIG. 9 is a structure of a BP-Net network;
FIG. 10 is a structure of a BR-Net network;
FIG. 11 is a structure of a BO-Net network;
fig. 12 is a structure of bottleneck convolution.
Detailed Description
The present invention will be described in further detail with reference to the following detailed description and accompanying drawings. Wherein like elements in different embodiments are numbered with like associated elements. In the following description, numerous details are set forth in order to provide a better understanding of the present application. However, those skilled in the art will readily recognize that some of the features may be omitted or replaced with other elements, materials, methods in different instances. In some instances, certain operations related to the present application have not been shown or described in detail in order to avoid obscuring the core of the present application from excessive description, and it is not necessary for those skilled in the art to describe these operations in detail, so that they may be fully understood from the description in the specification and the general knowledge in the art.
Furthermore, the features, operations, or characteristics described in the specification may be combined in any suitable manner to form various embodiments. Also, the order of the steps or actions in the method descriptions may be changed or adjusted in ways apparent to those skilled in the art. Thus, the various sequences in the specification and drawings are only for describing certain embodiments and are not intended to imply a required order, unless it is otherwise indicated that a certain order must be followed.
The numbering of the components as such, e.g., "first", "second", etc., is used herein only to distinguish the objects as described, and does not have any sequential or technical meaning. The term "connected" and "coupled" when used in this application, unless otherwise indicated, includes both direct and indirect connections (couplings).
The first embodiment is as follows:
referring to fig. 1, the present application discloses a face detection apparatus 1, which includes an image acquisition unit 11, a determination unit 12, a face recognition processing unit 13, a face tracking unit 14, and an output unit 15, which are respectively described below.
The image acquisition unit 11 is configured to acquire an image to be detected in an image sequence. In an embodiment, the image acquisition unit 11 acquires a frame of image from a video stream and uses that frame as the image to be detected. The video stream may be a video shot by a monitoring probe in a public place, or a video shot by an electronic device such as a mobile phone or a camera, including both video shot in real time and previously archived video.
The judging unit 12 is configured to select the processing mode of the image to be detected according to the face detection result in the image sequence at the previous time. In an embodiment, the face detection apparatus 1 processes each frame of image in the image sequence in turn; that is, following the time order of the images in the video stream, one frame of image is obtained each time and face detection is performed on it (for the face detection process, refer to the face detection method below). The face detection result of the previous frame of the image to be detected is taken as the previous face detection result, and the processing mode of the image to be detected is selected accordingly. It should be noted that when the image to be detected is the first frame image in the video sequence, when the previous face detection process was erroneous, or when no face region was output in the previous face detection result, the judging unit 12 determines the previous face detection result to be negative, that is, that no face was detected. The implementation of the judging unit 12 can be understood with reference to the face detection method below.
The face recognition processing unit 13 is configured to perform face recognition processing on the image to be detected when no face region was detected in the image sequence at the previous time (that is, when the judgment result of the judging unit 12 is no). In an embodiment, the face recognition processing unit 13 down-samples the image to be detected to obtain a plurality of images of different sizes, and inputs the images of different sizes into a lightweight deep neural network for face recognition, so as to detect a face region in the image to be detected.
The face tracking processing unit 14 is configured to perform face tracking processing on the image to be detected when a face region was detected in the image sequence at the previous time (that is, when the judgment result of the judging unit 12 is yes). In one embodiment, the face tracking processing unit 14 acquires the face region detected in the image sequence at the previous time, and performs KCF target tracking processing in the image to be detected on that face region, so as to obtain the face region in the image to be detected.
The output unit 15 is used for outputting the face region in the image to be detected according to the processing result. In one embodiment, the output unit 15 performs rectangular marking on the face region and displays the rectangular marking of the face region on the image to be detected.
Those skilled in the art will understand that the face detection apparatus 1 processes each frame of image in the video stream quickly (typically at a speed of tens to hundreds of frames per second), so the output unit 15 can output face regions continuously; when the user observes the video stream and the face regions through the display interface, the rectangular mark of the face region will appear to move dynamically within the video stream.
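To make the data flow of units 11-15 concrete, a minimal Python sketch of such a per-frame loop is given below, assuming OpenCV for video capture and display; `process_frame` is a hypothetical stand-in for the judging, recognition and tracking units, whose logic is detailed in the method that follows (see the sketch after step S270).

```python
import cv2

def run(video_source, process_frame):
    """Sketch of the apparatus loop: the image acquisition unit 11 reads
    one frame at a time from the video stream; process_frame is a
    hypothetical stand-in for the judging, recognition and tracking
    units (12-14); the output unit 15 draws the rectangular mark."""
    cap = cv2.VideoCapture(video_source)   # camera index or video file
    prev_result = None                     # previous face detection result
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        prev_result = process_frame(frame, prev_result)
        if prev_result is not None:        # rectangular mark (unit 15)
            x, y, w, h = prev_result
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.imshow("faces", frame)
        if cv2.waitKey(1) == 27:           # Esc to quit
            break
    cap.release()
    cv2.destroyAllWindows()
```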
Accordingly, referring to fig. 2, the present application further discloses a face detection method, which includes steps S100 to S500, which are described below.
Step S100: an image to be detected is acquired from an image sequence. In one embodiment, the image acquisition unit 11 acquires a frame of image from a video stream and takes that frame as the image to be detected.
Step S200: selecting the processing mode of the image to be detected according to the face detection result in the image sequence at the previous time. In an embodiment, the determining unit 12 processes each frame of image in the image sequence in turn, takes the face detection result of the previous frame of the image to be detected as the previous face detection result, and selects the processing mode of the image to be detected according to that result. As shown in fig. 3, step S200 may include steps S210-S270.
In step S210, the determining unit 12 obtains the previous frame image of the image to be detected in the image sequence.
In step S220, the determining unit 12 determines whether a face region was output in the previous frame of image; if so, the process goes to step S230, otherwise to step S270. It should be noted that when the image to be detected is the first frame image in the video sequence, or when the previous face detection process was erroneous, the determining unit 12 likewise determines that no face was detected at the previous time.
In step S230, the face region output from the previous frame of image is scaled to obtain a scaled image, and preferably, the face region output from the previous frame of image is scaled to an image with 12 × 12 pixels.
In step S240, the scaled image is input into a deep neural network for face confidence calculation (which may be denoted by the symbol FCNET), so as to obtain a face confidence (which may be denoted by the symbol C). In an embodiment, see fig. 8, the face confidence network FCNET includes one or more bottleneck convolution units (preferably four: two with 16 channels and two with 24 channels), which are mainly used to perform convolution operations on the input image. The specific structure of each bottleneck convolution unit is shown in fig. 12, where BN is a normalization function used to normalize each neuron, and RELU is an activation function used to keep the training process efficient; both belong to the prior art and are not described in detail here. To improve the accuracy of the face confidence calculation, in this embodiment a 3 × 3 × 3 filter, a 32-channel 2d convolution structure (mainly used for feature extraction) and a 1 × 1 convolution unit are further added to the deep neural network for face confidence calculation.
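For illustration, a minimal PyTorch sketch of such a bottleneck convolution unit and of the FCNET network follows; the channel counts (a 32-channel 3 × 3 stem, two 16-channel and two 24-channel bottleneck units, a final 1 × 1 convolution) follow the description above, while the expansion factor t = 6, the strides and the 12 × 12 input size are assumptions, since figs. 8 and 12 are not reproduced here.

```python
# Sketch of the bottleneck convolution unit (fig. 12) and FCNET (fig. 8).
# Channel counts follow the text; t=6, strides and the 12x12 input size
# are assumptions made for illustration.
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """Expand with a 1x1 conv, filter with a 3x3 depthwise conv, project
    back with a 1x1 conv; BN normalizes each layer, RELU activates."""
    def __init__(self, in_ch, out_ch, t=6, stride=1):
        super().__init__()
        mid = in_ch * t
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, mid, 1, bias=False),
            nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, 3, stride=stride, padding=1,
                      groups=mid, bias=False),           # depthwise 3x3
            nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, out_ch, 1, bias=False),       # linear projection
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        return self.block(x)

class FCNET(nn.Module):
    """Face-confidence network: 3x3 stem with 32 channels, two 16-channel
    and two 24-channel bottleneck units, then a 1x1 conv to one score."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(32), nn.ReLU(inplace=True),
            Bottleneck(32, 16), Bottleneck(16, 16),
            Bottleneck(16, 24, stride=2), Bottleneck(24, 24),
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(24, 1, 1),
        )

    def forward(self, x):                  # x: (N, 3, 12, 12)
        return torch.sigmoid(self.features(x)).flatten(1)  # confidence C
```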
Step S250, comparing the face confidence C with a preset threshold (which may be represented by FT), and if the face confidence C is greater than the preset threshold FT, entering step S260, otherwise entering step S270. It should be noted that, for accurate determination, the threshold value preset in the present embodiment is preferably 0.93.
In step S260, the face detection result of the previous frame of image is considered as the detected face, that is, the judgment result of the judgment unit 12 is yes.
In step S270, the face detection result of the previous frame image is considered to be that no face was detected, that is, the judgment result of the determining unit 12 is no.
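For illustration, the selection logic of steps S210-S270 can be sketched in Python as follows; `fcnet_confidence`, `recognize_faces` and `track_faces` are hypothetical wrappers around the FCNET network above, the recognition pipeline of step S300 and the KCF tracking of step S400, and the threshold follows the preferred value of 0.93.

```python
import cv2  # used here only to scale the face crop to 12x12 (step S230)

FT = 0.93  # preset face-confidence threshold (preferred value above)

def select_and_process(frame, prev_frame, prev_face_region,
                       fcnet_confidence, recognize_faces, track_faces):
    """Steps S210-S270: decide between face tracking and face recognition
    for the image to be detected, based on the previous frame's result.
    fcnet_confidence, recognize_faces and track_faces are hypothetical
    wrappers around FCNET, the recognition network and the KCF tracker."""
    detected_previously = False
    if prev_face_region is not None:                       # S220
        x, y, w, h = prev_face_region
        crop = cv2.resize(prev_frame[y:y + h, x:x + w], (12, 12))  # S230
        detected_previously = fcnet_confidence(crop) > FT  # S240-S250
    if detected_previously:                                # S260 -> tracking
        return track_faces(prev_frame, prev_face_region, frame)
    return recognize_faces(frame)                          # S270 -> recognition
```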
Step S300: when no face region was detected in the image sequence at the previous time, face recognition processing is performed on the image to be detected. In an embodiment, the face recognition processing unit 13 down-samples the image to be detected to obtain a plurality of images of different sizes, and inputs each of them into a lightweight deep neural network for face recognition to detect a face region in the image to be detected. The step S300 may include steps S310 to S330, described below.
In step S310, the face recognition processing unit 13 performs downsampling on the image to be detected to form an image pyramid, and preferably divides the image pyramid into three levels to form images with a resolution of 48 × 48, a resolution of 24 × 24, and a resolution of 12 × 12, so as to obtain a plurality of images with different sizes. The images with different resolutions are used for adapting to the input requirements of different network structures.
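A one-function sketch of this pyramid construction with OpenCV follows; the three square target sizes are the preferred resolutions above, and INTER_AREA is an assumed (typical) choice of interpolation for downsampling.

```python
import cv2

def build_pyramid(image):
    """Step S310: downsample the image to be detected into the three
    preferred resolutions of the image pyramid."""
    return {size: cv2.resize(image, (size, size),
                             interpolation=cv2.INTER_AREA)
            for size in (48, 24, 12)}
```

Each level then feeds the network whose input size it matches (BP-Net at 12 × 12, BR-Net at 24 × 24, BO-Net at 48 × 48, per the tables below).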
In step S320, the face recognition processing unit 13 inputs each of the images of different sizes into a lightweight deep neural network for face recognition (which may be denoted by the symbol BFACENET). The lightweight deep neural network BFACENET is a neural network with a small number of layers, usually 10 layers or fewer.
In one embodiment, the lightweight deep neural network BFACENET includes a BP-Net network, a BR-Net network, and a BO-Net network. The convolution framework of the BP-Net network is shown in table 1, and the convolution structure corresponding to table 1 can be seen in fig. 9.
TABLE 1 Convolution framework of the BP-Net network

| Input   | Convolution operation | Expansion factor t | Output channels c | Units n | Stride s |
| 12x12x3 | Conv2d                | -                  | 8                 | 1       | 2        |
| 6x6x8   | Convolution unit      | 6                  | 16                | 2       | 2        |
| 3x3x16  | Convolution unit      | 6                  | 24                | 2       | 1        |
| 3x3x24  | 3x3 convolution unit  | -                  | 32                | 1       | 1        |
| 1x1x32  | Conv2d 1x1            | -                  | 16                | 1       | -        |
In table 1 above, t denotes the expansion factor, c the number of output channels, n the number of units, and s the stride. Each convolution unit preferably adopts a bottleneck convolution structure, whose basic composition is shown in fig. 12. It should be noted that in this embodiment the BP-Net network is mainly used to obtain face candidate windows and regression vectors in the input 12 × 12 resolution image, so as to obtain candidate regions of the face.
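As a concrete reading of Table 1, the following PyTorch sketch assembles BP-Net from the Bottleneck unit sketched earlier; splitting the final 16 outputs into face classification (1x1x2), bounding-box regression (1x1x4) and key-point (1x1x10) vectors matches the output shapes given for fig. 9 later in this description, and the per-unit strides are assumptions consistent with the input sizes listed in the table.

```python
import torch.nn as nn

class BPNet(nn.Module):
    """Sketch of BP-Net per Table 1: 12x12x3 input, stride-2 stem,
    16- and 24-channel bottleneck units, a 3x3 convolution to 32
    channels, and a final 1x1 convolution to 16 outputs. The 2/4/10
    split of those outputs is an assumption from the fig. 9 shapes."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, 3, stride=2, padding=1, bias=False),  # 12 -> 6
            nn.BatchNorm2d(8), nn.ReLU(inplace=True),
            Bottleneck(8, 16, t=6, stride=2),                     # 6 -> 3
            Bottleneck(16, 16, t=6),
            Bottleneck(16, 24, t=6), Bottleneck(24, 24, t=6),
            nn.Conv2d(24, 32, 3, bias=False),                     # 3 -> 1
            nn.BatchNorm2d(32), nn.ReLU(inplace=True),
            nn.Conv2d(32, 16, 1),
        )

    def forward(self, x):                  # x: (N, 3, 12, 12)
        out = self.features(x).flatten(1)  # (N, 16)
        cls, box, marks = out[:, :2], out[:, 2:6], out[:, 6:]
        return cls, box, marks
```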
The convolution framework of the BR-Net network is shown in table 2, and the convolution structure corresponding to table 2 can be seen in fig. 10.
TABLE 2 Convolution framework of the BR-Net network

| Input   | Convolution operation | Expansion factor t | Output channels c | Units n | Stride s |
| 24x24x3 | Conv2d                | -                  | 8                 | 1       | 2        |
| 12x12x8 | Convolution unit      | 6                  | 16                | 2       | 2        |
| 6x6x16  | Convolution unit      | 6                  | 24                | 2       | 2        |
| 3x3x24  | 3x3 convolution unit  | -                  | 32                | 2       | 1        |
| 1x1x32  | Conv2d 1x1            | -                  | 96                | 1       | -        |
It should be noted that the BR-Net network is mainly used for training a candidate region of a face and removing a non-face region from the candidate region. In one embodiment, the BR-Net network trains the candidate regions of the face according to the input 24 × 24 resolution images, thereby removing the non-face regions.
The convolution frame of the BO-Net network is shown in table 3, and the convolution structure corresponding to table 3 is shown in fig. 11.
TABLE 3 Convolution framework of the BO-Net network

| Input   | Convolution operation | Expansion factor t | Output channels c | Units n | Stride s |
| 48x48x3 | Conv2d                | -                  | 8                 | 1       | 2        |
| 12x12x8 | Convolution unit      | 6                  | 16                | 2       | 2        |
| 6x6x16  | Convolution unit      | 6                  | 24                | 2       | 2        |
| 3x3x24  | 3x3 convolution unit  | -                  | 48                | 2       | 1        |
| 1x1x48  | Conv2d 1x1            | -                  | 128               | 1       | -        |
The BO-Net network is mainly used to locate the key parts of the face in the candidate regions from which non-face regions have been removed, and to obtain the face region according to the localization result. In a specific embodiment, the BO-Net network locates the key parts of the face in a candidate region according to the input image with a resolution of 48 × 48 (the input size given in table 3), determines the face from five key points of the face, namely the centers of the two eyes, the nose, and the two corners of the mouth, and obtains the face region.
Those skilled in the art can understand that the BP-Net, BR-Net and BO-Net networks in the lightweight deep neural network BFACENET adopted in this embodiment form a network cascade structure. Each network includes one or more bottleneck convolution units (which may be denoted by the symbol BottleNeck) used to perform convolution operations on the input image; the BottleNeck unit has a simple structure, which helps reduce the parameters of the constructed network and speeds up the face detection operation.
The objective function used to train the relevant deep neural network models in this embodiment is:

$$L_i^{det} = -\left( y_i^{det} \log p_i + \left(1 - y_i^{det}\right) \log\left(1 - p_i\right) \right) \tag{1-1}$$

$$L_i^{box} = \left\lVert \hat{y}_i^{box} - y_i^{box} \right\rVert_2^2 \tag{1-2}$$

$$L_i^{mark} = \left\lVert \hat{y}_i^{mark} - y_i^{mark} \right\rVert_2^2 \tag{1-3}$$

$$\min \sum_{i=1}^{N} \sum_{j \in \{det,\, box,\, mark\}} \alpha_j \, \beta_i^{j} \, L_i^{j} \tag{1-4}$$

In formulas (1-1) to (1-4), y_i denotes the sample label of a face and p_i the obtained probability that the sample is a face; det denotes the face classification task, box the bounding-box regression task, and mark the key-point localization task; α_j is the weight of the loss of the three tasks of face classification, bounding-box regression and key-point localization at the current stage (preferably α_det = 0.5, α_box = 0.25 and α_mark = 0.25 in this embodiment); β_i^j ∈ {0, 1} is an indicator scalar for whether a face is present, 1 indicating that a face exists and 0 that no face exists; i and j are the indices of the current sample and task, the superscript denoting the current task category and the subscript the stage of the current task; L is the loss function; the double bar denotes the quadratic (L2) norm; and { } denotes a set.
It should be noted that, as can be seen from the formulas (1-1) to (1-4), the result of the upper layer sub-network is used by the sub-network of the next layer, so as to achieve the effect of mutual cascade connection among the BP-Net network, the BR-Net network and the BO-Net network.
It should be noted that the face classification in figs. 9, 10 and 11 is a 1x1x2 vector, i.e., its result is represented as 1 or 0; the bounding-box regression is a 1x1x4 vector, mainly outputting the coordinates of the bounding box; and the key-point localization is a 1x1x10 vector, mainly outputting the coordinates of the 5 key points of the face.
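Under the stated weights, the objective of formulas (1-1) to (1-4) can be sketched in PyTorch as follows; the tensor shapes follow the 1x1x2 / 1x1x4 / 1x1x10 output vectors above, while the batching details are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

ALPHA = {"det": 0.5, "box": 0.25, "mark": 0.25}  # preferred task weights

def multitask_loss(cls_logits, box_pred, mark_pred,
                   y_det, y_box, y_mark, beta):
    """Formulas (1-1)-(1-4): weighted sum of cross-entropy for face
    classification and squared L2 norms for bounding-box and key-point
    regression. beta[j] is the 0/1 indicator selecting which samples
    contribute to task j. Assumed shapes: cls_logits (N, 2),
    box_pred (N, 4), mark_pred (N, 10); y_det is a LongTensor of
    0/1 class labels."""
    l_det = F.cross_entropy(cls_logits, y_det, reduction="none")   # (1-1)
    l_box = ((box_pred - y_box) ** 2).sum(dim=1)                   # (1-2)
    l_mark = ((mark_pred - y_mark) ** 2).sum(dim=1)                # (1-3)
    total = (ALPHA["det"] * beta["det"] * l_det
             + ALPHA["box"] * beta["box"] * l_box
             + ALPHA["mark"] * beta["mark"] * l_mark)              # (1-4)
    return total.mean()
```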
Step S330: the face recognition processing unit 13 detects the face region from the image to be detected. In an embodiment, the face recognition processing unit 13 obtains the face region according to the key parts of the face located by the BO-Net network, and takes it as the detected face region.
Step S400: when a face region was detected in the image sequence at the previous time, face tracking processing is performed on the image to be detected. In an embodiment, the face tracking processing unit 14 acquires the face region detected in the image sequence at the previous time, and performs KCF target tracking processing in the image to be detected on that face region, so as to obtain the face region in the image to be detected. The step S400 may include steps S410-S430.
Step S410, a face region detected in the image sequence at the previous time is obtained. In a specific embodiment, in the case where a face region was detected at the previous time, the face tracking processing unit 14 acquires a face region detected in an image of the frame immediately preceding the image to be detected in the video stream.
And step S420, performing KCF target tracking processing in the image to be detected. In a specific embodiment, the face tracking processing unit 14 inputs the face region detected in the previous frame of image and the image to be detected into the KCF target tracking processing algorithm together, and performs target tracking on the face region detected in the previous frame of image in the image to be detected, so as to obtain the face region in the image to be detected.
It should be noted that KCF target tracking is an algorithm commonly used in the field of image processing, often for tracking and analyzing a target object in an image. The method generally trains a target detector during the tracking process, uses the detector to check whether the predicted position in the next frame contains the target, and then uses the new detection result to update the training set and hence the detector. When training the target detector, the target region is generally selected as a positive sample and the regions around the target as negative samples, with regions closer to the target more likely to be positive samples. By applying this characteristic of the KCF target tracking algorithm, this embodiment achieves the purpose of tracking the face region in the image to be detected according to the face region in the previous frame of image. Since the KCF target tracking algorithm is prior art, it is not described in detail here.
Step S430: the face region in the image to be detected is acquired. In a specific embodiment, the face tracking processing unit 14 takes the face region obtained by the KCF target tracking algorithm as the face region in the image to be detected.
It should be noted that, in the process of tracking the face region, the KCF target tracking algorithm inevitably drifts, which reduces the accuracy of target tracking. To avoid this, before face tracking is performed on the next frame after the image to be detected, the face confidence calculation of step S200 is performed on the face region in the image to be detected; the face confidence effectively prevents the drift problem caused by the KCF target tracking algorithm and thus provides a more accurate basis for the judgment in step S200.
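A minimal sketch of steps S410-S430 with OpenCV's KCF implementation follows; note that, depending on the OpenCV build, the tracker factory lives either at the top level or in the cv2.legacy namespace of opencv-contrib.

```python
import cv2

def make_kcf():
    """Create a KCF tracker; the factory location varies by OpenCV build."""
    if hasattr(cv2, "TrackerKCF_create"):
        return cv2.TrackerKCF_create()
    return cv2.legacy.TrackerKCF_create()

def track_faces(prev_frame, prev_face_region, frame):
    """Steps S410-S430: initialize KCF on the face region detected in
    the previous frame, then locate it in the image to be detected."""
    tracker = make_kcf()
    tracker.init(prev_frame, tuple(prev_face_region))  # (x, y, w, h)
    ok, box = tracker.update(frame)
    return [int(v) for v in box] if ok else None
```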
Step S500: the face region in the image to be detected is output according to the processing result. In one embodiment, referring to fig. 1, the output unit 15 applies a rectangular mark to the face region detected by the face recognition processing unit 13 in step S330 or by the face tracking processing unit 14 in step S430, and continuously outputs each frame image in the image sequence: if a face region was detected in the current frame, the frame is output together with the rectangular mark of the face region; otherwise only the current frame image is output.
The second embodiment is as follows:
referring to fig. 6, the present application further discloses another embodiment of a face detection apparatus 2, which includes the face detection apparatus 1 in the first embodiment, and further includes a frame number judgment unit 16 and an ROI region calculation unit 17, which are respectively described below.
The frame number judging unit 16 is located between the judging unit 12 and the face tracking processing unit 14, and is configured to count each detected frame image when a face region is detected in the image sequence; when the counting result exceeds a preset frame number (which may be denoted by the symbol T, preferably a value in the range of 48 to 128), ROI region calculation is performed on the image to be detected, and otherwise face tracking processing is performed on the image to be detected and the counting result is cleared for the next round of counting.
The ROI region calculating unit 17 is connected to the frame number judging unit 16 and the face recognition processing unit 13, and is configured to, when the frame number judging unit 16 judges that the counting result exceeds the preset frame number, perform ROI region calculation on the image to be detected according to the face region detected in the image sequence at the previous time, so as to obtain an estimated region of the face in the image to be detected. The ROI region calculating unit 17 then inputs the estimated region of the face into the face recognition processing unit 13, so that the lightweight deep neural network for face recognition is run within the estimated region, thereby detecting the face region in the image to be detected.
In the field of image processing, a region of interest (ROI) is a region selected from an image that is treated as the focus of the image analysis; it is delineated so that further processing can be confined to it.
It should be noted that the KCF target tracking algorithm adopted in the face tracking processing unit 14 may drift when continuously processing images for a long time, which would seriously affect the detection of the face region. After the face tracking processing unit 14 has performed a certain number of consecutive face tracking operations, the frame number judging unit 16 triggers ROI region calculation on the image to be detected, so that face recognition processing is performed within the ROI region. This makes it possible to quickly detect an accurate face region within a smaller image region, thereby correcting the position of the face region and avoiding both the possibility of errors and the drift problem when the face tracking processing unit 14 performs KCF target tracking on the face region.
Referring to fig. 7, the present embodiment correspondingly discloses another face detection method, which includes steps S100 to S600.
The face detection method in the second embodiment is different from the face detection method in the first embodiment by the addition of the step S600, and the step S600 may include steps S610 to S630, which are described below.
Step S610, which is located before step S400, can be referred to as a frame number determination step. In one embodiment, the frame number judging step includes:
when a face region was detected in the image sequence (i.e. when the judgment result of the judging unit 12 is yes), the detected frame images are counted; when the counting result exceeds the preset frame number T (preferably a value in the range of 48 to 128), the method proceeds to step S620, and otherwise to step S400.
Note that, to keep the counting function of step S610 working, the frame number judging unit 16 clears the counting result when proceeding to step S400, so as to perform the next round of counting; that is, when the judgment result of the judging unit 12 is yes again, the frame number judging unit 16 restarts counting.
Step S620, according to the face region detected in the image sequence in the previous time, the ROI region calculation is carried out on the image to be detected, and the estimated region of the face in the image to be detected is obtained. In one embodiment, the calculation process is:
$$ROI_X = FACE_X - \frac{(T_1 - 1)\, FACE_W}{2} \tag{2-1}$$

$$ROI_Y = FACE_Y - \frac{(T_2 - 1)\, FACE_H}{2} \tag{2-2}$$

$$ROI_W = T_1 \cdot FACE_W \tag{2-3}$$

$$ROI_H = T_2 \cdot FACE_H \tag{2-4}$$

In the above formulas, ROI_X and ROI_Y are the x and y coordinates of the pixel at the upper-left corner of the region of interest, and (FACE_X, FACE_Y) are the coordinates of the pixel at the upper-left corner of the face region detected in the previous frame of image; FACE_W and FACE_H are respectively the width and height of the face region detected in the previous frame of image, and ROI_W and ROI_H are the width and height of the region of interest to be calculated; T1 and T2 are thresholds set by the user (empirically, T1 is preferably set to 2.5 and T2 to 1.6), so that the region of interest is an expansion of the face region centered on it. If ROI_X, ROI_Y, ROI_W or ROI_H exceeds the image boundary, the coordinate value of the image boundary is used as the real value.
Step S630: the estimated region of the face in the image to be detected is input into the lightweight deep neural network for face recognition, so as to detect the face region in the image to be detected. In an embodiment, the ROI region calculating unit 17 inputs the estimated region of the face into the face recognition processing unit 13, which feeds it into the lightweight deep neural network BFACENET for face recognition, so as to obtain the face region in the image to be detected. The structure and calculation method of the lightweight deep neural network for face recognition are as in step S300 of the first embodiment and are not repeated here.
Those skilled in the art will appreciate that the use of the lightweight deep neural network in step S300 requires scanning the whole image to be detected and locating the face region by traversing the entire image, which adds invalid traversal time. Step S630 instead exploits the temporal correlation within the image sequence: the lightweight deep neural network only needs to scan the image within the estimated face region, so the traversal range of face detection is reduced from the whole image to the estimated range. This reduces useless sliding-window work, removes the interference of non-face regions, and further improves the running speed of the algorithm.
Those skilled in the art will appreciate that all or part of the functions of the various methods in the above embodiments may be implemented by hardware, or may be implemented by computer programs. When all or part of the functions of the above embodiments are implemented by a computer program, the program may be stored in a computer-readable storage medium, and the storage medium may include: a read only memory, a random access memory, a magnetic disk, an optical disk, a hard disk, etc., and the program is executed by a computer to realize the above functions. For example, the program may be stored in a memory of the device, and when the program in the memory is executed by the processor, all or part of the functions described above may be implemented. In addition, when all or part of the functions in the above embodiments are implemented by a computer program, the program may be stored in a storage medium such as a server, another computer, a magnetic disk, an optical disk, a flash disk, or a removable hard disk, and may be downloaded or copied to a memory of a local device, or may be version-updated in a system of the local device, and when the program in the memory is executed by a processor, all or part of the functions in the above embodiments may be implemented.
The present invention has been described in terms of specific examples, which are provided to aid understanding of the invention and are not intended to be limiting. For a person skilled in the art to which the invention pertains, several simple deductions, modifications or substitutions may be made according to the idea of the invention.

Claims (11)

1. A face detection method is characterized by comprising the following steps:
acquiring an image to be detected in an image sequence;
selecting a processing mode of the image to be detected according to a face detection result in the image sequence at the previous time, counting each detected frame image when the face is detected in the image sequence at the previous time, performing region-of-interest calculation on the image to be detected when the counting result exceeds a preset frame number, and performing face tracking processing on the image to be detected and clearing the counting result to perform the next round of counting when the counting result does not exceed the preset frame number;
when no face is detected in the image sequence at the previous time, performing face recognition processing on the image to be detected;
and outputting the human face area in the image to be detected according to the processing result.
2. The method for detecting human face according to claim 1, wherein the selecting the processing mode of the image to be detected according to the human face detection result in the image sequence at the previous time comprises:
and sequentially processing each frame of image in the image sequence, taking a face detection result of a previous frame of image of the image to be detected as a previous face detection result, and selecting a processing mode of the image to be detected according to the previous face detection result.
3. The method of claim 2, wherein the using the face detection result of the previous frame of image of the image to be detected as the previous face detection result comprises:
acquiring a face region output in a previous frame of image of the image to be detected;
inputting the face region output by the previous frame of image into a deep neural network for face confidence calculation to obtain a face confidence;
and comparing the face confidence with a preset threshold, wherein when the face confidence exceeds the preset threshold, the face detection result of the previous frame of image is the detected face, otherwise, the face detection result of the previous frame of image is the undetected face.
4. The method of claim 3, wherein the inputting the face region of the previous frame of image into a deep neural network for face confidence calculation to obtain the face confidence comprises:
zooming the face area of the previous frame of image to obtain a zoomed image;
inputting the zoomed image into a deep neural network for face confidence calculation to obtain a face confidence; the deep neural network for face confidence calculation comprises one or more bottleneck convolution units, and the bottleneck convolution units are used for performing convolution processing operation on input images.
5. The method for detecting human face according to claim 1, wherein said performing human face recognition processing on the image to be detected comprises:
performing down-sampling processing on the image to be detected to obtain a plurality of images with different sizes;
and inputting the images with different sizes into a lightweight deep neural network for face recognition so as to detect a face region from the image to be detected.
6. The method for detecting human face according to claim 1, wherein said performing human face tracking processing on the image to be detected comprises:
acquiring a face region detected in the image sequence at the previous time;
and performing KCF target tracking processing on the face area detected in the image sequence in the image to be detected at the previous time to obtain the face area in the image to be detected.
7. The face detection method of claim 1, wherein the performing of the region-of-interest calculation on the image to be detected comprises:
calculating the region of interest of the image to be detected according to the face region detected in the image sequence at the previous time to obtain an estimated region of the face in the image to be detected;
inputting the estimated region of the face in the image to be detected into a lightweight deep neural network for face recognition so as to detect the face region from the image to be detected.
8. The face detection method of claim 5 or 7, wherein the lightweight deep neural network for face recognition comprises:
the BP-Net network is used for obtaining a candidate region of a human face in an input image;
the BR-Net network is used for training the candidate region of the face and removing a non-face region from the candidate region;
and the BO-Net network is used for positioning key parts of the human face in the candidate region without the non-human face region and obtaining the human face region according to the positioning result of the key parts of the human face.
9. The face detection method of claim 8, wherein a network cascade structure is formed among the BP-Net network, the BR-Net network and the BO-Net network, and each network includes one or more bottleneck convolution units, and the bottleneck convolution units are configured to perform convolution processing operations on an input image.
10. A face detection device is characterized by comprising an image acquisition unit, a judgment unit, a face recognition processing unit, a frame number judgment unit, an ROI region calculation unit, a face tracking processing unit and an output unit:
the image acquisition unit is used for acquiring an image to be detected in an image sequence;
the judging unit is used for selecting the processing mode of the image to be detected according to the face detection result in the image sequence at the previous time;
the face recognition processing unit is used for carrying out face recognition processing on the image to be detected when no face region is detected in the image sequence at the previous time;
the frame number judging unit is used for counting the detected frame images when a human face area is detected in the image sequence and judging whether the counting result exceeds a preset frame number;
when the counting result exceeds the preset frame number, the ROI is calculated on the image to be detected through the ROI area calculation unit, otherwise, the face tracking processing unit is used for carrying out face tracking processing on the image to be detected, and the frame number judgment unit is used for clearing the counting result to carry out the next round of counting;
and the output unit is used for outputting the human face area in the image to be detected according to the processing result.
11. A computer-readable storage medium, characterized by comprising a program executable by a processor to implement the method of any one of claims 1-9.
CN201810866324.1A 2018-08-01 2018-08-01 Face detection method, face detection device and storage medium Active CN109271848B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810866324.1A CN109271848B (en) 2018-08-01 2018-08-01 Face detection method, face detection device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810866324.1A CN109271848B (en) 2018-08-01 2018-08-01 Face detection method, face detection device and storage medium

Publications (2)

Publication Number Publication Date
CN109271848A CN109271848A (en) 2019-01-25
CN109271848B true CN109271848B (en) 2022-04-15

Family

ID=65152962

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810866324.1A Active CN109271848B (en) 2018-08-01 2018-08-01 Face detection method, face detection device and storage medium

Country Status (1)

Country Link
CN (1) CN109271848B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110348348B (en) * 2019-06-30 2021-08-31 华中科技大学 Quick identification method and early warning system for entrance identities of participants
TWI749370B (en) 2019-09-16 2021-12-11 緯創資通股份有限公司 Face recognition method and computer system using the same
KR102340988B1 (en) * 2019-10-04 2021-12-17 에스케이텔레콤 주식회사 Method and Apparatus for Detecting Objects from High Resolution Image
CN110991287A (en) * 2019-11-23 2020-04-10 深圳市恩钛控股有限公司 Real-time video stream face detection tracking method and detection tracking system
CN111339936A (en) * 2020-02-25 2020-06-26 杭州涂鸦信息技术有限公司 Face tracking method and system
CN111694980A (en) * 2020-06-13 2020-09-22 德沃康科技集团有限公司 Robust family child learning state visual supervision method and device

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1794264A (en) * 2005-12-31 2006-06-28 北京中星微电子有限公司 Method and system of real time detecting and continuous tracing human face in video frequency sequence
CN104036237A (en) * 2014-05-28 2014-09-10 南京大学 Detection method of rotating human face based on online prediction
CN106874867A (en) * 2017-02-14 2017-06-20 江苏科技大学 A kind of face self-adapting detecting and tracking for merging the colour of skin and profile screening
CN107967456A (en) * 2017-11-27 2018-04-27 电子科技大学 A kind of multiple neural network cascade identification face method based on face key point
CN108197604A (en) * 2018-01-31 2018-06-22 上海敏识网络科技有限公司 Fast face positioning and tracing method based on embedded device
CN108229442A (en) * 2018-02-07 2018-06-29 西南科技大学 Face fast and stable detection method in image sequence based on MS-KCF
CN108090918A (en) * 2018-02-12 2018-05-29 天津天地伟业信息系统集成有限公司 A kind of Real-time Human Face Tracking based on the twin network of the full convolution of depth

Also Published As

Publication number Publication date
CN109271848A (en) 2019-01-25

Similar Documents

Publication Publication Date Title
CN109271848B (en) Face detection method, face detection device and storage medium
Hall et al. Probabilistic object detection: Definition and evaluation
US20200167554A1 (en) Gesture Recognition Method, Apparatus, And Device
US10753881B2 (en) Methods and systems for crack detection
CN109670429B (en) Method and system for detecting multiple targets of human faces of surveillance videos based on instance segmentation
CN106960195B (en) Crowd counting method and device based on deep learning
US20200005022A1 (en) Method, terminal, and storage medium for tracking facial critical area
CN110334569B (en) Passenger flow volume in-out identification method, device, equipment and storage medium
Chen et al. Comprehensive regularization in a bi-directional predictive network for video anomaly detection
US10506174B2 (en) Information processing apparatus and method for identifying objects and instructing a capturing apparatus, and storage medium for performing the processes
CN108197604A (en) Fast face positioning and tracing method based on embedded device
AU2020272936B2 (en) Methods and systems for crack detection using a fully convolutional network
Sun et al. Small aerial target detection for airborne infrared detection systems using LightGBM and trajectory constraints
Zavan et al. Benchmarking parts based face processing in-the-wild for gender recognition and head pose estimation
KR101690050B1 (en) Intelligent video security system
CN111311602A (en) Lip image segmentation device and method for traditional Chinese medicine facial diagnosis
CN113780145A (en) Sperm morphology detection method, sperm morphology detection device, computer equipment and storage medium
JP6028972B2 (en) Image processing apparatus, image processing method, and image processing program
Marks et al. SIPEC: the deep-learning Swiss knife for behavioral data analysis
CN111667419A (en) Moving target ghost eliminating method and system based on Vibe algorithm
JP5958557B2 (en) Object recognition method and object recognition apparatus
CN110826495A (en) Body left and right limb consistency tracking and distinguishing method and system based on face orientation
Chen et al. SiamCPN: Visual tracking with the Siamese center-prediction network
CN114170625A (en) Context-aware and noise-robust pedestrian searching method
Wahyono et al. A Comparison of Deep Learning Methods for Vision-based Fire Detection in Surveillance System

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant