CN109271848B - Face detection method, face detection device and storage medium - Google Patents

Face detection method, face detection device and storage medium

Info

Publication number
CN109271848B
CN109271848B
Authority
CN
China
Prior art keywords
face
image
detected
region
processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810866324.1A
Other languages
Chinese (zh)
Other versions
CN109271848A (en)
Inventor
孙晓航
袁誉乐
曾强
高飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Tian'a Intelligent Technology Co ltd
Original Assignee
Shenzhen Tian'a Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Tian'a Intelligent Technology Co ltd filed Critical Shenzhen Tian'a Intelligent Technology Co ltd
Priority to CN201810866324.1A priority Critical patent/CN109271848B/en
Publication of CN109271848A publication Critical patent/CN109271848A/en
Application granted granted Critical
Publication of CN109271848B publication Critical patent/CN109271848B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/24Aligning, centring, orientation detection or correction of the image
    • G06V10/245Aligning, centring, orientation detection or correction of the image by locating a pattern; Special marks for positioning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

A face detection method, a face detection device and a storage medium. In the first aspect, the face detection method adopts a selection mechanism that chooses the more suitable of face recognition processing and face tracking processing according to the previous detection result, which helps to enhance the practical effect of the face detection method. In the second aspect, because a lightweight deep neural network for face recognition is introduced into the face recognition processing, the face region can be effectively recognized and located, which helps to improve detection accuracy. In the third aspect, a face confidence is introduced, which solves the drift problem arising in the face tracking stage, corrects the tracking deviation, and improves the output accuracy of the face region. In the fourth aspect, an ROI prediction method is added on top of the lightweight deep neural network, which avoids the long processing time of running face recognition over the whole image, raises the execution speed of face recognition processing, and reduces system overhead.

Description

Face detection method, face detection device and storage medium
Technical Field
The present invention relates to a face detection technology, and in particular, to a face detection method, a face detection apparatus, and a storage medium.
Background
With the development of electronic technology, face detection and recognition has become one of the most promising means of biometric authentication. An automatic face recognition system is required to have a certain recognition capability for general images, and the series of problems such a system faces has made face detection an important research subject. Currently, face detection is a key link in an automatic face recognition system; its application background extends far beyond the scope of face recognition systems, and it has important application value in content-based retrieval, digital video processing, video detection and other aspects.
Face detection is a necessary preprocessing step in fields such as face beautification, face special effects, face recognition, face attribute analysis and fatigue driving detection, and therefore has high commercial and application value. However, in practical applications, owing to factors such as facial expression change, hair occlusion, ornament occlusion, ambient light change, body angle change and imaging conditions, face detection still faces great technical challenges, and the related face detection algorithms can guarantee practical application effects only after further improvement.
Currently, face detection algorithms can be roughly divided into: face detection based on skin color, face detection based on geometric features, face detection based on statistical learning, and face detection based on deep learning. Face detection based on deep learning mostly achieves detection by means of a deep neural network; in comparison, such methods have high detection accuracy, an obvious optimization effect and broad development prospects. For example, face detection based on the R-CNN series abandons the sliding-window approach to generating candidate regions in favor of a region-proposal method; although this can obtain a high detection rate, it suffers from a complex deep neural network structure and a low detection speed. In face detection based on cascaded CNNs, face feature extraction and classification are usually completed by the CNNs together: six CNNs need to be arranged in the cascade structure, of which three are used to classify faces versus non-faces, which increases the time consumed by classification judgment and is not conducive to a fast implementation of face detection.
Disclosure of Invention
The invention mainly solves the technical problem of how to improve the detection speed and the detection precision of the face detection based on deep learning. In order to solve the above technical problems, the present application provides a face detection method and a device thereof.
According to a first aspect, an embodiment provides a face detection method, including the following steps:
acquiring an image to be detected in an image sequence;
selecting a processing mode of the image to be detected according to a face detection result in the image sequence at the previous time, carrying out face tracking processing on the image to be detected when the face is detected in the image sequence at the previous time, and carrying out face recognition processing on the image to be detected if the face is not detected in the image sequence at the previous time;
and outputting the human face area in the image to be detected according to the processing result.
The method for selecting the processing mode of the image to be detected according to the face detection result in the image sequence at the previous time comprises the following steps:
and sequentially processing each frame of image in the image sequence, taking a face detection result of a previous frame of image of the image to be detected as a previous face detection result, and selecting a processing mode of the image to be detected according to the previous face detection result.
The taking of the face detection result of the previous frame of image of the image to be detected as the previous face detection result comprises the following steps:
acquiring a face region output in a previous frame of image of the image to be detected;
inputting the face region output by the previous frame of image into a deep neural network for face confidence calculation to obtain a face confidence;
and comparing the face confidence with a preset threshold, wherein when the face confidence exceeds the preset threshold, the face detection result of the previous frame of image is the detected face, otherwise, the face detection result of the previous frame of image is the undetected face.
The step of inputting the face region of the previous frame of image into a deep neural network for face confidence calculation to obtain a face confidence includes:
zooming the face area of the previous frame of image to obtain a zoomed image;
inputting the zoomed image into a deep neural network for face confidence calculation to obtain a face confidence; the deep neural network for face confidence calculation comprises one or more bottleneck convolution units, and the bottleneck convolution units are used for performing convolution processing operation on input images.
The performing of face recognition processing on the image to be detected includes:
performing down-sampling processing on the image to be detected to obtain a plurality of images with different sizes;
and inputting the images with different sizes into a lightweight deep neural network for face recognition so as to detect a face region from the image to be detected.
The performing of face tracking processing on the image to be detected includes:
acquiring a face region detected in the image sequence at the previous time;
and performing KCF target tracking processing in the image to be detected on the face region detected in the image sequence at the previous time, so as to obtain the face region in the image to be detected.
Before the face tracking processing is performed on the image to be detected, the method further comprises a frame number judging step, which comprises:
counting each detected frame image when a face region is detected in the image sequence; performing ROI region calculation on the image to be detected when the counting result exceeds a preset frame number; and otherwise performing face tracking processing on the image to be detected and clearing the counting result for the next round of counting.
The ROI area calculation of the image to be detected comprises the following steps:
performing ROI (region of interest) region calculation on the image to be detected according to the face region detected in the image sequence at the previous time to obtain an estimated region of a face in the image to be detected;
inputting the estimated region of the face in the image to be detected into a lightweight deep neural network for face recognition so as to detect the face region from the image to be detected.
The lightweight deep neural network for face recognition comprises:
the BP-Net network is used for obtaining a candidate region of a human face in an input image;
the BR-Net network is used for training the candidate region of the face and removing a non-face region from the candidate region;
and the BO-Net network is used for positioning key parts of the human face in the candidate region without the non-human face region and obtaining the human face region according to the positioning result of the key parts of the human face.
A network cascade structure is formed among the BP-Net network, the BR-Net network and the BO-Net network; each network comprises one or more bottleneck convolution units, and the bottleneck convolution units are used for performing convolution processing operations on an input image.
According to a second aspect, an embodiment provides a face detection apparatus, comprising:
the image acquisition unit is used for acquiring an image to be detected in an image sequence;
the judging unit is used for selecting the processing mode of the image to be detected according to the face detection result in the image sequence at the previous time;
the face recognition processing unit is used for carrying out face recognition processing on the image to be detected when no face region is detected in the image sequence at the previous time;
the face tracking processing unit is used for carrying out face tracking processing on the image to be detected when a face region is detected in the image sequence at the previous time;
and the output unit is used for outputting the human face area in the image to be detected according to the processing result.
According to a third aspect, an embodiment provides a computer-readable storage medium, characterized in that it comprises a program executable by a processor to implement the method according to the first aspect.
The beneficial effect of this application is:
a face detection method, a face detection apparatus, and a storage medium according to the above embodiments. On the first hand, a selection mechanism is added in the proposed face detection method, and a better processing method is selected from face recognition processing and face tracking processing according to the previous detection result, so that the practical effect of the face detection method is enhanced; in the second aspect, because the lightweight deep neural network for face recognition is introduced in the face recognition processing process, the face region is effectively recognized and positioned through the BP-Net network, the BR-Net network and the BO-Net network, and the face detection accuracy is favorably improved; in the third aspect, a KCF target tracking algorithm is introduced in the face tracking process, so that the detection process of the face area is rapid, and the output efficiency of the face area is improved; in the fourth aspect, because a face confidence detection method is introduced in the face region output process, the drift problem generated in the face tracking stage is solved, and the face region tracking deviation is corrected, so that the output accuracy of the face region is improved; in the fifth aspect, the method for predicting the ROI is added on the basis of the lightweight deep neural network construction, and the possible area of the face in the next frame of image can be predicted according to the position of the face in the previous frame of image, so that the face identification can be carried out on the possible area. In addition, the face detection device has the advantages of simple structure and stable algorithm, and is favorable for face detection operation combined with an embedded hardware platform.
Drawings
FIG. 1 is a block diagram of a face detection apparatus according to an embodiment;
FIG. 2 is a flow chart of a face detection method according to an embodiment;
FIG. 3 is a flow chart of face confidence determination;
FIG. 4 is a flow chart of a face recognition process;
FIG. 5 is a flow chart of a face tracking process;
FIG. 6 is a block diagram of a face detection apparatus according to another embodiment;
FIG. 7 is a flow chart of a face detection method according to another embodiment;
FIG. 8 is a structure of a deep neural network for face confidence calculation;
FIG. 9 is a structure of a BP-Net network;
FIG. 10 is a structure of a BR-Net network;
FIG. 11 is a structure of a BO-Net network;
fig. 12 is a structure of bottleneck convolution.
Detailed Description
The present invention will be described in further detail with reference to the following detailed description and accompanying drawings. Wherein like elements in different embodiments are numbered with like associated elements. In the following description, numerous details are set forth in order to provide a better understanding of the present application. However, those skilled in the art will readily recognize that some of the features may be omitted or replaced with other elements, materials, methods in different instances. In some instances, certain operations related to the present application have not been shown or described in detail in order to avoid obscuring the core of the present application from excessive description, and it is not necessary for those skilled in the art to describe these operations in detail, so that they may be fully understood from the description in the specification and the general knowledge in the art.
Furthermore, the features, operations, or characteristics described in the specification may be combined in any suitable manner to form various embodiments. Also, the order of the steps or actions in the method descriptions may be changed or adjusted in ways apparent to those skilled in the art. Thus, the various sequences in the specification and drawings are only for describing certain embodiments and are not intended to imply a required order, unless it is otherwise indicated that a certain order must be followed.
The numbering of the components as such, e.g., "first", "second", etc., is used herein only to distinguish the objects as described, and does not have any sequential or technical meaning. The term "connected" and "coupled" when used in this application, unless otherwise indicated, includes both direct and indirect connections (couplings).
The first embodiment is as follows:
referring to fig. 1, the present application discloses a face detection apparatus 1, which includes an image acquisition unit 11, a determination unit 12, a face recognition processing unit 13, a face tracking unit 14, and an output unit 15, which are respectively described below.
The image acquisition unit 11 is configured to acquire an image to be detected in an image sequence. In an embodiment, the image acquisition unit 11 acquires a frame of image from a video stream and uses that frame as the image to be detected. The video stream may be a video shot by a monitoring probe in a public place, or a video shot by an electronic device such as a mobile phone or a camera, including both video shot in real time and previously archived video.
The judging unit 12 is configured to select the processing mode of the image to be detected according to the face detection result in the image sequence at the previous time. In an embodiment, the face detection apparatus 1 processes each frame of image in the image sequence in turn; that is, following the time order of the images in the video stream, one frame of image is obtained each time and face detection is performed on it (for the face detection process, refer to the face detection method below). The face detection result of the previous frame of the image to be detected is taken as the previous face detection result, and the processing mode of the image to be detected is selected accordingly. It should be noted that when the image to be detected is the first frame image in the video sequence, when the previous face detection process was erroneous, or when no face region was output in the previous face detection result, the judging unit 12 determines the previous face detection result to be negative, that is, that no face was detected. The implementation of the judging unit 12 can be understood with reference to the face detection method below.
The face recognition processing unit 13 is configured to perform face recognition processing on the image to be detected when no face region was detected in the image sequence at the previous time (that is, when the judgment result of the judging unit 12 is no). In an embodiment, the face recognition processing unit 13 down-samples the image to be detected to obtain a plurality of images of different sizes, and inputs the images of different sizes into a lightweight deep neural network for face recognition, so as to detect a face region in the image to be detected.
The face tracking processing unit 14 is configured to perform face tracking processing on the image to be detected when a face region was detected in the image sequence at the previous time (that is, when the judgment result of the judging unit 12 is yes). In one embodiment, the face tracking processing unit 14 acquires the face region detected in the image sequence at the previous time, and performs KCF target tracking processing in the image to be detected on that face region, so as to obtain the face region in the image to be detected.
The output unit 15 is used for outputting the face region in the image to be detected according to the processing result. In one embodiment, the output unit 15 performs rectangular marking on the face region and displays the rectangular marking of the face region on the image to be detected.
Those skilled in the art will understand that the face detection apparatus 1 processes each frame of image in the video stream quickly (typically at a speed of tens to hundreds of frames per second), so the output unit 15 can output face regions continuously; when the user observes the video stream and the face regions through the display interface, the rectangular mark of the face region will appear to move dynamically within the video stream.
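To make the data flow of units 11-15 concrete, a minimal Python sketch of such a per-frame loop is given below, assuming OpenCV for video capture and display; `process_frame` is a hypothetical stand-in for the judging, recognition and tracking units, whose logic is detailed in the method that follows (see the sketch after step S270).

```python
import cv2

def run(video_source, process_frame):
    """Sketch of the apparatus loop: the image acquisition unit 11 reads
    one frame at a time from the video stream; process_frame is a
    hypothetical stand-in for the judging, recognition and tracking
    units (12-14); the output unit 15 draws the rectangular mark."""
    cap = cv2.VideoCapture(video_source)   # camera index or video file
    prev_result = None                     # previous face detection result
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        prev_result = process_frame(frame, prev_result)
        if prev_result is not None:        # rectangular mark (unit 15)
            x, y, w, h = prev_result
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.imshow("faces", frame)
        if cv2.waitKey(1) == 27:           # Esc to quit
            break
    cap.release()
    cv2.destroyAllWindows()
```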
Accordingly, referring to fig. 2, the present application further discloses a face detection method, which includes steps S100 to S500, which are described below.
Step S100: an image to be detected is acquired from an image sequence. In one embodiment, the image acquisition unit 11 acquires a frame of image from a video stream and takes that frame as the image to be detected.
Step S200: selecting the processing mode of the image to be detected according to the face detection result in the image sequence at the previous time. In an embodiment, the determining unit 12 processes each frame of image in the image sequence in turn, takes the face detection result of the previous frame of the image to be detected as the previous face detection result, and selects the processing mode of the image to be detected according to that result. As shown in fig. 3, step S200 may include steps S210-S270.
In step S210, the determining unit 12 obtains the previous frame image of the image to be detected in the image sequence.
In step S220, the determining unit 12 determines whether a face region was output in the previous frame of image; if so, the process goes to step S230, otherwise to step S270. It should be noted that when the image to be detected is the first frame image in the video sequence, or when the previous face detection process was erroneous, the determining unit 12 likewise determines that no face was detected at the previous time.
In step S230, the face region output from the previous frame of image is scaled to obtain a scaled image, and preferably, the face region output from the previous frame of image is scaled to an image with 12 × 12 pixels.
In step S240, the scaled image is input into a deep neural network for face confidence calculation (which may be denoted by the symbol FCNET), so as to obtain a face confidence (which may be denoted by the symbol C). In an embodiment, see fig. 8, the face confidence network FCNET includes one or more bottleneck convolution units (preferably four: two with 16 channels and two with 24 channels), which are mainly used to perform convolution operations on the input image. The specific structure of each bottleneck convolution unit is shown in fig. 12, where BN is a normalization function used to normalize each neuron, and RELU is an activation function used to keep the training process efficient; both belong to the prior art and are not described in detail here. To improve the accuracy of the face confidence calculation, in this embodiment a 3 × 3 × 3 filter, a 32-channel 2d convolution structure (mainly used for feature extraction) and a 1 × 1 convolution unit are further added to the deep neural network for face confidence calculation.
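For illustration, a minimal PyTorch sketch of such a bottleneck convolution unit and of the FCNET network follows; the channel counts (a 32-channel 3 × 3 stem, two 16-channel and two 24-channel bottleneck units, a final 1 × 1 convolution) follow the description above, while the expansion factor t = 6, the strides and the 12 × 12 input size are assumptions, since figs. 8 and 12 are not reproduced here.

```python
# Sketch of the bottleneck convolution unit (fig. 12) and FCNET (fig. 8).
# Channel counts follow the text; t=6, strides and the 12x12 input size
# are assumptions made for illustration.
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """Expand with a 1x1 conv, filter with a 3x3 depthwise conv, project
    back with a 1x1 conv; BN normalizes each layer, RELU activates."""
    def __init__(self, in_ch, out_ch, t=6, stride=1):
        super().__init__()
        mid = in_ch * t
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, mid, 1, bias=False),
            nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, 3, stride=stride, padding=1,
                      groups=mid, bias=False),           # depthwise 3x3
            nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, out_ch, 1, bias=False),       # linear projection
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        return self.block(x)

class FCNET(nn.Module):
    """Face-confidence network: 3x3 stem with 32 channels, two 16-channel
    and two 24-channel bottleneck units, then a 1x1 conv to one score."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(32), nn.ReLU(inplace=True),
            Bottleneck(32, 16), Bottleneck(16, 16),
            Bottleneck(16, 24, stride=2), Bottleneck(24, 24),
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(24, 1, 1),
        )

    def forward(self, x):                  # x: (N, 3, 12, 12)
        return torch.sigmoid(self.features(x)).flatten(1)  # confidence C
```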
Step S250, comparing the face confidence C with a preset threshold (which may be represented by FT), and if the face confidence C is greater than the preset threshold FT, entering step S260, otherwise entering step S270. It should be noted that, for accurate determination, the threshold value preset in the present embodiment is preferably 0.93.
In step S260, the face detection result of the previous frame of image is considered as the detected face, that is, the judgment result of the judgment unit 12 is yes.
In step S270, the face detection result of the previous frame image is considered to be that no face was detected, that is, the judgment result of the determining unit 12 is no.
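For illustration, the selection logic of steps S210-S270 can be sketched in Python as follows; `fcnet_confidence`, `recognize_faces` and `track_faces` are hypothetical wrappers around the FCNET network above, the recognition pipeline of step S300 and the KCF tracking of step S400, and the threshold follows the preferred value of 0.93.

```python
import cv2  # used here only to scale the face crop to 12x12 (step S230)

FT = 0.93  # preset face-confidence threshold (preferred value above)

def select_and_process(frame, prev_frame, prev_face_region,
                       fcnet_confidence, recognize_faces, track_faces):
    """Steps S210-S270: decide between face tracking and face recognition
    for the image to be detected, based on the previous frame's result.
    fcnet_confidence, recognize_faces and track_faces are hypothetical
    wrappers around FCNET, the recognition network and the KCF tracker."""
    detected_previously = False
    if prev_face_region is not None:                       # S220
        x, y, w, h = prev_face_region
        crop = cv2.resize(prev_frame[y:y + h, x:x + w], (12, 12))  # S230
        detected_previously = fcnet_confidence(crop) > FT  # S240-S250
    if detected_previously:                                # S260 -> tracking
        return track_faces(prev_frame, prev_face_region, frame)
    return recognize_faces(frame)                          # S270 -> recognition
```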
Step S300: when no face region was detected in the image sequence at the previous time, face recognition processing is performed on the image to be detected. In an embodiment, the face recognition processing unit 13 down-samples the image to be detected to obtain a plurality of images of different sizes, and inputs each of them into a lightweight deep neural network for face recognition to detect a face region in the image to be detected. The step S300 may include steps S310 to S330, described below.
In step S310, the face recognition processing unit 13 performs downsampling on the image to be detected to form an image pyramid, and preferably divides the image pyramid into three levels to form images with a resolution of 48 × 48, a resolution of 24 × 24, and a resolution of 12 × 12, so as to obtain a plurality of images with different sizes. The images with different resolutions are used for adapting to the input requirements of different network structures.
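A one-function sketch of this pyramid construction with OpenCV follows; the three square target sizes are the preferred resolutions above, and INTER_AREA is an assumed (typical) choice of interpolation for downsampling.

```python
import cv2

def build_pyramid(image):
    """Step S310: downsample the image to be detected into the three
    preferred resolutions of the image pyramid."""
    return {size: cv2.resize(image, (size, size),
                             interpolation=cv2.INTER_AREA)
            for size in (48, 24, 12)}
```

Each level then feeds the network whose input size it matches (BP-Net at 12 × 12, BR-Net at 24 × 24, BO-Net at 48 × 48, per the tables below).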
In step S320, the face recognition processing unit 13 inputs each of the images of different sizes into a lightweight deep neural network for face recognition (which may be denoted by the symbol BFACENET). The lightweight deep neural network BFACENET is a neural network with a small number of layers, usually 10 layers or fewer.
In one embodiment, the lightweight deep neural network BFACENET includes a BP-Net network, a BR-Net network, and a BO-Net network. The convolution framework of the BP-Net network is shown in table 1, and the convolution structure corresponding to table 1 can be seen in fig. 9.
TABLE 1 Convolution framework of the BP-Net network

| Input   | Convolution operation | Expansion factor t | Output channels c | Units n | Stride s |
| 12x12x3 | Conv2d                | -                  | 8                 | 1       | 2        |
| 6x6x8   | Convolution unit      | 6                  | 16                | 2       | 2        |
| 3x3x16  | Convolution unit      | 6                  | 24                | 2       | 1        |
| 3x3x24  | 3x3 convolution unit  | -                  | 32                | 1       | 1        |
| 1x1x32  | Conv2d 1x1            | -                  | 16                | 1       | -        |
In table 1 above, t denotes the expansion factor, c the number of output channels, n the number of units, and s the stride. Each convolution unit preferably adopts a bottleneck convolution structure, whose basic composition is shown in fig. 12. It should be noted that in this embodiment the BP-Net network is mainly used to obtain face candidate windows and regression vectors in the input 12 × 12 resolution image, so as to obtain candidate regions of the face.
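As a concrete reading of Table 1, the following PyTorch sketch assembles BP-Net from the Bottleneck unit sketched earlier; splitting the final 16 outputs into face classification (1x1x2), bounding-box regression (1x1x4) and key-point (1x1x10) vectors matches the output shapes given for fig. 9 later in this description, and the per-unit strides are assumptions consistent with the input sizes listed in the table.

```python
import torch.nn as nn

class BPNet(nn.Module):
    """Sketch of BP-Net per Table 1: 12x12x3 input, stride-2 stem,
    16- and 24-channel bottleneck units, a 3x3 convolution to 32
    channels, and a final 1x1 convolution to 16 outputs. The 2/4/10
    split of those outputs is an assumption from the fig. 9 shapes."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, 3, stride=2, padding=1, bias=False),  # 12 -> 6
            nn.BatchNorm2d(8), nn.ReLU(inplace=True),
            Bottleneck(8, 16, t=6, stride=2),                     # 6 -> 3
            Bottleneck(16, 16, t=6),
            Bottleneck(16, 24, t=6), Bottleneck(24, 24, t=6),
            nn.Conv2d(24, 32, 3, bias=False),                     # 3 -> 1
            nn.BatchNorm2d(32), nn.ReLU(inplace=True),
            nn.Conv2d(32, 16, 1),
        )

    def forward(self, x):                  # x: (N, 3, 12, 12)
        out = self.features(x).flatten(1)  # (N, 16)
        cls, box, marks = out[:, :2], out[:, 2:6], out[:, 6:]
        return cls, box, marks
```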
The convolution framework of the BR-Net network is shown in table 2, and the convolution structure corresponding to table 2 can be seen in fig. 10.
TABLE 2 Convolution framework of the BR-Net network

| Input   | Convolution operation | Expansion factor t | Output channels c | Units n | Stride s |
| 24x24x3 | Conv2d                | -                  | 8                 | 1       | 2        |
| 12x12x8 | Convolution unit      | 6                  | 16                | 2       | 2        |
| 6x6x16  | Convolution unit      | 6                  | 24                | 2       | 2        |
| 3x3x24  | 3x3 convolution unit  | -                  | 32                | 2       | 1        |
| 1x1x32  | Conv2d 1x1            | -                  | 96                | 1       | -        |
It should be noted that the BR-Net network is mainly used for training a candidate region of a face and removing a non-face region from the candidate region. In one embodiment, the BR-Net network trains the candidate regions of the face according to the input 24 × 24 resolution images, thereby removing the non-face regions.
The convolution frame of the BO-Net network is shown in table 3, and the convolution structure corresponding to table 3 is shown in fig. 11.
TABLE 3 Convolution framework of the BO-Net network

| Input   | Convolution operation | Expansion factor t | Output channels c | Units n | Stride s |
| 48x48x3 | Conv2d                | -                  | 8                 | 1       | 2        |
| 12x12x8 | Convolution unit      | 6                  | 16                | 2       | 2        |
| 6x6x16  | Convolution unit      | 6                  | 24                | 2       | 2        |
| 3x3x24  | 3x3 convolution unit  | -                  | 48                | 2       | 1        |
| 1x1x48  | Conv2d 1x1            | -                  | 128               | 1       | -        |
The BO-Net network is mainly used to locate the key parts of the face in the candidate regions from which non-face regions have been removed, and to obtain the face region according to the localization result. In a specific embodiment, the BO-Net network locates the key parts of the face in a candidate region according to the input image with a resolution of 48 × 48 (the input size given in table 3), determines the face from five key points of the face, namely the centers of the two eyes, the nose, and the two corners of the mouth, and obtains the face region.
Those skilled in the art can understand that the BP-Net, BR-Net and BO-Net networks in the lightweight deep neural network BFACENET adopted in this embodiment form a network cascade structure. Each network includes one or more bottleneck convolution units (which may be denoted by the symbol BottleNeck) used to perform convolution operations on the input image; the BottleNeck unit has a simple structure, which helps reduce the parameters of the constructed network and speeds up the face detection operation.
The objective function used to train the relevant deep neural network models in this embodiment is:

$$L_i^{det} = -\left( y_i^{det} \log p_i + \left(1 - y_i^{det}\right) \log\left(1 - p_i\right) \right) \tag{1-1}$$

$$L_i^{box} = \left\lVert \hat{y}_i^{box} - y_i^{box} \right\rVert_2^2 \tag{1-2}$$

$$L_i^{mark} = \left\lVert \hat{y}_i^{mark} - y_i^{mark} \right\rVert_2^2 \tag{1-3}$$

$$\min \sum_{i=1}^{N} \sum_{j \in \{det,\, box,\, mark\}} \alpha_j \, \beta_i^{j} \, L_i^{j} \tag{1-4}$$

In formulas (1-1) to (1-4), y_i denotes the sample label of a face and p_i the obtained probability that the sample is a face; det denotes the face classification task, box the bounding-box regression task, and mark the key-point localization task; α_j is the weight of the loss of the three tasks of face classification, bounding-box regression and key-point localization at the current stage (preferably α_det = 0.5, α_box = 0.25 and α_mark = 0.25 in this embodiment); β_i^j ∈ {0, 1} is an indicator scalar for whether a face is present, 1 indicating that a face exists and 0 that no face exists; i and j are the indices of the current sample and task, the superscript denoting the current task category and the subscript the stage of the current task; L is the loss function; the double bar denotes the quadratic (L2) norm; and { } denotes a set.
It should be noted that, as can be seen from the formulas (1-1) to (1-4), the result of the upper layer sub-network is used by the sub-network of the next layer, so as to achieve the effect of mutual cascade connection among the BP-Net network, the BR-Net network and the BO-Net network.
It should be noted that the face classification in figs. 9, 10 and 11 is a 1x1x2 vector, i.e., its result is represented as 1 or 0; the bounding-box regression is a 1x1x4 vector, mainly outputting the coordinates of the bounding box; and the key-point localization is a 1x1x10 vector, mainly outputting the coordinates of the 5 key points of the face.
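Under the stated weights, the objective of formulas (1-1) to (1-4) can be sketched in PyTorch as follows; the tensor shapes follow the 1x1x2 / 1x1x4 / 1x1x10 output vectors above, while the batching details are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

ALPHA = {"det": 0.5, "box": 0.25, "mark": 0.25}  # preferred task weights

def multitask_loss(cls_logits, box_pred, mark_pred,
                   y_det, y_box, y_mark, beta):
    """Formulas (1-1)-(1-4): weighted sum of cross-entropy for face
    classification and squared L2 norms for bounding-box and key-point
    regression. beta[j] is the 0/1 indicator selecting which samples
    contribute to task j. Assumed shapes: cls_logits (N, 2),
    box_pred (N, 4), mark_pred (N, 10); y_det is a LongTensor of
    0/1 class labels."""
    l_det = F.cross_entropy(cls_logits, y_det, reduction="none")   # (1-1)
    l_box = ((box_pred - y_box) ** 2).sum(dim=1)                   # (1-2)
    l_mark = ((mark_pred - y_mark) ** 2).sum(dim=1)                # (1-3)
    total = (ALPHA["det"] * beta["det"] * l_det
             + ALPHA["box"] * beta["box"] * l_box
             + ALPHA["mark"] * beta["mark"] * l_mark)              # (1-4)
    return total.mean()
```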
Step S330: the face recognition processing unit 13 detects the face region from the image to be detected. In an embodiment, the face recognition processing unit 13 obtains the face region according to the key parts of the face located by the BO-Net network, and takes it as the detected face region.
Step S400: when a face region was detected in the image sequence at the previous time, face tracking processing is performed on the image to be detected. In an embodiment, the face tracking processing unit 14 acquires the face region detected in the image sequence at the previous time, and performs KCF target tracking processing in the image to be detected on that face region, so as to obtain the face region in the image to be detected. The step S400 may include steps S410-S430.
Step S410, a face region detected in the image sequence at the previous time is obtained. In a specific embodiment, in the case where a face region was detected at the previous time, the face tracking processing unit 14 acquires a face region detected in an image of the frame immediately preceding the image to be detected in the video stream.
And step S420, performing KCF target tracking processing in the image to be detected. In a specific embodiment, the face tracking processing unit 14 inputs the face region detected in the previous frame of image and the image to be detected into the KCF target tracking processing algorithm together, and performs target tracking on the face region detected in the previous frame of image in the image to be detected, so as to obtain the face region in the image to be detected.
It should be noted that KCF target tracking is an algorithm commonly used in the field of image processing, often for tracking and analyzing a target object in an image. The method generally trains a target detector during the tracking process, uses the detector to check whether the predicted position in the next frame contains the target, and then uses the new detection result to update the training set and hence the detector. When training the target detector, the target region is generally selected as a positive sample and the regions around the target as negative samples, with regions closer to the target more likely to be positive samples. By applying this characteristic of the KCF target tracking algorithm, this embodiment achieves the purpose of tracking the face region in the image to be detected according to the face region in the previous frame of image. Since the KCF target tracking algorithm is prior art, it is not described in detail here.
Step S430: the face region in the image to be detected is acquired. In a specific embodiment, the face tracking processing unit 14 takes the face region obtained by the KCF target tracking algorithm as the face region in the image to be detected.
It should be noted that, in the process of tracking the face region, the KCF target tracking algorithm inevitably drifts, which reduces the accuracy of target tracking. To avoid this, before face tracking is performed on the next frame after the image to be detected, the face confidence calculation of step S200 is performed on the face region in the image to be detected; the face confidence effectively prevents the drift problem caused by the KCF target tracking algorithm and thus provides a more accurate basis for the judgment in step S200.
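A minimal sketch of steps S410-S430 with OpenCV's KCF implementation follows; note that, depending on the OpenCV build, the tracker factory lives either at the top level or in the cv2.legacy namespace of opencv-contrib.

```python
import cv2

def make_kcf():
    """Create a KCF tracker; the factory location varies by OpenCV build."""
    if hasattr(cv2, "TrackerKCF_create"):
        return cv2.TrackerKCF_create()
    return cv2.legacy.TrackerKCF_create()

def track_faces(prev_frame, prev_face_region, frame):
    """Steps S410-S430: initialize KCF on the face region detected in
    the previous frame, then locate it in the image to be detected."""
    tracker = make_kcf()
    tracker.init(prev_frame, tuple(prev_face_region))  # (x, y, w, h)
    ok, box = tracker.update(frame)
    return [int(v) for v in box] if ok else None
```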
Step S500: the face region in the image to be detected is output according to the processing result. In one embodiment, referring to fig. 1, the output unit 15 applies a rectangular mark to the face region detected by the face recognition processing unit 13 in step S330 or by the face tracking processing unit 14 in step S430, and continuously outputs each frame image in the image sequence: if a face region was detected in the current frame, the frame is output together with the rectangular mark of the face region; otherwise only the current frame image is output.
The second embodiment is as follows:
referring to fig. 6, the present application further discloses another embodiment of a face detection apparatus 2, which includes the face detection apparatus 1 in the first embodiment, and further includes a frame number judgment unit 16 and an ROI region calculation unit 17, which are respectively described below.
The frame number judging unit 16 is located between the judging unit 12 and the face tracking processing unit 14, and is configured to count each detected frame image when a face region is detected in the image sequence; when the counting result exceeds a preset frame number (which may be denoted by the symbol T, preferably a value in the range of 48 to 128), ROI region calculation is performed on the image to be detected, and otherwise face tracking processing is performed on the image to be detected and the counting result is cleared for the next round of counting.
The ROI region calculating unit 17 is connected to the frame number judging unit 16 and the face recognition processing unit 13, and is configured to, when the frame number judging unit 16 judges that the counting result exceeds the preset frame number, perform ROI region calculation on the image to be detected according to the face region detected in the image sequence at the previous time, so as to obtain an estimated region of the face in the image to be detected. The ROI region calculating unit 17 then inputs the estimated region of the face into the face recognition processing unit 13, so that the lightweight deep neural network for face recognition is run within the estimated region, thereby detecting the face region in the image to be detected.
In the field of image processing, a region of interest (ROI) is a region selected from an image that is treated as the focus of the image analysis; it is delineated so that further processing can be confined to it.
It should be noted that the KCF target tracking algorithm adopted in the face tracking processing unit 14 may drift when continuously processing images for a long time, which would seriously affect the detection of the face region. After the face tracking processing unit 14 has performed a certain number of consecutive face tracking operations, the frame number judging unit 16 triggers ROI region calculation on the image to be detected, so that face recognition processing is performed within the ROI region. This makes it possible to quickly detect an accurate face region within a smaller image region, thereby correcting the position of the face region and avoiding both the possibility of errors and the drift problem when the face tracking processing unit 14 performs KCF target tracking on the face region.
Referring to fig. 7, the present embodiment correspondingly discloses another face detection method, which includes steps S100 to S600.
The face detection method in the second embodiment is different from the face detection method in the first embodiment by the addition of the step S600, and the step S600 may include steps S610 to S630, which are described below.
Step S610, which is located before step S400, can be referred to as a frame number determination step. In one embodiment, the frame number judging step includes:
when a face region was detected in the image sequence (i.e. when the judgment result of the judging unit 12 is yes), the detected frame images are counted; when the counting result exceeds the preset frame number T (preferably a value in the range of 48 to 128), the method proceeds to step S620, and otherwise to step S400.
Note that, to keep the counting function of step S610 working, the frame number judging unit 16 clears the counting result when proceeding to step S400, so as to perform the next round of counting; that is, when the judgment result of the judging unit 12 is yes again, the frame number judging unit 16 restarts counting.
Step S620, according to the face region detected in the image sequence in the previous time, the ROI region calculation is carried out on the image to be detected, and the estimated region of the face in the image to be detected is obtained. In one embodiment, the calculation process is:
$$ROI_X = FACE_X - \frac{(T_1 - 1)\, FACE_W}{2} \tag{2-1}$$

$$ROI_Y = FACE_Y - \frac{(T_2 - 1)\, FACE_H}{2} \tag{2-2}$$

$$ROI_W = T_1 \cdot FACE_W \tag{2-3}$$

$$ROI_H = T_2 \cdot FACE_H \tag{2-4}$$

In the above formulas, ROI_X and ROI_Y are the x and y coordinates of the pixel at the upper-left corner of the region of interest, and (FACE_X, FACE_Y) are the coordinates of the pixel at the upper-left corner of the face region detected in the previous frame of image; FACE_W and FACE_H are respectively the width and height of the face region detected in the previous frame of image, and ROI_W and ROI_H are the width and height of the region of interest to be calculated; T1 and T2 are thresholds set by the user (empirically, T1 is preferably set to 2.5 and T2 to 1.6), so that the region of interest is an expansion of the face region centered on it. If ROI_X, ROI_Y, ROI_W or ROI_H exceeds the image boundary, the coordinate value of the image boundary is used as the real value.
Step S630: the estimated region of the face in the image to be detected is input into the lightweight deep neural network for face recognition, so as to detect the face region in the image to be detected. In an embodiment, the ROI region calculating unit 17 inputs the estimated region of the face into the face recognition processing unit 13, which feeds it into the lightweight deep neural network BFACENET for face recognition, so as to obtain the face region in the image to be detected. The structure and calculation method of the lightweight deep neural network for face recognition are as in step S300 of the first embodiment and are not repeated here.
Those skilled in the art will appreciate that the use of the lightweight deep neural network in step S300 requires scanning the whole image to be detected and locating the face region by traversing the entire image, which adds invalid traversal time. Step S630 instead exploits the temporal correlation within the image sequence: the lightweight deep neural network only needs to scan the image within the estimated face region, so the traversal range of face detection is reduced from the whole image to the estimated range. This reduces useless sliding-window work, removes the interference of non-face regions, and further improves the running speed of the algorithm.
Those skilled in the art will appreciate that all or part of the functions of the various methods in the above embodiments may be implemented by hardware, or may be implemented by computer programs. When all or part of the functions of the above embodiments are implemented by a computer program, the program may be stored in a computer-readable storage medium, and the storage medium may include: a read only memory, a random access memory, a magnetic disk, an optical disk, a hard disk, etc., and the program is executed by a computer to realize the above functions. For example, the program may be stored in a memory of the device, and when the program in the memory is executed by the processor, all or part of the functions described above may be implemented. In addition, when all or part of the functions in the above embodiments are implemented by a computer program, the program may be stored in a storage medium such as a server, another computer, a magnetic disk, an optical disk, a flash disk, or a removable hard disk, and may be downloaded or copied to a memory of a local device, or may be version-updated in a system of the local device, and when the program in the memory is executed by a processor, all or part of the functions in the above embodiments may be implemented.
The present invention has been described in terms of specific examples, which are provided to aid understanding of the invention and are not intended to be limiting. For a person skilled in the art to which the invention pertains, several simple deductions, modifications or substitutions may be made according to the idea of the invention.

Claims (11)

1. A face detection method is characterized by comprising the following steps:
acquiring an image to be detected in an image sequence;
selecting a processing mode of the image to be detected according to a face detection result in the image sequence at the previous time, counting each detected frame image when the face is detected in the image sequence at the previous time, performing region-of-interest calculation on the image to be detected when the counting result exceeds a preset frame number, and performing face tracking processing on the image to be detected and clearing the counting result to perform the next round of counting when the counting result does not exceed the preset frame number;
when no face is detected in the image sequence at the previous time, performing face recognition processing on the image to be detected;
and outputting the human face area in the image to be detected according to the processing result.
2. The method for detecting human face according to claim 1, wherein the selecting the processing mode of the image to be detected according to the human face detection result in the image sequence at the previous time comprises:
and sequentially processing each frame of image in the image sequence, taking a face detection result of a previous frame of image of the image to be detected as a previous face detection result, and selecting a processing mode of the image to be detected according to the previous face detection result.
3. The method of claim 2, wherein the using the face detection result of the previous frame of image of the image to be detected as the previous face detection result comprises:
acquiring a face region output in a previous frame of image of the image to be detected;
inputting the face region output by the previous frame of image into a deep neural network for face confidence calculation to obtain a face confidence;
and comparing the face confidence with a preset threshold, wherein when the face confidence exceeds the preset threshold, the face detection result of the previous frame of image is the detected face, otherwise, the face detection result of the previous frame of image is the undetected face.
4. The method of claim 3, wherein the inputting the face region of the previous frame of image into a deep neural network for face confidence calculation to obtain the face confidence comprises:
zooming the face area of the previous frame of image to obtain a zoomed image;
inputting the zoomed image into a deep neural network for face confidence calculation to obtain a face confidence; the deep neural network for face confidence calculation comprises one or more bottleneck convolution units, and the bottleneck convolution units are used for performing convolution processing operation on input images.
5. The method for detecting human face according to claim 1, wherein said performing human face recognition processing on the image to be detected comprises:
performing down-sampling processing on the image to be detected to obtain a plurality of images with different sizes;
and inputting the images with different sizes into a lightweight deep neural network for face recognition so as to detect a face region from the image to be detected.
6. The method for detecting human face according to claim 1, wherein said performing human face tracking processing on the image to be detected comprises:
acquiring a face region detected in the image sequence at the previous time;
and performing KCF target tracking processing on the face area detected in the image sequence in the image to be detected at the previous time to obtain the face area in the image to be detected.
7. The face detection method of claim 1, wherein the performing of the region-of-interest calculation on the image to be detected comprises:
calculating the region of interest of the image to be detected according to the face region detected in the image sequence at the previous time to obtain an estimated region of the face in the image to be detected;
inputting the estimated region of the face in the image to be detected into a lightweight deep neural network for face recognition so as to detect the face region from the image to be detected.
8. The face detection method of claim 5 or 7, wherein the lightweight deep neural network for face recognition comprises:
the BP-Net network is used for obtaining a candidate region of a human face in an input image;
the BR-Net network is used for training the candidate region of the face and removing a non-face region from the candidate region;
and the BO-Net network is used for positioning key parts of the human face in the candidate region without the non-human face region and obtaining the human face region according to the positioning result of the key parts of the human face.
9. The face detection method of claim 8, wherein a network cascade structure is formed among the BP-Net network, the BR-Net network and the BO-Net network, and each network includes one or more bottleneck convolution units, and the bottleneck convolution units are configured to perform convolution processing operations on an input image.
10. A face detection device is characterized by comprising an image acquisition unit, a judgment unit, a face recognition processing unit, a frame number judgment unit, an ROI region calculation unit, a face tracking processing unit and an output unit:
the image acquisition unit is used for acquiring an image to be detected in an image sequence;
the judging unit is used for selecting the processing mode of the image to be detected according to the face detection result in the image sequence at the previous time;
the face recognition processing unit is used for carrying out face recognition processing on the image to be detected when no face region is detected in the image sequence at the previous time;
the frame number judging unit is used for counting the detected frame images when a human face area is detected in the image sequence and judging whether the counting result exceeds a preset frame number;
when the counting result exceeds the preset frame number, the ROI is calculated on the image to be detected through the ROI area calculation unit, otherwise, the face tracking processing unit is used for carrying out face tracking processing on the image to be detected, and the frame number judgment unit is used for clearing the counting result to carry out the next round of counting;
and the output unit is used for outputting the human face area in the image to be detected according to the processing result.
11. A computer-readable storage medium, characterized by comprising a program executable by a processor to implement the method of any one of claims 1-9.
CN201810866324.1A 2018-08-01 2018-08-01 Face detection method, face detection device and storage medium Active CN109271848B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810866324.1A CN109271848B (en) 2018-08-01 2018-08-01 Face detection method, face detection device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810866324.1A CN109271848B (en) 2018-08-01 2018-08-01 Face detection method, face detection device and storage medium

Publications (2)

Publication Number Publication Date
CN109271848A CN109271848A (en) 2019-01-25
CN109271848B true CN109271848B (en) 2022-04-15

Family

ID=65152962

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810866324.1A Active CN109271848B (en) 2018-08-01 2018-08-01 Face detection method, face detection device and storage medium

Country Status (1)

Country Link
CN (1) CN109271848B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110348348B (en) * 2019-06-30 2021-08-31 华中科技大学 Quick identification method and early warning system for entrance identities of participants
TWI749370B (en) 2019-09-16 2021-12-11 緯創資通股份有限公司 Face recognition method and computer system using the same
KR102340988B1 (en) * 2019-10-04 2021-12-17 에스케이텔레콤 주식회사 Method and Apparatus for Detecting Objects from High Resolution Image
CN110991287A (en) * 2019-11-23 2020-04-10 深圳市恩钛控股有限公司 Real-time video stream face detection tracking method and detection tracking system
CN111339936A (en) * 2020-02-25 2020-06-26 杭州涂鸦信息技术有限公司 Face tracking method and system
CN111694980A (en) * 2020-06-13 2020-09-22 德沃康科技集团有限公司 Robust family child learning state visual supervision method and device

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1794264A (en) * 2005-12-31 2006-06-28 北京中星微电子有限公司 Method and system of real time detecting and continuous tracing human face in video frequency sequence
CN104036237A (en) * 2014-05-28 2014-09-10 南京大学 Detection method of rotating human face based on online prediction
CN106874867A (en) * 2017-02-14 2017-06-20 江苏科技大学 A kind of face self-adapting detecting and tracking for merging the colour of skin and profile screening
CN107967456A (en) * 2017-11-27 2018-04-27 电子科技大学 A kind of multiple neural network cascade identification face method based on face key point
CN108197604A (en) * 2018-01-31 2018-06-22 上海敏识网络科技有限公司 Fast face positioning and tracing method based on embedded device
CN108229442A (en) * 2018-02-07 2018-06-29 西南科技大学 Face fast and stable detection method in image sequence based on MS-KCF
CN108090918A (en) * 2018-02-12 2018-05-29 天津天地伟业信息系统集成有限公司 A kind of Real-time Human Face Tracking based on the twin network of the full convolution of depth

Also Published As

Publication number Publication date
CN109271848A (en) 2019-01-25

Similar Documents

Publication Publication Date Title
CN109271848B (en) Face detection method, face detection device and storage medium
Hall et al. Probabilistic object detection: Definition and evaluation
US20200167554A1 (en) Gesture Recognition Method, Apparatus, And Device
US10753881B2 (en) Methods and systems for crack detection
CN109670429B (en) Method and system for detecting multiple targets of human faces of surveillance videos based on instance segmentation
CN106960195B (en) Crowd counting method and device based on deep learning
US20200005022A1 (en) Method, terminal, and storage medium for tracking facial critical area
CN110334569B (en) Passenger flow volume in-out identification method, device, equipment and storage medium
Chen et al. Comprehensive regularization in a bi-directional predictive network for video anomaly detection
US10506174B2 (en) Information processing apparatus and method for identifying objects and instructing a capturing apparatus, and storage medium for performing the processes
CN108197604A (en) Fast face positioning and tracing method based on embedded device
AU2020272936B2 (en) Methods and systems for crack detection using a fully convolutional network
Sun et al. Small aerial target detection for airborne infrared detection systems using LightGBM and trajectory constraints
Zavan et al. Benchmarking parts based face processing in-the-wild for gender recognition and head pose estimation
KR101690050B1 (en) Intelligent video security system
CN111311602A (en) Lip image segmentation device and method for traditional Chinese medicine facial diagnosis
CN113780145A (en) Sperm morphology detection method, sperm morphology detection device, computer equipment and storage medium
JP6028972B2 (en) Image processing apparatus, image processing method, and image processing program
Marks et al. SIPEC: the deep-learning Swiss knife for behavioral data analysis
CN111667419A (en) Moving target ghost eliminating method and system based on Vibe algorithm
JP5958557B2 (en) Object recognition method and object recognition apparatus
CN110826495A (en) Body left and right limb consistency tracking and distinguishing method and system based on face orientation
Chen et al. SiamCPN: Visual tracking with the Siamese center-prediction network
CN114170625A (en) Context-aware and noise-robust pedestrian searching method
Wahyono et al. A Comparison of Deep Learning Methods for Vision-based Fire Detection in Surveillance System

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant