CN108875624B - Face detection method based on multi-scale cascade dense connection neural network - Google Patents

Face detection method based on multi-scale cascade dense connection neural network

Info

Publication number
CN108875624B
Authority
CN
China
Prior art keywords
network
face
dense connection
convolution
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810605067.6A
Other languages
Chinese (zh)
Other versions
CN108875624A (en)
Inventor
秦华标
黄波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN201810605067.6A priority Critical patent/CN108875624B/en
Publication of CN108875624A publication Critical patent/CN108875624A/en
Application granted granted Critical
Publication of CN108875624B publication Critical patent/CN108875624B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a face detection method based on a multi-scale cascade dense connection neural network. It belongs to the fields of image processing and computer vision and is suitable for intelligent systems such as face recognition, facial expression recognition, and driver fatigue detection. The invention comprises a construction method for a regional nomination network and a construction method for a multi-level dense connection convolution network model, and specifically comprises the following steps: collecting face pictures annotated with face rectangular frame (bounding box) information to form training data sets that conform to the input conditions of each sub-network; constructing a cascade dense connection neural network with strong generalization capability; training each sub-network with its training data set to obtain an overall network model; and finally detecting multi-pose human faces in pictures with the overall network model. By introducing a dense connection mode into the network, the invention enables the network to fully extract face feature information and improves the accuracy of face detection under multiple poses.

Description

Face detection method based on multi-scale cascade dense connection neural network
Technical Field
The invention belongs to the field of image processing and computer vision, and particularly relates to a face detection method based on a multi-scale cascade dense connection neural network.
Background
Human face images contain rich information, and their research and analysis are an important direction and research hotspot in the field of computer vision. In artificial intelligence applications such as face recognition, crowd monitoring, photography, human-computer interaction, and fatigue-driving detection, face detection is the key first step: only after a face has been detected can the later analysis and research be of value.
In recent decades, many scholars have intensively studied multi-pose face detection algorithms, which generally fall into two categories: methods based on traditional machine learning and methods based on deep learning.
Traditional machine learning algorithms generally train a classifier on a large number of samples to judge whether a region contains a face. In the testing phase, the most common approach is a sliding-window algorithm. First, the input image is scaled to various sizes to create an image pyramid. Then, at each position of each pyramid level, a patch of fixed size, called a window, is taken. Next, features are extracted from this window. Finally, the trained classifier judges whether the window contains a face. The number of windows a face detection algorithm has to classify is very large; for a picture with a resolution of 640 × 480 there are roughly hundreds of thousands of windows, and processing them accurately in a short time is a problem every face detection algorithm must consider. In addition, during feature extraction, traditional machine learning algorithms rely on hand-crafted features such as Haar features, Local Binary Pattern (LBP) features, and Histogram of Oriented Gradients (HOG) features. Because hand-crafted features encode the prior knowledge of their designers, they achieve high accuracy only for faces against certain specific backgrounds and are difficult to apply to complex conditions such as multi-pose faces in three-dimensional space. Methods based on deep learning now dominate the field of face detection; the main neural network architectures include the Convolutional Neural Network (CNN), the Deep Belief Network (DBN), and the Auto-encoder, among which the convolutional neural network has been used most successfully in face detection. For example, cascaded convolutional neural networks (Cascade CNN) and the multi-task convolutional neural network (MTCNN) use convolutional layers to automatically extract stable face features, and their detection performance is greatly improved over traditional machine learning algorithms. However, current deep-learning-based face detection models are data-driven: the network fits the training data set, its generalization ability is weak, and it is difficult to detect multi-pose faces when no multi-pose training data set participates in training.
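As a rough illustration of the sliding-window count mentioned above, the following sketch enumerates window positions over an image pyramid (the 12 × 12 window, stride of 2, and pyramid factor of 0.709 are assumptions chosen for the example; the text only states the order of magnitude).

```python
def count_sliding_windows(width=640, height=480, win=12, stride=2, factor=0.709):
    """Count all sliding-window positions over an image pyramid."""
    total = 0
    while min(width, height) >= win:
        cols = (width - win) // stride + 1
        rows = (height - win) // stride + 1
        total += cols * rows
        width, height = int(width * factor), int(height * factor)
    return total

# count_sliding_windows() is on the order of 1e5 windows for a 640 x 480 picture
```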
Therefore, a multi-pose face detection algorithm with stronger generalization performance is needed, one that can improve the face detection rate even when no multi-pose face data set participates in training.
Disclosure of Invention
The invention aims to solve the problem that face detection is easily affected by pose changes and provides a face detection method based on a multi-scale cascade dense connection neural network. The invention designs a cascade dense connection network with stronger feature extraction and generalization capability, trains the network model with the collected and processed training data sets, and finally detects human faces with the trained model, thereby realizing an algorithm that performs well on faces under multiple poses.
The invention is realized by at least one of the following technical solutions.
A face detection method based on a multi-scale cascade dense connection neural network comprises a construction method of a regional nomination network and a construction method of a multi-level dense connection convolution network model:
the construction method of the regional nomination network comprises the following steps: on several convolution layers of the regional nomination network, scores and bounding frames are predicted for regions that may contain a face; region blocks whose scores are smaller than a set threshold are then eliminated, and non-maximum suppression is applied to the remaining region blocks to obtain the final regions that may contain a face; finally, the predicted face regions are sent into the second-level dense connection convolution network;
the construction method of the multi-level dense connection convolution network model comprises the following steps: convolution layers continuously extract increasingly abstract face features, while the features extracted by lower convolution layers are concatenated with those extracted by higher convolution layers; a global average pooling layer is then attached after the last convolution layer, and fine classification and frame regression are performed on the face regions predicted by the previous level; finally, the remaining face regions are sent to the third-level dense connection convolution network for even finer classification and frame regression, yielding the final predicted face regions.
Further, different convolution layers of the regional nomination network are used to extract more high-quality candidate regions containing faces (that is, candidate regions in which the face occupies as large a proportion as possible), preventing missed detections caused by extracting too few candidate regions; a classification layer and a regression layer are attached to each of the last two convolution layers of the regional nomination network to predict face region scores and perform frame regression; finally, candidate frames whose scores are lower than a threshold T1 are eliminated, and non-maximum suppression is applied to the remaining candidate frames to obtain the final prediction result; T1 ranges from 0 to 1.
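The score thresholding and non-maximum suppression described above can be sketched as follows (illustrative NumPy code; the box layout and the overlap threshold of 0.5 are assumptions not fixed by the text).

```python
import numpy as np

def filter_and_nms(boxes, scores, score_thresh, iou_thresh=0.5):
    """Drop boxes scoring below score_thresh, then apply non-maximum suppression.

    boxes:  (N, 4) array of [x1, y1, x2, y2] candidate face frames
    scores: (N,)   array of face classification scores in [0, 1]
    """
    keep_mask = scores >= score_thresh          # eliminate low-score region blocks
    boxes, scores = boxes[keep_mask], scores[keep_mask]

    order = scores.argsort()[::-1]              # process highest-scoring boxes first
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    kept = []
    while order.size > 0:
        i = order[0]
        kept.append(i)
        # intersection of the current best box with the remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # keep only boxes that do not overlap the current box too much
        order = order[1:][iou <= iou_thresh]
    return boxes[kept], scores[kept]
```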
Furthermore, a global average pooling layer is introduced to replace the traditional fully connected layer for face classification and regression; the global average pooling layer is attached after the last convolution layer of each level of dense connection network and computes the overall average value of each feature map output by the preceding convolution layers, so that local face information is fully learned while the overfitting caused by introducing spatial structure information is avoided; finally, a multi-class (softmax) layer is attached after the global average pooling layer to classify and regress the face regions predicted by the previous stage.
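A minimal sketch of such a head in PyTorch (illustrative only; the channel count and the use of 1 × 1 convolutions are assumptions): the last convolution produces one feature map per output, global average pooling reduces each map to its overall average, and a softmax layer gives the face scores.

```python
import torch
import torch.nn as nn

class GlobalAvgPoolHead(nn.Module):
    """Classification/regression head built on global average pooling instead of a
    fully connected layer: the last convolution produces one feature map per class,
    each map is reduced to its overall average, and softmax gives the class scores."""

    def __init__(self, in_channels=128, num_classes=2):
        super().__init__()
        self.cls_conv = nn.Conv2d(in_channels, num_classes, kernel_size=1)  # one map per class
        self.reg_conv = nn.Conv2d(in_channels, 4, kernel_size=1)            # one map per frame offset
        self.gap = nn.AdaptiveAvgPool2d(1)   # overall average value of each feature map

    def forward(self, feature_maps):
        scores = torch.softmax(self.gap(self.cls_conv(feature_maps)).flatten(1), dim=1)
        offsets = self.gap(self.reg_conv(feature_maps)).flatten(1)
        return scores, offsets
```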
Furthermore, a cascaded dense connection convolution network is constructed to extract face features and perform fine classification and regression. Each level of dense connection network may contain several dense connection blocks, each dense connection block is composed of several convolution layers, and the convolution layers within the same dense connection block produce feature maps of the same size; within the same dense connection block, the input of each convolution layer is the concatenation of the feature maps produced by all preceding convolution layers; two adjacent dense connection blocks are joined by a transition layer; the second-level and third-level networks consist of dense connection convolution networks containing two and three dense connection blocks respectively, which step by step eliminate and positionally refine the face regions predicted by the first level; the transition layer comprises a convolution layer and a pooling layer.
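A minimal sketch of a dense connection block and transition layer in PyTorch (illustrative only; the growth rate, layer counts, kernel sizes, and the choice of max pooling are assumptions not fixed by the text).

```python
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """Dense connection block: every convolution layer takes the concatenation of
    all previously produced feature maps as input, and all maps share one size."""

    def __init__(self, in_channels, growth_rate=16, num_layers=3):
        super().__init__()
        self.layers = nn.ModuleList()
        channels = in_channels
        for _ in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.Conv2d(channels, growth_rate, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
            ))
            channels += growth_rate
        self.out_channels = channels

    def forward(self, x):
        features = [x]
        for layer in self.layers:
            # input = concatenation of all earlier feature maps in this block
            out = layer(torch.cat(features, dim=1))
            features.append(out)
        return torch.cat(features, dim=1)

class TransitionLayer(nn.Module):
    """Transition between adjacent dense blocks: one convolution plus one pooling layer."""

    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)

    def forward(self, x):
        return self.pool(self.conv(x))
```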
Further, the face detection method based on the multi-scale cascade dense connection neural network comprises the following steps: (1) collecting face pictures annotated with face rectangular frame information to form an initial training data set D1, and using D1 to generate a sub-training data set D2 that conforms to the first-level network input format; (2) designing a regional nomination network capable of extracting more high-quality candidate regions, training this sub-network model with the sub-training data set D2, then sending the initial training data set D1 into the sub-network model for detection and generating the next-level training data D3 from the detection results; (3) designing a cascaded (two-stage) dense connection network with stronger feature extraction and generalization capability, sending D3 into the first-stage dense connection network for training to obtain its sub-network model, then sending D1 into the network formed by the regional nomination network and the first-level dense connection network for detection, generating the training data set D4 of the next-level dense connection network from the detection results, and using D4 to train the second-stage cascaded dense connection network; (4) detecting multi-pose human faces in the picture under test with the trained network model.
Further, step (1) includes: the face data set D1 is preprocessed into the sub-training data set D2 that conforms to the input format of the first-level network in the cascade, with a resolution of 12 × 12. The sub-training data set contains three types of training pictures: face images, partial-face images, and non-face images. Their label information is made as follows: face images are labeled 1, partial-face images are labeled -1, and non-face images are labeled 0. Face and partial-face images are also annotated with face rectangular frame information, while the face rectangular frame information of non-face images is set to -1.
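A minimal sketch of how one training record could be encoded under this labeling convention (the field names and data layout are assumptions for illustration; only the label values 1 / -1 / 0, the -1 frame for non-face crops, and the 12 × 12 resolution come from the text).

```python
from dataclasses import dataclass
from typing import Tuple

import numpy as np

@dataclass
class TrainingSample:
    """One 12 x 12 training crop for the first-level network."""
    image: np.ndarray                        # 12 x 12 x 3 cropped picture
    label: int                               # 1 = face, -1 = partial face, 0 = non-face
    box: Tuple[float, float, float, float]   # face rectangular frame; -1 entries for non-face crops

def make_sample(image: np.ndarray, label: int, box=None) -> TrainingSample:
    # non-face crops carry -1 as their face rectangular frame information
    if label == 0 or box is None:
        box = (-1.0, -1.0, -1.0, -1.0)
    return TrainingSample(image=image, label=label, box=box)
```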
Further, step (2) includes: extracting more high-quality candidate regions containing faces with different convolution layers of the regional nomination network, preventing missed detections caused by extracting too few candidate regions. A classification layer and a regression layer are attached to each of the last two convolution layers of the regional nomination network to predict face region scores and perform frame regression; finally, candidate frames whose scores are lower than the threshold T1 (T1 ranges from 0 to 1; the method takes 0.9) are eliminated, and non-maximum suppression is applied to the remaining candidate frames to obtain the final prediction result. The regional nomination network is then trained with the preprocessed data set D2; after training, D1 is fed into the network for detection, and the Intersection over Union (IoU) between each detected face rectangular frame and the ground-truth face rectangular frame of the corresponding picture in D1 is computed: detections with IoU > 0.85 are labeled face samples, 0.55 < IoU < 0.7 partial-face samples, and IoU < 0.35 non-face samples, generating the training data set D3 of the next-level network. The image resolution of D3 is 24 × 24.
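A sketch of the IoU computation and sample-labeling rule (illustrative; the function names are assumptions, while the thresholds follow the text; detections falling in the unlisted IoU ranges are left unassigned here because the text does not label them).

```python
def iou(box_a, box_b):
    """Intersection over Union of two [x1, y1, x2, y2] rectangles."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def label_detection(detected_box, ground_truth_box):
    """Assign the next-level training label from the IoU with the ground truth."""
    overlap = iou(detected_box, ground_truth_box)
    if overlap > 0.85:
        return 1        # face sample
    if 0.55 < overlap < 0.7:
        return -1       # partial-face sample
    if overlap < 0.35:
        return 0        # non-face sample
    return None         # other IoU ranges are not assigned a label by the text
```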
Further, step (3) includes: constructing a cascaded dense connection convolution network to extract face features and perform fine classification and regression. Each level of dense connection network may contain several dense connection blocks, each dense connection block is composed of several convolution layers, and the convolution layers within the same dense connection block produce feature maps of the same size; within the same dense connection block, the input of each convolution layer is the concatenation of the feature maps produced by all preceding convolution layers; two adjacent dense connection blocks are joined by a transition layer (a convolution layer and a pooling layer); a global average pooling layer is attached after the last convolution layer of the dense connection network and computes the overall average value of each feature map output by the preceding convolution layers, the number of feature maps being consistent with the number of classification categories. The first-level dense connection network is trained with D3; after training, D1 is sent into the network formed by the regional nomination network and the first-level dense connection network for cascade detection, and the sub-training data set D4 is generated, with an image resolution of 48 × 48, using the same method as in step (2). Finally, the second-level dense connection network is trained with D4.
Further, step (4) includes: cascading the regional nomination network and the two-stage dense connection network into a three-stage cascaded network. A new picture is then transformed with pyramid scaling at a scaling ratio of 0.709, and the transformed pictures are input into the first-stage regional nomination network model, which produces a large number of face classification scores and face rectangular frame regression vectors; face rectangular frames whose scores are lower than the threshold T1 (T1 ranges from 0 to 1; the method takes 0.9) are eliminated, and non-maximum suppression is applied to the remaining face rectangular frames to obtain this stage's prediction result. The prediction results are then input into the second-level network model, face rectangular frames whose scores are lower than the threshold T2 (T2 ranges from 0 to 1; the method takes 0.7) are again eliminated, and the non-maximum suppression algorithm is used to remove face rectangular frames with large overlap. Finally, the prediction results are input into the third-level network model, which outputs face classification scores and face rectangular frame information; face rectangular frames whose scores are lower than the threshold T3 (T3 ranges from 0 to 1; the method takes 0.8) are eliminated, and the non-maximum suppression algorithm is used to remove face rectangular frames with large overlap, yielding the final prediction result.
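The image-pyramid construction with a scaling ratio of 0.709 can be sketched as follows (illustrative code; the minimum side of 12 pixels is an assumption tied to the 12 × 12 input of the first-level network, not an explicit figure from the text).

```python
import cv2

def build_pyramid_scales(width, height, scale_factor=0.709, min_side=12):
    """Return the list of scales for pyramid transformation of one picture."""
    scales, scale = [], 1.0
    while min(width, height) * scale >= min_side:
        scales.append(scale)
        scale *= scale_factor            # each level is 0.709 times the previous one
    return scales

def build_pyramid(image, scale_factor=0.709, min_side=12):
    """Resize the picture to every pyramid scale before the first-stage network."""
    h, w = image.shape[:2]
    return [cv2.resize(image, (int(w * s), int(h * s)))
            for s in build_pyramid_scales(w, h, scale_factor, min_side)]
```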
Compared with the prior art, the invention has the following advantages and effects: by making the regional nomination network predict more candidate regions, missed face detections are effectively prevented; at the same time, a dense connection network with stronger feature extraction capability is introduced and a global average pooling layer replaces the fully connected layer, which further improves the generalization capability of the network. The model of the invention therefore performs better under multiple poses.
Drawings
Fig. 1a and 1b are flow charts of a training phase and a testing phase, respectively.
Fig. 2a, 2b, and 2c are network configuration diagrams of three sub-networks, respectively.
Detailed Description
The following further describes embodiments of the present invention with reference to the drawings, but the practice of the present invention is not limited thereto. It is noted that the following processes, if not described in particular detail, are all realizable by one skilled in the art with reference to the prior art.
In this embodiment, the multi-pose face detection algorithm based on the multi-scale cascade dense connection neural network can, to a certain extent, overcome the influence of pose variation.
In this embodiment, the training phase, shown in Fig. 1a, proceeds as follows.
Step 1: First, a training subset D2 conforming to the first-level network input format, with a resolution of 12 × 12, is made by randomly cropping three types of sub-image blocks from the existing face data set D1: face images, partial-face images, and non-face images. Their label information is produced as follows: face images are labeled 1, partial-face images are labeled -1, and non-face images are labeled 0. Face and partial-face images are also annotated with face rectangular frame information, while the rectangular frame information of non-face images is set to -1. The 12 × 12 sub-training data set D2 is then input into the first-level network (the regional nomination network), whose parameters are updated by stochastic gradient descent for 22 epochs in total (one full pass over the training data set is called an epoch); the initial learning rate is set to 0.01, lowered to 0.001 at the 6th epoch and to 0.0001 at the 16th epoch, until training is completed.
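A minimal sketch of this stochastic gradient descent schedule in PyTorch (illustrative only; the momentum value, model interface, and data-loader format are assumptions, and loss_fn stands for the objective given in formulas (1)–(3) below).

```python
import torch

def train_first_level(model, train_loader, loss_fn, epochs=22):
    """SGD training with the step schedule 0.01 -> 0.001 (epoch 6) -> 0.0001 (epoch 16)."""
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    for epoch in range(1, epochs + 1):
        if epoch == 6:
            for group in optimizer.param_groups:
                group["lr"] = 0.001
        elif epoch == 16:
            for group in optimizer.param_groups:
                group["lr"] = 0.0001
        for images, labels, boxes in train_loader:
            optimizer.zero_grad()
            scores, offsets = model(images)
            loss = loss_fn(scores, offsets, labels, boxes)
            loss.backward()
            optimizer.step()
```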
The objective function of the regional nomination network is as follows:

$$\min \sum_{i=1}^{N} \sum_{j=1}^{2} \alpha_j \, \beta_i^{j} L_i^{j} \qquad (1)$$

where N represents the number of training samples, j = 1 denotes the classification task and j = 2 the bounding-box regression task, i indexes the i-th sample, $\alpha_j$ is the weight of each task, $\beta_i^{j} \in \{0, 1\}$ is the sample-type indicator of sample $x_i$, and $L_i^{j}$ is the loss function of the corresponding task: (2) is the loss of the classification task and (3) the loss of the bounding-box regression task.

$$L_i^{1} = -\left( y_i^{1} \log p_i + (1 - y_i^{1}) \log (1 - p_i) \right) \qquad (2)$$

where $y_i^{1}$ is the label of sample $x_i$, taking the value 0 or 1 (0 for non-face, 1 for face), and $p_i$ is the probability, predicted by the network, that sample $x_i$ is a face.

$$L_i^{2} = \left\| \hat{y}_i^{2} - y_i^{2} \right\|_2^{2} \qquad (3)$$

where i in formula (3) denotes the i-th sample, $\hat{y}_i^{2}$ is the bounding-box position increment predicted by the network for each candidate window, and $y_i^{2}$ is the ground-truth bounding-box position increment, a four-dimensional real vector.
Step 2: The trained regional nomination network model is used to generate the training data D3 of the second-level network, and the second-level network is trained. First, D1 is sent to the regional nomination network for detection, yielding face score predictions and face rectangular frame predictions; face rectangular frames whose scores are below the threshold T1 (T1 ranges from 0 to 1; the method takes 0.9) are culled, and non-maximum suppression is applied to the remaining face rectangular frames to obtain the prediction result. The Intersection over Union (IoU) between each face rectangular frame in the prediction result and the ground-truth face rectangular frame of the corresponding picture in D1 is then computed: IoU > 0.85 is labeled a face sample, 0.55 < IoU < 0.7 a partial-face sample, and IoU < 0.35 a non-face sample, generating the training data set D3 of the next-level network; the image resolution of D3 is 24 × 24. The generated training data set D3 is sent to the second-level network (the first-level dense connection network) for training, for 18 epochs in total; the initial learning rate is set to 0.01, lowered to 0.001 at the 6th epoch and to 0.0001 at the 12th epoch, until training is finished. The same loss function is used as in the first-stage network.
Step 3: The models trained in the first two levels are used to generate the training data set D4 of the third-level network and to complete the training of the third-level network. D1 is sent into the network formed by the regional nomination network and the first-level dense connection network for cascade detection, and the sub-training data set D4, with an image resolution of 48 × 48, is generated by the same method as in step 2. Finally, the second-level dense connection network is trained with D4 for 18 epochs; the initial learning rate is set to 0.01, lowered to 0.001 at the 6th epoch and to 0.0001 at the 12th epoch, until training is finished. The same loss function is used as in the first-stage network.
In this embodiment, the testing stage, shown in Fig. 1b, proceeds as follows: a new picture is transformed with pyramid scaling at a ratio of 0.709, and the transformed pictures are input into the first-stage regional nomination network model, which produces a large number of face classification scores and face rectangular frame regression vectors; face rectangular frames whose scores are below the threshold T1 (T1 ranges from 0 to 1; the method takes 0.9) are culled, and non-maximum suppression is applied to the remaining face rectangular frames to obtain this stage's prediction result. The prediction results are then input into the second-level network model, face rectangular frames whose scores are below the threshold T2 (T2 ranges from 0 to 1; the method takes 0.7) are again culled, and the non-maximum suppression algorithm removes face rectangular frames with large overlap. Finally, the prediction results are input into the third-level network model, which outputs face classification scores and face rectangular frame information; face rectangular frames whose scores are below the threshold T3 (T3 ranges from 0 to 1; the method takes 0.8) are culled, and the non-maximum suppression algorithm removes face rectangular frames with large overlap, yielding the final prediction result.
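A minimal sketch of this three-stage test-time cascade (illustrative orchestration only; the function and model names and their interfaces are assumptions, and build_pyramid and filter_and_nms refer to the earlier sketches; the thresholds 0.9 / 0.7 / 0.8 follow the text).

```python
import numpy as np

def detect_faces(image, rpn_model, dense_net_1, dense_net_2, t1=0.9, t2=0.7, t3=0.8):
    """Three-stage cascade: regional nomination network, then two dense connection networks."""
    # Stage 1: run the regional nomination network over the image pyramid (ratio 0.709)
    all_boxes, all_scores = [], []
    for scaled in build_pyramid(image, scale_factor=0.709):
        boxes, scores = rpn_model(scaled)   # frames assumed mapped back to original coordinates
        all_boxes.append(boxes)
        all_scores.append(scores)
    boxes = np.concatenate(all_boxes)
    scores = np.concatenate(all_scores)
    boxes, scores = filter_and_nms(boxes, scores, score_thresh=t1)

    # Stage 2: first-level dense connection network rescores and refines the surviving frames
    boxes, scores = dense_net_1(image, boxes)
    boxes, scores = filter_and_nms(boxes, scores, score_thresh=t2)

    # Stage 3: second-level dense connection network produces the final frames
    boxes, scores = dense_net_2(image, boxes)
    return filter_and_nms(boxes, scores, score_thresh=t3)
```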

Claims (7)

1. A face detection method based on a multi-scale cascade dense connection neural network is characterized by comprising a construction method of a regional nomination network and a construction method of a multi-level dense connection convolution network model:
the construction method of the regional nomination network comprises the following steps: on several convolution layers of the regional nomination network, predicting scores and bounding frames for regions that may contain a face; then eliminating region blocks whose scores are smaller than a set threshold, and applying non-maximum suppression to the remaining region blocks to obtain the final regions that may contain a face; finally, sending the predicted face regions into the second-level dense connection convolution network;
the construction method of the multi-level dense connection convolution network model comprises the following steps: using convolution layers to continuously extract increasingly abstract face features while concatenating the features extracted by lower convolution layers with those extracted by higher convolution layers; then attaching a global average pooling layer after the last convolution layer and performing fine classification and frame regression on the face regions predicted by the previous level; finally, sending the remaining face regions to the third-level dense connection convolution network for even finer classification and frame regression, so that the final face regions are obtained through prediction;
the method specifically comprises the following steps:
step (1), collecting the face picture marked with face rectangular frame information to form an initial training data set D1By using D1Generating a sub-training data set D that conforms to a first-level network input format2
Step (2), designing a region nomination network model capable of extracting more high-quality candidate regions, and utilizing a sub-training data set D2Training the region nomination network model, and then collecting an initial training data set D1Sending the data into the region nomination network model for detection, and generating training data D of the next level according to the detection result3(ii) a The method specifically comprises the following steps: extracting more high-quality candidate regions containing human faces by using different convolution layers of the region nomination network, and preventing missing detection caused by too few extracted candidate regions; respectively connecting a classification layer and a regression layer to the last two convolution layers of the regional nomination network to predict the face region score and carry out frame regression; finally the elimination score is lower than the threshold value T1The remaining candidate frames are subjected to non-maximum suppression to obtain a final prediction result; then using the preprocessed data set D2Training the area nomination network, and after the training is finished, D1Inputting the area nomination network for detection, and combining the face rectangle frame and D in the detection result1Calculating the cross-over ratio and the cross-over ratio according to the real face rectangular frame information of the corresponding picture>0.85 as a face sample, 0.55<Cross ratio of<0.7, labeled as partial face samples, cross-over ratio<0.35 marking as a non-human face sample, generating a next level networkTraining data set D of3,D3The image resolution of (a) is 24 × 24;
step (3), designing a cascading dense connection network with stronger feature extraction capability and generalization capability, and connecting D3Sending the network into the first stage of dense connection network to train and generate sub-network model, and then sending D1Sending the data into a network consisting of a regional nomination network and a first-level dense connection network for detection, and generating a training data set D of a next-level dense connection network according to the detection result4Reuse of D4Training a second-stage cascaded dense connection network;
and (4) detecting the multi-pose human face in the picture to be tested by using the network model obtained by training.
2. The face detection method based on the multi-scale cascade dense connection neural network as claimed in claim 1, characterized in that more high-quality candidate regions containing faces are extracted by using different convolution layers of the regional nomination network, thereby preventing missed detections caused by extracting too few candidate regions; a classification layer and a regression layer are attached to each of the last two convolution layers of the regional nomination network to predict face region scores and perform frame regression; finally, candidate frames whose scores are lower than the threshold T1 are eliminated, and non-maximum suppression is applied to the remaining candidate frames to obtain the final prediction result; T1 ranges from 0 to 1.
3. The face detection method based on the multi-scale cascade dense connection neural network as claimed in claim 1, characterized in that a global average pooling layer is introduced to replace the traditional fully connected layer for face classification and regression; the global average pooling layer is attached after the last convolution layer of each level of dense connection network and computes the overall average value of each feature map output by the preceding convolution layers, so that local face information is fully learned while the overfitting caused by introducing spatial structure information is avoided; finally, a softmax layer is attached after the global average pooling layer to classify and regress the face regions predicted by the previous stage.
4. The face detection method based on the multi-scale cascade dense connection neural network as claimed in claim 3, characterized in that a cascaded dense connection convolution network is constructed to extract face features and perform fine classification and regression; each level of dense connection network may contain several dense connection blocks, each dense connection block is composed of several convolution layers, and the convolution layers of the same dense connection block must generate feature maps of the same size; within the same dense connection block, the input of each convolution layer is the concatenation of the feature maps generated by all preceding convolution layers; two adjacent dense connection blocks are joined by a transition layer; the second-level and third-level networks consist of dense connection convolution networks containing two and three dense connection blocks respectively, which step by step eliminate and positionally refine the face regions predicted by the first level; the transition layer comprises a convolution layer and a pooling layer.
5. The face detection method based on the multi-scale cascade dense connection neural network as claimed in claim 1, wherein step (1) specifically comprises: preprocessing the face data set D1 into the sub-training data set D2 that conforms to the input format of the first-level network in the cascade, with a resolution of 12 × 12; the sub-training data set D2 contains three types of training pictures: face images, partial-face images, and non-face images; label information for the three types of pictures is made as follows: face images are labeled 1, partial-face images are labeled -1, and non-face images are labeled 0; face and partial-face images are also annotated with face rectangular frame information, while the face rectangular frame information of non-face images is set to -1.
6. The face detection method based on the multi-scale cascade dense connection neural network as claimed in claim 1, wherein step (3) specifically comprises: constructing a cascaded dense connection convolution network to extract face features and perform fine classification and regression, wherein each level of dense connection network contains several dense connection blocks, each dense connection block is built from several convolution layers, and the convolution layers of the same dense connection block must generate feature maps of the same size; within the same dense connection block, the input of each convolution layer is the concatenation of the feature maps generated by all preceding convolution layers; two adjacent dense connection blocks are joined by a transition layer; a global average pooling layer is attached after the last convolution layer of the dense connection network and computes the overall average value of each feature map output by the preceding convolution layers, the number of feature maps being consistent with the number of classification categories; the first-level dense connection network is trained with D3, and after training, D1 is sent into the network formed by the regional nomination network and the first-level dense connection network for cascade detection, generating the sub-training data set D4, whose image resolution is 48 × 48, by the same method as in step (2); finally, the second-level dense connection network is trained with D4.
7. The face detection method based on the multi-scale cascade dense connection neural network as claimed in claim 1, wherein step (4) specifically comprises: cascading the regional nomination network and the two-stage dense connection network into a three-stage cascaded network; then performing pyramid scale transformation on a new picture at a scaling ratio of 0.709, inputting the transformed pictures into the first-stage regional nomination network model to generate a large number of face classification scores and face rectangular frame regression vectors, eliminating face rectangular frames whose scores are lower than the threshold T1, and applying non-maximum suppression to the remaining face rectangular frames to obtain this stage's prediction result; the prediction results are then input into the second-level network model, face rectangular frames whose scores are lower than the threshold T2 are again eliminated, and the non-maximum suppression algorithm is used to screen out face rectangular frames with large overlap; finally, the prediction results are input into the third-level network model, which outputs face classification scores and face rectangular frame information, face rectangular frames whose scores are lower than the threshold T3 are eliminated, and the non-maximum suppression algorithm is used to screen out face rectangular frames with large overlap to obtain the final prediction result; T1, T2, and T3 range from 0 to 1.
CN201810605067.6A 2018-06-13 2018-06-13 Face detection method based on multi-scale cascade dense connection neural network Active CN108875624B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810605067.6A CN108875624B (en) 2018-06-13 2018-06-13 Face detection method based on multi-scale cascade dense connection neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810605067.6A CN108875624B (en) 2018-06-13 2018-06-13 Face detection method based on multi-scale cascade dense connection neural network

Publications (2)

Publication Number Publication Date
CN108875624A CN108875624A (en) 2018-11-23
CN108875624B (en) 2022-03-25

Family

ID=64338103

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810605067.6A Active CN108875624B (en) 2018-06-13 2018-06-13 Face detection method based on multi-scale cascade dense connection neural network

Country Status (1)

Country Link
CN (1) CN108875624B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109543648B (en) * 2018-11-30 2022-06-17 公安部交通管理科学研究所 Method for extracting face in car passing picture
CN109886286B (en) * 2019-01-03 2021-07-23 武汉精测电子集团股份有限公司 Target detection method based on cascade detector, target detection model and system
CN110059584B (en) * 2019-03-28 2023-06-02 中山大学 Event naming method combining boundary distribution and correction
CN110163102A (en) * 2019-04-18 2019-08-23 麦克奥迪(厦门)医疗诊断系统有限公司 A kind of cervical cell image classification recognition methods based on convolutional neural networks
CN110335244A (en) * 2019-05-17 2019-10-15 杭州数据点金科技有限公司 A kind of tire X-ray defect detection method based on more Iterative classification devices
CN111986278B (en) 2019-05-22 2024-02-06 富士通株式会社 Image encoding device, probability model generating device, and image compression system
CN112001205B (en) * 2019-05-27 2023-10-31 北京君正集成电路股份有限公司 Network model sample acquisition method for secondary face detection
CN111027382B (en) * 2019-11-06 2023-06-23 华中师范大学 Attention mechanism-based lightweight face detection method and model
CN110866484B (en) * 2019-11-11 2022-09-09 珠海全志科技股份有限公司 Driver face detection method, computer device and computer readable storage medium
CN111080576B (en) * 2019-11-26 2023-09-26 京东科技信息技术有限公司 Key point detection method and device and storage medium
CN110859624A (en) * 2019-12-11 2020-03-06 北京航空航天大学 Brain age deep learning prediction system based on structural magnetic resonance image
CN113051960A (en) * 2019-12-26 2021-06-29 深圳市光鉴科技有限公司 Depth map face detection method, system, device and storage medium
CN111274886B (en) * 2020-01-13 2023-09-19 天地伟业技术有限公司 Deep learning-based pedestrian red light running illegal behavior analysis method and system
CN111368707B (en) * 2020-03-02 2023-04-07 佛山科学技术学院 Face detection method, system, device and medium based on feature pyramid and dense block
CN111428661A (en) * 2020-03-28 2020-07-17 北京工业大学 Method for processing face image based on intelligent human-computer interaction
CN113569991B (en) * 2021-08-26 2024-05-28 深圳市捷顺科技实业股份有限公司 Person evidence comparison model training method, computer equipment and computer storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107247956B (en) * 2016-10-09 2020-03-27 成都快眼科技有限公司 Rapid target detection method based on grid judgment
CN107748858A (en) * 2017-06-15 2018-03-02 华南理工大学 A kind of multi-pose eye locating method based on concatenated convolutional neutral net
CN107688786A (en) * 2017-08-30 2018-02-13 南京理工大学 A kind of method for detecting human face based on concatenated convolutional neutral net
CN108090918A (en) * 2018-02-12 2018-05-29 天津天地伟业信息系统集成有限公司 A kind of Real-time Human Face Tracking based on the twin network of the full convolution of depth

Also Published As

Publication number Publication date
CN108875624A (en) 2018-11-23


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant