CN108875624B - Face detection method based on multi-scale cascade dense connection neural network - Google Patents
- Publication number
- CN108875624B (application CN201810605067.6A)
- Authority
- CN
- China
- Prior art keywords
- network
- face
- dense connection
- convolution
- layer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Evolutionary Computation (AREA)
- Computational Linguistics (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Life Sciences & Earth Sciences (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a face detection method based on a multi-scale cascade dense connection neural network, belongs to the field of image processing and computer vision, and is suitable for intelligent systems such as face recognition, facial expression recognition, driver fatigue detection and the like. The invention comprises a construction method of a regional nomination network and a construction method of a multi-level dense connection convolution network model, which specifically comprises the following steps: collecting face pictures marked with face rectangular frame (bounding box) information to form a training data set conforming to input conditions of each sub-network; constructing a cascade dense connection neural network with strong generalization capability; respectively training each sub-network by utilizing a training data set, and obtaining an integral network model; and finally, detecting the multi-pose human face in the picture by using the integral network model. According to the invention, a dense connection mode is introduced into the network, so that the network can fully extract the face characteristic information, and the accuracy of face detection under multiple postures is improved.
Description
Technical Field
The invention belongs to the field of image processing and computer vision, and particularly relates to a face detection method based on a multi-scale cascade dense connection neural network.
Background
The human face image contains rich information, and the research and analysis of the human face image are important directions and research hotspots in the field of computer vision. For example, in various artificial intelligence applications such as face recognition, crowd monitoring, photography, man-machine interaction, fatigue driving and the like, face detection is the key first step in the technologies, and only when a face is detected, the later analysis and research can be valuable.
In recent decades, a large number of scholars have intensively studied multi-pose face detection algorithms, and generally, the multi-pose face detection algorithms are mainly classified into the following two categories: a traditional machine learning based method and a deep learning based method.
Traditional machine learning algorithms generally train a classifier on a large number of samples to judge whether a window contains a face. In the testing phase, the most common approach is the sliding-window algorithm. First, the input image is scaled to various sizes to build an image pyramid. Then, at each position of each pyramid layer, a fixed-size image patch, called a window, is extracted. Next, features are extracted from this window. Finally, the trained classifier judges whether the window is a face. The number of windows to be classified is generally very large: a picture with a resolution of 640 × 480 yields on the order of hundreds of thousands of windows, so every face detection algorithm must consider how to process these windows accurately in a short time. In addition, traditional machine learning algorithms rely on hand-crafted features, such as Haar features, Local Binary Pattern (LBP) features, and Histogram of Oriented Gradients (HOG) features. Because hand-crafted features encode the prior knowledge of their designers, they achieve high accuracy only for faces against certain specific backgrounds and are difficult to apply to complex conditions such as multi-pose faces in three-dimensional space.
Deep learning based methods currently dominate the field of face detection. The main neural network architectures include the Convolutional Neural Network (CNN), the Deep Belief Network (DBN), and the auto-encoder, of which the convolutional neural network has been the most successful in face detection. For example, cascaded convolutional neural networks (Cascade CNN) and multi-task convolutional neural networks (MTCNN) automatically extract stable face features with convolutional layers, and their detection performance is greatly improved over traditional machine learning algorithms. However, current deep learning based face detection models are usually data-driven: the network fits the training data set, generalizes weakly, and has difficulty detecting faces in multiple poses when no multi-pose training data participates in training.
Therefore, a multi-pose face detection algorithm with higher generalization performance needs to be provided, and the face detection rate can be improved under the condition that no multi-pose face data set participates in training.
Disclosure of Invention
The invention aims to solve the problem that face detection is easily influenced by posture change, and provides a face detection method based on a multi-scale cascade dense connection neural network. The invention designs a cascade dense connection network with stronger feature extraction capability and generalization capability, trains the network model by utilizing the collected and processed training data set, and finally detects the human face by utilizing the trained model, thereby realizing the algorithm which can achieve good effect on the human face under multiple postures.
The invention is realized by at least one of the following technical solutions.
A face detection method based on a multi-scale cascade dense connection neural network comprises a construction method of a regional nomination network and a construction method of a multi-level dense connection convolution network model:
the construction method of the regional nomination network comprises the following steps: performing score prediction and bounding-box prediction for regions that may contain a face on a plurality of convolution layers of the regional nomination network; then eliminating the region blocks whose scores are smaller than a set threshold, and performing non-maximum suppression on the remaining region blocks to obtain the final regions that may contain a face; finally, sending the predicted face regions into a second-level dense connection convolution network;
the construction method of the multilevel dense connection convolution network model comprises the following steps: successively extracting more abstract face features with the convolution layers, while connecting the features extracted by the lower convolution layers with the features extracted by the higher convolution layers; then attaching a global average pooling layer after the last convolution layer, and performing fine classification and bounding-box regression on the face regions predicted by the previous level; and finally, sending the remaining face regions to a third-level dense connection convolution network for finer classification and bounding-box regression, thereby predicting the final face regions.
Further, different convolution layers of the regional nomination network are used to extract more high-quality candidate regions containing faces (that is, candidate regions in which the face occupies as much of the area as possible), preventing missed detections caused by too few extracted candidate regions; a classification layer and a regression layer are connected to each of the last two convolution layers of the regional nomination network to predict the face region score and perform bounding-box regression; finally, the candidate boxes whose scores are lower than the threshold T1 are eliminated, and non-maximum suppression is performed on the remaining candidate boxes to obtain the final prediction result; the value range of T1 is 0-1.
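The score filtering and non-maximum suppression described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the (x1, y1, x2, y2) box layout, and the NMS overlap limit of 0.5 are assumptions; only the idea of a score threshold T1 followed by non-maximum suppression comes from the text.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def filter_candidates(boxes: List[Box], scores: List[float],
                      t1: float = 0.9, nms_iou: float = 0.5) -> List[int]:
    """Drop boxes scoring below t1, then apply greedy NMS.

    Returns the indices of the kept boxes, highest score first.
    nms_iou is a hypothetical overlap limit; the text does not state one.
    """
    order = sorted((i for i, s in enumerate(scores) if s >= t1),
                   key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with a better one.
        if all(iou(boxes[i], boxes[k]) <= nms_iou for k in kept):
            kept.append(i)
    return kept
```

Greedy suppression in score order is the usual formulation; other variants would fit the same description.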
Furthermore, a global average pooling layer is introduced to replace the traditional fully connected layer for face classification and regression; the global average pooling layer is attached after the last convolution layer of each level of dense connection network and computes the overall average of each feature map output by the preceding convolution layer, so that the local information of the face is fully learned and overfitting caused by introducing spatial structure information is avoided; finally, a multi-class (softmax) layer is attached after the average pooling layer to classify and regress the face regions predicted by the previous level.
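As a rough illustration of global average pooling followed by softmax, the sketch below reduces each feature map to its mean and normalizes the resulting values into class probabilities. It assumes one feature map per class and plain nested lists in place of tensors; the function names are hypothetical.

```python
import math
from typing import List

FeatureMap = List[List[float]]  # one H x W channel


def global_average_pool(maps: List[FeatureMap]) -> List[float]:
    """Collapse each H x W feature map to its mean value."""
    return [sum(map(sum, m)) / (len(m) * len(m[0])) for m in maps]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over the pooled values."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Two 2 x 2 feature maps, assumed to correspond to "face" and "non-face".
pooled = global_average_pool([[[1.0, 3.0], [5.0, 7.0]],
                              [[0.0, 0.0], [0.0, 0.0]]])
probabilities = softmax(pooled)
```

Because the pooling layer has no weights, it adds no parameters of its own, in contrast to a fully connected layer.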
Furthermore, a cascaded convolutional dense connection network is constructed to extract face features and perform fine classification and regression; each level of dense connection network may contain a plurality of dense connection blocks, each dense connection block is composed of a plurality of convolution layers, and the convolution layers of the same dense connection block generate feature maps of the same size; within a dense connection block, the input of each convolution layer is the concatenation of the feature maps generated by all preceding convolution layers; two adjacent dense connection blocks are connected by a transition layer; the second-level network and the third-level network are dense connection convolution networks comprising two and three dense connection blocks respectively, which progressively filter the face regions predicted by the first level and refine their positions; the transition layer includes a convolution layer and a pooling layer.
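The connectivity inside one dense connection block can be sketched as below. The convolution is replaced by a toy stand-in so that the example runs without a deep learning library; the names `toy_conv` and `growth_rate`, and the flattened feature-map representation, are assumptions made only for illustration.

```python
from typing import List

FeatureMaps = List[List[float]]  # channels x flattened pixels


def toy_conv(inputs: FeatureMaps, out_channels: int) -> FeatureMaps:
    """Stand-in for a convolution layer that preserves the spatial size.

    Each output map is the per-pixel mean of all input maps.
    """
    n = len(inputs)
    mean_map = [sum(ch[p] for ch in inputs) / n for p in range(len(inputs[0]))]
    return [list(mean_map) for _ in range(out_channels)]


def dense_block(x: FeatureMaps, num_layers: int, growth_rate: int) -> FeatureMaps:
    """Each layer receives the concatenation of all earlier feature maps."""
    features = list(x)
    for _ in range(num_layers):
        new_maps = toy_conv(features, growth_rate)
        features = features + new_maps  # channel-wise concatenation
    return features


block_input = [[1.0, 2.0, 3.0, 4.0] for _ in range(3)]  # 3 channels, 4 pixels
block_output = dense_block(block_input, num_layers=4, growth_rate=2)
```

With 3 input channels, 4 layers, and 2 new maps per layer, the block ends with 3 + 4 × 2 = 11 channels, all of the same spatial size, which is why a transition layer is needed before the size can change.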
In a further implementation, the face detection method based on the multi-scale cascade dense connection neural network comprises the following steps: (1) collecting face pictures annotated with face rectangular frame information to form an initial training data set D1, and using D1 to generate a sub-training data set D2 that conforms to the input format of the first-level network; (2) designing a regional nomination network capable of extracting more high-quality candidate regions, training this sub-network model with the sub-training data set D2, then sending the initial training data set D1 into the trained sub-network model for detection, and generating the training data D3 of the next level from the detection results; (3) designing a cascaded (two-stage) dense connection network with stronger feature extraction capability and generalization capability, sending D3 into the first-stage dense connection network to train and generate a sub-network model, then sending D1 into the network formed by the regional nomination network and the first-stage dense connection network for detection, generating the training data set D4 of the next-stage dense connection network from the detection results, and using D4 to train the second-stage dense connection network; (4) detecting the multi-pose faces in the picture under test with the trained network model.
Further, the step (1) includes: preprocessing the face data set D1 into the sub-training data set D2 that conforms to the input format of the first-level network in the cascade network, with a resolution of 12 × 12. The sub-training data set contains three types of training pictures: face images, partial face images, and non-face images. The labels of the three types of pictures are as follows: face images are labeled 1, partial face images are labeled -1, and non-face images are labeled 0. Face images and partial face images are additionally annotated with face rectangular frame information, and the face rectangular frame information of non-face images is set to -1.
Further, the step (2) includes: extracting more high-quality candidate regions containing faces by using different convolution layers of the regional nomination network, preventing missed detections caused by too few extracted candidate regions. A classification layer and a regression layer are connected to each of the last two convolution layers of the regional nomination network to predict the face region score and perform bounding-box regression; finally, the candidate boxes whose scores are lower than the threshold T1 (the value range of T1 is 0-1; this method takes 0.9) are eliminated, and non-maximum suppression is performed on the remaining candidate boxes to obtain the final prediction result. The regional nomination network is then trained with the preprocessed data set D2. After training, D1 is input to the network for detection, and the intersection over union (IoU) between each detected face rectangular frame and the ground-truth face rectangular frame of the corresponding picture in D1 is calculated: detections with IoU > 0.85 are labeled as face samples, those with 0.55 < IoU < 0.7 as partial face samples, and those with IoU < 0.35 as non-face samples, generating the training data set D3 of the next-level network; the image resolution of D3 is 24 × 24.
Further, the step (3) includes: constructing a cascaded convolutional dense connection network to extract face features and perform fine classification and regression, wherein each level of dense connection network may contain a plurality of dense connection blocks, each dense connection block is composed of a plurality of convolution layers, and the convolution layers of the same dense connection block generate feature maps of the same size; within a dense connection block, the input of each convolution layer is the concatenation of the feature maps generated by all preceding convolution layers; two adjacent dense connection blocks are connected by a transition layer (a convolution layer and a pooling layer); the global average pooling layer is attached after the last convolution layer of the dense connection network and computes the overall average of each feature map output by the preceding convolution layer, the number of feature maps being equal to the number of classes. D3 is used to train the first-stage dense connection network; after training, D1 is sent into the network formed by the regional nomination network and the first-stage dense connection network for cascade detection, and the sub-training data set D4 is generated in the same way as in step (2), with an image resolution of 48 × 48. Finally, D4 is used to train the second-stage dense connection network.
Further, the step (4) includes: cascading the regional nomination network and the two-stage dense connection network to form a three-level cascaded network as a whole. A new picture is then subjected to pyramid scale transformation with a scaling ratio of 0.709, and the transformed pictures are input into the first-level regional nomination network model, which generates a large number of face classification scores and face rectangular frame regression vectors; the face rectangular frames whose scores are lower than the threshold T1 (the value range of T1 is 0-1; this method takes 0.9) are eliminated, and non-maximum suppression is performed on the remaining face rectangular frames to obtain the first-level prediction result; this prediction result is then input into the second-level network model, the face rectangular frames whose scores are lower than the threshold T2 (the value range of T2 is 0-1; this method takes 0.7) are likewise eliminated, and a non-maximum suppression algorithm is used to filter out heavily overlapping face rectangular frames; finally, the prediction result is input into the third-level network model, which outputs face classification scores and face rectangular frame information; the face rectangular frames whose scores are lower than the threshold T3 (the value range of T3 is 0-1; this method takes 0.8) are eliminated, and the non-maximum suppression algorithm is used to filter out heavily overlapping face rectangular frames to obtain the final prediction result.
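The labeling rule for the next-level training set can be sketched as follows. The thresholds 0.85, 0.55, 0.7, and 0.35 come from the text; the function names, the (x1, y1, x2, y2) box layout, and the choice to discard detections that fall between the stated ranges are assumptions.

```python
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of a detection and a ground-truth box."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def sample_label(overlap: float) -> Optional[int]:
    """Map an IoU value to the label used in the training set.

    1 = face, -1 = partial face, 0 = non-face, None = not used (assumed).
    """
    if overlap > 0.85:
        return 1
    if 0.55 < overlap < 0.7:
        return -1
    if overlap < 0.35:
        return 0
    return None  # gaps between the stated ranges are left out of the data set
```

For example, a detection that coincides with its ground-truth frame has an IoU of 1.0 and is labeled as a face sample.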
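The pyramid scale transformation can be illustrated by listing the scale factors applied to the input picture. The ratio 0.709 and the 12 × 12 first-level input size come from the text; the stopping rule (stop once the shorter side would fall below 12 pixels) and the function name are assumptions.

```python
from typing import List


def pyramid_scales(width: int, height: int,
                   min_size: int = 12, factor: float = 0.709) -> List[float]:
    """Scales factor**k for which the shorter image side stays >= min_size."""
    scales: List[float] = []
    scale = 1.0
    shorter = min(width, height)
    while shorter * scale >= min_size:
        scales.append(scale)
        scale *= factor  # each pyramid layer is 0.709 times the previous one
    return scales


scales_640x480 = pyramid_scales(640, 480)
```

A ratio of 0.709 is close to 1/√2, so the image area roughly halves from one pyramid layer to the next.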
Compared with the prior art, the invention has the following advantages and effects: by having the regional nomination network predict more candidate regions, the invention effectively prevents missed face detections; in addition, a dense connection network with stronger feature extraction capability is introduced and the fully connected layer is replaced with a global average pooling layer, which further improves the generalization capability of the network. The model of the invention therefore performs better on faces in multiple poses.
Drawings
Fig. 1a and 1b are flow charts of a training phase and a testing phase, respectively.
Fig. 2a, 2b, and 2c are network configuration diagrams of three sub-networks, respectively.
Detailed Description
The following further describes embodiments of the present invention with reference to the drawings, but the practice of the present invention is not limited thereto. It is noted that the following processes, if not described in particular detail, are all realizable by one skilled in the art with reference to the prior art.
In the embodiment, the multi-pose face detection algorithm based on the multi-scale cascade dense connection neural network can overcome the influence caused by multi-pose to a certain extent.
In this embodiment, in the training phase, as shown in fig. 1a, the specific embodiment is as follows.
Step 1: first, a training subset D2 conforming to the input format of the first-level network is made, with a resolution of 12 × 12. Three types of sub-image blocks are randomly cropped from the existing face data set D1: face images, partial face images, and non-face images. The labels are produced as follows: face images are labeled 1, partial face images are labeled -1, and non-face images are labeled 0. Face images and partial face images are additionally annotated with face rectangular frame information, and the rectangular frame information of non-face images is set to -1. The 12 × 12 sub-training data set D2 is then input into the first-level network (the regional nomination network), and the network parameters are updated by stochastic gradient descent for a total of 22 epochs (one full pass over the training data set is called one epoch). The initial learning rate is 0.01; it is set to 0.001 when training reaches the 6th epoch and to 0.0001 when training reaches the 16th epoch, until training ends.
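The step-wise learning rate schedule described above can be written as a small function. The values (0.01, then 0.001 from the 6th epoch, then 0.0001 from the 16th epoch, 22 epochs in total) come from the text; the function name, the 1-based epoch numbering, and the `drops` parameter are assumptions.

```python
from typing import Sequence, Tuple


def learning_rate(epoch: int,
                  base: float = 0.01,
                  drops: Sequence[Tuple[int, float]] = ((6, 0.001), (16, 0.0001)),
                  ) -> float:
    """Learning rate for a 1-based epoch under a step-wise schedule."""
    rate = base
    for start_epoch, value in drops:
        if epoch >= start_epoch:
            rate = value
    return rate


# Schedule for the 22 epochs of the first-level network.
first_level_schedule = [learning_rate(e) for e in range(1, 23)]
```

The later levels, which the text trains for 18 epochs with a second drop at the 12th epoch, would use `drops=((6, 0.001), (12, 0.0001))` under the same assumptions.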
The regional nomination network is trained with a multi-task objective function, formula (1), in which N represents the number of training samples; j = 1 denotes the classification task and j = 2 denotes the bounding-box regression task; i denotes the i-th sample; αj represents the weight of task j; a per-sample indicator gives the sample type of sample xi for each task; and the loss function of each task is given by formula (2) for the classification task and formula (3) for the bounding-box regression task.
In formula (2), the ground-truth label of sample xi takes the value 0 or 1, where 0 represents a non-face and 1 represents a face, and pi represents the probability, predicted by the network, that sample xi is a face.
In formula (3), i denotes the i-th sample; the network predicts a bounding-box position increment for each candidate window, and the ground-truth bounding-box position increment is a four-dimensional real vector.
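The formulas themselves do not survive in this text. One plausible reading, assuming the MTCNN-style multi-task objective that the definitions above resemble, is sketched below. The symbols β (sample-type indicator), L (per-task loss), y (ground-truth face label), and b̂, b (predicted and ground-truth bounding-box increments) are assumed names, and the cross-entropy and squared Euclidean forms are assumptions rather than formulas confirmed by the source.

```latex
\begin{gather}
\min \sum_{i=1}^{N} \sum_{j=1}^{2} \alpha_j \, \beta_i^{j} \, L_i^{j},
  \qquad \beta_i^{j} \in \{0, 1\} \tag{1} \\
L_i^{1} = -\bigl( y_i \log p_i + (1 - y_i) \log (1 - p_i) \bigr),
  \qquad y_i \in \{0, 1\} \tag{2} \\
L_i^{2} = \bigl\lVert \hat{b}_i - b_i \bigr\rVert_2^{2},
  \qquad \hat{b}_i,\, b_i \in \mathbb{R}^{4} \tag{3}
\end{gather}
```

Under this reading, the indicator switches a task off for samples that do not carry the corresponding annotation; for instance, a non-face sample would contribute only to the classification term.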
Step 2: generate the training data D3 of the second-level network with the trained regional nomination network model, and train the second-level network. First, D1 is sent to the regional nomination network for detection to obtain face score predictions and face rectangular frame predictions; the face rectangular frames whose scores are lower than the threshold T1 (the value range of T1 is 0-1; this method takes 0.9) are eliminated, and non-maximum suppression is performed on the remaining face rectangular frames to obtain the final prediction result. The intersection over union (IoU) between each face rectangular frame in the prediction result and the ground-truth face rectangular frame of the corresponding picture in D1 is then calculated: detections with IoU > 0.85 are labeled as face samples, those with 0.55 < IoU < 0.7 as partial face samples, and those with IoU < 0.35 as non-face samples, generating the training data set D3 of the next-level network; the image resolution of D3 is 24 × 24. The generated training data set D3 is sent to the second-level network for training, for a total of 18 epochs; the initial learning rate is 0.01, it is set to 0.001 when training reaches the 6th epoch and to 0.0001 when training reaches the 12th epoch, until training ends. The loss function is the same as that of the first-level network.
Step 3: generate the training data set D4 of the third-level network with the models trained in the first two levels, and complete the training of the third-level network. D1 is sent into the network formed by the regional nomination network and the first-stage dense connection network for cascade detection, and the sub-training data set D4 is generated by the same method as in step 2; the image resolution of D4 is 48 × 48. Finally, D4 is used to train the second-stage dense connection network (the third-level network) for 18 epochs; the initial learning rate is 0.01, it is set to 0.001 when training reaches the 6th epoch and to 0.0001 when training reaches the 12th epoch, until training ends. The loss function is the same as that of the first-level network.
In this embodiment, in the testing stage, as shown in fig. 1b, a new picture is subjected to pyramid scale transformation with a scaling ratio of 0.709, and the transformed pictures are input into the first-level regional nomination network model, which generates a large number of face classification scores and face rectangular frame regression vectors; the face rectangular frames whose scores are lower than the threshold T1 (the value range of T1 is 0-1; this method takes 0.9) are eliminated, and non-maximum suppression is performed on the remaining face rectangular frames to obtain the first-level prediction result; this prediction result is then input into the second-level network model, the face rectangular frames whose scores are lower than the threshold T2 (the value range of T2 is 0-1; this method takes 0.7) are likewise eliminated, and the non-maximum suppression algorithm is used to filter out heavily overlapping face rectangular frames; finally, the prediction result is input into the third-level network model, which outputs face classification scores and face rectangular frame information; the face rectangular frames whose scores are lower than the threshold T3 (the value range of T3 is 0-1; this method takes 0.8) are eliminated, and the non-maximum suppression algorithm is used to filter out heavily overlapping face rectangular frames to obtain the final prediction result.
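The testing-stage flow through the three levels can be sketched as a loop over stages, each followed by score filtering and non-maximum suppression. The thresholds 0.9, 0.7, and 0.8 come from the text; the stage interface (a callable that scores and refines a list of boxes), the pluggable `nms` argument, and all names are assumptions standing in for the trained network models.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]
Detection = Tuple[Box, float]  # (face rectangular frame, face score)
Stage = Callable[[List[Box]], List[Detection]]


def run_cascade(windows: List[Box],
                stages: Sequence[Stage],
                thresholds: Sequence[float] = (0.9, 0.7, 0.8),
                nms: Callable[[List[Detection]], List[Detection]] = lambda d: d,
                ) -> List[Detection]:
    """Pass candidates through each level, filtering after every level.

    The default nms is a no-op placeholder; a real suppression routine
    would be supplied by the caller.
    """
    boxes = list(windows)
    detections: List[Detection] = []
    for stage, threshold in zip(stages, thresholds):
        scored = stage(boxes)  # hypothetical model call: refine and score boxes
        detections = nms([(b, s) for b, s in scored if s >= threshold])
        boxes = [b for b, _ in detections]  # survivors feed the next level
    return detections
```

Each level sees only the survivors of the previous one, so the more expensive later networks process far fewer candidates than the first level.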
Claims (7)
1. A face detection method based on a multi-scale cascade dense connection neural network is characterized by comprising a construction method of a regional nomination network and a construction method of a multi-level dense connection convolution network model:
the construction method of the regional nomination network comprises the following steps: performing score prediction and bounding-box prediction for regions that may contain a face on a plurality of convolution layers of the regional nomination network; then eliminating the region blocks whose scores are smaller than a set threshold, and performing non-maximum suppression on the remaining region blocks to obtain the final regions that may contain a face; finally, sending the predicted face regions into a second-level dense connection convolution network;
the construction method of the multilevel dense connection convolution network model comprises the following steps: successively extracting more abstract face features with the convolution layers, while connecting the features extracted by the lower convolution layers with the features extracted by the higher convolution layers; then attaching a global average pooling layer after the last convolution layer, and performing fine classification and bounding-box regression on the face regions predicted by the previous level; finally, sending the remaining face regions to a third-level dense connection convolution network for finer classification and bounding-box regression, thereby predicting the final face regions;
the method specifically comprises the following steps:
step (1), collecting face pictures annotated with face rectangular frame information to form an initial training data set D1, and using D1 to generate a sub-training data set D2 that conforms to the input format of the first-level network;
step (2), designing a regional nomination network model capable of extracting more high-quality candidate regions, training the regional nomination network model with the sub-training data set D2, then sending the initial training data set D1 into the regional nomination network model for detection, and generating the training data D3 of the next level from the detection results; specifically: extracting more high-quality candidate regions containing faces by using different convolution layers of the regional nomination network, preventing missed detections caused by too few extracted candidate regions; connecting a classification layer and a regression layer to each of the last two convolution layers of the regional nomination network to predict the face region score and perform bounding-box regression; finally eliminating the candidate boxes whose scores are lower than the threshold T1, and performing non-maximum suppression on the remaining candidate boxes to obtain the final prediction result; then training the regional nomination network with the preprocessed data set D2, and after training, inputting D1 into the regional nomination network for detection and calculating the intersection over union between each face rectangular frame in the detection results and the ground-truth face rectangular frame of the corresponding picture in D1, wherein detections with an intersection over union > 0.85 are labeled as face samples, those with 0.55 < intersection over union < 0.7 as partial face samples, and those with an intersection over union < 0.35 as non-face samples, generating the training data set D3 of the next-level network, the image resolution of D3 being 24 × 24;
step (3), designing a cascaded dense connection network with stronger feature extraction capability and generalization capability, sending D3 into the first-stage dense connection network to train and generate a sub-network model, then sending D1 into the network formed by the regional nomination network and the first-stage dense connection network for detection, generating the training data set D4 of the next-stage dense connection network from the detection results, and using D4 to train the second-stage dense connection network;
and (4) detecting the multi-pose human face in the picture to be tested by using the network model obtained by training.
2. The face detection method based on the multi-scale cascade dense connection neural network as claimed in claim 1, characterized in that more high-quality candidate regions containing faces are extracted by using different convolution layers of the regional nomination network, thereby preventing missed detections caused by too few extracted candidate regions; a classification layer and a regression layer are connected to each of the last two convolution layers of the regional nomination network to predict the face region score and perform bounding-box regression; finally, the candidate boxes whose scores are lower than the threshold T1 are eliminated, and non-maximum suppression is performed on the remaining candidate boxes to obtain the final prediction result; the value range of T1 is 0-1.
3. The face detection method based on the multi-scale cascaded densely-connected neural network as claimed in claim 1, characterized in that a global average pooling layer is introduced to replace a traditional full-connected layer for face classification and regression; the global average pooling layer is accessed after the last convolution layer of each level of dense connection network, the overall average value of each feature image output by the previous layer of convolution network is calculated, the local information of the human face is fully learned, and overfitting caused by introducing space structure information is avoided; and finally, after the average pooling layer, a softmax layer is accessed to classify and regress the face region predicted by the previous stage.
4. The face detection method based on the multi-scale cascaded dense connection neural network as claimed in claim 3, characterized in that a cascaded convolutional dense connection network is constructed to extract face features and perform fine classification and regression; each level of dense connection network may contain a plurality of dense connection blocks, each dense connection block consists of a plurality of convolution layers, and the convolution layers of the same dense connection block must generate feature maps of the same size; within a dense connection block, the input of each convolution layer is the concatenation of the feature maps generated by all preceding convolution layers; two adjacent dense connection blocks are connected by a transition layer, which includes a convolution layer and a pooling layer; the second-level network and the third-level network consist of dense connection convolution networks comprising two and three dense connection blocks respectively, and progressively filter the face regions predicted by the first level and refine their positions.
5. The face detection method based on the multi-scale cascaded dense-connected neural network as claimed in claim 1, wherein the step (1) specifically comprises: preprocessing the face data set D1 into a sub-training data set D2 that conforms to the input format of the first-level network in the cascade network, with a resolution of 12 × 12; the sub-training data set D2 contains three types of training pictures: face images, partial face images, and non-face images; label information for the three types of pictures is produced as follows: a face image is labeled 1, a partial face image is labeled -1, and a non-face image is labeled 0; face and partial face images are additionally labeled with face rectangular frame information, and the face rectangular frame information of a non-face image is labeled -1.
6. The face detection method based on the multi-scale cascaded densely-connected neural network as claimed in claim 1, wherein the step (3) specifically comprises: constructing a cascaded convolutional dense connection network to extract face features and perform fine classification and regression, wherein each level of dense connection network has a plurality of dense connection blocks, each dense connection block is constructed from a plurality of convolution layers, and the convolution layers of the same dense connection block must generate feature maps of the same size; within a dense connection block, the input of each convolution layer is the concatenation of the feature maps generated by all preceding convolution layers; two adjacent dense connection blocks are connected by a transition layer; the last convolution layer of the dense connection network is connected to a global average pooling layer, which computes the overall average of each feature map output by the preceding convolution layer, the number of feature maps being equal to the number of classification categories; D3 is used to train the first-level dense connection network; after training is completed, D1 is sent into a network consisting of the regional nomination network and the first-level dense connection network for cascade detection, and a sub-training data set D4 is then generated; the image resolution of D4 is 48 × 48, and it is generated in the same way as in step (2); finally, D4 is used to train the second-level dense connection network.
7. The face detection method based on the multi-scale cascaded densely-connected neural network as claimed in claim 1, wherein the step (4) specifically comprises: cascading the regional nomination network and the two levels of dense connection network to form a three-level cascaded network as a whole; then performing pyramid scale transformation on a new picture with a scaling ratio of 0.709, inputting the transformed pictures into the first-level regional nomination network model to generate a large number of face classification scores and face rectangular frame regression vectors, eliminating face rectangular frames whose scores are lower than a threshold T1, and applying non-maximum suppression to the remaining face rectangular frames to obtain a prediction result; then inputting the prediction result into the second-level network model, again eliminating face rectangular frames whose scores are lower than a threshold T2 and screening with the non-maximum suppression algorithm; finally, inputting the prediction result into the third-level network model, outputting face rectangular frame scores and face rectangular frame information, eliminating face rectangular frames whose scores are lower than a threshold T3, and screening with the non-maximum suppression algorithm to obtain the final prediction result; T1, T2, and T3 each take a value in the range 0 to 1.
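The score thresholding and non-maximum suppression applied at each level in claims 2 and 7 can be illustrated with a minimal sketch. It is not the patented implementation: the `(x1, y1, x2, y2)` frame layout, the function names `iou` and `filter_and_nms`, and the IoU threshold of 0.5 are assumptions made for illustration; the claims only state that frames scoring below a threshold T in the range 0 to 1 are eliminated before non-maximum suppression.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # assumed layout: (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned rectangular frames."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_and_nms(
    boxes: Sequence[Box],
    scores: Sequence[float],
    score_threshold: float,      # plays the role of T1, T2 or T3
    iou_threshold: float = 0.5,  # assumed value, not given in the claims
) -> List[int]:
    """Drop frames scoring below the threshold, then run greedy NMS.

    Returns the indices of the surviving frames, highest score first.
    """
    candidates = [i for i, s in enumerate(scores) if s >= score_threshold]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in candidates:
        # Keep a frame only if it does not overlap too much with a better one.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept
```

In a three-level cascade, the same routine would be called once per level with that level's threshold, and the surviving frames would be cropped and passed to the next level.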
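The dense connection and global average pooling described in claims 3, 4 and 6 can be sketched with plain Python lists. This is a toy illustration under stated assumptions: `toy_conv` is a hypothetical stand-in for a real convolution layer (it only averages its inputs), and including the block input in the concatenation follows the usual DenseNet convention, which the claims do not spell out.

```python
from typing import Callable, List

FeatureMap = List[List[float]]  # one H x W channel
Layer = Callable[[List[FeatureMap]], List[FeatureMap]]


def toy_conv(growth: int) -> Layer:
    """Hypothetical stand-in for a convolution layer producing `growth` maps.

    Each output map has the same H x W size as the inputs, as the claims require.
    """
    def layer(features: List[FeatureMap]) -> List[FeatureMap]:
        h, w = len(features[0]), len(features[0][0])
        mean_map = [
            [sum(f[y][x] for f in features) / len(features) for x in range(w)]
            for y in range(h)
        ]
        return [[[v * (k + 1) for v in row] for row in mean_map] for k in range(growth)]
    return layer


def dense_block(inputs: List[FeatureMap], layers: List[Layer]) -> List[FeatureMap]:
    """Each layer receives the concatenation of all feature maps produced so far."""
    features = list(inputs)
    for layer in layers:
        features.extend(layer(features))  # channel-wise concatenation
    return features


def global_average_pool(features: List[FeatureMap]) -> List[float]:
    """Reduce every feature map to its overall mean value."""
    return [sum(sum(row) for row in f) / (len(f) * len(f[0])) for f in features]
```

With one 2 × 2 input channel and two layers of growth 2, the block outputs 1 + 2 + 2 = 5 channels, and pooling yields one number per channel, which is why the claims tie the number of final feature maps to the number of classification categories.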
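The labeling scheme of claim 5 (face = 1, partial face = -1, non-face = 0, with the frame information of non-face images set to -1) can be written down as a small record type. The `Sample` class, the `make_sample` helper, the `kind` strings and the four-component frame are hypothetical names and layouts chosen for this sketch; the claim does not specify how the frame information is encoded.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # assumed four-component frame

# Class labels as stated in claim 5.
LABELS = {"face": 1, "partial_face": -1, "non_face": 0}
NO_BOX: Box = (-1.0, -1.0, -1.0, -1.0)  # non-face frame information is marked -1


@dataclass(frozen=True)
class Sample:
    image_path: str
    label: int
    box: Box


def make_sample(image_path: str, kind: str, box: Optional[Box] = None) -> Sample:
    """Build one training record for the 12 x 12 sub-training data set."""
    if kind not in LABELS:
        raise ValueError(f"unknown sample kind: {kind!r}")
    if kind == "non_face":
        return Sample(image_path, LABELS[kind], NO_BOX)
    if box is None:
        raise ValueError("face and partial-face samples need frame information")
    return Sample(image_path, LABELS[kind], box)
```

Keeping the non-face frame at a sentinel value lets a training loop skip the regression loss for those samples while still using them for classification; that use of the sentinel is an interpretation, not something the claim states.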
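The pyramid scale transformation in claim 7 repeatedly shrinks the picture by the ratio 0.709. A minimal sketch of how the list of scales might be computed follows. The stopping rule is an assumption: the claims give the ratio and the 12 × 12 first-level input resolution, but not when to stop, so this sketch stops once the shorter side would fall below 12 pixels. The function name `pyramid_scales` is likewise hypothetical.

```python
from typing import List


def pyramid_scales(
    height: int,
    width: int,
    factor: float = 0.709,  # scaling ratio given in claim 7
    min_side: int = 12,     # assumed stop: first-level input is 12 x 12
) -> List[float]:
    """Return the scale factors 1, factor, factor**2, ... for an image pyramid."""
    if not 0.0 < factor < 1.0:
        raise ValueError("factor must lie strictly between 0 and 1")
    scales: List[float] = []
    scale = 1.0
    # Keep shrinking while the shorter side still fits a min_side window.
    while min(height, width) * scale >= min_side:
        scales.append(scale)
        scale *= factor
    return scales
```

Each returned scale would be used to resize the picture before it is passed to the first-level network, so that a fixed-size window can respond to faces of different sizes.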
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810605067.6A CN108875624B (en) | 2018-06-13 | 2018-06-13 | Face detection method based on multi-scale cascade dense connection neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108875624A (en) | 2018-11-23 |
CN108875624B (en) | 2022-03-25 |
Family
ID=64338103
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810605067.6A Active CN108875624B (en) | 2018-06-13 | 2018-06-13 | Face detection method based on multi-scale cascade dense connection neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108875624B (en) |
Families Citing this family (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109543648B (en) * | 2018-11-30 | 2022-06-17 | 公安部交通管理科学研究所 | Method for extracting face in car passing picture |
CN109886286B (en) * | 2019-01-03 | 2021-07-23 | 武汉精测电子集团股份有限公司 | Target detection method based on cascade detector, target detection model and system |
CN110059584B (en) * | 2019-03-28 | 2023-06-02 | 中山大学 | Event naming method combining boundary distribution and correction |
CN110163102A (en) * | 2019-04-18 | 2019-08-23 | 麦克奥迪(厦门)医疗诊断系统有限公司 | A kind of cervical cell image classification recognition methods based on convolutional neural networks |
CN110335244A (en) * | 2019-05-17 | 2019-10-15 | 杭州数据点金科技有限公司 | A kind of tire X-ray defect detection method based on more Iterative classification devices |
CN111986278B (en) | 2019-05-22 | 2024-02-06 | 富士通株式会社 | Image encoding device, probability model generating device, and image compression system |
CN112001205B (en) * | 2019-05-27 | 2023-10-31 | 北京君正集成电路股份有限公司 | Network model sample acquisition method for secondary face detection |
CN111027382B (en) * | 2019-11-06 | 2023-06-23 | 华中师范大学 | Attention mechanism-based lightweight face detection method and model |
CN110866484B (en) * | 2019-11-11 | 2022-09-09 | 珠海全志科技股份有限公司 | Driver face detection method, computer device and computer readable storage medium |
CN111080576B (en) * | 2019-11-26 | 2023-09-26 | 京东科技信息技术有限公司 | Key point detection method and device and storage medium |
CN110859624A (en) * | 2019-12-11 | 2020-03-06 | 北京航空航天大学 | Brain age deep learning prediction system based on structural magnetic resonance image |
CN113051960A (en) * | 2019-12-26 | 2021-06-29 | 深圳市光鉴科技有限公司 | Depth map face detection method, system, device and storage medium |
CN111274886B (en) * | 2020-01-13 | 2023-09-19 | 天地伟业技术有限公司 | Deep learning-based pedestrian red light running illegal behavior analysis method and system |
CN111368707B (en) * | 2020-03-02 | 2023-04-07 | 佛山科学技术学院 | Face detection method, system, device and medium based on feature pyramid and dense block |
CN111428661A (en) * | 2020-03-28 | 2020-07-17 | 北京工业大学 | Method for processing face image based on intelligent human-computer interaction |
CN113569991B (en) * | 2021-08-26 | 2024-05-28 | 深圳市捷顺科技实业股份有限公司 | Person evidence comparison model training method, computer equipment and computer storage medium |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107247956B (en) * | 2016-10-09 | 2020-03-27 | 成都快眼科技有限公司 | Rapid target detection method based on grid judgment |
CN107748858A (en) * | 2017-06-15 | 2018-03-02 | 华南理工大学 | A kind of multi-pose eye locating method based on concatenated convolutional neutral net |
CN107688786A (en) * | 2017-08-30 | 2018-02-13 | 南京理工大学 | A kind of method for detecting human face based on concatenated convolutional neutral net |
CN108090918A (en) * | 2018-02-12 | 2018-05-29 | 天津天地伟业信息系统集成有限公司 | A kind of Real-time Human Face Tracking based on the twin network of the full convolution of depth |
Also Published As
Publication number | Publication date |
---|---|
CN108875624A (en) | 2018-11-23 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||