CN111368707B - Face detection method, system, device and medium based on feature pyramid and dense block - Google Patents


Info

Publication number: CN111368707B
Application number: CN202010134064.6A
Authority: CN (China)
Prior art keywords: dense block, face detection, face, image, dense
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Other versions: CN111368707A
Inventors: 曾凡智, 邹磊, 周燕
Current assignee: Foshan University (the listed assignees may be inaccurate; Google has not performed a legal analysis)
Original assignee: Foshan University
Application filed by Foshan University; priority to CN202010134064.6A; publication of application CN111368707A; application granted; publication of grant CN111368707B


Classifications

    • G06V40/161: Human faces; detection, localisation, normalisation
    • G06F18/253: Pattern recognition; fusion techniques of extracted features
    • G06N3/045: Neural networks; combinations of networks
    • G06N3/08: Neural networks; learning methods
    • G06V40/168: Human faces; feature extraction, face representation
    • G06V40/172: Human faces; classification, e.g. identification
    • Y02D10/00: Energy efficient computing


Abstract

The invention discloses a face detection method, system, device and medium based on a feature pyramid and dense blocks. The method comprises the following steps: constructing a face detection network from dense blocks, pooling layers and feature-pyramid-based feature fusion modules; acquiring a face image training set; training the face detection network with the face image training set; and performing face detection on an image to be detected with the trained face detection network to obtain a face detection result. By using dense blocks, which have few parameters and heavily reuse features, together with the top-down feature fusion of a feature pyramid, the invention can quickly and efficiently detect faces of different scales in an image.

Description

Face detection method, system, device and medium based on feature pyramid and dense block
Technical Field
The invention relates to a face detection method, system, device and medium based on a feature pyramid and dense blocks, and belongs to the fields of deep learning and image processing.
Background
Face detection technology is a key link in automatic face recognition systems: given any image, a certain strategy is used to search it and determine whether it contains a face; if it does, the position, size and pose of the face are returned.
Early face detection algorithms were mostly based on hand-crafted features (e.g., image texture); a representative example is the Viola-Jones detector, which combines cascaded features with AdaBoost learning. Many scholars subsequently proposed face detection algorithms that run in real time, introducing new local features, new acceleration algorithms and new cascade architectures. Besides cascade-based methods, some researchers added a Deformable Part Model (DPM) to face detection algorithms and achieved better detection results.
In recent years, with the continuous progress of deep learning, face detection has developed further, and more and more face detection algorithms incorporate deep learning. Farfade et al. fine-tuned a convolutional neural network trained on the 1000 classes of ImageNet and used it for classifying faces and non-faces. Faceness trains a series of small networks and cascades them to detect partially occluded faces. CascadeCNN builds a cascade structure on top of convolutional neural networks and achieves very good results. UnitBox introduced a new intersection-over-union (IoU) loss function, markedly improving the separation of faces from non-faces.
However, most of the existing face detection algorithms have the following problems:
1) Detection takes too long. Most existing face detection algorithms are run in combination with the image pyramid technique in order to detect faces of different scales in an image. The image pyramid scales an image into several images of different sizes so that objects (here, faces) of different sizes can be detected; with only a single-scale image, many smaller faces may be missed. However, combining face detection with an image pyramid is time-consuming to run: a single image may need to be processed ten or more times to cover the different face scales.
2) The network has many parameters and the model is large. Many current face detection networks are designed to be very deep (with many layers). The advantage is that more features can be extracted and faces can be better distinguished from non-faces; the drawbacks are a large parameter count and long computation time.
Disclosure of Invention
In view of this, the present invention provides a face detection method, system, device and medium based on a feature pyramid and dense blocks, which can quickly and efficiently detect faces of different scales in an image by using dense blocks, with their low parameter count and heavy feature reuse, together with the top-down feature fusion of a feature pyramid.
The first object of the present invention is to provide a face detection method based on a feature pyramid and dense blocks.
The second object of the present invention is to provide a face detection system based on a feature pyramid and dense blocks.
The third object of the invention is to provide a computer device.
The fourth object of the invention is to provide a storage medium.
The first purpose of the invention can be achieved by adopting the following technical scheme:
a face detection method based on a feature pyramid and a dense block, the method comprising:
constructing a face detection network according to the dense block, the pooling layer and the feature fusion module based on the feature pyramid;
acquiring a face image training set;
training a face detection network by using a face image training set;
and performing face detection on the image to be detected by using the trained face detection network to obtain a face detection result.
Further, training the face detection network by using the face image training set specifically includes:
dividing a face image training set into a plurality of batches of face images;
and setting a plurality of periods, and sequentially inputting the face images of each batch into the face detection network for training in each period so as to finish the training of the face detection network in the plurality of periods.
Furthermore, in the face detection network, eight dense blocks are provided, two pooling layers are provided, two feature fusion modules are provided, the eight dense blocks are respectively a first dense block, a second dense block, a third dense block, a fourth dense block, a fifth dense block, a sixth dense block, a seventh dense block and an eighth dense block, the two pooling layers are respectively a first pooling layer and a second pooling layer, and the feature fusion modules are respectively a first feature fusion module and a second feature fusion module;
the first dense block, the second dense block, the first pooling layer, the third dense block, the fourth dense block, the fifth dense block, the second pooling layer, the sixth dense block, the seventh dense block and the eighth dense block are connected in sequence;
the input of the first feature fusion module is respectively connected with the output of the fifth dense block and the output of the eighth dense block;
and the input of the second characteristic fusion module is respectively connected with the output of the second dense block and the output of the first characteristic fusion module.
Further, performing face detection on the image to be detected using the trained face detection network to obtain a face detection result specifically includes:
inputting the image to be detected into the trained face detection network, processing it in sequence through the first dense block and the second dense block, and outputting a first image feature;
inputting the first image feature into the first pooling layer for down-sampling, processing the down-sampled result in sequence through the third, fourth and fifth dense blocks, and outputting a second image feature;
inputting the second image feature into the second pooling layer for down-sampling, processing the down-sampled result in sequence through the sixth, seventh and eighth dense blocks, and outputting a third image feature;
performing feature fusion on the second image feature and the third image feature using the first fusion module, and outputting a fourth image feature;
and performing feature fusion on the first image feature and the fourth image feature using the second fusion module, and outputting the face detection result.
Furthermore, each dense block comprises a plurality of elements of the same type; the elements are connected in sequence and joined into a whole through skip connections;
each element comprises, connected in sequence, a 1 × 1 convolution layer, a first Swish function layer, a first batch normalization layer, a 3 × 3 convolution layer, a second Swish function layer and a second batch normalization layer, wherein the 1 × 1 convolution layer compresses and reduces the dimension of the previous layer's output.
Further, the feature pyramid is preprocessed before feature fusion, and the preprocessing includes 1 × 1 convolution and 2 × upsampling.
Further, the loss function for training the face detection network is:

L(p, u, t^u, v) = α·L_cls(p, u) + L_loc(t^u, v)

wherein L(p, u, t^u, v) represents the overall loss value of the face detection network; L_cls represents the classification loss value, L_cls(p, u) = −log(p_u); L_loc represents the regression-box position loss value,

L_loc(t^u, v) = Σ_{i ∈ {x, y, w, h}} smooth_L1(t_i^u − v_i), where smooth_L1(x) = 0.5x² if |x| < 1, and |x| − 0.5 otherwise;

p = (p_0, …, p_k) represents the probability value the face detection network predicts for each category of the detected object, and k indicates that the classified objects are face and non-face; u represents a face; t^u = (t_x^u, t_y^u, t_w^u, t_h^u) represents the position at which the face detection network detected a face, where t_x^u and t_y^u are the x and y values of the upper-left corner of the face frame containing the detected object, and t_w^u and t_h^u are the width and height of the regression frame; v = (v_x, v_y, v_w, v_h) represents the position of the manually labelled face, where v_x and v_y are the x and y values of the upper-left corner of the labelled face frame, and v_w and v_h are its width and height; α is a weight.
The second purpose of the invention can be achieved by adopting the following technical scheme:
a feature pyramid and dense block based face detection system, the system comprising:
the construction unit is used for constructing a face detection network according to the dense block, the pooling layer and the feature fusion module based on the feature pyramid;
the acquisition unit is used for acquiring a face image training set;
the training unit is used for training the face detection network by using a face image training set;
and the detection unit is used for carrying out face detection on the image to be detected by utilizing the trained face detection network to obtain a face detection result.
The third purpose of the invention can be achieved by adopting the following technical scheme:
A computer device comprising a processor and a memory for storing a processor-executable program, wherein the processor, when executing the program stored in the memory, implements the face detection method described above.
The fourth purpose of the invention can be achieved by adopting the following technical scheme:
a storage medium stores a program that, when executed by a processor, implements the above-described super-resolution image reconstruction method.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention has few parameters and stable training. It uses dense blocks as the main module of the face detection network and exploits their heavy feature reuse, achieving high-quality feature extraction with few network layers while avoiding the vanishing-gradient phenomenon during training. In addition, face detection time is short: the image pyramid technique used in traditional face detection algorithms is replaced by the feature pyramid technique, so the algorithm detects faster without loss of detection quality and can meet the needs of application scenarios that do not require detecting very small faces.
2. The trained face detection network is small, making it a lighter and faster face detection algorithm. It can be ported to many small embedded devices or devices with limited memory, and it can also be built into algorithms such as face recognition, facial expression recognition and age recognition, acquiring faces from images more quickly without affecting those methods' running time, giving it broad application value.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the structures shown in the drawings without creative efforts.
Fig. 1 is a flowchart of a face detection method based on a feature pyramid and a dense block in embodiment 1 of the present invention.
Fig. 2 is a general architecture diagram of a face detection network according to embodiment 1 of the present invention.
Fig. 3 is a structural view of a dense block composition in example 1 of the present invention.
Fig. 4 is a structural diagram of a feature pyramid in embodiment 1 of the present invention.
Fig. 5 is a flowchart of training a face detection network by using a face image training set according to embodiment 1 of the present invention.
Fig. 6 is a flowchart of performing face detection on an image to be detected by using a trained face detection network in embodiment 1 of the present invention.
Fig. 7 is a block diagram of a face detection system based on a feature pyramid and a dense block according to embodiment 2 of the present invention.
Fig. 8 is a block diagram of a training unit according to embodiment 2 of the present invention.
Fig. 9 is a block diagram of a computer device according to embodiment 3 of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer and more complete, the technical solutions in the embodiments of the present invention will be described below with reference to the drawings in the embodiments of the present invention, it is obvious that the described embodiments are some, but not all, embodiments of the present invention, and all other embodiments obtained by a person of ordinary skill in the art without creative efforts based on the embodiments of the present invention belong to the protection scope of the present invention.
Example 1:
as shown in fig. 1, the present embodiment provides a face detection method based on a feature pyramid and a dense block, which includes the following steps:
s101, constructing a face detection network according to the dense block, the pooling layer and the feature fusion module based on the feature pyramid.
The general architecture of the face detection network is shown in fig. 2, wherein eight dense blocks are provided, two pooling layers are provided, two feature fusion modules are provided, the eight dense blocks are respectively a first dense block, a second dense block, a third dense block, a fourth dense block, a fifth dense block, a sixth dense block, a seventh dense block and an eighth dense block, the two pooling layers are respectively a first pooling layer and a second pooling layer, and the feature fusion modules are respectively a first feature fusion module and a second feature fusion module.
As can be seen from fig. 2, the first dense block, the second dense block, the first pooling layer, the third dense block, the fourth dense block, the fifth dense block, the second pooling layer, the sixth dense block, the seventh dense block, and the eighth dense block are sequentially connected; the input of the first characteristic fusion module is respectively connected with the output of the fifth dense block and the output of the eighth dense block; the input of the second feature fusion module is connected with the output of the second dense block and the output of the first feature fusion module respectively.
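The connectivity above implies a simple size bookkeeping: dense blocks preserve spatial size, and only the two pooling layers halve it. A minimal sketch (the 640 × 640 input size is an assumption for illustration; the patent does not fix an input resolution):

```python
# Sketch of how the spatial size of the feature maps evolves through the
# network of fig. 2: dense blocks keep the size, each pooling layer halves it.

def feature_map_sizes(input_size):
    first = input_size    # first image feature: after dense blocks 1-2
    second = first // 2   # second image feature: after pooling 1 + dense blocks 3-5
    third = second // 2   # third image feature: after pooling 2 + dense blocks 6-8
    return first, second, third

print(feature_map_sizes(640))  # -> (640, 320, 160)
```

The first fusion module then combines the two smallest maps, and the second fusion module combines the result with the largest map, which is why the upsampling step described later is needed.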
Further, each dense block has the structure shown in fig. 3 and comprises a plurality of elements of the same type connected in sequence. Each element comprises, connected in sequence, a 1 × 1 convolution layer, a first Swish function layer, a first batch normalization layer, a 3 × 3 convolution layer, a second Swish function layer and a second batch normalization layer. The 1 × 1 convolution layer compresses and reduces the dimension of the previous layer's output to cut the amount of computation. The Swish activation function replaces the ReLU function of traditional networks, better strengthening the model's nonlinear expressive power and its ability to cope with complex environments. The batch normalization layer (BN) normalizes each layer's input so that its distribution stays stable, which speeds up training. The 3 × 3 convolution layers extract image features locally and keep the parameter count low while ensuring good performance. Finally, all elements in a dense block are joined into a whole through skip connections, so the features produced by every element are used to the greatest extent and gradient dispersion during training of the face detection network is avoided.
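To make the "low parameter count" claim concrete, the parameters of one such element can be counted directly (the channel widths 256, 64 and 32 are assumptions for illustration; the patent does not specify them):

```python
# Illustrative parameter count for one dense-block element: a 1x1
# "compression" convolution followed by a 3x3 convolution, with a
# batch-normalization layer (2 parameters per channel) after each.

def element_params(in_ch, bottleneck_ch, out_ch):
    conv1x1 = in_ch * bottleneck_ch           # 1x1 conv weights
    bn1 = 2 * bottleneck_ch                   # BN scale + shift per channel
    conv3x3 = bottleneck_ch * out_ch * 3 * 3  # 3x3 conv weights
    bn2 = 2 * out_ch
    return conv1x1 + bn1 + conv3x3 + bn2

# The 1x1 compression keeps the 3x3 conv cheap: compare with a direct
# 3x3 conv on all 256 input channels, 256 * 32 * 9 = 73728 weights.
print(element_params(256, 64, 32))  # -> 35008
```

This is why the 1 × 1 bottleneck matters: as skip connections grow the input channel count, the 3 × 3 convolution still only sees the compressed representation.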
Because the Rectified Linear Unit (ReLU) is simple to compute, efficient and fast to respond, it is used in many deep learning algorithms. Its advantage, however, lies only in forward propagation: because the ReLU function discards all negative values, it can easily drive the model's outputs to all zeros, after which training can no longer proceed.
Based on the above, this embodiment uses the Swish function as the activation function; its mathematical form is shown in formula (1). Compared with the ReLU function, Swish pushes the mean output of an activation unit toward 0, achieving an effect similar to batch normalization and reducing computation: an output mean close to 0 reduces the shift effect and keeps gradients close to their natural state.

f(x) = x·σ(x) = x / (1 + e^(−x))    (1)
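Since the rendered formula (1) appears only as an image placeholder in this text, here is the commonly used form of Swish, f(x) = x·sigmoid(x), as a runnable sketch (the β = 1 variant is an assumption):

```python
import math

# Swish activation: f(x) = x * sigmoid(x) = x / (1 + e^(-x)).
def swish(x):
    return x / (1.0 + math.exp(-x))

# Unlike ReLU, Swish keeps a small non-zero response for negative inputs,
# which helps avoid the all-zero-output failure mode described above.
print(swish(-1.0))
```

For large positive inputs Swish behaves like the identity, and for large negative inputs it decays toward zero without being exactly zero, which is what keeps gradients flowing.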
Further, the face detection network contains pooling operations (i.e., downsampling): as the input image is convolved, pooling reduces its size in stages, which matches the scaling of an image pyramid. This embodiment therefore uses the feature pyramid technique to take the feature map output before each size change and fuse coarse and fine features, so that both large faces and small faces can be detected. This essentially achieves the effect of the image pyramid technique with a shorter overall processing time; the specific structure is shown in fig. 4. As can be seen from fig. 4, there are two main kinds of preprocessing: 1 × 1 convolution and 2 × upsampling (upsampling by a factor of 2). The 1 × 1 convolution adjusts the number of image channels: the channel count changes through the pooling stages, and two feature maps must have the same number of channels before they can be fused. The 2 × upsampling enlarges the feature map: sizes shrink during pooling, and two feature maps must have the same size before they can be fused.
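The 2 × upsampling step can be sketched in a few lines. Nearest-neighbour interpolation is an assumption here: the text only says "2 × upsampling" without naming the interpolation method.

```python
# Minimal sketch of 2x upsampling of a 2D feature map (list of lists),
# nearest-neighbour style: every value becomes a 2x2 block.

def upsample2x(feature):
    out = []
    for row in feature:
        wide = [v for v in row for _ in range(2)]  # duplicate each column
        out.append(wide)
        out.append(list(wide))                     # duplicate each row
    return out

small = [[1, 2],
         [3, 4]]
print(upsample2x(small))
# -> [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```

After this step the coarse map has the same height and width as the finer map one pooling stage up, so the two can be fused element-wise.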
And S102, acquiring a face image training set.
And S103, training the face detection network by using the face image training set.
This embodiment trains the face detection network on the public face data set CASIA-WebFace, using 400,000 images as the face image training set and 10,000 images as the face image test set for subsequent testing.
Further, as shown in fig. 5, the step S103 specifically includes:
and S1031, dividing the face image training set into a plurality of batches of face images.
In this embodiment, the face image training set is divided into 12500 batches of face images, that is, 32 face images in each batch.
S1032, setting a plurality of periods, and sequentially inputting the face images of each batch into the face detection network for training in each period so as to finish the training of the face detection network in the plurality of periods.
In this embodiment, 400 periods (epochs) are set and the Adam optimizer is used. In each period, training proceeds batch by batch with 32 face images per batch; weight decay is set to 0.0001, the initial learning rate to 0.001, and the learning rate is reduced by 90% every 100 periods.
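The learning-rate schedule just described (initial rate 0.001, multiplied by 0.1 every 100 periods of the 400-period run) can be sketched as:

```python
# Step-decay learning-rate schedule: a 90% reduction means multiplying
# the current rate by 0.1 at each 100-period boundary.

def learning_rate(epoch, initial=0.001, drop=0.1, every=100):
    return initial * (drop ** (epoch // every))

print([learning_rate(e) for e in (0, 100, 200, 300)])
```

So the rate stays at 0.001 for periods 0-99, drops to 0.0001 for periods 100-199, and so on down to 1e-6 for the final hundred periods.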
Face detection is essentially a combination of image classification (here, into face and non-face) and regression-box (a rectangular box containing the whole face) localization. If image classification and regression-box localization were trained independently, the algorithm could not obtain good face detection capability. The loss function used by the invention is therefore a multi-task loss: image classification and regression-box localization are trained within the same loss function, allowing joint tuning of the whole network. The specific expression is given in formula (2):
L(p, u, t^u, v) = α·L_cls(p, u) + L_loc(t^u, v)    (2)

wherein L(p, u, t^u, v) represents the overall loss value of the face detection network; L_cls represents the classification loss value, L_cls(p, u) = −log(p_u); L_loc represents the regression-box position loss value,

L_loc(t^u, v) = Σ_{i ∈ {x, y, w, h}} smooth_L1(t_i^u − v_i), where smooth_L1(x) = 0.5x² if |x| < 1, and |x| − 0.5 otherwise;

p = (p_0, …, p_k) represents the probability value the face detection network predicts for each category of the detected object, and k indicates that the classified objects are face and non-face (k is 2 in this embodiment); u represents the manually labelled category, which in this embodiment is a face; t^u = (t_x^u, t_y^u, t_w^u, t_h^u) represents the position at which the face detection network detected a face, where t_x^u and t_y^u are the x and y values of the upper-left corner of the face frame containing the detected object, and t_w^u and t_h^u are the width and height of the regression frame; v = (v_x, v_y, v_w, v_h) represents the position of the manually labelled face, where v_x and v_y are the x and y values of the upper-left corner of the manually labelled face frame, and v_w and v_h are its width and height; α is a weight, set to 2 in this embodiment.
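As a runnable sketch of this multi-task loss: the classification term and the α = 2 weight follow the description above, while the smooth-L1 form of the box-regression term is an assumption (the patent's rendered equations appear only as image placeholders in this text).

```python
import math

# Multi-task detection loss: alpha * classification loss + box position loss.

def smooth_l1(x):
    return 0.5 * x * x if abs(x) < 1.0 else abs(x) - 0.5

def detection_loss(p_u, t, v, alpha=2.0):
    """p_u: predicted probability of the true class (face);
    t, v: predicted / labelled boxes as (x, y, w, h) tuples."""
    l_cls = -math.log(p_u)                                   # classification loss
    l_loc = sum(smooth_l1(ti - vi) for ti, vi in zip(t, v))  # box position loss
    return alpha * l_cls + l_loc

# With a perfectly localized box, only the weighted classification term remains.
print(detection_loss(0.9, (10, 10, 50, 60), (10, 10, 50, 60)))
```

Training both terms in one objective is what lets the shared backbone be tuned jointly for classification and localization, as the paragraph above argues.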
And S104, carrying out face detection on the image to be detected by using the trained face detection network to obtain a face detection result.
Further, the step S104 specifically includes:
S1041, inputting the image to be detected into the trained face detection network, processing it in sequence through the first dense block and the second dense block, and outputting the first image feature.
In this embodiment, a face image from the face image test set is used as the image to be detected; it is input into the trained face detection network, processed in sequence by the first and second dense blocks, and the first image feature is output.
S1042, inputting the first image feature into the first pooling layer for down-sampling, processing the down-sampled result in sequence through the third, fourth and fifth dense blocks, and outputting the second image feature.
S1043, inputting the second image feature into the second pooling layer for down-sampling, processing the down-sampled result in sequence through the sixth, seventh and eighth dense blocks, and outputting the third image feature.
S1044, performing feature fusion on the second image feature and the third image feature using the first fusion module, and outputting the fourth image feature.
S1045, performing feature fusion on the first image feature and the fourth image feature using the second fusion module, and outputting the face detection result.
Steps S101 to S103 form the offline (training) stage, consisting of three main parts: constructing the face detection network, acquiring the face image training set, and training the network. Step S104 is the online (application) stage. It can be understood that steps S101 to S103 are completed on one computer device (e.g., a computer); the application stage of step S104 may run on the same device, or the face detection network trained on that device may be ported to another computer device (e.g., a small embedded device such as a mobile phone or tablet computer, or a device with limited memory), with the application stage of step S104 then performed on that other device.
CMS-RCNN (Contextual Multi-Scale Region-based Convolutional Neural Network), SSH (Single Stage Headless face detector), S3FD (Single Shot Scale-invariant Face Detector) and the face detection network of this embodiment were tested on the WIDER FACE face detection benchmark (the data set is divided into easy, medium and hard image sets according to face size, degree of profile, occlusion, etc.); the test accuracies are shown in Table 1 below.
TABLE 1 Test accuracy of each model
Model Easy image set Medium image set Hard image set
CMS-RCNN 89.9% 87.4% 62.9%
SSH 91.9% 90.7% 81.4%
S3FD 93.1% 92.1% 84.5%
Face detection network of the embodiment 93.8% 93.1% 86.4%
As can be seen from Table 1, the test accuracy of the face detection network of this embodiment is higher than that of the other three models on all three image sets, indicating that it can effectively detect faces.
Those skilled in the art will appreciate that all or part of the steps in the method for implementing the above embodiments may be implemented by a program to instruct associated hardware, and the corresponding program may be stored in a computer-readable storage medium.
It should be noted that although the method operations of the above-described embodiments are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in that order, or that all of the illustrated operations must be performed, to achieve desirable results. Rather, the depicted steps may be executed in a different order; additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one, and/or one step may be broken down into multiple steps.
Example 2:
as shown in fig. 7, the present embodiment provides a face detection system based on a feature pyramid and a dense block, the system includes a construction unit 701, an acquisition unit 702, a training unit 703 and a detection unit 704, and specific functions of each unit are as follows:
the constructing unit 701 is configured to construct a face detection network according to the dense block, the pooling layer, and the feature fusion module based on the feature pyramid.
The acquiring unit 702 is configured to acquire a face image training set.
The training unit 703 is configured to train a face detection network by using a face image training set.
The detecting unit 704 is configured to perform face detection on the image to be detected by using the trained face detection network, so as to obtain a face detection result.
Further, as shown in fig. 8, the training unit 703 specifically includes:
a dividing subunit 7031, configured to divide the face image training set into multiple batches of face images.
The training subunit 7032 is configured to set a plurality of periods and to sequentially input each batch of face images into the face detection network for training in each period, so as to complete the training of the face detection network over the plurality of periods.
The specific implementation of each unit in this embodiment may refer to Embodiment 1 and is not repeated here. It should be noted that the system provided in this embodiment is only illustrated by the above division of functional units; in practical applications, the above functions may be allocated to different functional units as needed, that is, the internal structure may be divided into different functional modules to complete all or part of the functions described above.
Example 3:
the present embodiment provides a computer device, which may be a computer as shown in fig. 9, comprising a processor 902, a memory, an input device 903, a display 904 and a network interface 905 connected by a system bus 901. The processor 902 provides computing and control capabilities. The memory includes a nonvolatile storage medium 906 and an internal memory 907: the nonvolatile storage medium 906 stores an operating system, computer programs and a database, and the internal memory 907 provides an environment for running the operating system and the computer programs in the nonvolatile storage medium. When the processor 902 executes the computer programs stored in the memory, the face detection method of the above embodiment 1 is implemented as follows:
constructing a face detection network according to the dense block, the pooling layer and the feature fusion module based on the feature pyramid;
acquiring a face image training set;
training a face detection network by using a face image training set;
and carrying out face detection on the image to be detected by using the trained face detection network to obtain a face detection result.
Further, the training of the face detection network by using the face image training set specifically includes:
dividing a face image training set into a plurality of batches of face images;
and setting a plurality of periods, and sequentially inputting the face images of each batch into the face detection network for training in each period so as to finish the training of the face detection network in the plurality of periods.
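The batch-and-period training scheme described above can be sketched as a generic loop. The `train_step` callback, batch size and period (epoch) count below are illustrative assumptions; the patent does not specify them.

```python
def split_into_batches(samples, batch_size):
    """Divide the face image training set into multiple batches."""
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]

def train(train_step, samples, batch_size, num_periods):
    """Set a number of periods; in each period, feed every batch of face
    images into the network in sequence via the train_step callback."""
    batches = split_into_batches(samples, batch_size)
    for _period in range(num_periods):
        for batch in batches:
            train_step(batch)
    return len(batches) * num_periods  # total number of training steps

# usage with a stub step that records the size of each batch it receives
calls = []
steps = train(lambda b: calls.append(len(b)), list(range(10)),
              batch_size=4, num_periods=3)
print(steps)  # 3 batches x 3 periods = 9 steps
```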
Furthermore, in the face detection network, eight dense blocks are provided, two pooling layers are provided, two feature fusion modules are provided, the eight dense blocks are respectively a first dense block, a second dense block, a third dense block, a fourth dense block, a fifth dense block, a sixth dense block, a seventh dense block and an eighth dense block, the two pooling layers are respectively a first pooling layer and a second pooling layer, and the feature fusion modules are respectively a first feature fusion module and a second feature fusion module;
the first dense block, the second dense block, the first pooling layer, the third dense block, the fourth dense block, the fifth dense block, the second pooling layer, the sixth dense block, the seventh dense block and the eighth dense block are connected in sequence;
the input of the first feature fusion module is respectively connected with the output of the fifth dense block and the output of the eighth dense block;
and the input of the second characteristic fusion module is respectively connected with the output of the second dense block and the output of the first characteristic fusion module.
Further, the performing face detection on the image to be detected by using the trained face detection network to obtain a face detection result specifically includes:
inputting an image to be detected into a trained face detection network, sequentially processing the image through a first dense block and a second dense block, and outputting to obtain a first image characteristic;
inputting the first image characteristic into a first pooling layer for down-sampling, sequentially processing the first image characteristic by a third dense block, a fourth dense block and a fifth dense block after the down-sampling, and outputting to obtain a second image characteristic;
inputting the second image characteristic into a second pooling layer for down-sampling, sequentially processing the second image characteristic by a sixth dense block, a seventh dense block and an eighth dense block after the down-sampling, and outputting to obtain a third image characteristic;
performing feature fusion on the second image feature and the third image feature by using a first fusion module, and outputting to obtain a fourth image feature;
and performing feature fusion on the first image feature and the fourth image feature by using a second fusion module, and outputting to obtain a face detection result.
Example 4:
the present embodiment provides a storage medium, which is a computer-readable storage medium, and stores a computer program, where when the computer program is executed by a processor, the computer program implements the face detection method of embodiment 1, as follows:
constructing a face detection network according to the dense block, the pooling layer and the feature fusion module based on the feature pyramid;
acquiring a face image training set;
training a face detection network by using a face image training set;
and carrying out face detection on the image to be detected by using the trained face detection network to obtain a face detection result.
Further, the training of the face detection network by using the face image training set specifically includes:
dividing a face image training set into a plurality of batches of face images;
and setting a plurality of periods, and sequentially inputting the face images of each batch into the face detection network for training in each period so as to finish the training of the face detection network in the plurality of periods.
Furthermore, in the face detection network, eight dense blocks are provided, two pooling layers are provided, two feature fusion modules are provided, the eight dense blocks are respectively a first dense block, a second dense block, a third dense block, a fourth dense block, a fifth dense block, a sixth dense block, a seventh dense block and an eighth dense block, the two pooling layers are respectively a first pooling layer and a second pooling layer, and the feature fusion modules are respectively a first feature fusion module and a second feature fusion module;
the first dense block, the second dense block, the first pooling layer, the third dense block, the fourth dense block, the fifth dense block, the second pooling layer, the sixth dense block, the seventh dense block and the eighth dense block are connected in sequence;
the input of the first feature fusion module is respectively connected with the output of the fifth dense block and the output of the eighth dense block;
and the input of the second characteristic fusion module is respectively connected with the output of the second dense block and the output of the first characteristic fusion module.
Further, the performing face detection on the image to be detected by using the trained face detection network to obtain a face detection result specifically includes:
inputting an image to be detected into a trained face detection network, sequentially processing the image through a first dense block and a second dense block, and outputting to obtain a first image characteristic;
inputting the first image characteristic into a first pooling layer for down-sampling, sequentially processing the first image characteristic by a third dense block, a fourth dense block and a fifth dense block after the down-sampling, and outputting to obtain a second image characteristic;
inputting the second image characteristic into a second pooling layer for down-sampling, sequentially processing the second image characteristic by a sixth dense block, a seventh dense block and an eighth dense block after the down-sampling, and outputting to obtain a third image characteristic;
performing feature fusion on the second image feature and the third image feature by using a first fusion module, and outputting to obtain a fourth image feature;
and performing feature fusion on the first image feature and the fourth image feature by using a second fusion module, and outputting to obtain a face detection result.
The storage medium described in this embodiment may be a magnetic disk, an optical disk, a computer memory, a random access memory (RAM), a USB flash disk, a removable hard disk, or other media.
In conclusion, the invention has fewer parameters and more stable training. The dense block is used as the main module of the face detection network; by exploiting the dense block's advantage of feature reuse, high-quality feature extraction can be achieved with fewer network layers, and the vanishing-gradient phenomenon during network training is also alleviated. In addition, the face detection time is short: the feature pyramid technique replaces the image pyramid technique used in traditional face detection algorithms, so the algorithm achieves faster detection with unchanged detection quality and can meet the application requirements of scenarios with lower demands on very small faces.
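The speed advantage of the feature pyramid over the image pyramid can be shown with back-of-the-envelope arithmetic: an image pyramid runs the whole detector once per scale, so its cost grows with the sum of squared scale factors, while a feature pyramid extracts all scales in a single backbone pass. The scale set below is a common choice, not one specified by the patent.

```python
def image_pyramid_cost(scales):
    """Relative cost of running a detector once per pyramid scale;
    the cost of one pass is proportional to the pixel count (scale**2)."""
    return sum(s ** 2 for s in scales)

def feature_pyramid_cost():
    """A feature pyramid reuses a single backbone pass for every scale."""
    return 1.0

scales = [1.0, 0.5, 0.25]  # hypothetical image-pyramid scales
ratio = image_pyramid_cost(scales) / feature_pyramid_cost()
print(ratio)  # 1.3125 -- the image pyramid does ~31% more backbone work
```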
The above description covers only preferred embodiments of the present invention, but the protection scope of the present invention is not limited thereto; any substitution or change made by a person skilled in the art within the technical solution and inventive concept of the present invention falls within the protection scope of the present invention.

Claims (7)

1. A face detection method based on a feature pyramid and a dense block is characterized by comprising the following steps:
constructing a face detection network according to the dense block, the pooling layer and the feature fusion module based on the feature pyramid;
acquiring a face image training set;
training a face detection network by using a face image training set;
carrying out face detection on an image to be detected by using the trained face detection network to obtain a face detection result;
in the face detection network, eight dense blocks are provided, two pooling layers are provided, two feature fusion modules are provided, the eight dense blocks are respectively a first dense block, a second dense block, a third dense block, a fourth dense block, a fifth dense block, a sixth dense block, a seventh dense block and an eighth dense block, the two pooling layers are respectively a first pooling layer and a second pooling layer, and the feature fusion modules are respectively a first feature fusion module and a second feature fusion module;
the first dense block, the second dense block, the first pooling layer, the third dense block, the fourth dense block, the fifth dense block, the second pooling layer, the sixth dense block, the seventh dense block and the eighth dense block are connected in sequence;
the input of the first feature fusion module is respectively connected with the output of the fifth dense block and the output of the eighth dense block;
the input of the second feature fusion module is respectively connected with the output of the second dense block and the output of the first feature fusion module;
the method for detecting the face of the image to be detected by using the trained face detection network to obtain the face detection result specifically comprises the following steps:
inputting an image to be detected into a trained face detection network, sequentially processing the image through a first dense block and a second dense block, and outputting to obtain a first image characteristic;
inputting the first image characteristic into a first pooling layer for down-sampling, sequentially processing the first image characteristic by a third dense block, a fourth dense block and a fifth dense block after the down-sampling, and outputting to obtain a second image characteristic;
inputting the second image characteristic into a second pooling layer for down-sampling, sequentially processing the second image characteristic by a sixth dense block, a seventh dense block and an eighth dense block after the down-sampling, and outputting to obtain a third image characteristic;
performing feature fusion on the second image feature and the third image feature by using a first fusion module, and outputting to obtain a fourth image feature;
performing feature fusion on the first image feature and the fourth image feature by using a second fusion module, and outputting to obtain a face detection result;
the training loss function of the face detection network is as follows:

L(p, u, t^u, v) = α·L_cls(p, u) + L_loc(t^u, v)

wherein L(p, u, t^u, v) represents the overall loss value of the face detection network; L_cls represents the classification loss value, L_cls(p, u) = −log(p_u); L_loc represents the regression-frame position loss value,

L_loc(t^u, v) = Σ_{i∈{x,y,w,h}} smooth_L1(t_i^u − v_i), where smooth_L1(x) = 0.5x² if |x| < 1, and |x| − 0.5 otherwise;

p = (p_0, …, p_k) represents the probability value predicted by the face detection network for each category of the detected object, k indicating that the classified objects are face and non-face; u represents the face category; t^u = (t_x^u, t_y^u, t_w^u, t_h^u) represents the position information of the face detected by the face detection network, where t_x^u, t_y^u, t_w^u, t_h^u respectively represent the x and y values of the upper-left corner of the detected face frame and the width and height of the regression frame; v = (v_x, v_y, v_w, v_h) represents the position information of the manually annotated face, where v_x, v_y, v_w, v_h respectively represent the x and y values of the upper-left corner of the manually annotated face frame and the width and height of the regression frame; α is a weight.
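The loss above can be written out numerically as a hedged sketch, assuming the conventional Fast R-CNN style definition of smooth_L1; the α value and the box coordinates in the usage line are illustrative, not from the patent.

```python
import numpy as np

def smooth_l1(x):
    """smooth_L1(x) = 0.5*x**2 if |x| < 1, else |x| - 0.5."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) < 1, 0.5 * x ** 2, np.abs(x) - 0.5)

def detection_loss(p_u, t_u, v, alpha=1.0):
    """Overall loss L = alpha * L_cls + L_loc.

    p_u: predicted probability of the true (face) class,
    t_u: predicted box (x, y, w, h), v: annotated box (x, y, w, h).
    """
    l_cls = -np.log(p_u)                          # classification loss
    l_loc = smooth_l1(np.subtract(t_u, v)).sum()  # regression-frame loss
    return alpha * l_cls + l_loc

# hypothetical prediction vs annotation
loss = detection_loss(0.9, (10.0, 10.0, 50.0, 50.0), (10.5, 10.0, 52.0, 50.0))
```

The coordinate differences here are (−0.5, 0, −2, 0), so the two branches of smooth_L1 (quadratic near zero, linear beyond |x| = 1) are both exercised.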
2. The method of claim 1, wherein the training of the face detection network using the face image training set specifically comprises:
dividing a face image training set into a plurality of batches of face images;
and setting a plurality of periods, and sequentially inputting the face images of each batch into the face detection network for training in each period so as to finish the training of the face detection network in the plurality of periods.
3. The face detection method according to any one of claims 1-2, wherein the dense block comprises a plurality of elements of the same type, the plurality of elements are connected in sequence, and the plurality of elements are connected into a whole through jump connection;
each element comprises a 1×1 convolutional layer, a first Swish function layer, a first batch normalization layer, a 3×3 convolutional layer, a second Swish function layer and a second batch normalization layer which are sequentially connected, wherein the 1×1 convolutional layer is used for compressing the output data of the previous layer and reducing its dimensionality.
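The feature-reuse advantage of the jump connections can be illustrated with simple channel bookkeeping: each element sees the block input concatenated with the outputs of all earlier elements, so its input width grows linearly, which is what the 1×1 compression convolution then counteracts. The growth rate and element count below are hypothetical; the patent does not specify them.

```python
def dense_block_channels(in_channels, growth_rate, num_elements):
    """Input channel count seen by each element of a dense block.

    Element i receives the block input concatenated (via jump connections)
    with the outputs of all previous elements, each of width growth_rate;
    the element's 1x1 convolution compresses this widened input.
    """
    widths = []
    c = in_channels
    for _ in range(num_elements):
        widths.append(c)   # channels entering this element's 1x1 conv
        c += growth_rate   # its 3x3 conv output is concatenated on
    return widths, c       # per-element input widths, block output width

widths, out_channels = dense_block_channels(in_channels=64, growth_rate=32,
                                            num_elements=4)
print(widths, out_channels)  # [64, 96, 128, 160] 192
```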
4. The method of any of claims 1-2, wherein the feature pyramid is preprocessed before feature fusion, the preprocessing including 1x1 convolution and 2x upsampling.
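A minimal NumPy sketch of the two preprocessing operations named in this claim: a 1×1 convolution (a per-pixel linear projection over channels) followed by 2× nearest-neighbour up-sampling. The channel counts and the nearest-neighbour choice are illustrative assumptions; the patent does not state the up-sampling interpolation mode.

```python
import numpy as np

def conv1x1(x, w):
    """1x1 convolution: per-pixel linear map over channels.
    x: (C_in, H, W), w: (C_out, C_in) -> (C_out, H, W)."""
    return np.einsum('oc,chw->ohw', w, x)

def upsample2x(x):
    """2x nearest-neighbour up-sampling of a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

rng = np.random.default_rng(0)
feat = rng.standard_normal((256, 10, 10))  # hypothetical coarse pyramid level
w = rng.standard_normal((128, 256))        # hypothetical 1x1 conv weights
out = upsample2x(conv1x1(feat, w))
print(out.shape)  # (128, 20, 20): channels reduced, spatial size doubled
```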
5. A system for detecting a face based on a feature pyramid and a dense block, the system comprising:
the construction unit is used for constructing a face detection network according to the dense block, the pooling layer and the feature fusion module based on the feature pyramid;
the acquisition unit is used for acquiring a face image training set;
the training unit is used for training the face detection network by using a face image training set;
the detection unit is used for carrying out face detection on the image to be detected by utilizing the trained face detection network to obtain a face detection result;
in the face detection network, eight dense blocks are provided, two pooling layers are provided, two feature fusion modules are provided, the eight dense blocks are respectively a first dense block, a second dense block, a third dense block, a fourth dense block, a fifth dense block, a sixth dense block, a seventh dense block and an eighth dense block, the two pooling layers are respectively a first pooling layer and a second pooling layer, and the feature fusion modules are respectively a first feature fusion module and a second feature fusion module;
the first dense block, the second dense block, the first pooling layer, the third dense block, the fourth dense block, the fifth dense block, the second pooling layer, the sixth dense block, the seventh dense block and the eighth dense block are connected in sequence;
the input of the first feature fusion module is respectively connected with the output of the fifth dense block and the output of the eighth dense block;
the input of the second feature fusion module is respectively connected with the output of the second dense block and the output of the first feature fusion module;
the method for detecting the face of the image to be detected by using the trained face detection network to obtain the face detection result specifically comprises the following steps:
inputting an image to be detected into a trained face detection network, sequentially processing the image through a first dense block and a second dense block, and outputting to obtain a first image characteristic;
inputting the first image characteristic into a first pooling layer for down-sampling, sequentially processing the first image characteristic by a third dense block, a fourth dense block and a fifth dense block after the down-sampling, and outputting to obtain a second image characteristic;
inputting the second image characteristic into a second pooling layer for down-sampling, sequentially processing the second image characteristic by a sixth dense block, a seventh dense block and an eighth dense block after the down-sampling, and outputting to obtain a third image characteristic;
performing feature fusion on the second image feature and the third image feature by using a first fusion module, and outputting to obtain a fourth image feature;
performing feature fusion on the first image feature and the fourth image feature by using a second fusion module, and outputting to obtain a face detection result;
the training loss function of the face detection network is as follows:

L(p, u, t^u, v) = α·L_cls(p, u) + L_loc(t^u, v)

wherein L(p, u, t^u, v) represents the overall loss value of the face detection network; L_cls represents the classification loss value, L_cls(p, u) = −log(p_u); L_loc represents the regression-frame position loss value,

L_loc(t^u, v) = Σ_{i∈{x,y,w,h}} smooth_L1(t_i^u − v_i), where smooth_L1(x) = 0.5x² if |x| < 1, and |x| − 0.5 otherwise;

p = (p_0, …, p_k) represents the probability value predicted by the face detection network for each category of the detected object, k indicating that the classified objects are face and non-face; u represents the face category; t^u = (t_x^u, t_y^u, t_w^u, t_h^u) represents the position information of the face detected by the face detection network, where t_x^u, t_y^u, t_w^u, t_h^u respectively represent the x and y values of the upper-left corner of the detected face frame and the width and height of the regression frame; v = (v_x, v_y, v_w, v_h) represents the position information of the manually annotated face, where v_x, v_y, v_w, v_h respectively represent the x and y values of the upper-left corner of the manually annotated face frame and the width and height of the regression frame; α is a weight.
6. A computer device comprising a processor and a memory for storing processor-executable programs, wherein the processor, when executing a program stored in the memory, implements the face detection method of any one of claims 1-4.
7. A storage medium storing a program which, when executed by a processor, implements the face detection method of any one of claims 1 to 4.
CN202010134064.6A 2020-03-02 2020-03-02 Face detection method, system, device and medium based on feature pyramid and dense block Active CN111368707B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010134064.6A CN111368707B (en) 2020-03-02 2020-03-02 Face detection method, system, device and medium based on feature pyramid and dense block


Publications (2)

Publication Number Publication Date
CN111368707A CN111368707A (en) 2020-07-03
CN111368707B true CN111368707B (en) 2023-04-07

Family

ID=71210269


Country Status (1)

Country Link
CN (1) CN111368707B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111783749A (en) * 2020-08-12 2020-10-16 成都佳华物链云科技有限公司 Face detection method and device, electronic equipment and storage medium
CN113674155A (en) * 2021-08-25 2021-11-19 中国铁塔股份有限公司湖北省分公司 Image super-resolution method, device and storage medium based on information aggregation network
CN113763300B (en) * 2021-09-08 2023-06-06 湖北工业大学 Multi-focusing image fusion method combining depth context and convolution conditional random field

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107871134A (en) * 2016-09-23 2018-04-03 北京眼神科技有限公司 A kind of method for detecting human face and device
CN107871101A (en) * 2016-09-23 2018-04-03 北京眼神科技有限公司 A kind of method for detecting human face and device
CN108647668A (en) * 2018-05-21 2018-10-12 北京亮亮视野科技有限公司 The construction method of multiple dimensioned lightweight Face datection model and the method for detecting human face based on the model
CN108875624A (en) * 2018-06-13 2018-11-23 华南理工大学 Method for detecting human face based on the multiple dimensioned dense Connection Neural Network of cascade




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant