CN112163520B - MDSSD face detection method based on improved loss function - Google Patents

MDSSD face detection method based on improved loss function

Info

Publication number
CN112163520B
Authority
CN
China
Prior art keywords
mdssd
loss function
network
frame
face detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011047720.5A
Other languages
Chinese (zh)
Other versions
CN112163520A (en)
Inventor
王智文
安晓宁
王宇航
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lichu Education Technology Wuhan Co ltd
Original Assignee
Guangxi University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangxi University of Science and Technology
Priority to CN202011047720.5A
Publication of CN112163520A
Application granted
Publication of CN112163520B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an MDSSD face detection method based on an improved loss function. The MDSSD network detects face regions using a prior frame mechanism and performs classification and regression on the candidate regions. The Ground Truth frames are clustered with k-means to find the optimal number, size and proportion of prior frames. In the classification network, the MDSSD network replaces the cross entropy loss function with the Focal loss function, and detects and classifies face and background for the prior frames obtained from the cluster analysis. Cluster analysis of the Ground Truth frames annotated with faces yields the optimal number and proportion of prior frames for each detection layer. The MDSSD model is trained, tested and analyzed, and the experimental results show that, compared with SSD, the MDSSD algorithm achieves a higher recall rate on small and blurred faces while still keeping a high detection speed.

Description

MDSSD face detection method based on improved loss function
Technical Field
The invention relates to the technical field of small face detection, in particular to an MDSSD face detection method based on an improved loss function.
Background
With the rise of deep learning, intelligent analysis technologies related to human faces have become a focus of research in artificial intelligence. New algorithms keep raising the scores on face-related tasks, current face recognition technology exceeds the best human performance, and face-related technology has the widest industrial application. For example, applications of face detection include intelligent security, urban brain, safe driving and the Chinese Skynet system; applications of face recognition include face payment, intelligent access control, face attendance and face verification on various intelligent terminal devices, and face-related technology is closely tied to the security of these systems. Face-related technology is also increasingly applied to everyday life, for example in searching for missing children and in intelligent education. Further, as the computing capability of computers improves and 5G networks are deployed, the cost of data storage and the delay of data transmission keep falling, and face-related applications are deployed on more and more intelligent terminals, bringing an intelligent society closer and benefiting people. Face detection means that the intelligent terminal judges whether a face exists in an input image and finds the position of the face. A precondition for applying face technology is that the face can be detected accurately regardless of the background of the face image. Face detection, as the basic and core technology of face-related tasks, has therefore received wide attention from researchers.
The face detection model based on the SSD algorithm can identify faces in natural scene images quickly and accurately, and the SSD face detection algorithm has a high detection speed. However, its recall rate for small face detection in natural or unnatural scenes still leaves considerable room for improvement, so a new network model, MDSSD (Mix resolution Single Shot MultiBox Detector), needs to be constructed for face detection. The MDSSD algorithm addresses several shortcomings of the SSD algorithm in face detection, including the model structure, detection feature maps, parameter configuration and loss function, and configures the model by a machine learning method to reduce reliance on human experience, so that the detection performance of the model is greatly improved.
Disclosure of Invention
This section is for the purpose of summarizing some aspects of embodiments of the invention and to briefly introduce some preferred embodiments. In this section, as well as in the abstract and the title of the invention of this application, simplifications or omissions may be made to avoid obscuring the purpose of the section, the abstract and the title, and such simplifications or omissions are not intended to limit the scope of the invention.
The present invention has been made in view of the above-mentioned conventional problems.
Therefore, the invention provides an MDSSD face detection method based on an improved loss function, which can solve the problem of low recall rate of small face detection in natural or unnatural scenes.
In order to solve the technical problems, the invention provides the following technical scheme: the MDSSD network detects the face region using a prior frame mechanism and performs classification and regression on the candidate regions; cluster analysis is performed on the Ground Truth frames with k-means to find the optimal number, size and proportion of prior frames; and the MDSSD network replaces the cross entropy loss function in the classification network with the Focal loss function, and detects and classifies face and background for the prior frames obtained from the cluster analysis.
As a preferred scheme of the MDSSD face detection method based on the improved loss function, the present invention further includes: the MDSSD network zero-pads a deep feature map or a deep fusion layer and applies a 3-by-3 convolution to the padded feature map as a deconvolution operation, doubling the resolution of the feature map while keeping the receptive field range unchanged; the number of convolution kernels is set equal to the channel dimension of the shallow feature map, so that the output dimension of the deconvolution operation matches the dimension of the shallow feature map to be fused; during MDSSD feature fusion, only an addition operation is carried out on corresponding positions of the shallow feature map and the deconvolution feature map so as to enhance effective context information; the MDSSD performs nonlinear mapping by adding an activation layer to the fusion layer, and the activated fusion layer is used as the final detection feature map.
As a preferred scheme of the MDSSD face detection method based on the improved loss function, the present invention further includes: the MDSSD network takes SSD as the basic network model; the dropout of Block6 and Block7 in the SSD network is removed; a multi-layer fusion module Mixed_layer3 and single-layer fusion modules Mixed_layer4 and Mixed_layer7 are added; the MDSSD network model also adds an L2Normalization layer to reduce the difference from the detection layers.
As a preferred scheme of the MDSSD face detection method based on the improved loss function, the present invention further includes: the prior frame mechanism needs to consider the effective receptive field, which includes: the layers in a convolutional neural network are locally connected, so a neuron cannot perceive all information of the original image; the larger the receptive field, the more global information is acquired, that is, the richer the global and high-level semantic features contained in the feature map; the smaller the receptive field of a neuron, the lower-level the features contained in the feature map and the more local and textural the information.
As a preferred scheme of the MDSSD face detection method based on the improved loss function, the present invention further includes: the prior frame needs to be matched with the Ground Truth frame to divide positive and negative samples; the larger the difference in size and proportion between the prior frame and the real Ground Truth frame, the larger the error in calculating the intersection over union (IOU); the smaller the difference in size and proportion between the prior frame and the real Ground Truth frame, the smaller the error in calculating the IOU.
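The matching step described above can be illustrated with a minimal Python sketch. It assumes boxes are given as corner coordinates `(x_min, y_min, x_max, y_max)`; the function names and the 0.5 matching threshold are hypothetical choices for illustration and are not stated in the patent.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes (x_min, y_min, x_max, y_max)."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Width and height of the overlap are clamped at zero for disjoint boxes.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def label_priors(priors, gt_boxes, threshold=0.5):
    """Mark a prior frame positive if it overlaps any Ground Truth frame enough.

    The threshold value is an assumption made for this sketch.
    """
    return [any(iou(p, g) >= threshold for g in gt_boxes) for p in priors]
```

A prior frame whose size and proportion are close to a Ground Truth frame reaches a high IOU and becomes a positive sample; a poorly shaped prior frame falls below the threshold even when it is centered on the face.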
As a preferred scheme of the MDSSD face detection method based on the improved loss function, the present invention further includes: performing the cluster analysis using the custom IOU distance as a metric distance, including,
d_{IOU}(box, centroid) = 1 - IOU(box, centroid)
the clustering loss is the IOU distance between the Ground Truth frame and the cluster center; the smaller the IOU distance, the larger the IOU value; the cluster number k is defined and the cluster centers (W_i, H_i), i ∈ {1, 2, …, k}, are initialized at random, where W_i and H_i respectively represent the length and width of the cluster center; the cluster centers and the centers of the Ground Truth frames are placed at the coordinate origin, and the IOU distance between each Ground Truth frame and each cluster is calculated; each Ground Truth frame is assigned to the cluster with the minimum IOU distance, and the cluster centers are recalculated after all Ground Truth frames have been assigned; the update is repeated until the cluster centers no longer change, and the median of the cluster center is taken as the final prior frame size and proportion.
As a preferred scheme of the MDSSD face detection method based on the improved loss function, the present invention further includes: and determining the optimal cluster number by using an elbow strategy, wherein when k is 17, the loss function slowly descends and tends to be stable, and the optimal cluster number is determined to be 17 by comprehensively considering the settings of all detection layers.
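As a rough illustration of the elbow strategy, the sketch below picks the first k after which the relative drop in clustering loss falls under a threshold. The function name `elbow_k`, the `min_gain` threshold and the loss values in the usage note are made up for illustration; the patent additionally weighs the detection layer settings, which this sketch does not model.

```python
def elbow_k(losses, min_gain=0.05):
    """Return the smallest k after which the loss stops improving noticeably.

    losses maps a cluster number k to the clustering loss obtained with k
    clusters. min_gain is a hypothetical relative-improvement threshold.
    """
    ks = sorted(losses)
    for k, k_next in zip(ks, ks[1:]):
        if losses[k] == 0:
            return k
        gain = (losses[k] - losses[k_next]) / losses[k]
        if gain < min_gain:
            return k
    return ks[-1]
```

For example, with the invented losses `{1: 0.60, 2: 0.40, 3: 0.30, 4: 0.29, 5: 0.285}`, the relative gain from 3 to 4 clusters is about 3%, so `elbow_k` returns 3.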
As a preferred scheme of the MDSSD face detection method based on the improved loss function, the present invention further includes: the loss function includes,
Figure GDA0003434028550000031
wherein x is a sample label, y' is a model output value, alpha is a sample balance factor, and gamma is a sample weight adjustment factor.
As a preferred scheme of the MDSSD face detection method based on the improved loss function, the present invention further includes: when x = 1, that is, the input is a positive sample, the larger the predicted value, the easier the sample is to classify and the smaller the sample weight.
As a preferred scheme of the MDSSD face detection method based on the improved loss function, the present invention further includes: the sample balance factor α adjusts the proportion of positive and negative samples in the loss function, and α = 0.25 and γ = 2 are set during the model training process.
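The loss equation referenced above survives only as a figure identifier in this text. For orientation, the commonly used form of the Focal loss, written with the symbols defined above (x the sample label, y' the model output, α the balance factor, γ the weight adjustment factor), is given below. That the patent's figure matches this form term by term is an assumption.

```latex
L_{fl} =
\begin{cases}
-\alpha \, (1 - y')^{\gamma} \, \log y', & x = 1 \\
-(1 - \alpha) \, y'^{\gamma} \, \log (1 - y'), & x = 0
\end{cases}
```

In this form, a positive sample (x = 1) with a large predicted value y' has a small modulating weight (1 - y')^γ, which agrees with the statement that easily classified samples receive a smaller weight.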
The invention has the beneficial effects that: addressing the defects of the SSD algorithm in face detection, such as unbalanced samples, low classification confidence and low recall rate for small faces, the invention improves the SSD network structure, loss function and model presets and provides the MDSSD algorithm, which redesigns the network structure and the detection module, moves the detection layers forward, and performs cluster analysis on the Ground Truth frames annotated with faces to find the optimal number and proportion of prior frames for each detection layer. The MDSSD model is trained, tested and analyzed, and the experimental results show that, compared with SSD, the MDSSD algorithm has a higher recall rate on small and blurred faces while still keeping a high detection speed.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive exercise. Wherein:
FIG. 1 is a schematic flowchart of an MDSSD face detection method based on an improved loss function according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the transposed convolution of an MDSSD face detection method based on an improved loss function according to an embodiment of the present invention;
FIG. 3 is a single-layer feature fusion diagram of an MDSSD face detection method based on an improved loss function according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of the MDSSD network of an MDSSD face detection method based on an improved loss function according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the clustering elbow curve of an MDSSD face detection method based on an improved loss function according to an embodiment of the present invention;
FIG. 6 is a schematic visualization of the clustering result of an MDSSD face detection method based on an improved loss function according to an embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, specific embodiments accompanied with figures are described in detail below, and it is apparent that the described embodiments are a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making creative efforts based on the embodiments of the present invention, shall fall within the protection scope of the present invention.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways than those specifically described and will be readily apparent to those of ordinary skill in the art without departing from the spirit of the present invention, and therefore the present invention is not limited to the specific embodiments disclosed below.
Furthermore, reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one implementation of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.
The present invention will be described in detail with reference to the drawings, wherein the cross-sectional views illustrating the structure of the device are not enlarged partially in general scale for convenience of illustration, and the drawings are only exemplary and should not be construed as limiting the scope of the present invention. In addition, the three-dimensional dimensions of length, width and depth should be included in the actual fabrication.
Meanwhile, in the description of the present invention, it should be noted that the terms "upper, lower, inner and outer" and the like indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of describing the present invention and simplifying the description, but do not indicate or imply that the referred device or element must have a specific orientation, be constructed in a specific orientation and operate, and thus, cannot be construed as limiting the present invention. Furthermore, the terms first, second, or third are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
The terms "mounted, connected and connected" in the present invention are to be understood broadly, unless otherwise explicitly specified or limited, for example: can be fixedly connected, detachably connected or integrally connected; they may be mechanically, electrically, or directly connected, or indirectly connected through intervening media, or may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.
Example 1
Referring to fig. 1 to 4, a first embodiment of the present invention provides an MDSSD face detection method based on an improved loss function, including:
S1: the MDSSD network detects the face region using a prior frame mechanism and performs classification and regression on the candidate regions. It should be noted that the MDSSD network includes:
zero-padding the deep feature map or the deep fusion layer and applying a 3-by-3 convolution to the padded feature map as a deconvolution operation, doubling the resolution of the feature map while keeping the receptive field range unchanged;
setting the number of convolution kernels equal to the channel dimension of the shallow feature map, so that the output dimension of the deconvolution operation matches the dimension of the shallow feature map to be fused;
during MDSSD feature fusion, only an addition operation is performed on corresponding positions of the shallow feature map and the deconvolution feature map to enhance effective context information;
the MDSSD performs nonlinear mapping by adding an activation layer to the fusion layer, and takes the activated fusion layer as the final detection feature map.
Further, the method also comprises the following steps:
the MDSSD network takes SSD as the basic network model;
the dropout of Block6 and Block7 in the SSD network is removed;
a multi-layer fusion module Mixed_layer3 and single-layer fusion modules Mixed_layer4 and Mixed_layer7 are added;
the MDSSD network model also adds an L2Normalization layer to reduce the difference from the detection layers.
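The fusion step listed under S1 (element-wise addition of the shallow feature map and the upsampled deep feature map, followed by an activation layer) can be sketched in NumPy as follows. The function name `fuse` is hypothetical, and the choice of ReLU as the activation is an assumption; the text only says that an activation layer is added.

```python
import numpy as np


def fuse(shallow, upsampled):
    """Element-wise sum of two feature maps followed by a ReLU activation.

    Both inputs must already have identical shapes, which is what matching
    the deconvolution output to the shallow feature map dimensions ensures.
    ReLU is an assumed activation for this sketch.
    """
    if shallow.shape != upsampled.shape:
        raise ValueError("feature maps must have the same shape before fusion")
    return np.maximum(shallow + upsampled, 0.0)
```

Addition keeps the channel count unchanged, unlike concatenation, so the detection head on the fused layer sees the same dimensions as it would on the shallow layer alone.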
S2: cluster analysis is performed on the Ground Truth frames with k-means to find the optimal number, size and proportion of prior frames. It should be noted that in this step the prior frame mechanism needs to consider the effective receptive field, which includes:
the layers in a convolutional neural network are locally connected, so a neuron cannot perceive all information of the original image;
the larger the receptive field, the more global information is acquired, that is, the richer the global and high-level semantic features contained in the feature map;
the smaller the receptive field of a neuron, the lower-level the features contained in the feature map and the more local and textural the information;
the prior frame needs to be matched with a Ground Truth frame to divide positive and negative samples;
the larger the difference in size and proportion between the prior frame and the real Ground Truth frame, the larger the error in calculating the intersection over union (IOU);
the smaller the difference in size and proportion between the prior frame and the real Ground Truth frame, the smaller the error in calculating the IOU.
Specifically, the cluster analysis uses a custom IOU distance as the distance metric, and comprises the following steps:
d_{IOU}(box, centroid) = 1 - IOU(box, centroid)
the clustering loss is the IOU distance between the Ground Truth frame and the cluster center; the smaller the IOU distance, the larger the IOU value;
the cluster number k is defined and the cluster centers (W_i, H_i), i ∈ {1, 2, …, k}, are initialized at random, where W_i and H_i respectively represent the length and width of the cluster center;
the cluster centers and the centers of the Ground Truth frames are placed at the origin of coordinates, and the IOU distance between each Ground Truth frame and each cluster is calculated;
each Ground Truth frame is assigned to the cluster with the minimum IOU distance, and the cluster centers are recalculated after all Ground Truth frames have been assigned;
the update is repeated until the cluster centers no longer change, and the median of the cluster center is taken as the final prior frame size and proportion.
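The clustering steps above can be sketched in NumPy as follows. This is a minimal illustration, not the patent's implementation: the function names are hypothetical, the initial centers are drawn from the data rather than from arbitrary random values, and updating each center with the per-cluster median is one reading of the text.

```python
import numpy as np


def iou_wh(boxes, centroids):
    """IOU between (N, 2) boxes and (k, 2) centroids, all centered at the origin.

    Each row is (width, height). Returns an (N, k) matrix of IOU values.
    """
    inter_w = np.minimum(boxes[:, None, 0], centroids[None, :, 0])
    inter_h = np.minimum(boxes[:, None, 1], centroids[None, :, 1])
    inter = inter_w * inter_h
    area_b = (boxes[:, 0] * boxes[:, 1])[:, None]
    area_c = (centroids[:, 0] * centroids[:, 1])[None, :]
    return inter / (area_b + area_c - inter)


def kmeans_iou(boxes, k, seed=0, max_iter=100):
    """k-means over box sizes with d = 1 - IOU as the distance.

    Assumptions of this sketch: centers start at k randomly chosen boxes,
    and each center is updated to the median (width, height) of its members.
    """
    boxes = np.asarray(boxes, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = boxes[rng.choice(len(boxes), size=k, replace=False)].copy()
    for _ in range(max_iter):
        # Assign every box to the cluster with the smallest IOU distance.
        assign = (1.0 - iou_wh(boxes, centroids)).argmin(axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            members = boxes[assign == j]
            if len(members):  # an empty cluster keeps its previous center
                new_centroids[j] = np.median(members, axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # Clustering loss: mean IOU distance of each box to its nearest center.
    loss = (1.0 - iou_wh(boxes, centroids)).min(axis=1).mean()
    return centroids, loss
```

Using 1 - IOU instead of Euclidean distance keeps large boxes from dominating the loss, since the IOU of two origin-centered boxes depends only on their relative shapes and sizes.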
S3: the MDSSD network replaces the cross entropy loss function in the classification network with the Focal loss, and detects and classifies face and background for the prior frames obtained from the cluster analysis. It should be further noted that the loss function includes:
Figure GDA0003434028550000071
wherein x is a sample label, y' is a model output value, alpha is a sample balance factor, and gamma is a sample weight adjustment factor;
the optimal cluster number is determined using an elbow strategy: at k = 17 the loss function descends slowly and tends to be stable, and taking the settings of all detection layers into account, the optimal cluster number is determined to be 17;
when x = 1, that is, the input is a positive sample, the larger the predicted value, the easier the sample is to classify and the smaller the sample weight;
the sample balance factor α adjusts the proportion of positive and negative samples in the loss function, and α = 0.25 and γ = 2 are set during the model training process.
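A minimal NumPy sketch of a Focal loss with the parameters named above is shown below. It assumes the commonly used binary form of the Focal loss, since the patent's own equation is only available as a figure; the function name and the `eps` clipping constant are additions for this sketch.

```python
import numpy as np


def focal_loss(x, y_pred, alpha=0.25, gamma=2.0, eps=1e-7):
    """Per-sample binary Focal loss (assumed standard form).

    x      : sample labels, 1 for face (positive) and 0 for background
    y_pred : model output probabilities for the face class
    alpha  : sample balance factor; gamma : sample weight adjustment factor
    eps    : clipping constant added here to avoid log(0)
    """
    x = np.asarray(x, dtype=float)
    y = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0 - eps)
    pos = -alpha * (1.0 - y) ** gamma * np.log(y)
    neg = -(1.0 - alpha) * y ** gamma * np.log(1.0 - y)
    return np.where(x == 1, pos, neg)
```

With γ = 0 and α = 0.5 the expression reduces to half the ordinary cross entropy, which makes the role of the two factors easy to check: γ shrinks the contribution of well-classified samples, and α rebalances positives against negatives.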
Referring to fig. 2, the inverse process of the convolution operation is called transposed convolution or deconvolution; it is a special upsampling convolution operation with learnable parameters. Transposed convolution is the inverse of convolution in the sense that the forward and backward propagation processes of the two operations are exchanged. The inversion applies only to shape, however: transposed convolution restores the size of the input feature map but not the feature values of the original feature map, so its main use is upsampling. A convolution with a stride greater than 1 performs equidistant downsampling, so the output feature map is smaller than the input feature map; transposed convolution uses a convolution with a stride smaller than 1 to upsample, so the size of the feature map increases. Traditional upsampling applies interpolation or hand-crafted rules, whereas transposed convolution lets the network learn a suitable transformation from data without human intervention. In the implementation, s-1 zeros are first inserted between adjacent feature units of the input feature map to form a new input feature map, and a convolution operation is then applied to the interpolated feature map. When the input feature map has size i × i, the transposed convolution stride is s, the convolution kernel size is k × k and the padding size is p, the output feature map of the transposed convolution has size s(i-1) + k - 2p. In the MDSSD algorithm the low-resolution feature map is upsampled by transposed convolution, which realizes the fusion of shallow texture features and high-level semantic features.
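The output-size formula and the zero-insertion procedure can be checked with a small single-channel NumPy sketch. The function names are hypothetical, the input is assumed square, and the loop-based convolution is written for clarity rather than speed.

```python
import numpy as np


def transposed_conv_output_size(i, s, k, p):
    """Output side length s(i-1) + k - 2p of a transposed convolution."""
    return s * (i - 1) + k - 2 * p


def transposed_conv2d(x, kernel, s, p):
    """Single-channel transposed convolution via zero insertion.

    x is an (i, i) input, kernel is (k, k), s is the stride, p the padding.
    """
    i, k = x.shape[0], kernel.shape[0]
    if p > k - 1:
        raise ValueError("this sketch requires p <= k - 1")
    # Insert s-1 zeros between neighbouring input units.
    dilated = np.zeros((s * (i - 1) + 1, s * (i - 1) + 1))
    dilated[::s, ::s] = x
    # Pad by k-1-p on every side, then run a stride-1 convolution.
    padded = np.pad(dilated, k - 1 - p)
    out_size = padded.shape[0] - k + 1
    flipped = kernel[::-1, ::-1]
    out = np.empty((out_size, out_size))
    for r in range(out_size):
        for c in range(out_size):
            out[r, c] = np.sum(padded[r:r + k, c:c + k] * flipped)
    return out
```

For a 4 × 4 input with s = 2, k = 3 and p = 1, the formula gives 2·3 + 3 - 2 = 7, and `transposed_conv2d` returns a 7 × 7 map.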
Referring to fig. 3, the SSD network sets a plurality of detection feature maps and detects faces of different sizes on different feature maps. The shallow feature map has a smaller receptive field and is therefore suitable for small face detection; as the number of convolution layers increases, the receptive field grows and the resolution of the feature map falls, so the deep feature map is more suitable for large face detection. In a deep convolutional neural network, the shallow feature map lacks rich semantic features, and a small face offers little resolution and information, so detecting small faces is a challenging task. The shallow feature map has high resolution and contains more low-level texture features, but its feature extraction is not rich, so it contains fewer semantic features and more noise; the deep feature map has passed through many convolution operations, so its semantic information is rich, but its perception of low-level features such as texture is poor. The MDSSD algorithm therefore improves small face detection by introducing context information, that is, it improves face detection performance by fusing multiple feature layers. Introducing context information layer by layer can markedly improve the detection capability of the model and enrich the semantic information of the shallow feature map, but introducing too much context information brings in a large amount of noise and harms the detection of low-resolution small faces. This embodiment therefore designs two feature fusion strategies for the face detection task, multi-layer fusion and single-layer fusion, and adds feature fusion modules only to the shallow detection layers used for detecting small faces. The multi-layer fusion strategy is used for the lower-layer feature map, that is, the feature map is fused with the deconvolution layer of the deep fusion module; the higher-layer feature maps use single-layer fusion only, that is, the feature layer is fused only with the deconvolution layer of the next module.
Referring to fig. 4, the MDSSD network uses SSD as its base network, still uses VGG16 as the backbone network and keeps the original number of convolution kernels and the model structure, but removes the dropout layers of Block6 and Block7 in the SSD network. The network also adds two kinds of feature fusion module for face detection: one multi-layer fusion module, Mixed_layer3, and two single-layer fusion modules, Mixed_layer4 and Mixed_layer7. Because the Conv3_3 layer of VGG16 sits at a shallow position and the faces detected by this layer have low resolution, fusing the Conv4_4 layer alone cannot effectively bring in useful semantic features. The MDSSD network therefore fuses Conv3_3 with the fusion module Mixed_layer4, wherein the Conv _ layer4 is a convolution layer of Conv4_3 and Block7, thereby realizing the fusion of Conv _ layer 3648 with the fusion module Mixed_layer7 for face detection. Because Mixed_layer3 and Mixed_layer4 sit earlier in the network, their data scale is larger; the MDSSD model therefore adds an L2Normalization layer after the detection module to reduce the difference from the later detection layers and increase the difference between the layer data.
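As an illustration of what an L2Normalization layer does to a feature map, the NumPy sketch below rescales the channel vector at every spatial position to unit L2 norm and then multiplies by a scale factor. Normalizing across the channel axis with a scale factor follows common SSD practice and is an assumption here; the function name, the `scale` parameter and its default are hypothetical.

```python
import numpy as np


def l2_normalize(feature_map, scale=1.0, eps=1e-10):
    """Channel-wise L2 normalization of a (C, H, W) feature map.

    Every spatial position's C-dimensional vector is divided by its L2 norm
    and multiplied by scale. In a trained network the scale would usually be
    a learnable parameter; here it is a plain number.
    """
    norm = np.sqrt(np.sum(feature_map ** 2, axis=0, keepdims=True)) + eps
    return scale * feature_map / norm
```

After normalization the magnitudes of early layers such as Mixed_layer3 and Mixed_layer4 are brought into the same range as those of later detection layers, so no single layer dominates the loss by scale alone.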
Preferably, it should be noted in this embodiment that one-stage face detection algorithms such as SSD cancel the candidate region proposal stage, which greatly improves the face detection speed but also causes a relatively serious sample imbalance problem. In a one-stage face detection algorithm, an input face image may generate thousands of preselected frames, but only a few of them are candidate frames containing real faces, so a large number of negative samples, namely background areas, exist in the training samples. These negative samples dominate the loss during training and therefore dominate the direction of the gradient update, so the model cannot classify faces and backgrounds well. The MDSSD algorithm uses Focal loss to replace the cross-entropy loss function in the classification network; by adding two balance factors, Focal loss addresses the problems of hard-sample learning and positive/negative sample imbalance in the model training process.
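The effect of the two balance factors can be sketched as follows. This is a minimal NumPy illustration that assumes the standard binary form of Focal loss; the function names are hypothetical, and the defaults of 0.25 for the sample balance factor and 2 for the weight adjustment factor follow the values given in the claims.

```python
import numpy as np


def cross_entropy(labels, probs, eps=1e-7):
    """Per-sample binary cross entropy; probs is the predicted face probability."""
    labels = np.asarray(labels, dtype=np.float64)
    probs = np.clip(np.asarray(probs, dtype=np.float64), eps, 1.0 - eps)
    return np.where(labels == 1, -np.log(probs), -np.log(1.0 - probs))


def focal_loss(labels, probs, alpha=0.25, gamma=2.0, eps=1e-7):
    """Per-sample focal loss (standard binary form, assumed here)."""
    labels = np.asarray(labels, dtype=np.float64)
    probs = np.clip(np.asarray(probs, dtype=np.float64), eps, 1.0 - eps)
    # Positive samples: well-classified ones (probs near 1) are down-weighted.
    pos = -alpha * (1.0 - probs) ** gamma * np.log(probs)
    # Negative samples: easy background (probs near 0) is down-weighted.
    neg = -(1.0 - alpha) * probs ** gamma * np.log(1.0 - probs)
    return np.where(labels == 1, pos, neg)


labels = np.array([1, 1, 0, 0])
probs = np.array([0.9, 0.1, 0.1, 0.9])  # easy/hard positive, easy/hard negative
fl = focal_loss(labels, probs)
ce = cross_entropy(labels, probs)
```

Comparing `fl` with `ce` element by element shows that the easy samples keep only a small fraction of their cross-entropy loss, while the hard samples keep most of it, so the many easy background frames no longer dominate the gradient.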
Example 2
Referring to fig. 5 and 6, a second embodiment of the present invention, which is different from the first embodiment, provides a verification method of an MDSSD face detection method based on an improved loss function, including:
In this embodiment, clustering the labeled images shows that all cluster centers have 4 different proportions, {0.55, 0.65, 0.75, 1}. When a 300 × 300 face image is input, the face pose and the data enhancement of the model cause faces of different scales to correspond to Ground Truth boxes of different proportions: the small-scale face proportions are close to {0.65, 0.75, 1}, and the large-scale face proportions are close to {0.55, 0.65, 1}.
The number of prior frames of each detection layer is determined by calculating the scale of the Ground Truth frame at each cluster center, and 17 prior frames are distributed to 7 different detection layers according to the receptive field size of each detection layer. The specific detection layer settings are shown in the following table:
Table 1: MDSSD detection layer parameter configuration.
Figure GDA0003434028550000091
Figure GDA0003434028550000101
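The clustering used to obtain the prior frames, with the IOU distance of claim 2 as the metric, can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the function names and the synthetic boxes are hypothetical, the centers are initialized from randomly chosen Ground Truth frames, and "taking the median" is interpreted as the per-cluster median of the member widths and heights.

```python
import numpy as np


def iou_wh(boxes, centroids):
    """IOU between (N, 2) boxes and (k, 2) centroids aligned at a common origin."""
    inter_w = np.minimum(boxes[:, None, 0], centroids[None, :, 0])
    inter_h = np.minimum(boxes[:, None, 1], centroids[None, :, 1])
    inter = inter_w * inter_h
    area_b = (boxes[:, 0] * boxes[:, 1])[:, None]
    area_c = (centroids[:, 0] * centroids[:, 1])[None, :]
    return inter / (area_b + area_c - inter)


def kmeans_iou(boxes, k, seed=0, max_iter=100):
    """k-means over (width, height) pairs with distance d = 1 - IOU."""
    boxes = np.asarray(boxes, dtype=np.float64)
    rng = np.random.default_rng(seed)
    centroids = boxes[rng.choice(len(boxes), size=k, replace=False)].copy()
    for _ in range(max_iter):
        assign = (1.0 - iou_wh(boxes, centroids)).argmin(axis=1)
        new = centroids.copy()
        for j in range(k):
            members = boxes[assign == j]
            if len(members):  # keep the old center if a cluster is empty
                new[j] = np.median(members, axis=0)
        if np.allclose(new, centroids):  # centers no longer change
            break
        centroids = new
    dist = 1.0 - iou_wh(boxes, centroids)
    return centroids, dist.argmin(axis=1), dist.min(axis=1).mean()


# Synthetic (width, height) pairs standing in for labeled Ground Truth frames.
boxes = np.array([[10, 10], [11, 10], [10, 11],
                  [100, 50], [101, 50], [100, 51]], dtype=np.float64)
centroids, assign, loss = kmeans_iou(boxes, k=2)
```

Running `kmeans_iou` for a range of k values and plotting the returned mean IOU distance gives the curve on which the elbow strategy of claim 3 is applied.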
Preferably, in order to better verify and explain the technical effects of the method of the present invention, this embodiment performs a comparison test between the conventional SSD algorithm and the method of the present invention, and compares the test results by means of scientific demonstration to verify the actual effects of the method.
In order to verify that the method has a higher recall rate and a better detection effect than the conventional method, the conventional SSD algorithm and the method of the present invention are each used to carry out a random measurement comparison on small faces in a certain unnatural scene.
Test environment: (1) Dell Tower server, Windows 10 operating system, NVIDIA GTX 1080 Ti GPU and Intel Core i7-8700 @ 3.20 GHz;
(2) 32 GB memory and 8 GB video memory;
(3) both the SSD model and the MDSSD model were implemented using Python 3.6 based on the TensorFlow 1.14 framework.
Table 2: Parameter settings.
Parameter | SSD network | MDSSD network
Backbone network initialization method | VGG16 | SSD
Batch size | 32 | 32
Optimization method | Adam | Adam
Adam_beta1 | 0.9 | 0.9
Adam_beta2 | 0.999 | 0.999
Learning rate | 0.001 | 0.001
Learning rate decay rate | 0.90 | 0.90
Number of iterations | 50000 | 50000
Table 3: Recall comparison results of the two methods.
Figure GDA0003434028550000102
Referring to tables 2 and 3, it can be seen intuitively that, under the same parameter settings, the recall rate of the conventional method gradually decreases as the number of training iterations increases, while the recall rate of the method of the present invention remains stable and is always higher than that of the conventional method, which verifies the actual technical effect of the method of the present invention.
It should be noted that the above-mentioned embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, which should be covered by the claims of the present invention.

Claims (6)

1. An MDSSD face detection method based on an improved loss function, characterized by comprising:
the MDSSD network detects the face area by using a prior frame mechanism and classifies and regresses candidate areas;
carrying out clustering analysis on the Ground Truth frames according to k-means, and searching for the optimal prior frame number, size and proportion;
the MDSSD network replaces the cross-entropy loss function in the classification network with Focal loss, and carries out face and background detection classification on the prior frames after the clustering analysis; the MDSSD network comprises,
performing zero filling on the deep feature map or the deep fusion layer, performing a deconvolution operation on the filled feature map in combination with a 3 × 3 convolution, and doubling the resolution of the feature map while keeping the receptive field range unchanged;
ensuring that the output dimension of the deconvolution operation matches the dimension of the shallow fusion feature map by using a number of convolution kernels equal to the channel dimension of the shallow feature map;
during MDSSD feature fusion, only an addition operation is carried out on corresponding positions of the shallow feature map and the deconvolution feature map so as to enhance effective context information;
the MDSSD carries out nonlinear mapping by adding an activation layer to the fusion layer, and the activated fusion layer is used as the final detection feature map;
further comprising,
the MDSSD network takes SSD as a basic network model;
eliminating the dropout layers of Block6 and Block7 in the SSD network;
adding a multi-layer fusion module Mixed_layer3 and single-layer fusion modules Mixed_layer4 and Mixed_layer7;
the MDSSD network model is additionally provided with an L2Normalization layer to reduce the difference from the detection layers;
the prior frame mechanism needs to take into account the effective receptive field, including,
the layers in the convolutional neural network are locally connected, so that neurons cannot sense all information of an original image;
the larger the receptive field, the more global information is acquired, namely the richer the global and high-level semantic features contained in the feature map;
the smaller the neuron receptive field, the lower-level the features contained in the feature map, and the more local and texture information is contained;
further comprising,
the prior frame needs to be matched with the Ground Truth frame to divide positive and negative samples;
the larger the difference between the size and proportion of the prior frame and those of the real Ground Truth frame, the larger the error in calculating the intersection over union;
the smaller the difference between the size and proportion of the prior frame and those of the real Ground Truth frame, the smaller the error in calculating the intersection over union.
2. The improved loss function based MDSSD face detection method of claim 1, wherein: performing the cluster analysis using the custom IOU distance as a metric distance, including,
d_IOU(box, centroid) = 1 - IOU(box, centroid)
the clustering loss is the IOU distance between the Ground Truth frame and the cluster center; the smaller the IOU distance, the larger the IOU value;
defining the cluster number k and randomly initializing the cluster centers (W_i, H_i), i ∈ {1, 2, …, k}, where W_i and H_i respectively represent the length and width of the cluster center;
placing the cluster center and the center of the Ground Truth frame at the coordinate origin and calculating the IOU distance between each Ground Truth frame and each cluster;
assigning each Ground Truth frame to the cluster with the minimum IOU distance, and recalculating the cluster centers after all the Ground Truth frames are assigned;
continuously updating until the cluster centers no longer change, and taking the median of the cluster center as the final prior frame size and proportion.
3. The improved loss function based MDSSD face detection method of claim 2, wherein: further comprising,
determining the optimal cluster number by using an elbow strategy, wherein when k is 17, the loss function decreases slowly and tends to be stable, and, comprehensively considering the settings of all detection layers, the optimal cluster number is 17.
4. The improved loss function based MDSSD face detection method of claim 3, wherein: the loss function comprises,
Figure FDA0003434028540000021
wherein x is the sample label, y' is the model output value, α is the sample balance factor, and γ is the sample weight adjustment factor.
5. The improved loss function based MDSSD face detection method of claim 4, wherein: further comprising,
when x is 1, i.e. the input is a positive sample, the larger the predicted value, the easier the sample is to classify and the smaller the sample weight.
6. The improved loss function based MDSSD face detection method of claim 5, wherein: further comprising,
the sample balance factor α adjusts the proportion of the positive and negative samples in the loss function, and α = 0.25 and γ = 2 are set in the model training process.
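The loss function of claim 4 is available in this text only as a figure reference. Assuming it follows the standard binary Focal loss, written with the symbols defined in claim 4 (the name L_fl is introduced here only for illustration), it would read:

```latex
L_{fl} =
\begin{cases}
-\alpha \,(1 - y')^{\gamma} \log y', & x = 1 \\
-(1 - \alpha)\, y'^{\gamma} \log (1 - y'), & x = 0
\end{cases}
```

Under this assumed form, with α = 0.25 and γ = 2, a positive sample predicted at y' = 0.9 receives the weight 0.25 × (1 - 0.9)^2 = 0.0025 on its log term, while a positive sample predicted at y' = 0.1 receives 0.25 × (1 - 0.1)^2 = 0.2025. This agrees with claim 5: the larger the predicted value for a positive sample, the smaller its weight.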
CN202011047720.5A 2020-09-29 2020-09-29 MDSSD face detection method based on improved loss function Active CN112163520B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011047720.5A CN112163520B (en) 2020-09-29 2020-09-29 MDSSD face detection method based on improved loss function

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011047720.5A CN112163520B (en) 2020-09-29 2020-09-29 MDSSD face detection method based on improved loss function

Publications (2)

Publication Number Publication Date
CN112163520A CN112163520A (en) 2021-01-01
CN112163520B true CN112163520B (en) 2022-02-15

Family

ID=73860556

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011047720.5A Active CN112163520B (en) 2020-09-29 2020-09-29 MDSSD face detection method based on improved loss function

Country Status (1)

Country Link
CN (1) CN112163520B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112949614B (en) * 2021-04-29 2021-09-10 成都市威虎科技有限公司 Face detection method and device for automatically allocating candidate areas and electronic equipment
CN113128479B (en) * 2021-05-18 2023-04-18 成都市威虎科技有限公司 Face detection method and device for learning noise region information
CN113111979B (en) * 2021-06-16 2021-09-07 上海齐感电子信息科技有限公司 Model training method, image detection method and detection device
CN113705341A (en) * 2021-07-16 2021-11-26 国家石油天然气管网集团有限公司 Small-scale face detection method based on generation countermeasure network
CN113724219A (en) * 2021-08-27 2021-11-30 重庆大学 Building surface disease detection method and system based on convolutional neural network

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11188794B2 (en) * 2017-08-10 2021-11-30 Intel Corporation Convolutional neural network framework using reverse connections and objectness priors for object detection
CN109800628B (en) * 2018-12-04 2023-06-23 华南理工大学 Network structure for enhancing detection performance of SSD small-target pedestrians and detection method
CN109858547A (en) * 2019-01-29 2019-06-07 东南大学 A kind of object detection method and device based on BSSD
CN110334594A (en) * 2019-05-28 2019-10-15 昆明理工大学 A kind of object detection method based on batch again YOLO algorithm of standardization processing
CN111222534B (en) * 2019-11-15 2022-10-11 重庆邮电大学 Single-shot multi-frame detector optimization method based on bidirectional feature fusion and more balanced L1 loss
CN111126472B (en) * 2019-12-18 2023-07-25 南京信息工程大学 SSD (solid State disk) -based improved target detection method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MDSSD:multi-scale deconvolutional single shot detector for small objects;Lisha CUI,et al.;《Science China(Information Sciences)》;20200228;第63卷(第02期);全文 *

Also Published As

Publication number Publication date
CN112163520A (en) 2021-01-01

Similar Documents

Publication Publication Date Title
CN112163520B (en) MDSSD face detection method based on improved loss function
CN109859190B (en) Target area detection method based on deep learning
CN112101190B (en) Remote sensing image classification method, storage medium and computing device
CN110929577A (en) Improved target identification method based on YOLOv3 lightweight framework
CN111523521A (en) Remote sensing image classification method for double-branch fusion multi-scale attention neural network
CN113762138B (en) Identification method, device, computer equipment and storage medium for fake face pictures
CN112927253B (en) Rock core FIB-SEM image segmentation method based on convolutional neural network
CN114332578A (en) Image anomaly detection model training method, image anomaly detection method and device
CN113569788B (en) Building semantic segmentation network model training method, system and application method
CN113066089B (en) Real-time image semantic segmentation method based on attention guide mechanism
CN112818849B (en) Crowd density detection algorithm based on context attention convolutional neural network for countermeasure learning
CN111291826A (en) Multi-source remote sensing image pixel-by-pixel classification method based on correlation fusion network
CN112329721A (en) Remote sensing small target detection method with lightweight model design
CN115222998B (en) Image classification method
CN111310598A (en) Hyperspectral remote sensing image classification method based on 3-dimensional and 2-dimensional mixed convolution
CN114972860A (en) Target detection method based on attention-enhanced bidirectional feature pyramid network
CN111340039A (en) Target detection method based on feature selection
CN112070040A (en) Text line detection method for video subtitles
CN113762396A (en) Two-dimensional image semantic segmentation method
CN115830449A (en) Remote sensing target detection method with explicit contour guidance and spatial variation context enhancement
CN116310386A (en) Shallow adaptive enhanced context-based method for detecting small central Net target
CN116403113A (en) Landslide identification method, system, equipment and medium for evolution pruning lightweight convolutional neural network
CN116452810A (en) Multi-level semantic segmentation method and device, electronic equipment and storage medium
CN111222534A (en) Single-shot multi-frame detector optimization method based on bidirectional feature fusion and more balanced L1 loss
CN116958325A (en) Training method and device for image processing model, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240520

Address after: Room A12, 26th Floor, Office, 17th to 26th Floor, Yulong Times Center, No. 1540 Heping Avenue, Qingshan District, Wuhan City, Hubei Province, 430080

Patentee after: Lichu Education Technology (Wuhan) Co.,Ltd.

Country or region after: China

Address before: 545006 268 East Ring Road, Central District, Liuzhou, the Guangxi Zhuang Autonomous Region

Patentee before: GUANGXI University OF SCIENCE AND TECHNOLOGY

Country or region before: China