CN112633085A - Face detection method, system, storage medium and terminal based on an attention guidance mechanism - Google Patents
- Publication number: CN112633085A (application CN202011425736.5A)
- Authority: CN (China)
- Prior art keywords: feature map, feature, generate, module, face detection
- Legal status: Granted
Classifications
- G06V40/161 — Human faces: detection; localisation; normalisation
- G06V40/168 — Human faces: feature extraction; face representation
- G06F18/214 — Pattern recognition: generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06N3/045 — Neural networks: combinations of networks
- G06N3/08 — Neural networks: learning methods
Abstract
The invention discloses a face detection method, system, storage medium and terminal based on an attention guidance mechanism. The method comprises: acquiring a target image to be detected and inputting it into a pre-trained face detection model; first performing feature extraction on the target image through the extended VGG16 in the convolution block to generate a feature map sequence, and selecting 6 layers from the sequence as the first-branch original feature maps; a context extraction module of the face detection model performing channel splicing on each feature map of the first-branch original feature maps to generate spliced feature maps; an attention guidance module of the face detection model collecting the semantic relations and spatial information corresponding to the spliced feature maps to generate collected feature maps; generating second-branch enhanced feature maps based on the first-branch original feature maps and the collected feature maps; and obtaining the detected face image from the enhanced feature maps. By adopting the embodiments of the application, face detection accuracy can be improved.
Description
Technical Field
The invention relates to the technical field of computer deep learning, and in particular to a face detection method, system, storage medium and terminal based on an attention guidance mechanism.
Background
In face detection tasks based on deep learning, small targets and small faces are difficult to detect and pose many technical challenges, because such regions have low resolution, are blurred, and contain heavy background noise.

Existing small-face detection methods mainly include: detecting small faces with a traditional image pyramid and multi-scale sliding windows; data augmentation methods that increase the number and variety of small-face samples to improve detection performance; feature-fusion methods that fuse multi-scale features from high and low layers; methods based on anchor sampling and matching strategies; and methods that exploit context information.

Since context information is crucial to performance in visual tasks, many detection algorithms design inter-layer fusion structures to extract it: DenseNet uses dense cross-layer connections for feature reuse, FPN fuses feature information from top and bottom layers, and DeepLabV3 uses an ASPP structure to enlarge the receptive field.

DSFD, a dual-branch face detection algorithm, combines the ideas of FPN and RFB and proposes a Feature Enhancement Module (FEM), which not only uses feature information across different layers but also obtains features with a larger receptive field through dilated convolution, thereby yielding more discriminative and robust features. However, the FEM merely groups and processes the FPN-fused feature maps and splices them to enlarge the receptive field; fine-grained and coarse-grained context features are not effectively fused, which reduces detection accuracy.
Disclosure of Invention
The embodiments of the application provide a face detection method, system, storage medium and terminal based on an attention guidance mechanism. The following presents a simplified summary in order to provide a basic understanding of some aspects of the disclosed embodiments. This summary is not an extensive overview and is intended neither to identify key or critical elements nor to delineate the scope of such embodiments. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is presented later.
In a first aspect, an embodiment of the present application provides a face detection method based on an attention guidance mechanism, the method comprising:

acquiring a target image to be detected, and inputting the target image into a pre-trained face detection model; the face detection model comprises a convolution block and an attention-guided feature enhancement module; the attention-guided feature enhancement module comprises an attention guidance module and a context extraction module;

performing a feature extraction operation on the target image using the extended VGG16 in the convolution block to generate a feature map sequence, and selecting 6 layers from the feature map sequence as the first-branch original feature maps;

performing channel splicing on each feature map of the first-branch original feature maps via the context extraction module to generate spliced feature maps;

collecting the semantic relations and spatial information corresponding to the spliced feature maps via the attention guidance module to generate collected feature maps;

generating second-branch enhanced feature maps based on the first-branch original feature maps and the collected feature maps;

and inputting the second-branch enhanced feature maps into the SSD detection head of the face detection model to obtain the detected face image.
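For orientation only, the following minimal PyTorch-style sketch shows how the steps of the first aspect chain together at inference time; the attribute names `backbone`, `agfem` and `ssd_head` are illustrative assumptions, not identifiers used by the invention:

```python
import torch

def detect_faces(model, image: torch.Tensor):
    """Illustrative inference flow; `image` is a (1, 3, H, W) tensor."""
    # feature extraction via the extended VGG16 in the convolution block
    original_maps = model.backbone(image)                    # of1..of6, first branch
    # context extraction + attention guidance on each detection layer
    enhanced_maps = [model.agfem(f) for f in original_maps]  # ef1..ef6, second branch
    # SSD-style detection head applied to the enhanced feature maps
    return model.ssd_head(enhanced_maps)                     # detected face boxes
```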
Optionally, performing channel splicing on each feature map of the first-branch original feature maps via the context extraction module to generate spliced feature maps includes:

the context extraction module groups the channels of the first-branch original feature maps to generate three groups of feature map sequences;

the context extraction module performs feature processing on the three groups of feature map sequences to generate three groups of feature-processed feature map sequences;

the context extraction module performs feature fusion again on each feature map in the three groups of dilated-convolution feature map sequences via 1 × 1 convolutions to generate three groups of re-convolved feature map sequences;
and the context extraction module splices the three groups of re-convolved feature map sequences to generate a spliced feature map.
Optionally, the context extraction module performing feature processing on the three groups of feature map sequences to generate three groups of feature-processed feature map sequences includes:
the context extraction module uses different dilated convolution layers on the first of the three groups of feature channels to extract multi-scale face feature information and generate a first refined feature map sequence; the dilated convolution kernel is 3 × 3 and the dilation rate is 3;

the context extraction module applies a 1 × 1 convolution to the second of the three groups of feature channels to increase the number of effective feature weights and generate a second refined feature map sequence;

the context extraction module performs global feature extraction on the third of the three groups of feature channels to generate a global feature map sequence;

the context extraction module performs channel splicing on the first refined feature map sequence, the second refined feature map sequence and the global feature map sequence to generate a spliced feature map sequence;

and the context extraction module performs feature fusion on the three groups of dilated-convolution feature map sequences using 1 × 1 convolutions to generate three groups of feature-processed feature map sequences.
Optionally, the context extraction module performs global feature extraction on a third group of the three groups of feature channels to generate a global feature map sequence, including:
the context extraction module applies Global Average Pooling (GAP) to the third of the three groups of feature channels to generate a pooled feature map sequence;

the context extraction module changes the channel dimension of the pooled feature map sequence with a 1 × 1 convolution to generate a changed feature map sequence;

and the context extraction module upsamples the changed feature map sequence to the spatial dimension of a preset threshold to generate the global feature map sequence.
Optionally, collecting the semantic relations and spatial information corresponding to the spliced feature maps via the attention guidance module and generating the collected feature maps includes:
the attention guidance module extracts the semantic relation between any two positions in the spliced feature map;

the attention guidance module collects the spatial information between any two positions in the spliced feature map;

and the attention guidance module combines the semantic relation and the spatial information to generate the collected feature map.
Optionally, the pre-trained face detection model is generated according to the following steps:
adopting the expanded convolutional neural network VGG16 to create a backbone network;
adding the convolution block and the attention-guided feature enhancement module to the created backbone network to generate a face detection model; wherein the attention-guided feature enhancement module is composed of an attention guidance module (AM) and a context extraction module (CEM);
loading the detection layer sequence of the first branch, taking 6 layers in a backbone network of the face detection model as the detection layer sequence of the first branch, and generating a replaced face detection model;
collecting a training sample with a face image, inputting the training sample with the face image into the replaced face detection model for training, and outputting a progressive anchor loss value of the face detection model;
and when the progressive anchor loss value of the face detection model reaches a preset minimum, generating the trained face detection model.

Optionally, generating the trained face detection model when the progressive anchor loss value reaches the preset minimum includes:

when the progressive anchor loss value of the face detection model does not reach the preset minimum, continuing to execute the step of collecting training samples with face images; or

when the number of training passes over the training samples with face images has not reached a preset count, continuing to execute the step of collecting training samples with face images.
In a second aspect, an embodiment of the present application provides a face detection system based on an attention guidance mechanism, where the system includes:
the image acquisition module is used for acquiring a target image to be detected and inputting it into a pre-trained face detection model; the face detection model comprises a convolution block and an attention-guided feature enhancement module; the attention-guided feature enhancement module comprises an attention guidance module and a context extraction module;

the first-branch original feature map generation module is used for performing a feature extraction operation on the target image using the extended VGG16 in the convolution block to generate a feature map sequence, and selecting 6 layers from the feature map sequence as the first-branch original feature maps;
the first feature map generation module is used for performing channel splicing on each feature map in the first branch original feature map based on the context extraction module to generate a spliced feature map;
the second feature map generation module is used for acquiring semantic relations and spatial information corresponding to the spliced feature maps according to the attention guidance module and generating the acquired feature maps;
the enhanced feature map generation module is used for generating a second branch enhanced feature map based on the first branch original feature map and the collected feature map;
and the face image output module is used for inputting the second branch enhanced feature map into the SSD target detection algorithm head of the face detection model to obtain a detected face image.
In a third aspect, embodiments of the present application provide a computer storage medium having stored thereon a plurality of instructions adapted to be loaded by a processor and to perform the above-mentioned method steps.
In a fourth aspect, an embodiment of the present application provides a terminal, which may include: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the above-mentioned method steps.
The technical scheme provided by the embodiment of the application can have the following beneficial effects:
In the embodiments of the application, a face detection system based on an attention guidance mechanism first acquires a target image to be detected and inputs it into a pre-trained face detection model; the face detection model comprises a convolution block and an attention-guided feature enhancement module, which in turn comprises an attention guidance module and a context extraction module. The system then performs a feature extraction operation on the target image using the extended VGG16 in the convolution block to generate a feature map sequence and selects 6 layers from the sequence as the first-branch original feature maps; performs channel splicing on each feature map of the first-branch original feature maps via the context extraction module to generate spliced feature maps; collects the semantic relations and spatial information corresponding to the spliced feature maps via the attention guidance module to generate collected feature maps; generates the second-branch enhanced feature maps based on the first-branch original feature maps and the collected feature maps; and finally inputs the second-branch enhanced feature maps into the SSD detection head of the face detection model to obtain the detected face image. By enhancing features through the attention guidance module and the context extraction module, the face detection model focuses more on face features, so that face detection performance is greatly improved and detection accuracy is further increased.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
Fig. 1 is a schematic flowchart of a face detection method based on an attention guidance mechanism according to an embodiment of the present application;

Fig. 2 is a schematic network structure diagram of the context extraction module in the face detection network according to an embodiment of the present application;

Fig. 3 is a structural diagram of the attention guidance module in the attention-guided feature enhancement module according to an embodiment of the present application;

Fig. 4 is a structural diagram of the face detection network according to an embodiment of the present application;

Fig. 5 is a schematic flowchart of another face detection method based on an attention guidance mechanism according to an embodiment of the present application;

Fig. 6 is a schematic structural diagram of a face detection system based on an attention guidance mechanism according to an embodiment of the present application;

Fig. 7 is a schematic diagram of a terminal according to an embodiment of the present application.
Detailed Description
The following description and the drawings sufficiently illustrate specific embodiments of the invention to enable those skilled in the art to practice them.
It should be understood that the described embodiments are only some embodiments of the invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of systems and methods consistent with certain aspects of the invention, as detailed in the appended claims.
In the description of the present invention, it is to be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Those skilled in the art can understand the specific meanings of the above terms in the present invention according to the specific circumstances. In addition, in the description of the present invention, "a plurality" means two or more unless otherwise specified. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
To date, existing small-face detection methods mainly include: detecting small faces with a traditional image pyramid and multi-scale sliding windows; data augmentation methods that increase the number and variety of small-face samples to improve detection performance; feature-fusion methods that fuse multi-scale features from high and low layers; methods based on anchor sampling and matching strategies; and methods that exploit context information. Since context information is crucial to performance in visual tasks, many detection algorithms design inter-layer fusion structures to extract it: DenseNet uses dense cross-layer connections for feature reuse, FPN fuses feature information from top and bottom layers, and DeepLabV3 uses an ASPP structure to enlarge the receptive field.

DSFD, a dual-branch face detection algorithm, combines the ideas of FPN and RFB and proposes a Feature Enhancement Module (FEM), which not only uses feature information across different layers but also obtains features with a larger receptive field through dilated convolution, thereby yielding more discriminative and robust features. However, the FEM merely groups and processes the FPN-fused feature maps and splices them to enlarge the receptive field; fine-grained and coarse-grained context features are not effectively fused, which reduces detection accuracy. The present application therefore provides a face detection method, system, storage medium and terminal based on an attention guidance mechanism to solve the above problems in the related art. In the technical solution provided by the application, after enhancement by the attention guidance module and the context extraction module, the face detection model focuses more on face features, so that face detection performance is greatly improved and detection accuracy is further increased; this is described in detail below through exemplary embodiments.
The following describes in detail a face detection method based on an attention guidance mechanism according to an embodiment of the present application with reference to fig. 1 to 5. The method may be implemented by a computer program and can run on a face detection system based on an attention guidance mechanism built on the von Neumann architecture. The computer program may be integrated into an application or may run as an independent tool application.
Referring to fig. 1, a schematic flow chart of a face detection method based on an attention-guiding mechanism is provided in an embodiment of the present application. As shown in fig. 1, the method of the embodiment of the present application may include the following steps:
S101, acquiring a target image to be detected, and inputting the target image to be detected into a pre-trained face detection model; the face detection model comprises a convolution block and an attention-guided feature enhancement module; the attention-guided feature enhancement module comprises an attention guidance module and a context extraction module;
The target image is an image to be detected that contains one or more faces; it may be a face image acquired in real time or one stored on a computer, i.e., obtained either online or offline.
Generally, the pre-trained face detection model is a mathematical model with a small-face detection capability. When training the face detection model, the extended convolutional neural network VGG16 is first used to create a backbone network, and the convolution block and the attention-guided feature enhancement module are added to the created backbone network to generate the face detection model; the attention-guided feature enhancement module is composed of an attention guidance module (AM) and a context extraction module (CEM). The detection layer sequence of the first branch is then loaded, taking 6 layers of the backbone network of the face detection model as the detection layer sequence of the first branch to generate the replaced face detection model. Training samples with face images are collected and input into the replaced face detection model for training, and the progressive anchor loss value of the face detection model is output. Finally, when the progressive anchor loss value reaches a preset minimum, the trained face detection model is generated.
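As a hedged illustration of this training procedure, the sketch below implements the described stopping criteria (loss below a preset minimum, or a preset number of passes); the optimizer choice, learning rate and the `progressive_anchor_loss` callable are assumptions, since the patent does not specify them:

```python
import torch

def train_face_detector(model, loader, progressive_anchor_loss,
                        min_loss=0.05, max_epochs=100, lr=1e-3):
    """Hypothetical training loop: stop once the progressive anchor loss
    value falls below a preset minimum or the preset pass count is reached."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for epoch in range(max_epochs):
        epoch_loss = 0.0
        for images, targets in loader:             # training samples with faces
            optimizer.zero_grad()
            first_out, second_out = model(images)  # both detection branches
            loss = progressive_anchor_loss(first_out, second_out, targets)
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item()
        if epoch_loss / len(loader) < min_loss:    # preset minimum reached
            break
    return model
```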
Further, when the progressive anchor loss value of the face detection model does not reach a preset minimum value, continuing to execute the step of collecting the training sample with the face image; or when the training times of the training samples with the face images do not reach the preset times, continuing to execute the step of collecting the training samples with the face images.
In particular, the context extraction module (CEM) can exploit rich context information from receptive fields of various sizes, and the attention guidance module (AM) can enhance salient context dependencies.
In a possible implementation manner, when a face in a target image is detected, a target image with the face is collected by a camera, and then the target image with the face is input into a pre-trained face detection model for processing.
S102, performing a feature extraction operation on the target image to be detected using the extended VGG16 in the convolution block to generate a feature map sequence, and selecting 6 layers from the feature map sequence as the first-branch original feature maps;
In general, the face detection model is provided with a convolution block. A convolution operation is first performed on the target image through the convolution block to generate a series of feature maps of the target image, and 6 layers of feature maps are then selected from the generated series as the original feature maps of the first branch.
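A sketch of such an extended backbone is given below, using the six detection layers named later in this description (conv3_3, conv4_3, conv5_3, conv_fc7, conv6_2, conv7_2); the channel counts of the auxiliary layers are assumptions modelled on SSD-style extensions, not values taken from the patent:

```python
import torch.nn as nn
from torchvision.models import vgg16

class ExtendedVGG16(nn.Module):
    """Extended VGG16 sketch: fully connected layers replaced by auxiliary
    convolutions; six intermediate maps are returned as the first branch."""
    def __init__(self):
        super().__init__()
        self.features = vgg16(weights=None).features  # convolutional part only
        self.taps = [15, 22, 29]   # ReLU outputs of conv3_3, conv4_3, conv5_3
        self.conv_fc7 = nn.Sequential(                # replaces fc6/fc7
            nn.Conv2d(512, 1024, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(1024, 1024, 1), nn.ReLU(inplace=True))
        self.conv6_2 = nn.Sequential(
            nn.Conv2d(1024, 256, 1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 512, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.conv7_2 = nn.Sequential(
            nn.Conv2d(512, 128, 1), nn.ReLU(inplace=True),
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU(inplace=True))

    def forward(self, x):
        maps = []
        for i, layer in enumerate(self.features):
            x = layer(x)
            if i in self.taps:
                maps.append(x)                 # of1, of2, of3
        x = self.conv_fc7(x); maps.append(x)   # of4
        x = self.conv6_2(x);  maps.append(x)   # of5
        x = self.conv7_2(x);  maps.append(x)   # of6
        return maps
```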
S103, channel splicing is carried out on each feature map in the first branch original feature map based on the context extraction module, and a spliced feature map is generated;
In the embodiment of the application, when performing channel splicing on each feature map of the first-branch original feature maps, the context extraction module first groups the channels of the first-branch original feature maps to generate three groups of feature map sequences, then performs feature processing on the three groups to generate three groups of feature-processed feature map sequences, then performs feature fusion again on each feature map in the three groups of dilated-convolution feature map sequences via 1 × 1 convolutions to generate three groups of re-convolved feature map sequences, and finally splices the three re-convolved groups to generate the spliced feature map.
Further, when the context extraction module performs feature processing on the three groups of feature map sequences to generate the three groups of feature-processed feature map sequences, it first uses different dilated convolution layers on the first of the three groups of feature channels to extract multi-scale face feature information and generate a first refined feature map sequence; the dilated convolution kernel is 3 × 3 and the dilation rate is 3. It then applies a 1 × 1 convolution to the second group to increase the number of effective feature weights and generate a second refined feature map sequence, and performs global feature extraction on the third group to generate a global feature map sequence. The first refined, second refined and global feature map sequences are channel-spliced to generate a spliced feature map sequence, and finally feature fusion is performed on the three groups of dilated-convolution feature map sequences using 1 × 1 convolutions to generate the three groups of feature-processed feature map sequences.
When different dilated convolution layers are applied to the first of the three groups of feature channels to extract multi-scale face feature information, the first group is divided again into 3 sub-groups: the first sub-group is processed with one dilated convolution, the second with two stacked dilated convolutions, and the third with three stacked dilated convolutions; the three processed sub-groups are finally spliced to generate the first refined feature map sequence.
Further, when the context extraction module performs global feature extraction on the third of the three groups of feature channels to generate the global feature map sequence, it first applies Global Average Pooling (GAP) to the third group to generate a pooled feature map sequence, then changes the channel dimension of the pooled sequence with a 1 × 1 convolution to generate a changed feature map sequence, and finally upsamples the changed sequence to the spatial dimension of a preset threshold to generate the global feature map sequence.
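Pulling the above steps together, the following sketch gives one plausible reading of the context extraction module; the exact sub-group arithmetic and activation choices are assumptions, since the patent specifies only the grouping, the 3 × 3 dilated convolutions with dilation rate 3, the 1 × 1 convolutions, and the GAP branch:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextExtractionModule(nn.Module):
    """CEM sketch: channels split into three groups, processed by
    (a) cascades of 1/2/3 dilated 3x3 convolutions (rate 3),
    (b) a 1x1 convolution, (c) global average pooling, then spliced
    along channels and fused by a final 1x1 convolution."""
    def __init__(self, channels):
        super().__init__()
        g = channels // 3                  # per-group channel count
        s = g // 3                         # sub-group size inside group 1
        self.g, self.s = g, s
        def dconv(c):                      # 3x3 dilated conv, dilation rate 3
            return nn.Sequential(
                nn.Conv2d(c, c, 3, padding=3, dilation=3), nn.ReLU(inplace=True))
        self.sub1 = dconv(s)                               # one dilated conv
        self.sub2 = nn.Sequential(dconv(s), dconv(s))      # two stacked
        c3 = g - 2 * s
        self.sub3 = nn.Sequential(dconv(c3), dconv(c3), dconv(c3))  # three stacked
        self.branch2 = nn.Conv2d(g, g, 1)                  # 1x1 conv branch
        cg = channels - 2 * g
        self.branch3 = nn.Conv2d(cg, cg, 1)                # GAP branch 1x1 conv
        self.fuse = nn.Conv2d(channels, channels, 1)       # final 1x1 fusion

    def forward(self, fd):
        g, s = self.g, self.s
        x1, x2, x3 = fd[:, :g], fd[:, g:2 * g], fd[:, 2 * g:]
        # group 1 re-divided into 3 sub-groups with 1/2/3 dilated convolutions
        y1 = torch.cat([self.sub1(x1[:, :s]),
                        self.sub2(x1[:, s:2 * s]),
                        self.sub3(x1[:, 2 * s:])], dim=1)
        y2 = self.branch2(x2)              # increases effective feature weights
        # group 3: global average pooling -> 1x1 conv -> upsample back
        y3 = F.adaptive_avg_pool2d(x3, 1)
        y3 = F.interpolate(self.branch3(y3), size=fd.shape[-2:], mode='nearest')
        return self.fuse(torch.cat([y1, y2, y3], dim=1))   # spliced map Fc
```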
S104, collecting the semantic relations and spatial information corresponding to the spliced feature maps via the attention guidance module to generate collected feature maps;
in a possible implementation manner, the attention guiding module firstly extracts the semantic relationship between any two positions in the spliced feature map, then collects the spatial information between any two positions in the spliced feature map, and finally generates the collected feature map after combining the semantic relationship and the spatial information.
For example, as shown in fig. 2, which is a schematic network structure diagram of the context extraction module in the face detection network provided in an embodiment of the present application: the target image is first convolved by the convolution block to generate multi-layer feature maps Fd. The context extraction module then groups the channels, dividing Fd into three groups for dilated convolution processing; after processing, channel splicing is performed and a first feature map is output. The channel dimension is then changed with a 1 × 1 convolution to generate a changed feature map sequence, which is finally upsampled to the spatial dimension of a preset threshold to generate the global feature map sequence. Finally, channel splicing followed by 1 × 1 convolution processing yields the feature map Fc.
S105, generating a second branch enhanced feature map based on the first branch original feature map and the collected feature map;
the collected feature map is obtained by performing Global Average Pooling (GAP) on Fc.
In a possible implementation, after the collected feature map is obtained, the attention guidance module multiplies it element by element with the first-branch original feature map and then adds the results, finally generating the enhanced feature map of the second branch.
For example, as shown in fig. 3, which is a structural diagram of the attention guidance module in the attention-guided feature enhancement module provided in this embodiment of the application: Fc is the feature map generated by the context extraction module, and Fd is the first-branch original feature map. After Fc and Fd are obtained, each is processed by Global Average Pooling (GAP); the results are multiplied element by element with the feature maps, and the products are finally added element by element to generate the final second-branch enhanced feature map Fa.
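A minimal sketch of this step, under one plausible reading of fig. 3 (the sigmoid gating is an assumption; the patent only states GAP, element-wise multiplication and element-wise addition):

```python
import torch
import torch.nn.functional as F

def attention_guidance(fc: torch.Tensor, fd: torch.Tensor) -> torch.Tensor:
    """AM sketch: channel descriptors from global average pooling of each
    map reweight the other map element-wise; the two reweighted maps are
    then added element-wise to form the second-branch enhanced map Fa."""
    wc = torch.sigmoid(F.adaptive_avg_pool2d(fc, 1))  # (N, C, 1, 1) weights
    wd = torch.sigmoid(F.adaptive_avg_pool2d(fd, 1))
    return fc * wd + fd * wc                          # enhanced map Fa
```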
S106, inputting the second-branch enhanced feature maps into the SSD detection head of the face detection model to obtain the detected face image.
For example, as shown in fig. 4, which is a structural diagram of the face detection network provided in the present application: first, VGG16 is extended to serve as the base backbone network of DSFD, i.e., the fully connected layers of VGG16 are replaced with auxiliary convolutional layers. The convolutional layers selected in the present application are:

conv3_3, conv4_3, conv5_3, conv_fc7, conv6_2 and conv7_2, which serve as the detection layers of the first branch and generate 6 original feature maps, named of1, of2, of3, of4, of5 and of6. The attention-guided feature enhancement module proposed in this application then converts the 6 original feature maps into 6 attention-guided feature maps, named ef1, ef2, ef3, ef4, ef5 and ef6, which have the same sizes as the corresponding original feature maps; by inputting these into the SSD-style head of the face detection model, the detection layers of the second branch are constructed. After using the attention guidance module to enlarge the receptive field, together with the new anchor design strategy, it is in principle unnecessary for the three sizes (stride, anchor, receptive field) to satisfy the equal-proportion-interval principle; DSFD is thus more flexible and more robust. Meanwhile, the original detection layers of the first branch and the detection layers constituting the second branch have 2 different loss values, named the first-shot progressive anchor loss (FSL) and the second-shot progressive anchor loss (SSL), respectively.
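The dual-branch wiring can be sketched as follows, reusing the illustrative `ExtendedVGG16`, `ContextExtractionModule` and `attention_guidance` defined above; the head layout (a single 3 × 3 convolution predicting 4 box offsets and 1 face score per anchor) and the channel list are assumptions, not the patent's exact configuration:

```python
import torch.nn as nn

class DualBranchDetector(nn.Module):
    """Sketch of the fig. 4 structure: first branch on of1..of6 (FSL),
    second branch on the attention-guided maps ef1..ef6 (SSL)."""
    def __init__(self, channel_list=(256, 512, 512, 1024, 512, 256),
                 num_anchors=1):
        super().__init__()
        self.backbone = ExtendedVGG16()
        self.agfems = nn.ModuleList(
            ContextExtractionModule(c) for c in channel_list)
        self.first_heads = nn.ModuleList(      # first-shot detection layers
            nn.Conv2d(c, num_anchors * 5, 3, padding=1) for c in channel_list)
        self.second_heads = nn.ModuleList(     # second-shot detection layers
            nn.Conv2d(c, num_anchors * 5, 3, padding=1) for c in channel_list)

    def forward(self, x):
        of = self.backbone(x)                  # of1..of6
        ef = [attention_guidance(m(f), f)      # CEM then AM -> ef1..ef6
              for m, f in zip(self.agfems, of)]
        first = [h(f) for h, f in zip(self.first_heads, of)]
        second = [h(f) for h, f in zip(self.second_heads, ef)]
        return first, second
```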
In the embodiments of the application, a face detection system based on an attention guidance mechanism first acquires a target image to be detected and inputs it into a pre-trained face detection model; the face detection model comprises a convolution block and an attention-guided feature enhancement module, which in turn comprises an attention guidance module and a context extraction module. The system then performs a feature extraction operation on the target image using the extended VGG16 in the convolution block to generate a feature map sequence and selects 6 layers from the sequence as the first-branch original feature maps; performs channel splicing on each feature map of the first-branch original feature maps via the context extraction module to generate spliced feature maps; collects the semantic relations and spatial information corresponding to the spliced feature maps via the attention guidance module to generate collected feature maps; generates the second-branch enhanced feature maps based on the first-branch original feature maps and the collected feature maps; and finally inputs the second-branch enhanced feature maps into the SSD detection head of the face detection model to obtain the detected face image. By enhancing features through the attention guidance module and the context extraction module, the face detection model focuses more on face features, so that face detection performance is greatly improved and detection accuracy is further increased.
Please refer to fig. 5, which is a schematic flowchart of another face detection method based on an attention guidance mechanism according to an embodiment of the present disclosure. The method may include the following steps:
S201, acquiring a target image to be detected, and inputting the target image to be detected into a pre-trained face detection model; the face detection model comprises a convolution block and an attention-guided feature enhancement module; the attention-guided feature enhancement module comprises an attention guidance module and a context extraction module;

S202, performing a feature extraction operation on the target image to be detected using the extended VGG16 in the convolution block to generate a feature map sequence, and selecting 6 layers from the feature map sequence as the first-branch original feature maps;
S203, the context extraction module groups the channels of the first-branch original feature maps to generate three groups of feature map sequences;

S204, the context extraction module uses different dilated convolution layers on the first of the three groups of feature channels to extract multi-scale face feature information and generate a first refined feature map sequence; the dilated convolution kernel is 3 × 3 and the dilation rate is 3;

S205, the context extraction module applies a 1 × 1 convolution to the second of the three groups of feature channels to increase the number of effective feature weights and generate a second refined feature map sequence;

S206, the context extraction module performs global feature extraction on the third of the three groups of feature channels to generate a global feature map sequence;

S207, the context extraction module performs channel splicing on the first refined feature map sequence, the second refined feature map sequence and the global feature map sequence to generate a spliced feature map sequence;

S208, the context extraction module performs feature fusion on the three groups of dilated-convolution feature map sequences using 1 × 1 convolutions to generate three groups of feature-processed feature map sequences;

S209, the context extraction module performs feature fusion again on each feature map in the three groups of dilated-convolution feature map sequences via 1 × 1 convolutions to generate three groups of re-convolved feature map sequences;

S210, the context extraction module splices the three groups of re-convolved feature map sequences to generate a spliced feature map;

S211, the attention guidance module extracts the semantic relation between any two positions in the spliced feature map;

S212, the attention guidance module collects the spatial information between any two positions in the spliced feature map;

S213, the attention guidance module combines the semantic relation and the spatial information to generate the collected feature map.
Further, a second-branch enhanced feature map is generated based on the first-branch original feature maps and the collected feature maps, and the second-branch enhanced feature map is input into the SSD detection head of the face detection model to obtain the detected face image.
In the embodiments of the application, a face detection system based on an attention guidance mechanism first acquires a target image to be detected and inputs it into a pre-trained face detection model; the face detection model comprises a convolution block and an attention-guided feature enhancement module, which in turn comprises an attention guidance module and a context extraction module. The system then performs a feature extraction operation on the target image using the extended VGG16 in the convolution block to generate a feature map sequence and selects 6 layers from the sequence as the first-branch original feature maps; performs channel splicing on each feature map of the first-branch original feature maps via the context extraction module to generate spliced feature maps; collects the semantic relations and spatial information corresponding to the spliced feature maps via the attention guidance module to generate collected feature maps; generates the second-branch enhanced feature maps based on the first-branch original feature maps and the collected feature maps; and finally inputs the second-branch enhanced feature maps into the SSD detection head of the face detection model to obtain the detected face image. By enhancing features through the attention guidance module and the context extraction module, the face detection model focuses more on face features, so that face detection performance is greatly improved and detection accuracy is further increased.
The following are embodiments of systems of the present invention that may be used to perform embodiments of methods of the present invention. For details which are not disclosed in the embodiments of the system of the present invention, reference is made to the embodiments of the method of the present invention.
Referring to fig. 6, a schematic structural diagram of a face detection system based on an attention guidance mechanism according to an exemplary embodiment of the present invention is shown. The face detection system based on the attention guidance mechanism can be implemented as all or part of an intelligent robot through software, hardware, or a combination of both. The system 1 comprises an image acquisition module 10, a first-branch original feature map generation module 20, a first feature map generation module 30, a second feature map generation module 40, an enhanced feature map generation module 50 and a face image output module 60.
The image acquisition module 10 is configured to acquire a target image to be detected and input it into a pre-trained face detection model; the face detection model comprises a convolution block and an attention-guided feature enhancement module; the attention-guided feature enhancement module comprises an attention guidance module and a context extraction module;

the first-branch original feature map generation module 20 is configured to perform a feature extraction operation on the target image to be detected using the extended VGG16 in the convolution block to generate a feature map sequence, and to select 6 layers from the feature map sequence as the first-branch original feature maps;
a first feature map generation module 30, configured to perform channel splicing on each feature map in the first branch original feature map based on the context extraction module, and generate a spliced feature map;
the second feature map generation module 40 is configured to collect, via the attention guidance module, the semantic relations and spatial information corresponding to the spliced feature maps and to generate the collected feature maps;
an enhanced feature map generation module 50, configured to generate a second branch enhanced feature map based on the first branch original feature map and the acquired feature map;
and the face image output module 60 is configured to input the second branch enhanced feature map into the SSD target detection algorithm head of the face detection model, and obtain a detected face image.
It should be noted that, when the face detection system based on the attention guidance mechanism provided in the foregoing embodiment executes the face detection method based on the attention guidance mechanism, the division into the above functional modules is only an example; in practical applications, the functions may be assigned to different functional modules as needed, i.e., the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the face detection system provided by the above embodiment and the embodiments of the face detection method based on the attention guidance mechanism belong to the same concept; for the detailed implementation process, refer to the method embodiments, which are not repeated here.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
In the embodiments of the application, a face detection system based on an attention guidance mechanism first acquires a target image to be detected and inputs it into a pre-trained face detection model; the face detection model comprises a convolution block and an attention-guided feature enhancement module, which in turn comprises an attention guidance module and a context extraction module. The system then performs a feature extraction operation on the target image using the extended VGG16 in the convolution block to generate a feature map sequence and selects 6 layers from the sequence as the first-branch original feature maps; performs channel splicing on each feature map of the first-branch original feature maps via the context extraction module to generate spliced feature maps; collects the semantic relations and spatial information corresponding to the spliced feature maps via the attention guidance module to generate collected feature maps; generates the second-branch enhanced feature maps based on the first-branch original feature maps and the collected feature maps; and finally inputs the second-branch enhanced feature maps into the SSD detection head of the face detection model to obtain the detected face image. By enhancing features through the attention guidance module and the context extraction module, the face detection model focuses more on face features, so that face detection performance is greatly improved and detection accuracy is further increased.
The present invention also provides a computer-readable medium on which program instructions are stored; when executed by a processor, the program instructions implement the face detection method based on an attention guidance mechanism provided by the above method embodiments.

The present invention also provides a computer program product containing instructions which, when run on a computer, cause the computer to perform the face detection method based on an attention guidance mechanism of the above method embodiments.
Please refer to fig. 7, which provides a schematic structural diagram of a terminal according to an embodiment of the present application. As shown in fig. 7, terminal 1000 can include: at least one processor 1001, at least one network interface 1004, a user interface 1003, memory 1005, at least one communication bus 1002.
Wherein a communication bus 1002 is used to enable connective communication between these components.
The user interface 1003 may include a Display screen (Display) and a Camera (Camera), and the optional user interface 1003 may also include a standard wired interface and a wireless interface.
The network interface 1004 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface), among others.
The Memory 1005 may include a Random Access Memory (RAM) or a Read-Only Memory (Read-Only Memory). Optionally, the memory 1005 includes a non-transitory computer-readable medium. The memory 1005 may be used to store an instruction, a program, code, a set of codes, or a set of instructions. The memory 1005 may include a stored program area and a stored data area, wherein the stored program area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the various method embodiments described above, and the like; the storage data area may store data and the like referred to in the above respective method embodiments. The memory 1005 may optionally be at least one memory system located remotely from the processor 1001. As shown in fig. 7, a memory 1005, which is a type of computer storage medium, may include an operating system, a network communication module, a user interface module, and a face detection application based on an attention-directed mechanism.
In the terminal 1000 shown in fig. 7, the user interface 1003 is mainly used as an interface for providing input for a user and acquiring data input by the user, and the processor 1001 may be configured to invoke the face detection application based on an attention guidance mechanism stored in the memory 1005 and specifically perform the following operations:

acquiring a target image to be detected, and inputting the target image to be detected into a pre-trained face detection model; the face detection model comprises a convolution block and an attention-guided feature enhancement module; the attention-guided feature enhancement module comprises an attention guidance module and a context extraction module;

performing a feature extraction operation on the target image to be detected using the extended VGG16 in the convolution block to generate a feature map sequence, and selecting 6 layers from the feature map sequence as the first-branch original feature maps;

performing channel splicing on each feature map of the first-branch original feature maps via the context extraction module to generate spliced feature maps;

collecting the semantic relations and spatial information corresponding to the spliced feature maps via the attention guidance module to generate collected feature maps;

generating second-branch enhanced feature maps based on the first-branch original feature maps and the collected feature maps;

and inputting the second-branch enhanced feature maps into the SSD detection head of the face detection model to obtain the detected face image.
In an embodiment, when the processor 1001 performs channel splicing on each feature map in the first branch original feature map based on the context extraction module to generate a spliced feature map, the following operations are specifically performed:
the context extraction module carries out channel grouping on the first branch original feature map to generate three groups of feature map sequences;
the context extraction module performs feature processing on the three groups of feature map sequences to generate three groups of feature-processed feature map sequences;

the context extraction module performs feature fusion again on each feature map in the three groups of dilated-convolution feature map sequences via 1 × 1 convolutions to generate three groups of re-convolved feature map sequences;
and the context extraction module splices the three groups of re-convolved feature map sequences to generate a spliced feature map.
In an embodiment, when the processor 1001 performs feature processing on the three groups of feature map sequences via the context extraction module to generate three groups of feature-processed feature map sequences, the following operations are specifically performed:
the context extraction module uses different dilated convolution layers on the first of the three groups of feature channels to extract multi-scale face feature information and generate a first refined feature map sequence; the dilated convolution kernel is 3 × 3 and the dilation rate is 3;

the context extraction module applies a 1 × 1 convolution to the second of the three groups of feature channels to increase the number of effective feature weights and generate a second refined feature map sequence;

the context extraction module performs global feature extraction on the third of the three groups of feature channels to generate a global feature map sequence;

the context extraction module performs channel splicing on the first refined feature map sequence, the second refined feature map sequence and the global feature map sequence to generate a spliced feature map sequence;

and the context extraction module performs feature fusion on the three groups of dilated-convolution feature map sequences using 1 × 1 convolutions to generate three groups of feature-processed feature map sequences.
In one embodiment, when the processor 1001 executes the context extraction module to perform global feature extraction on the third of the three feature-channel groups and generate a global feature map sequence, the following operations are specifically performed:
the context extraction module applies global average pooling (GAP) to the third of the three feature-channel groups to generate a pooled feature map sequence;
the context extraction module changes the channel dimension of the pooled feature map sequence using a 1×1 convolution to generate a changed feature map sequence;
and the context extraction module upsamples the changed feature map sequence to a preset spatial dimension to generate the global feature map sequence, as sketched below.
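A minimal sketch of this global branch follows; restoring the input's own spatial size and the nearest-neighbour upsampling mode are assumptions where the text only specifies "a preset spatial dimension".

```python
import torch.nn as nn
import torch.nn.functional as F

class GlobalBranch(nn.Module):
    """Sketch: GAP -> 1x1 convolution -> upsample to a preset spatial size."""

    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)  # global average pooling to 1x1
        self.proj = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[-2:]                 # assumed preset spatial dimension
        g = self.proj(self.gap(x))          # 1x1 global descriptor, new channels
        return F.interpolate(g, size=(h, w), mode="nearest")
```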
In one embodiment, when the processor 1001 collects the semantic relations and spatial information corresponding to the spliced feature map according to the attention guidance module to generate a collected feature map, the following operations are specifically performed:
the attention guidance module extracts the semantic relation between any two positions in the spliced feature map;
the attention guidance module collects the spatial information between any two positions in the spliced feature map;
and the attention guidance module combines the semantic relation and the spatial information to generate the collected feature map (an assumed realization is sketched below).
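The text does not pin down the attention mechanism; one common way to relate "any two positions" of a feature map is non-local (self-) attention, so the sketch below should be read as an assumed stand-in rather than the patented module.

```python
import torch
import torch.nn as nn

class AttentionGuidanceModule(nn.Module):
    """Assumed non-local attention over all position pairs of a feature map."""

    def __init__(self, channels: int):
        super().__init__()
        reduced = max(channels // 8, 1)      # assumed query/key reduction
        self.query = nn.Conv2d(channels, reduced, kernel_size=1)
        self.key = nn.Conv2d(channels, reduced, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learnable blend weight

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)  # B x HW x C'
        k = self.key(x).flatten(2)                    # B x C' x HW
        attn = torch.softmax(q @ k, dim=-1)           # relation of every position pair
        v = self.value(x).flatten(2)                  # B x C x HW
        out = (v @ attn.transpose(1, 2)).reshape(b, c, h, w)
        return self.gamma * out + x  # blend attended map with the spatial input
```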
In the embodiment of the application, the face detection system based on the attention guidance mechanism first acquires a target image to be detected and inputs it into a pre-trained face detection model, where the face detection model comprises a convolution block and an attention-guided feature enhancement module, and the attention-guided feature enhancement module comprises an attention guidance module and a context extraction module. The system then performs a feature extraction operation on the target image using the extended VGG16 in the convolution block to generate a feature map sequence and selects 6 layers from the sequence as the first branch original feature map; performs channel splicing on each feature map in the first branch original feature map based on the context extraction module to generate a spliced feature map; collects the semantic relations and spatial information corresponding to the spliced feature map according to the attention guidance module to generate a collected feature map; generates a second branch enhanced feature map based on the first branch original feature map and the collected feature map; and finally inputs the second branch enhanced feature map into the SSD detection head of the face detection model to obtain the detected face image. With the enhancement provided by the attention guidance module and the context extraction module, the face detection model focuses more on facial features, which substantially improves face detection performance and, in turn, face detection accuracy.
It will be understood by those skilled in the art that all or part of the processes of the above method embodiments can be implemented by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, can include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disk, a read-only memory, or a random access memory.
The above disclosure describes only preferred embodiments of the present application and is not intended to limit its scope; all equivalent variations and modifications made in accordance with the claims of the present application still fall within the scope of the application.
Claims (10)
1. A face detection method based on an attention guidance mechanism, characterized by comprising the following steps:
acquiring a target image to be detected, and inputting the target image to be detected into a pre-trained face detection model; wherein the face detection model comprises a convolution block and an attention-guided feature enhancement module, and the attention-guided feature enhancement module comprises an attention guidance module and a context extraction module;
performing a feature extraction operation on the target image to be detected using the extended VGG16 in the convolution block to generate a feature map sequence, and selecting 6 layers from the feature map sequence as a first branch original feature map;
performing channel splicing on each feature map in the first branch original feature map based on the context extraction module to generate a spliced feature map;
collecting semantic relations and spatial information corresponding to the spliced feature map according to the attention guidance module to generate a collected feature map;
generating a second branch enhanced feature map based on the first branch original feature map and the collected feature map;
and inputting the second branch enhanced feature map into an SSD detection head of the face detection model to obtain a detected face image.
2. The method according to claim 1, wherein the performing channel splicing on each feature map in the first branch original feature map based on the context extraction module to generate a spliced feature map comprises:
the context extraction module performs channel grouping on the first branch original feature map to generate three groups of feature map sequences;
the context extraction module performs feature processing on the three groups of feature map sequences to generate three groups of feature-processed feature map sequences;
the context extraction module re-fuses each feature map in the three groups of feature-processed feature map sequences through a 1×1 convolution to generate three groups of re-convolved feature map sequences;
and the context extraction module splices the three groups of re-convolved feature map sequences to generate the spliced feature map.
3. The method according to claim 2, wherein the context extraction module performing feature processing on the three groups of feature map sequences to generate three groups of feature-processed feature map sequences comprises:
the context extraction module applies dilated convolution layers to the first of the three feature-channel groups to extract multi-scale feature information of the face, generating a first refined feature map sequence; wherein the dilated convolution kernel is 3×3 and its dilation rate is 3;
the context extraction module applies a 1×1 convolution to the second of the three feature-channel groups to increase the number of effective feature weights, generating a second refined feature map sequence;
the context extraction module performs global feature extraction on the third of the three feature-channel groups to generate a global feature map sequence;
the context extraction module performs channel splicing on the first refined feature map sequence, the second refined feature map sequence, and the global feature map sequence to generate a spliced feature map sequence;
and the context extraction module performs feature fusion on the spliced feature map sequence using a 1×1 convolution to generate the three groups of feature-processed feature map sequences.
4. The method according to claim 3, wherein the context extraction module performing global feature extraction on the third of the three feature-channel groups to generate a global feature map sequence comprises:
the context extraction module applies global average pooling (GAP) to the third of the three feature-channel groups to generate a pooled feature map sequence;
the context extraction module changes the channel dimension of the pooled feature map sequence using a 1×1 convolution to generate a changed feature map sequence;
and the context extraction module upsamples the changed feature map sequence to a preset spatial dimension to generate the global feature map sequence.
5. The method according to claim 1, wherein the collecting semantic relations and spatial information corresponding to the spliced feature map according to the attention guidance module to generate a collected feature map comprises:
the attention guidance module extracts the semantic relation between any two positions in the spliced feature map;
the attention guidance module collects the spatial information between any two positions in the spliced feature map;
and the attention guidance module combines the semantic relation and the spatial information to generate the collected feature map.
6. The method according to claim 1, wherein generating the pre-trained face detection model comprises:
creating a backbone network using the extended convolutional neural network VGG16;
adding a convolution block and an attention-guided feature enhancement module to the created backbone network to generate a face detection model; wherein the attention-guided feature enhancement module consists of an attention guidance module (AM) and a context extraction module (CEM);
loading a detection layer sequence of a first branch, and taking 6 layers in the backbone network of the face detection model as the detection layer sequence of the first branch to generate a replaced face detection model;
collecting training samples with face images, inputting the training samples into the replaced face detection model for training, and outputting a progressive anchor loss value of the face detection model;
and when the progressive anchor loss value of the face detection model reaches a preset minimum value, generating the trained face detection model.
7. The method according to claim 6, wherein the generating the trained face detection model when the progressive anchor loss value of the face detection model reaches a preset minimum value comprises:
when the progressive anchor loss value of the face detection model does not reach the preset minimum value, continuing to execute the step of collecting training samples with face images; or
when the number of training iterations on the training samples with face images does not reach a preset number, continuing to execute the step of collecting training samples with face images.
8. A face detection system based on an attention guidance mechanism, characterized in that the system comprises:
an image acquisition module, configured to acquire a target image to be detected and input the target image to be detected into a pre-trained face detection model; wherein the face detection model comprises a convolution block and an attention-guided feature enhancement module, and the attention-guided feature enhancement module comprises an attention guidance module and a context extraction module;
a first branch original feature map generation module, configured to perform a feature extraction operation on the target image to be detected using the extended VGG16 in the convolution block to generate a feature map sequence, and select 6 layers from the feature map sequence as a first branch original feature map;
a first feature map generation module, configured to perform channel splicing on each feature map in the first branch original feature map based on the context extraction module to generate a spliced feature map;
a second feature map generation module, configured to collect semantic relations and spatial information corresponding to the spliced feature map according to the attention guidance module to generate a collected feature map;
an enhanced feature map generation module, configured to generate a second branch enhanced feature map based on the first branch original feature map and the collected feature map;
and a face image output module, configured to input the second branch enhanced feature map into the SSD detection head of the face detection model to obtain a detected face image.
9. A computer storage medium, characterized in that it stores a plurality of instructions adapted to be loaded by a processor and to carry out the method steps according to any one of claims 1 to 7.
10. A terminal, comprising: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the method steps of any of claims 1 to 7.
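Outside the claims proper, the training procedure of claims 6 and 7 reduces to a loop with two stopping conditions: the loss reaches a preset minimum, or a preset iteration budget is exhausted. A hedged sketch follows; the loss is only named "progressive anchor loss" in the text, so `progressive_anchor_loss`, `min_loss`, and `max_steps` are placeholders, not confirmed details of the patented training scheme.

```python
def train_face_detector(model, loader, optimizer, progressive_anchor_loss,
                        min_loss=0.05, max_steps=100_000):
    """Sketch of the claim-6/7 training loop; thresholds are assumptions."""
    step = 0
    while step < max_steps:                  # preset number of iterations
        for images, targets in loader:       # training samples with face images
            outputs = model(images)          # replaced face detection model
            loss = progressive_anchor_loss(outputs, targets)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            if loss.item() <= min_loss:      # loss reached the preset minimum
                return model
            step += 1
            if step >= max_steps:
                break
    return model
```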
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011425736.5A CN112633085B (en) | 2020-12-08 | 2020-12-08 | Attention-oriented mechanism-based face detection method, system, storage medium and terminal |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112633085A (en) | 2021-04-09
CN112633085B (en) | 2024-08-02
Family
ID=75308652
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110197182A (en) * | 2019-06-11 | 2019-09-03 | 中国电子科技集团公司第五十四研究所 | Remote sensing image semantic segmentation method based on contextual information and attention mechanism |
CN111274994A (en) * | 2020-02-13 | 2020-06-12 | 腾讯科技(深圳)有限公司 | Cartoon face detection method and device, electronic equipment and computer readable medium |
CN111461089A (en) * | 2020-06-17 | 2020-07-28 | 腾讯科技(深圳)有限公司 | Face detection method, and training method and device of face detection model |
CN111898617A (en) * | 2020-06-29 | 2020-11-06 | 南京邮电大学 | Target detection method and system based on attention mechanism and parallel void convolution network |
Non-Patent Citations (1)
Title |
---|
JIAN LI et al., "DSFD: Dual Shot Face Detector", arXiv, pages 1-3
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113538313A (en) * | 2021-07-22 | 2021-10-22 | 深圳大学 | Polyp segmentation method and device, computer equipment and storage medium |
CN113538313B (en) * | 2021-07-22 | 2022-03-25 | 深圳大学 | Polyp segmentation method and device, computer equipment and storage medium |
CN117115928A (en) * | 2023-08-29 | 2023-11-24 | 北京国旺盛源智能终端科技有限公司 | Rural area network co-construction convenience service terminal based on multiple identity authentications |
CN117115928B (en) * | 2023-08-29 | 2024-03-22 | 北京国旺盛源智能终端科技有限公司 | Rural area network co-construction convenience service terminal based on multiple identity authentications |
Also Published As
Publication number | Publication date |
---|---|
CN112633085B (en) | 2024-08-02 |
Similar Documents
Publication | Title
---|---
CN111582316B (en) | RGB-D significance target detection method
JP7089106B2 (en) | Image processing methods and equipment, electronic devices, computer-readable storage media and computer programs
CN112396115B (en) | Attention mechanism-based target detection method and device and computer equipment
CN109255352B (en) | Target detection method, device and system
CN108876792B (en) | Semantic segmentation method, device and system and storage medium
CN111402143B (en) | Image processing method, device, equipment and computer readable storage medium
TWI717865B (en) | Image processing method and device, electronic equipment, computer readable recording medium and computer program product
CN111739035B (en) | Image processing method, device and equipment based on artificial intelligence and storage medium
CN112434721A (en) | Image classification method, system, storage medium and terminal based on small sample learning
CN112633077B (en) | Face detection method, system, storage medium and terminal based on in-layer multi-scale feature enhancement
CN107368550B (en) | Information acquisition method, device, medium, electronic device, server and system
CN110929735B (en) | Rapid significance detection method based on multi-scale feature attention mechanism
CN112633085B (en) | Attention-oriented mechanism-based face detection method, system, storage medium and terminal
JP6902811B2 (en) | Parallax estimation systems and methods, electronic devices and computer readable storage media
CN114416260B (en) | Image processing method, device, electronic equipment and storage medium
CN111967515A (en) | Image information extraction method, training method and device, medium and electronic equipment
CN112529897A (en) | Image detection method and device, computer equipment and storage medium
CN113869282A (en) | Face recognition method, hyper-resolution model training method and related equipment
CN112419342A (en) | Image processing method, image processing device, electronic equipment and computer readable medium
CN115131281A (en) | Method, device and equipment for training change detection model and detecting image change
CN114926734A (en) | Solid waste detection device and method based on feature aggregation and attention fusion
CN114511702A (en) | Remote sensing image segmentation method and system based on multi-scale weighted attention
CN117078602A (en) | Image stretching recognition and model training method, device, equipment, medium and product
CN111967478A (en) | Feature map reconstruction method and system based on weight inversion, storage medium and terminal
CN115223018B (en) | Camouflage object collaborative detection method and device, electronic equipment and storage medium
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |