CN116228702A - Camouflage target detection method based on attention mechanism and convolutional neural network
- Publication number: CN116228702A
- Application number: CN202310157199.8A
- Authority: CN (China)
- Prior art date: 2023-02-23
- Prior art keywords: camouflage, edge, camouflage target, scale, features
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T7/0002: Image analysis; inspection of images, e.g. flaw detection
- G06N3/08: Neural networks; learning methods
- G06T7/13: Segmentation; edge detection
- G06T7/70: Determining position or orientation of objects or cameras
- G06V10/42: Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
- G06V10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections
- G06V10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
- G06V10/806: Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
- G06V10/82: Image or video recognition or understanding using neural networks
- G06T2207/20081: Training; learning
- G06T2207/20084: Artificial neural networks [ANN]
- G06V2201/07: Target detection
- Y02T10/40: Engine management systems
Abstract
The invention discloses a camouflage target detection method based on an attention mechanism and a convolutional neural network, which belongs to the field of camouflage target detection and specifically comprises the following steps: inputting training-set images into a backbone network to extract multi-scale features of the camouflage target image; respectively inputting the features output by Stage3, Stage4 and Stage5 of the backbone network into a position-aware circular convolution module to output global features; extracting the edge contour information of the camouflage target with an edge extraction module to obtain an edge prediction map O_e, and performing boundary supervision on O_e with the edge label G_e of the camouflage target; fusing the obtained global features with the edge contour information and then performing multi-scale feature fusion to obtain a multi-scale aggregated feature f_agg, from which a camouflage target prediction map is obtained and deeply supervised through the binary label map of the camouflage target. The detection method can comprehensively perceive the camouflage target, refine its boundary contour, and improve camouflage target detection performance.
Description
Technical Field
The invention relates to a camouflage target detection method based on an attention mechanism and a convolutional neural network, and belongs to the field of camouflage target detection.
Background
In nature, many organisms possess camouflage: a chameleon adjusts its color to the surrounding environment to conceal itself; a lion hides its body in the grass and waits for prey to approach; a butterfly rests on a trunk of similar color to avoid the attacks of natural enemies. Biologists call this kind of camouflage background matching: to avoid being identified, animals try to change their own appearance to blend "perfectly" into the surrounding environment. In generic object detection and salient object detection the target differs markedly from the background and can normally be distinguished easily by the human eye; in camouflage target detection, by contrast, the high similarity between the camouflage target and the background makes detection far more challenging.
The boundary between a camouflage target and the background is quite blurred and difficult to distinguish, and without introducing additional prior information the camouflaged object is hard to localize accurately. The literature "Camouflaged object segmentation with distraction mining, In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, June 2021" proposes PFNet, which introduces the concept of distraction removal into the camouflaged object segmentation task and develops a new mining strategy for discovering and removing distraction regions to aid segmentation, but it does not attend to the boundary information of the camouflaged object and cannot accurately segment its complete boundary. The literature "Camouflaged object detection, In CVPR, 2020" proposes SINet, which enlarges the receptive field with a receptive field block (RFB) to improve segmentation of camouflage targets, but the RFB only enhances the local receptive field; it cannot obtain global features, enhance the global receptive field, or acquire global context information. Chinese patent CN113468996A discloses a camouflaged object detection method based on edge refinement that considers edge prior information, but the attention mechanism of its edge refinement module only considers global average pooling, losing a large amount of usable information at other frequencies. In view of this, the capability of existing camouflage target detection algorithms still needs to be improved.
Disclosure of Invention
Aiming at these problems, the invention provides a camouflage target detection method based on an attention mechanism and a convolutional neural network. An edge extraction module effectively extracts edge contour information; an edge reinforcement module introduces the frequency channel attention component FcaNet, which extracts and combines the information of different frequency components through the two-dimensional discrete cosine transform (DCT), so that multiple kinds of feature information can be captured and fused with the extracted global features to strengthen the boundary representation; and a multi-scale attention mechanism introduced into the camouflage target detection network model effectively aggregates multi-scale features. Together these realize comprehensive perception of the camouflage target, refine its boundary contour, and improve camouflage target detection performance.
The technical scheme adopted for solving the technical problems is as follows:
A camouflage target detection method based on an attention mechanism and a convolutional neural network specifically comprises the following steps:
S1, dividing an image data set of camouflage targets into a training set and a test set;
S2, inputting a training-set image I into the backbone network of a pre-constructed camouflage target detection network model to extract multi-scale features f_i (i = 1, ..., 5) of the camouflage target image, where f_i has resolution (H/2^i) x (W/2^i);
S3, respectively inputting the features {f_3, f_4, f_5} output by feature extraction layers Stage3, Stage4 and Stage5 of the backbone network into the position-aware circular convolution module PARCM to output global features {g_3, g_4, g_5};
S4, extracting the edge contour information of the camouflage target with the edge extraction module to obtain an edge prediction map O_e, and performing boundary supervision on O_e with the edge label G_e of the camouflage target;
S5, fusing the global features obtained in step S3 with the edge contour information obtained in step S4, and then performing multi-scale feature fusion to obtain a multi-scale aggregated feature f_agg;
S6, processing the multi-scale aggregated feature f_agg obtained in step S5 to obtain a camouflage target prediction map, and performing deep supervision on it through the binary label map of the camouflage target;
S7, taking test-set images as input to the trained camouflage target detection network model to obtain the final camouflage target detection result.
Further, the backbone network uses the EfficientNet-B4 model of the EfficientNet series to extract the multi-scale features of the camouflage target image.
Further, the position-aware circular convolution module PARCM in step S3 comprises a position-aware circular convolution component ParC and a channel attention component, wherein the position-aware circular convolution component ParC uses the global circular convolution GCC to extract global features.
Further, in step S4, the specific content of extracting the edge contour information of the camouflage target with the edge extraction module to obtain the edge prediction map O_e and performing boundary supervision on O_e with the edge label G_e of the camouflage target includes: the edge extraction module EEM fuses the low-level feature f_2 output by feature extraction layer Stage2 of the backbone network with the high-level semantic feature f_5 output by feature extraction layer Stage5 to extract the edge contour information of the camouflage target; the output of the EEM is normalized by a Sigmoid function to obtain a binary map, which is upsampled four times to obtain the edge prediction map O_e; boundary supervision is performed on O_e with the edge label G_e of the camouflage target, using the Dice loss as the edge loss function L_edge.
Further, a position embedding strategy is introduced in the position-aware circular convolution module PARCM.
Further, nonlinear characteristics are introduced into the channel attention component of the position-aware circular convolution module PARCM through a feedforward network (FFN), and a channel attention mechanism (SE block) is added after the FFN to highlight key channels.
Further, a residual connection is introduced in the position-aware circular convolution module PARCM.
Further, the specific content of fusing the global features obtained in step S3 with the edge contour information obtained in step S4 and then performing multi-scale feature fusion to obtain the multi-scale aggregated feature f_agg includes: the edge reinforcement module ERM fuses the edge contour information output by the edge extraction module EEM with the global features {g_3, g_4, g_5} to obtain edge-enhanced features {e_3, e_4, e_5}; a multi-scale attention mechanism MSAM is then introduced into the multi-scale fusion module MSFM for multi-scale feature fusion, wherein the features e_4 and e_5 are fused by the multi-scale fusion module MSFM to obtain the feature f_45, and the feature f_45 and the feature e_3 are then fused by the multi-scale fusion module MSFM to obtain the multi-scale aggregated feature f_agg. Since the size of camouflaged objects in camouflage data sets varies greatly, the multi-scale attention mechanism MSAM adapts well to camouflage targets of different scales and can fuse the multi-scale features effectively.
Furthermore, the frequency channel attention component FcaNet is introduced into the edge reinforcement module ERM; it extracts and combines the information of different frequency components through the two-dimensional discrete cosine transform DCT and, compared with a single global average pooling operation, can capture multiple kinds of feature information and enhance edge details.
Further, in step S6, the structured loss function selected for deep supervision of the camouflage target through its binary label map is the combination of a weighted binary cross-entropy loss L_wBCE and a weighted IoU loss L_wIoU.
Further, the total loss function L_total of the camouflage target detection model is given by:

L_total = λ_1 · L_struct(O_c, G_c) + λ_2 · L_edge(O_e, G_e), with L_struct = L_wBCE + L_wIoU,

wherein L_struct denotes the structured loss and L_edge the edge loss; λ_1 and λ_2 are the weight factors of the structured loss and the edge loss, respectively; O_c is the predicted camouflage target saliency map and O_e the predicted camouflage target edge contour map; G_c denotes the saliency label of the camouflage target and G_e its edge label; L_wBCE is the weighted binary cross-entropy loss in the structured loss, L_wIoU is the weighted IoU loss in the structured loss, and L_edge is the edge loss using the Dice coefficient.
Compared with the prior art, the technical scheme of the invention has the following technical effects:
according to the invention, additional edge priori information is introduced and depth supervision is performed, a frequency channel attention component FcNet is introduced in an edge strengthening module, different frequency component information is extracted through two-dimensional discrete cosine transform DCT and combined, and compared with single global average pooling operation, multiple feature information can be captured and fused with global features extracted through global cyclic convolution GCC, and boundary representation can be enhanced.
The position-aware circular convolution module of the camouflage target detection network model introduces the global circular convolution GCC to extract global features effectively and obtain a global receptive field, overcoming the strong locality and insufficient globality of convolutional neural networks and thereby acquiring global context information. Meanwhile, a position embedding strategy introduced into the position-aware circular convolution module injects position information into the output feature map, ensuring the sensitivity of the output features to spatial position, and a channel attention mechanism is introduced to highlight key channels. In addition, the invention introduces a multi-scale attention mechanism into the camouflage target detection network model, which effectively fuses multi-scale context information and thereby improves camouflage target detection performance.
Drawings
FIG. 1 is a flowchart showing the overall implementation of the method for detecting a camouflage target based on an attention mechanism and a convolutional neural network;
fig. 2 is a flowchart of an implementation of a position aware cyclic convolution module PARCM according to an embodiment of the present invention;
fig. 3 is a schematic diagram of a calculation process of global cyclic convolution GCC in a position aware cyclic convolution module PARCM according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating an implementation of the edge extraction module EEM according to an embodiment of the present invention;
FIG. 5 is a flowchart illustrating an implementation of the edge enhancement module ERM according to an embodiment of the present invention;
FIG. 6 is a flowchart of an implementation of the frequency channel attention component FcaNet according to an embodiment of the present invention;
FIG. 7 is a flowchart of an implementation of a multi-scale attention mechanism MSAM according to an embodiment of the present invention;
FIG. 8 is a flowchart of an implementation of a multi-scale fusion module MSFM according to an embodiment of the present invention;
FIG. 9 compares the image segmentation results of the five mainstream camouflage target segmentation models in the comparison test of the present invention with the method of the present invention on different camouflage target data sets.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the invention are described clearly and completely below with reference to specific embodiments of the application and the corresponding drawings. It will be apparent that the described embodiments are some, but not all, embodiments of the invention.
Fig. 1 is a flowchart of the implementation of the camouflage target detection method based on an attention mechanism and a convolutional neural network. As shown in Fig. 1, the camouflage target detection network model of the invention comprises a backbone network, a position-aware circular convolution module PARCM, an edge extraction module EEM, an edge reinforcement module ERM and a multi-scale fusion module MSFM. The backbone network uses the EfficientNet-B4 model of the EfficientNet series to extract multi-scale features {f_1, ..., f_5} of the camouflage target image; since the image sizes in camouflage target training sets vary, EfficientNet is chosen as the backbone because it balances the three dimensions of network depth, network width and input resolution, effectively improving the feature extraction capability. The position-aware circular convolution module PARCM extracts global features and enlarges the receptive field; the edge extraction module EEM performs edge extraction and outputs edge contour information; the edge reinforcement module ERM effectively fuses the edge contour information output by the EEM with the receptive-field-enhanced features from the PARCM to strengthen the boundary representation; and the multi-scale fusion module MSFM fuses the multi-scale features effectively by introducing the multi-scale attention mechanism MSAM.
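Read as a whole, the data flow of Fig. 1 can be summarized in a few lines of PyTorch. The sketch below only fixes the wiring between the modules described in this embodiment; every class and name in it is an illustrative assumption rather than the patent's code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CamouflageDetector(nn.Module):
    """Top-level sketch of the detection network: backbone -> PARCM -> EEM ->
    ERM -> MSFM; the individual modules are sketched later in this description."""
    def __init__(self, backbone, parcms, eem, erms, msfm45, msfm3, head):
        super().__init__()
        self.backbone, self.eem = backbone, eem
        self.parcms, self.erms = nn.ModuleList(parcms), nn.ModuleList(erms)
        self.msfm45, self.msfm3, self.head = msfm45, msfm3, head

    def forward(self, img):
        f1, f2, f3, f4, f5 = self.backbone(img)                  # multi-scale features
        g = [m(f) for m, f in zip(self.parcms, (f3, f4, f5))]    # global features
        edge_feat, o_e = self.eem(f2, f5)                        # edge prior and O_e
        e = [m(gi, edge_feat) for m, gi in zip(self.erms, g)]    # edge-enhanced features
        f45 = self.msfm45(e[1], e[2])                            # fuse e_4 with e_5
        f_agg = self.msfm3(e[0], f45)                            # fuse with e_3
        o_c = F.interpolate(self.head(f_agg), scale_factor=8,    # 8x upsampling
                            mode="bilinear", align_corners=False)
        return o_c, o_e                                          # prediction map, edge map
```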
The camouflage target detection method based on the attention mechanism and the convolutional neural network comprises the following steps:
Step one, dividing an image data set of camouflage targets into a training set and a test set.
Step two, a training-set image I is input into the backbone network EfficientNet of the pre-constructed camouflage target detection network model to extract multi-scale features f_i (i = 1, ..., 5) of the camouflage target image, where f_i has resolution (H/2^i) x (W/2^i), H and W respectively denote the height and width of the input image, and i is the resolution scaling factor.
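The five stages' outputs can be obtained, for example, with the timm library; this is an illustrative sketch, not the patent's code, and the input size shown is arbitrary:

```python
import timm
import torch

# EfficientNet-B4 with feature-pyramid outputs at strides 2, 4, 8, 16, 32,
# corresponding to f_1 ... f_5 in the text.
backbone = timm.create_model("efficientnet_b4", features_only=True, pretrained=True)
img = torch.randn(1, 3, 416, 416)               # any training-set image tensor
f1, f2, f3, f4, f5 = backbone(img)
print([f.shape for f in (f1, f2, f3, f4, f5)])  # resolutions (H/2^i) x (W/2^i)
```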
Step three, the features {f_3, f_4, f_5} output by feature extraction layers Stage3, Stage4 and Stage5 of the backbone network are input into the position-aware circular convolution module PARCM, which outputs the global features {g_3, g_4, g_5}.
In this embodiment, the features {f_3, f_4, f_5} output by feature extraction layers Stage3, Stage4 and Stage5 of the backbone network are input into the position-aware circular convolution module PARCM, which outputs {g_3, g_4, g_5}. Fig. 2 shows the implementation of the PARCM according to an embodiment of the present invention. The PARCM consists of a position-aware circular convolution component ParC and a channel attention component. ParC uses the global circular convolution GCC, which comprises a horizontal branch GCC-H and a vertical branch GCC-V; used jointly, GCC-H and GCC-V extract global features from all input positions and obtain a global receptive field. The GCC convolution kernel has the same size as the input, and position information is injected into the output feature map through a position embedding strategy, ensuring the sensitivity of the output features to spatial position and reducing the disturbance that circular convolution causes to the spatial structure. Meanwhile, in the channel attention component, nonlinearity is introduced through a feedforward network (FFN), after which the channel attention mechanism SE block is added to highlight key channels and suppress feature channels of little use to the current task. In addition, the PARCM introduces a residual connection, so that global and local context information can be perceived simultaneously. Here PW-Conv denotes point-wise convolution, whose role is to adjust the channel dimension, and Pre-Norm uses a batch normalization operation.
FIG. 3 is a schematic diagram of the calculation process of the global circular convolution GCC in the position-aware circular convolution module PARCM according to an embodiment of the present invention. As shown in FIG. 3, an input x is processed by GCC-H and GCC-V; for simplicity of presentation, assume that x has only one channel, with corresponding shape H × W. The pixel of the GCC-V output y at position (i, j) can be calculated by the following formula:

y_{i,j} = Σ_{k=0}^{H-1} w_k · x^{PE}_{(i+k) mod H, j}, with x^{PE} = x + PE,

where PE is the instance position embedding. A base position embedding of size B × 1 is first defined; a bilinear interpolation function F_bi produces from it the instance position embedding of size H × 1, and an expansion function E_h^W replicates it W times along the horizontal direction to generate a PE matrix of size H × W, so the position embedding PE adapts flexibly to input features of different sizes. The instance position embedding is superimposed on the input x to obtain the feature x^{PE}. Stacking x^{PE} with its first H-1 rows along the vertical direction yields a feature of size (2H-1) × W, on which a standard convolution with a global receptive field and shared parameters can be performed. The convolution kernel parameters w are constructed in the same way: a base kernel of size B × 1 is interpolated by the bilinear interpolation function F_bi to size H × 1, so the spatial receptive field of the GCC-V kernel is H × 1, which, matched with the 1 × W receptive field of the GCC-H kernel, approximates global coverage. Performing the standard convolution on the stacked feature, the output coordinate i of the H × 1 kernel in fact corresponds to the input coordinate range i, i+1, ..., i+H-1, where the offsets 0 to H-1 index the relative coordinates in the local neighborhood covered by the kernel and the modulo operation corresponds to the circular operation, i.e. to global circular convolution; in code this can be realized by concatenation followed by a standard convolution. In the same way, the GCC-H output at (i, j) can be expressed as:

y_{i,j} = Σ_{k=0}^{W-1} w_k · x^{PE}_{i, (j+k) mod W},

where the expansion function E_w^H replicates the position embedding along the vertical direction. In summary, the receptive fields of the GCC-H and GCC-V outputs cover the same row and the same column of every input position, and using GCC-H and GCC-V jointly extracts global features from all input positions and obtains the global receptive field.
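In PyTorch, one GCC branch reduces to a depthwise convolution over a circularly padded feature. The following is a minimal sketch; the base size, parameter shapes and class name are illustrative assumptions, not the patent's code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCC(nn.Module):
    """One branch of the global circular convolution (GCC-V or GCC-H)."""
    def __init__(self, channels, base_size=16, vertical=True):
        super().__init__()
        self.vertical = vertical
        # Base kernel and base position embedding; both are bilinearly
        # interpolated to the instance resolution at run time.
        self.weight = nn.Parameter(torch.randn(channels, 1, base_size, 1))
        self.pe = nn.Parameter(torch.randn(1, channels, base_size, 1))

    def forward(self, x):
        b, c, h, w = x.shape
        n = h if self.vertical else w
        # Instance position embedding: interpolate to length n, then replicate
        # along the other axis so PE adapts to any input size.
        pe = F.interpolate(self.pe, size=(n, 1), mode="bilinear", align_corners=True)
        if self.vertical:
            x = x + pe.expand(-1, -1, -1, w)
        else:
            x = x + pe.transpose(2, 3).expand(-1, -1, h, -1)
        # Instance kernel with spatial receptive field n x 1 (or 1 x n).
        k = F.interpolate(self.weight, size=(n, 1), mode="bilinear", align_corners=True)
        if not self.vertical:
            k = k.transpose(2, 3)
        # Concatenating the first n-1 rows (or columns) turns an ordinary
        # depthwise convolution into a convolution modulo n, i.e. circular.
        if self.vertical:
            x = torch.cat([x, x[:, :, : n - 1, :]], dim=2)
        else:
            x = torch.cat([x, x[:, :, :, : n - 1]], dim=3)
        return F.conv2d(x, k, groups=c)  # output keeps the b x c x h x w shape
```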
Step four, the edge contour information of the camouflage target is extracted with the edge extraction module EEM to obtain the edge prediction map O_e, and boundary supervision is performed on O_e with the edge label G_e of the camouflage target.
FIG. 4 is a flowchart illustrating the implementation of the edge extraction module EEM according to an embodiment of the present invention. As shown in FIG. 4, the EEM obtains the edge prediction map O_e as follows: convolution layers first compress the channel numbers of the low-level feature f_2 and the high-level semantic feature f_5; f_5 is then upsampled to the same size as f_2, the two are concatenated, fused by a convolution, passed through a further convolution and normalized by a Sigmoid function to obtain a binary edge map, which is upsampled to yield the edge prediction map O_e. The edge loss function here is the Dice loss. Because the low-level feature f_2 contains considerable noise, the high-level semantic feature f_5 is introduced into the edge extraction module EEM as an aid. The main function of the EEM is to provide a valuable edge prior for the subsequent segmentation, so that the model can better segment the edge contour of the camouflage target.
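A minimal sketch of the EEM; the 1x1 channel compression, the 3x3 fusion kernel and the width `mid` are assumptions where the patent's figures elide the exact sizes:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EEM(nn.Module):
    """Edge extraction module: fuse low-level f_2 with high-level f_5."""
    def __init__(self, c2, c5, mid=64):
        super().__init__()
        self.squeeze2 = nn.Conv2d(c2, mid, 1)   # compress f_2 channels
        self.squeeze5 = nn.Conv2d(c5, mid, 1)   # compress f_5 channels
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * mid, mid, 3, padding=1),
            nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
        )
        self.pred = nn.Conv2d(mid, 1, 1)

    def forward(self, f2, f5):
        f5 = F.interpolate(self.squeeze5(f5), size=f2.shape[2:],
                           mode="bilinear", align_corners=False)
        edge_feat = self.fuse(torch.cat([self.squeeze2(f2), f5], dim=1))
        edge_map = torch.sigmoid(self.pred(edge_feat))   # binary edge map
        o_e = F.interpolate(edge_map, scale_factor=4,    # four-times upsampling
                            mode="bilinear", align_corners=False)
        return edge_feat, o_e
```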
Step five, the global features obtained in step three are effectively fused with the edge contour information obtained in step four to obtain the edge-enhanced features {e_3, e_4, e_5}.
FIG. 5 is a flowchart illustrating the implementation of the edge reinforcement module ERM according to an embodiment of the present invention. As shown in FIG. 5, the ERM obtains the edge-enhanced feature e_i as follows: the edge contour feature is first downsampled to the same size as the output feature g_i of the position-aware circular convolution module PARCM and multiplied with it element-wise; the result is added to g_i element-wise, passed through a convolution layer followed by batch normalization and a ReLU activation function, and then through the frequency channel attention component FcaNet; a residual connection adds the input and output of the frequency channel attention component element-wise to obtain the boundary-reinforced feature e_i.
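A sketch of the ERM; `FcaLayer` refers to the frequency channel attention sketched further below, and the kernel size and channel projection are assumptions:

```python
import torch.nn as nn
import torch.nn.functional as F

class ERM(nn.Module):
    """Edge reinforcement module: inject the edge prior into a global feature."""
    def __init__(self, channels, edge_channels):
        super().__init__()
        self.proj = nn.Conv2d(edge_channels, channels, 1)  # match channel widths (assumed)
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
        )
        self.fca = FcaLayer(channels)   # frequency channel attention (see sketch below)

    def forward(self, g_i, edge_feat):
        # Downsample the edge feature to g_i's size, multiply element-wise,
        # then add g_i back element-wise.
        e = F.interpolate(self.proj(edge_feat), size=g_i.shape[2:],
                          mode="bilinear", align_corners=False)
        x = self.conv(g_i * e + g_i)
        return x + self.fca(x)          # residual around the frequency channel attention
```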
The edge reinforcement module ERM introduces the frequency channel attention component FcaNet. FIG. 6 is a flowchart of the implementation of FcaNet; as shown in FIG. 6, FcaNet extracts and combines the information of different frequency components through the two-dimensional discrete cosine transform DCT and, compared with a single global average pooling operation, can capture multiple kinds of feature information and enhance edge details.
The two-dimensional discrete cosine transform DCT is calculated as follows:

F(u, v) = Σ_{h=0}^{H-1} Σ_{w=0}^{W-1} x(h, w) · cos(π u (h + 1/2) / H) · cos(π v (w + 1/2) / W),

where H and W are the height and width of the image and x(h, w) is the value of the pixel at position (h, w). The frequency component F(u, v) can be regarded as a weighted sum of all input points of the image, the cosine part acting as the weight, and the formula realizes the transformation from the spatial domain to the frequency domain. The lowest frequency component is

F(0, 0) = Σ_{h=0}^{H-1} Σ_{w=0}^{W-1} x(h, w).

Since global average pooling can be expressed as GAP(x) = (1 / (HW)) Σ_h Σ_w x(h, w), the lowest frequency component F(0, 0) is proportional to GAP. The image signal is therefore composed of the global average pooling term and the other frequency components: by the inverse transform, the pixel value x(h, w) can be represented by all frequency components at (h, w). Channel attention components such as SENet consider only GAP, i.e. only the lowest-frequency information, and thus lose a large amount of usable information; FcaNet extracts the information of different frequency components through the two-dimensional DCT and combines it effectively, obtaining multiple kinds of feature information and strengthening the feature representation and edge details.
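A minimal sketch of this multi-spectral channel attention; the set of frequency components, the 7x7 DCT grid and the reduction ratio are assumptions (FcaNet itself selects frequencies empirically), and the channel count is assumed divisible by the number of frequencies:

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class FcaLayer(nn.Module):
    """Frequency channel attention in the style of FcaNet."""
    def __init__(self, channels, h=7, w=7, reduction=16,
                 freqs=((0, 0), (0, 1), (1, 0), (1, 1))):
        super().__init__()
        self.h, self.w, self.groups = h, w, len(freqs)
        # Precompute one 2D-DCT basis per channel group.
        basis = torch.zeros(self.groups, h, w)
        for i, (u, v) in enumerate(freqs):
            for y in range(h):
                for xx in range(w):
                    basis[i, y, xx] = (math.cos(math.pi * u * (y + 0.5) / h)
                                       * math.cos(math.pi * v * (xx + 0.5) / w))
        self.register_buffer("dct", basis)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        # Resize to the DCT grid, then take per-group frequency responses.
        xs = F.adaptive_avg_pool2d(x, (self.h, self.w))
        xs = xs.view(b, self.groups, c // self.groups, self.h, self.w)
        feats = (xs * self.dct.view(1, self.groups, 1, self.h, self.w)).sum(dim=(3, 4))
        attn = self.fc(feats.view(b, c)).view(b, c, 1, 1)
        return x * attn
```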
Step six, multi-scale feature fusion is performed on the edge-enhanced features {e_3, e_4, e_5} obtained in step five to obtain the multi-scale aggregated feature f_agg.
The multi-scale fusion module MSFM introduces the multi-scale attention mechanism MSAM. FIG. 7 is a flowchart of the implementation of the MSAM; as shown in FIG. 7, the MSAM consists of two branches: one branch uses a global average pooling layer and two point-wise convolutions to obtain global context information and highlight large objects in the global distribution, while the other branch keeps the original feature size and applies only two point-wise convolutions to obtain local context information, preventing small objects from being ignored. Finally the two branches are merged, and the multi-scale channel attention coefficient is obtained through a Sigmoid function. Features of different levels contribute differently to the task, and fusing multi-level features lets them complement each other to obtain a comprehensive feature expression. FIG. 8 is a flowchart of the implementation of the multi-scale fusion module MSFM according to an embodiment of the present invention, whose formula is as follows:
f_ms = M(f_l + Up(f_h)) ⊗ f_l + (1 - M(f_l + Up(f_h))) ⊗ Up(f_h),

wherein f_ms is the result of fusing the features f_l and f_h through the multi-scale fusion module MSFM; M(·) denotes the multi-scale attention mechanism MSAM component; f_l denotes the low-level feature, f_h the high-level feature, and Up(f_h) the high-level feature upsampled by a factor of two; + denotes the element-wise addition operation and ⊗ the element-wise multiplication operation.
The low-level and high-level features are fused, passed through a convolution layer and then output as the fused multi-scale feature through batch normalization and a ReLU activation function. In the specific implementation, this embodiment takes into account that low-level features of larger spatial resolution require more computing resources than high-level semantic features while contributing less to model performance. Based on this observation, only the high-level features {e_3, e_4, e_5} are fused: e_4 and e_5 are first fused by the multi-scale fusion module MSFM to obtain f_45, and f_45 and e_3 are then fused by the MSFM to obtain f_agg. Features of different scales contribute differently to the task, and multi-scale feature fusion through the MSFM lets them complement each other to obtain a comprehensive feature representation; a sketch of the MSAM and the MSFM follows.
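A sketch of the two modules under the assumptions noted in the comments (reduction ratio, equal channel widths of the two inputs):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MSAM(nn.Module):
    """Multi-scale attention: a global branch (pooled to 1x1) highlights large
    objects, a local branch keeps the spatial size so small objects survive."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        mid = channels // reduction
        def pw_pair():
            return nn.Sequential(
                nn.Conv2d(channels, mid, 1), nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
                nn.Conv2d(mid, channels, 1), nn.BatchNorm2d(channels),
            )
        self.global_branch, self.local_branch = pw_pair(), pw_pair()

    def forward(self, x):
        g = self.global_branch(F.adaptive_avg_pool2d(x, 1))  # global context
        l = self.local_branch(x)                             # local context
        return torch.sigmoid(g + l)                          # attention coefficients

class MSFM(nn.Module):
    """Multi-scale fusion: f_ms = M(f_l + Up(f_h)) * f_l + (1 - M(...)) * Up(f_h)."""
    def __init__(self, channels):
        super().__init__()
        self.msam = MSAM(channels)
        self.out = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
        )

    def forward(self, f_l, f_h):
        f_h = F.interpolate(f_h, size=f_l.shape[2:],         # two-times upsampling
                            mode="bilinear", align_corners=False)
        m = self.msam(f_l + f_h)
        return self.out(m * f_l + (1 - m) * f_h)
```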
Step seven, the multi-scale aggregated feature obtained in step six is convolved to obtain the camouflage target prediction map, which is deeply supervised through the binary label map of the camouflage target.
The multi-scale aggregated feature f_agg obtained above is passed through a convolution to obtain a single-channel grayscale map, which is upsampled eight times to obtain the final camouflage target prediction map O_c; deep supervision is performed with the binary label map G_c of the camouflage target, and the structured loss is calculated, with the weighted binary cross-entropy loss L_wBCE and the weighted IoU loss L_wIoU selected as its loss functions. The total loss function of the final model is:
L_total = λ_1 · L_struct(O_c, G_c) + λ_2 · L_edge(O_e, G_e), with L_struct = L_wBCE + L_wIoU,

wherein λ_1 and λ_2 are the weight factors of the two losses, taken as 1 and 3 respectively in the experimental simulation; L_struct denotes the structured loss and L_edge the edge loss; O_c is the predicted camouflage target saliency map and O_e the predicted camouflage target edge map; G_c denotes the saliency label of the camouflage target and G_e its edge label.
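A common implementation of this supervision, given as a sketch rather than the patent's exact code: the weighted BCE/IoU pair follows the widely used F3Net-style structure loss (the 5x boundary weighting and the 31x31 window are assumptions), and the Dice term matches the edge loss named above:

```python
import torch
import torch.nn.functional as F

def structure_loss(pred, mask):
    """Weighted BCE + weighted IoU; `pred` is expected to hold logits."""
    weit = 1 + 5 * torch.abs(F.avg_pool2d(mask, 31, stride=1, padding=15) - mask)
    wbce = F.binary_cross_entropy_with_logits(pred, mask, reduction="none")
    wbce = (weit * wbce).sum(dim=(2, 3)) / weit.sum(dim=(2, 3))
    pred = torch.sigmoid(pred)
    inter = ((pred * mask) * weit).sum(dim=(2, 3))
    union = ((pred + mask) * weit).sum(dim=(2, 3))
    wiou = 1 - (inter + 1) / (union - inter + 1)
    return (wbce + wiou).mean()

def dice_loss(pred, target, eps=1.0):
    """Dice-coefficient edge loss; expects a probability map such as O_e."""
    inter = (pred * target).sum(dim=(2, 3))
    return (1 - (2 * inter + eps) /
            (pred.sum(dim=(2, 3)) + target.sum(dim=(2, 3)) + eps)).mean()

# Total loss with the weights reported above (lambda_1 = 1, lambda_2 = 3):
# loss = 1 * structure_loss(o_c, g_c) + 3 * dice_loss(o_e, g_e)
```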
Finally, the test-set images are taken as input to the trained camouflage target detection network model to obtain the final camouflage target detection results.
To verify the effectiveness of the invention, five mainstream camouflage target segmentation models (UGTR, MGL-R, PFNet, EGNet and SINet) were tested on images from the camouflage target data sets CAMO, CHAMELEON, COD10K and NC4K, and their segmentation results were compared with the camouflage target predictions of the invention.
The experimental simulation content of the camouflage target detection method based on the attention mechanism and the convolutional neural network is as follows:
The experimental platform adopts a 64-bit Ubuntu system, version 20.04.4; the GPU model is GeForce RTX 2080 Ti; Python (version 3.8) is used as the programming language and PyCharm as the software development platform. The model is implemented with the deep learning framework PyTorch 1.4, and the input is a 3-channel RGB image of fixed size. In the training stage, the batch size is 24, the number of iteration epochs is 40, the parameters are optimized with the AdaX optimizer, the initial learning rate is set to 3e-4 and is continuously decayed with a poly strategy with power 0.9, and the whole network trains for about 1 hour with the acceleration of three GeForce RTX 2080 Ti GPUs. FIG. 9 compares the image segmentation results of the five mainstream camouflage target segmentation models and the method of the invention (Our) on different camouflage target data sets; comparison against the camouflage target saliency labels shows that the invention effectively detects the camouflage target, refines its edge contour and improves the accuracy of camouflage target detection.
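The poly decay mentioned here sets lr = 3e-4 · (1 - epoch/40)^0.9 at each epoch. A minimal sketch follows; Adam stands in for AdaX, whose implementation the patent does not name, and `model`, `loader` and `train_one_epoch` are assumed to exist:

```python
import torch
from torch.optim.lr_scheduler import LambdaLR

max_epoch, power, base_lr = 40, 0.9, 3e-4
optimizer = torch.optim.Adam(model.parameters(), lr=base_lr)  # stand-in for AdaX
scheduler = LambdaLR(optimizer, lambda e: (1 - e / max_epoch) ** power)
for epoch in range(max_epoch):
    train_one_epoch(model, loader, optimizer)  # assumed training routine
    scheduler.step()                           # poly learning-rate decay
```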
The foregoing is merely a preferred embodiment of the present invention and is not intended to limit the present invention in any way. Any equivalent substitution or modification made by a person skilled in the art to the technical solution and technical content disclosed in the invention, without departing from the scope of the technical solution of the invention, still falls within the scope of protection of the invention.
Claims (10)
1. The camouflage target detection method based on the attention mechanism and the convolutional neural network is characterized by comprising the following steps of:
S1, dividing an image data set of camouflage targets into a training set and a test set;
S2, inputting the training-set image into the backbone network of a pre-constructed camouflage target detection network model to extract multi-scale features containing the camouflage target image;
S3, respectively inputting the features {f_3, f_4, f_5} output by feature extraction layers Stage3, Stage4 and Stage5 of the backbone network into the position-aware circular convolution module PARCM to output global features;
S4, extracting the edge contour information of the camouflage target with the edge extraction module EEM to obtain an edge prediction map O_e, and performing boundary supervision on O_e with the edge label G_e of the camouflage target;
S5, fusing the global features obtained in step S3 with the edge contour information obtained in step S4, and then performing multi-scale feature fusion to obtain a multi-scale aggregated feature f_agg;
S6, processing the multi-scale aggregated feature f_agg obtained in step S5 to obtain a camouflage target prediction map, and performing deep supervision on it through the binary label map of the camouflage target;
S7, taking the test-set image as input to the trained camouflage target detection network model to obtain the final camouflage target detection result.
2. The method for camouflage target detection based on an attention mechanism and a convolutional neural network of claim 1, wherein the backbone network uses an EfficientNet-B4 model in the EfficientNet series to extract multi-scale features containing camouflage target images.
3. The method for detecting a camouflage target based on an attention mechanism and a convolutional neural network according to claim 2, wherein in step S4 the specific content of extracting the edge contour information of the camouflage target with the edge extraction module EEM to obtain the edge prediction map O_e and performing boundary supervision on O_e with the edge label G_e of the camouflage target comprises: fusing, with the edge extraction module EEM, the low-level feature f_2 output by feature extraction layer Stage2 of the backbone network and the high-level semantic feature f_5 output by feature extraction layer Stage5 to extract the edge contour information of the camouflage target; normalizing the output of the EEM with a Sigmoid function to obtain a binary map and upsampling it four times to obtain the edge prediction map O_e; and performing boundary supervision on O_e with the edge label G_e of the camouflage target, using the Dice loss as the edge loss function L_edge.
4. The method for detecting a camouflage target based on an attention mechanism and a convolutional neural network according to claim 1, wherein the position-aware circular convolution module PARCM in step S3 comprises a position-aware circular convolution component ParC and a channel attention component, the position-aware circular convolution component ParC extracts global features using the global circular convolution GCC, and a residual connection is introduced in the position-aware circular convolution module PARCM.
5. The method for detecting a camouflage target based on an attention mechanism and a convolutional neural network according to claim 4, wherein a position embedding strategy is introduced in the position-aware circular convolution module PARCM.
6. The method for detecting a camouflage target based on an attention mechanism and a convolutional neural network according to claim 4, wherein in the channel attention component nonlinear characteristics are introduced through a feedforward network FFN, and a channel attention mechanism SE block is added after the FFN to highlight key channels.
7. The method for detecting a camouflage target based on an attention mechanism and a convolutional neural network according to claim 1, wherein the specific content of fusing the global features obtained in step S3 with the edge contour information obtained in step S4 and then performing multi-scale feature fusion to obtain the multi-scale aggregated feature f_agg comprises: fusing, with the edge reinforcement module ERM, the edge contour information output by the edge extraction module EEM and the global features {g_3, g_4, g_5} to obtain edge-enhanced features {e_3, e_4, e_5}; then introducing the multi-scale attention mechanism MSAM into the multi-scale fusion module MSFM for multi-scale feature fusion, wherein the features e_4 and e_5 are fused by the multi-scale fusion module MSFM to obtain the feature f_45, and the feature f_45 and the feature e_3 are then fused by the multi-scale fusion module MSFM to obtain the multi-scale aggregated feature f_agg.
8. The method for detecting a camouflage target based on an attention mechanism and a convolutional neural network according to claim 1, wherein the frequency channel attention component FcaNet is introduced into the edge reinforcement module ERM, and the information of different frequency components is extracted and combined through the two-dimensional discrete cosine transform DCT.
9. The method for detecting a camouflage target based on an attention mechanism and a convolutional neural network according to claim 1, wherein the structured loss function selected in step S6 for deep supervision of the camouflage target through its binary label map is a weighted binary cross-entropy loss L_wBCE and a weighted IoU loss L_wIoU.
10. The method for detecting a camouflage target based on an attention mechanism and a convolutional neural network according to claim 1, wherein the total loss function L_total of the camouflage target detection model is given by:

L_total = λ_1 · L_struct(O_c, G_c) + λ_2 · L_edge(O_e, G_e), with L_struct = L_wBCE + L_wIoU,

wherein L_struct denotes the structured loss and L_edge the edge loss; λ_1 and λ_2 are the weight factors of the structured loss and the edge loss, respectively; O_c is the predicted camouflage target saliency map and O_e the predicted camouflage target edge contour map; G_c denotes the saliency label of the camouflage target and G_e its edge label; L_wBCE is the weighted binary cross-entropy loss in the structured loss, L_wIoU is the weighted IoU loss in the structured loss, and L_edge is the edge loss using the Dice coefficient.
Priority Applications (1)
- CN202310157199.8A (priority date 2023-02-23, filing date 2023-02-23): Camouflage target detection method based on attention mechanism and convolutional neural network
Publications (1)
- CN116228702A, published 2023-06-06
Family Applications (1), family ID=86578159
- CN202310157199.8A, filed 2023-02-23, country CN, status Pending
Cited By (11)
- CN116894943A (2023-10-17): Double-constraint camouflage target detection method and system
- CN116664990A (2023-08-29) / CN116664990B (2023-11-14): Camouflage target detection method, model training method, device, equipment and medium
- CN117173523A (2023-12-05) / CN117173523B (2024-04-09): Camouflage target detection method and system based on frequency perception
- CN116703950A (2023-09-05) / CN116703950B (2023-10-20): Camouflage target image segmentation method and system based on multi-level feature fusion
- CN117422939A (2024-01-19) / CN117422939B (2024-03-08): Breast tumor classification method and system based on ultrasonic feature extraction
- CN118072001A (2024-05-24) / CN118072001B (2024-06-21): Camouflage target detection method based on scale feature perception and extensive perception convolution
Similar Documents
- Xie et al.: Multilevel cloud detection in remote sensing images based on deep learning
- Ren et al.: Deep texture-aware features for camouflaged object detection
- CN116228702A: Camouflage target detection method based on attention mechanism and convolutional neural network
- CN110378381B: Object detection method, device and computer storage medium
- El Amin et al.: Convolutional neural network features based change detection in satellite images
- Rezaeilouyeh et al.: Microscopic medical image classification framework via deep learning and shearlet transform
- CN112750140B: Information mining-based disguised target image segmentation method
- Zhao et al.: Multi-scale image block-level F-CNN for remote sensing images object detection
- Lee et al.: Accurate traffic light detection using deep neural network with focal regression loss
- Wen et al.: Gcsba-net: Gabor-based and cascade squeeze bi-attention network for gland segmentation
- Woźniak et al.: Graphic object feature extraction system based on cuckoo search algorithm
- Fang et al.: SAR-optical image matching by integrating Siamese U-Net with FFT correlation
- Wang et al.: Semantic segmentation of remote sensing ship image via a convolutional neural networks model
- CN113191489B: Training method of binary neural network model, image processing method and device
- CN113468996A: Camouflage object detection method based on edge refinement
- Shen et al.: ICAFusion: Iterative cross-attention guided feature fusion for multispectral object detection
- Krasilenko et al.: Modeling of biologically motivated self-learning equivalent-convolutional recurrent-multilayer neural structures (BLM_SL_EC_RMNS) for image fragments clustering and recognition
- Sadanandan et al.: Feature augmented deep neural networks for segmentation of cells
- Kang et al.: ASF-YOLO: A novel YOLO model with attentional scale sequence fusion for cell instance segmentation
- Santana et al.: Oceanic mesoscale eddy detection and convolutional neural network complexity
- Yildirim et al.: Ship detection in optical remote sensing images using YOLOv4 and Tiny YOLOv4
- Chen et al.: Attention-based hierarchical fusion of visible and infrared images
- Gao et al.: Multi-scale learning based segmentation of glands in digital colonrectal pathology images
- Zhao et al.: Multitask learning for SAR ship detection with Gaussian-mask joint segmentation
- Ataş: Performance Evaluation of Jaccard-Dice Coefficient on Building Segmentation from High Resolution Satellite Images
Legal Events
- PB01: Publication
- SE01: Entry into force of request for substantive examination