CN116228702A - Camouflage target detection method based on attention mechanism and convolutional neural network - Google Patents

Camouflage target detection method based on attention mechanism and convolutional neural network

Info

Publication number
CN116228702A
CN116228702A (Application CN202310157199.8A)
Authority
CN
China
Prior art keywords
camouflage
edge
camouflage target
scale
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310157199.8A
Other languages
Chinese (zh)
Inventor
朱虎
鲁飞
邓丽珍
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN202310157199.8A
Publication of CN116228702A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/13Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/42Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Quality & Reliability (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a camouflage target detection method based on an attention mechanism and a convolutional neural network, belonging to the field of camouflage target detection, which specifically comprises the following steps: inputting the training set images into a backbone network to extract multi-scale features containing the camouflage target image; feeding the features output by Stage3, Stage4 and Stage5 of the backbone network into a position-aware circular convolution module to output global features; extracting edge contour information of the camouflage target with an edge extraction module to obtain an edge prediction map, on which boundary supervision is performed with the edge label of the camouflage target; fusing the obtained global features with the edge contour information and then performing multi-scale feature fusion to obtain multi-scale aggregated features; and obtaining a camouflage target prediction map, to which deep supervision is applied through the binary label map of the camouflage target. The method can comprehensively perceive the camouflage target, refine its boundary contour, and improve camouflage target detection performance.

Description

Camouflage target detection method based on attention mechanism and convolutional neural network
Technical Field
The invention relates to a camouflage target detection method based on an attention mechanism and a convolutional neural network, and belongs to the field of camouflage target detection.
Background
In nature, many organisms have camouflage abilities: the chameleon adjusts its color to its surroundings to conceal itself; the lion hides its body in the grass and waits for prey to approach; and some butterflies rest on tree trunks of a similar color to avoid natural enemies. Biologists refer to such camouflage as background matching, i.e., to avoid being recognized, animals try to change their own appearance so as to blend "perfectly" into the surrounding environment. In generic object detection and salient object detection, the target differs clearly from the background and can usually be distinguished easily by the human eye; in camouflage target detection, by contrast, the high similarity between the camouflaged target and the background makes detection considerably more challenging.
The boundary between a camouflaged target and the background is extremely blurred and difficult to distinguish, and it is hard to locate the camouflaged object accurately without introducing additional prior information. The document "Camouflaged object segmentation with distraction mining. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, June 2021" proposes PFNet, which introduces the concept of distraction mining into the camouflaged object segmentation task and develops a new mining strategy for discovering and removing distraction regions to help segment the camouflaged object; however, it does not attend to the boundary information of the camouflaged object and cannot accurately segment its complete boundary. The document "Camouflaged object detection. In CVPR, 2020" proposes SINet, which enlarges the receptive field with a receptive field block (RFB) to improve the segmentation of camouflaged targets; however, the RFB only enlarges the local receptive field and cannot capture global features or global context information. The Chinese patent CN113468996A discloses a camouflaged object detection method based on edge refinement, which takes edge prior information into account, but the attention mechanism of its edge refinement module considers only global average pooling, so a large amount of available information at different frequencies is lost. In view of the above, the capability of existing camouflage target detection algorithms still needs to be improved.
Disclosure of Invention
Aiming at the above problems, the invention provides a camouflage target detection method based on an attention mechanism and a convolutional neural network. An edge extraction module effectively extracts edge contour information; the frequency channel attention component FcaNet introduced in the edge reinforcement module extracts and combines information from different frequency components through the two-dimensional discrete cosine transform DCT, so that rich feature information can be captured and fused with the extracted global features to enhance the boundary representation; and a multi-scale attention mechanism is introduced into the camouflage target detection network model to aggregate multi-scale features effectively. In this way the method comprehensively perceives the camouflage target, refines its boundary contour, and improves camouflage target detection performance.
The technical scheme adopted for solving the technical problems is as follows:
a camouflage target detection method based on an attention mechanism and a convolutional neural network specifically comprises the following steps:
S1, dividing an image data set of camouflage targets into a training set and a testing set;
S2, inputting the training set images into a backbone network of a pre-constructed camouflage target detection network model to extract multi-scale features containing the camouflage target image at multiple resolutions;
S3, outputting the characteristics of the characteristic extraction layers Stage3, stage4 and Stage5 of the backbone network
Figure SMS_5
The input positions are respectively input to a position sensing circular convolution module PARCM to output global features;
S4, extracting edge contour information of the camouflage target with an edge extraction module to obtain an edge prediction map, and performing boundary supervision on the edge prediction map with the edge label of the camouflage target;
S5, effectively fusing the global features obtained in step S3 with the edge contour information obtained in step S4, and then performing multi-scale feature fusion to obtain multi-scale aggregated features;
S6, carrying out multi-scale polymerization on the characteristics obtained in the step S5
Figure SMS_11
Processing to obtain a camouflage target prediction graph, and performing deep supervision on the camouflage target prediction graph through a binary label graph of the camouflage target;
S7, taking the test set images as input to the trained camouflage target detection network model to obtain the final camouflage target detection result.
Further, the backbone network extracts multi-scale features containing the camouflage target image using the EfficientNet-B4 model of the EfficientNet series.
Further, the position-aware circular convolution module PARCM in step S3 includes a position-aware circular convolution component ParC and a channel attention component, where the position-aware circular convolution component ParC uses global circular convolution GCC to extract global features.
Further, in step S4, the edge extraction module EEM is used to extract the edge contour information of the camouflage target to obtain the edge prediction map, and boundary supervision is performed on the edge prediction map with the edge label of the camouflage target. The specific contents include: the edge extraction module EEM fuses the low-level features output by the feature extraction layer Stage2 of the backbone network with the high-level semantic features output by the feature extraction layer Stage5 to extract the edge contour information of the camouflage target; the output of the EEM is passed through a Sigmoid normalization function to obtain a binary map, which is upsampled four times to obtain the edge prediction map; boundary supervision is performed on the edge prediction map with the edge label of the camouflage target, using the Dice loss as the edge loss function.
Further, a position embedding (Position Embedding) strategy is introduced in the position-aware circular convolution module PARCM.
Further, in the channel attention component of the position-aware circular convolution module PARCM, nonlinear characteristics are introduced through a feedforward neural network (FFN), and a channel attention mechanism SE Block is added after the FFN to highlight key channels.
Further, a residual connection is introduced in the position-aware circular convolution module PARCM.
Further, the specific contents of fusing the global features obtained in step S3 with the edge contour information obtained in step S4 and then performing multi-scale feature fusion to obtain the multi-scale aggregated features include: the edge reinforcement module ERM fuses the edge contour information output by the edge extraction module EEM with the global features output by the position-aware circular convolution module PARCM to obtain edge-enhanced features; a multi-scale attention mechanism MSAM is then introduced into the multi-scale fusion module MSFM to perform multi-scale feature fusion, in which the edge-enhanced features of the deeper stages are first fused through the multi-scale fusion module MSFM, and the resulting features are then fused with the remaining edge-enhanced features through the MSFM to obtain the multi-scale aggregated features. For camouflage data sets the size of camouflaged objects varies widely; the multi-scale attention mechanism MSAM adapts well to camouflage targets of different scales and can therefore fuse the multi-scale features effectively.
Furthermore, a frequency channel attention component FcaNet is introduced into the edge reinforcement module ERM; it extracts and combines information from different frequency components through the two-dimensional discrete cosine transform DCT and, compared with a single global average pooling operation, captures richer feature information and enhances edge details.
Further, in step S6, the structured loss function selected for the deep supervision of the camouflage target prediction map through the binary label map consists of a weighted binary cross-entropy loss and a weighted intersection-over-union (IoU) loss.
Further, the total loss function L_total of the camouflage target detection model is formulated as follows:

L_total = λ1 · L_struct + λ2 · L_edge
L_struct = L_wBCE(S_pred, G_s) + L_wIoU(S_pred, G_s)
L_edge = L_Dice(E_pred, E_gt)

wherein L_struct denotes the structured loss and L_edge denotes the edge loss; λ1 and λ2 are the weight factors of the structured loss and the edge loss, respectively; S_pred is the predicted camouflage target saliency map and E_pred is the predicted camouflage target edge contour map; G_s denotes the saliency label of the camouflage target and E_gt denotes its edge label; L_wBCE is the weighted binary cross-entropy loss and L_wIoU is the weighted IoU loss in the structured loss; and L_Dice is the edge loss based on the Dice coefficient.
Compared with the prior art, the technical scheme of the invention has the following technical effects:
The invention introduces additional edge prior information and applies deep supervision to it. A frequency channel attention component FcaNet is introduced in the edge reinforcement module; it extracts and combines information from different frequency components through the two-dimensional discrete cosine transform DCT and, compared with a single global average pooling operation, captures richer feature information, which is fused with the global features extracted by the global circular convolution GCC to enhance the boundary representation.
The position-aware circular convolution module of the camouflage target detection network model extracts global features effectively by introducing the global circular convolution GCC to obtain a global receptive field, which alleviates the strong locality and limited globality of convolutional neural networks and yields global context information. A position embedding strategy is also introduced into the module to inject position information into the output feature map, keeping the output features sensitive to spatial position, and a channel attention mechanism is introduced to highlight key channels. In addition, the invention introduces a multi-scale attention mechanism into the camouflage target detection network model, which fuses multi-scale context information effectively and thereby improves camouflage target detection performance.
Drawings
FIG. 1 is a flowchart showing the overall implementation of the method for detecting a camouflage target based on an attention mechanism and a convolutional neural network;
fig. 2 is a flowchart of an implementation of a position aware cyclic convolution module PARCM according to an embodiment of the present invention;
fig. 3 is a schematic diagram of a calculation process of global cyclic convolution GCC in a position aware cyclic convolution module PARCM according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating an implementation of the edge extraction module EEM according to an embodiment of the present invention;
FIG. 5 is a flowchart illustrating an implementation of the edge reinforcement module ERM according to an embodiment of the present invention;
FIG. 6 is a flowchart of an implementation of the frequency channel attention component FcaNet according to an embodiment of the present invention;
FIG. 7 is a flowchart of an implementation of a multi-scale attention mechanism MSAM according to an embodiment of the present invention;
FIG. 8 is a flowchart of an implementation of a multi-scale fusion module MSFM according to an embodiment of the present invention;
FIG. 9 compares the image segmentation results of five mainstream camouflage target segmentation models in the comparison test of the present invention with those of the method of the present invention on different camouflage target data sets.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more clear, the technical solutions of the present invention will be clearly and completely described below with reference to specific embodiments of the present application and corresponding drawings. It will be apparent that the described embodiments are some, but not all, embodiments of the invention.
Fig. 1 is a flowchart of an implementation of the method for detecting the camouflage target based on the attention mechanism and the convolutional neural network. As shown in Fig. 1, the camouflage target detection network model of the invention comprises a backbone network, a position-aware circular convolution module PARCM, an edge extraction module EEM, an edge reinforcement module ERM and a multi-scale fusion module MSFM. The backbone network uses the EfficientNet-B4 model of the EfficientNet series to extract multi-scale features containing the camouflage target image. Because the input sizes of the images in the camouflage target training set differ, EfficientNet is selected as the backbone network: it balances the three dimensions of network depth, network width and resolution and thereby effectively improves the feature extraction capability. The position-aware circular convolution module PARCM extracts global features and enlarges the receptive field; the edge extraction module EEM performs edge extraction and outputs edge contour information; the edge reinforcement module ERM fuses the edge contour information output by the EEM with the receptive-field-enhanced features output by the PARCM to strengthen the boundary representation; and the multi-scale fusion module MSFM fuses the multi-scale features effectively by introducing a multi-scale attention mechanism MSAM.
The camouflage target detection method based on the attention mechanism and the convolutional neural network comprises the following steps:
Step one, dividing an image data set of camouflage targets into a training set and a testing set.
Step two, the training set images are input into the backbone network EfficientNet of the pre-constructed camouflage target detection network model to extract multi-scale features f_k (k = 1, …, 5) containing the camouflage target image, where the resolution of f_k is (H/2^k) × (W/2^k), H and W denote the height and width of the input image, and k denotes the resolution scaling factor.
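As an illustration of this step, the following sketch extracts the stage features with an EfficientNet-B4 backbone through the third-party timm library; the library choice, the 416×416 dummy input size and the mapping of Stage2–Stage5 to the stride-4 to stride-32 feature maps are assumptions of the sketch, not specifications of the patent.

```python
# Minimal sketch: multi-scale feature extraction with an EfficientNet-B4 backbone.
import torch
import timm

# features_only=True returns one feature map per stage (strides 2, 4, 8, 16, 32).
backbone = timm.create_model('efficientnet_b4', pretrained=False, features_only=True)

x = torch.randn(1, 3, 416, 416)            # dummy 3-channel RGB input
feats = backbone(x)                        # list [f1, f2, f3, f4, f5]
# Here Stage2-Stage5 of the description are taken to be the stride-4 to stride-32 maps.
f2, f3, f4, f5 = feats[1], feats[2], feats[3], feats[4]
print([f.shape for f in feats])
```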
Step three, the features f_3, f_4 and f_5 output by the feature extraction layers Stage3, Stage4 and Stage5 of the backbone network are input into the position-aware circular convolution module PARCM, which outputs the corresponding global features.
In this embodiment, the features f_3, f_4 and f_5 output by the feature extraction layers Stage3, Stage4 and Stage5 of the backbone network are fed into the position-aware circular convolution module PARCM, which outputs the receptive-field-enhanced global features. The implementation flowchart of the position-aware circular convolution module PARCM in the embodiment of the invention is shown in FIG. 2. The PARCM is composed of a position-aware circular convolution component ParC and a channel attention component. The ParC component uses global circular convolution GCC, which comprises a horizontal branch GCC-H and a vertical branch GCC-V; through the joint use of GCC-H and GCC-V, global features can be extracted from all input positions and a global receptive field is obtained. The size of the convolution kernel in the GCC is kept consistent with the input, and position information is injected into the output feature map through a position embedding (Position Embedding) strategy, which ensures the sensitivity of the output features to spatial position and reduces the disturbance that the use of circular convolution causes to the spatial structure. Meanwhile, in the channel attention component, nonlinear characteristics are introduced through a feedforward neural network (FFN), and a channel attention mechanism SE Block is added after the FFN, which highlights key channels and suppresses feature channels of little use to the current task. In addition, the position-aware circular convolution module PARCM introduces residual connections, so that global and local context information can be perceived at the same time. Here PW-Conv denotes point-wise convolution, which serves to adjust the input channel dimension, and Pre-Norm uses a batch normalization operation.
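For illustration, a minimal sketch of the channel attention branch described above (Pre-Norm batch normalization, point-wise convolutions forming the FFN, an SE Block and a residual connection) is given below; the layer ordering, expansion factor and reduction ratio are assumptions of the sketch.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation channel attention: GAP -> FC -> ReLU -> FC -> Sigmoid."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3))).view(b, c, 1, 1)   # per-channel weights
        return x * w                                        # highlight key channels

class ChannelAttentionBranch(nn.Module):
    """Pre-Norm (BN) -> FFN built from point-wise convolutions -> SE Block -> residual."""
    def __init__(self, channels, expansion=4):
        super().__init__()
        hidden = channels * expansion
        self.pre_norm = nn.BatchNorm2d(channels)
        self.ffn = nn.Sequential(                 # the FFN introduces the non-linearity
            nn.Conv2d(channels, hidden, kernel_size=1), nn.GELU(),
            nn.Conv2d(hidden, channels, kernel_size=1))
        self.se = SEBlock(channels)

    def forward(self, x):
        return x + self.se(self.ffn(self.pre_norm(x)))      # residual connection

if __name__ == "__main__":
    y = ChannelAttentionBranch(64)(torch.randn(2, 64, 26, 26))
    print(y.shape)   # torch.Size([2, 64, 26, 26])
```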
FIG. 3 is a schematic diagram of the calculation process of the global circular convolution GCC in the position-aware circular convolution module PARCM according to an embodiment of the present invention. As shown in FIG. 3, the input passes through the calculation processes of GCC-H and GCC-V. For simplicity of representation, assume the input x has only one channel, with shape H × W. A base position embedding of fixed size is kept; a bilinear interpolation function resizes it to obtain the instance position embedding matching the current input, and an extension function replicates the interpolated vector along the other spatial direction W (or H) times to generate a position-embedding (PE) matrix of size H × W, so that the position embedding can flexibly adapt to input features of different sizes. The instance position embedding is added to the input x to obtain the position-embedded feature x^pe. For GCC-V, the feature is then stacked along the vertical direction (the feature is concatenated with its first H−1 rows), giving a feature of height 2H−1, so that a standard convolution with a global receptive field and shared parameters can be applied. The convolution kernel parameters are likewise generated from a base kernel of fixed size by bilinear interpolation, yielding a kernel whose spatial receptive field spans the full height H; combined with the GCC-H convolution kernel, whose spatial receptive field spans the full width W, the coverage approximates the whole feature map. The standard convolution is then applied to the position-embedded input. In this procedure, the output of GCC-V at coordinate i corresponds to the input coordinate range from i to i+H−1 of the stacked feature; taking these row indices modulo H, the different offsets correspond to the relative coordinates within the local neighborhood covered by the kernel, and the modulo operation corresponds to the circular operation, i.e., to global circular convolution, which in code can be realized by the concatenation described above. The output of GCC-V at position (i, j) can therefore be written as

y^V(i, j) = Σ_{k=0..H−1} w^V(k) · x^pe((i + k) mod H, j)

and, in the same way, the output of GCC-H at position (i, j) can be expressed as

y^H(i, j) = Σ_{k=0..W−1} w^H(k) · x^pe(i, (j + k) mod W)

where w^V and w^H are the GCC-V and GCC-H kernels, and the extension function for GCC-H extends the input vector in the vertical direction. In summary, the receptive fields of the GCC-H and GCC-V outputs cover the full row and the full column of every input position, and by using GCC-H and GCC-V jointly, global features can be extracted from all input positions and a global receptive field is obtained.
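To make the circular indexing concrete, the sketch below implements a simplified GCC-H/GCC-V pair: an instance position embedding is obtained from a small learnable base map by bilinear interpolation and added to the input, the input is concatenated with itself minus one row (or column) so that an ordinary convolution whose kernel spans the full height (or width) realizes the modulo indexing described above, and the kernel itself is interpolated from a base kernel. The depth-wise grouping and the base size of 16 are simplifications assumed by the sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCC(nn.Module):
    """Simplified global circular convolution along one spatial direction.

    direction='V': output row i aggregates rows (i + k) mod H of the input;
    direction='H' does the same along the width.
    """
    def __init__(self, channels, base_size=16, direction='V'):
        super().__init__()
        self.direction = direction
        # Base kernel and base position embedding, resized to the actual input
        # size at run time by bilinear interpolation.
        self.base_kernel = nn.Parameter(torch.randn(channels, 1, base_size, 1))
        self.base_pe = nn.Parameter(torch.randn(1, channels, base_size, 1))

    def forward(self, x):
        b, c, h, w = x.shape
        if self.direction == 'V':
            k = F.interpolate(self.base_kernel, size=(h, 1), mode='bilinear', align_corners=True)
            pe = F.interpolate(self.base_pe, size=(h, 1), mode='bilinear', align_corners=True)
            x = x + pe.expand(-1, -1, -1, w)                  # instance position embedding
            x_cat = torch.cat([x, x[:, :, :-1, :]], dim=2)    # 2H-1 rows -> circular indexing
        else:
            k = F.interpolate(self.base_kernel.transpose(2, 3), size=(1, w),
                              mode='bilinear', align_corners=True)
            pe = F.interpolate(self.base_pe.transpose(2, 3), size=(1, w),
                               mode='bilinear', align_corners=True)
            x = x + pe.expand(-1, -1, h, -1)
            x_cat = torch.cat([x, x[:, :, :, :-1]], dim=3)    # 2W-1 columns
        # Depth-wise convolution whose kernel covers a full column/row of the input.
        return F.conv2d(x_cat, k, groups=c)

if __name__ == "__main__":
    x = torch.randn(2, 32, 13, 13)
    y = GCC(32, direction='V')(x) + GCC(32, direction='H')(x)  # joint use -> global receptive field
    print(y.shape)  # torch.Size([2, 32, 13, 13])
```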
Step four, the edge extraction module EEM is used to extract the edge contour information of the camouflage target and obtain the edge prediction map, on which boundary supervision is performed with the edge label of the camouflage target.
FIG. 4 is a flowchart illustrating an implementation of the edge extraction module EEM according to an embodiment of the present invention. As shown in FIG. 4, the EEM obtains the edge prediction map as follows: convolution layers first compress the channel numbers of the low-level feature f_2 output by Stage2 and the high-level semantic feature f_5 output by Stage5; f_5 is then upsampled to the same size as f_2 and the two are spliced together, fused by a convolution layer, passed through a further convolution layer and normalized by a Sigmoid function to obtain the edge prediction map; the Dice loss is chosen here as the edge loss function. Because the low-level feature f_2 contains considerable noise, the high-level semantic feature f_5 is introduced into the edge extraction module EEM as an aid. The main function of the EEM is to provide a valuable edge prior for the subsequent segmentation, so that the model can better segment the edge contour of the camouflage target.
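A minimal sketch of such an edge extraction module follows; the intermediate channel width, kernel sizes and normalization layers, which the published text leaves unspecified, are assumptions of the sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EEM(nn.Module):
    """Edge extraction: compress low-level (Stage2) and high-level (Stage5) features,
    upsample the high-level map, concatenate, fuse, and predict a 1-channel edge map."""
    def __init__(self, low_ch, high_ch, mid_ch=64):
        super().__init__()
        self.reduce_low = nn.Conv2d(low_ch, mid_ch, kernel_size=1)    # channel compression
        self.reduce_high = nn.Conv2d(high_ch, mid_ch, kernel_size=1)
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * mid_ch, mid_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True))
        self.pred = nn.Conv2d(mid_ch, 1, kernel_size=1)

    def forward(self, f2, f5):
        low = self.reduce_low(f2)
        high = F.interpolate(self.reduce_high(f5), size=low.shape[2:],
                             mode='bilinear', align_corners=False)
        edge_feat = self.fuse(torch.cat([low, high], dim=1))     # edge contour information
        edge_map = torch.sigmoid(self.pred(edge_feat))            # binary-like edge map
        # 4x upsampling gives the full-resolution edge prediction supervised by the edge label.
        edge_pred = F.interpolate(edge_map, scale_factor=4, mode='bilinear', align_corners=False)
        return edge_feat, edge_pred

if __name__ == "__main__":
    f2, f5 = torch.randn(1, 32, 104, 104), torch.randn(1, 448, 13, 13)
    feat, pred = EEM(32, 448)(f2, f5)
    print(feat.shape, pred.shape)
```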
Step five, the global features obtained in step three are effectively fused with the edge contour information obtained in step four to obtain the edge-enhanced features.
FIG. 5 is a flowchart illustrating an implementation of the edge reinforcement module ERM according to an embodiment of the present invention. As shown in FIG. 5, the ERM obtains the edge-enhanced features as follows: the edge contour information is first downsampled to the same size as the feature output by the position-aware circular convolution module PARCM; the two are multiplied element-wise and the result is added element-wise to the PARCM output; the sum then passes through a convolution layer, batch normalization and a ReLU activation function, after which the frequency channel attention component FcaNet is applied; finally, the input and output of the frequency channel attention component are added element-by-element through a residual connection to obtain the edge-enhanced features.
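A hedged sketch of the edge reinforcement step is given below: the edge feature is resized to the PARCM output, combined by element-wise multiplication and addition, refined by a convolution–BN–ReLU stack, and passed through a channel attention block with a residual connection. A plain SE-style attention stands in here for the frequency channel attention component FcaNet (a DCT-based version is sketched after the FcaNet description below), and the channel widths are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ERM(nn.Module):
    """Edge reinforcement: inject edge contour information into a PARCM feature map."""
    def __init__(self, channels, edge_ch):
        super().__init__()
        self.edge_proj = nn.Conv2d(edge_ch, channels, kernel_size=1)
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True))
        # Stand-in channel attention (the patent uses FcaNet at this point).
        self.att = nn.Sequential(
            nn.Linear(channels, channels // 16), nn.ReLU(inplace=True),
            nn.Linear(channels // 16, channels), nn.Sigmoid())

    def forward(self, edge_feat, g):
        # Downsample the edge feature to the size of the PARCM output g.
        e = F.interpolate(self.edge_proj(edge_feat), size=g.shape[2:],
                          mode='bilinear', align_corners=False)
        x = self.conv(e * g + g)                      # multiply, add, then conv-BN-ReLU
        b, c, _, _ = x.shape
        w = self.att(x.mean(dim=(2, 3))).view(b, c, 1, 1)
        return x + x * w                              # residual around the channel attention

if __name__ == "__main__":
    edge_feat, g = torch.randn(1, 64, 104, 104), torch.randn(1, 160, 26, 26)
    print(ERM(160, 64)(edge_feat, g).shape)           # torch.Size([1, 160, 26, 26])
```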
The edge reinforcement module ERM introduces the frequency channel attention component FcaNet. FIG. 6 is a flowchart of the implementation of the frequency channel attention component FcaNet. As shown in FIG. 6, FcaNet extracts and combines information from different frequency components through the two-dimensional discrete cosine transform DCT; compared with a single global average pooling operation, it captures richer feature information and enhances edge details.
The two-dimensional discrete cosine transform DCT is calculated as follows:

F(u, v) = Σ_{i=0..H−1} Σ_{j=0..W−1} x(i, j) · cos(πu(i + 1/2)/H) · cos(πv(j + 1/2)/W)

where H and W are the height and width of the image and x(i, j) is the value of the pixel at position (i, j). The frequency component F(u, v) can be regarded as a weighted sum over all input points of the image, with the cosine terms acting as the weights, and the above formula realizes the transformation from the spatial domain to the frequency domain. The lowest frequency component is F(0, 0) = Σ_{i,j} x(i, j), and since global average pooling can be expressed as GAP(x) = (1/(HW)) Σ_{i,j} x(i, j), the lowest frequency component F(0, 0) is proportional to GAP(x).
Writing the DCT basis as B(u, v, i, j) = cos(πu(i + 1/2)/H) · cos(πv(j + 1/2)/W), the two-dimensional inverse discrete cosine transform can be expressed (omitting constant normalization factors) as:

x(i, j) = Σ_{u=0..H−1} Σ_{v=0..W−1} F(u, v) · B(u, v, i, j)

that is, the image signal is composed of the global average pooling term (the lowest frequency component) together with the other frequency components, and the pixel value x(i, j) is determined by all frequency components at (i, j). Channel attention components such as SENet consider only GAP, i.e., only the lowest-frequency component information, and therefore lose a large amount of available information; FcaNet extracts the information of different frequency components through the two-dimensional DCT and combines it effectively, so that richer feature information is obtained and the feature representation and edge details are enhanced.
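The following sketch illustrates a DCT-based frequency channel attention in the spirit of FcaNet: each group of channels is pooled with a different 2D DCT basis function instead of plain global average pooling, and the pooled vector drives an SE-style excitation. The choice of the four lowest frequency pairs and the reduction ratio are assumptions of the sketch.

```python
import math
import torch
import torch.nn as nn

def dct_filter(u, v, h, w):
    """2D DCT basis B(u, v): cos(pi*u*(i+0.5)/h) * cos(pi*v*(j+0.5)/w)."""
    i = torch.arange(h).float().view(h, 1)
    j = torch.arange(w).float().view(1, w)
    return torch.cos(math.pi * u * (i + 0.5) / h) * torch.cos(math.pi * v * (j + 0.5) / w)

class FreqChannelAttention(nn.Module):
    def __init__(self, channels, h, w, freqs=((0, 0), (0, 1), (1, 0), (1, 1)), reduction=16):
        super().__init__()
        assert channels % len(freqs) == 0
        # One DCT basis per channel group; the (0, 0) basis reduces to global average pooling.
        basis = torch.stack([dct_filter(u, v, h, w) for u, v in freqs])        # (F, h, w)
        self.register_buffer('basis', basis.repeat_interleave(channels // len(freqs), dim=0))
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        b, c, h, w = x.shape
        pooled = (x * self.basis).sum(dim=(2, 3))      # per-channel frequency response, shape (b, c)
        weight = self.fc(pooled).view(b, c, 1, 1)
        return x * weight

if __name__ == "__main__":
    x = torch.randn(2, 64, 26, 26)
    print(FreqChannelAttention(64, 26, 26)(x).shape)   # torch.Size([2, 64, 26, 26])
```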
Step six, multi-scale feature fusion is performed on the edge-enhanced features obtained in step five to obtain the multi-scale aggregated features.
The multi-scale fusion module MSFM introduces a multi-scale attention mechanism MSAM. FIG. 7 is a flowchart of an implementation of the multi-scale attention mechanism MSAM. As shown in FIG. 7, the MSAM consists of two branches: one branch uses a global average pooling layer and two point-wise convolutions to obtain global context information, so as to highlight large objects in the global distribution; the other branch keeps the original feature size and uses only two point-wise convolutions to obtain local context information, so that small objects are not ignored. Finally, the outputs of the two branches are spliced together and a Sigmoid function yields the multi-scale channel attention coefficients. Features of different levels contribute differently to the task, and fusing multi-level features allows them to complement one another and yields a comprehensive feature expression. FIG. 8 is a flowchart of an implementation of the multi-scale fusion module MSFM according to an embodiment of the present invention. In the MSFM, the low-level feature X_l and the high-level feature X_h, upsampled twice to X_h↑, are combined by element-wise addition ⊕ and element-wise multiplication ⊗ under the guidance of the multi-scale attention coefficients M(·) produced by the MSAM component, yielding the fused feature X_f.
The fused low-level and high-level features X_f are then passed through a convolution layer followed by batch normalization and a ReLU activation function, which outputs the fused multi-scale features. In the specific implementation, this embodiment takes into account that lower-level features, which have larger spatial resolution, require more computing resources than the high-level semantic features while contributing less to the performance of the model. Based on this observation, feature fusion is performed only on the higher-level edge-enhanced features: the edge-enhanced features of the deeper stages are first fused by the multi-scale fusion module MSFM, and the result is then fused with the remaining edge-enhanced features by the MSFM to obtain the multi-scale aggregated features. Features of different scales contribute differently to the task, and fusing features of multiple scales through the multi-scale fusion module MSFM allows them to complement one another, yielding a comprehensive feature representation.
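The sketch below gives one plausible reading of the MSAM/MSFM combination: the MSAM forms channel attention coefficients from a global (pooled) branch and a local branch, and the MSFM uses those coefficients to weight the low-level feature against the twice-upsampled high-level feature before a convolution–BN–ReLU output layer. The two MSAM branches are summed here for simplicity (the text above describes splicing them), and the soft selection between the two scales is an assumption, not a formula taken from the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MSAM(nn.Module):
    """Multi-scale channel attention: a global (GAP) branch and a local branch,
    each built from two point-wise convolutions, combined and squashed by a sigmoid."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        mid = channels // reduction
        self.global_branch = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, mid, 1), nn.ReLU(inplace=True), nn.Conv2d(mid, channels, 1))
        self.local_branch = nn.Sequential(
            nn.Conv2d(channels, mid, 1), nn.ReLU(inplace=True), nn.Conv2d(mid, channels, 1))

    def forward(self, x):
        return torch.sigmoid(self.global_branch(x) + self.local_branch(x))

class MSFM(nn.Module):
    """Fuse a low-level feature with a twice-upsampled high-level feature under MSAM guidance."""
    def __init__(self, channels):
        super().__init__()
        self.msam = MSAM(channels)
        self.out = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True))

    def forward(self, x_low, x_high):
        x_up = F.interpolate(x_high, scale_factor=2, mode='bilinear', align_corners=False)
        m = self.msam(x_low + x_up)               # multi-scale channel attention coefficients
        fused = m * x_low + (1.0 - m) * x_up      # assumed soft selection between the two scales
        return self.out(fused)

if __name__ == "__main__":
    low, high = torch.randn(1, 64, 52, 52), torch.randn(1, 64, 26, 26)
    print(MSFM(64)(low, high).shape)              # torch.Size([1, 64, 52, 52])
```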
Step seven, the multi-scale aggregated features obtained in step six are convolved to obtain the camouflage target prediction map, and deep supervision is applied to the camouflage target prediction map through the binary label map of the camouflage target.
The multi-scale aggregated features obtained above are passed through a convolution layer to obtain a single-channel grey-scale map, which is upsampled 8 times to obtain the final camouflage target prediction map S_pred. Deep supervision is applied with the binary label map G_s of the camouflage target and the structured loss is computed, for which the weighted binary cross-entropy loss L_wBCE and the weighted IoU loss L_wIoU are selected. The total loss function of the final model is:

L_total = λ1 · L_struct + λ2 · L_edge
L_struct = L_wBCE(S_pred, G_s) + L_wIoU(S_pred, G_s)
L_edge = L_Dice(E_pred, E_gt)

wherein λ1 and λ2 are the weight factors of the structured loss and the edge loss, respectively, and are set to 1 and 3 in the experimental simulation; L_struct denotes the structured loss and L_edge denotes the edge loss; S_pred is the predicted camouflage target saliency map and E_pred is the predicted camouflage target edge map; G_s denotes the saliency label of the camouflage target and E_gt denotes its edge label.
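For concreteness, the sketch below implements common forms of the weighted binary cross-entropy, weighted IoU and Dice losses and combines them with the stated weights λ1 = 1 and λ2 = 3; the boundary-aware pixel weighting (a local-mean-based weight) is an assumption, since the text does not spell out the weighting scheme.

```python
import torch
import torch.nn.functional as F

def structure_loss(pred_logits, mask):
    """Weighted BCE + weighted IoU; pixels near object boundaries get larger weights."""
    weight = 1 + 5 * torch.abs(F.avg_pool2d(mask, 31, stride=1, padding=15) - mask)
    wbce = F.binary_cross_entropy_with_logits(pred_logits, mask, reduction='none')
    wbce = (weight * wbce).sum(dim=(2, 3)) / weight.sum(dim=(2, 3))
    pred = torch.sigmoid(pred_logits)
    inter = (pred * mask * weight).sum(dim=(2, 3))
    union = ((pred + mask) * weight).sum(dim=(2, 3))
    wiou = 1 - (inter + 1) / (union - inter + 1)
    return (wbce + wiou).mean()

def dice_loss(pred_logits, edge_gt, eps=1.0):
    """Dice-coefficient edge loss for the edge prediction map."""
    pred = torch.sigmoid(pred_logits)
    inter = (pred * edge_gt).sum(dim=(2, 3))
    return (1 - (2 * inter + eps) / (pred.sum(dim=(2, 3)) + edge_gt.sum(dim=(2, 3)) + eps)).mean()

def total_loss(cam_logits, cam_gt, edge_logits, edge_gt, lam1=1.0, lam2=3.0):
    return lam1 * structure_loss(cam_logits, cam_gt) + lam2 * dice_loss(edge_logits, edge_gt)

if __name__ == "__main__":
    s, g = torch.randn(2, 1, 104, 104), torch.randint(0, 2, (2, 1, 104, 104)).float()
    e, eg = torch.randn(2, 1, 104, 104), torch.randint(0, 2, (2, 1, 104, 104)).float()
    print(total_loss(s, g, e, eg).item())
```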
Finally, the test set images are used as input to the trained camouflage target detection network model to obtain the final camouflage target detection results.
To verify the effectiveness of the present invention, five mainstream camouflage target segmentation models, UGTR, MGL-R, PFNet, EGNet and SINet, were tested on images from the camouflage target data sets CAMO, CHAMELEON, COD10K and NC4K, and their segmentation results were compared with the camouflage target predictions of the present invention.
The experimental simulation content of the camouflage target detection method based on the attention mechanism and the convolutional neural network is as follows:
The experimental platform is a 64-bit Ubuntu system, version 20.04.4, with a GeForce RTX 2080 Ti GPU; Python 3.8 is used as the programming language and PyCharm as the software development platform. The model is implemented with the deep learning framework PyTorch 1.4, and the input images are 3-channel RGB images of a fixed size. In the training stage, the batch size is 24 and the number of iteration rounds (epochs) is 40; the parameters are optimized with the AdaX optimizer, the initial learning rate is set to 3e-4 and is continuously decayed with a poly strategy using the power 0.9; with the acceleration of three GeForce RTX 2080 Ti GPUs, training the whole network takes about 1 hour.
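The poly learning-rate decay mentioned above can be reproduced as in the sketch below; a generic torch optimizer replaces the AdaX optimizer here because AdaX is not part of PyTorch, and the model is a placeholder.

```python
import torch

base_lr, power, epochs = 3e-4, 0.9, 40

def poly_lr(epoch):
    # Poly strategy: lr = base_lr * (1 - epoch / max_epochs) ** power
    return base_lr * (1 - epoch / epochs) ** power

# Placeholder model and optimizer; the patent text states the AdaX optimizer was used.
model = torch.nn.Conv2d(3, 1, 3, padding=1)
optimizer = torch.optim.AdamW(model.parameters(), lr=base_lr)

for epoch in range(epochs):
    for g in optimizer.param_groups:
        g['lr'] = poly_lr(epoch)
    # ... one training epoch over the camouflage data set would run here ...
```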
FIG. 9 compares the image segmentation results of the five mainstream camouflage target segmentation models in the comparison test with those of the method of the present invention (Ours) on different camouflage target data sets. Compared against the camouflage target saliency labels, the method can effectively detect the camouflage target, refine its edge contour and improve the accuracy of camouflage target detection.
The foregoing is merely a preferred embodiment of the present invention and is not intended to limit the present invention in any way. Any equivalent substitution or modification that a person skilled in the art makes to the technical solution and technical content disclosed in the invention without departing from the technical solution of the invention still falls within the protection scope of the invention.

Claims (10)

1. The camouflage target detection method based on the attention mechanism and the convolutional neural network is characterized by comprising the following steps:
S1, dividing an image data set of camouflage targets into a training set and a testing set;
S2, inputting the training set images into a backbone network of a pre-constructed camouflage target detection network model to extract multi-scale features containing the camouflage target image;
S3, inputting the features output by the feature extraction layers Stage3, Stage4 and Stage5 of the backbone network into a position-aware circular convolution module PARCM to output global features;
S4, extracting edge contour information of the camouflage target by using an edge extraction module EEM to obtain an edge prediction map, and performing boundary supervision on the edge prediction map with an edge label of the camouflage target;
S5, effectively fusing the global features obtained in step S3 with the edge contour information obtained in step S4, and then performing multi-scale feature fusion to obtain multi-scale aggregated features;
S6, processing the multi-scale aggregated features obtained in step S5 to obtain a camouflage target prediction map, and performing deep supervision on the camouflage target prediction map through a binary label map of the camouflage target;
S7, taking the test set images as input to the trained camouflage target detection network model to obtain the final camouflage target detection result.
2. The method for camouflage target detection based on an attention mechanism and a convolutional neural network of claim 1, wherein the backbone network uses an EfficientNet-B4 model in the EfficientNet series to extract multi-scale features containing camouflage target images.
3. The method for detecting a camouflage target based on an attention mechanism and a convolutional neural network as set forth in claim 2, wherein in step S4, the edge extraction module EEM is used to extract the edge contour information of the camouflage target to obtain the edge prediction map, and boundary supervision is performed on the edge prediction map with the edge label of the camouflage target, the specific contents including: the edge extraction module EEM fuses the low-level features output by the feature extraction layer Stage2 of the backbone network with the high-level semantic features output by the feature extraction layer Stage5 to extract the edge contour information of the camouflage target; the output of the EEM is passed through a Sigmoid normalization function to obtain a binary map, which is upsampled four times to obtain the edge prediction map; boundary supervision is performed on the edge prediction map with the edge label of the camouflage target, using the Dice loss as the edge loss function.
4. The method for detecting a camouflage target based on an attention mechanism and a convolutional neural network according to claim 1, wherein the position-aware circular convolution module PARCM in step S3 includes a position-aware circular convolution component ParC and a channel attention component, wherein the position-aware circular convolution component ParC extracts global features by using global circular convolution GCC, and a residual connection is introduced in the position-aware circular convolution module PARCM.
5. The method for detecting a camouflage target based on an attention mechanism and a convolutional neural network according to claim 4, wherein a position embedding strategy is introduced in the position-aware circular convolution module PARCM.
6. The method for detecting a camouflage target based on an attention mechanism and a convolutional neural network according to claim 4, wherein, in the channel attention component, nonlinear characteristics are introduced through a feedforward neural network FFN, and a channel attention mechanism SE Block is added after the FFN to highlight key channels.
7. The method for detecting the camouflage target based on the attention mechanism and the convolutional neural network according to claim 1, wherein the specific contents of effectively fusing the global features obtained in step S3 with the edge contour information obtained in step S4 and then performing multi-scale feature fusion to obtain the multi-scale aggregated features include: an edge reinforcement module ERM fuses the edge contour information output by the edge extraction module EEM with the global features output by the position-aware circular convolution module PARCM to obtain edge-enhanced features; a multi-scale attention mechanism MSAM is then introduced into a multi-scale fusion module MSFM to perform multi-scale feature fusion, wherein the edge-enhanced features of the deeper stages are first fused through the multi-scale fusion module MSFM, and the resulting features are then fused with the remaining edge-enhanced features through the MSFM to obtain the multi-scale aggregated features.
8. The method for detecting a camouflage target based on an attention mechanism and a convolutional neural network according to claim 1, wherein a frequency channel attention component FcaNet is introduced into an edge reinforcement module ERM, and information of different frequency components is extracted and combined through a two-dimensional discrete cosine transform DCT.
9. The method for detecting a camouflage target based on an attention mechanism and a convolutional neural network according to claim 1, wherein the structured loss function selected for performing deep supervision on the camouflage target prediction map through its binary label map in step S6 consists of a weighted binary cross-entropy loss and a weighted IoU loss.
10. The method for detecting a camouflage target based on an attention mechanism and a convolutional neural network according to claim 1, wherein the total loss function L_total of the camouflage target detection model is formulated as follows:

L_total = λ1 · L_struct + λ2 · L_edge
L_struct = L_wBCE(S_pred, G_s) + L_wIoU(S_pred, G_s)
L_edge = L_Dice(E_pred, E_gt)

wherein L_struct denotes the structured loss and L_edge denotes the edge loss; λ1 and λ2 are the weight factors of the structured loss and the edge loss, respectively; S_pred is the predicted camouflage target saliency map and E_pred is the predicted camouflage target edge contour map; G_s denotes the saliency label of the camouflage target and E_gt denotes its edge label; L_wBCE is the weighted binary cross-entropy loss and L_wIoU is the weighted IoU loss in the structured loss; and L_Dice is the edge loss based on the Dice coefficient.
CN202310157199.8A 2023-02-23 2023-02-23 Camouflage target detection method based on attention mechanism and convolutional neural network Pending CN116228702A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310157199.8A CN116228702A (en) 2023-02-23 2023-02-23 Camouflage target detection method based on attention mechanism and convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310157199.8A CN116228702A (en) 2023-02-23 2023-02-23 Camouflage target detection method based on attention mechanism and convolutional neural network

Publications (1)

Publication Number Publication Date
CN116228702A true CN116228702A (en) 2023-06-06

Family

ID=86578159

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310157199.8A Pending CN116228702A (en) 2023-02-23 2023-02-23 Camouflage target detection method based on attention mechanism and convolutional neural network

Country Status (1)

Country Link
CN (1) CN116228702A (en)


Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116894943A (en) * 2023-07-20 2023-10-17 深圳大学 Double-constraint camouflage target detection method and system
CN116664990A (en) * 2023-08-01 2023-08-29 苏州浪潮智能科技有限公司 Camouflage target detection method, model training method, device, equipment and medium
CN116664990B (en) * 2023-08-01 2023-11-14 苏州浪潮智能科技有限公司 Camouflage target detection method, model training method, device, equipment and medium
CN117173523A (en) * 2023-08-04 2023-12-05 山东大学 Camouflage target detection method and system based on frequency perception
CN117173523B (en) * 2023-08-04 2024-04-09 山东大学 Camouflage target detection method and system based on frequency perception
CN116703950A (en) * 2023-08-07 2023-09-05 中南大学 Camouflage target image segmentation method and system based on multi-level feature fusion
CN116703950B (en) * 2023-08-07 2023-10-20 中南大学 Camouflage target image segmentation method and system based on multi-level feature fusion
CN117422939A (en) * 2023-12-15 2024-01-19 武汉纺织大学 Breast tumor classification method and system based on ultrasonic feature extraction
CN117422939B (en) * 2023-12-15 2024-03-08 武汉纺织大学 Breast tumor classification method and system based on ultrasonic feature extraction
CN118072001A (en) * 2024-04-22 2024-05-24 西南科技大学 Camouflage target detection method based on scale feature perception and extensive perception convolution
CN118072001B (en) * 2024-04-22 2024-06-21 西南科技大学 Camouflage target detection method based on scale feature perception and extensive perception convolution

Similar Documents

Publication Publication Date Title
Xie et al. Multilevel cloud detection in remote sensing images based on deep learning
Ren et al. Deep texture-aware features for camouflaged object detection
CN116228702A (en) Camouflage target detection method based on attention mechanism and convolutional neural network
CN110378381B (en) Object detection method, device and computer storage medium
El Amin et al. Convolutional neural network features based change detection in satellite images
Rezaeilouyeh et al. Microscopic medical image classification framework via deep learning and shearlet transform
CN112750140B (en) Information mining-based disguised target image segmentation method
Zhao et al. Multi-scale image block-level F-CNN for remote sensing images object detection
Lee et al. Accurate traffic light detection using deep neural network with focal regression loss
Wen et al. Gcsba-net: Gabor-based and cascade squeeze bi-attention network for gland segmentation
Woźniak et al. Graphic object feature extraction system based on cuckoo search algorithm
Fang et al. SAR-optical image matching by integrating Siamese U-Net with FFT correlation
Wang et al. Semantic segmentation of remote sensing ship image via a convolutional neural networks model
CN113191489B (en) Training method of binary neural network model, image processing method and device
CN113468996A (en) Camouflage object detection method based on edge refinement
Shen et al. ICAFusion: Iterative cross-attention guided feature fusion for multispectral object detection
Krasilenko et al. Modeling of biologically motivated self-learning equivalent-convolutional recurrent-multilayer neural structures (BLM_SL_EC_RMNS) for image fragments clustering and recognition
Sadanandan et al. Feature augmented deep neural networks for segmentation of cells
Kang et al. ASF-YOLO: A novel YOLO model with attentional scale sequence fusion for cell instance segmentation
Santana et al. Oceanic mesoscale eddy detection and convolutional neural network complexity
Yildirim et al. Ship detection in optical remote sensing images using YOLOv4 and Tiny YOLOv4
Chen et al. Attention-based hierarchical fusion of visible and infrared images
Gao et al. Multi-scale learning based segmentation of glands in digital colonrectal pathology images
Zhao et al. Multitask learning for SAR ship detection with Gaussian-mask joint segmentation
Ataş Performance Evaluation of Jaccard-Dice Coefficient on Building Segmentation from High Resolution Satellite Images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination