CN116228702A - Camouflage target detection method based on attention mechanism and convolutional neural network
- Publication number: CN116228702A
- Application number: CN202310157199.8A
- Authority: CN (China)
- Prior art date: 2023-02-23
- Prior art keywords: camouflage, edge, camouflage target, scale, features
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T7/0002: Image analysis; inspection of images, e.g. flaw detection
- G06N3/08: Neural networks; learning methods
- G06T7/13: Segmentation; edge detection
- G06T7/70: Determining position or orientation of objects or cameras
- G06V10/42: Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
- G06V10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections
- G06V10/774: Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
- G06V10/806: Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
- G06V10/82: Image or video recognition or understanding using neural networks
- G06T2207/20081: Training; learning
- G06T2207/20084: Artificial neural networks [ANN]
- G06V2201/07: Target detection
- Y02T10/40: Engine management systems
Abstract
The invention discloses a camouflage target detection method based on an attention mechanism and a convolutional neural network, which belongs to the field of camouflage target detection and specifically comprises the following steps: inputting training-set images into a backbone network to extract multi-scale features of the camouflage target image; respectively inputting the features output by Stage3, Stage4 and Stage5 of the backbone network into a position-aware circular convolution module to output global features; extracting the edge contour information of the camouflage target with an edge extraction module to obtain an edge prediction map O_e, and performing boundary supervision on O_e with the edge label G_e of the camouflage target; fusing the obtained global features with the edge contour information and then performing multi-scale feature fusion to obtain a multi-scale aggregated feature f_agg, from which a camouflage target prediction map is obtained and deeply supervised through the binary label map of the camouflage target. The detection method can comprehensively perceive the camouflage target, refine its boundary contour, and improve camouflage target detection performance.
Description
Technical Field
The invention relates to a camouflage target detection method based on an attention mechanism and a convolutional neural network, and belongs to the field of camouflage target detection.
Background
In nature, many organisms possess camouflage: a chameleon adjusts its color to the surrounding environment to conceal itself; a lion hides its body in the grass and waits for prey to approach; a butterfly rests on a trunk of similar color to avoid the attacks of natural enemies. Biologists call this kind of camouflage background matching: to avoid being identified, animals try to change their own appearance to blend "perfectly" into the surrounding environment. In generic object detection and salient object detection the target differs markedly from the background and can normally be distinguished easily by the human eye; in camouflage target detection, by contrast, the high similarity between the camouflage target and the background makes detection far more challenging.
The boundary between a camouflage target and the background is quite blurred and difficult to distinguish, and without introducing additional prior information the camouflaged object is hard to localize accurately. The literature "Camouflaged object segmentation with distraction mining, In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, June 2021" proposes PFNet, which introduces the concept of distraction removal into the camouflaged object segmentation task and develops a new mining strategy for discovering and removing distraction regions to aid segmentation, but it does not attend to the boundary information of the camouflaged object and cannot accurately segment its complete boundary. The literature "Camouflaged object detection, In CVPR, 2020" proposes SINet, which enlarges the receptive field with a receptive field block (RFB) to improve segmentation of camouflage targets, but the RFB only enhances the local receptive field; it cannot obtain global features, enhance the global receptive field, or acquire global context information. Chinese patent CN113468996A discloses a camouflaged object detection method based on edge refinement that considers edge prior information, but the attention mechanism of its edge refinement module only considers global average pooling, losing a large amount of usable information at other frequencies. In view of this, the capability of existing camouflage target detection algorithms still needs to be improved.
Disclosure of Invention
Aiming at these problems, the invention provides a camouflage target detection method based on an attention mechanism and a convolutional neural network. An edge extraction module effectively extracts edge contour information; an edge reinforcement module introduces the frequency channel attention component FcaNet, which extracts and combines the information of different frequency components through the two-dimensional discrete cosine transform (DCT), so that multiple kinds of feature information can be captured and fused with the extracted global features to strengthen the boundary representation; and a multi-scale attention mechanism introduced into the camouflage target detection network model effectively aggregates multi-scale features. Together these realize comprehensive perception of the camouflage target, refine its boundary contour, and improve camouflage target detection performance.
The technical scheme adopted for solving the technical problems is as follows:
A camouflage target detection method based on an attention mechanism and a convolutional neural network specifically comprises the following steps:
S1, dividing an image data set of camouflage targets into a training set and a test set;
S2, inputting a training-set image I into the backbone network of a pre-constructed camouflage target detection network model to extract multi-scale features f_i (i = 1, ..., 5) of the camouflage target image, where f_i has resolution (H/2^i) x (W/2^i);
S3, respectively inputting the features {f_3, f_4, f_5} output by feature extraction layers Stage3, Stage4 and Stage5 of the backbone network into the position-aware circular convolution module PARCM to output global features {g_3, g_4, g_5};
S4, extracting the edge contour information of the camouflage target with the edge extraction module to obtain an edge prediction map O_e, and performing boundary supervision on O_e with the edge label G_e of the camouflage target;
S5, fusing the global features obtained in step S3 with the edge contour information obtained in step S4, and then performing multi-scale feature fusion to obtain a multi-scale aggregated feature f_agg;
S6, processing the multi-scale aggregated feature f_agg obtained in step S5 to obtain a camouflage target prediction map, and performing deep supervision on it through the binary label map of the camouflage target;
S7, taking test-set images as input to the trained camouflage target detection network model to obtain the final camouflage target detection result.
Further, the backbone network uses the EfficientNet-B4 model of the EfficientNet series to extract the multi-scale features of the camouflage target image.
Further, the position-aware circular convolution module PARCM in step S3 comprises a position-aware circular convolution component ParC and a channel attention component, wherein the position-aware circular convolution component ParC uses the global circular convolution GCC to extract global features.
Further, in step S4, the specific content of extracting the edge contour information of the camouflage target with the edge extraction module to obtain the edge prediction map O_e and performing boundary supervision on O_e with the edge label G_e of the camouflage target includes: the edge extraction module EEM fuses the low-level feature f_2 output by feature extraction layer Stage2 of the backbone network with the high-level semantic feature f_5 output by feature extraction layer Stage5 to extract the edge contour information of the camouflage target; the output of the EEM is normalized by a Sigmoid function to obtain a binary map, which is upsampled four times to obtain the edge prediction map O_e; boundary supervision is performed on O_e with the edge label G_e of the camouflage target, using the Dice loss as the edge loss function L_edge.
Further, a position embedding strategy is introduced in the position-aware circular convolution module PARCM.
Further, nonlinear characteristics are introduced into the channel attention component of the position-aware circular convolution module PARCM through a feedforward network (FFN), and a channel attention mechanism (SE block) is added after the FFN to highlight key channels.
Further, a residual connection is introduced in the position-aware circular convolution module PARCM.
Further, the specific content of fusing the global features obtained in step S3 with the edge contour information obtained in step S4 and then performing multi-scale feature fusion to obtain the multi-scale aggregated feature f_agg includes: the edge reinforcement module ERM fuses the edge contour information output by the edge extraction module EEM with the global features {g_3, g_4, g_5} to obtain edge-enhanced features {e_3, e_4, e_5}; a multi-scale attention mechanism MSAM is then introduced into the multi-scale fusion module MSFM for multi-scale feature fusion, wherein the features e_4 and e_5 are fused by the multi-scale fusion module MSFM to obtain the feature f_45, and the feature f_45 and the feature e_3 are then fused by the multi-scale fusion module MSFM to obtain the multi-scale aggregated feature f_agg. Since the size of camouflaged objects in camouflage data sets varies greatly, the multi-scale attention mechanism MSAM adapts well to camouflage targets of different scales and can fuse the multi-scale features effectively.
Furthermore, the frequency channel attention component FcaNet is introduced into the edge reinforcement module ERM; it extracts and combines the information of different frequency components through the two-dimensional discrete cosine transform DCT and, compared with a single global average pooling operation, can capture multiple kinds of feature information and enhance edge details.
Further, in step S6, the structured loss function selected for deep supervision of the camouflage target through its binary label map is the combination of a weighted binary cross-entropy loss L_wBCE and a weighted IoU loss L_wIoU.
Further, the total loss function L_total of the camouflage target detection model is given by:

L_total = λ_1 · L_struct(O_c, G_c) + λ_2 · L_edge(O_e, G_e), with L_struct = L_wBCE + L_wIoU,

wherein L_struct denotes the structured loss and L_edge the edge loss; λ_1 and λ_2 are the weight factors of the structured loss and the edge loss, respectively; O_c is the predicted camouflage target saliency map and O_e the predicted camouflage target edge contour map; G_c denotes the saliency label of the camouflage target and G_e its edge label; L_wBCE is the weighted binary cross-entropy loss in the structured loss, L_wIoU is the weighted IoU loss in the structured loss, and L_edge is the edge loss using the Dice coefficient.
Compared with the prior art, the technical scheme of the invention has the following technical effects:
according to the invention, additional edge priori information is introduced and depth supervision is performed, a frequency channel attention component FcNet is introduced in an edge strengthening module, different frequency component information is extracted through two-dimensional discrete cosine transform DCT and combined, and compared with single global average pooling operation, multiple feature information can be captured and fused with global features extracted through global cyclic convolution GCC, and boundary representation can be enhanced.
The position-aware circular convolution module of the camouflage target detection network model introduces the global circular convolution GCC to extract global features effectively and obtain a global receptive field, overcoming the strong locality and insufficient globality of convolutional neural networks and thereby acquiring global context information. Meanwhile, a position embedding strategy introduced into the position-aware circular convolution module injects position information into the output feature map, ensuring the sensitivity of the output features to spatial position, and a channel attention mechanism is introduced to highlight key channels. In addition, the invention introduces a multi-scale attention mechanism into the camouflage target detection network model, which effectively fuses multi-scale context information and thereby improves camouflage target detection performance.
Drawings
FIG. 1 is a flowchart showing the overall implementation of the method for detecting a camouflage target based on an attention mechanism and a convolutional neural network;
fig. 2 is a flowchart of an implementation of a position aware cyclic convolution module PARCM according to an embodiment of the present invention;
fig. 3 is a schematic diagram of a calculation process of global cyclic convolution GCC in a position aware cyclic convolution module PARCM according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating an implementation of the edge extraction module EEM according to an embodiment of the present invention;
FIG. 5 is a flowchart illustrating an implementation of the edge enhancement module ERM according to an embodiment of the present invention;
FIG. 6 is a flowchart of an implementation of the frequency channel attention component FcaNet according to an embodiment of the present invention;
FIG. 7 is a flowchart of an implementation of a multi-scale attention mechanism MSAM according to an embodiment of the present invention;
FIG. 8 is a flowchart of an implementation of a multi-scale fusion module MSFM according to an embodiment of the present invention;
FIG. 9 compares the image segmentation results of the five mainstream camouflage target segmentation models in the comparison test of the present invention with the method of the present invention on different camouflage target data sets.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the invention are described clearly and completely below with reference to specific embodiments of the application and the corresponding drawings. It will be apparent that the described embodiments are some, but not all, embodiments of the invention.
Fig. 1 is a flowchart of the implementation of the camouflage target detection method based on an attention mechanism and a convolutional neural network. As shown in Fig. 1, the camouflage target detection network model of the invention comprises a backbone network, a position-aware circular convolution module PARCM, an edge extraction module EEM, an edge reinforcement module ERM and a multi-scale fusion module MSFM. The backbone network uses the EfficientNet-B4 model of the EfficientNet series to extract multi-scale features {f_1, ..., f_5} of the camouflage target image; since the image sizes in camouflage target training sets vary, EfficientNet is chosen as the backbone because it balances the three dimensions of network depth, network width and input resolution, effectively improving the feature extraction capability. The position-aware circular convolution module PARCM extracts global features and enlarges the receptive field; the edge extraction module EEM performs edge extraction and outputs edge contour information; the edge reinforcement module ERM effectively fuses the edge contour information output by the EEM with the receptive-field-enhanced features from the PARCM to strengthen the boundary representation; and the multi-scale fusion module MSFM fuses the multi-scale features effectively by introducing the multi-scale attention mechanism MSAM.
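Read as a whole, the data flow of Fig. 1 can be summarized in a few lines of PyTorch. The sketch below only fixes the wiring between the modules described in this embodiment; every class and name in it is an illustrative assumption rather than the patent's code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CamouflageDetector(nn.Module):
    """Top-level sketch of the detection network: backbone -> PARCM -> EEM ->
    ERM -> MSFM; the individual modules are sketched later in this description."""
    def __init__(self, backbone, parcms, eem, erms, msfm45, msfm3, head):
        super().__init__()
        self.backbone, self.eem = backbone, eem
        self.parcms, self.erms = nn.ModuleList(parcms), nn.ModuleList(erms)
        self.msfm45, self.msfm3, self.head = msfm45, msfm3, head

    def forward(self, img):
        f1, f2, f3, f4, f5 = self.backbone(img)                  # multi-scale features
        g = [m(f) for m, f in zip(self.parcms, (f3, f4, f5))]    # global features
        edge_feat, o_e = self.eem(f2, f5)                        # edge prior and O_e
        e = [m(gi, edge_feat) for m, gi in zip(self.erms, g)]    # edge-enhanced features
        f45 = self.msfm45(e[1], e[2])                            # fuse e_4 with e_5
        f_agg = self.msfm3(e[0], f45)                            # fuse with e_3
        o_c = F.interpolate(self.head(f_agg), scale_factor=8,    # 8x upsampling
                            mode="bilinear", align_corners=False)
        return o_c, o_e                                          # prediction map, edge map
```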
The camouflage target detection method based on the attention mechanism and the convolutional neural network comprises the following steps:
Step one, dividing an image data set of camouflage targets into a training set and a test set.
Step two, a training-set image I is input into the backbone network EfficientNet of the pre-constructed camouflage target detection network model to extract multi-scale features f_i (i = 1, ..., 5) of the camouflage target image, where f_i has resolution (H/2^i) x (W/2^i), H and W respectively denote the height and width of the input image, and i is the resolution scaling factor.
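The five stages' outputs can be obtained, for example, with the timm library; this is an illustrative sketch, not the patent's code, and the input size shown is arbitrary:

```python
import timm
import torch

# EfficientNet-B4 with feature-pyramid outputs at strides 2, 4, 8, 16, 32,
# corresponding to f_1 ... f_5 in the text.
backbone = timm.create_model("efficientnet_b4", features_only=True, pretrained=True)
img = torch.randn(1, 3, 416, 416)               # any training-set image tensor
f1, f2, f3, f4, f5 = backbone(img)
print([f.shape for f in (f1, f2, f3, f4, f5)])  # resolutions (H/2^i) x (W/2^i)
```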
Step three, the features {f_3, f_4, f_5} output by feature extraction layers Stage3, Stage4 and Stage5 of the backbone network are input into the position-aware circular convolution module PARCM, which outputs the global features {g_3, g_4, g_5}.
In this embodiment, the features {f_3, f_4, f_5} output by feature extraction layers Stage3, Stage4 and Stage5 of the backbone network are input into the position-aware circular convolution module PARCM, which outputs {g_3, g_4, g_5}. Fig. 2 shows the implementation of the PARCM according to an embodiment of the present invention. The PARCM consists of a position-aware circular convolution component ParC and a channel attention component. ParC uses the global circular convolution GCC, which comprises a horizontal branch GCC-H and a vertical branch GCC-V; used jointly, GCC-H and GCC-V extract global features from all input positions and obtain a global receptive field. The GCC convolution kernel has the same size as the input, and position information is injected into the output feature map through a position embedding strategy, ensuring the sensitivity of the output features to spatial position and reducing the disturbance that circular convolution causes to the spatial structure. Meanwhile, in the channel attention component, nonlinearity is introduced through a feedforward network (FFN), after which the channel attention mechanism SE block is added to highlight key channels and suppress feature channels of little use to the current task. In addition, the PARCM introduces a residual connection, so that global and local context information can be perceived simultaneously. Here PW-Conv denotes point-wise convolution, whose role is to adjust the channel dimension, and Pre-Norm uses a batch normalization operation.
FIG. 3 is a schematic diagram of the calculation process of the global circular convolution GCC in the position-aware circular convolution module PARCM according to an embodiment of the present invention. As shown in FIG. 3, an input x is processed by GCC-H and GCC-V; for simplicity of presentation, assume that x has only one channel, with corresponding shape H × W. The pixel of the GCC-V output y at position (i, j) can be calculated by the following formula:

y_{i,j} = Σ_{k=0}^{H-1} w_k · x^{PE}_{(i+k) mod H, j}, with x^{PE} = x + PE,

where PE is the instance position embedding. A base position embedding of size B × 1 is first defined; a bilinear interpolation function F_bi produces from it the instance position embedding of size H × 1, and an expansion function E_h^W replicates it W times along the horizontal direction to generate a PE matrix of size H × W, so the position embedding PE adapts flexibly to input features of different sizes. The instance position embedding is superimposed on the input x to obtain the feature x^{PE}. Stacking x^{PE} with its first H-1 rows along the vertical direction yields a feature of size (2H-1) × W, on which a standard convolution with a global receptive field and shared parameters can be performed. The convolution kernel parameters w are constructed in the same way: a base kernel of size B × 1 is interpolated by the bilinear interpolation function F_bi to size H × 1, so the spatial receptive field of the GCC-V kernel is H × 1, which, matched with the 1 × W receptive field of the GCC-H kernel, approximates global coverage. Performing the standard convolution on the stacked feature, the output coordinate i of the H × 1 kernel in fact corresponds to the input coordinate range i, i+1, ..., i+H-1, where the offsets 0 to H-1 index the relative coordinates in the local neighborhood covered by the kernel and the modulo operation corresponds to the circular operation, i.e. to global circular convolution; in code this can be realized by concatenation followed by a standard convolution. In the same way, the GCC-H output at (i, j) can be expressed as:

y_{i,j} = Σ_{k=0}^{W-1} w_k · x^{PE}_{i, (j+k) mod W},

where the expansion function E_w^H replicates the position embedding along the vertical direction. In summary, the receptive fields of the GCC-H and GCC-V outputs cover the same row and the same column of every input position, and using GCC-H and GCC-V jointly extracts global features from all input positions and obtains the global receptive field.
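In PyTorch, one GCC branch reduces to a depthwise convolution over a circularly padded feature. The following is a minimal sketch; the base size, parameter shapes and class name are illustrative assumptions, not the patent's code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCC(nn.Module):
    """One branch of the global circular convolution (GCC-V or GCC-H)."""
    def __init__(self, channels, base_size=16, vertical=True):
        super().__init__()
        self.vertical = vertical
        # Base kernel and base position embedding; both are bilinearly
        # interpolated to the instance resolution at run time.
        self.weight = nn.Parameter(torch.randn(channels, 1, base_size, 1))
        self.pe = nn.Parameter(torch.randn(1, channels, base_size, 1))

    def forward(self, x):
        b, c, h, w = x.shape
        n = h if self.vertical else w
        # Instance position embedding: interpolate to length n, then replicate
        # along the other axis so PE adapts to any input size.
        pe = F.interpolate(self.pe, size=(n, 1), mode="bilinear", align_corners=True)
        if self.vertical:
            x = x + pe.expand(-1, -1, -1, w)
        else:
            x = x + pe.transpose(2, 3).expand(-1, -1, h, -1)
        # Instance kernel with spatial receptive field n x 1 (or 1 x n).
        k = F.interpolate(self.weight, size=(n, 1), mode="bilinear", align_corners=True)
        if not self.vertical:
            k = k.transpose(2, 3)
        # Concatenating the first n-1 rows (or columns) turns an ordinary
        # depthwise convolution into a convolution modulo n, i.e. circular.
        if self.vertical:
            x = torch.cat([x, x[:, :, : n - 1, :]], dim=2)
        else:
            x = torch.cat([x, x[:, :, :, : n - 1]], dim=3)
        return F.conv2d(x, k, groups=c)  # output keeps the b x c x h x w shape
```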
Step four, the edge contour information of the camouflage target is extracted with the edge extraction module EEM to obtain the edge prediction map O_e, and boundary supervision is performed on O_e with the edge label G_e of the camouflage target.
FIG. 4 is a flowchart illustrating the implementation of the edge extraction module EEM according to an embodiment of the present invention. As shown in FIG. 4, the EEM obtains the edge prediction map O_e as follows: convolution layers first compress the channel numbers of the low-level feature f_2 and the high-level semantic feature f_5; f_5 is then upsampled to the same size as f_2, the two are concatenated, fused by a convolution, passed through a further convolution and normalized by a Sigmoid function to obtain a binary edge map, which is upsampled to yield the edge prediction map O_e. The edge loss function here is the Dice loss. Because the low-level feature f_2 contains considerable noise, the high-level semantic feature f_5 is introduced into the edge extraction module EEM as an aid. The main function of the EEM is to provide a valuable edge prior for the subsequent segmentation, so that the model can better segment the edge contour of the camouflage target.
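A minimal sketch of the EEM; the 1x1 channel compression, the 3x3 fusion kernel and the width `mid` are assumptions where the patent's figures elide the exact sizes:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EEM(nn.Module):
    """Edge extraction module: fuse low-level f_2 with high-level f_5."""
    def __init__(self, c2, c5, mid=64):
        super().__init__()
        self.squeeze2 = nn.Conv2d(c2, mid, 1)   # compress f_2 channels
        self.squeeze5 = nn.Conv2d(c5, mid, 1)   # compress f_5 channels
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * mid, mid, 3, padding=1),
            nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
        )
        self.pred = nn.Conv2d(mid, 1, 1)

    def forward(self, f2, f5):
        f5 = F.interpolate(self.squeeze5(f5), size=f2.shape[2:],
                           mode="bilinear", align_corners=False)
        edge_feat = self.fuse(torch.cat([self.squeeze2(f2), f5], dim=1))
        edge_map = torch.sigmoid(self.pred(edge_feat))   # binary edge map
        o_e = F.interpolate(edge_map, scale_factor=4,    # four-times upsampling
                            mode="bilinear", align_corners=False)
        return edge_feat, o_e
```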
Step five, the global features obtained in step three are effectively fused with the edge contour information obtained in step four to obtain the edge-enhanced features {e_3, e_4, e_5}.
FIG. 5 is a flowchart illustrating the implementation of the edge reinforcement module ERM according to an embodiment of the present invention. As shown in FIG. 5, the ERM obtains the edge-enhanced feature e_i as follows: the edge contour feature is first downsampled to the same size as the output feature g_i of the position-aware circular convolution module PARCM and multiplied with it element-wise; the result is added to g_i element-wise, passed through a convolution layer followed by batch normalization and a ReLU activation function, and then through the frequency channel attention component FcaNet; a residual connection adds the input and output of the frequency channel attention component element-wise to obtain the boundary-reinforced feature e_i.
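A sketch of the ERM; `FcaLayer` refers to the frequency channel attention sketched further below, and the kernel size and channel projection are assumptions:

```python
import torch.nn as nn
import torch.nn.functional as F

class ERM(nn.Module):
    """Edge reinforcement module: inject the edge prior into a global feature."""
    def __init__(self, channels, edge_channels):
        super().__init__()
        self.proj = nn.Conv2d(edge_channels, channels, 1)  # match channel widths (assumed)
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
        )
        self.fca = FcaLayer(channels)   # frequency channel attention (see sketch below)

    def forward(self, g_i, edge_feat):
        # Downsample the edge feature to g_i's size, multiply element-wise,
        # then add g_i back element-wise.
        e = F.interpolate(self.proj(edge_feat), size=g_i.shape[2:],
                          mode="bilinear", align_corners=False)
        x = self.conv(g_i * e + g_i)
        return x + self.fca(x)          # residual around the frequency channel attention
```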
The edge reinforcement module ERM introduces the frequency channel attention component FcaNet. FIG. 6 is a flowchart of the implementation of FcaNet; as shown in FIG. 6, FcaNet extracts and combines the information of different frequency components through the two-dimensional discrete cosine transform DCT and, compared with a single global average pooling operation, can capture multiple kinds of feature information and enhance edge details.
The two-dimensional discrete cosine transform DCT is calculated as follows:

F(u, v) = Σ_{h=0}^{H-1} Σ_{w=0}^{W-1} x(h, w) · cos(π u (h + 1/2) / H) · cos(π v (w + 1/2) / W),

where H and W are the height and width of the image and x(h, w) is the value of the pixel at position (h, w). The frequency component F(u, v) can be regarded as a weighted sum of all input points of the image, the cosine part acting as the weight, and the formula realizes the transformation from the spatial domain to the frequency domain. The lowest frequency component is

F(0, 0) = Σ_{h=0}^{H-1} Σ_{w=0}^{W-1} x(h, w).

Since global average pooling can be expressed as GAP(x) = (1 / (HW)) Σ_h Σ_w x(h, w), the lowest frequency component F(0, 0) is proportional to GAP. The image signal is therefore composed of the global average pooling term and the other frequency components: by the inverse transform, the pixel value x(h, w) can be represented by all frequency components at (h, w). Channel attention components such as SENet consider only GAP, i.e. only the lowest-frequency information, and thus lose a large amount of usable information; FcaNet extracts the information of different frequency components through the two-dimensional DCT and combines it effectively, obtaining multiple kinds of feature information and strengthening the feature representation and edge details.
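A minimal sketch of this multi-spectral channel attention; the set of frequency components, the 7x7 DCT grid and the reduction ratio are assumptions (FcaNet itself selects frequencies empirically), and the channel count is assumed divisible by the number of frequencies:

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class FcaLayer(nn.Module):
    """Frequency channel attention in the style of FcaNet."""
    def __init__(self, channels, h=7, w=7, reduction=16,
                 freqs=((0, 0), (0, 1), (1, 0), (1, 1))):
        super().__init__()
        self.h, self.w, self.groups = h, w, len(freqs)
        # Precompute one 2D-DCT basis per channel group.
        basis = torch.zeros(self.groups, h, w)
        for i, (u, v) in enumerate(freqs):
            for y in range(h):
                for xx in range(w):
                    basis[i, y, xx] = (math.cos(math.pi * u * (y + 0.5) / h)
                                       * math.cos(math.pi * v * (xx + 0.5) / w))
        self.register_buffer("dct", basis)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        # Resize to the DCT grid, then take per-group frequency responses.
        xs = F.adaptive_avg_pool2d(x, (self.h, self.w))
        xs = xs.view(b, self.groups, c // self.groups, self.h, self.w)
        feats = (xs * self.dct.view(1, self.groups, 1, self.h, self.w)).sum(dim=(3, 4))
        attn = self.fc(feats.view(b, c)).view(b, c, 1, 1)
        return x * attn
```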
Step six, multi-scale feature fusion is performed on the edge-enhanced features {e_3, e_4, e_5} obtained in step five to obtain the multi-scale aggregated feature f_agg.
The multi-scale fusion module MSFM introduces the multi-scale attention mechanism MSAM. FIG. 7 is a flowchart of the implementation of the MSAM; as shown in FIG. 7, the MSAM consists of two branches: one branch uses a global average pooling layer and two point-wise convolutions to obtain global context information and highlight large objects in the global distribution, while the other branch keeps the original feature size and applies only two point-wise convolutions to obtain local context information, preventing small objects from being ignored. Finally the two branches are merged, and the multi-scale channel attention coefficient is obtained through a Sigmoid function. Features of different levels contribute differently to the task, and fusing multi-level features lets them complement each other to obtain a comprehensive feature expression. FIG. 8 is a flowchart of the implementation of the multi-scale fusion module MSFM according to an embodiment of the present invention, whose formula is as follows:
f_ms = M(f_l + Up(f_h)) ⊗ f_l + (1 - M(f_l + Up(f_h))) ⊗ Up(f_h),

wherein f_ms is the result of fusing the features f_l and f_h through the multi-scale fusion module MSFM; M(·) denotes the multi-scale attention mechanism MSAM component; f_l denotes the low-level feature, f_h the high-level feature, and Up(f_h) the high-level feature upsampled by a factor of two; + denotes the element-wise addition operation and ⊗ the element-wise multiplication operation.
The low-level and high-level features are fused, passed through a convolution layer and then output as the fused multi-scale feature through batch normalization and a ReLU activation function. In the specific implementation, this embodiment takes into account that low-level features of larger spatial resolution require more computing resources than high-level semantic features while contributing less to model performance. Based on this observation, only the high-level features {e_3, e_4, e_5} are fused: e_4 and e_5 are first fused by the multi-scale fusion module MSFM to obtain f_45, and f_45 and e_3 are then fused by the MSFM to obtain f_agg. Features of different scales contribute differently to the task, and multi-scale feature fusion through the MSFM lets them complement each other to obtain a comprehensive feature representation; a sketch of the MSAM and the MSFM follows.
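A sketch of the two modules under the assumptions noted in the comments (reduction ratio, equal channel widths of the two inputs):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MSAM(nn.Module):
    """Multi-scale attention: a global branch (pooled to 1x1) highlights large
    objects, a local branch keeps the spatial size so small objects survive."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        mid = channels // reduction
        def pw_pair():
            return nn.Sequential(
                nn.Conv2d(channels, mid, 1), nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
                nn.Conv2d(mid, channels, 1), nn.BatchNorm2d(channels),
            )
        self.global_branch, self.local_branch = pw_pair(), pw_pair()

    def forward(self, x):
        g = self.global_branch(F.adaptive_avg_pool2d(x, 1))  # global context
        l = self.local_branch(x)                             # local context
        return torch.sigmoid(g + l)                          # attention coefficients

class MSFM(nn.Module):
    """Multi-scale fusion: f_ms = M(f_l + Up(f_h)) * f_l + (1 - M(...)) * Up(f_h)."""
    def __init__(self, channels):
        super().__init__()
        self.msam = MSAM(channels)
        self.out = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
        )

    def forward(self, f_l, f_h):
        f_h = F.interpolate(f_h, size=f_l.shape[2:],         # two-times upsampling
                            mode="bilinear", align_corners=False)
        m = self.msam(f_l + f_h)
        return self.out(m * f_l + (1 - m) * f_h)
```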
Step seven, the multi-scale aggregated feature obtained in step six is convolved to obtain the camouflage target prediction map, which is deeply supervised through the binary label map of the camouflage target.
The multi-scale aggregated feature f_agg obtained above is passed through a convolution to obtain a single-channel grayscale map, which is upsampled eight times to obtain the final camouflage target prediction map O_c; deep supervision is performed with the binary label map G_c of the camouflage target, and the structured loss is calculated, with the weighted binary cross-entropy loss L_wBCE and the weighted IoU loss L_wIoU selected as its loss functions. The total loss function of the final model is:
L_total = λ_1 · L_struct(O_c, G_c) + λ_2 · L_edge(O_e, G_e), with L_struct = L_wBCE + L_wIoU,

wherein λ_1 and λ_2 are the weight factors of the two losses, taken as 1 and 3 respectively in the experimental simulation; L_struct denotes the structured loss and L_edge the edge loss; O_c is the predicted camouflage target saliency map and O_e the predicted camouflage target edge map; G_c denotes the saliency label of the camouflage target and G_e its edge label.
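A common implementation of this supervision, given as a sketch rather than the patent's exact code: the weighted BCE/IoU pair follows the widely used F3Net-style structure loss (the 5x boundary weighting and the 31x31 window are assumptions), and the Dice term matches the edge loss named above:

```python
import torch
import torch.nn.functional as F

def structure_loss(pred, mask):
    """Weighted BCE + weighted IoU; `pred` is expected to hold logits."""
    weit = 1 + 5 * torch.abs(F.avg_pool2d(mask, 31, stride=1, padding=15) - mask)
    wbce = F.binary_cross_entropy_with_logits(pred, mask, reduction="none")
    wbce = (weit * wbce).sum(dim=(2, 3)) / weit.sum(dim=(2, 3))
    pred = torch.sigmoid(pred)
    inter = ((pred * mask) * weit).sum(dim=(2, 3))
    union = ((pred + mask) * weit).sum(dim=(2, 3))
    wiou = 1 - (inter + 1) / (union - inter + 1)
    return (wbce + wiou).mean()

def dice_loss(pred, target, eps=1.0):
    """Dice-coefficient edge loss; expects a probability map such as O_e."""
    inter = (pred * target).sum(dim=(2, 3))
    return (1 - (2 * inter + eps) /
            (pred.sum(dim=(2, 3)) + target.sum(dim=(2, 3)) + eps)).mean()

# Total loss with the weights reported above (lambda_1 = 1, lambda_2 = 3):
# loss = 1 * structure_loss(o_c, g_c) + 3 * dice_loss(o_e, g_e)
```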
Finally, the test-set images are taken as input to the trained camouflage target detection network model to obtain the final camouflage target detection results.
To verify the effectiveness of the invention, five mainstream camouflage target segmentation models (UGTR, MGL-R, PFNet, EGNet and SINet) were tested on images from the camouflage target data sets CAMO, CHAMELEON, COD10K and NC4K, and their segmentation results were compared with the camouflage target predictions of the invention.
The experimental simulation content of the camouflage target detection method based on the attention mechanism and the convolutional neural network is as follows:
The experimental platform adopts a 64-bit Ubuntu system, version 20.04.4; the GPU model is GeForce RTX 2080 Ti; Python (version 3.8) is used as the programming language and PyCharm as the software development platform. The model is implemented with the deep learning framework PyTorch 1.4, and the input is a 3-channel RGB image of fixed size. In the training stage, the batch size is 24, the number of iteration epochs is 40, the parameters are optimized with the AdaX optimizer, the initial learning rate is set to 3e-4 and is continuously decayed with a poly strategy with power 0.9, and the whole network trains for about 1 hour with the acceleration of three GeForce RTX 2080 Ti GPUs. FIG. 9 compares the image segmentation results of the five mainstream camouflage target segmentation models and the method of the invention (Our) on different camouflage target data sets; comparison against the camouflage target saliency labels shows that the invention effectively detects the camouflage target, refines its edge contour and improves the accuracy of camouflage target detection.
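The poly decay mentioned here sets lr = 3e-4 · (1 - epoch/40)^0.9 at each epoch. A minimal sketch follows; Adam stands in for AdaX, whose implementation the patent does not name, and `model`, `loader` and `train_one_epoch` are assumed to exist:

```python
import torch
from torch.optim.lr_scheduler import LambdaLR

max_epoch, power, base_lr = 40, 0.9, 3e-4
optimizer = torch.optim.Adam(model.parameters(), lr=base_lr)  # stand-in for AdaX
scheduler = LambdaLR(optimizer, lambda e: (1 - e / max_epoch) ** power)
for epoch in range(max_epoch):
    train_one_epoch(model, loader, optimizer)  # assumed training routine
    scheduler.step()                           # poly learning-rate decay
```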
The foregoing is merely a preferred embodiment of the present invention and is not intended to limit the present invention in any way. Any equivalent substitution or modification made by a person skilled in the art to the technical solution and technical content disclosed in the invention, without departing from the scope of the technical solution of the invention, still falls within the scope of protection of the invention.
Claims (10)
1. The camouflage target detection method based on the attention mechanism and the convolutional neural network is characterized by comprising the following steps of:
S1, dividing an image data set of camouflage targets into a training set and a test set;
S2, inputting the training-set image into the backbone network of a pre-constructed camouflage target detection network model to extract multi-scale features containing the camouflage target image;
S3, respectively inputting the features {f_3, f_4, f_5} output by feature extraction layers Stage3, Stage4 and Stage5 of the backbone network into the position-aware circular convolution module PARCM to output global features;
S4, extracting the edge contour information of the camouflage target with the edge extraction module EEM to obtain an edge prediction map O_e, and performing boundary supervision on O_e with the edge label G_e of the camouflage target;
S5, fusing the global features obtained in step S3 with the edge contour information obtained in step S4, and then performing multi-scale feature fusion to obtain a multi-scale aggregated feature f_agg;
S6, processing the multi-scale aggregated feature f_agg obtained in step S5 to obtain a camouflage target prediction map, and performing deep supervision on it through the binary label map of the camouflage target;
S7, taking the test-set image as input to the trained camouflage target detection network model to obtain the final camouflage target detection result.
2. The method for camouflage target detection based on an attention mechanism and a convolutional neural network of claim 1, wherein the backbone network uses an EfficientNet-B4 model in the EfficientNet series to extract multi-scale features containing camouflage target images.
3. The method for detecting a camouflage target based on an attention mechanism and a convolutional neural network according to claim 2, wherein in step S4 the specific content of extracting the edge contour information of the camouflage target with the edge extraction module EEM to obtain the edge prediction map O_e and performing boundary supervision on O_e with the edge label G_e of the camouflage target comprises: fusing, with the edge extraction module EEM, the low-level feature f_2 output by feature extraction layer Stage2 of the backbone network and the high-level semantic feature f_5 output by feature extraction layer Stage5 to extract the edge contour information of the camouflage target; normalizing the output of the EEM with a Sigmoid function to obtain a binary map and upsampling it four times to obtain the edge prediction map O_e; and performing boundary supervision on O_e with the edge label G_e of the camouflage target, using the Dice loss as the edge loss function L_edge.
4. The method for detecting a camouflage target based on an attention mechanism and a convolutional neural network according to claim 1, wherein the position-aware circular convolution module PARCM in step S3 comprises a position-aware circular convolution component ParC and a channel attention component, the position-aware circular convolution component ParC extracts global features using the global circular convolution GCC, and a residual connection is introduced in the position-aware circular convolution module PARCM.
5. The method for detecting a camouflage target based on an attention mechanism and a convolutional neural network according to claim 4, wherein a position embedding strategy is introduced in the position-aware circular convolution module PARCM.
6. The method for detecting a camouflage target based on an attention mechanism and a convolutional neural network according to claim 4, wherein in the channel attention component nonlinear characteristics are introduced through a feedforward network FFN, and a channel attention mechanism SE block is added after the FFN to highlight key channels.
7. The method for detecting a camouflage target based on an attention mechanism and a convolutional neural network according to claim 1, wherein the specific content of fusing the global features obtained in step S3 with the edge contour information obtained in step S4 and then performing multi-scale feature fusion to obtain the multi-scale aggregated feature f_agg comprises: fusing, with the edge reinforcement module ERM, the edge contour information output by the edge extraction module EEM and the global features {g_3, g_4, g_5} to obtain edge-enhanced features {e_3, e_4, e_5}; then introducing the multi-scale attention mechanism MSAM into the multi-scale fusion module MSFM for multi-scale feature fusion, wherein the features e_4 and e_5 are fused by the multi-scale fusion module MSFM to obtain the feature f_45, and the feature f_45 and the feature e_3 are then fused by the multi-scale fusion module MSFM to obtain the multi-scale aggregated feature f_agg.
8. The method for detecting a camouflage target based on an attention mechanism and a convolutional neural network according to claim 1, wherein the frequency channel attention component FcaNet is introduced into the edge reinforcement module ERM, and the information of different frequency components is extracted and combined through the two-dimensional discrete cosine transform DCT.
9. The method for detecting a camouflage target based on an attention mechanism and a convolutional neural network according to claim 1, wherein the structured loss function selected in step S6 for deep supervision of the camouflage target through its binary label map is a weighted binary cross-entropy loss L_wBCE and a weighted IoU loss L_wIoU.
10. The method for detecting a camouflage target based on an attention mechanism and a convolutional neural network according to claim 1, wherein the total loss function L_total of the camouflage target detection model is given by:

L_total = λ_1 · L_struct(O_c, G_c) + λ_2 · L_edge(O_e, G_e), with L_struct = L_wBCE + L_wIoU,

wherein L_struct denotes the structured loss and L_edge the edge loss; λ_1 and λ_2 are the weight factors of the structured loss and the edge loss, respectively; O_c is the predicted camouflage target saliency map and O_e the predicted camouflage target edge contour map; G_c denotes the saliency label of the camouflage target and G_e its edge label; L_wBCE is the weighted binary cross-entropy loss in the structured loss, L_wIoU is the weighted IoU loss in the structured loss, and L_edge is the edge loss using the Dice coefficient.
Priority Applications (1)
- CN202310157199.8A (priority date 2023-02-23, filing date 2023-02-23): Camouflage target detection method based on attention mechanism and convolutional neural network
Publications (1)
- CN116228702A, published 2023-06-06
Family Applications (1), family ID=86578159
- CN202310157199.8A, filed 2023-02-23, country CN, status Pending
Cited By (11)
- CN116894943A (2023-10-17): Double-constraint camouflage target detection method and system
- CN116664990A (2023-08-29) / CN116664990B (2023-11-14): Camouflage target detection method, model training method, device, equipment and medium
- CN117173523A (2023-12-05) / CN117173523B (2024-04-09): Camouflage target detection method and system based on frequency perception
- CN116703950A (2023-09-05) / CN116703950B (2023-10-20): Camouflage target image segmentation method and system based on multi-level feature fusion
- CN117422939A (2024-01-19) / CN117422939B (2024-03-08): Breast tumor classification method and system based on ultrasonic feature extraction
- CN118072001A (2024-05-24) / CN118072001B (2024-06-21): Camouflage target detection method based on scale feature perception and extensive perception convolution
Similar Documents
- Xie et al.: Multilevel cloud detection in remote sensing images based on deep learning
- Ren et al.: Deep texture-aware features for camouflaged object detection
- CN116228702A: Camouflage target detection method based on attention mechanism and convolutional neural network
- CN110378381B: Object detection method, device and computer storage medium
- El Amin et al.: Convolutional neural network features based change detection in satellite images
- Rezaeilouyeh et al.: Microscopic medical image classification framework via deep learning and shearlet transform
- CN112750140B: Information mining-based disguised target image segmentation method
- Zhao et al.: Multi-scale image block-level F-CNN for remote sensing images object detection
- Lee et al.: Accurate traffic light detection using deep neural network with focal regression loss
- Wen et al.: Gcsba-net: Gabor-based and cascade squeeze bi-attention network for gland segmentation
- Woźniak et al.: Graphic object feature extraction system based on cuckoo search algorithm
- Fang et al.: SAR-optical image matching by integrating Siamese U-Net with FFT correlation
- Wang et al.: Semantic segmentation of remote sensing ship image via a convolutional neural networks model
- CN113191489B: Training method of binary neural network model, image processing method and device
- CN113468996A: Camouflage object detection method based on edge refinement
- Shen et al.: ICAFusion: Iterative cross-attention guided feature fusion for multispectral object detection
- Krasilenko et al.: Modeling of biologically motivated self-learning equivalent-convolutional recurrent-multilayer neural structures (BLM_SL_EC_RMNS) for image fragments clustering and recognition
- Sadanandan et al.: Feature augmented deep neural networks for segmentation of cells
- Kang et al.: ASF-YOLO: A novel YOLO model with attentional scale sequence fusion for cell instance segmentation
- Santana et al.: Oceanic mesoscale eddy detection and convolutional neural network complexity
- Yildirim et al.: Ship detection in optical remote sensing images using YOLOv4 and Tiny YOLOv4
- Chen et al.: Attention-based hierarchical fusion of visible and infrared images
- Gao et al.: Multi-scale learning based segmentation of glands in digital colonrectal pathology images
- Zhao et al.: Multitask learning for SAR ship detection with Gaussian-mask joint segmentation
- Ataş: Performance Evaluation of Jaccard-Dice Coefficient on Building Segmentation from High Resolution Satellite Images
Legal Events
- PB01: Publication
- SE01: Entry into force of request for substantive examination