CN112907530A - Method and system for detecting disguised object based on grouped reverse attention - Google Patents

Method and system for detecting disguised object based on grouped reverse attention Download PDF

Info

Publication number
CN112907530A
Authority
CN
China
Prior art keywords
convolution
image
module
reverse
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110180500.8A
Other languages
Chinese (zh)
Other versions
CN112907530B (en)
Inventor
Cheng Ming-Ming
Fan Deng-Ping
Ji Ge-Peng
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen MicroBT Electronics Technology Co Ltd
Original Assignee
Nankai University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nankai University filed Critical Nankai University
Priority to CN202110180500.8A priority Critical patent/CN112907530B/en
Publication of CN112907530A publication Critical patent/CN112907530A/en
Application granted granted Critical
Publication of CN112907530B publication Critical patent/CN112907530B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/73Deblurring; Sharpening
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/181Segmentation; Edge detection involving edge growing; involving edge linking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Quality & Reliability (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a method and a system for detecting a camouflaged object based on grouped reverse attention, comprising the following steps: acquiring an image to be detected; extracting features from the image to be detected; searching for a camouflaged object in the image based on the feature extraction result to obtain a localization map of the camouflaged object; and processing the feature extraction result and the localization map of the camouflaged object in a grouped reverse attention manner to obtain a contour map of the camouflaged object.

Description

Method and system for detecting disguised object based on grouped reverse attention
Technical Field
The invention relates to the technical field of image detection, and in particular to a method and a system for detecting a camouflaged object based on grouped reverse attention.
Background
The statements in this section merely provide background information related to the present disclosure and may not constitute prior art.
As requirements for picture processing continue to rise, camouflaged object detection algorithms are finding increasingly wide application. Research in sensory ecology shows that a camouflaged object has high similarity with its background, and the camouflage strategy greatly deceives the visual perception system of an observer. Such detection tasks are therefore far more challenging than traditional object detection. The camouflaged object detection task requires that the algorithm model understand the high-level camouflage semantics in an image and detect the corresponding camouflaged objects in it. This demands a model that can both understand the semantic content of the image and distinguish camouflage patterns.
Disclosure of Invention
To address the deficiencies of the prior art, the invention provides a method and a system for detecting a disguised object based on grouped reverse attention. The method mines the camouflaged target in a picture: the user provides a picture, and the algorithm detects the exact contour of the camouflaged object it contains.
In a first aspect, the present invention provides a method for detecting a disguised object based on grouped reverse attention;
the method for detecting the disguised object based on the grouped reverse attention comprises the following steps:
acquiring an image to be detected; extracting the characteristics of an image to be detected;
searching a camouflage object in the image to be detected based on the feature extraction result to obtain a positioning image of the camouflage object;
and processing the feature extraction result and the localization map of the camouflaged object in a grouped reverse attention manner to obtain a contour map of the camouflaged object.
In a second aspect, the present invention provides a camouflage object detection system based on grouped reverse attention;
a group reverse attention based camouflage object detection system comprising:
an acquisition module configured to: acquiring an image to be detected; extracting the characteristics of an image to be detected;
a search module configured to: searching a camouflage object in the image to be detected based on the feature extraction result to obtain a positioning image of the camouflage object;
an output module configured to: process the feature extraction result and the localization map of the camouflaged object in a grouped reverse attention manner to obtain a contour map of the camouflaged object.
In a third aspect, the present invention further provides an electronic device, including: one or more processors, one or more memories, and one or more computer programs; wherein a processor is connected to the memory, the one or more computer programs are stored in the memory, and when the electronic device is running, the processor executes the one or more computer programs stored in the memory, so as to make the electronic device execute the method according to the first aspect.
In a fourth aspect, the present invention also provides a computer-readable storage medium for storing computer instructions which, when executed by a processor, perform the method of the first aspect.
Compared with the prior art, the invention has the beneficial effects that:
the method is used for mining the camouflage target in the picture. The user provides a picture and the algorithm detects the exact contour that contains the disguised object. In particular, the texture enhancement module is adopted because the visual fields of different scales are beneficial to capturing detailed texture information of richer scales; the adjacent part decoder is adopted because the high-level rich semantics and the low-level weak semantics are fused, which is beneficial to improving the feature expression capability of the model; the packet reverse attention module is employed because fusing reverse attention in a packet manner helps to explicitly optimize coarse features from the encoder.
Advantages of additional aspects of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the invention; they illustrate exemplary embodiments of the invention and together with the description serve to explain, and not to limit, the invention.
FIG. 1 is a flow chart of a method of the first embodiment;
FIG. 2 is the neighbor connection partial decoder of the first embodiment;
FIG. 3(a) is the grouped reverse attention (GRA) module of the first embodiment;
FIG. 3(b) is the reverse attention module of the first embodiment;
FIG. 4(a) is an input image of the first embodiment;
FIG. 4(b) is a truth label for the first embodiment;
FIG. 4(c) is a diagram showing the effect of the SINet of the present invention according to the first embodiment;
FIG. 4(d) is a diagram showing the effect of SINet_evpr of the first embodiment;
fig. 4(e) is an effect diagram of PraNet according to the first embodiment;
fig. 4(f) is a diagram showing the effect of the PFANet according to the first embodiment.
Detailed Description
It is to be understood that the following detailed description is exemplary and is intended to provide further explanation of the invention as claimed. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit exemplary embodiments according to the invention. As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the terms "comprises" and "comprising", and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The embodiments and features of the embodiments of the present invention may be combined with each other without conflict.
Example one
The present embodiment provides a disguised object detection method based on grouped reverse attention;
as shown in fig. 1, the disguised object detection method based on grouped reverse attention includes:
s101: acquiring an image to be detected; extracting the characteristics of an image to be detected;
s102: searching a camouflage object in the image to be detected based on the feature extraction result to obtain a positioning image of the camouflage object;
s103: processing the feature extraction result and the localization map of the camouflaged object in a grouped reverse attention manner to obtain a contour map of the camouflaged object.
As shown in fig. 1, the technical solution decomposes the camouflaged target detection task into two stages: searching and identifying.
In the searching stage, a feature encoder based on the Res2Net skeleton network first extracts features from the input picture; a texture enhancement module then enhances the details of each of the top three feature maps; finally, a neighbor connection partial decoder module produces a preliminary camouflaged object localization map.
For the identification stage, a coarse-to-fine step-by-step optimization strategy is introduced: multiple cascaded Group Reverse Attention (GRA) blocks at each feature level further refine the features from the encoder and improve the preliminary localization map. In the testing stage, a Sigmoid activation function produces the final prediction map.
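The two-stage flow above can be sketched end to end. The sketch below is a minimal NumPy illustration in which `encode`, `enhance`, `decode`, and the per-level refiners are hypothetical stand-in callables, not the actual networks of the invention:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def detect(image, encode, enhance, decode, refine_levels):
    """Search stage: encode, enhance, and decode a coarse localization map.
    Identify stage: cascaded per-level residual refinements, then a sigmoid."""
    feats = encode(image)                 # backbone features (top three levels)
    feats = [enhance(f) for f in feats]   # texture enhancement per level
    pred = decode(feats)                  # preliminary localization map
    for f, refine in zip(feats, refine_levels):
        pred = pred + refine(f, pred)     # coarse-to-fine residual update
    return sigmoid(pred)                  # final prediction map
```

With identity-like stand-ins, the sketch simply shows where each stage plugs into the data flow; the real encoder, decoder, and refiners are the modules described below.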
As one or more embodiments, in S101, feature extraction is performed on an image to be detected; the method specifically comprises the following steps:
based on Res2Net-50 skeleton network, extracting the characteristics of the image to be detected;
wherein the Res2Net-50 skeleton network includes: a first convolution block, a second convolution block, a third convolution block, a fourth convolution block and a fifth convolution block which are connected in sequence;
the first convolution block is used for performing convolution processing on the image to be detected and outputting a first feature map;
the second convolution block is used for performing convolution processing on the first feature map and outputting a second feature map;
the third convolution block is used for performing convolution processing on the second feature map and outputting a third feature map;
the fourth convolution block is used for performing convolution processing on the third feature map and outputting a fourth feature map;
and the fifth convolution block is used for performing convolution processing on the fourth feature map and outputting a fifth feature map.
wherein the first convolution block includes: a first convolutional layer, a first ReLU layer and a first batch normalization layer;
the second convolution block includes: a second convolutional layer, a second ReLU layer and a second batch normalization layer;
the third convolution block includes: a third convolutional layer, a third ReLU layer and a third batch normalization layer;
the fourth convolution block includes: a fourth convolutional layer, a fourth ReLU layer and a fourth batch normalization layer;
the fifth convolution block includes: a fifth convolutional layer, a fifth ReLU layer and a fifth batch normalization layer.
In the technical scheme, a Res2Net-50 skeleton network with the fully connected layer removed is used as the feature encoder; the model structure is shown in Table 1. Each convolutional layer is followed by a ReLU layer and a batch normalization layer. Each convolution block structure is shown in Table 1. The feature encoder passes only the top three levels (i.e., convolution blocks 3, 4 and 5) through side connections to the subsequent decoder and stage-by-stage optimization.
TABLE 1 Res2Net-50 construction Table (assuming input picture size of 224x224)
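As a rough companion to Table 1, the following sketch assumes each of the five convolution blocks halves the spatial resolution (a standard ResNet-family layout, used here as an illustrative assumption rather than the exact Res2Net-50 configuration):

```python
def feature_sizes(input_size=224, num_blocks=5):
    """Output resolution of each convolution block, assuming every block
    halves the spatial size (an illustrative ResNet-family assumption)."""
    sizes = []
    size = input_size
    for _ in range(num_blocks):
        size //= 2
        sizes.append(size)
    return sizes

# Blocks 3, 4 and 5 feed the side connections used by the decoder.
side_resolutions = feature_sizes()[2:]
```

Under this assumption, a 224 × 224 input yields side-connection resolutions of 28, 14 and 7 for blocks 3-5.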
As one or more embodiments, the S102: searching a camouflage object in the image to be detected based on the feature extraction result to obtain a positioning image of the camouflage object; the method specifically comprises the following steps:
the third feature map is processed by a first texture enhancement module, which outputs a first texture enhancement map;
the fourth feature map is processed by a second texture enhancement module, which outputs a second texture enhancement map;
the fifth feature map is processed by a third texture enhancement module, which outputs a third texture enhancement map;
the first, second and third texture enhancement maps are then input together to the neighbor connection partial decoder to obtain the localization map of the camouflaged object.
Further, the first, second and third texture enhancement modules have identical internal structures.
Further, the first texture enhancement module comprises: a residual branch and four side branches;
wherein, the four side branches are in parallel relation;
wherein the four side branches include: a first side branch, a second side branch, a third side branch, and a fourth side branch;
the residual branch comprising: a convolution layer with a 1 × 1 kernel, connected to an adder;
the first side branch comprising: a convolution layer with a 1 × 1 kernel;
the second side branch comprising four convolution layers connected in series: a convolution layer with a 1 × 1 kernel, a convolution layer with a 1 × 3 kernel, a convolution layer with a 3 × 1 kernel, and a convolution layer with a 3 × 3 kernel and a dilation rate of 3;
the third side branch comprising four convolution layers connected in series: a convolution layer with a 1 × 1 kernel, a convolution layer with a 1 × 5 kernel, a convolution layer with a 5 × 1 kernel, and a convolution layer with a 3 × 3 kernel and a dilation rate of 5;
the fourth side branch comprising four convolution layers connected in series: a convolution layer with a 1 × 1 kernel, a convolution layer with a 1 × 7 kernel, a convolution layer with a 7 × 1 kernel, and a convolution layer with a 3 × 3 kernel and a dilation rate of 7;
the input ends of the first, second, third and fourth side branches are all connected with the input end of the residual error branch, and the input end of the residual error branch is used as the input end of the first texture enhancement module;
the output ends of the first, second, third and fourth side branches are all connected with the input end of the splicer;
the output end of the splicer is connected with the input end of the adder;
the output end of the adder is used as the output end of the first texture enhancement module.
Further, the first texture enhancement module operates on the following principle: as shown in fig. 1, the Texture Enhancement Module (TEM) is divided into four side branches and one residual branch. The four side branches adopt asymmetric convolution kernels of different sizes to reduce the amount of computation, and convolutions with different dilation rates to obtain visual receptive fields of different scales. Finally, the feature maps from the different side branches are spliced along the channel dimension, and the feature map from the residual branch is added, yielding an enhanced texture detail map f'k, k ∈ {3, 4, 5}.
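The benefit of the asymmetric and dilated kernels can be made concrete with receptive-field arithmetic. The sketch below uses the standard rule that a stride-1 convolution grows the receptive field by (kernel − 1) × dilation; the per-branch numbers follow from the branch definitions above (the 1 × 1 layers contribute nothing, and each asymmetric pair contributes one kernel per axis):

```python
def receptive_field(layers):
    """Effective receptive field along one axis for a stack of stride-1
    convolutions; each (kernel, dilation) layer adds (kernel - 1) * dilation."""
    rf = 1
    for kernel, dilation in layers:
        rf += (kernel - 1) * dilation
    return rf

# Per-axis receptive fields of the non-trivial side branches:
branch2 = receptive_field([(3, 1), (3, 3)])  # 1x3 + 3x1, then 3x3 at dilation 3
branch3 = receptive_field([(5, 1), (3, 5)])  # 1x5 + 5x1, then 3x3 at dilation 5
branch4 = receptive_field([(7, 1), (3, 7)])  # 1x7 + 7x1, then 3x3 at dilation 7
```

The branches thus cover per-axis receptive fields of 9, 15 and 21 pixels respectively, which is the multi-scale coverage the module is designed to provide.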
Further, the internal structure of the neighbor connection partial decoder includes:
a first input, a second input, and a third input;
the first input end is used for inputting a first texture enhancement map;
the second input end is used for inputting a second texture enhancement map;
the third input end is used for inputting a third texture enhancement map;
the first texture enhancement image and the second texture enhancement image are processed by a first multiplier to obtain a first multiplication result;
the second texture enhancement image and the third texture enhancement image are processed by a second multiplier to obtain a second multiplication result;
the first multiplication result and the second multiplication result are processed by a third multiplier to obtain a third multiplication result;
and inputting the second multiplication result, the third multiplication result and the third texture enhancement map into a UNet structure decoder, and outputting a decoding result.
Further, the neighbor connection partial decoder operates on the following principle: as shown in FIG. 2, a UNet structure decoder with its lower two layers removed serves as the partial decoder, so that it accepts the three texture enhancement maps f'k as input. A strategy of propagating features between neighboring layers is also adopted, so that high-level semantic features are propagated layer by layer to the lower layers; this ensures semantic similarity between adjacent layers and realizes an efficient decoding process for generating the preliminary camouflaged target localization map C6.
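A minimal NumPy sketch of the neighbor connection idea follows. Nearest-neighbour 2× upsampling between levels and the stand-in `fuse` callable (in place of the UNet-style partial decoder) are both simplifying assumptions, not the exact design:

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of an (H, W) map."""
    return np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)

def neighbour_decoder(te1, te2, te3, fuse):
    """Neighbour connection sketch: higher-level maps gate lower levels by
    element-wise multiplication before the partial decoder `fuse`."""
    m1 = te1 * upsample2x(te2)   # first multiplication result
    m2 = te2 * upsample2x(te3)   # second multiplication result
    m3 = m1 * upsample2x(m2)     # third multiplication result
    return fuse(m2, m3, te3)     # UNet-style partial decoder stand-in
```

The multiplications propagate high-level semantics downward before decoding, mirroring the first/second/third multiplication results described above.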
As one or more embodiments, in S103, processing is performed in a grouped reverse attention manner based on the feature extraction result and the localization map of the camouflaged object to obtain the contour map of the camouflaged object; specifically comprising:
down-sampling the localization map;
processing the down-sampled result through a first reverse attention module; inputting the processed result into a first group of grouped reverse attention (GRA) modules, and adding the output to the localization map to obtain a first recognition map;
processing the first recognition map through a second reverse attention module; inputting the processed result into a second group of GRA modules, and adding the output to the up-sampled first recognition map to obtain a second recognition map;
processing the second recognition map through a third reverse attention module; inputting the processed result into a third group of GRA modules, and adding the output to the up-sampled second recognition map to obtain a third recognition map;
and processing the third recognition map with an activation function to obtain the contour map of the camouflaged object.
Further, the internal structure of the first, second and third reverse attention modules is identical.
Further, as shown in fig. 3(b), the first reverse attention module includes: a sigmoid function and a reversal operation connected in sequence;
the input end of the sigmoid function is used for inputting the down-sampled localization map;
and the reversal operation is used for interchanging the background and the camouflaged-object regions of the sigmoid output.
Further, the first reverse attention module operates on the following principle: by computing the reverse region of the normalized input feature map, the network's attention is turned toward the non-target background region, simultaneously strengthening the network's ability to learn both the foreground and the background regions.
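This principle amounts to normalizing the map with a sigmoid and inverting it; a minimal sketch:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def reverse_attention(localization_map):
    """Normalize the map to (0, 1), then invert it: high weight now falls
    on the background rather than the candidate camouflaged region."""
    return 1.0 - sigmoid(localization_map)
```

Large positive responses (confident foreground) map to weights near 0, large negative responses to weights near 1, steering subsequent layers toward the background.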
Further, the internal structures of the first grouping reverse GRA module group, the second grouping reverse GRA module group and the third grouping reverse GRA module group are the same.
Further, as shown in fig. 3(a), the first group of GRA modules includes: three grouped reverse attention (GRA) modules connected in series; each GRA module comprises:
two input terminals and two output terminals;
one input end is used for inputting the map processed by the first reverse attention module;
the other input end is used for inputting the corresponding third texture enhancement map;
the third texture enhancement map is divided into a plurality of groups along the channel dimension;
the map processed by the first reverse attention module is inserted after each group to obtain an interleaved result;
the interleaved result is spliced along the channel dimension to obtain a spliced feature map;
the spliced feature map is convolved, and the convolution result is added point by point to the third texture enhancement map to obtain a new feature map;
the new feature map is convolved to obtain a single-channel map;
and the single-channel map is added to the map processed by the first reverse attention module to obtain a new map.
The quality of the preliminary localization map C6 is repeatedly improved by cascaded grouped reverse attention (GRA) blocks. Specifically, as shown in fig. 3(a) and 3(b), a GRA module operates on the following principle: it receives a feature map with C channels from the texture-enhanced encoder together with a localization map to be improved. First, the C-channel feature map is divided along the channel dimension into gi groups, and the reverse-attention localization map is inserted after each group; the result is spliced along the channel dimension into a feature map with C + gi channels. A 1 × 1 convolution restores the original C channels, and the output is added point by point to the original input feature map to obtain a new feature map. Finally, the new feature map passes through another 1 × 1 convolution to obtain a single-channel map, which is added to the localization map to be improved; the two branches together form a dual-branch residual learning process.
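The grouped interleaving and dual-branch residual update can be sketched in NumPy. Here `conv1x1` is a stand-in callable for the channel-restoring 1 × 1 convolution, and the final 1 × 1 convolution to one channel is emulated by a channel sum; both are simplifying assumptions:

```python
import numpy as np

def gra_step(features, loc_map, groups, conv1x1):
    """One grouped reverse-attention step on a (C, H, W) feature map and an
    (H, W) localization map; C must be divisible by `groups`."""
    reverse = 1.0 - 1.0 / (1.0 + np.exp(-loc_map))    # reverse-attention map
    chunks = np.split(features, groups, axis=0)       # g channel groups
    interleaved = []
    for chunk in chunks:                              # insert the reverse map
        interleaved.extend([chunk, reverse[None]])    # after each group
    stitched = np.concatenate(interleaved, axis=0)    # (C + g, H, W)
    new_features = conv1x1(stitched) + features       # restore C, residual add
    single = new_features.sum(axis=0)                 # stand-in for 1x1 -> 1 ch
    return new_features, loc_map + single             # dual-branch residual
```

A cascade then feeds each step's outputs into the next, progressively refining the localization map.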
Deep neural network training, parameter initialization: for convolution blocks 1-5, the parameters pre-trained with Res2Net-50 on the ImageNet dataset are used; the newly added network layers are uniformly initialized with a Gaussian distribution of variance 0.01 and mean 0.
Training optimizer: the technical scheme adopts a gradient descent method based on Adam (adaptive moment estimation) to solve the convolution template parameters w and bias parameters b of the neural network model. In each iteration, the prediction error is computed and back-propagated through the convolutional neural network model, gradients are computed, and the model parameters are updated.
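A single Adam update, written out in NumPy to make the moment estimates and bias correction explicit (a textbook sketch, not the training code of the invention):

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update of parameters w given gradient grad.
    m, v are running first/second moment estimates; t is the 1-based step."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)          # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)          # bias-corrected second moment
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```

Applied per iteration to w and b after back-propagation, this realizes the Adam-based gradient descent described above.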
The front end A receives data (pictures input by a user) and uploads it to the background; the background detects all camouflaged objects in the pictures using the technical scheme, and then outputs the detection result to the foreground B.
The present invention may use different network skeletons as feature encoders,
and may apply different numbers of grouped reverse attention blocks in the coarse-to-fine prediction improvement process.
Table 2 shows the quantitative comparison with 12 current leading-edge segmentation models.
Table 2. Quantitative comparison with leading-edge segmentation models.
Qualitative comparisons of the technical scheme against three current leading-edge models are shown in fig. 4(a)-4(f).
Example two
The present embodiment provides a camouflage object detection system based on grouped reverse attention;
a group reverse attention based camouflage object detection system comprising:
an acquisition module configured to: acquiring an image to be detected; extracting the characteristics of an image to be detected;
a search module configured to: searching a camouflage object in the image to be detected based on the feature extraction result to obtain a positioning image of the camouflage object;
an output module configured to: process the feature extraction result and the localization map of the camouflaged object in a grouped reverse attention manner to obtain a contour map of the camouflaged object.
It should be noted here that the above acquisition module, search module and output module correspond to steps S101 to S103 in the first embodiment; the modules realize the same examples and application scenarios as the corresponding steps, but are not limited to the disclosure of the first embodiment. It should also be noted that the modules described above, as part of a system, may be implemented in a computer system such as a set of computer-executable instructions.
In the foregoing embodiments, the descriptions of the embodiments have different emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The proposed system can be implemented in other ways. For example, the above-described system embodiments are merely illustrative, and for example, the division of the above-described modules is merely a logical division, and in actual implementation, there may be other divisions, for example, multiple modules may be combined or integrated into another system, or some features may be omitted, or not executed.
EXAMPLE III
The present embodiment also provides an electronic device, including: one or more processors, one or more memories, and one or more computer programs; wherein, a processor is connected with the memory, the one or more computer programs are stored in the memory, and when the electronic device runs, the processor executes the one or more computer programs stored in the memory, so as to make the electronic device execute the method according to the first embodiment.
It should be understood that in this embodiment, the processor may be a central processing unit CPU, and the processor may also be other general purpose processors, digital signal processors DSP, application specific integrated circuits ASIC, off-the-shelf programmable gate arrays FPGA or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, and so on. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
The memory may include both read-only memory and random access memory, and may provide instructions and data to the processor, and a portion of the memory may also include non-volatile random access memory. For example, the memory may also store device type information.
In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or instructions in the form of software.
The method in the first embodiment may be directly implemented by a hardware processor, or may be implemented by a combination of hardware and software modules in the processor. The software modules may be located in ram, flash, rom, prom, or eprom, registers, among other storage media as is well known in the art. The storage medium is located in a memory, and a processor reads information in the memory and completes the steps of the method in combination with hardware of the processor. To avoid repetition, it is not described in detail here.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
Embodiment Four
The present embodiment also provides a computer-readable storage medium for storing computer instructions which, when executed by a processor, perform the method of the first embodiment.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention; various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall be included in the protection scope of the present invention.

Claims (10)

1. A method for detecting a disguised object based on grouped reverse attention, characterized by comprising the following steps:
acquiring an image to be detected, and performing feature extraction on the image to be detected;
searching for the disguised object in the image to be detected based on the feature extraction result to obtain a localization map of the disguised object;
and processing the feature extraction result and the localization map of the disguised object in a grouped reverse attention manner to obtain a contour map of the disguised object.
2. The method for detecting a disguised object based on grouped reverse attention according to claim 1, wherein the feature extraction is performed on the image to be detected; specifically comprising:
performing feature extraction on the image to be detected based on a Res2Net-50 backbone network;
wherein the Res2Net-50 backbone network comprises a first convolution module, a second convolution module, a third convolution module, a fourth convolution module and a fifth convolution module which are connected in sequence;
the first convolution module is used for performing convolution processing on the image to be detected and outputting a first feature map;
the second convolution module is used for performing convolution processing on the first feature map and outputting a second feature map;
the third convolution module is used for performing convolution processing on the second feature map and outputting a third feature map;
the fourth convolution module is used for performing convolution processing on the third feature map and outputting a fourth feature map;
and the fifth convolution module is used for performing convolution processing on the fourth feature map and outputting a fifth feature map.
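The cascade of five convolution modules in claim 2 can be sketched in NumPy for the feature-map shapes alone. This is a shape-only illustration, not Res2Net-50: each module is reduced here to a 1 x 1 channel-mixing convolution plus stride-2 downsampling, and the 352 x 352 input size and channel widths (64 to 2048) are assumptions, not values stated in the patent.

```python
import numpy as np

def conv_block(x, out_ch, rng):
    """Stand-in for one backbone convolution module: a 1x1 convolution
    (channel mixing via einsum) followed by ReLU and stride-2
    downsampling. Real Res2Net-50 modules are bottleneck stacks with
    hierarchical residual connections; this only mimics the shapes."""
    w = rng.standard_normal((x.shape[0], out_ch)) * 0.01  # (C_in, C_out)
    y = np.einsum('chw,co->ohw', x, w)                    # 1x1 convolution
    return np.maximum(y[:, ::2, ::2], 0)                  # ReLU + downsample

rng = np.random.default_rng(0)
img = rng.standard_normal((3, 352, 352))   # assumed RGB input resolution
feats = []
x = img
for out_ch in (64, 256, 512, 1024, 2048):  # assumed channel widths
    x = conv_block(x, out_ch, rng)
    feats.append(x)                        # first .. fifth feature maps
for i, f in enumerate(feats, 1):
    print(f'feature map {i}:', f.shape)
```

The third, fourth and fifth feature maps (the last three shapes printed) are the ones consumed by the texture enhancement modules of claim 3.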
3. The method for detecting a disguised object based on grouped reverse attention according to claim 1, wherein the disguised object in the image to be detected is searched for based on the feature extraction result to obtain the localization map of the disguised object; specifically comprising:
processing the third feature map by a first texture enhancement module and outputting a first texture enhancement map;
processing the fourth feature map by a second texture enhancement module and outputting a second texture enhancement map;
processing the fifth feature map by a third texture enhancement module and outputting a third texture enhancement map;
and inputting the first, second and third texture enhancement maps simultaneously into a neighbor connection decoder to obtain the localization map of the disguised object.
4. The method for detecting a disguised object based on grouped reverse attention according to claim 3, wherein the first texture enhancement module comprises: a residual branch and four side branches;
wherein the four side branches are in a parallel relationship;
wherein the four side branches comprise: a first side branch, a second side branch, a third side branch and a fourth side branch;
the residual branch comprises: a convolution layer with a 1 x 1 convolution kernel and an adder connected in sequence;
the first side branch comprises: a convolution layer with a 1 x 1 convolution kernel;
the second side branch comprises: four convolution layers connected in series, namely, in sequence, a convolution layer with a 1 x 1 convolution kernel, a convolution layer with a 1 x 3 convolution kernel, a convolution layer with a 3 x 1 convolution kernel, and a convolution layer with a 3 x 3 convolution kernel and a dilation rate of 3;
the third side branch comprises: four convolution layers connected in series, namely, in sequence, a convolution layer with a 1 x 1 convolution kernel, a convolution layer with a 1 x 5 convolution kernel, a convolution layer with a 5 x 1 convolution kernel, and a convolution layer with a 3 x 3 convolution kernel and a dilation rate of 5;
the fourth side branch comprises: four convolution layers connected in series, namely, in sequence, a convolution layer with a 1 x 1 convolution kernel, a convolution layer with a 1 x 7 convolution kernel, a convolution layer with a 7 x 1 convolution kernel, and a convolution layer with a 3 x 3 convolution kernel and a dilation rate of 7;
the input ends of the first, second, third and fourth side branches are all connected with the input end of the residual branch, and the input end of the residual branch serves as the input end of the first texture enhancement module;
the output ends of the first, second, third and fourth side branches are all connected with the input end of a concatenator;
the output end of the concatenator is connected with the input end of the adder;
and the output end of the adder serves as the output end of the first texture enhancement module.
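The point of the four side-branch designs in claim 4 is that stacking asymmetric kernels with growing dilation rates enlarges the receptive field at low cost. A small sketch can make this concrete by computing the effective receptive field of each branch from its layer list (the `(kernel_h, kernel_w, dilation)` encoding and the `receptive_field` helper are illustrative constructs, not from the patent):

```python
# Each branch is a stack of stride-1 convolution layers, encoded as
# (kernel_h, kernel_w, dilation) tuples, following claim 4.
branches = {
    'branch1': [(1, 1, 1)],
    'branch2': [(1, 1, 1), (1, 3, 1), (3, 1, 1), (3, 3, 3)],
    'branch3': [(1, 1, 1), (1, 5, 1), (5, 1, 1), (3, 3, 5)],
    'branch4': [(1, 1, 1), (1, 7, 1), (7, 1, 1), (3, 3, 7)],
}

def receptive_field(layers):
    """Effective receptive field of a stack of stride-1 convolutions:
    each layer adds (kernel - 1) * dilation pixels per axis."""
    rf_h = rf_w = 1
    for kh, kw, d in layers:
        rf_h += (kh - 1) * d
        rf_w += (kw - 1) * d
    return rf_h, rf_w

for name, layers in branches.items():
    print(name, receptive_field(layers))
```

The four parallel branches thus see the input at four scales (1, 9, 15 and 21 pixels square), which is why their concatenation enhances texture context before the residual addition.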
5. The method for detecting a disguised object based on grouped reverse attention according to claim 3, wherein the internal structure of the neighbor connection decoder comprises:
a first input end, a second input end and a third input end;
the first input end is used for inputting the first texture enhancement map;
the second input end is used for inputting the second texture enhancement map;
the third input end is used for inputting the third texture enhancement map;
the first texture enhancement map and the second texture enhancement map are processed by a first multiplier to obtain a first multiplication result;
the second texture enhancement map and the third texture enhancement map are processed by a second multiplier to obtain a second multiplication result;
the first multiplication result and the second multiplication result are processed by a third multiplier to obtain a third multiplication result;
and the second multiplication result, the third multiplication result and the third texture enhancement map are input into a UNet-structure decoder to output a decoding result.
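The three multipliers of claim 5 combine features from neighboring scales element-wise. Because the three texture enhancement maps live at different resolutions, some resolution matching must happen before each multiplication; the nearest-neighbor 2x upsampling used below is an assumption of this sketch (the patent does not specify the interpolation), as are the channel count and spatial sizes:

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling along the spatial axes."""
    return x.repeat(2, axis=-2).repeat(2, axis=-1)

# Texture-enhanced features at three scales (channels, H, W); the deeper
# the source feature, the smaller the spatial size. Sizes are illustrative.
f1 = np.random.rand(32, 44, 44)   # first texture enhancement map
f2 = np.random.rand(32, 22, 22)   # second texture enhancement map
f3 = np.random.rand(32, 11, 11)   # third texture enhancement map

# Neighbour connection: multiply each map with its upsampled deeper
# neighbour, so only regions both scales agree on stay strongly activated.
m1 = f1 * upsample2x(f2)          # first multiplication result
m2 = f2 * upsample2x(f3)          # second multiplication result
m3 = m1 * upsample2x(m2)          # third multiplication result
print(m1.shape, m2.shape, m3.shape)
```

`m2`, `m3` and `f3` would then feed the UNet-structure decoder to produce the localization map.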
6. The method for detecting a disguised object based on grouped reverse attention according to claim 1, wherein the feature extraction result and the localization map of the disguised object are processed in a grouped reverse attention manner to obtain the contour map of the disguised object; specifically comprising:
performing downsampling processing on the localization map;
processing the downsampling result through a first reverse attention module; inputting the result processed by the first reverse attention module into a first grouped reverse attention (GRA) module group, and adding the obtained result to the localization map to obtain a first recognition map;
processing the first recognition map through a second reverse attention module; inputting the result processed by the second reverse attention module into a second grouped reverse attention (GRA) module group, and adding the obtained result to the upsampled first recognition map to obtain a second recognition map;
processing the second recognition map through a third reverse attention module; inputting the result processed by the third reverse attention module into a third grouped reverse attention (GRA) module group, and adding the obtained result to the upsampled second recognition map to obtain a third recognition map;
and processing the third recognition map with an activation function to obtain the contour map of the disguised object.
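Claim 6 does not give the formula for the reverse attention modules; in the reverse-attention literature the patent cites (e.g. the PraNet reference below), the module is conventionally the complement of the sigmoid-activated prediction, so the refinement stages attend to what the current prediction has *not* yet explained. A minimal sketch under that assumption:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def reverse_attention(prediction_logits):
    """Reverse attention (assumed 1 - sigmoid form): suppress regions
    the current prediction is already confident about, so refinement
    focuses on the uncertain boundary regions of the disguised object."""
    return 1.0 - sigmoid(prediction_logits)

loc = np.array([[4.0, 0.0, -4.0]])  # confident object / uncertain / background
att = reverse_attention(loc)
print(np.round(att, 3))             # low weight on confident object pixels
```

Applied three times at successively higher resolutions (claim 6's three stages), this progressively erodes confident regions and sharpens the object contour.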
7. The method according to claim 6, wherein the first grouped reverse attention (GRA) module group comprises three grouped reverse attention (GRA) modules connected in series in sequence; each grouped reverse attention (GRA) module comprises:
two input ends and two output ends;
one input end is used for inputting the image processed by the first reverse attention module;
the other input end is used for inputting the corresponding third texture enhancement map;
the third texture enhancement map is divided into a plurality of groups along the channel dimension;
the image processed by the first reverse attention module is inserted into each group to obtain insertion results;
the insertion results are concatenated along the channel dimension to obtain a concatenated feature map;
convolution processing is performed on the concatenated feature map, and the result of the convolution processing is added point by point to the third texture enhancement map to obtain a new feature map;
the new feature map is subjected to convolution processing to obtain a single-channel map;
and the single-channel map is added to the image processed by the first reverse attention module to obtain a new image.
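The split-insert-concatenate step that gives the GRA module its "grouped" character can be sketched with NumPy. The helper name, the 32-channel feature and the choice of 4 groups are illustrative assumptions; the patent only requires "a plurality of groups":

```python
import numpy as np

def group_insert_concat(features, attention, num_groups):
    """Split `features` (C, H, W) into `num_groups` groups along the
    channel dimension, insert the single-channel `attention` map after
    each group, then concatenate everything back along the channels."""
    c = features.shape[0]
    assert c % num_groups == 0, "channels must divide evenly into groups"
    groups = np.split(features, num_groups, axis=0)
    pieces = []
    for g in groups:
        pieces.append(g)
        pieces.append(attention[None, ...])  # insert the attention map
    return np.concatenate(pieces, axis=0)

feat = np.random.rand(32, 11, 11)  # texture-enhanced feature map
att = np.random.rand(11, 11)       # single-channel reverse-attention map
out = group_insert_concat(feat, att, num_groups=4)
print(out.shape)                   # 32 feature channels + 4 inserted maps
```

Interleaving the attention map into every channel group lets the subsequent convolution mix reverse-attention guidance with each slice of the features, rather than appending it once at the end.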
8. A disguised object detection system based on grouped reverse attention, characterized by comprising:
an acquisition module configured to: acquire an image to be detected, and perform feature extraction on the image to be detected;
a search module configured to: search for the disguised object in the image to be detected based on the feature extraction result to obtain a localization map of the disguised object;
and an output module configured to: process the feature extraction result and the localization map of the disguised object in a grouped reverse attention manner to obtain a contour map of the disguised object.
9. An electronic device, comprising: one or more processors, one or more memories, and one or more computer programs; wherein the processor is connected with the memory, the one or more computer programs are stored in the memory, and when the electronic device runs, the processor executes the one or more computer programs stored in the memory to cause the electronic device to perform the method according to any one of claims 1 to 7.
10. A computer-readable storage medium storing computer instructions which, when executed by a processor, perform the method according to any one of claims 1 to 7.
CN202110180500.8A 2021-02-08 2021-02-08 Method and system for detecting disguised object based on grouped reverse attention Active CN112907530B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110180500.8A CN112907530B (en) 2021-02-08 2021-02-08 Method and system for detecting disguised object based on grouped reverse attention

Publications (2)

Publication Number Publication Date
CN112907530A true CN112907530A (en) 2021-06-04
CN112907530B CN112907530B (en) 2022-05-17

Family

ID=76123248

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110180500.8A Active CN112907530B (en) 2021-02-08 2021-02-08 Method and system for detecting disguised object based on grouped reverse attention

Country Status (1)

Country Link
CN (1) CN112907530B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113536978A (en) * 2021-06-28 2021-10-22 杭州电子科技大学 Method for detecting disguised target based on significance
CN113643268A (en) * 2021-08-23 2021-11-12 四川大学 Industrial product defect quality inspection method and device based on deep learning and storage medium
CN115019140A (en) * 2022-06-02 2022-09-06 杭州电子科技大学 Attention-guided camouflage target detection method
CN115223018A (en) * 2022-06-08 2022-10-21 东北石油大学 Cooperative detection method and device for disguised object, electronic device and storage medium
CN115937526A (en) * 2023-03-10 2023-04-07 鲁东大学 Bivalve gonad area segmentation method based on search recognition network
CN116894943A (en) * 2023-07-20 2023-10-17 深圳大学 Double-constraint camouflage target detection method and system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101064005A (en) * 2003-02-13 2007-10-31 日本电气株式会社 Unauthorized person detection device and unauthorized person detection method
CN109165660A (en) * 2018-06-20 2019-01-08 扬州大学 A kind of obvious object detection method based on convolutional neural networks
CN110414350A (en) * 2019-06-26 2019-11-05 浙江大学 The face false-proof detection method of two-way convolutional neural networks based on attention model
CN111368712A (en) * 2020-03-02 2020-07-03 四川九洲电器集团有限责任公司 Hyperspectral image disguised target detection method based on deep learning
US20200372660A1 (en) * 2019-05-21 2020-11-26 Beihang University Image salient object segmentation method and apparatus based on reciprocal attention between foreground and background
CN112288008A (en) * 2020-10-29 2021-01-29 四川九洲电器集团有限责任公司 Mosaic multispectral image disguised target detection method based on deep learning
CN112308081A (en) * 2020-11-05 2021-02-02 南强智视(厦门)科技有限公司 Attention mechanism-based image target prediction method

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
DENG-PING FAN ET AL.: "Camouflaged Object Detection", 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) *
DENG-PING FAN ET AL.: "PraNet: Parallel Reverse Attention Network for Polyp Segmentation", arXiv:2006.11392v4 [eess.IV] *
SHUHAN CHEN ET AL.: "Progressively Guided Alternate Refinement Network for RGB-D Salient Object Detection", arXiv:2008.07064v1 [cs.CV] *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113536978A (en) * 2021-06-28 2021-10-22 杭州电子科技大学 Method for detecting disguised target based on significance
CN113536978B (en) * 2021-06-28 2023-08-18 杭州电子科技大学 Camouflage target detection method based on saliency
CN113643268A (en) * 2021-08-23 2021-11-12 四川大学 Industrial product defect quality inspection method and device based on deep learning and storage medium
CN113643268B (en) * 2021-08-23 2023-05-12 四川大学 Industrial product defect quality inspection method and device based on deep learning and storage medium
CN115019140A (en) * 2022-06-02 2022-09-06 杭州电子科技大学 Attention-guided camouflage target detection method
CN115019140B (en) * 2022-06-02 2023-11-21 杭州电子科技大学 Attention-guided camouflage target detection method
CN115223018A (en) * 2022-06-08 2022-10-21 东北石油大学 Cooperative detection method and device for disguised object, electronic device and storage medium
CN115937526A (en) * 2023-03-10 2023-04-07 鲁东大学 Bivalve gonad area segmentation method based on search recognition network
CN115937526B (en) * 2023-03-10 2023-06-09 鲁东大学 Method for segmenting gonad region of bivalve shellfish based on search identification network
CN116894943A (en) * 2023-07-20 2023-10-17 深圳大学 Double-constraint camouflage target detection method and system

Also Published As

Publication number Publication date
CN112907530B (en) 2022-05-17

Similar Documents

Publication Publication Date Title
CN112907530B (en) Method and system for detecting disguised object based on grouped reverse attention
Zhao et al. Jsnet: Joint instance and semantic segmentation of 3d point clouds
CN108664981B (en) Salient image extraction method and device
CN111480169B (en) Method, system and device for pattern recognition
CN108021923B (en) Image feature extraction method for deep neural network
CN111061889B (en) Automatic identification method and device for multiple labels of picture
CN111914654B (en) Text layout analysis method, device, equipment and medium
CN110023989B (en) Sketch image generation method and device
CN112734803B (en) Single target tracking method, device, equipment and storage medium based on character description
CN113901900A (en) Unsupervised change detection method and system for homologous or heterologous remote sensing image
CN113066089B (en) Real-time image semantic segmentation method based on attention guide mechanism
CN111626960A (en) Image defogging method, terminal and computer storage medium
CN110969089A (en) Lightweight face recognition system and recognition method under noise environment
CN115147598A (en) Target detection segmentation method and device, intelligent terminal and storage medium
Li et al. Gated auxiliary edge detection task for road extraction with weight-balanced loss
Bacea et al. Single stage architecture for improved accuracy real-time object detection on mobile devices
CN111612075A (en) Interest point and descriptor extraction method based on joint feature recombination and feature mixing
CN117197470A (en) Polyp segmentation method, device and medium based on colonoscope image
CN112614108A (en) Method and device for detecting nodules in thyroid ultrasound image based on deep learning
CN109767446B (en) Instance partitioning method and device, electronic equipment and storage medium
CN116935240A (en) Surface coverage classification system and method for multi-scale perception pyramid
CN111860003A (en) Image rain removing method and system based on dense connection depth residual error network
CN117237621A (en) Small sample semantic segmentation algorithm based on pixel-level semantic association
CN116363518A (en) Camouflage target detection method based on focal plane polarization imaging
WO2020224244A1 (en) Method and apparatus for obtaining depth-of-field image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240219

Address after: 518000 801 Hangsheng science and technology building, Gaoxin South liudao, Nanshan District, Shenzhen City, Guangdong Province

Patentee after: SHENZHEN BITE MICROELECTRONICS TECHNOLOGY Co.,Ltd.

Country or region after: China

Address before: 300071 Tianjin City, Nankai District Wei Jin Road No. 94

Patentee before: NANKAI University

Country or region before: China