CN115471675A - Disguised object detection method based on frequency domain enhancement - Google Patents
- Publication number: CN115471675A
- Application number: CN202211226059.3A
- Authority: CN (China)
- Legal status: Pending (an assumption, not a legal conclusion)
Classifications
- G06V10/44 — Local feature extraction by analysis of parts of the pattern (edges, contours, corners); connectivity analysis
- G06N3/08 — Computing arrangements based on neural networks; learning methods
- G06V10/56 — Extraction of image or video features relating to colour
- G06V10/774 — Generating sets of training patterns; bootstrap methods
- G06V10/82 — Image or video recognition using neural networks
Abstract
The invention provides a camouflaged object detection method based on frequency domain enhancement, which comprises the following steps: 1, constructing a camouflaged object network: a camouflage detection network is built, an image set of camouflaged objects is input into it, and the network is optimized by iteratively minimizing a loss function; 2, constructing a feature alignment module and a high-frequency channel selection module: these give the camouflaged object network time-frequency feature alignment and high-frequency feature screening; 3, dividing the network framework into a model training stage and a testing stage: in the training stage, the camouflaged object training picture set, after data preprocessing, is input into the network, which is optimized using the frequency domain enhancement module and supervision applied at two stages; in the testing stage, the camouflaged object image to be detected is input into the trained network to obtain the corresponding camouflaged object segmentation image.
Description
Technical Field
The invention relates to a method for detecting camouflaged objects in images, and in particular to a camouflaged object detection method based on frequency domain enhancement.
Background
In recent years, with the rapid development of deep convolutional networks, camouflaged object detection has made great breakthroughs. Compared with traditional camouflaged object detection algorithms, deep-learning-based methods are much more accurate: a deep neural network can acquire high-level semantic information of an image, which allows camouflaged salient objects in a scene to be detected more precisely. For example, Deng-Ping Fan, Ge-Peng Ji, Ming-Ming Cheng, and Ling Shao, "Concealed object detection," CoRR, 2021, and Jingjing Ren, Xiaowei Hu, Lei Zhu, Xuemiao Xu, Yangyang Xu, Weiming Wang, Zijun Deng, and Pheng-Ann Heng, "Deep texture-aware features for camouflaged object detection," CoRR, 2021, attempt to design texture enhancement modules or use attention mechanisms to guide the model toward camouflaged areas. Yunqiu Lv, Jing Zhang, Yuchao Dai, Aixuan Li, Bowen Liu, Nick Barnes, and Deng-Ping Fan, "Simultaneously localize, segment and rank the camouflaged objects," CVPR, 2021, and P. Sengottuvelan, Amitabh Wahi, and A. Shanmugam, "Performance of decamouflaging through exploratory image analysis," ICETET, instead treat camouflage segmentation as a two-stage process to improve the performance of the network algorithm.
Although these methods can further improve the accuracy of camouflaged object detection by refining the network structure, they detect camouflaged objects only in RGB space and ignore the differences between camouflaged objects and other regions in the frequency domain, which prevents further performance gains.
Disclosure of Invention
Purpose of the invention: the invention aims to solve the technical problem of providing, in view of the shortcomings of the prior art, a camouflaged object detection method based on frequency domain enhancement.
In order to solve the technical problem, the invention discloses a method for detecting a disguised object based on frequency domain enhancement.
The disclosed method relies on a frequency domain enhancement module to better separate the camouflaged object from the background. In particular, the invention designs a new Frequency Enhancement Module (FEM) to mine clues about camouflaged objects in the frequency domain. In addition, the invention provides a Feature Alignment method (FA) to fuse the features of the RGB domain and the frequency domain. Finally, to further exploit the frequency information, a High-Order Relation module (HOR) is proposed to process the rich fused features.
The method comprises the following specific steps:
step 1, constructing a camouflage object network based on a frequency domain enhancement module; the disguised object network comprises: the device comprises a camouflage object network framework, a frequency domain enhancement module, a feature alignment module and a high-frequency channel selection module;
the frequency domain enhancement module extracts the features of the camouflaged object in the frequency domain, the feature alignment module fuses the RGB-domain and frequency-domain features, and the high-frequency channel selection module screens the high-frequency channels most useful for detection;
step 2, training the camouflaged object network, comprising: inputting the camouflaged object training image set, after data preprocessing, into the network, and optimizing the network using the frequency domain enhancement module and supervision applied at two stages, to obtain a trained camouflaged object network;
step 3, testing with the trained camouflaged object network, comprising: inputting the camouflaged object image to be detected into the trained network to obtain the corresponding camouflaged object segmentation image, completing camouflaged object detection based on frequency domain enhancement.
The step 1 comprises the following steps:
step 1-1, constructing a camouflage object network framework to extract RGB characteristics;
step 1-2, designing a frequency domain enhancement module FEM to extract the characteristics of the disguised object in the frequency domain;
step 1-3, constructing a feature alignment module FA; fusing the time domain features and the frequency domain features;
step 1-4, constructing a high-frequency channel selection module HOR to perform high-frequency feature screening.
In step 1-1, the camouflaged object network skeleton comprises four stages, each stage being two 3 × 3 convolution layers with stride 2; the network extracts a feature map x_rgb corresponding to the RGB image, where H represents the height of the image, W the width, and H × W the overall resolution of the feature map.
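As an illustration of the skeleton's multi-scale output, the per-stage resolutions can be sketched under the assumption that each stage halves the spatial resolution (reading "stride is 2" as applying once per stage); the 352 × 352 input size is the training resolution quoted later in the description:

```python
def backbone_resolutions(H, W, stages=4):
    """Per-stage feature-map sizes for a four-stage skeleton, assuming
    each stage halves the resolution (e.g. stride-2 on its first 3x3
    conv and stride-1 on the second); this reading is an assumption."""
    sizes = []
    for _ in range(stages):
        H, W = H // 2, W // 2
        sizes.append((H, W))
    return sizes

print(backbone_resolutions(352, 352))
# [(176, 176), (88, 88), (44, 44), (22, 22)]
```

Under this assumption the four stages yield the usual multi-scale pyramid with a limited parameter budget.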
The frequency domain enhancement module in step 1-2 comprises an offline discrete cosine transform and an online learning enhancement module OLE;
frequency domain information is obtained from the RGB image through the offline discrete cosine transform, and the online learning enhancement module OLE then obtains the camouflaged object features hidden in frequency space, i.e. the frequency domain features.
The offline discrete cosine transform in step 1-2 first converts the feature map x_rgb into YCbCr space, where the feature map is denoted x_YCbCr ∈ R^(H×W×3). Then x_YCbCr is divided into regions of size 8 × 8, denoted x_YCbCr^(i,j), where i, j are the region coordinates. Each region is processed by the discrete cosine transform (DCT) into a spectrum d^(i,j) = DCT(x_YCbCr^(i,j)), in which each value corresponds to the intensity of one frequency band.
All components with the same frequency across all regions are then gathered (flattened) into one channel, yielding a new feature map x_dct ∈ R^((H/8)×(W/8)×192), the 192 channels being the 8 × 8 frequencies of each of the three colour channels.
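The offline DCT rearrangement described above can be sketched in NumPy/SciPy: an image already in YCbCr space is cut into 8 × 8 blocks, each block is transformed with a 2-D DCT, and the per-block frequency grids are flattened into channels, giving 8 × 8 × 3 = 192 frequency channels. The exact channel ordering is an assumption:

```python
import numpy as np
from scipy.fftpack import dct

def block_dct_features(x_ycbcr, block=8):
    """Rearrange an H x W x 3 YCbCr image into DCT-frequency channels.

    Each 8x8 block of each colour channel is transformed with a 2-D DCT;
    the 8x8 frequency grid of every block is then flattened into channels,
    giving an (H/8) x (W/8) x 192 feature map (64 frequencies x 3 colour
    channels), matching the spectrum layout described above.
    """
    H, W, C = x_ycbcr.shape
    h, w = H // block, W // block
    # split into non-overlapping 8x8 blocks: shape (h, w, C, 8, 8)
    blocks = x_ycbcr.reshape(h, block, w, block, C).transpose(0, 2, 4, 1, 3)
    # 2-D DCT per block (DCT-II along the last two axes)
    spec = dct(dct(blocks, axis=-1, norm='ortho'), axis=-2, norm='ortho')
    # flatten each block's 8x8 frequency grid into 64 channels per colour
    return spec.reshape(h, w, C * block * block)

x = np.random.rand(64, 64, 3)
feat = block_dct_features(x)
print(feat.shape)  # (8, 8, 192)
```

With the orthonormal DCT, the first channel of each colour holds the DC term (the block sum divided by 8), which is one way to sanity-check the layout.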
In step 1-2, the online learning enhancement module OLE obtains the camouflaged object features hidden in frequency space as follows:
first, so that the neural network can learn information of different frequency bands of the image, the signal is down-sampled and divided into two parts: the first 96 channels form the low-frequency segment and the last 96 channels the high-frequency segment. To enhance the signal within the corresponding frequency band, the low and high segments are input into two multi-head self-attention (MHSA) blocks respectively, and their outputs are concatenated to restore the original shape;
then another multi-head self-attention block reconciles all the different frequency bands; MHSA captures the rich dependencies between the Patch blocks of the input representation. Concretely, the newly formed signal is first reshaped into a feature map, and then MHSA models the relationships between all Patch blocks;
finally, up-sampling yields the enhanced frequency signal x_freq.
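A minimal sketch of the OLE band split, with a single-head, unprojected self-attention standing in for the learned MHSA blocks (the real module uses learned multi-head projections, so this is an assumption about the data flow only):

```python
import numpy as np

def self_attention(x):
    """Minimal single-head self-attention over tokens x of shape (n, d).
    A stand-in for the MHSA blocks of the OLE module; learned query/key/
    value projections and multiple heads are omitted."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)              # token-pair similarities
    scores -= scores.max(axis=-1, keepdims=True)
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)   # softmax over keys
    return attn @ x                            # weighted sum of values

def ole_enhance(x_dct):
    """Split the 192 DCT channels into low / high 96-channel segments,
    attend within each segment, concatenate to restore the shape, then
    attend once more across all bands, mirroring the OLE description."""
    tokens = x_dct.reshape(-1, x_dct.shape[-1])   # one token per position
    low, high = tokens[:, :96], tokens[:, 96:]
    low = self_attention(low)                     # enhance low band
    high = self_attention(high)                   # enhance high band
    merged = np.concatenate([low, high], axis=-1)
    return self_attention(merged).reshape(x_dct.shape)

x = np.random.rand(8, 8, 192)
out = ole_enhance(x)
print(out.shape)  # (8, 8, 192)
```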
In step 1-3, the method for fusing the time domain feature and the frequency domain feature comprises:
The feature alignment module FA fuses the time-domain feature X_i and the frequency-domain feature X_fre. A binary base filter f_base covering the high-frequency band is designed, and three different learnable filters f_z (z = 1, 2, 3) are added to the Y, Cb and Cr colour spaces, where z indexes the learnable filters. Filtering is the dot product between the frequency response and the combined filter f_base + σ(f_z), where σ is the sigmoid function

σ(x) = 1 / (1 + exp(−x)),

exp being the exponential function. For the input frequency-domain feature, three signals of different frequency bands are obtained by

X_z = X_fre ⊙ (f_base + σ(f_z)),

where ⊙ is the element-wise product; selecting the 3 different filters yields information from three different frequency bands.
The information from the three bands is concatenated to give the frequency-domain output X_freq.
The spatial-domain and frequency-domain information are then spliced: X_i and X_freq are concatenated and input into a convolution layer with 4 output channels, whose output is T; the slices T_1, T_2, T_3, T_4 are taken from the third dimension and reshaped to HW × n.
The alignment features are then mapped by:

T_1 = T_1 (T_2)^T,
T_2 = T_3 (T_4)^T.

The result is then multiplied by the transform and a learned vector to adjust the intensity of each channel, which defines the aligned feature map of each channel.
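The band filtering step of the FA module can be sketched as follows; the filter shapes, the particular binary high-band mask, and the sigmoid form of σ are assumptions consistent with the description above:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def band_filter(x_fre, f_base, f_learn):
    """One band selection of the FA module: the frequency response is
    multiplied element-wise by the combined filter f_base + sigmoid(f_z).
    Filter shapes broadcast over the channel axis (an assumption)."""
    return x_fre * (f_base + sigmoid(f_learn))

# three learnable filters -> three band-limited signals, concatenated
h, w, c = 8, 8, 192
x_fre = np.random.rand(h, w, c)
f_base = (np.arange(c) >= c // 2).astype(float)   # illustrative binary high-band mask
bands = [band_filter(x_fre, f_base, np.random.randn(c)) for _ in range(3)]
x_freq = np.concatenate(bands, axis=-1)           # frequency-domain output X_freq
print(x_freq.shape)  # (8, 8, 576)
```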
In steps 1-4, the method for screening high-frequency characteristics comprises:
Let X ∈ R^(C×H×W) denote the input feature, where C is the channel dimension; X is first reshaped to C × HW. Since the frequency response comes from a local area, the original features must be encoded with positional importance to distinguish the camouflaged object from other objects, giving a position attention weight.
Different network layers present potential information at different scales, later layers having a larger receptive field, so the multi-scale representation is also enhanced with cross-layer semantics: ψ(X) denotes the layer following the input feature X, and the attention weight is used to find the correlations between the RGB and frequency responses of different layers. The position weights then strengthen the original features, and an adaptive gating operation selects the most useful features for each sample; the gating weights are generated by an FC layer, so the gating is produced from spatial perception and forms position-aware features.
After the position-enhanced feature A is obtained, a channel-perception relation matrix D is built by similar operations, where C is the channel dimension of the position-aware features. Finally, D is applied to X to obtain the selection information beneficial to the camouflaged object, and the resulting feature X_out is input into the decoding process.
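A rough sketch of the channel-screening idea: build a C × C relation matrix from the reshaped features and use it to re-weight the channels. The softmax form of the relation matrix D is an assumption; the patent does not fix its exact construction:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def channel_selection(x):
    """Sketch of HOR-style channel screening: a C x C channel relation
    matrix D is computed from the flattened features and applied back
    to them, emphasising correlated channels. The attention form is an
    illustrative assumption, not the patent's exact formulation."""
    C, H, W = x.shape
    flat = x.reshape(C, H * W)                     # C x HW
    D = softmax(flat @ flat.T / np.sqrt(H * W))    # channel relation matrix
    return (D @ flat).reshape(C, H, W)             # selection applied to X

x = np.random.rand(16, 8, 8)
x_out = channel_selection(x)
print(x_out.shape)  # (16, 8, 8)
```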
The method for training the camouflaged object network in step 2 comprises the following steps:
step 2-1, data preprocessing: the camouflaged object training set is augmented by random flipping and random cropping before being input into the camouflaged object network, specifically:
step 2-1-1, random flipping: flip the image in the horizontal or vertical direction;
step 2-1-2, random cropping: crop a region whose size is a random ratio of the picture, keeping the aspect ratio unchanged;
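The two augmentations above can be sketched as follows (the 0.8 crop ratio and the flip probabilities are illustrative choices, not values from the patent):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_flip(img):
    """Flip the image in the horizontal and/or vertical direction,
    each with probability 1/2 (probabilities are an assumption)."""
    if rng.random() < 0.5:
        img = img[:, ::-1]          # horizontal flip
    if rng.random() < 0.5:
        img = img[::-1, :]          # vertical flip
    return img

def random_crop(img, ratio=0.8):
    """Crop a region covering `ratio` of each side at a random offset,
    so the aspect ratio of the crop matches the original image."""
    H, W = img.shape[:2]
    h, w = int(H * ratio), int(W * ratio)
    top = rng.integers(0, H - h + 1)
    left = rng.integers(0, W - w + 1)
    return img[top:top + h, left:left + w]

img = np.arange(100 * 100 * 3).reshape(100, 100, 3)
aug = random_crop(random_flip(img))
print(aug.shape)  # (80, 80, 3)
```

In practice the same flip and crop parameters would also be applied to the ground-truth mask so image and label stay aligned.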
step 2-2, training: input the augmented images into the camouflaged object network and optimize it through the loss function so that it generates complete and accurate camouflaged object maps; repeat the training for the set number of epochs and save the final network model parameters.
In step 2-2, for the input camouflaged object image, the network based on the frequency domain enhancement module is trained by means of a weighted BCE loss l_bce and a weighted IoU loss l_iou; the supervision function L_k is defined as:

L_k = L_bce(P_k, M) + L_iou(P_k, M)

where M is the ground-truth label, k indexes the k-th stage of the network, and P_k is the prediction of the k-th stage; finally, the overall loss function L_overall is the sum of L_k over all supervised stages:

L_overall = Σ_k L_k
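A sketch of the per-stage loss, shown with uniform (unweighted) BCE and a soft IoU, since the patent does not specify its weighting scheme:

```python
import numpy as np

def bce_loss(pred, mask, eps=1e-7):
    """Per-pixel binary cross-entropy with uniform weights (the patent
    uses a weighted variant whose weights are not specified here)."""
    pred = np.clip(pred, eps, 1 - eps)
    return -(mask * np.log(pred) + (1 - mask) * np.log(1 - pred)).mean()

def iou_loss(pred, mask, eps=1e-7):
    """Soft IoU loss: 1 - |pred * mask| / |pred + mask - pred * mask|."""
    inter = (pred * mask).sum()
    union = (pred + mask - pred * mask).sum()
    return 1.0 - (inter + eps) / (union + eps)

def stage_loss(pred, mask):
    """L_k = L_bce(P_k, M) + L_iou(P_k, M); the overall loss sums this
    over all supervised stages of the network."""
    return bce_loss(pred, mask) + iou_loss(pred, mask)

mask = np.zeros((4, 4))
mask[:2] = 1.0
perfect = np.clip(mask, 1e-7, 1 - 1e-7)   # near-perfect prediction
print(stage_loss(perfect, mask) < 1e-4)   # True
```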
has the advantages that:
firstly, the invention introduces the frequency domain as an additional clue to better detect the disguised object from the background; secondly, in order to further fully utilize frequency information, a high-order relation module (HOR) is provided to process rich fusion characteristics.
Drawings
The foregoing and/or other advantages of the invention will become more apparent from the following detailed description of the invention when taken in conjunction with the accompanying drawings.
FIG. 1 is a schematic process flow diagram of the present invention.
FIG. 2 is a schematic diagram of an example set of input camouflaged object images.
Detailed Description
The invention discloses a method for detecting a disguised object based on frequency domain enhancement, which is implemented according to the following steps, as shown in figure 1:
1. constructing a camouflage object network G:
inputting: a set of images of a camouflaged object.
Output: the corresponding camouflaged object segmentation image is generated.
1.1 constructing a network model framework of the disguised object to extract features;
the network model framework of the camouflage object comprises four stages, wherein each stage comprises two convolution layers of 3 multiplied by 3, and the stride is 2; the RGB feature map is extracted by utilizing the skeleton, so that the spatial resolution is reduced at each stage, and the constructed backbone extracts the multi-scale features under the condition of limited parameters
1.2 designing a frequency domain enhancement module FEM to extract the characteristics of the disguised object in the frequency domain;
the frequency domain enhancement module FEM is used for extracting the characteristics of the camouflage object in the frequency domain. Wherein the offline discrete cosine transform converts x rgb Conversion to YCbCr space (by)Represents) of x YCbCr A set of frequency domain color channels divided into 8 x 8Representing the area of a certain color channel. Each region is DCT processed into a frequency spectrumWhere each value corresponds to the intensity of a certain frequency band. According to the regional rule:
and isTo representAll connections of (c); all components with the same frequency are gathered into a channel, and the frequency spectrum is reshaped into a new inputThus, the original color input is converted to the frequency domain. The online learning enhancement module OLE is used to obtain the disguised object features hidden in the frequency space. Firstly, the coefficient of local frequency band is raised, the signal is down-sampled and divided into two portions, low signal segmentAnd high signal sectionWhere k represents the size. To enhance the signals in the corresponding frequency bands, we input them separately into two multi-tap self-attention (MHSA) and connect their outputs to restore the original shape. Then another MHSA reconciles all the different frequency bands, and the newly formed signal is represented asThe MHSA is able to capture rich correlations between each item in the input properties. At this point, there is complete interaction with the different spectra of the image. For DCT, patch blocks are independent of each other, and the above process only enhances one Patch block. To help the network identify the location of the disguise object, we need to establish connections between Patch blocks. We will first of all beMoulding to formThen, we model the relationships between all Patch blocks using MHSA. Finally, we can upsample and get the enhanced frequency signal x freq .x rgb And x freq Are input to the network. Since we apply everywhereSingle-layer MHSA, and the size scale of the frequency signal is small, so that high calculation cost is not brought about.
2. Building feature alignment modules and high frequency channel selection modules
Inputting: a frequency domain image and a time domain image;
Output: the camouflaged object loss;
2.1 constructing a feature alignment module FA, and fusing the time domain feature and the frequency domain feature;
the feature alignment module FA aligns the time domain featuresSum frequency domain feature X freq2s And (4) carrying out fusion. We have designed a binary base filter f covering the high frequency band base And three learnable filters are added to the color space of Y, cb and CrThe filtering being frequency response and combined filter f base +σ(f i ) Dot product between, wherein
Wherein |, is the element-level product. Finally, they are put together:
then, we calculate the variation of the two signals from the spatial and frequency domainsAnd (4) changing. We will X i And X freq Connected and then fed into a convolutional layer with 4 output channels, the output of which is T. Taking out from the third dimension And reshape them to HW × n
Thus, we proceed by:
T 1 =T 1 (T 2 ) T ,
T 2 =T 3 (T 4 ) T .
second, we can align the feature maps. Then multiplied by the transform and a learned vectorTo adjust the intensity of each channel, the alignment feature field for each channel may be defined as:
2.2 constructing a high-frequency channel selection module HOR for high-frequency feature screening;
Let X ∈ R^(C×H×W) denote the input feature; it is first reshaped to C × HW. Since the frequency response comes from a local area, the original features must be encoded with positional importance to distinguish the camouflaged object from other objects, giving a position attention weight.
In addition, different network layers present potential information at different scales, later layers having a larger receptive field, so the multi-scale representation is also enhanced with cross-layer semantics. Here ψ(X) denotes the layer following X, and the attention weight serves to find the correlations between the RGB and frequency responses of different layers. The position weights then emphasize the original features, after which an adaptive gating operation selects the most useful features for each sample; the gating weights are generated by an FC layer, so the gating is produced from spatial perception and forms position-aware features.
After the position-enhanced feature A is obtained, a channel-perception relation matrix can be established by similar operations, where C is the channel dimension of the position-aware features. Each tensor in the channel-perception relation has the same C dimension in both the semantic and the frequency mapping, corresponding to the original feature channels and the spectrum. Finally, this relation matrix is applied to X to obtain the selection information beneficial to the camouflaged object, and the resulting feature X_out is input into the decoding process.
3. Training the overall framework;
Training the dual-branch deep convolutional neural network comprises a data preprocessing stage, a model framework training stage and a testing stage.
3.1, preprocessing data;
the input image set of the disguise object is adjusted by pulling up, reversing and the like and then input into a network of the disguise object.
Inputting: a set of images of a camouflaged object.
Output: the data-augmented camouflaged object image set.
Geometric augmentation: changing the image geometry by translation, rotation and shearing can enhance the generalization ability of the model;
3.2 model framework training
Inputting: data enhanced image set of camouflaged object
Output: segmentation results for the camouflaged object image set
For the input camouflaged object image, the network is trained by means of a weighted BCE loss l_bce and a weighted IoU loss l_iou to obtain a robust camouflaged object detection network; the supervision function L_i is defined as:

L_i = L_bce(P_i, M) + L_iou(P_i, M)

where M is the ground-truth label and i indexes the i-th stage of the network. Finally, the overall loss function is the sum of L_i over all supervised stages.
during training, a small batch Stochastic Gradient Descent (SGD) optimization algorithm with a batchsize of 32, a momentum of 0.9, and a weight decay of 1e-5 may be used. The learning rate is set to 1e-4 and the maximum epoch is set to 100. The training image is adjusted to 352 x 352 as input to the entire network.
3.3 testing the model framework;
inputting: a set of images of a camouflaged object;
Output: the corresponding camouflaged object segmentation images;
step 1, the input camouflaged object image set is fed into the convolutional neural network, and the trained generator network model produces an initial camouflage segmentation result;
step 2, details are filled into the features obtained in step 1, finally yielding the final, detail-rich camouflage segmentation result.
In the present invention, as shown in fig. 2, the first column shows the input camouflaged object images, the second column the ground-truth segmentations, and the third column the segmentation results of the network of the invention.
In a specific implementation, the present application provides a computer storage medium and a corresponding data processing unit. The computer storage medium can store a computer program which, when executed by the data processing unit, performs some or all of the steps of the frequency-domain-enhanced camouflaged object detection method of each embodiment of the invention. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
It is clear to those skilled in the art that the technical solutions in the embodiments of the present invention can be implemented by means of a computer program and a corresponding general-purpose hardware platform. Based on this understanding, the technical solutions in the embodiments of the present invention, or the portions that contribute over the prior art, may be embodied in the form of a computer program, i.e. a software product, which may be stored in a storage medium and include several instructions for enabling a device containing a data processing unit (which may be a personal computer, a server, a single-chip microcomputer, an MCU, a network device, or the like) to execute the method of each embodiment, or of some parts of the embodiments, of the present invention.
The present invention provides a method and a concept for camouflaged object detection based on frequency domain enhancement, and there are many ways to implement the technical scheme; the above description is only a preferred embodiment of the invention. It should be noted that a person skilled in the art can make several improvements and refinements without departing from the principle of the invention, and these improvements and refinements should also be regarded as falling within the protection scope of the invention. All components not specified in this embodiment can be realized with the prior art.
Claims (10)
1. A camouflaged object detection method based on frequency domain enhancement is characterized by comprising the following steps:
step 1, constructing a camouflaged object network based on a frequency domain enhancement module; the camouflaged object network comprises: a camouflaged object network skeleton, a frequency domain enhancement module, a feature alignment module and a high-frequency channel selection module;
wherein the frequency domain enhancement module is used for extracting the features of the disguised object in the frequency domain, the feature alignment module is used for fusing time domain features and frequency domain features, and the high-frequency channel selection module is used for screening high-frequency features;
step 2, training the camouflage object network, which comprises the following steps: inputting a camouflage object training image set subjected to data preprocessing into a camouflage object network, and optimizing the camouflage object network by using a frequency domain enhancement module and a supervision loss to obtain a trained camouflage object network;
step 3, testing with the trained camouflaged object network, which comprises the following steps: inputting the image of the disguised object to be detected into the trained camouflaged object network to obtain the corresponding camouflaged object segmentation image, thereby completing the frequency-domain-enhancement-based detection of the disguised object.
2. The method for detecting a camouflaged object based on frequency domain enhancement as claimed in claim 1, wherein step 1 comprises the following steps:
step 1-1, constructing a camouflage object network framework to extract RGB characteristics;
step 1-2, designing a frequency domain enhancement module FEM to extract the characteristics of the camouflage object in a frequency domain;
step 1-3, constructing a feature alignment module FA to fuse the time domain features and the frequency domain features;
step 1-4, constructing a high-frequency channel selection module HOR to perform high-frequency feature screening.
3. The method for detecting the disguised object based on the frequency domain enhancement as claimed in claim 2, wherein in step 1-1, the camouflaged object network skeleton comprises four stages, each stage consisting of two 3 × 3 convolution layers with a stride of 2; the camouflaged object network is used to extract a feature map corresponding to the RGB image, where H represents the height of the image and W represents the width of the image, these dimensions giving the overall resolution size of the feature map.
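The claim above can be sketched numerically. The following is a minimal sketch, assuming each stride-2 stage halves the spatial resolution once (the patent does not give channel counts, so only spatial sizes are computed); the function name and the 352 × 352 input size are illustrative assumptions.

```python
# Hypothetical sketch of the four-stage skeleton resolutions described in claim 3.
# Assumption: one resolution halving per stage (stride-2 convolution).

def stage_resolutions(height, width, num_stages=4):
    """Return the (H, W) spatial size of the feature map after each stage."""
    sizes = []
    h, w = height, width
    for _ in range(num_stages):
        h, w = h // 2, w // 2  # one stride-2 stage halves each dimension
        sizes.append((h, w))
    return sizes

print(stage_resolutions(352, 352))  # [(176, 176), (88, 88), (44, 44), (22, 22)]
```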
4. The method for detecting a camouflaged object based on frequency domain enhancement as claimed in claim 3, wherein the frequency domain enhancement module in step 1-2 comprises: an offline discrete cosine transform and an online learning enhancement module OLE;
frequency domain information is obtained from the RGB image through the offline discrete cosine transform, and the online learning enhancement module OLE is used to obtain the features of the camouflaged object hidden in the frequency space, namely the frequency domain features.
5. The method for detecting a camouflaged object based on frequency domain enhancement as claimed in claim 4, wherein in step 1-2, the offline discrete cosine transform converts the feature map x_rgb into YCbCr space, in which the feature map is represented as x_YCbCr; subsequently, x_YCbCr is divided into regions of 8 × 8 size, where i, j represent the coordinates of each region; each region is then processed by the discrete cosine transform (DCT) into a frequency spectrum, where each value corresponds to the intensity of a certain frequency band.
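The conversion and block-wise transform in claim 5 can be sketched as follows. This is a minimal sketch, not the claimed formula: the standard ITU-R BT.601 RGB-to-YCbCr matrix and the orthonormal 2-D DCT-II are assumed, and the 16 × 16 random input is purely illustrative.

```python
import numpy as np

# Sketch of claim 5: RGB -> YCbCr, split into 8x8 regions, one DCT spectrum per region.

def rgb_to_ycbcr(img):
    """img: (H, W, 3) float array in [0, 255]; returns YCbCr channels (BT.601 assumed)."""
    m = np.array([[ 0.299,   0.587,   0.114 ],
                  [-0.1687, -0.3313,  0.5   ],
                  [ 0.5,    -0.4187, -0.0813]])
    ycbcr = img @ m.T
    ycbcr[..., 1:] += 128.0  # center the chroma channels
    return ycbcr

def dct2(block):
    """Orthonormal 2-D DCT-II of an n x n block."""
    n = block.shape[0]
    k = np.arange(n)
    # basis[u, x] = sqrt(2/n) * cos(pi * (2x + 1) * u / (2n)), row 0 rescaled
    basis = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    basis[0, :] /= np.sqrt(2.0)
    return basis @ block @ basis.T

img = np.random.rand(16, 16, 3) * 255
y = rgb_to_ycbcr(img)[..., 0]
spectra = [dct2(y[i:i + 8, j:j + 8])          # one spectrum per 8x8 region (i, j)
           for i in range(0, 16, 8) for j in range(0, 16, 8)]
print(len(spectra))  # 4 regions for a 16x16 image
```

For a constant 8 × 8 block, all the energy lands in the DC coefficient, which is a quick sanity check on the transform.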
6. The method for detecting a disguised object based on frequency domain enhancement as claimed in claim 5, wherein in step 1-2, the online learning enhancement module OLE is used to obtain the characteristics of the disguised object hidden in the frequency space, and specifically comprises:
firstly, the signal is down-sampled and divided into two parts, the first 96 channels forming the low signal segment and the last 96 channels forming the high signal segment, where k represents a size; the low signal segment and the high signal segment are respectively input into two multi-head self-attention MHSA modules, and the outputs are connected to restore the original shape;
then another multi-head self-attention MHSA reconciles all the different frequency bands, and the newly formed signal captures rich dependencies between the Patch blocks of the input representation; the specific method comprises the following steps: the signal is first reshaped, and then the relationships among all Patch blocks are modeled with MHSA;
finally, up-sampling is performed to obtain the enhanced frequency signal x_freq.
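The split-and-attend step of the OLE module can be sketched as follows. This is a hypothetical NumPy sketch, not the patented module: a single attention head with identity projections stands in for the learned multi-head self-attention, and the 192-channel, 64-dimensional signal is an assumed shape consistent with the two 96-channel segments named in the claim.

```python
import numpy as np

# Sketch of claim 6: split 192 frequency channels into a low segment (first 96)
# and a high segment (last 96), run self-attention within each, then concatenate
# to restore the original shape.

def self_attention(x):
    """x: (tokens, dim). Plain scaled dot-product self-attention, no learned weights."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                   # query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ x

rng = np.random.default_rng(0)
signal = rng.normal(size=(192, 64))     # assumed signal: 192 channels, 64-dim each
low, high = signal[:96], signal[96:]    # low / high frequency segments
out = np.concatenate([self_attention(low), self_attention(high)], axis=0)
print(out.shape)  # (192, 64): original shape restored
```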
7. The method for detecting a camouflaged object based on frequency domain enhancement as claimed in claim 6, wherein in step 1-3, the method for fusing the time domain feature and the frequency domain feature comprises:
the feature alignment module FA fuses the time domain feature X_i and the frequency domain feature X_freq2s; a binary base filter f_base covering the high frequency band is designed, and three different learnable filters f_z are added to the Y, Cb and Cr color spaces, where z represents the number of learnable filters; the filtering is the dot product between the frequency response and the combination filter f_base + σ(f_z), where the sigma function is the sigmoid σ(x) = 1/(1 + exp(−x)),
in which exp is the exponential function; for the input frequency domain feature, three signals of different frequency bands are obtained through the following formula:
wherein ⊙ is the element-level product; information of three different frequency bands is obtained by selecting 3 different filters.
The information of the three different frequency bands is spliced together to obtain the frequency domain output X_freq.
The spatial domain information and the frequency domain information are then spliced; the specific method is as follows: X_i and X_freq are connected and input into a convolution layer with 4 output channels, whose output is T; slices are taken from the third dimension and reshaped to HW × n.
The alignment features are then mapped by

T_1 = T_1 (T_2)^T,
T_2 = T_3 (T_4)^T,

and multiplied by the transform and a learned vector to adjust the intensity of each channel; the alignment feature field of each channel is defined accordingly.
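The band-selection part of claim 7 can be sketched as follows. This is a minimal sketch under stated assumptions: σ is taken to be the sigmoid, the 8 × 8 filter size is illustrative, and the names `band_filter`, `f_base` and `f_z` mirror the claim's symbols rather than any published implementation.

```python
import numpy as np

# Sketch of claim 7: gate a frequency feature with f_base + sigma(f_z),
# where f_base is a binary mask over the high band and f_z is learnable.

def sigma(x):
    return 1.0 / (1.0 + np.exp(-x))      # squashes the learnable filter to (0, 1)

def band_filter(freq_feat, f_base, f_z):
    """Element-wise product of the feature with the combination filter."""
    return freq_feat * (f_base + sigma(f_z))

freq_feat = np.random.rand(8, 8)
f_base = np.zeros((8, 8))
f_base[4:, 4:] = 1.0                      # binary mask covering the high band
f_z = np.zeros((8, 8))                    # untrained learnable filter
bands = [band_filter(freq_feat, f_base, fz) for fz in (f_z - 2, f_z, f_z + 2)]
print(len(bands))  # three different frequency-band signals
```

With three differently initialized `f_z` filters, the same feature yields three band signals, which the claim then splices into the frequency domain output X_freq.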
8. The method for detecting a camouflaged object based on frequency domain enhancement as claimed in claim 7, wherein in step 1-4, the method for high frequency feature screening comprises:
Let X represent the input features and C represent the channel dimension of said features; the features are first reshaped to C × HW; the location attention weight is expressed as:
where ψ(X) represents a subsequent layer of the input feature X; the width W of the image is taken as an attention weight to find the RGB and frequency response correlation between different layers; the position weight strengthens the original features, and the most useful features for different samples are selected through an adaptive gating operation, as follows:
wherein the gating weights are generated by the FC layer; the gating operation is generated based on spatial perception, forming a position-aware feature;
after obtaining the position-enhanced feature A, a channel perception relation matrix D is established through similar operations:
wherein C represents the channel dimension of the location-aware feature; finally, the relation matrix D is applied to X to obtain selection information that is beneficial for detecting the camouflaged object:
then, the feature X_out is input into the decoding process.
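The channel-relation step of claim 8 can be sketched as follows. This is a hypothetical simplification: the ψ(X) layer and the FC gating are omitted, a softmax normalization of the relation matrix is assumed, and `channel_select` is an illustrative name, not the patent's.

```python
import numpy as np

# Sketch of claim 8: reshape the feature to C x HW, build a C x C channel-relation
# matrix D from feature similarities, and apply D back to the feature.

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def channel_select(x):
    """x: (C, H, W). Returns the feature re-weighted by channel relations."""
    c, h, w = x.shape
    flat = x.reshape(c, h * w)                 # C x HW, as in the claim
    relation = softmax(flat @ flat.T)          # C x C relation matrix D
    return (relation @ flat).reshape(c, h, w)  # apply D to X

x = np.random.rand(16, 8, 8)
print(channel_select(x).shape)  # (16, 8, 8)
```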
9. The method for detecting a disguised object based on frequency domain enhancement as claimed in claim 8, wherein the method for training the network of the disguised object in step 2 comprises the steps of:
step 2-1, data preprocessing: performing data enhancement, in the form of random flipping and random cropping, on the camouflaged object training set to be input into the camouflaged object network, specifically comprising:
step 2-1-1, random flipping: flipping the image in either the horizontal or vertical direction;
step 2-1-2, random cropping: cropping a region whose size is a random ratio of the picture, keeping the aspect ratio unchanged;
step 2-2, training: inputting the data-enhanced images into the camouflaged object network, optimizing the network through the loss function so that it generates complete and accurate camouflaged object maps, repeating training for a set number of epochs, and saving the final network model parameters.
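The two augmentations in step 2-1 can be sketched as follows. This is a minimal sketch: the crop-ratio range [0.5, 1.0] is an illustrative assumption, and the same ratio is applied to both sides so that the aspect ratio stays unchanged, as the claim requires.

```python
import numpy as np

# Sketch of claim 9's preprocessing: random flip + aspect-preserving random crop.

def random_flip(img, rng):
    """Flip along a randomly chosen axis: 0 = vertical, 1 = horizontal."""
    return np.flip(img, axis=rng.integers(0, 2))

def random_crop(img, rng, lo=0.5, hi=1.0):
    """Crop a region whose size is a random ratio of the image, same ratio per side."""
    h, w = img.shape[:2]
    r = rng.uniform(lo, hi)
    ch, cw = int(h * r), int(w * r)          # equal ratio keeps the aspect ratio
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    return img[top:top + ch, left:left + cw]

rng = np.random.default_rng(0)
img = np.random.rand(64, 64, 3)
crop = random_crop(random_flip(img, rng), rng)
print(crop.shape[0] == crop.shape[1])  # True: square input stays square
```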
10. The method according to claim 9, wherein in step 2-2, the camouflaged object network based on the frequency domain enhancement module is trained on the input camouflaged object images with a weighted BCE loss L_bce and a weighted IoU loss L_iou; the supervision function L_k is defined as follows:
L_k = L_bce(P_k, M) + L_iou(P_k, M)
where M is the ground-truth label, k denotes the k-th stage of the network, and P_k represents the prediction result of the k-th stage; finally, the overall loss function L_overall is the sum of the supervision losses over all stages: L_overall = Σ_k L_k.
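The per-stage supervision L_k = L_bce + L_iou and its sum over stages can be sketched as follows. This is a simplified sketch: plain (unweighted) BCE and IoU losses are used, whereas the claim specifies pixel-weighted variants; the four-stage count follows claim 3.

```python
import numpy as np

# Sketch of claim 10: L_k = L_bce(P_k, M) + L_iou(P_k, M), summed over stages.

def bce_loss(pred, target, eps=1e-7):
    """Plain binary cross-entropy over a predicted probability map."""
    p = np.clip(pred, eps, 1 - eps)
    return -np.mean(target * np.log(p) + (1 - target) * np.log(1 - p))

def iou_loss(pred, target, eps=1e-7):
    """Soft IoU loss: 1 minus intersection over union."""
    inter = np.sum(pred * target)
    union = np.sum(pred + target - pred * target)
    return 1.0 - (inter + eps) / (union + eps)

def overall_loss(stage_preds, target):
    """L_overall = sum over stages k of L_bce(P_k, M) + L_iou(P_k, M)."""
    return sum(bce_loss(p, target) + iou_loss(p, target) for p in stage_preds)

target = np.array([[1.0, 0.0], [0.0, 1.0]])
perfect = [target.copy() for _ in range(4)]   # four stages, perfect predictions
print(overall_loss(perfect, target) < 1e-4)   # True: near-zero loss when pred == mask
```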
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211226059.3A CN115471675A (en) | 2022-10-09 | 2022-10-09 | Disguised object detection method based on frequency domain enhancement |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115471675A true CN115471675A (en) | 2022-12-13 |
Family
ID=84337310
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211226059.3A Pending CN115471675A (en) | 2022-10-09 | 2022-10-09 | Disguised object detection method based on frequency domain enhancement |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115471675A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116664990A (en) * | 2023-08-01 | 2023-08-29 | 苏州浪潮智能科技有限公司 | Camouflage target detection method, model training method, device, equipment and medium |
CN116664990B (en) * | 2023-08-01 | 2023-11-14 | 苏州浪潮智能科技有限公司 | Camouflage target detection method, model training method, device, equipment and medium |
CN117828536A (en) * | 2024-03-04 | 2024-04-05 | 粤港澳大湾区数字经济研究院(福田) | Prediction method, model, terminal and medium for node interaction |
CN117828536B (en) * | 2024-03-04 | 2024-06-11 | 粤港澳大湾区数字经济研究院(福田) | Prediction method, model, terminal and medium for node interaction |
CN118379484A (en) * | 2024-06-19 | 2024-07-23 | 西北工业大学 | Camouflage target detection method based on frequency guidance space self-adaption |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||