CN115471675A - Disguised object detection method based on frequency domain enhancement - Google Patents

Disguised object detection method based on frequency domain enhancement

Info

Publication number
CN115471675A
Authority
CN
China
Prior art keywords
frequency domain
frequency
camouflage
disguised
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211226059.3A
Other languages
Chinese (zh)
Inventor
黄小珊
朱江海
仲亦杰
田锋亮
刘政龙
黄培强
张文强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xuchang 3d Mapping Co ltd
Original Assignee
Xuchang 3d Mapping Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xuchang 3d Mapping Co ltd filed Critical Xuchang 3d Mapping Co ltd
Priority to CN202211226059.3A priority Critical patent/CN115471675A/en
Publication of CN115471675A publication Critical patent/CN115471675A/en
Pending legal-status Critical Current


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/40: Extraction of image or video features
    • G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V 10/56: Extraction of image or video features relating to colour
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V 10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks


Abstract

The invention provides a camouflaged object detection method based on frequency domain enhancement, which comprises the following steps: step 1, constructing a camouflaged object network: a camouflaged object detection network is built, an image set of camouflaged objects is input into the network, and the network is optimized by iteratively minimizing a loss function; step 2, constructing a feature alignment module and a high-frequency channel selection module: the camouflaged object network uses these modules to achieve spatial-frequency feature alignment and high-frequency feature screening; step 3, dividing the network framework into a model training stage and a testing stage: in the training stage, the data-preprocessed camouflaged object training picture set is input into the network, and the network is optimized by using the frequency domain enhancement module and multi-stage supervision; in the testing stage, the camouflaged object image to be detected is input into the trained network to obtain the corresponding camouflaged object segmentation image.

Description

Method for detecting disguised object based on frequency domain enhancement
Technical Field
The invention relates to a method for detecting camouflaged objects in images, and in particular to a camouflaged object detection method based on frequency domain enhancement.
Background
In recent years, with the rapid development of deep convolutional networks, camouflaged object detection has made great breakthroughs. Compared with traditional camouflaged object detection algorithms, methods based on deep learning are greatly improved in accuracy: a deep neural network can acquire high-level semantic information of an image, and this information can be used to detect camouflaged objects in a scene more accurately. For example, Deng-Ping Fan, Ge-Peng Ji, Ming-Ming Cheng, and Ling Shao, "Concealed object detection," CoRR, 2021, and Jingjing Ren, Xiaowei Hu, Lei Zhu, Xuemiao Xu, Yangyang Xu, Weiming Wang, Zijun Deng, and Pheng-Ann Heng, "Deep texture-aware features for camouflaged object detection," CoRR, 2021, attempt to design texture enhancement modules or use attention mechanisms to guide the model to camouflaged areas. Yunqiu Lv, Jing Zhang, Yuchao Dai, Aixuan Li, Bowen Liu, Nick Barnes, and Deng-Ping Fan, "Simultaneously localize, segment and rank the camouflaged objects," CVPR, 2021, and P. Sengottuvelan, Amitabh Wahi, and A. Shanmugam, "Performance of decamouflaging through exploratory image analysis," ICETET, 2008, treat camouflage segmentation as a two-stage process to improve the performance of the network algorithm.
Although these methods can further improve the accuracy of camouflaged object detection by improving the network structure, they detect camouflaged objects only in RGB space and ignore the differences between camouflaged objects and other regions in the frequency domain, which prevents further performance improvement.
Disclosure of Invention
Purpose of the invention: aiming at the defects of the prior art, the invention provides a camouflaged object detection method based on frequency domain enhancement.
In order to solve the technical problem, the invention discloses a method for detecting a disguised object based on frequency domain enhancement.
The method disclosed by the invention uses a frequency domain enhancement module to better separate the camouflaged object from the background. In particular, the invention designs a new frequency enhancement module (FEM) to mine clues about camouflaged objects in the frequency domain. In addition, the invention provides a feature alignment module (FA) to fuse the features of the RGB domain and the frequency domain. Finally, in order to further exploit the frequency information, a high-order relation module (HOR, also referred to herein as the high-frequency channel selection module) is proposed to process the rich fused features.
The method comprises the following specific steps:
step 1, constructing a camouflage object network based on a frequency domain enhancement module; the disguised object network comprises: the device comprises a camouflage object network framework, a frequency domain enhancement module, a feature alignment module and a high-frequency channel selection module;
the device comprises a frequency domain enhancement module, a characteristic alignment module, a high-frequency channel selection module and a frequency domain matching module, wherein the frequency domain enhancement module is used for extracting the characteristics of a disguised object in a frequency domain;
step 2, training the camouflaged object network, comprising: inputting the data-preprocessed camouflaged object training image set into the network, and optimizing the network by using the frequency domain enhancement module and multi-stage supervision to obtain a trained camouflaged object network;
step 3, testing by adopting the trained camouflaged object network, comprising: inputting the camouflaged object image to be detected into the trained network to obtain the corresponding camouflaged object segmentation image, completing the frequency-domain-enhancement-based camouflaged object detection.
The step 1 comprises the following steps:
step 1-1, constructing a camouflage object network framework to extract RGB characteristics;
step 1-2, designing a frequency domain enhancement module FEM to extract the characteristics of the disguised object in the frequency domain;
step 1-3, constructing a feature alignment module FA to fuse the spatial-domain features and the frequency-domain features;
step 1-4, constructing a high-frequency channel selection module HOR to perform high-frequency feature screening.
In step 1-1, the camouflaged object network skeleton comprises four stages, each stage containing two 3 × 3 convolution layers with stride 2; the skeleton extracts from the RGB image the corresponding feature maps $x_i \in \mathbb{R}^{H_i \times W_i \times C_i}$, $i = 1, \dots, 4$, where H denotes the height of the image, W its width, and $H_i \times W_i$ the overall resolution of the i-th feature map, halved at each stage by the stride-2 convolution.
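For illustration only, the following is a minimal PyTorch sketch of such a four-stage skeleton; the channel widths and the placement of the stride-2 convolution within each stage are assumptions, since the text only specifies four stages of two 3 × 3 convolution layers.

```python
import torch
import torch.nn as nn

# Minimal sketch of the four-stage convolutional skeleton described above.
# The channel widths (64, 128, 256, 512) and putting the stride-2 convolution
# first in each stage are assumptions not fixed by the patent text.
class Skeleton(nn.Module):
    def __init__(self, in_ch=3, widths=(64, 128, 256, 512)):
        super().__init__()
        self.stages = nn.ModuleList()
        ch = in_ch
        for w in widths:
            self.stages.append(nn.Sequential(
                nn.Conv2d(ch, w, 3, stride=2, padding=1),  # halves resolution
                nn.ReLU(inplace=True),
                nn.Conv2d(w, w, 3, stride=1, padding=1),
                nn.ReLU(inplace=True),
            ))
            ch = w

    def forward(self, x_rgb):
        feats = []
        for stage in self.stages:
            x_rgb = stage(x_rgb)
            feats.append(x_rgb)  # multi-scale features x_1 .. x_4
        return feats
```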
The frequency domain enhancement module in step 1-2 comprises an offline discrete cosine transform and an online learning enhancement module OLE: the offline discrete cosine transform obtains frequency domain information from the RGB image, and the online learning enhancement module OLE obtains the features of the camouflaged object hidden in the frequency space, namely the frequency domain features.
The offline discrete cosine transform described in step 1-2 converts the feature map $x_{rgb}$ into YCbCr space, in which the feature map is represented as $x_{YCbCr} \in \mathbb{R}^{H \times W \times 3}$. Subsequently, $x_{YCbCr}$ is divided into images of size 8 × 8, $x^{i,j}_{YCbCr} \in \mathbb{R}^{8 \times 8 \times 3}$, each representing one region, with i, j the coordinates of the region; each region is then processed by the discrete cosine transform (DCT) into a frequency spectrum $d^{i,j} \in \mathbb{R}^{8 \times 8 \times 3}$, where each value corresponds to the intensity of a certain frequency band. The above process is represented by the following formula:

$$x_d = \mathrm{flatten}\left(\left\{ \mathrm{DCT}\left(x^{i,j}_{YCbCr}\right) \right\}\right) \in \mathbb{R}^{\frac{H}{8} \times \frac{W}{8} \times 192},$$

in which $\{\cdot\}$ represents the collection of the spectra $d^{i,j}$ of all regions, and flatten is the collection method that gathers all regions with the same frequency into one channel, so that a feature map is obtained again (64 frequencies × 3 color channels = 192 channels).
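A minimal NumPy/SciPy sketch of this offline step is given below; the YCbCr conversion is assumed to have been done already, and the orthonormal DCT variant is an assumption, since the text does not fix a normalization.

```python
import numpy as np
from scipy.fftpack import dct

# Sketch of the offline DCT step on an image already in YCbCr space,
# shape (H, W, 3) with H and W divisible by 8.
def offline_dct(x_ycbcr: np.ndarray) -> np.ndarray:
    H, W, C = x_ycbcr.shape
    # split into 8x8 patches: (H/8, W/8, 8, 8, C)
    patches = x_ycbcr.reshape(H // 8, 8, W // 8, 8, C).transpose(0, 2, 1, 3, 4)
    # 2-D type-II DCT on each 8x8 patch, applied separably over both axes
    spec = dct(dct(patches, axis=2, norm='ortho'), axis=3, norm='ortho')
    # gather all components of the same frequency into one channel:
    # (H/8, W/8, 8*8*C) = (H/8, W/8, 192) for a 3-channel input
    x_d = spec.reshape(H // 8, W // 8, 64 * C)
    return x_d
```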
In step 1-2, the online learning enhancement module OLE is used to obtain the features of the camouflaged object hidden in the frequency space, specifically as follows:

First, so that the neural network can learn information of different frequency bands in the image, the signal is down-sampled and divided into two parts: the first 96 channels form the low-frequency segment $x_l \in \mathbb{R}^{k \times k \times 96}$ and the last 96 channels form the high-frequency segment $x_h \in \mathbb{R}^{k \times k \times 96}$, where k denotes the down-sampled size. In order to enhance the signal in the corresponding frequency bands, the low segment and the high segment are input into two multi-head self-attention (MHSA) blocks respectively, and the outputs are concatenated to restore the original shape.

Then another multi-head self-attention MHSA reconciles all the different frequency bands, and the newly formed signal is represented as $x_d'$. Multi-head self-attention captures the rich dependencies between the patches of the input representation $x_d'$. The specific method is as follows: first, $x_d'$ is reshaped to form a feature map, and the relationships among all patches are then modeled with MHSA; finally, the result is up-sampled to obtain the enhanced frequency signal $x_{freq}$.
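The following PyTorch sketch illustrates one possible reading of the OLE module; the down-sampled size k, the head count, and the use of nn.MultiheadAttention are assumptions not fixed by the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch of the online learning enhancement (OLE) module: the down-sampled
# frequency signal is split into a low band (first 96 channels) and a high
# band (last 96 channels), each band is refined by its own multi-head
# self-attention, a third MHSA mixes all bands, and the result is up-sampled.
class OLE(nn.Module):
    def __init__(self, channels=192, k=22, heads=4):
        super().__init__()
        self.k = k
        self.low_attn = nn.MultiheadAttention(96, heads, batch_first=True)
        self.high_attn = nn.MultiheadAttention(96, heads, batch_first=True)
        self.mix_attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def _mhsa(self, attn, tokens):
        out, _ = attn(tokens, tokens, tokens)  # self-attention: q = k = v
        return out

    def forward(self, x_d):                     # x_d: (B, 192, H/8, W/8)
        B, C, H, W = x_d.shape
        x = F.adaptive_avg_pool2d(x_d, self.k)  # down-sample to (B, C, k, k)
        tokens = x.flatten(2).transpose(1, 2)   # (B, k*k, C)
        low = self._mhsa(self.low_attn, tokens[..., :96])
        high = self._mhsa(self.high_attn, tokens[..., 96:])
        x = torch.cat([low, high], dim=-1)      # restore the original shape
        x = self._mhsa(self.mix_attn, x)        # reconcile all frequency bands
        x = x.transpose(1, 2).reshape(B, C, self.k, self.k)
        # up-sample to obtain the enhanced frequency signal x_freq
        return F.interpolate(x, size=(H, W), mode='bilinear', align_corners=False)
```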
In step 1-3, the method for fusing the spatial-domain features and the frequency-domain features is as follows:

The feature alignment module FA fuses the spatial-domain features $X_i$ and the frequency-domain features $X_{fre}$. A binary base filter $f_{base}$ covering the high frequency band is designed, and three different learnable filters $f_z$, with z the index of the learnable filter, are added for the Y, Cb and Cr color spaces. The filtering is the dot product between the frequency response and the combined filter $f_{base} + \sigma(f_z)$, where the sigma function is

$$\sigma(f) = \frac{1}{1 + \exp(-f)},$$

with exp the exponential function. For the input frequency-domain features $X_{fre}$, three signals of different frequency bands are obtained through the following formula:

$$X^{z}_{freq} = X_{fre} \odot \left(f_{base} + \sigma(f_z)\right), \quad z = 1, 2, 3,$$

where $\odot$ is the element-level product. The information of the three different frequency bands obtained with the 3 different filters is spliced together to obtain the frequency domain output:

$$X_{freq} = \mathrm{concat}\left(X^{1}_{freq}, X^{2}_{freq}, X^{3}_{freq}\right).$$

The spatial domain information and the frequency domain information are then spliced. The specific method is as follows: $X_i$ and $X_{freq}$ are connected and input into a convolution layer with 4 output channels, whose output is T; the slices $T_1, T_2, T_3, T_4$ are taken from the third dimension and reshaped to $HW \times n$, and the alignment features are mapped by:

$$T_1 = T_1 (T_2)^{\mathsf{T}}, \qquad T_2 = T_3 (T_4)^{\mathsf{T}}.$$

The result is then multiplied by the transform and a learned vector to adjust the intensity of each channel, which defines the aligned feature field of each channel. Finally, a fused feature $X_{fuse}$ is obtained by adding the features of the two domains:

$$X_{fuse} = \hat{X}_i + \hat{X}_{freq},$$

where $\hat{X}_i$ and $\hat{X}_{freq}$ denote the aligned spatial-domain and frequency-domain features.
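Below is a simplified PyTorch sketch of the band-filtering and fusion idea; the definition of the high band in f_base, the 1 × 1 projection, and the omission of the T-matrix alignment step are simplifying assumptions.

```python
import torch
import torch.nn as nn

# Sketch of the feature alignment (FA) filtering: a fixed binary filter
# f_base passing the high band plus three learnable filters (one per YCbCr
# channel) gate the 8x8 DCT spectrum; the three band-limited signals are
# concatenated, and spatial/frequency features are fused by addition after a
# learned per-channel rescaling. The T1/T2 affinity alignment is omitted.
class FeatureAlign(nn.Module):
    def __init__(self, channels=192):
        super().__init__()
        # binary base filter over the 64 DCT frequencies; treating the
        # frequencies with u + v >= 8 as the "high band" is an assumption
        u, v = torch.meshgrid(torch.arange(8), torch.arange(8), indexing='ij')
        self.register_buffer('f_base', ((u + v) >= 8).float().reshape(64))
        self.f_z = nn.Parameter(torch.zeros(3, 64))  # learnable Y/Cb/Cr filters
        self.proj = nn.Conv2d(3 * channels, channels, 1)
        self.v = nn.Parameter(torch.ones(channels))  # learned channel intensity

    def forward(self, x_spatial, x_fre):
        # x_fre: (B, 192, h, w), channels ordered frequency-major, color-minor;
        # x_spatial is assumed resized to the same (B, 192, h, w) shape
        B, C, h, w = x_fre.shape
        bands = []
        for z in range(3):
            filt = self.f_base + torch.sigmoid(self.f_z[z])  # f_base + sigma(f_z)
            filt = filt.repeat_interleave(C // 64)           # broadcast over colors
            bands.append(x_fre * filt.view(1, C, 1, 1))      # element-level product
        x_freq = self.proj(torch.cat(bands, dim=1))          # splice the three bands
        # fuse by adding the features of the two domains
        return x_spatial + self.v.view(1, C, 1, 1) * x_freq
```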
In step 1-4, the method for high-frequency feature screening is as follows:

Let $X \in \mathbb{R}^{C \times H \times W}$ represent the input features, with C the channel dimension; X is first reshaped to $C \times HW$. Since the frequency response comes from a local area, it is necessary to encode the positional importance of the original features to distinguish the camouflaged object from other objects, and a location attention weight W is computed from X. Moreover, different network layers present potential information at different scales, the later layers having a larger receptive field, so the multi-scale representation is also enhanced with cross-layer semantics. Here $\psi(X)$ represents a subsequent layer of the input feature X, and W serves as an attention weight to find the correlations between the RGB and frequency responses of different layers. The position weights strengthen the original features, and the most useful features for different samples are then selected through an adaptive gating operation:

$$A = g(X) \odot (W \cdot X),$$

where $g(X)$ represents the gating weights generated by the FC layer; the gating operation is generated based on spatial perception and forms position-aware features.

After obtaining the position-enhanced feature A, a channel-aware relation matrix $D \in \mathbb{R}^{C \times C}$ is established through similar operations, where C is the channel dimension of the position-aware features. Finally, the relation matrix D is applied to X to obtain the selection information beneficial to the camouflaged object:

$$X_{out} = D \cdot X.$$

The feature $X_{out}$ is then input into the decoding process.
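The following PyTorch sketch shows one plausible reading of the HOR computation; since the exact attention and gating formulas are only partially recoverable from the text, the softmax placement and the FC gate below are assumptions, not the definitive method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch of the high-frequency channel selection (HOR) idea: a location
# attention over the flattened feature, an FC-generated gate that keeps the
# most useful positions, and a channel relation matrix D applied back to X.
class HOR(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.psi = nn.Conv2d(channels, channels, 3, padding=1)  # "subsequent layer"
        self.gate = nn.Linear(channels, channels)               # FC gating weights

    def forward(self, x):                       # x: (B, C, H, W)
        B, C, H, W = x.shape
        X = x.flatten(2)                        # reshape to (B, C, HW)
        P = self.psi(x).flatten(2)              # psi(X): (B, C, HW)
        # location attention weight over positions
        Wp = F.softmax(torch.bmm(X.transpose(1, 2), P), dim=-1)   # (B, HW, HW)
        A = torch.bmm(X, Wp)                    # position-enhanced feature (B, C, HW)
        g = torch.sigmoid(self.gate(A.mean(-1)))                  # (B, C) gate
        A = A * g.unsqueeze(-1)                 # adaptive gating
        # channel-aware relation matrix D: (B, C, C)
        D = F.softmax(torch.bmm(A, A.transpose(1, 2)) / (H * W) ** 0.5, dim=-1)
        X_out = torch.bmm(D, X).view(B, C, H, W)  # selection information
        return X_out
```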
The method for training the camouflaged object network in step 2 comprises the following steps:
step 2-1, data preprocessing: data enhancement in the form of random flipping and random cropping is applied to the camouflaged object training set before it is input into the camouflaged object network, specifically:
step 2-1-1, random flipping: flipping the image in either the horizontal or vertical direction;
step 2-1-2, random cropping: cropping a region whose size is a random proportion of the picture while keeping the aspect ratio unchanged;
step 2-2, training: the data-enhanced images are input into the camouflaged object network, which is optimized through the loss function so that it generates complete and accurate camouflaged object segmentation maps; training is repeated for a set number of epochs, and the final network model parameters are saved.
In step 2-2, for the input camouflaged object image, the camouflaged object network based on the frequency domain enhancement module is trained by means of the weighted BCE loss $L_{bce}$ and the weighted IoU loss $L_{iou}$; the supervision function $L_k$ is defined as:

$$L_k = L_{bce}(P_k, M) + L_{iou}(P_k, M),$$

where M is the ground-truth label, k is the k-th stage of the network, and $P_k$ represents the prediction result of the k-th stage. Finally, the overall loss function $L_{overall}$ is:

$$L_{overall} = \sum_{k} L_k.$$
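For concreteness, the sketch below implements a weighted BCE + weighted IoU supervision summed over the stage predictions; the boundary-aware pixel weighting is an assumption borrowed from common practice in camouflaged object detection, as the text only names "weighted" losses.

```python
import torch
import torch.nn.functional as F

# Sketch of the stage-wise supervision L_k = L_bce(P_k, M) + L_iou(P_k, M).
# pred: raw logits (B, 1, H, W); mask: ground truth in {0, 1}, (B, 1, H, W).
def structure_loss(pred, mask):
    # pixel weight: larger near the boundaries of the ground-truth mask
    # (assumed weighting; a local average of the mask highlights edges)
    weit = 1 + 5 * torch.abs(
        F.avg_pool2d(mask, 31, stride=1, padding=15) - mask)
    wbce = F.binary_cross_entropy_with_logits(pred, mask, reduction='none')
    wbce = (weit * wbce).sum(dim=(2, 3)) / weit.sum(dim=(2, 3))

    pred = torch.sigmoid(pred)
    inter = ((pred * mask) * weit).sum(dim=(2, 3))
    union = ((pred + mask) * weit).sum(dim=(2, 3))
    wiou = 1 - (inter + 1) / (union - inter + 1)
    return (wbce + wiou).mean()

def overall_loss(stage_preds, mask):
    # L_overall = sum_k [ L_bce(P_k, M) + L_iou(P_k, M) ]
    return sum(structure_loss(p, mask) for p in stage_preds)
```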
Beneficial effects:
First, the invention introduces the frequency domain as an additional clue to better detect the camouflaged object against the background; second, in order to make fuller use of the frequency information, a high-order relation module (HOR) is provided to process the rich fused features.
Drawings
The foregoing and/or other advantages of the invention will become more apparent from the following detailed description of the invention when taken in conjunction with the accompanying drawings.
FIG. 1 is a schematic process flow diagram of the present invention.
FIG. 2 is a schematic diagram of an example set of input camouflaged object images.
Detailed Description
The invention discloses a method for detecting a disguised object based on frequency domain enhancement, which is implemented according to the following steps, as shown in figure 1:
1. constructing a camouflage object network G:
Input: a set of camouflaged object images.
Output: the corresponding camouflaged object segmentation image.
1.1 constructing the camouflaged object network model skeleton to extract features;
The skeleton comprises four stages, each containing two 3 × 3 convolution layers with stride 2. The RGB feature maps are extracted with this skeleton, so that the spatial resolution is reduced at each stage and the constructed backbone extracts the multi-scale features $x_i$ under a limited parameter budget.
1.2 designing the frequency domain enhancement module FEM to extract the features of the camouflaged object in the frequency domain;
The frequency domain enhancement module FEM extracts the features of the camouflaged object in the frequency domain. The offline discrete cosine transform converts $x_{rgb}$ into YCbCr space (represented by $x_{YCbCr}$), and $x_{YCbCr}$ is divided into a set of 8 × 8 regions of the frequency-domain color channels, $x^{i,j}_{YCbCr}$, each representing the region of a certain color channel. Each region is DCT-processed into a frequency spectrum $d^{i,j} \in \mathbb{R}^{8 \times 8 \times 3}$, where each value corresponds to the intensity of a certain frequency band. Following the regional rule, $\{d^{i,j}\}$ represents the collection of all region spectra; all components with the same frequency are gathered into one channel, and the spectra are reshaped into a new input $x_d \in \mathbb{R}^{\frac{H}{8} \times \frac{W}{8} \times 192}$.
Thus, the original color input is converted to the frequency domain. The online learning enhancement module OLE is used to obtain the camouflaged object features hidden in the frequency space. First, to raise the coefficients of the local frequency bands, the signal is down-sampled and divided into two parts, a low-frequency segment $x_l$ and a high-frequency segment $x_h$, where k denotes the down-sampled size. To enhance the signals in the corresponding frequency bands, we input them separately into two multi-head self-attention (MHSA) blocks and concatenate their outputs to restore the original shape. Then another MHSA reconciles all the different frequency bands, and the newly formed signal is represented as $x_d'$. MHSA is able to capture the rich correlations between the items of the input features; at this point, the different spectra of the image interact fully. For the DCT, the patches are independent of each other, and the above process only enhances each patch individually. To help the network identify the location of the camouflaged object, connections must be established between patches: we first reshape $x_d'$ into a feature map, then model the relationships among all patches using MHSA. Finally, we up-sample and obtain the enhanced frequency signal $x_{freq}$. Both $x_{rgb}$ and $x_{freq}$ are input to the network. Since only single-layer MHSA is applied everywhere and the spatial scale of the frequency signal is small, this does not bring a high computational cost.
2. Building the feature alignment module and the high-frequency channel selection module
Input: a frequency-domain image and a spatial-domain image;
Output: the camouflaged object loss;
2.1 constructing the feature alignment module FA and fusing the spatial-domain features and the frequency-domain features;
The feature alignment module FA fuses the spatial-domain features $X_i$ and the frequency-domain features $X_{freq2s}$. We design a binary base filter $f_{base}$ covering the high frequency band, and add three learnable filters $f_z$ for the Y, Cb and Cr color spaces. The filtering is the dot product between the frequency response and the combined filter $f_{base} + \sigma(f_z)$, where $\sigma(f) = 1/(1 + \exp(-f))$. For the input frequency-domain features $X_{fre}$, the network obtains the band-limited signals through

$$X^{z}_{freq} = X_{fre} \odot \left(f_{base} + \sigma(f_z)\right), \quad z = 1, 2, 3,$$

where $\odot$ is the element-level product. Finally, they are spliced together:

$$X_{freq} = \mathrm{concat}\left(X^{1}_{freq}, X^{2}_{freq}, X^{3}_{freq}\right).$$

Then we calculate the variation of the two signals from the spatial and frequency domains. We connect $X_i$ and $X_{freq}$ and feed them into a convolution layer with 4 output channels, whose output is T. We take the slices $T_1, T_2, T_3, T_4$ from the third dimension, reshape them to $HW \times n$, and proceed by:

$$T_1 = T_1 (T_2)^{\mathsf{T}}, \qquad T_2 = T_3 (T_4)^{\mathsf{T}}.$$

In this way, the feature maps can be aligned. The result is then multiplied by the transform and a learned vector to adjust the intensity of each channel, which defines the aligned feature field of each channel. Finally, we obtain a fused feature $X_{fuse}$ by adding the features of these two domains:

$$X_{fuse} = \hat{X}_i + \hat{X}_{freq},$$

where $\hat{X}_i$ and $\hat{X}_{freq}$ denote the aligned spatial-domain and frequency-domain features.
2.2 constructing the high-frequency channel selection module HOR for high-frequency feature screening;
Let $X \in \mathbb{R}^{C \times H \times W}$ denote the input features; we first reshape X to $C \times HW$. Since the frequency response comes from a local area, it is necessary to encode the positional importance of the original features to distinguish the camouflaged object from other objects, and a location attention weight W is computed. In addition, different network layers present potential information at different scales, the later layers having larger receptive fields, so the multi-scale representation is also enhanced with cross-layer semantics. Here $\psi(X)$ denotes the subsequent layer of X, and W thus serves as an attention weight to find the correlations between the RGB and frequency responses of different layers. The position weights then strengthen the original features, after which an adaptive gating operation selects the most useful features for different samples:

$$A = g(X) \odot (W \cdot X),$$

where $g(X)$ denotes the gating weights generated by the FC layer; the gating operation is generated based on spatial perception and forms position-aware features.
After obtaining the position-enhanced feature A, a channel-aware relation matrix $D \in \mathbb{R}^{C \times C}$ can be established by similar operations. Each tensor in the channel-aware relation has the same C dimension in both the semantic and frequency mappings, corresponding to the original feature channels and the spectrum. Finally, we apply this relation matrix to X to obtain the selection information beneficial to the camouflaged object, $X_{out} = D \cdot X$, and the feature $X_{out}$ is input into the decoding process.
3. Training the overall framework;
The training of the dual-branch deep convolutional neural network comprises a data preprocessing stage, a model framework training stage and a testing stage.
3.1, preprocessing the data;
The input camouflaged object image set is adjusted by operations such as scaling and flipping and then input into the camouflaged object network.
Input: a set of camouflaged object images.
Output: the data-enhanced camouflaged object image set.
Geometric enhancement: the generalization ability of the model can be enhanced by methods that change the image geometry, such as translation, rotation and shearing; a sketch of such a pipeline follows.
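As a concrete illustration, the data enhancement described above can be assembled with torchvision; the flip probabilities and the crop scale range are assumptions, since the text does not specify them.

```python
from torchvision import transforms

# Sketch of the preprocessing pipeline: random horizontal/vertical flipping,
# random cropping, and resizing to the 352 x 352 network input. The scale
# range (0.6, 1.0) and square ratio are assumptions; in segmentation, the
# same geometric transform must also be applied to the ground-truth mask.
train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
    transforms.RandomResizedCrop(352, scale=(0.6, 1.0), ratio=(1.0, 1.0)),
    transforms.ToTensor(),
])
```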
3.2 model framework training
Input: the data-enhanced camouflaged object image set
Output: the segmentation results of the camouflaged object image set
For the input camouflaged object image, the network is trained by means of the weighted BCE loss $L_{bce}$ and the weighted IoU loss $L_{iou}$ to obtain a robust camouflaged object detection network; the supervision function $L_i$ is defined as:

$$L_i = L_{bce}(P_i, M) + L_{iou}(P_i, M),$$

where M is the ground-truth label and i is the i-th stage of the network. Finally, the overall loss function is:

$$L_{overall} = \sum_{i} L_i.$$
during training, a small batch Stochastic Gradient Descent (SGD) optimization algorithm with a batchsize of 32, a momentum of 0.9, and a weight decay of 1e-5 may be used. The learning rate is set to 1e-4 and the maximum epoch is set to 100. The training image is adjusted to 352 x 352 as input to the entire network.
3.3 testing the model framework;
Input: a set of camouflaged object images;
Output: the corresponding camouflaged object segmentation images;
Step 1, the input camouflaged object image set is fed into the convolutional neural network, and an initial camouflage segmentation result is obtained with the trained network model;
Step 2, the features obtained in step 1 are refined with details, finally yielding a detail-rich final camouflage segmentation result.
In the present invention, as shown in fig. 2, the first column is the input camouflaged object image, the second column is the ground-truth segmentation result, and the third column is the segmentation result of the network of the present invention.
In a specific implementation, the present application provides a computer storage medium and a corresponding data processing unit, where the computer storage medium is capable of storing a computer program, and the computer program, when executed by the data processing unit, may execute the inventive content of the method for detecting a disguised object based on frequency domain enhancement and some or all of the steps in each embodiment of the method for detecting a disguised object based on frequency domain enhancement provided by the present invention. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a Random Access Memory (RAM), or the like.
It is clear to those skilled in the art that the technical solutions in the embodiments of the present invention can be implemented by means of a computer program and its corresponding general-purpose hardware platform. Based on such understanding, the technical solutions in the embodiments of the present invention, or the portions thereof that contribute to the prior art, may be embodied in the form of a computer program, that is, a software product, which may be stored in a storage medium and include several instructions for enabling a device including a data processing unit (which may be a personal computer, a server, a single-chip microcomputer, an MCU, or a network device, etc.) to execute the method according to each embodiment or some portions of the embodiments of the present invention.
The present invention provides a camouflaged object detection method based on frequency domain enhancement and the concept thereof; there are numerous methods and ways to implement the technical scheme, and the above description is only a preferred embodiment of the present invention. It should be noted that, for a person skilled in the art, a number of improvements and embellishments can be made without departing from the principle of the present invention, and these improvements and embellishments should also be regarded as falling within the protection scope of the present invention. All components not specified in the present embodiment can be realized by the prior art.

Claims (10)

1. A camouflaged object detection method based on frequency domain enhancement is characterized by comprising the following steps:
step 1, constructing a camouflage object network based on a frequency domain enhancement module; the disguised object network comprises: the device comprises a camouflage object network framework, a frequency domain enhancement module, a characteristic alignment module and a high-frequency channel selection module;
wherein the frequency domain enhancement module is used for extracting the features of the camouflaged object in the frequency domain, the feature alignment module is used for fusing spatial-domain and frequency-domain features, and the high-frequency channel selection module is used for screening high-frequency features;
step 2, training the camouflage object network, which comprises the following steps: inputting a camouflage object training image set subjected to data preprocessing into a camouflage object network, and optimizing the camouflage object network by using a frequency domain enhancement module and a supervision loss to obtain a trained camouflage object network;
step 3, testing by adopting the trained camouflaged object network, comprising the following steps: inputting the camouflaged object image to be detected into the trained network to obtain the corresponding camouflaged object segmentation image, and completing the frequency-domain-enhancement-based camouflaged object detection.
2. The method for detecting a camouflaged object based on frequency domain enhancement as claimed in claim 1, wherein step 1 comprises the following steps:
step 1-1, constructing a camouflage object network framework to extract RGB characteristics;
step 1-2, designing a frequency domain enhancement module FEM to extract the characteristics of the camouflage object in a frequency domain;
step 1-3, constructing a feature alignment module FA to fuse the spatial-domain features and the frequency-domain features;
step 1-4, constructing a high-frequency channel selection module HOR to perform high-frequency feature screening.
3. The method for detecting the camouflaged object based on frequency domain enhancement as claimed in claim 2, wherein in step 1-1 the camouflaged object network skeleton comprises four stages, each stage being two 3 × 3 convolution layers with stride 2; the skeleton extracts from the RGB image the corresponding feature maps $x_i \in \mathbb{R}^{H_i \times W_i \times C_i}$, $i = 1, \dots, 4$, where H represents the height of the image, W the width of the image, and $H_i \times W_i$ the overall resolution of the i-th feature map.
4. The method for detecting a camouflaged object based on frequency domain enhancement as claimed in claim 3, wherein the frequency domain enhancement module in step 1-2 comprises an offline discrete cosine transform and an online learning enhancement module OLE; frequency domain information is obtained from the RGB image through the offline discrete cosine transform, and the online learning enhancement module OLE is used for obtaining the features of the camouflaged object hidden in the frequency space, namely the frequency domain features.
5. The method for detecting a camouflaged object based on frequency domain enhancement as claimed in claim 4, wherein in step 1-2 the offline discrete cosine transform converts the feature map $x_{rgb}$ into YCbCr space, in which the feature map is represented as $x_{YCbCr} \in \mathbb{R}^{H \times W \times 3}$; subsequently, $x_{YCbCr}$ is divided into images of size 8 × 8, $x^{i,j}_{YCbCr} \in \mathbb{R}^{8 \times 8 \times 3}$, each representing one region, with i, j the coordinates of the region; each region is then processed by the discrete cosine transform (DCT) into a frequency spectrum $d^{i,j} \in \mathbb{R}^{8 \times 8 \times 3}$, where each value corresponds to the intensity of a certain frequency band; the above process is represented by the following formula:

$$x_d = \mathrm{flatten}\left(\left\{ \mathrm{DCT}\left(x^{i,j}_{YCbCr}\right) \right\}\right) \in \mathbb{R}^{\frac{H}{8} \times \frac{W}{8} \times 192},$$

in which $\{\cdot\}$ represents the collection of the spectra of all regions, and flatten is the collection method that gathers all regions with the same frequency into one channel, so that a feature map is obtained again.
6. The method for detecting a camouflaged object based on frequency domain enhancement as claimed in claim 5, wherein in step 1-2 the online learning enhancement module OLE is used to obtain the features of the camouflaged object hidden in the frequency space, specifically comprising:
first, the signal $x_d$ is down-sampled and divided into two parts, the first 96 channels forming the low-frequency segment $x_l \in \mathbb{R}^{k \times k \times 96}$ and the last 96 channels forming the high-frequency segment $x_h \in \mathbb{R}^{k \times k \times 96}$, where k represents the down-sampled size; the low segment and the high segment are input into two multi-head self-attention (MHSA) blocks respectively, and the outputs are concatenated to restore the original shape;
then another multi-head self-attention MHSA reconciles all the different frequency bands, and the newly formed signal is represented as $x_d'$; multi-head self-attention captures the rich dependencies between the patches of the input representation $x_d'$; the specific method is as follows: first, $x_d'$ is reshaped into a feature map, and the relationships among all patches are then modeled with MHSA;
finally, the result is up-sampled to obtain the enhanced frequency signal $x_{freq}$.
7. The method for detecting a camouflaged object based on frequency domain enhancement as claimed in claim 6, wherein in step 1-3 the method for fusing the spatial-domain features and the frequency-domain features comprises:
the feature alignment module FA fuses the spatial-domain features $X_i$ and the frequency-domain features $X_{freq2s}$; a binary base filter $f_{base}$ covering the high frequency band is designed, and three different learnable filters $f_z$, with z the index of the learnable filter, are added for the Y, Cb and Cr color spaces; the filtering is the dot product between the frequency response and the combined filter $f_{base} + \sigma(f_z)$, where the sigma function is

$$\sigma(f) = \frac{1}{1 + \exp(-f)},$$

with exp the exponential function; for the input frequency-domain features $X_{fre}$, three signals of different frequency bands are obtained through the following formula:

$$X^{z}_{freq} = X_{fre} \odot \left(f_{base} + \sigma(f_z)\right), \quad z = 1, 2, 3,$$

where $\odot$ is the element-level product; the information of the three different frequency bands obtained with the 3 different filters is spliced together to obtain the frequency domain output:

$$X_{freq} = \mathrm{concat}\left(X^{1}_{freq}, X^{2}_{freq}, X^{3}_{freq}\right);$$

the spatial domain information and the frequency domain information are then spliced; the specific method is as follows: $X_i$ and $X_{freq}$ are connected and input into a convolution layer with 4 output channels, whose output is T; the slices $T_1, T_2, T_3, T_4$ are taken from the third dimension and reshaped to $HW \times n$, and the alignment features are mapped by:

$$T_1 = T_1 (T_2)^{\mathsf{T}}, \qquad T_2 = T_3 (T_4)^{\mathsf{T}};$$

the result is then multiplied by the transform and a learned vector to adjust the intensity of each channel, which defines the aligned feature field of each channel; finally, a fused feature $X_{fuse}$ is obtained by adding the features of the two domains:

$$X_{fuse} = \hat{X}_i + \hat{X}_{freq}.$$
8. The method for detecting a camouflaged object based on frequency domain enhancement as claimed in claim 7, wherein in step 1-4 the method for high-frequency feature screening comprises:
let $X \in \mathbb{R}^{C \times H \times W}$ represent the input features, with C the channel dimension; X is first reshaped to $C \times HW$, and the location attention weight W is computed, where $\psi(X)$ represents a subsequent layer of the input feature X and W serves as an attention weight to find the correlations between the RGB and frequency responses of different layers; the position weights strengthen the original features, and the most useful features for different samples are then selected through an adaptive gating operation:

$$A = g(X) \odot (W \cdot X),$$

where $g(X)$ represents the gating weights generated by the FC layer; the gating operation is generated based on spatial perception and forms position-aware features;
after obtaining the position-enhanced feature A, a channel-aware relation matrix $D \in \mathbb{R}^{C \times C}$ is established through similar operations, where C represents the channel dimension of the position-aware features; finally, the relation matrix D is applied to X to obtain the selection information beneficial to the camouflaged object:

$$X_{out} = D \cdot X;$$

the feature $X_{out}$ is then input into the decoding process.
9. The method for detecting a camouflaged object based on frequency domain enhancement as claimed in claim 8, wherein the method for training the camouflaged object network in step 2 comprises the following steps:
step 2-1, data preprocessing: data enhancement in the form of random flipping and random cropping is applied to the camouflaged object training set before it is input into the camouflaged object network, specifically:
step 2-1-1, random flipping: flipping the image in either the horizontal or vertical direction;
step 2-1-2, random cropping: cropping a region whose size is a random proportion of the picture while keeping the aspect ratio unchanged;
step 2-2, training: the data-enhanced images are input into the camouflaged object network, which is optimized through the loss function so that it generates complete and accurate camouflaged object segmentation maps; training is repeated for a set number of epochs, and the final network model parameters are saved.
10. The method according to claim 9, wherein in step 2-2, for the input camouflaged object image, the camouflaged object network based on the frequency domain enhancement module is trained by means of the weighted BCE loss $L_{bce}$ and the weighted IoU loss $L_{iou}$; the supervision function $L_k$ is defined as:

$$L_k = L_{bce}(P_k, M) + L_{iou}(P_k, M),$$

where M is the ground-truth label, k is the k-th stage of the network, and $P_k$ represents the prediction result of the k-th stage; finally, the overall loss function $L_{overall}$ is:

$$L_{overall} = \sum_{k} L_k.$$
CN202211226059.3A 2022-10-09 2022-10-09 Disguised object detection method based on frequency domain enhancement Pending CN115471675A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211226059.3A CN115471675A (en) 2022-10-09 2022-10-09 Disguised object detection method based on frequency domain enhancement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211226059.3A CN115471675A (en) 2022-10-09 2022-10-09 Disguised object detection method based on frequency domain enhancement

Publications (1)

Publication Number Publication Date
CN115471675A true CN115471675A (en) 2022-12-13

Family

ID=84337310

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211226059.3A Pending CN115471675A (en) 2022-10-09 2022-10-09 Disguised object detection method based on frequency domain enhancement

Country Status (1)

Country Link
CN (1) CN115471675A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116664990A (en) * 2023-08-01 2023-08-29 苏州浪潮智能科技有限公司 Camouflage target detection method, model training method, device, equipment and medium
CN116664990B (en) * 2023-08-01 2023-11-14 苏州浪潮智能科技有限公司 Camouflage target detection method, model training method, device, equipment and medium
CN117828536A (en) * 2024-03-04 2024-04-05 粤港澳大湾区数字经济研究院(福田) Prediction method, model, terminal and medium for node interaction
CN117828536B (en) * 2024-03-04 2024-06-11 粤港澳大湾区数字经济研究院(福田) Prediction method, model, terminal and medium for node interaction


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination