CN116883681B - Domain generalization target detection method based on a generative adversarial network - Google Patents

Domain generalization target detection method based on a generative adversarial network

Info

Publication number
CN116883681B
CN116883681B
Authority
CN
China
Prior art keywords
domain
network
target
fpn
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310999356.XA
Other languages
Chinese (zh)
Other versions
CN116883681A (en)
Inventor
张弘
周炫锋
杨一帆
李亚伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN202310999356.XA priority Critical patent/CN116883681B/en
Publication of CN116883681A publication Critical patent/CN116883681A/en
Application granted granted Critical
Publication of CN116883681B publication Critical patent/CN116883681B/en

Classifications

    • G06V 10/40: Extraction of image or video features
    • G06N 3/0464: Convolutional networks [CNN, ConvNet]
    • G06N 3/048: Activation functions
    • G06N 3/094: Adversarial learning
    • G06V 10/761: Proximity, similarity or dissimilarity measures
    • G06V 10/806: Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
    • G06V 10/82: Image or video recognition or understanding using neural networks
    • G06V 2201/07: Target detection (indexing scheme)

Abstract

The invention discloses a domain generalization target detection method based on a generative adversarial network. First, a feature extraction network is constructed to extract features from an input image; a domain excitation attention module is constructed to further process the features extracted by the feature extraction network, improving their domain generalization capability; a feature pyramid network FPN is constructed to perform multi-scale fusion of the extracted features; a generative adversarial network regularization module is constructed to align the features extracted by the FPN with a standard Gaussian distribution, avoiding overfitting of the FPN; a detection head network is constructed to predict the position, category and center position of detection targets; and a target center alignment module is constructed to perform adversarial training on the FPN features, further improving their domain generalization capability. The network structure adopted by the invention is reasonably designed, can overcome problems such as the weak generalization capability of features extracted by existing target detection methods, and enhances the robustness of target detection.

Description

Domain generalization target detection method based on a generative adversarial network
Technical Field
The invention relates to the field of pattern recognition, in particular to a domain generalization target detection method based on a generative adversarial network.
Background
Object detection, the task of locating specific objects in an image, is a fundamental problem in computer vision. In recent years, driven by the development of deep convolutional neural networks (CNNs), the performance of CNN-based target detection methods has improved remarkably.
Target detection studies can be categorized into anchor-based and anchor-free detectors. An anchor-based detector generates target proposals by means of a set of anchors and formulates target detection as a series of classification tasks over the proposals. Faster R-CNN is a pioneering anchor-based detector in which a Region Proposal Network (RPN) is used for proposal generation. Due to its effectiveness, the RPN is widely used in many anchor-based detectors. Anchor-free detectors skip proposal generation and locate objects directly based on a fully convolutional network (FCN). Recently, anchor-free methods have utilized key points, i.e., the center or corners of the box, for localization, and achieve performance comparable to anchor-based methods. However, these methods require complex post-processing to group the detected points. To avoid such processing, FCOS proposes pixel-by-pixel prediction, which directly predicts the class and offsets of the object corresponding to each position on the feature map. In this work, the properties of the anchor-free method are exploited to identify the discriminative regions for the alignment process.
However, in many other scenarios, there is a distribution shift between the data used to train the CNN and the data on which target detection is actually performed; the former is referred to as the source domain and the latter as the target domain. No data from the target domain may be available during CNN training, yet an accurate model is still required for the "unseen" target domain. Each type of background or viewpoint may be considered a domain herein. Due to the distribution shift between the training source domains and the unknown test target domain, detectors trained on reference datasets do not always achieve satisfactory detection results when applied to new scenarios. To overcome the influence of the distribution shift, domain adaptation (DA) and domain generalization (DG) methods have been proposed to improve performance in the target domain. DA methods require target data to train a new model when facing a new target scene, so their performance depends largely on the distribution of the target domain. Furthermore, DA methods are based on the assumption that target domain samples can be obtained in large quantities, which is impractical in some cases. DG methods, on the other hand, can be implemented more conveniently in practice by learning a domain-invariant model without target domain samples. The basic idea of the DG approach is to combine source data in some way to generate a model that is invariant to specific target data, so that the model performs satisfactorily on different target scenarios. However, existing DG approaches degrade when the difference between the source and target domains is large, because a model trained on the source domains may not represent samples from the target domain scene well.
In previous work, the domain generalization problem was mainly addressed in two ways. On the one hand, some methods aggregate information from the source domains to learn a domain-invariant representation; in particular, a domain-invariant transformation is learned by minimizing the distance between domains, i.e., the detector is learned by simply putting together all training data from the different domains. On the other hand, some works train the detector or adjust its weights with all the information from the source domains. However, these methods degrade when the difference between the source scenes and the target domain scene is large. In the present invention, a domain excitation attention block is used to weight input features according to their domain-specific weights. The proposed method is similar to the first line of work but essentially different: the present invention attempts to relate the target domain to the source domains by applying different weights to domain-specific features, and ultimately outputs an adaptive representation applicable to models trained on the source domains.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a domain generalization target detection method based on a generative adversarial network. First, a feature extraction network is constructed to extract input image features; a domain excitation attention module is constructed to further process the features extracted by the feature extraction network, improving their domain generalization capability; a feature pyramid network FPN is constructed to perform multi-scale fusion of the extracted features; a generative adversarial network regularization module is constructed to align the features extracted by the FPN with a standard Gaussian distribution, avoiding overfitting of the FPN; a detection head network is constructed to predict the position, category and center position of detection targets; and a target center alignment module is constructed to perform adversarial training on the FPN features, further improving their domain generalization capability. The method can overcome problems such as the weak generalization capability of features extracted by existing target detection methods, and enhances the robustness of target detection.
In order to achieve the above purpose, the invention adopts the following technical scheme:
a domain generalization target detection method based on an countermeasure generation network comprises the following steps:
step (1) giving annotated images I from K different source domains l L.epsilon. {1, … K }, their true labels y l L epsilon {1, … K } and target domain image I without labels T The goal is to predict annotation y on the target domain image T
Step (2): constructing a domain excitation attention module, which takes the feature map output by the backbone network as input and outputs an enhanced feature map;
Step (3): constructing a backbone network as a feature extractor, which takes an image as input and outputs the extracted features; the domain excitation attention module is inserted before each pooling operation of the backbone network to realize domain enhancement of the features;
Step (4): constructing a generative adversarial network regularization module, which regularizes the features by aligning the input features with a standard Gaussian distribution, thereby avoiding network overfitting;
Step (5): constructing an FPN and fusing features of different scales extracted by the backbone network to realize multi-scale domain feature alignment; a generative adversarial network regularization module is inserted after each scale output of the FPN to improve the generalization of the features; here FPN denotes a feature pyramid network;
Step (6): constructing a detection head network to predict the position, category and center position of detection targets;
Step (7): constructing a target center alignment module, which takes the classification output of the classification head and the target center position as input, integrates them into a domain attention region for focusing on the features output by the FPN, and performs domain adversarial learning on the attention-region features output by the FPN to further improve the domain generalization capability of the features extracted by the FPN.
Further, in step (2), the domain excitation attention module comprises three operations: compression, excitation and classification. The compression operation compresses the input feature map from W×H×C to 1×1×C by a global average pooling operation, where W, H and C denote the width, height and channel dimension of the input feature map, respectively;
the excitation operation passes the 1×1×C feature map output by the compression operation through a fully connected layer and a ReLU (rectified linear unit) activation to generate an intermediate feature F_E; F_E is then passed through another fully connected layer and a ReLU activation to generate a 1×1×C weight map, which is multiplied channel-wise with the W×H×C feature input of the domain excitation attention module;
the classification operation takes the intermediate feature F_E generated by the excitation operation as input and outputs the domain category through a fully connected layer and a SoftMax activation function.
Further, in step (3), the backbone network is the residual neural network ResNet-101.
Further, in step (4), the generative adversarial network regularization module consists of global average pooling, a discrimination network and a standard Gaussian distribution. The global average pooling compresses the input feature map from W×H×C to 1×1×C; the discrimination network is composed of two fully convolutional layers with a Sigmoid (logistic) activation function, takes the globally average-pooled features as input, and judges whether the input features come from the input of the regularization module or are sampled from a standard Gaussian distribution; training the discrimination network improves its discrimination accuracy while driving the features extracted by the FPN toward a standard normal distribution, improving the generalization capability of the features.
Further, in step (5), the FPN uses 5 feature maps of different scales, denoted F_i, i ∈ {3, 4, …, 7}; feature map F_3 corresponds to the smallest-scale targets and feature map F_7 corresponds to the largest-scale targets.
Further, in step (6), the FCOS detection head is used to predict the location, classification and center location of the target.
Further, in step (7), a region map F_obj indicating where objects exist is estimated from the class output map of the detection head network:

F_obj = max_c σ(F_cls)

where F_cls denotes the network class output, σ denotes the Sigmoid activation function, and max_c takes the highest response value across categories at each position as output;
the foreground position estimation map F_CA is further calculated by combining the object center position map: F_obj is combined by element-wise multiplication (⊗) with the object center position map F_ctr output by the network, under a scaling factor β ranging from 0 to 1;
the foreground position estimation map F_CA serves as a region-of-interest estimation map for the FPN output features; F_CA is multiplied channel-wise with the corresponding FPN features to obtain a weighted feature map F_W; after passing through the gradient reversal module GRL, F_W is fed into the domain discrimination network, which outputs the domain category;
the gradient inversion module is composed of a gradient inversion layer R (x), which is defined by the following formula:
R(x)=x
wherein x represents any input feature, and I represents an identity matrix;
the domain discrimination network consists of two layers of convolution layers with the convolution kernel size of 3, the step length of 1, the same input and output dimensions, the convolution layer with the activation function of ReLU and one layer of convolution kernel of 1, the step length of 1, the output dimension of 2 and the convolution layer with the activation function of softMax;
the domain discrimination accuracy is improved by training the discrimination network, meanwhile, the FPN is promoted to pay attention to the domain invariance of the region of interest in the characteristics, and the generalization capability of the FPN for extracting the characteristics is improved.
Compared with the prior art, the invention has the following beneficial effects: the adopted network structure is reasonably designed, extracts features with strong generalization capability, and weakens the influence of domain distribution shift on detection results. Specifically:
(1) The invention provides a domain excitation attention module, which obtains the importance of different channel feature maps in the backbone network for samples from different domains by constructing a new excitation neural network; when a new sample is input into the network, the excitation neural network estimates the similarity of the current sample to each source domain, strengthens the feature channels corresponding to source domains similar to the sample, and suppresses the feature channels corresponding to source domains irrelevant to the sample.
(2) The invention provides a method for regularizing network-extracted features with a generative adversarial network, which drives the extracted features toward a normal distribution, thereby avoiding the network overfitting to the source domains and improving the domain invariance of the network.
(3) The invention provides a target center domain alignment method, which drives the network to enhance the domain generalization of image foreground features, so as to weaken the influence of image-background idiosyncrasies on the network's domain generalization capability.
Drawings
FIG. 1 is a general flow chart of the domain generalization target detection method based on a generative adversarial network of the present invention;
FIG. 2 is a detailed block diagram of the domain excitation attention module;
FIG. 3 is a detailed block diagram of the generative adversarial network regularization module;
FIG. 4 is a graph of the effect of target detection by the method of the present invention.
Detailed Description
The invention is described in detail below with reference to the drawings and the detailed description.
As shown in FIG. 1, the domain generalization target detection method based on a generative adversarial network of the present invention comprises the following steps:
step (1) giving annotated images I from K different source domains l L.epsilon. {1, … K }, their true notation is y l L epsilon {1, … K } and target domain image I without labels T The goal is to predict target domain image I T Marking y on T
Step (2): a domain excitation attention module is constructed, which takes the feature map output by the backbone network as input and outputs an enhanced feature map.
Step (3): a backbone network is constructed as a feature extractor, which takes the image as input and outputs the extracted features. Domain enhancement of the features is achieved by inserting a domain excitation attention module before each pooling operation of the backbone network.
Step (4): a generative adversarial network regularization module is constructed, which regularizes the features by aligning the input features with a standard Gaussian distribution, thereby avoiding network overfitting.
Step (5): a feature pyramid network (FPN) is constructed, and features of different scales extracted by the backbone network are fused to realize multi-scale domain feature alignment. A generative adversarial network regularization module is inserted after each scale output of the FPN to improve the generalization of the features.
Step (6): a detection head network is constructed to predict the position, category and center position of detection targets.
Step (7): a target center alignment module is constructed, which takes the classification output of the classification head and the target center position as input, integrates them into a domain attention region for focusing on the features output by the FPN, and performs domain adversarial learning on the attention-region features output by the FPN to further improve the domain generalization capability of the features extracted by the FPN.
Further, in step (2), as shown in FIG. 2, the domain excitation attention module comprises three operations: compression, excitation and classification. The compression operation compresses the input feature map from W×H×C to 1×1×C by a global average pooling operation, where W, H and C denote the width, height and channel dimension of the input feature map, respectively.
The excitation operation passes the 1×1×C feature map output by the compression operation through a fully connected layer and a ReLU (rectified linear unit) activation to generate an intermediate feature F_E; F_E is then passed through another fully connected layer and a ReLU activation to generate a 1×1×C weight map, which is multiplied channel-wise with the W×H×C feature input of the domain excitation attention module.
The classification operation takes the intermediate feature F_E generated by the excitation operation as input and outputs the domain category through a fully connected layer and a SoftMax activation function.
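As a concrete illustration, the following is a minimal PyTorch sketch of such a compression/excitation/classification module. The class name, reduction ratio and layer widths are assumptions made for illustration, not the patented implementation; note that, following the description above, the second excitation activation is ReLU rather than the Sigmoid of a standard squeeze-and-excitation block.

    import torch
    import torch.nn as nn

    class DomainExcitationAttention(nn.Module):
        def __init__(self, channels: int, num_domains: int, reduction: int = 16):
            super().__init__()
            # Compression: global average pooling squeezes W x H x C to 1 x 1 x C.
            self.squeeze = nn.AdaptiveAvgPool2d(1)
            # Excitation: FC + ReLU produces the intermediate feature F_E,
            # then FC + ReLU produces per-channel weights (ReLU per the text,
            # not the Sigmoid of a standard SE block).
            self.fc1 = nn.Linear(channels, channels // reduction)
            self.fc2 = nn.Linear(channels // reduction, channels)
            self.relu = nn.ReLU(inplace=True)
            # Classification: F_E -> FC; SoftMax is applied by the loss (L_atten).
            self.domain_head = nn.Linear(channels // reduction, num_domains)

        def forward(self, x: torch.Tensor):
            b, c, _, _ = x.shape
            f_e = self.relu(self.fc1(self.squeeze(x).view(b, c)))  # intermediate feature F_E
            weights = self.relu(self.fc2(f_e)).view(b, c, 1, 1)    # channel weights
            domain_logits = self.domain_head(f_e)                  # domain category logits
            return x * weights, domain_logits                      # channel-wise re-weighting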
In step (3), the backbone network is the residual neural network ResNet-101.
In step (4), as shown in FIG. 3, the generative adversarial network regularization module consists of global average pooling, a discrimination network and a standard Gaussian distribution. The global average pooling compresses the input feature map from W×H×C to 1×1×C. The discrimination network is composed of two fully convolutional layers with a Sigmoid (logistic) activation function; it takes the globally average-pooled features as input and judges whether the input features come from the input of the module or are sampled from a standard Gaussian distribution. Training the discrimination network improves its discrimination accuracy while driving the features extracted by the FPN toward a standard normal distribution, improving the generalization capability of the features.
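A sketch of this regularization module, under stated assumptions: the pooled FPN features are played off against samples drawn from a standard Gaussian through a small fully convolutional discriminator with Sigmoid activations. The hidden width, class name and the use of 1×1 convolutions are assumptions for illustration.

    import torch
    import torch.nn as nn

    class AdvRegularizer(nn.Module):
        def __init__(self, channels: int):
            super().__init__()
            self.pool = nn.AdaptiveAvgPool2d(1)  # W x H x C -> 1 x 1 x C
            # Two 1x1 convolutions with Sigmoid activations act as the
            # fully convolutional discrimination network D.
            self.disc = nn.Sequential(
                nn.Conv2d(channels, channels, kernel_size=1), nn.Sigmoid(),
                nn.Conv2d(channels, 1, kernel_size=1), nn.Sigmoid(),
            )

        def forward(self, feat: torch.Tensor) -> torch.Tensor:
            fake = self.pool(feat)            # pooled FPN features G(x)
            real = torch.randn_like(fake)     # samples h ~ q(h), standard Gaussian
            d_real, d_fake = self.disc(real), self.disc(fake)
            # Value of L_regular = E_{h~q(h)} log D(h) + E_{x~p(x)} log(1 - D(G(x))).
            # The discriminator is trained to maximize it, while the feature
            # extractor is driven to minimize it (e.g., via alternating updates).
            return (torch.log(d_real + 1e-8) + torch.log(1 - d_fake + 1e-8)).mean()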
In step (5), the FPN uses 5 feature maps of different scales, denoted F_i, i ∈ {3, 4, …, 7}. Feature map F_3 corresponds to the smallest-scale targets and feature map F_7 corresponds to the largest-scale targets.
In step (6), the FCOS detection head is used to predict the target position, classification and target center position.
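For illustration, a minimal PyTorch sketch of an FCOS-style head with per-position classification, box regression (left/top/right/bottom distances) and center-ness outputs follows; the tower depth, channel widths and which tower carries the center-ness branch are assumptions, not the patented configuration.

    import torch
    import torch.nn as nn

    class FCOSHead(nn.Module):
        def __init__(self, channels: int, num_classes: int):
            super().__init__()
            def tower():
                return nn.Sequential(
                    nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
                    nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
                )
            self.cls_tower, self.box_tower = tower(), tower()
            self.cls_head = nn.Conv2d(channels, num_classes, 3, padding=1)  # class map F_cls
            self.box_head = nn.Conv2d(channels, 4, 3, padding=1)            # (l, t, r, b) offsets
            self.ctr_head = nn.Conv2d(channels, 1, 3, padding=1)            # center map F_ctr

        def forward(self, feat: torch.Tensor):
            c, b = self.cls_tower(feat), self.box_tower(feat)
            return self.cls_head(c), self.box_head(b), self.ctr_head(b)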
In step (7), a region map F_obj indicating where objects exist can be estimated from the class output map of the detection head:

F_obj = max_c σ(F_cls)

where F_cls denotes the network class output, σ denotes the Sigmoid activation function, and max_c takes the highest response value across categories at each position as output.
The foreground position estimation map F_CA can be further calculated by combining the object center position map: F_obj is combined by element-wise multiplication (⊗) with the object center position map F_ctr output by the network, under a scaling factor β ranging from 0 to 1.
The foreground position estimation map F_CA serves as a region-of-interest estimation map for the FPN output features; F_CA is multiplied channel-wise with the corresponding FPN features to obtain a weighted feature map F_W. After passing through the gradient reversal module GRL, F_W is fed into the domain discrimination network, which outputs the domain category.
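A sketch of this weighting under stated assumptions: only the max-over-categories Sigmoid response and the element-wise combination of F_obj with the center map under the factor β are taken from the description above; the exact fusion formula and the function name are hypothetical.

    import torch

    def center_aligned_weighting(f_cls: torch.Tensor,  # (B, num_classes, H, W) class output map
                                 f_ctr: torch.Tensor,  # (B, 1, H, W) object center position map
                                 feat: torch.Tensor,   # (B, C, H, W) FPN feature map
                                 beta: float = 0.5) -> torch.Tensor:
        # F_obj: highest Sigmoid response across categories at each position.
        f_obj = torch.sigmoid(f_cls).max(dim=1, keepdim=True).values
        # F_CA: foreground estimate combining F_obj with the center map;
        # this particular fusion form is an assumption, not from the patent.
        f_ca = f_obj * (beta * torch.sigmoid(f_ctr) + (1.0 - beta))
        # F_W: channel-wise re-weighting of the FPN features by F_CA.
        return feat * f_ca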
The gradient reversal module consists of a gradient reversal layer R(x), defined as:

R(x) = x (forward pass)
dR(x)/dx = -I (backward pass)

where x denotes any input feature and I denotes the identity matrix.
The domain discrimination network consists of two convolution layers with kernel size 3, stride 1, identical input and output dimensions and ReLU activation, followed by one convolution layer with kernel size 1, stride 1, output dimension 2 and SoftMax activation; training the discrimination network improves the domain discrimination accuracy while encouraging the FPN to preserve the domain invariance of the region of interest in the features, improving the generalization capability of the features extracted by the FPN.
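The following is a minimal sketch of the gradient reversal layer (identity on the forward pass, negated gradient on the backward pass) together with a domain discrimination network matching the layer description above; the padding choice (to keep spatial dimensions unchanged) and all names are assumptions.

    import torch
    import torch.nn as nn

    class GradReverse(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x):
            return x                    # R(x) = x
        @staticmethod
        def backward(ctx, grad_output):
            return -grad_output         # dR/dx = -I: the gradient is negated

    class DomainDiscriminator(nn.Module):
        def __init__(self, channels: int):
            super().__init__()
            self.net = nn.Sequential(
                # Two 3x3, stride-1 convolutions with unchanged channel width and ReLU.
                nn.Conv2d(channels, channels, 3, stride=1, padding=1), nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, 3, stride=1, padding=1), nn.ReLU(inplace=True),
                # One 1x1, stride-1 convolution with output dimension 2, then SoftMax.
                nn.Conv2d(channels, 2, 1, stride=1),
                nn.Softmax(dim=1),
            )

        def forward(self, f_w: torch.Tensor) -> torch.Tensor:
            # F_W passes through the GRL before domain classification.
            return self.net(GradReverse.apply(f_w))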
In combination with the above steps, the specific formulas of the present invention include:
(1) To train the parameters of the domain excitation attention module in step (2), a domain excitation attention loss function L_atten is proposed:

L_atten = -(1/N) · Σ_i Σ_d y_id · log(p_id)

where N denotes the number of input samples, M denotes the number of source-domain categories, y_id denotes the ground-truth domain label of sample i for domain d, and p_id denotes the domain category output by the excitation neural network.
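As a sketch, the loss above is the standard multi-class cross-entropy and can be computed as follows; the assumption here is that p_id is produced by applying SoftMax to the domain head logits of the excitation network.

    import torch
    import torch.nn.functional as F

    def attention_loss(domain_logits: torch.Tensor,  # (N, M) domain head outputs, pre-SoftMax
                       domain_labels: torch.Tensor   # (N,) ground-truth source-domain indices
                       ) -> torch.Tensor:
        # L_atten = -(1/N) * sum_i sum_d y_id * log(p_id); cross_entropy applies
        # log-SoftMax internally, matching the SoftMax classification head.
        return F.cross_entropy(domain_logits, domain_labels)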
(2) To train the parameters of the generative adversarial network regularization module in step (4), a distribution regularization loss function L_regular is proposed:

L_regular = E_{h~q(h)} log D(h) + E_{x~p(x)} log(1 - D(G(x)))

where q(h) is the standard normal distribution, p(x) is the distribution of the input data, D denotes the discrimination network, G denotes the feature extraction network, E_{h~q(h)} denotes the expectation over the distribution q(h), and E_{x~p(x)} denotes the expectation over the distribution p(x).
(3) To train the parameters of the target center domain alignment module in step (7), a loss function L_CA of the target center alignment module is proposed:

L_CA = -Σ_(u,v) [ d · log D_CA((F_CA^s ⊗ F_s)_(u,v)) + (1 - d) · log(1 - D_CA((F_CA^t ⊗ F_t)_(u,v))) ]

where d denotes the domain label and D_CA is the domain discriminator; F_CA^s and F_CA^t denote the foreground position estimation maps corresponding to the source and target domains, respectively; F_s and F_t denote the network feature maps corresponding to the source and target domains, respectively; ⊗ denotes element-wise multiplication; and for any feature map A, A_(u,v) denotes the feature at position (u, v) of the feature map.
(4) The overall network loss function L is:

L = L_det + α·L_atten + β·L_regular + γ·L_CA

where α, β and γ are weights balancing the losses, and L_det is the detection loss function:

L_det = L_cls + L_reg + L_ctr

where L_cls denotes the classification head loss, L_reg denotes the regression head loss, and L_ctr denotes the center position loss.
FIG. 4 shows the results of testing the detector of the present invention after training it on source domain datasets. The method improves the generalization of the features extracted during detection, and good detection results can be obtained even for target domain images that do not appear during training.
It is emphasized that the above embodiments are merely preferred embodiments of the present invention and do not limit it in any way; any simple modification, equivalent variation or refinement made to the above embodiments according to the technical substance of the present invention still falls within the scope of the technical solution of the present invention.

Claims (6)

1. A domain generalization target detection method based on a generative adversarial network, characterized by comprising the following steps:
step (1): given annotated images I_l, l ∈ {1, …, K}, from K different source domains, their ground-truth labels y_l, l ∈ {1, …, K}, and an unlabeled target domain image I_T, the goal is to predict the annotation y_T on the target domain image;
Step (2) constructing a domain excitation attention module, wherein the domain excitation attention module inputs a feature map output by a backbone network and outputs the enhanced feature map; the domain excitation attention module comprises three operations of compression, excitation and classification; the compression operation compresses the input feature map from the dimension W×H×C to 1×1×C by adopting a global average pooling operation, wherein W, H, C represents the width, the height and the dimension of the input feature map respectively;
the excitation operation generates a 1×1×c feature map output by the compression operation through the full connection layer and the activation function ReLU, i.e., the modified linear unit E Intermediate feature F of (2) E Generating a 1 multiplied by C feature map through the full connection layer and an activation function ReLU, and multiplying the feature map by W multiplied by H multiplied by C feature input by the domain excitation attention module according to channel weight;
intermediate features F generated by classification operations with excitation operations E As input, outputting domain category through a full connection layer and an activation function, namely a maximum response extraction function SoftMax;
step (3): constructing a backbone network as a feature extractor, which takes an image as input and outputs the extracted features; the domain excitation attention module is inserted before each pooling operation of the backbone network to realize domain enhancement of the features;
step (4): constructing a generative adversarial network regularization module, which regularizes the features by aligning the input features with a standard Gaussian distribution, thereby avoiding network overfitting; the generative adversarial network regularization module consists of global average pooling, a discrimination network and a standard Gaussian distribution; the global average pooling compresses the input feature map from W×H×C to 1×1×C; the discrimination network is composed of two fully convolutional layers with a Sigmoid (logistic) activation function, takes the globally average-pooled features as input, and judges whether the input features come from the input of the regularization module or are sampled from a standard Gaussian distribution; training the discrimination network improves its discrimination accuracy while driving the features extracted by the FPN toward a standard normal distribution, improving the generalization capability of the features;
step (5): constructing an FPN and fusing features of different scales extracted by the backbone network to realize multi-scale domain feature alignment; a generative adversarial network regularization module is inserted after each scale output of the FPN to improve the generalization of the features; here FPN denotes a feature pyramid network;
step (6): constructing a detection head network to predict the position, category and center position of detection targets;
step (7): constructing a target center alignment module, which takes the classification output of the classification head and the target center position as input, integrates them into a domain attention region for focusing on the features output by the FPN, and performs domain adversarial learning on the attention-region features output by the FPN to further improve the domain generalization capability of the features extracted by the FPN.
2. The method for detecting a domain generalization target based on a generative adversarial network according to claim 1, wherein: in step (3), the backbone network is the residual neural network ResNet-101.
3. The method for detecting a domain generalization target based on a generative adversarial network according to claim 1, wherein: in step (5), the FPN uses 5 feature maps of different scales, denoted F_i, i ∈ {3, 4, …, 7}; feature map F_3 corresponds to the smallest-scale targets and feature map F_7 corresponds to the largest-scale targets.
4. The method for detecting a domain generalization target based on a generative adversarial network according to claim 1, wherein: in step (6), the FCOS detection head is used to predict the target position, classification and target center position.
5. The method for detecting a domain generalization target based on a generative adversarial network according to claim 1, wherein: in step (7), a region map F_obj indicating where objects exist is estimated from the class output map of the detection head network:

F_obj = max_c σ(F_cls)

where F_cls denotes the network class output, σ denotes the Sigmoid activation function, and max_c takes the highest response value across categories at each position as output;
the foreground position estimation map F_CA is further calculated by combining the object center position map: F_obj is combined by element-wise multiplication (⊗) with the object center position map F_ctr output by the network, under a scaling factor β ranging from 0 to 1;
the foreground position estimation map F_CA serves as a region-of-interest estimation map for the FPN output features; F_CA is multiplied channel-wise with the corresponding FPN features to obtain a weighted feature map F_W; after passing through the gradient reversal module GRL, F_W is fed into the domain discrimination network, which outputs the domain category;
the gradient reversal module consists of a gradient reversal layer R(x), defined as:

R(x) = x (forward pass)
dR(x)/dx = -I (backward pass)

where x denotes any input feature and I denotes the identity matrix;
the domain discrimination network consists of two convolution layers with kernel size 3, stride 1, identical input and output dimensions and ReLU activation, followed by one convolution layer with kernel size 1, stride 1, output dimension 2 and SoftMax activation;
training the discrimination network improves the domain discrimination accuracy while encouraging the FPN to preserve the domain invariance of the region of interest in the features, improving the generalization capability of the features extracted by the FPN.
6. The method for detecting a domain generalization target based on a generative adversarial network according to claim 1, wherein: to train the parameters of the domain excitation attention module in step (2), a domain excitation attention loss function L_atten is proposed:

L_atten = -(1/N) · Σ_i Σ_d y_id · log(p_id)

where N denotes the number of input samples, M denotes the number of source-domain categories, y_id denotes the ground-truth domain label of sample i for domain d, and p_id denotes the domain category output by the excitation neural network;
to train the parameters of the generative adversarial network regularization module in step (4), a distribution regularization loss function L_regular is proposed:

L_regular = E_{h~q(h)} log D(h) + E_{x~p(x)} log(1 - D(G(x)))

where q(h) is the standard normal distribution, p(x) is the distribution of the input data, D denotes the discrimination network, G denotes the feature extraction network, E_{h~q(h)} denotes the expectation over the distribution q(h), and E_{x~p(x)} denotes the expectation over the distribution p(x);
to train the parameters of the target center domain alignment module in step (7), a loss function L_CA of the target center alignment module is proposed:

L_CA = -Σ_(u,v) [ d · log D_CA((F_CA^s ⊗ F_s)_(u,v)) + (1 - d) · log(1 - D_CA((F_CA^t ⊗ F_t)_(u,v))) ]

where d denotes the domain label and D_CA is the domain discriminator; F_CA^s and F_CA^t denote the foreground position estimation maps corresponding to the source and target domains, respectively; F_s and F_t denote the network feature maps corresponding to the source and target domains, respectively; ⊗ denotes element-wise multiplication; and for any feature map A, A_(u,v) denotes the feature at position (u, v) of the feature map;
the overall network loss function L is:

L = L_det + α·L_atten + β·L_regular + γ·L_CA

where α, β and γ are weights balancing the losses, and L_det is the detection loss function:

L_det = L_cls + L_reg + L_ctr

where L_cls denotes the classification head loss, L_reg denotes the regression head loss, and L_ctr denotes the center position loss.
CN202310999356.XA 2023-08-09 2023-08-09 Domain generalization target detection method based on a generative adversarial network Active CN116883681B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310999356.XA CN116883681B (en) 2023-08-09 2023-08-09 Domain generalization target detection method based on a generative adversarial network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310999356.XA CN116883681B (en) 2023-08-09 2023-08-09 Domain generalization target detection method based on a generative adversarial network

Publications (2)

Publication Number Publication Date
CN116883681A CN116883681A (en) 2023-10-13
CN116883681B true CN116883681B (en) 2024-01-30

Family

ID=88256935

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310999356.XA Active CN116883681B (en) 2023-08-09 2023-08-09 Domain generalization target detection method based on a generative adversarial network

Country Status (1)

Country Link
CN (1) CN116883681B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112200161A (en) * 2020-12-03 2021-01-08 北京电信易通信息技术股份有限公司 Face recognition detection method based on mixed attention mechanism
AU2020103905A4 (en) * 2020-12-04 2021-02-11 Chongqing Normal University Unsupervised cross-domain self-adaptive medical image segmentation method based on deep adversarial learning
CN112668594A (en) * 2021-01-26 2021-04-16 华南理工大学 Unsupervised image target detection method based on antagonism domain adaptation
CN112800906A (en) * 2021-01-19 2021-05-14 吉林大学 Improved YOLOv 3-based cross-domain target detection method for automatic driving automobile
CN113936143A (en) * 2021-09-10 2022-01-14 北京建筑大学 Image identification generalization method based on attention mechanism and generation countermeasure network
WO2022036777A1 (en) * 2020-08-21 2022-02-24 暨南大学 Method and device for intelligent estimation of human body movement posture based on convolutional neural network
CN114596477A (en) * 2022-03-16 2022-06-07 东南大学 Foggy day train fault detection method based on field self-adaption and attention mechanism
CN114692741A (en) * 2022-03-21 2022-07-01 华南理工大学 Generalized face counterfeiting detection method based on domain invariant features
CN116452862A (en) * 2023-03-30 2023-07-18 华南理工大学 Image classification method based on domain generalization learning

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111739078B (en) * 2020-06-15 2022-11-18 大连理工大学 Monocular unsupervised depth estimation method based on context attention mechanism
CN112308158B (en) * 2020-11-05 2021-09-24 电子科技大学 Multi-source field self-adaptive model and method based on partial feature alignment

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022036777A1 (en) * 2020-08-21 2022-02-24 暨南大学 Method and device for intelligent estimation of human body movement posture based on convolutional neural network
CN112200161A (en) * 2020-12-03 2021-01-08 北京电信易通信息技术股份有限公司 Face recognition detection method based on mixed attention mechanism
AU2020103905A4 (en) * 2020-12-04 2021-02-11 Chongqing Normal University Unsupervised cross-domain self-adaptive medical image segmentation method based on deep adversarial learning
CN112800906A (en) * 2021-01-19 2021-05-14 吉林大学 Improved YOLOv 3-based cross-domain target detection method for automatic driving automobile
CN112668594A (en) * 2021-01-26 2021-04-16 华南理工大学 Unsupervised image target detection method based on antagonism domain adaptation
CN113936143A (en) * 2021-09-10 2022-01-14 北京建筑大学 Image identification generalization method based on attention mechanism and generation countermeasure network
CN114596477A (en) * 2022-03-16 2022-06-07 东南大学 Foggy day train fault detection method based on field self-adaption and attention mechanism
CN114692741A (en) * 2022-03-21 2022-07-01 华南理工大学 Generalized face counterfeiting detection method based on domain invariant features
CN116452862A (en) * 2023-03-30 2023-07-18 华南理工大学 Image classification method based on domain generalization learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
A New Adversarial Domain Generalization Network Based on Class Boundary Feature Detection for Bearing Fault Diagnosis; Jingde Li et al.; IEEE Transactions on Instrumentation and Measurement (Volume 71); full text *
Multi-scale feature fusion network based on feature pyramid; Guo Qifan; Liu Lei; Zhang Cheng; Xu Wenjuan; Jing Wenfeng; Chinese Journal of Engineering Mathematics (No. 5); full text *
A survey of person re-identification in weakly supervised scenarios; Qi Lei; Yu Peize; Gao Yang; Journal of Software (No. 9); full text *

Also Published As

Publication number Publication date
CN116883681A (en) 2023-10-13


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant