WO2023092582A1 - A scene adaptive target detection method based on motion foreground - Google Patents
- Publication number: WO2023092582A1
- Application: PCT/CN2021/134085
- Authority: WIPO (PCT)
Classifications
- G06V10/82 — Arrangements for image or video recognition or understanding using neural networks
- G06F18/22 — Pattern recognition; matching criteria, e.g. proximity measures
- G06F18/2415 — Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio
- G06N3/045 — Neural networks; combinations of networks
- G06N3/084 — Learning methods; backpropagation, e.g. using gradient descent
- G06V10/62 — Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; pattern tracking
- G06V2201/07 — Indexing scheme relating to image or video recognition or understanding; target detection
Definitions
- the first proposal boxes and foreground boxes aggregation module PFA1 comprises sub-modules for performing the following steps respectively:
- Step S201: sending the continuous frame sample set in the source domain and the continuous frame sample set in the target domain into the RPN (region proposal network) to obtain the set of positive and negative proposal boxes, whose jth element is the jth proposal box in the ith image sample of the source or target domain; C represents the number of proposal boxes generated by the RPN, which is set to 64 in an embodiment of the invention;
- Step S202: sending the continuous frame sample set in the source domain and the continuous frame sample set in the target domain into the ViBe motion target detection algorithm to obtain the source domain motion foreground target boxes and the target domain motion foreground target boxes, where fb_i represents the ith set of motion foreground target boxes;
- Step S211: selecting the proposal boxes whose confidence is greater than a preset threshold TH from the set of positive and negative proposal boxes, where TH is set to 0.7 in an embodiment of the invention;
- Step S212: merging the proposal boxes obtained in S211 with the motion foreground target boxes from S202;
- A fixed sample number f_num is set (8 in an embodiment of the invention), so that the number of PFA1 proposal boxes in the ith sample of the source domain (S) is consistent with that in the ith sample of the target domain (T), thereby eliminating sample imbalance.
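The confidence filtering, merging, and fixed-number sampling of the PFA1 module described above can be sketched as follows. This is a minimal illustration only: the function name, the use of `random.sample`, and the pad-by-resampling fallback are our assumptions, not details given in the patent.

```python
import random

def pfa1_boxes(proposals, scores, fg_boxes, th=0.7, f_num=8, seed=0):
    """Keep RPN proposals whose confidence exceeds th, merge them with the
    motion foreground target boxes, then sample a fixed number f_num so that
    source and target samples contribute the same number of boxes."""
    kept = [b for b, s in zip(proposals, scores) if s > th]
    merged = kept + fg_boxes
    rng = random.Random(seed)
    if len(merged) >= f_num:
        return rng.sample(merged, f_num)
    # Too few boxes: pad by re-sampling with replacement (our assumption).
    return merged + [rng.choice(merged) for _ in range(f_num - len(merged))]

proposals = [(0, 0, 10, 10), (5, 5, 20, 20), (8, 8, 30, 30)]
scores = [0.9, 0.5, 0.8]          # the 0.5 proposal falls below TH = 0.7
fg_boxes = [(12, 12, 24, 24)]     # one motion foreground target box
sampled = pfa1_boxes(proposals, scores, fg_boxes)
```

Both domains then yield exactly f_num boxes per sample, which is what eliminates the sample imbalance mentioned above.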
- the second proposal boxes and foreground boxes aggregation module PFA2 comprises sub-modules for performing the following steps respectively:
- Step S301: sending the continuous frame sample set in the source domain into the RPN (region proposal network) to obtain the set of source domain positive and negative proposal boxes, whose jth element is the jth proposal box in the ith image sample of the source domain; C represents the number of proposal boxes generated by the RPN, which is set to 64 in an embodiment of the invention;
- Step S302: sending the continuous frame sample set in the source domain into the ViBe motion target detection algorithm to obtain the source domain motion foreground target boxes, where fb_i represents the ith set of motion foreground target boxes;
- Step S311: merging the set of source domain positive and negative proposal boxes obtained in S301 with the source domain motion foreground target boxes from S302 to obtain the source domain PFA2 proposal boxes set; adding the motion foreground target boxes overcomes the problem that the two domains cannot generate accurate proposal boxes when the target size difference is large. Here {b_ia}_j represents the jth set of PFA2 proposal boxes in the ith image sample of the dataset, "a" denotes the set of proposal boxes generated in PFA2, S represents the source domain, and C_Sa represents the number of boxes in the combined set of proposal boxes and motion foreground target boxes in the source domain.
- The source domain feature f1 and the PFA2 proposal boxes set are input into the classifiers and regressors to complete the regression and classification of samples, and the loss function of this part is as follows:
- L_det represents the loss function of source domain detection, comprising L_RPN (the RPN loss function) and L_T (the classification regression loss function);
- det refers to the total loss of the classification regression module;
- RPN refers to the loss of the first (RPN) stage of the two-stage target detection framework;
- T refers to the loss of the second (classification regression) stage of the two-stage target detection framework;
- cross entropy loss is used for the classification loss, and mean square error (MSE) loss is used for the regression loss.
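The two component losses named above can be written out concretely. The numpy stand-ins below are a hedged sketch: the epsilon guard and mean reduction are our choices, and the unweighted sum at the end is an assumption about how L_T combines its terms.

```python
import numpy as np

def cross_entropy(probs, labels, eps=1e-12):
    """Mean cross-entropy over N samples; probs has shape (N, num_classes)
    and labels holds the integer class index of each sample."""
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + eps))

def mse(pred_boxes, gt_boxes):
    """Mean square error over predicted vs. ground-truth box coordinates."""
    return np.mean((np.asarray(pred_boxes) - np.asarray(gt_boxes)) ** 2)

probs = np.array([[0.9, 0.1],
                  [0.2, 0.8]])                  # classifier outputs for 2 samples
cls_loss = cross_entropy(probs, np.array([0, 1]))
reg_loss = mse([[0, 0, 10, 10]], [[1, 1, 11, 11]])
# Combining the two into the second-stage loss; equal weighting is our assumption.
l_t = cls_loss + reg_loss
```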
- the generative similarity measurement module includes sub-modules for performing the following operations respectively:
- Step S401: applying the PFA1 proposal boxes set generated by the PFA1 module to the source domain feature f1 and the target domain feature f2 extracted by the feature extraction module, to obtain the source domain instance feature f_S and the target domain instance feature f_T;
- Step S402: sending the source domain instance feature f_S and the target domain instance feature f_T into an adaptive average pooling layer to obtain the source domain pooling feature f_Ss402 and target domain pooling feature f_Ts402, each of which has an output size of 8*8 and a number of channels equal to that of the source domain instance feature f_S;
- Step S403: sending the output of S402 into a first 1*1 convolution layer to obtain the first source domain convolution feature f_Ss403 and first target domain convolution feature f_Ts403, each of which has 1024 channels in an embodiment of the invention;
- Step S404: sending the output of S403 into a first up-sampling layer to obtain the first source domain up-sampling feature f_Ss404 and first target domain up-sampling feature f_Ts404, each of which has an output size of 16*16 and 256 channels in an embodiment of the invention, where the up-sampling module comprises an interpolation up-sampling layer, a convolution layer and a batch normalization layer;
- Step S405: sending the output of S404 into a second up-sampling layer to obtain the second source domain up-sampling feature f_Ss405 and second target domain up-sampling feature f_Ts405, each of which has an output size of 32*32 and 256 channels in an embodiment of the invention;
- Step S406: sending the output of S405 into a third up-sampling layer to obtain the third source domain up-sampling feature f_Ss406 and third target domain up-sampling feature f_Ts406, each of which has an output size of 64*64 and 256 channels in an embodiment of the invention;
- Step S407: sending the output of S406 into a second 1*1 convolution layer to obtain the source domain decoding feature f_SG and target domain decoding feature f_TG, each of which has 3 channels in an embodiment of the invention;
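The decoder dimensions in steps S402–S407 can be traced with a toy sketch. This only checks shapes: nearest-neighbour repetition stands in for the interpolation-plus-convolution-plus-batch-norm up-sampling module described above, the random weights are placeholders, and the 256 input channels of the instance feature are our assumption.

```python
import numpy as np

def conv1x1(x, out_ch, rng):
    """1x1 convolution as per-pixel channel mixing: (C, H, W) -> (out_ch, H, W)."""
    w = rng.standard_normal((out_ch, x.shape[0]))
    return np.einsum('oc,chw->ohw', w, x)

def upsample2x(x):
    """Nearest-neighbour 2x up-sampling over both spatial axes."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

rng = np.random.default_rng(0)
f = rng.standard_normal((256, 8, 8))     # S402: pooled instance feature, 8*8
f = conv1x1(f, 1024, rng)                # S403: first 1*1 conv, 1024 channels
f = conv1x1(upsample2x(f), 256, rng)     # S404: 16*16, 256 channels
f = conv1x1(upsample2x(f), 256, rng)     # S405: 32*32, 256 channels
f = conv1x1(upsample2x(f), 256, rng)     # S406: 64*64, 256 channels
decoded = conv1x1(f, 3, rng)             # S407: decoding feature, 3 channels
```

The final decoding feature thus has the shape of a small RGB image, which is what the perceptual similarity below compares.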
- E represents the perceptual loss function, which is used to measure the similarity between images (the perceptual loss function is existing technology);
- L_ins represents the perceptual loss value of the source domain decoding feature f_SG and the target domain decoding feature f_TG;
- G(S), G(T) respectively represent the source domain decoding feature f_SG and the target domain decoding feature f_TG generated by steps S402-S407 when the source domain instance feature f_S and the target domain instance feature f_T are input into the shared decoder G.
- This scheme effectively measures the similarity between the instance features of the two domains (the source domain and the target domain).
- The instance features of the source domain and of the target domain can thus be made as similar as possible, and the accuracy of the classification regression module in the target domain can be guaranteed.
- The use of the decoder enhances the generalization performance of the model, reduces the risk of overfitting, and reduces the failure rate of model training.
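The perceptual loss used by the GSM module compares images in a feature space rather than pixel space. The sketch below illustrates only that idea: a fixed random channel mixing stands in for the pretrained feature extractor a real perceptual loss would use, so the numbers are illustrative, not the patent's loss.

```python
import numpy as np

def features(img, w):
    """Fixed (non-trained) feature map: channel mixing stands in for the
    pretrained network of a true perceptual loss."""
    return np.einsum('oc,chw->ohw', w, img)

def perceptual_loss(decoded_s, decoded_t, w):
    """MSE between the feature maps of the two decoded images."""
    return np.mean((features(decoded_s, w) - features(decoded_t, w)) ** 2)

rng = np.random.default_rng(1)
w = rng.standard_normal((8, 3))            # 3 input channels -> 8 feature channels
a = rng.standard_normal((3, 64, 64))       # stand-in for f_SG
b = rng.standard_normal((3, 64, 64))       # stand-in for f_TG
loss_same = perceptual_loss(a, a, w)       # identical decodings: zero loss
loss_diff = perceptual_loss(a, b, w)       # different decodings: positive loss
```

Minimizing this value pushes the two domains' decoded instance features together, which is the alignment effect described above.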
- the global feature alignment module GFA comprises sub-modules for performing the following operations respectively:
- Step S501: obtaining the source domain feature f1 and the target domain feature f2 generated by the feature extraction module;
- Step S502: sending the source domain feature f1 and the target domain feature f2 into the GRL (gradient reversal layer).
- The loss is the difference between the predicted value and the real value; during back-propagation, each network layer computes its gradient from the propagated loss and then updates its own parameters.
- The GRL inverts the gradients transmitted to it, so that the training objectives of the network before and after the GRL are opposite, thereby achieving an adversarial effect;
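The GRL's behaviour — identity on the forward pass, sign-flipped gradients on the backward pass — can be shown without a deep-learning framework. (In PyTorch one would subclass `torch.autograd.Function`; this standalone class and its lambda parameter are our own sketch.)

```python
class GradientReversal:
    """Identity on the forward pass; multiplies incoming gradients by
    -lam on the backward pass, reversing the training objective of
    everything upstream of this layer."""

    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        return x  # features pass through unchanged

    def backward(self, grad):
        return -self.lam * grad  # reversed gradient reaches the feature extractor

grl = GradientReversal(lam=1.0)
out = grl.forward(2.5)                 # feature value is unchanged
grad_to_extractor = grl.backward(0.4)  # gradient sign is flipped
```

Because the classifier downstream minimizes the domain classification loss while the feature extractor upstream receives the negated gradient, the extractor is driven to produce features whose domain cannot be distinguished, exactly as step G) requires.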
- Step S503: sending the source domain feature f1 and the target domain feature f2 into a classifier to distinguish the source domain features from the target domain features, where the classifier includes convolution layers and activation layers for performing the operations in Steps S511-S513 respectively;
- The loss function of the global feature alignment module GFA is the classifier loss L_img, the cross entropy loss function:
- N is the total number of samples in the source domain and target domain;
- i is the sample index;
- y_i indicates whether the actual label of the sample belongs to the source domain or the target domain;
- p_i is the probability that the sample belongs to each category after the classifier.
- The final global loss function is:
- λ1, λ2 are empirical values weighting the contribution of the three losses to the final loss; both are set to 1 in this embodiment.
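With λ1 = λ2 = 1 as in this embodiment, the final objective is simply the sum of the detection loss and the two alignment losses. The variable names below are ours, not the patent's notation:

```python
def global_loss(l_det, l_ins, l_img, lam1=1.0, lam2=1.0):
    """Final training objective: detection loss plus weighted instance-level
    (GSM) and image-level (GFA) alignment losses."""
    return l_det + lam1 * l_ins + lam2 * l_img

# Illustrative loss values only.
total = global_loss(0.6, 0.3, 0.1)
```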
- the advantages of the invention include:
- The present invention abandons the existing classifier-based feature alignment mode.
- A decoder is used instead to reduce over-fitting; its loss function is the perceptual loss. The effect of the model in the target domain is greatly improved in this way.
- Sample equalization is effectively realized through the sample equalization filter.
- The inventors conducted the following experiments, in which the testing process simply followed that of the two-stage detection algorithm, so the speed was consistent with the conventional two-stage algorithm.
- the trained model achieved good results in both source and target domains.
- the source domain dataset and target domain dataset adopted in the specific implementation scheme were captured in a real world scenario. They were named DML dataset and ZN dataset respectively, where DML dataset was the source domain dataset and ZN dataset was the target domain dataset.
- DA-FasterRCNN is a classical domain adaptive detection algorithm.
- Method PFA1 is an improved method obtained by adding the first proposal boxes and foreground boxes aggregation (PFA1) module to the classical algorithm, that is, adding the motion foreground target boxes to the RPN proposal boxes. It can be seen that the method of the invention effectively improved the detection effect in the target domain.
Abstract
With the development of deep learning technology, the requirements for model generalization performance in real environments are gradually increasing, and the influence of illumination and background on generalization performance has received wide attention. The invention discloses a scene adaptive target detection method based on motion foreground. The method exploits the motion foreground target boxes by using the prior consistency between the motion foreground and the global target data distribution, and computes instance feature similarity through a decoder, which greatly improves the effect of the model in the target domain. Experimental results show that the target detection effect of the method is greatly improved in the real environment.
Description
The invention relates to a scene adaptive target detection method based on motion foreground.
Background art
In the field of computer vision, target detection is an important topic. The task of target detection is to find regions of interest in images and videos and to determine their categories and locations. At present, many deep learning based methods achieve good results on benchmark datasets. However, due to domain differences, when the target size, camera angle, lighting or background environment changes, the effect of the model is reduced. The simplest and most effective way to solve this problem is to train models in the same domain. But, on the one hand, manual annotation of datasets costs a great deal of manpower and resources; on the other hand, many practical fields cannot be manually annotated at all. Therefore, in order to address the degradation of model performance caused by different data distributions, target detection methods based on domain adaptation came into being.
At present, target detection methods based on domain adaptation include feature-based methods and model-based methods. The most classical method is to minimize the domain difference of features through adversarial training, so that the domain of proposal box features cannot be distinguished. Many relevant algorithms are improvements on this algorithm, which is called DA-FasterRCNN. Another class of algorithms realizes pixel-level domain alignment through adversarial generation.
But the above algorithms consider only the domain differences in classification, not the domain differences in regression, so the model effect is not ideal when the scene changes. In addition, because the data distribution is not known, suitable proposal boxes cannot be extracted in the candidate region extraction (RPN) stage of two-stage target detection, and it is also impossible to determine which regions of the features need to be aligned during the feature alignment stage.
Summary of the invention
To solve the above technical problems in the prior art, the present invention provides a scene adaptive target detection method based on motion foreground.
According to one aspect of the invention, a scene adaptive target detection method based on motion foreground is provided, which includes the following steps:
A) Obtaining the source domain dataset and the target domain dataset; the source domain dataset contains source domain RGB images, target detection manual labels and motion foreground target boxes labels; the target domain dataset contains target domain RGB images and motion foreground target boxes labels;
B) sending the source domain dataset and the target domain dataset into the feature extraction module to obtain the source domain features and the target domain features;
C) sending the source domain features and the target domain features from step B) into the first proposal boxes and foreground boxes aggregation module to obtain the source domain instance features and the target domain instance features;
D) sending the source domain features from step B) into the second proposal boxes and foreground boxes aggregation module to obtain the source domain classification regression features;
E) sending the source domain instance features and the target domain instance features from step C) into the generative similarity measurement module to compute a loss; the network is optimized and the domain difference is reduced in this way;
F) sending the source domain classification regression features from step D) into the classification regression module to compute a loss and to optimize the network;
G) sending the source domain features and the target domain features from step B) into the global feature alignment module to compute a loss and to optimize the network.
The acquisition methods of the motion foreground target boxes in step A) include ViBe, Gaussian mixture model, the frame difference method and optical flow.
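Of these, the frame difference method is the simplest to sketch. The illustration below is our own hedged stand-in (function name, threshold value, and the pure-Python connected-component labelling are all our choices), not the ViBe algorithm that the embodiments actually use:

```python
from collections import deque
import numpy as np

def frame_diff_boxes(prev, curr, thresh=25):
    """Naive frame-difference foreground detector: threshold the absolute
    inter-frame difference, then return one bounding box per 4-connected
    foreground region."""
    mask = np.abs(curr.astype(int) - prev.astype(int)) > thresh
    seen = np.zeros_like(mask, dtype=bool)
    boxes = []
    h, w = mask.shape
    for r in range(h):
        for c in range(w):
            if mask[r, c] and not seen[r, c]:
                # BFS over this connected foreground region
                q = deque([(r, c)])
                seen[r, c] = True
                rows, cols = [], []
                while q:
                    y, x = q.popleft()
                    rows.append(y)
                    cols.append(x)
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            q.append((ny, nx))
                boxes.append((min(cols), min(rows), max(cols), max(rows)))  # x1, y1, x2, y2
    return boxes

# A static background with one moving 4x4 block between frames.
prev = np.zeros((32, 32), dtype=np.uint8)
curr = prev.copy()
curr[10:14, 20:24] = 255
boxes = frame_diff_boxes(prev, curr)
```

Any of the listed methods can substitute here, since the rest of the pipeline only consumes the resulting motion foreground target boxes.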
In the training process, the RPN proposal boxes with high confidence are combined with the motion foreground target boxes. After the samples are equalized, the source domain instance features and the target domain instance features are extracted.
In the training process, the RPN proposal boxes are combined with the motion foreground target boxes. Then, the source domain classification regression features are extracted.
In the training process, the decoder is used to reconstruct the source domain instance features and the target domain instance features, and the decoding features are obtained. Then, the similarity loss of decoding features is calculated to achieve instance feature alignment.
In the training process, the classification regression loss on the source domain dataset is calculated using the source target boxes' ground-truth labels, ensuring the accuracy of source domain target detection.
The global feature alignment module includes gradient reversal layer (GRL) and classifier for achieving image level feature alignment.
The improvements of the present invention over the prior art include that a scene adaptive target detection method based on motion foreground is provided, in which the prior knowledge of the motion foreground is effectively utilized, a decoder is used for feature alignment to obtain a better detection effect, and the generalization performance of the model in new scenes is improved.
FIG. 1 is a flow chart of a scene adaptive target detection method based on the motion foreground according to an embodiment of the invention;
FIG. 2 is the schematic diagram of first proposal boxes and foreground boxes aggregation module (PFA1) according to an embodiment of the invention;
FIG. 3 is the schematic diagram of second proposal boxes and foreground boxes aggregation module (PFA2) according to an embodiment of the invention;
FIG. 4 is the schematic diagram of generative similarity measurement (GSM) module according to an embodiment of the invention;
FIG. 5 is the schematic diagram of global feature alignment (GFA) module according to an embodiment of the invention.
In the embodiment according to the present invention as shown in Fig. 1, the source domain dataset is D_S, where n_S represents the actual number of samples in the source domain; each source domain sample i is accompanied by the set of target box coordinate values and the target category for that sample. There is only a single category, pedestrian, in this embodiment. Each source domain sample i also has a set of motion foreground target box coordinate values; the number of motion foreground boxes and the number of manually labelled target boxes are inconsistent. The target domain dataset is D_T, where n_T represents the actual number of samples in the target domain; each target domain sample i has a set of motion foreground target box coordinate values.
The scene adaptive target detection method based on the motion foreground according to the present invention trains the model using the source domain data and the motion foreground data of the target domain, so that the model can achieve a good detection effect even without an annotated target domain (T) dataset; the method includes the following steps:
A) sending the continuous frame sample set in the source domain and the continuous frame sample set in the target domain into the ViBe motion target detection algorithm to obtain the source domain motion foreground target boxes and the target domain motion foreground target boxes (S represents the source domain and T represents the target domain), so that the source domain dataset D_S and the target domain dataset D_T are obtained;
B) sending the source domain dataset D_S and the target domain dataset D_T into the feature extraction module (S101) to obtain the source domain features f1 and the target domain features f2; ResNet-101, for example, is used as the backbone network of the feature extraction module in an embodiment of the invention;
C) sending the source domain features f1 and the source domain motion foreground target boxes
into PFA1 to obtain the source domain instance features pfs (S112) , and sending the target domain features f2 and the target domain motion foreground target boxes
into PFA1 to obtain the target domain instance features pft (S113) ;
D) sending the source domain features f1 and the source domain motion foreground target boxes
into PFA2 to obtain the source domain classification regression features crs (S112) ;
E) sending the source domain classification regression features crs into classification regression module (S121) , where the classification regression loss of the source domain dataset is calculated by the source target boxes truth label, so that the network weights of feature extraction module and classification regression module are optimized by training the source domain dataset;
F) sending the source domain instance features pfs and the target domain instance features pft into generating similarity measurement (GSM) module (S122) , to allow that the instance features of source domain and target domain are as similar as possible, by training the source domain dataset and target domain dataset, that the network weights of feature extraction module and GSM module are optimized, and that the generalization performance of the model is improved;
G) sending the source domain features f1 and the target domain features f2 into global feature alignment (GFA) module, to allow that the features of source domain and target domain are similar as much as possible by training the source domain data set and target domain data set, that the network weights of feature extraction module and GFA module (gradient reversal layer (GRL) and classifier) is optimized, and that the domain of source domain feature f1 and target domain feature f2 cannot be distinguished.
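The motion foreground prior in step A) can come from any background-modelling detector. ViBe itself is not reproduced here; the sketch below is a minimal frame-difference stand-in (an illustrative assumption, not the patent's algorithm) showing how a motion foreground target box can be derived from two consecutive grayscale frames represented as nested lists.

```python
# Minimal frame-difference stand-in for the ViBe detector of step A)
# (an illustrative assumption, not the patent's algorithm). Frames are
# nested lists of grayscale values; pixels whose absolute difference
# exceeds `thresh` are treated as motion foreground, and the bounding
# box of all such pixels is returned as (x_min, y_min, x_max, y_max).

def motion_foreground_box(prev_frame, cur_frame, thresh=25):
    """Return the bounding box of moving pixels, or None if no motion."""
    xs, ys = [], []
    for y, (row_p, row_c) in enumerate(zip(prev_frame, cur_frame)):
        for x, (p, c) in enumerate(zip(row_p, row_c)):
            if abs(c - p) > thresh:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return (min(xs), min(ys), max(xs), max(ys))
```

For example, two 4*4 frames that differ only in a 2*2 block yield the box (1, 1, 2, 2). In practice ViBe maintains a per-pixel sample model and would emit one box per connected foreground region rather than a single enclosing box.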
According to a further aspect of the invention, as shown in FIG. 2, the first proposal boxes and foreground boxes aggregation module PFA1 comprises sub-modules for performing the following steps respectively:
Step S201: sending the continuous frame sample set in the source domain and the continuous frame sample set in the target domain into the RPN network (region proposal network) to obtain the sets of positive and negative proposal boxes, where each set contains the proposal boxes of the ith image sample of the source domain or of the target domain, and C represents the number of proposal boxes generated by the RPN network, which is set to 64 in an embodiment of the invention;
Step S202: sending the continuous frame sample set in the source domain and the continuous frame sample set in the target domain into the ViBe motion target detection algorithm to obtain the source domain motion foreground target boxes and the target domain motion foreground target boxes, where fb_i represents the ith set of motion foreground target boxes;
Step S211: selecting the proposal boxes whose confidence is greater than the preset threshold TH from the set of positive and negative proposal boxes, where TH is set to 0.7 in an embodiment of the invention;
Step S212: merging the proposal boxes obtained in S211 with the motion foreground target boxes obtained in S202;
Step S213: sending the output of S212 into the sample equalization filter to obtain the source domain PFA1 proposal boxes set and the target domain PFA1 proposal boxes set, where {b_if}_j represents the jth set of the PFA1 proposal boxes in the ith image sample of the dataset, f denotes that the set of proposal boxes is generated in PFA1, S represents the source domain, T represents the target domain, C_Sf and C_Tf respectively represent the number of boxes in the combined set of proposal boxes and motion foreground target boxes in the source domain and in the target domain, and C_Sf = C_Tf.
In the sample equalization filter, a fixed sample number f_num is set, which is set to 8 in an embodiment of the invention, so that the number of PFA1 proposal boxes in the ith sample in the source domain (S) is consistent with that in the ith sample in the target domain (T), so as to eliminate sample imbalance.
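The sample equalization filter fixes only the per-sample box count f_num; beyond "copying or deleting" boxes (claim 1), the exact policy is not specified. A minimal sketch, assuming cyclic repetition for copying and truncation for deletion:

```python
from itertools import cycle, islice

# Sketch of the sample equalization filter of step S213. The patent fixes
# only the target count f_num (8 in the embodiment) and says boxes are
# copied or deleted; the cyclic-repetition / truncation policy used here
# is an assumption made for illustration.

def equalize_boxes(boxes, f_num=8):
    """Copy (by cycling) or delete boxes so that exactly f_num remain."""
    if not boxes:
        return []
    return list(islice(cycle(boxes), f_num))
```

With three input boxes the filter cycles them to reach eight; with more than eight it truncates, so the ith source sample and the ith target sample always contribute the same number of PFA1 proposal boxes.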
According to a further aspect of the invention, as shown in FIG. 3, the second proposal boxes and foreground boxes aggregation module PFA2 comprises sub-modules for performing the following steps respectively:
Step S301: sending the continuous frame sample set in the source domain into the RPN network (region proposal network) to get the set of source domain positive and negative proposal boxes, where each set contains the proposal boxes of the ith image sample of the source domain, and C represents the number of proposal boxes generated by the RPN network, which is set to 64 in an embodiment of the invention;
Step S302: sending the continuous frame sample set in the source domain into the ViBe motion target detection algorithm to obtain the source domain motion foreground target boxes, where fb_i represents the ith set of motion foreground target boxes;
Step S311: merging the set of source domain positive and negative proposal boxes obtained in S301 with the source domain motion foreground target boxes obtained in S302 to obtain the source domain PFA2 proposal boxes set; adding the motion foreground target boxes overcomes the problem that the two domains cannot generate accurate proposal boxes when the target size difference is large. Here {b_ia}_j represents the jth set of the PFA2 proposal boxes in the ith image sample of the dataset, "a" denotes that the set of proposal boxes is generated in PFA2, S represents the source domain, and C_Sa represents the number of boxes in the combined set of proposal boxes and motion foreground target boxes in the source domain.
According to a further aspect of the invention, in the classification and regression module S121, the source domain feature f1 and the PFA2 proposal boxes set are input into classifiers and regressors to complete regression and classification of samples, and the loss function of this part is as follows:
L_det = L_RPN + L_T,
where L_det represents the loss function of the source domain detection, including L_RPN (RPN loss function) and L_T (classification regression loss function); "det" refers to the total loss function of the classification regression module, "RPN" refers to the loss function of the first (RPN) stage of the two-stage target detection framework, and "T" refers to the loss function of the second (classification regression) stage of the two-stage target detection framework. In an embodiment, cross entropy loss is used for the classification loss and mean square error (MSE) loss is used for the regression loss.
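As a concrete illustration of the per-proposal loss terms named above (cross entropy for classification, MSE for regression, summed into L_det), here is a minimal pure-Python sketch; anchor matching, batch averaging and the exact Faster R-CNN box parameterisation are omitted:

```python
import math

# Illustrative per-proposal versions of the loss terms of the
# classification regression module: cross entropy for classification and
# mean square error for box regression, summed as in L_det = L_RPN + L_T.
# Real RPN/head losses also involve anchor matching and batch averaging.

def cross_entropy(probs, label):
    """Negative log-probability of the true class."""
    return -math.log(probs[label])

def mse(pred_box, gt_box):
    """Mean squared error over the four box coordinates."""
    return sum((p - g) ** 2 for p, g in zip(pred_box, gt_box)) / len(gt_box)

def detection_loss(cls_probs, cls_label, pred_box, gt_box):
    """One proposal's contribution; L_RPN has the same form on RPN outputs."""
    return cross_entropy(cls_probs, cls_label) + mse(pred_box, gt_box)
```

A perfectly confident, perfectly localised prediction contributes zero loss; uncertainty in the class probabilities or error in any coordinate increases L_det.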
According to a further aspect of the invention, as shown in FIG. 4, the generative similarity measurement module includes sub-modules for performing the following operations respectively:
Step S401: using the PFA1 proposal boxes set generated by the PFA1 module to intercept, from the source domain feature f1 and the target domain feature f2 extracted by the feature extraction module, the source domain instance feature f_S and the target domain instance feature f_T;
Step S402: sending the source domain instance feature f_S and the target domain instance feature f_T into an adaptive average pooling layer to obtain the source domain pooling feature f_Ss402 and the target domain pooling feature f_Ts402, each of which has an output size of 8*8 and a number of channels equal to the number of channels of the source domain instance feature f_S;
Step S403: sending the output from S402 into a first 1*1 convolution layer to obtain the first source domain convolution feature f_Ss403 and the first target domain convolution feature f_Ts403, each of which has 1024 channels in an embodiment of the invention;
Step S404: sending the output from S403 into a first up-sampling module to obtain the first source domain up-sampling feature f_Ss404 and the first target domain up-sampling feature f_Ts404, each of which has an output size of 16*16 and 256 channels in an embodiment of the invention, where the up-sampling module comprises an interpolation up-sampling layer, a convolution layer and a batch normalization layer;
Step S405: sending the output from S404 into a second up-sampling module to obtain the second source domain up-sampling feature f_Ss405 and the second target domain up-sampling feature f_Ts405, each of which has an output size of 32*32 and 256 channels in an embodiment of the invention;
Step S406: sending the output from S405 into a third up-sampling module to obtain the third source domain up-sampling feature f_Ss406 and the third target domain up-sampling feature f_Ts406, each of which has an output size of 64*64 and 256 channels in an embodiment of the invention;
Step S407: sending the output from S406 into a second 1*1 convolution layer to obtain the source domain decoding feature f_SG and the target domain decoding feature f_TG, each of which has 3 channels in an embodiment of the invention;
and computing the perceptual loss of the source domain decoding feature f_SG and the target domain decoding feature f_TG to obtain the loss L_ins:
L_ins = E(G(S), G(T)),
where E is the calculation function of the perceptual loss, a loss function used to measure the similarity between images (the perceptual loss function is existing technology); L_ins represents the perceptual loss value of the source domain decoding feature f_SG and the target domain decoding feature f_TG; and G(S), G(T) respectively represent the source domain decoding feature f_SG and the target domain decoding feature f_TG generated by steps S402-S407 when the source domain instance feature f_S and the target domain instance feature f_T are input into the shared decoder G. This scheme can effectively measure the similarity between the instance features of the two domains (the source domain and the target domain). Through the training of the feature extraction module and the GSM module, the instance features of the source domain and of the target domain can be made as similar as possible, and the accuracy of the classification regression module in the target domain can be guaranteed. In addition, the use of a decoder enhances the generalization performance of the model, reduces the risk of overfitting and reduces the failure rate of model training.
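The shapes stated for steps S402-S407 can be traced end to end. The helper below records the (channels, height, width) each stage of the shared decoder G outputs for an instance feature with `in_channels` channels; the concrete input channel count is whatever the backbone produces (e.g. 1024 for a ResNet C4 feature, which is an assumption here, not fixed by the text):

```python
# Trace of the tensor shapes through the shared decoder G of steps
# S402-S407, using the sizes stated for the embodiment. `in_channels`
# is the channel count of the instance features from the backbone; its
# concrete value is an assumption and not fixed by the text.

def gsm_decoder_shapes(in_channels):
    """Return a dict mapping each step to its (channels, height, width)."""
    return {
        "S402_adaptive_avg_pool": (in_channels, 8, 8),  # channels preserved
        "S403_conv1x1":           (1024, 8, 8),
        "S404_upsample1":         (256, 16, 16),
        "S405_upsample2":         (256, 32, 32),
        "S406_upsample3":         (256, 64, 64),
        "S407_conv1x1":           (3, 64, 64),          # image-like output
    }
```

The final 3-channel, 64*64 map is the image-like reconstruction that the perceptual loss E compares between G(S) and G(T).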
According to a further aspect of the invention, as shown in Figure 5, the global feature alignment module GFA comprises sub-modules for performing the following operations respectively:
Step S501: obtaining the source domain feature f1 and the target domain feature f2 generated by the feature extraction module;
Step S502: sending the source domain feature f1 and the target domain feature f2 into the GRL (gradient reversal layer). In conventional back propagation, the loss (the difference between the predicted value and the real value) is propagated backward layer by layer; each layer calculates its gradient according to the propagated loss and then updates its own parameters. The GRL inverts the errors transmitted to it, so that the network training objectives before and after the GRL are opposite, so as to achieve an adversarial effect;
Step S503: sending the source domain feature f1 and the target domain feature f2 into a classifier to distinguish the source domain features from the target domain features, where the classifier includes convolution layers and an activation layer for performing the operations in Steps S511-S513 respectively;
where the loss function of the global feature alignment module GFA is the loss function of the classifier, L_img. In an embodiment of the invention, it is the cross entropy loss function:
L_img = -(1/N) Σ_{i=1}^{N} [y_i·log(p_i) + (1 - y_i)·log(1 - p_i)],
where N is the number of all samples in the source domain and the target domain, i is the sample index, y_i is the actual label indicating whether the sample belongs to the source domain or the target domain, and p_i is the probability output by the classifier that the sample belongs to the corresponding category.
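The GRL's behaviour can be summarised in two plain functions: identity in the forward pass, negated (and optionally scaled) gradient in the backward pass. This is only an illustration of the semantics; a real implementation would register the reversal inside the autograd engine (e.g. as a custom torch.autograd.Function):

```python
# Plain-function illustration of the gradient reversal layer of step
# S502: identity forward, negated (and optionally scaled) gradient
# backward, so the feature extractor is pushed to *fool* the domain
# classifier. A real implementation would hook into autograd; the
# scaling factor `lam` is a common extension and an assumption here.

def grl_forward(features):
    """Forward pass: the features pass through unchanged."""
    return features

def grl_backward(upstream_grad, lam=1.0):
    """Backward pass: reverse the gradient flowing to the extractor."""
    return [-lam * g for g in upstream_grad]
```

Because the reversed gradient trains the feature extractor to maximise the classifier loss L_img while the classifier minimises it, training drives toward the adversarial equilibrium in which the domain of f1 and f2 cannot be distinguished.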
In an embodiment of the invention, the final global loss function is:
L = L_det + λ_1·L_ins + λ_2·L_img,
where λ_1 and λ_2 are empirical values for measuring the contribution of each of the three losses to the final loss. In this embodiment, each is taken as 1.
The advantages of the invention include:
(1) The prior of the motion foreground is fully utilized in the invention and is well integrated into the training framework. By using PFA1 and PFA2, the proposal boxes extracted from the RPN network are effectively fused with the motion foreground target boxes, so that the effect of the model is optimized by the two kinds of proposal boxes complementing each other.
(2) In order to reduce the risk of model overfitting and improve the accuracy of target box regression, the present invention abandons the existing classifier-based feature alignment mode for instance feature alignment; instead, a decoder whose loss function is the perceptual loss is used to reduce overfitting. The effect of the model in the target domain is greatly improved in this way.
(3) In the fusion of the proposal boxes extracted from the RPN network and the motion foreground target boxes, sample equalization is effectively realized through the sample equalization filter.
In order to verify the validity of the method, the inventors conducted the following experiments. The testing process simply followed that of a two-stage detection algorithm, so the speed was consistent with the conventional two-stage algorithm; the additional components are used only during model training, and the trained model achieved good results in both the source and target domains.
The source domain dataset and target domain dataset adopted in the specific implementation scheme were captured in a real world scenario. They were named DML dataset and ZN dataset respectively, where DML dataset was the source domain dataset and ZN dataset was the target domain dataset.
Experimental details: In all experiments, the parameters were consistent with the original DA-FasterRCNN algorithm. ResNet-50 was used as the backbone network, and ImageNet pre-training weights were used for the initialization of the backbone network. After training on 70,000 images, the average precision (mAP) on the target domain was calculated. All experiments were based on the PyTorch framework, and the NVIDIA GTX-2080TI hardware platform was used.
The comparison of experimental results is shown in Table 1. DA-FasterRCNN is a classical domain adaptive detection algorithm. Method PFA1 is an improved method obtained by adding the first proposal boxes and foreground boxes aggregation (PFA1) module to the classical algorithm, that is, adding the motion foreground target boxes to the RPN proposal boxes. It can be seen that the method of the invention effectively improved the detection effect in the target domain.
Table 1: The result of domain adaptive detection
Claims (6)
- A scene adaptive target detection method based on motion foreground, for training a model based on source domain data and target domain foreground data to allow that the model has a good detection effect also in the target domain, including the steps of:
A) feeding the set of source domain consecutive frame samples and the set of target domain consecutive frame samples into a motion target detection algorithm to obtain as output the motion foreground target boxes of the source domain consecutive frame samples and the motion foreground target boxes of the target domain consecutive frame samples, wherein the motion foreground target boxes form the source domain data set and the target domain data set together with the source domain labeling tags;
B) feeding the source domain data set and the target domain data set into a feature extraction module to obtain source domain features and target domain features;
C) feeding the source domain features, the target domain features, and the motion foreground target boxes into a first proposal box and foreground box aggregation module (PFA1) to obtain source domain instance features and target domain instance features;
D) feeding the source domain features and the source domain motion foreground target boxes into a second proposal box and foreground box aggregation module (PFA2) to obtain source domain classification regression features;
E) sending the source domain classification regression features into a classification regression module to calculate the loss with the source target boxes truth label so as to obtain an optimized detection effect on the source domain;
F) sending the source domain instance features and the target domain instance features into a generative similarity measurement module (GSM) to allow the source domain instance features and the target domain instance features to be as similar as possible and to improve generalization performance, thereby reducing over-fitting;
G) sending the source domain features and the target domain features into a global feature alignment module (GFA) to align image features so that the domains to which the source domain features and the target domain features belong cannot be distinguished,
wherein the first proposal box and foreground box aggregation module (PFA1) includes sub-modules for performing the following operations respectively:
step S201: sending the source domain consecutive frame samples and the target domain consecutive frame samples into the RPN network to generate the set of source domain positive and negative proposal boxes and the set of target domain positive and negative proposal boxes;
step S211: selecting the source domain positive and negative proposal boxes and the target domain positive and negative proposal boxes whose confidence is greater than a preset threshold TH from the set of source domain positive and negative proposal boxes and the set of target domain positive and negative proposal boxes generated in step S201;
step S202: obtaining the source domain motion foreground target boxes and the target domain motion foreground target boxes by the motion target detection algorithm;
step S212: obtaining source domain combined target boxes and target domain combined target boxes by combining the source domain positive and negative proposal boxes and the target domain positive and negative proposal boxes with confidence greater than the preset threshold TH obtained in step S211 with the source domain motion foreground target boxes and the target domain motion foreground target boxes obtained in step S202;
step S213: obtaining the source domain proposal boxes and the target domain proposal boxes of the first proposal box and foreground box aggregation module (PFA1) by a sample equalization filter,
wherein the sample equalization filter, by copying or deleting the source domain combined target boxes and the target domain combined target boxes obtained in step S212, allows that the number of source domain combined target boxes included in the ith sample in the source domain (S) and the number of target domain combined target boxes included in the ith sample in the target domain (T) are the same, to effectively use the motion foreground prior and eliminate sample imbalance,
wherein said second proposal box and foreground box aggregation module (PFA2) includes sub-modules for performing the following operations respectively:
step S301: allowing the source domain consecutive frame samples to go through the RPN network to generate the set of source domain positive and negative proposal boxes;
step S302: obtaining the source domain motion foreground target boxes using the motion target detection algorithm;
step S311: adding the set of source domain positive and negative proposal boxes and the source domain motion foreground target boxes to generate the set of source domain proposal boxes of the second proposal box and foreground box aggregation module (PFA2), to avoid that good proposal target boxes cannot be generated when the target sizes of the source domain and the target domain differ too much,
wherein the generative similarity measurement module (GSM) includes sub-modules for performing the following operations respectively:
step S401: intercepting the source domain instance features in the source domain features using the source domain proposal boxes of the first proposal box and foreground box aggregation module (PFA1) generated in step S213, and intercepting the target domain instance features in the target domain features using the target domain proposal boxes of the first proposal box and foreground box aggregation module (PFA1) generated in step S213;
step S402: sending the source domain instance features and the target domain instance features into an adaptive average pooling layer, which outputs pooling features having a size of 8*8 and a channel number equal to the channel number of the source domain instance features;
step S403: sending the pooling features obtained in step S402 into a first 1*1 convolution layer, which outputs first source domain convolution layer features and first target domain convolution layer features;
step S404: sending the first source domain convolution layer features and first target domain convolution layer features obtained in step S403 into a first up-sampling module for performing interpolation up-sampling, convolution, and/or batch normalization layer operations, to output first source domain up-sampling layer features and first target domain up-sampling layer features;
step S405: sending the output of the first up-sampling module into a second up-sampling module for performing interpolation up-sampling, convolution, and/or batch normalization layer operations, to output second source domain up-sampling layer features and second target domain up-sampling layer features;
step S406: sending the output of the second up-sampling module into a third up-sampling module for performing interpolation up-sampling, convolution, and/or batch normalization layer operations, to output third source domain up-sampling layer features and third target domain up-sampling layer features;
step S407: sending the output of the third up-sampling module into a second 1*1 convolution layer to generate source domain decoding features having a channel number of 3 and target domain decoding features having a channel number of 3, and to calculate the perceptual loss of the source domain decoding features and the target domain decoding features to obtain the loss L_ins as:
L_ins = E(G(S), G(T)),
where L_ins is the perceptual loss value of the source domain decoding features and the target domain decoding features, E is the perceptual loss calculation function, G(S) refers to the source domain decoding features generated from the source domain instance features by steps S402-S407, and G(T) refers to the target domain decoding features generated from the target domain instance features by steps S402-S407,
wherein the global feature alignment module (GFA) includes sub-modules for performing the following operations respectively:
step S501: obtaining the source domain features and the target domain features;
step S502: sending the source domain features and the target domain features into a gradient reversal layer, which reverses the errors transmitted thereto so that the network training goals before and after the gradient reversal layer are opposite to each other so as to achieve an adversarial effect, and which outputs classification features;
step S503: sending the classification features into a classifier, which includes a first classifier convolution layer, a first classifier activation layer, and a second classifier convolution layer, to distinguish the source domain features and the target domain features,
wherein the gradient reversal layer achieves a certain degree of feature alignment at the image level, and the loss function of the global feature alignment module (GFA) is the loss function of the classifier.
- The scene adaptive target detection method based on motion foreground according to claim 1, wherein:
the step B) includes sending the source domain consecutive frame samples and the target domain consecutive frame samples into a ResNet-101 that functions as the feature extraction network, and using the last layer of the obtained features as the source domain features and the target domain features.
- The scene adaptive target detection method based on motion foreground according to claim 1, wherein:
the classification regression module, using the source domain proposal boxes of the second proposal box and foreground box aggregation module (PFA2) generated in step S311, accesses a first classification regression module convolution layer to perform regression and classification of the source domain consecutive frame samples and the target domain consecutive frame samples; the loss functions involved include the classification regression loss function L_T and the RPN loss function L_RPN, and the source domain target detection algorithm loss function L_det is:
L_det = L_RPN + L_T,
where L_RPN and L_T are the RPN loss function and the classification regression loss function respectively, the subscript det represents the total loss function of the classification regression module, RPN represents the loss function of the RPN of the first stage of a two-stage target detection framework, and T represents the loss function of the classification and regression of the second stage of the two-stage target detection framework.
- The scene adaptive target detection method based on motion foreground according to claim 1, wherein:
the loss function L_img of the global feature alignment module is a cross-entropy loss function as:
L_img = -(1/N) Σ_{i=1}^{N} [y_i·log(p_i) + (1 - y_i)·log(1 - p_i)],
where N is the number of all samples in the source domain and the target domain, i is the sample label, y_i is the actual label of the sample indicating whether the sample belongs to the source domain or the target domain, and p_i is the probability of belonging to the respective category after passing through the classifier.
- The scene adaptive target detection method based on motion foreground according to claim 4, wherein the global loss function is:
L = L_det + λ_1·L_ins + λ_2·L_img,
where λ_1 and λ_2 take empirical values for measuring the contribution of each of the three losses to the final loss.
- The scene adaptive target detection method based on motion foreground according to claim 1, wherein:the motion target detection algorithm includes frame difference method and/or background elimination method.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111416174.2 | 2021-11-25 | ||
CN202111416174.2A CN114399697A (en) | 2021-11-25 | 2021-11-25 | Scene self-adaptive target detection method based on moving foreground |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023092582A1 true WO2023092582A1 (en) | 2023-06-01 |
Family
ID=81225521
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2021/134085 WO2023092582A1 (en) | 2021-11-25 | 2021-11-29 | A scene adaptive target detection method based on motion foreground |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN114399697A (en) |
WO (1) | WO2023092582A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115049870A (en) * | 2022-05-07 | 2022-09-13 | 电子科技大学 | Target detection method based on small sample |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110321813A (en) * | 2019-06-18 | 2019-10-11 | 南京信息工程大学 | Cross-domain pedestrian recognition methods again based on pedestrian's segmentation |
US20200125925A1 (en) * | 2018-10-18 | 2020-04-23 | Deepnorth Inc. | Foreground Attentive Feature Learning for Person Re-Identification |
CN112183274A (en) * | 2020-09-21 | 2021-01-05 | 深圳中兴网信科技有限公司 | Mud car detection method and computer-readable storage medium |
CN113052187A (en) * | 2021-03-23 | 2021-06-29 | 电子科技大学 | Global feature alignment target detection method based on multi-scale feature fusion |
CN113052184A (en) * | 2021-03-12 | 2021-06-29 | 电子科技大学 | Target detection method based on two-stage local feature alignment |
CN113158943A (en) * | 2021-04-29 | 2021-07-23 | 杭州电子科技大学 | Cross-domain infrared target detection method |
CN113343989A (en) * | 2021-07-09 | 2021-09-03 | 中山大学 | Target detection method and system based on self-adaption of foreground selection domain |
- 2021-11-25: CN application CN202111416174.2A, publication CN114399697A, status active (pending)
- 2021-11-29: WO application PCT/CN2021/134085, publication WO2023092582A1, status unknown
Also Published As
Publication number | Publication date |
---|---|
CN114399697A (en) | 2022-04-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Li et al. | A free lunch for unsupervised domain adaptive object detection without source data | |
US11514272B2 (en) | Apparatus and method for training classification model and apparatus for performing classification by using classification model | |
WO2022077646A1 (en) | Method and apparatus for training student model for image processing | |
WO2023077821A1 (en) | Multi-resolution ensemble self-training-based target detection method for small-sample low-quality image | |
CN113033537A (en) | Method, apparatus, device, medium and program product for training a model | |
CN115393687A (en) | RGB image semi-supervised target detection method based on double pseudo-label optimization learning | |
CN115797736B (en) | Training method, device, equipment and medium for target detection model and target detection method, device, equipment and medium | |
WO2023092582A1 (en) | A scene adaptive target detection method based on motion foreground | |
CN112966553A (en) | Strong coupling target tracking method, device, medium and equipment based on twin network | |
Jeong et al. | Enriching SAR ship detection via multistage domain alignment | |
CN115861229A (en) | YOLOv5 s-based X-ray detection method for packaging defects of components | |
Niu et al. | Boundary-aware RGBD salient object detection with cross-modal feature sampling | |
JP2023029236A (en) | Method for training object detection model and object detection method | |
Zhang et al. | An industrial interference-resistant gear defect detection method through improved YOLOv5 network using attention mechanism and feature fusion | |
Shi et al. | Anchor free remote sensing detector based on solving discrete polar coordinate equation | |
Huang et al. | Drone-based car counting via density map learning | |
Chen et al. | Small target detection algorithm for printing defects detection based on context structure perception and multi-scale feature fusion | |
Liu et al. | A coarse to fine framework for object detection in high resolution image | |
Ding et al. | DeoT: an end-to-end encoder-only Transformer object detector | |
JP2023069083A (en) | Learning apparatus, learning method, learning program, object detection apparatus, object detection method, object detection method, learning support system, learning support method, and learning support program | |
Liangjun et al. | MSFA-YOLO: A Multi-Scale SAR Ship Detection Algorithm Based on Fused Attention | |
An et al. | Enhancing Small Object Detection in Aerial Images: A Novel Approach with PCSG Model | |
Song et al. | Target representation and classification with limited data in synthetic aperture radar images | |
Zhao et al. | Refined Infrared Small Target Detection Scheme with Single-Point Supervision | |
Xu et al. | CFM-YOLOv5: CFPNet moudle and muti-target prediction head incorporating YOLOv5 for metal surface defect detection |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21965311; Country of ref document: EP; Kind code of ref document: A1 |
| | NENP | Non-entry into the national phase | Ref country code: DE |