WO2023092582A1 - A scene adaptive target detection method based on motion foreground - Google Patents

A scene adaptive target detection method based on motion foreground

Info

Publication number
WO2023092582A1
WO2023092582A1 PCT/CN2021/134085 CN2021134085W WO2023092582A1 WO 2023092582 A1 WO2023092582 A1 WO 2023092582A1 CN 2021134085 W CN2021134085 W CN 2021134085W WO 2023092582 A1 WO2023092582 A1 WO 2023092582A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
domain
features
source domain
boxes
Prior art date
Application number
PCT/CN2021/134085
Other languages
French (fr)
Inventor
Haimiao Hu
Mingzhu Li
Yidan ZHANG
Hongxu Jiang
Original Assignee
Hangzhou Innovation Institute, Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Innovation Institute, Beihang University filed Critical Hangzhou Innovation Institute, Beihang University
Publication of WO2023092582A1 publication Critical patent/WO2023092582A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/62Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; Pattern tracking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

With the development of deep learning technology, the requirements for model generalization performance in real environments are increasing. The influence of illumination and background on model generalization performance has received wide attention. The invention discloses a scene adaptive target detection method based on motion foreground. In this method, the motion foreground target boxes are used effectively based on the prior that the motion foreground and the global target data distribution are consistent, and the instance feature similarity is calculated through a decoder, which greatly improves the effect of the model in the target domain. The experimental results show that the target detection effect of the method is greatly improved in a real environment.

Description

A scene adaptive target detection method based on motion foreground Field of the invention
The invention relates to a scene adaptive target detection method based on motion foreground.
Background art
In the field of computer vision, target detection is an important topic. The task of target detection is to find the regions of interest in images and videos, and to determine their categories and locations. At present, many methods based on deep learning can achieve good results on benchmark datasets. However, due to the existence of domain differences, the performance of a model degrades when the target size, camera angle, lighting and background environment change. The simplest and most effective way to solve this problem is to train models in the same domain. However, on the one hand, manual annotation of datasets costs a lot of manpower and resources; on the other hand, many practical fields cannot be manually annotated. Therefore, in order to solve the degradation of model performance caused by different data distributions, target detection methods based on domain adaptation came into being.
At present, target detection methods based on domain adaptation include feature-based methods and model-based methods. The most classical method is to minimize the domain difference of features through adversarial training, so that the domain of proposal box features cannot be distinguished. This algorithm is called DA-FasterRCNN, and many related algorithms improve on it. Another class of algorithms realizes pixel-level domain alignment through adversarial generation.
However, the above algorithms only consider the domain differences in classification, while the domain differences in regression are not considered, so the model effect is not ideal when the scene changes. In addition, because the data distribution is unknown, suitable proposal boxes cannot be extracted in the candidate-region (RPN) stage of two-stage target detection, and it is also impossible to determine which feature regions need to be aligned during the feature alignment stage.
Summary of the invention
To solve the above technical problems in the prior art, the present invention provides a scene adaptive target detection method based on motion foreground.
According to one aspect of the invention, a scene adaptive target detection method based on motion foreground is provided, which includes the following steps:
A) obtaining the source domain dataset and the target domain dataset; the source domain dataset contains source domain RGB images, target detection manual labels and motion foreground target box labels; the target domain dataset contains target domain RGB images and motion foreground target box labels;
B) sending the source domain dataset and the target domain dataset into the feature extraction module to obtain the source domain features and the target domain features;
C) sending the source domain features and the target domain features from step B) into the first proposal boxes and foreground boxes aggregation module to obtain the source domain instance features and the target domain instance features;
D) sending the source domain features from step B) into the second proposal boxes and foreground boxes aggregation module to obtain the source domain classification regression features;
E) sending the source domain instance features and the target domain instance features from step C) into the generative similarity measurement module to compute loss; the network is optimized and the domain difference is reduced in this way;
F) sending the source domain classification regression features from step D) into the classification regression module to compute loss and to optimize the network;
G) sending the source domain features and the target domain features from step B) into the global feature alignment module to compute loss and to optimize the network.
The acquisition methods of the motion foreground target boxes in step A) include ViBe, the Gaussian mixture model, the frame difference method and optical flow.
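By way of illustration, the following Python sketch extracts motion foreground target boxes with the frame difference method, one of the options listed above; the embodiment described later uses ViBe, and the OpenCV calls, threshold values and minimum area here are illustrative assumptions rather than the patent's implementation.

```python
import cv2
import numpy as np

def foreground_boxes(prev_frame, curr_frame, diff_thresh=25, min_area=100):
    # Frame difference between two consecutive grayscale frames.
    g1 = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    g2 = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(g1, g2)
    # Binarize and dilate to get a motion mask.
    _, mask = cv2.threshold(diff, diff_thresh, 255, cv2.THRESH_BINARY)
    mask = cv2.dilate(mask, np.ones((3, 3), np.uint8), iterations=2)
    # Each sufficiently large connected region becomes a motion foreground target box.
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    boxes = []
    for c in contours:
        if cv2.contourArea(c) >= min_area:
            x, y, w, h = cv2.boundingRect(c)
            boxes.append((x, y, x + w, y + h))
    return boxes
```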
In the training process, the RPN proposal boxes with high confidence are combined with the motion foreground target boxes. After the samples are equalized, the source domain instance features and the target domain instance features are extracted.
In the training process, the RPN proposal boxes are combined with the motion foreground target boxes. Then, the source domain classification regression features are extracted.
In the training process, the decoder is used to reconstruct the source domain instance features and the target domain instance features, and the decoding features are obtained. Then, the similarity loss of the decoding features is calculated to achieve instance feature alignment.
In the training process, the classification regression loss of the source domain dataset is calculated using the ground-truth labels of the source domain target boxes, to ensure the accuracy of source domain target detection.
The global feature alignment module includes gradient reversal layer (GRL) and classifier for achieving image level feature alignment.
The improvements of the present invention over the prior art include that a scene adaptive target detection method based on motion foreground is provided in which the prior knowledge of the motion foreground is effectively utilized, that the decoder is used for feature alignment to obtain a better detection effect, and that the generalization performance of the model in new scenes is improved.
Brief description of the drawings
FIG. 1 is a flow chart of a scene adaptive target detection method based on the motion foreground according to an embodiment of the invention;
FIG. 2 is the schematic diagram of first proposal boxes and foreground boxes aggregation module (PFA1) according to an embodiment of the invention;
FIG. 3 is the schematic diagram of second proposal boxes and foreground boxes aggregation module (PFA2) according to an embodiment of the invention;
FIG. 4 is the schematic diagram of generative similarity measurement (GSM) module according to an embodiment of the invention;
FIG. 5 is the schematic diagram of global feature alignment (GFA) module according to an embodiment of the invention.
Detailed description of the invention
In the embodiment according to the present invention as shown in Fig. 1, the source domain dataset is D_S, where n_S represents the actual number of samples in the source domain; each source domain sample i is associated with the sample image, the set of target box coordinate values for the source domain sample i, and the target category for the source domain sample i. There is only a single category, pedestrian, in this embodiment. Each source domain sample i is also associated with the set of motion foreground target box coordinate values; the number of boxes in the target box set and the number of boxes in the motion foreground target box set may be inconsistent. The target domain dataset is D_T, where n_T represents the actual number of samples in the target domain; each target domain sample i is associated with the sample image and the set of motion foreground target box coordinate values for the target domain sample i.
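For illustration only, the per-sample records implied by these dataset definitions might be organized as below; the field names and the box coordinate convention are assumptions, not the patent's notation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), an assumed convention

@dataclass
class SourceSample:
    image_path: str
    target_boxes: List[Box]                                    # manually annotated target boxes
    category: str = "pedestrian"                               # single category in this embodiment
    foreground_boxes: List[Box] = field(default_factory=list)  # motion foreground target boxes

@dataclass
class TargetSample:
    image_path: str
    foreground_boxes: List[Box] = field(default_factory=list)  # no manual labels needed
```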
The scene adaptive target detection method based on the motion foreground according to the present invention trains the model using the source domain data and the motion foreground data of the target domain, so that the model can have a good detection effect even without an annotated target domain (T) dataset; the method includes the following steps:
A) sending the continuous frame sample set in the source domain and the continuous frame sample set in the target domain into the ViBe motion target detection algorithm to obtain the source domain motion foreground target boxes and the target domain motion foreground target boxes (S represents the source domain and T represents the target domain), so that the source domain dataset D_S and the target domain dataset D_T are obtained;
B) sending the source domain dataset D_S and the target domain dataset D_T into the feature extraction module (S101) to obtain the source domain features f1 and the target domain features f2; ResNet-101, for example, is used as the backbone network of the feature extraction module in an embodiment of the invention, as sketched in code after step G) below;
C) sending the source domain features f1 and the source domain motion foreground target boxes into PFA1 to obtain the source domain instance features pfs (S112), and sending the target domain features f2 and the target domain motion foreground target boxes into PFA1 to obtain the target domain instance features pft (S113);
D) sending the source domain features f1 and the source domain motion foreground target boxes into PFA2 to obtain the source domain classification regression features crs (S112);
E) sending the source domain classification regression features crs into the classification regression module (S121), where the classification regression loss of the source domain dataset is calculated using the ground-truth labels of the source domain target boxes, so that the network weights of the feature extraction module and the classification regression module are optimized by training on the source domain dataset;
F) sending the source domain instance features pfs and the target domain instance features pft into the generative similarity measurement (GSM) module (S122), so that, by training on the source domain dataset and the target domain dataset, the instance features of the source domain and the target domain are made as similar as possible, the network weights of the feature extraction module and the GSM module are optimized, and the generalization performance of the model is improved;
G) sending the source domain features f1 and the target domain features f2 into the global feature alignment (GFA) module, so that, by training on the source domain dataset and the target domain dataset, the features of the source domain and the target domain are made as similar as possible, the network weights of the feature extraction module and the GFA module (gradient reversal layer (GRL) and classifier) are optimized, and the domain of the source domain feature f1 and of the target domain feature f2 cannot be distinguished.
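As noted in step B) above, ResNet-101 may serve as the backbone of the feature extraction module. A minimal PyTorch sketch is given below, assuming a recent torchvision ResNet-101 truncated before its pooling and classification head; the class name and truncation point are assumptions rather than the patent's exact network.

```python
import torch.nn as nn
import torchvision

class FeatureExtractor(nn.Module):
    def __init__(self):
        super().__init__()
        # ImageNet-pretrained ResNet-101 (requires torchvision >= 0.13 for the weights API).
        resnet = torchvision.models.resnet101(weights="IMAGENET1K_V1")
        # Keep all layers up to and including layer4, drop avgpool/fc,
        # so the module outputs a spatial feature map (f1 or f2).
        self.body = nn.Sequential(*list(resnet.children())[:-2])

    def forward(self, x):
        return self.body(x)

# Usage: f1 = FeatureExtractor()(source_batch)  # shape (N, 2048, H/32, W/32)
```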
According to a further aspect of the invention, as shown in FIG. 2, the first proposal boxes and foreground boxes aggregation module PFA1 comprises sub-modules for performing the following steps respectively:
Step S201: sending the continuous frame sample set in the source domain and the continuous frame sample set in the target domain into the RPN network (region proposal network) to obtain the set of positive and negative proposal boxes, where the jth proposal box is generated for the ith image sample of the source domain or the target domain, and C represents the number of proposal boxes generated in the RPN network, which is set to 64 in an embodiment of the invention;
Step S202: sending the continuous frame sample set in the source domain and the continuous frame sample set in the target domain into the ViBe motion target detection algorithm to obtain the source domain motion foreground target boxes and the target domain motion foreground target boxes, where fb_i represents the ith set of motion foreground target boxes;
Step S211: selecting, from the set of positive and negative proposal boxes, the proposal boxes whose confidence is greater than the preset threshold TH, where TH is set to 0.7 in an embodiment of the invention;
Step S212: merging the proposal boxes obtained in S211 with the motion foreground target boxes obtained in S202;
Step S213: sending the output of S212 into the sample equalization filter to obtain the source domain PFA1 proposal box set and the target domain PFA1 proposal box set, where {b_if}_j represents the jth box of the PFA1 proposal box set in the ith image sample of the dataset, f denotes the set of proposal boxes generated in PFA1, S represents the source domain, T represents the target domain, C_Sf and C_Tf respectively represent the number of boxes in the combined set of proposal boxes and motion foreground target boxes in the source domain and in the target domain, and C_Sf = C_Tf.
In the sample equalization filter, a fixed sample number f_num is set, which is set to 8 in an embodiment of the invention, so that the number of PFA1 proposal boxes in the ith sample in the source domain (S) is consistent with that in the ith sample in the target domain (T), so as to eliminate sample imbalance.
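A minimal NumPy sketch of the S211-S213 merge-and-equalize logic follows, assuming boxes are plain (x1, y1, x2, y2) arrays and at least one box is available per image; the function name, the random duplication strategy used to reach f_num, and the default values are illustrative assumptions.

```python
import numpy as np

def pfa1_boxes(rpn_boxes, rpn_scores, fg_boxes, th=0.7, f_num=8, rng=None):
    rng = rng or np.random.default_rng()
    fg_boxes = np.asarray(fg_boxes, dtype=float)
    # S211: keep only RPN proposals whose confidence exceeds TH.
    confident = rpn_boxes[rpn_scores > th]
    # S212: merge confident proposals with the motion foreground target boxes.
    merged = np.concatenate([confident, fg_boxes], axis=0) if len(confident) else fg_boxes
    # S213: sample equalization filter - force exactly f_num boxes per image by
    # random sub-sampling or duplication, so source and target samples match.
    idx = rng.choice(len(merged), size=f_num, replace=len(merged) < f_num)
    return merged[idx]
```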
According to a further aspect of the invention, as shown in FIG. 3, the second proposal boxes and foreground boxes aggregation module PFA2 comprises sub-modules for performing the following steps respectively:
Step S301: sending the continuous frame sample set in the source domain into the RPN network (region proposal network) to obtain the set of source domain positive and negative proposal boxes, where the jth proposal box is generated for the ith image sample of the source domain, and C represents the number of proposal boxes generated in the RPN network, which is set to 64 in an embodiment of the invention;
Step S302: sending the continuous frame sample set in the source domain into the ViBe motion target detection algorithm to obtain the source domain motion foreground target boxes, where fb_i represents the ith set of motion foreground target boxes;
Step S311: merging the set of source domain positive and negative proposal boxes obtained in S301 with the source domain motion foreground target boxes obtained in S302 to obtain the source domain PFA2 proposal box set; by adding the motion foreground target boxes, this overcomes the problem that the two domains cannot generate accurate proposal boxes when the target size difference is large. Here {b_ia}_j represents the jth box of the PFA2 proposal box set in the ith image sample of the dataset, "a" denotes the set of proposal boxes generated in PFA2, S represents the source domain, and C_Sa represents the number of boxes in the combined set of proposal boxes and motion foreground target boxes in the source domain.
According to a further aspect of the invention, in the classification and regression module S121, the source domain feature f1 and the PFA2 proposal box set are input into classifiers and regressors to complete regression and classification of samples, and the loss function of this part is as follows:
L_det = L_RPN + L_T,
where L_det represents the loss function of the source domain detection, including L_RPN (RPN loss function) and L_T (classification regression loss function); det refers to the total loss function of the classification regression module, RPN refers to the loss function of the first-stage RPN stage of the two-stage target detection framework, and T refers to the loss function of the second-stage classification regression stage of the two-stage target detection framework. In an embodiment, cross entropy loss is used for the classification loss and mean square error (MSE) loss is used for the regression loss.
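The composition L_det = L_RPN + L_T can be sketched as below, assuming each stage contributes a cross-entropy classification term and an MSE regression term as stated for the embodiment; the tensor shapes and the helper name are assumptions.

```python
import torch.nn.functional as F

def detection_loss(rpn_cls_logits, rpn_labels, rpn_deltas, rpn_targets,
                   roi_cls_logits, roi_labels, roi_deltas, roi_targets):
    # L_RPN: first-stage objectness classification + box regression.
    l_rpn = (F.cross_entropy(rpn_cls_logits, rpn_labels)
             + F.mse_loss(rpn_deltas, rpn_targets))
    # L_T: second-stage category classification + box regression.
    l_t = (F.cross_entropy(roi_cls_logits, roi_labels)
           + F.mse_loss(roi_deltas, roi_targets))
    return l_rpn + l_t  # L_det
```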
According to a further aspect of the invention, as shown in FIG. 4, the generative similarity measurement module includes sub-modules for performing the following operations respectively:
Step S401: applying the PFA1 proposal box set generated by the PFA1 module to the source domain feature f1 and the target domain feature f2 extracted by the feature extraction module, to obtain the source domain instance feature f_S and the target domain instance feature f_T;
Step S402: sending the source domain instance feature f_S and the target domain instance feature f_T into an adaptive average pooling layer to obtain the source domain pooling feature and the target domain pooling feature f_Ss402, f_Ts402, which have an output size of 8*8 and a number of channels equal to the number of channels of the source domain instance feature f_S;
Step S403: sending the output from S402 into a first 1*1 convolution layer to obtain the first source domain convolution feature and the first target domain convolution feature f_Ss403, f_Ts403, which each have a number of channels of 1024 in an embodiment of the invention;
Step S404: sending the output from S403 into a first up-sampling layer to obtain the first source domain up-sampling feature and the first target domain up-sampling feature f_Ss404, f_Ts404, which each have an output size of 16*16 and a number of channels of 256 in an embodiment of the invention, where the up-sampling module comprises an interpolation up-sampling layer, a convolution layer and a batch normalization layer;
Step S405: sending the output from S404 into a second up-sampling layer to obtain the second source domain up-sampling feature and the second target domain up-sampling feature f_Ss405, f_Ts405, which each have an output size of 32*32 and a number of channels of 256 in an embodiment of the invention;
Step S406: sending the output from S405 into a third up-sampling layer to obtain the third source domain up-sampling feature and the third target domain up-sampling feature f_Ss406, f_Ts406, which each have an output size of 64*64 and a number of channels of 256 in an embodiment of the invention;
Step S407: sending the output from S406 into a second 1*1 convolution layer to obtain the source domain decoding feature and the target domain decoding feature f_SG, f_TG, which each have a number of channels of 3 in an embodiment of the invention;
computing the perceptual loss of the source domain decoding feature and the target domain decoding feature f_SG, f_TG to obtain the loss L_ins:
L_ins = E(G(S), G(T)),
where E represents the perceptual loss, a loss function used to measure the similarity between images; L_ins represents the perceptual loss value of the source domain decoding feature and the target domain decoding feature f_SG, f_TG; E is the calculation function of the perceptual loss (the perceptual loss function is existing technology); and G(S), G(T) respectively represent the source domain decoding features and the target domain decoding features f_SG, f_TG generated by steps S402-S407 when the source domain instance feature and the target domain instance feature f_S, f_T are input into the shared decoder G. This scheme can effectively measure the similarity between the instance features of the two domains (the source domain and the target domain). Through the training of the feature extraction module and the GSM module, the instance features of the source domain and of the target domain can be made as similar as possible, and the accuracy of the classification regression module in the target domain can be guaranteed. Moreover, the use of the decoder enhances the generalization performance of the model, reduces the risk of overfitting and reduces the failure rate of model training.
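A minimal PyTorch sketch of such a shared decoder G is given below, following the 8*8 pooling, 1*1 convolution to 1024 channels, three interpolation up-sampling blocks at 256 channels, and final 1*1 convolution to 3 channels described in steps S402-S407; the activation functions and the input channel count are assumptions. The perceptual loss E would then be computed on the two decoded maps, for example with a fixed pretrained feature network, although the patent does not fix a particular implementation.

```python
import torch
import torch.nn as nn

class UpBlock(nn.Module):
    # Interpolation up-sampling layer + convolution layer + batch normalization layer (S404-S406).
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(out_ch)

    def forward(self, x):
        return torch.relu(self.bn(self.conv(self.up(x))))

class Decoder(nn.Module):
    def __init__(self, in_ch=2048):          # in_ch is an assumption (backbone-dependent)
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(8)   # S402: 8x8 pooling
        self.conv1 = nn.Conv2d(in_ch, 1024, 1)  # S403: 1x1 conv, 1024 channels
        self.up1 = UpBlock(1024, 256)         # S404: 16x16, 256 channels
        self.up2 = UpBlock(256, 256)          # S405: 32x32, 256 channels
        self.up3 = UpBlock(256, 256)          # S406: 64x64, 256 channels
        self.conv2 = nn.Conv2d(256, 3, 1)     # S407: 1x1 conv, 3 channels

    def forward(self, f_inst):
        x = self.conv1(self.pool(f_inst))
        x = self.up3(self.up2(self.up1(x)))
        return self.conv2(x)                  # decoding feature f_SG or f_TG
```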
According to a further aspect of the invention, as shown in Figure 5, the global feature alignment module GFA comprises sub-modules for performing the following operations respectively:
Step S501: obtaining the source domain feature f1 and the target domain feature f2 generated by the feature extraction module;
Step S502: sending the source domain feature f1 and the target domain feature f2 into the GRL (gradient reversal layer). In conventional back propagation, the loss (the difference between the predicted value and the real value) is propagated layer by layer, and each layer calculates the gradient according to the propagated loss and then updates its parameters. The GRL inverts the gradients transmitted to it, so that the network training objectives before and after the GRL are opposite, so as to achieve an adversarial effect;
Step S503: sending the source domain feature f1 and the target domain feature f2 into a classifier to distinguish the source domain feature from the target domain feature, where the classifier includes convolution layers and an activation layer for performing the operations in Steps S511-S513 respectively;
where the loss function of the global feature alignment module GFA is the loss function of the classifier, L_img. In an embodiment of the invention, it is the cross entropy loss function:
L_img = -(1/N) Σ_{i=1}^{N} [y_i log(p_i) + (1 - y_i) log(1 - p_i)],
where N is the number of all samples in the source domain and the target domain, i is the sample index, y_i indicates whether the actual label of the sample belongs to the source domain or the target domain, and p_i is the probability that the sample belongs to the respective category output by the classifier.
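A minimal PyTorch sketch of the GFA path (steps S501-S503) is given below: a gradient reversal layer implemented as a custom autograd function, followed by a small convolutional domain classifier trained with cross entropy; the classifier widths, kernel sizes and helper names are assumptions rather than the patent's exact layers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Function

class GradReverse(Function):
    # S502: identity in the forward pass, negated (optionally scaled) gradient in
    # the backward pass, so the feature extractor is trained adversarially.
    @staticmethod
    def forward(ctx, x, alpha=1.0):
        ctx.alpha = alpha
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output.neg() * ctx.alpha, None

def grad_reverse(x, alpha=1.0):
    return GradReverse.apply(x, alpha)

class DomainClassifier(nn.Module):
    # S503: convolution and activation layers that predict, per spatial location,
    # whether a feature map comes from the source domain.
    def __init__(self, in_ch=2048):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, 256, kernel_size=1)
        self.conv2 = nn.Conv2d(256, 1, kernel_size=1)

    def forward(self, feat, alpha=1.0):
        x = grad_reverse(feat, alpha)
        x = F.relu(self.conv1(x))
        return self.conv2(x)  # domain logits

def image_alignment_loss(logits, is_source):
    # L_img: cross entropy between the predicted domain and the true domain label.
    target = torch.full_like(logits, 1.0 if is_source else 0.0)
    return F.binary_cross_entropy_with_logits(logits, target)
```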
In an embodiment of the invention, the final global loss function is:
L = L_det + λ_1 L_ins + λ_2 L_img,
where λ_1, λ_2 are empirical values for weighting the contribution of the three losses to the final loss. In this embodiment, both are taken as 1.
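A short sketch of composing the final global loss is given below, assuming the three per-module losses have already been computed as in the sketches above; the helper name is an assumption, and the weight defaults of 1 mirror the embodiment.

```python
def total_loss(l_det, l_ins, l_img, lambda_1=1.0, lambda_2=1.0):
    # L = L_det + lambda_1 * L_ins + lambda_2 * L_img (both weights are 1 in the embodiment).
    return l_det + lambda_1 * l_ins + lambda_2 * l_img

# Usage during a training step (optimizer construction assumed elsewhere):
# loss = total_loss(l_det, l_ins, l_img)
# loss.backward(); optimizer.step(); optimizer.zero_grad()
```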
The advantages of the invention include:
(1) The prior of the motion foreground is fully utilized in the invention and is well integrated into the training framework. By using PFA1 and PFA2, the proposal boxes extracted from the RPN network are effectively fused with the motion foreground target boxes, so that the effect of the model is optimized by the two kinds of proposal boxes complementing each other.
(2) In order to reduce the risk of model overfitting and improve the accuracy of target box regression, the present invention abandons the existing classifier-based feature alignment mode. For instance feature alignment, a decoder is used to reduce overfitting, with the perceptual loss as its loss function. The effect of the model in the target domain is greatly improved in this way.
(3) In the fusion of the proposal boxes extracted from the RPN network and the motion foreground target boxes, sample equalization is effectively realized through the sample equalization filter.
In order to verify the validity of the method, the inventors conducted the following experiments, in which the testing process only followed the testing process of the two-stage detection algorithm, so the inference speed was consistent with that of the conventional two-stage algorithm. By adding the above components during the model training process, the trained model achieved good results in both the source and target domains.
The source domain dataset and target domain dataset adopted in the specific implementation scheme were captured in a real world scenario. They were named DML dataset and ZN dataset respectively, where DML dataset was the source domain dataset and ZN dataset was the target domain dataset.
Experimental details: In all experiments, the parameters were consistent with the original DA-FasterRCNN algorithm. ResNet-50 was used as the backbone network, and ImageNet pre-training weights were used to initialize the backbone network. After training on 70,000 images, the average precision (mAP) on the target domain was calculated. All experiments were based on the PyTorch framework, and the NVIDIA GTX-2080Ti hardware platform was used.
A comparison of experimental results is shown in Table 1. DA-FasterRCNN is a classical domain adaptive detection algorithm. Method PFA1 is an improved method obtained by adding the first proposal boxes and foreground boxes aggregation (PFA1) module to the classical algorithm, that is, adding the motion foreground target boxes to the RPN proposal boxes. It can be seen that the method of the invention effectively improved the detection effect in the target domain.
Table 1: Results of domain adaptive detection

Claims (6)

  1. A scene adaptive target detecting method based on motion foreground, for training a model based on source domain data and target domain foreground data so that the model has a good detection effect also in the target domain, including the steps of:
    A) feeding a set of source domain consecutive frame samples and a set of target domain consecutive frame samples into a motion target detection algorithm to obtain output of motion foreground target boxes of the source domain consecutive frame samples and motion foreground target boxes of the target domain consecutive frame samples, wherein the motion foreground target boxes form the source domain data set and the target domain data set together with the source domain labeling tags;
    B) feeding the source domain data set and target domain data set into feature extraction module to obtain source domain features and target domain features;
    C) feeding the source domain features, the target domain features, and the motion foreground target box into first proposal box and foreground box aggregation module (PFA1) to obtain source domain instance features and target domain instance features;
    D) feeding the source domain features and source domain motion foreground target box into second proposal box and foreground box aggregation module (PFA2) to obtain source domain classification regression features;
    E) sending the source domain classification regression features into classification regression module to calculate loss with source target boxes truth label so as to obtain optimized detection effect on source domain;
    F) sending the source domain instance features and target domain instance features into generative similarity measurement module (GSM) to allow the source domain instance features and target domain instance features to be as similar as possible and to improve generalization performance, thereby reducing over-fitting;
    G) sending the source domain features and the target domain features into global feature alignment module (GFA) to align image features so that the domains which the source domain features and target domain features belong to cannot be distinguished,
    wherein the first proposal box and foreground box aggregation module (PFA1) includes sub-modules for performing the following  operations respectively:
    step S201: sending the source domain consecutive frame samples and the target domain consecutive frame samples into RPN network to generate set of source domain positive and negative proposal boxes and set of target domain positive and negative proposal boxes;
    step S211: selecting the source domain positive and negative proposal boxes and the target domain positive and negative proposal boxes whose confidence is greater than a preset threshold TH from the set of source domain positive and negative proposal boxes and the set of target domain positive and negative proposal boxes generated in step S201;
    step S202: obtaining source domain motion foreground target boxes and target domain motion foreground target boxes by the motion target detection algorithm;
    step S212: obtaining source domain combined target boxes and target domain combined target boxes by combining the source domain positive and negative proposal boxes and the target domain positive and negative proposal boxes with confidence greater than the preset threshold TH obtained in step S211 with the source domain motion foreground target boxes and the target domain motion foreground target boxes obtained in step S202;
    step S213: obtaining source domain proposal boxes and target domain proposal boxes of the first proposal box and foreground box aggregation module (PFA1) by sample equalization filter,
    wherein the sample equalization filter, by copying or deleting the source domain combined target boxes and the target domain combined target boxes obtained in step S212, allows that the number of source domain combined target boxes included in the ith sample in the source domain (S) and the number of the target domain combined target boxes included in the ith sample in the target domain (T) are the same, to effectively use motion foreground priori and eliminate sample imbalance,
    wherein said second proposal box and foreground box aggregation module (PFA2) includes sub-modules for performing the following operations respectively:
    step S301: allowing the source domain consecutive frame samples to go through the RPN network to generate set of source domain positive and negative proposal boxes;
    step S302: obtaining the source domain motion foreground target box using the motion target detection algorithm;
    step S311: adding the set of source domain positive and negative proposal boxes and the source domain motion foreground target box to generate set of source domain proposal boxes of the second proposal box and foreground box aggregation module (PFA2) , to avoid that good  proposal target boxes cannot be generated when the target sizes of the source domain and the target domain differ too much,
    wherein the generative similarity measurement module (GSM) includes sub-modules for performing the following operations respectively:
    step S401: intercepting the source domain instance features in the source domain features using the source domain proposal box of the first proposal box and foreground box aggregation module (PFA1) generated in step S213, and intercepting the target domain instance features in the target domain features using the target domain proposal box of the first proposal box and foreground box aggregation module (PFA1) generated in step S213;
    step S402: sending the source domain instance features and the target domain instance features into an adaptive average pooling layer, which outputs pooling features having a size of 8*8 and a channel number that equals the channel number of the source domain instance features;
    step S403: sending the pooling features obtained in step S402 into a first 1*1 convolution layer, which is a 1*1 convolution layer and which outputs first source domain convolution layer features and first target domain convolution layer features;
    step S404: sending the first source domain convolution layer features and first target domain convolution layer features obtained in step S403 into a first up-sampling module for performing interpolation up-sampling, convolution, and/or batch normalization layer operations, to output first source domain up-sampling layer features and first target domain up-sampling layer features;
    step S405, sending the output of the first up-sampling module into a second up-sampling module for performing interpolation up-sampling, convolution, and/or batch normalization layer operations, to output second source domain up-sampling layer features and second target domain up-sampling layer features;
    step S406, sending the output of the second up-sampling module into a third up-sampling module for performing interpolation up-sampling, convolution, and/or batch normalization layer operations, to output third source domain up-sampling layer features and third target domain up-sampling layer features;
    step S407, sending the output of the third up-sampling module into a second 1*1 convolution layer, which is a 1*1 convolution layer, to generate source domain decoding features having a channel number of 3 and target domain decoding features having a channel number of 3, and to calculate the perceptual loss of the source domain decoding features and the target domain decoding features to obtain the loss L_ins as:
    L_ins = E(G(S), G(T)),
    where L_ins is the perceptual loss value of the source domain decoding features and the target domain decoding features, E is the perceptual loss calculation function, G(S) refers to the source domain decoding feature generated from the source domain instance feature by steps S402-S407, and G(T) refers to the target domain decoding feature generated from the target domain instance feature by steps S402-S407,
    where the global feature alignment module (GFA) includes sub-modules for performing the following operations respectively:
    step S501: obtaining the source domain features and the target domain features;
    step S502: sending the source domain features and the target domain features into a gradient reversal layer, which reverses the errors transmitted thereto so that the network training goals before and after the gradient reversal layer are opposite to each other so as to achieve the effect of confrontation, and which outputs classification features,
    step S503: sending the classification features into a classifier, which includes a first classifier convolution layer, a first classifier activation layer, and a second classifier convolution layer, to distinguish the source domain features and the target domain features,
    where the gradient reversal layer achieves a certain degree of feature alignment at image level, and the loss function of the global feature alignment module (GFA) is the loss function of the classifier.
  2. The scene adaptive target detection method based on motion foreground according to claim 1, wherein:
    the step B) includes sending the source domain consecutive frame samples and target domain consecutive frame samples into a ResNet-101 that functions as a feature extraction network, and using the last layer of the obtained features as the source domain features and the target domain features.
  3. The scene adaptive target detection method based on motion foreground according to claim 1, wherein:
    the classification regression module, using the source domain proposal boxes of the second proposal box and foreground box aggregation module (PFA2) generated in step S311, accesses a first classification regression module convolution layer to perform regression and classification of the source domain consecutive frame samples and the target domain consecutive frame samples; the loss functions involved include a classification regression loss function L_T and an RPN loss function L_RPN, and the source domain target detection algorithm loss function L_det is:
    L_det = L_RPN + L_T,
    where L_RPN and L_T are the RPN loss function and the classification regression loss function respectively, the subscript det represents the total loss function of the classification regression module, RPN represents the loss function of the RPN of the first stage of a two-stage target detection framework, and T represents the loss function of the classification and regression of the second stage of the two-stage target detection framework.
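
As a rough illustration of the loss composition in claim 3 (not the patent's exact network), a standard two-stage detector such as torchvision's Faster R-CNN returns per-component losses during training, which can be grouped into L_RPN and L_T; the grouping and the detector choice are assumptions.

```python
import torch
import torchvision

# A generic two-stage detector used only to illustrate the loss composition.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
    weights=None, weights_backbone=None, num_classes=2)
model.train()

images = [torch.rand(3, 600, 800)]
targets = [{"boxes": torch.tensor([[100.0, 120.0, 300.0, 400.0]]),
            "labels": torch.tensor([1])}]
loss_dict = model(images, targets)

# First-stage (RPN) loss and second-stage classification/regression loss of claim 3.
L_RPN = loss_dict["loss_objectness"] + loss_dict["loss_rpn_box_reg"]
L_T = loss_dict["loss_classifier"] + loss_dict["loss_box_reg"]
L_det = L_RPN + L_T
```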
  4. The scene adaptive target detection method based on motion foreground according to claim 1, wherein:
    the loss function L_img of the global feature alignment module is a cross-entropy loss function as:
    L_img = -(1/N) Σ_{i=1}^{N} [y_i·log(p_i) + (1 - y_i)·log(1 - p_i)],
    where N is the number of all samples in the source domain and the target domain, i is the sample index, y_i is the actual label of the sample indicating whether the sample belongs to the source domain or the target domain, and p_i is the probability of the sample belonging to its respective category after passing through the classifier.
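
Steps S501-S503 together with the loss L_img of claim 4 might be sketched as below; the gradient reversal coefficient, the channel widths, and the use of a logit-based binary cross-entropy (equivalent to the formula above applied to sigmoid probabilities p_i) are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Gradient reversal layer (step S502): identity in the forward pass,
    gradients multiplied by -lamb in the backward pass, so the feature
    extractor and the domain classifier are trained adversarially."""
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lamb * grad_output, None

class DomainClassifier(nn.Module):
    """Classifier of step S503: convolution, activation, convolution (widths assumed)."""
    def __init__(self, in_ch=2048):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, 256, kernel_size=1)
        self.conv2 = nn.Conv2d(256, 1, kernel_size=1)

    def forward(self, feat, lamb=1.0):
        feat = GradReverse.apply(feat, lamb)
        logits = self.conv2(F.relu(self.conv1(feat)))
        return logits.mean(dim=(2, 3))   # one domain logit per image

clf = DomainClassifier()
src_feat = torch.randn(4, 2048, 19, 25)   # source domain features
tgt_feat = torch.randn(4, 2048, 19, 25)   # target domain features
logits = torch.cat([clf(src_feat), clf(tgt_feat)])
labels = torch.cat([torch.zeros(4, 1), torch.ones(4, 1)])   # y_i: 0 = source, 1 = target
L_img = F.binary_cross_entropy_with_logits(logits, labels)
```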
  5. The scene adaptive target detection method based on motion foreground according to claim 4, wherein the global loss function is:
    L = L_det + λ_1·L_ins + λ_2·L_img,
    where λ_1 and λ_2 take empirical values that measure the contribution of each of the three losses to the final loss.
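
A minimal sketch of the global loss of claim 5; the loss values and the empirical weights λ_1 and λ_2 below are placeholders, since the claim only states that they take empirical values.

```python
import torch

L_det = torch.tensor(1.2)   # detection loss of claim 3 (placeholder value)
L_ins = torch.tensor(0.4)   # instance-level perceptual loss of claim 1 (placeholder value)
L_img = torch.tensor(0.7)   # image-level domain loss of claim 4 (placeholder value)

lambda_1, lambda_2 = 0.1, 0.1                       # assumed empirical weights
L = L_det + lambda_1 * L_ins + lambda_2 * L_img     # global loss of claim 5
```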
  6. The scene adaptive target detection method based on motion foreground according to claim 1, wherein:
    the motion target detection algorithm includes a frame difference method and/or a background elimination method.
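
For claim 6, a minimal OpenCV sketch of the two motion target detection options, the frame difference method and background elimination (background subtraction); the threshold value and the MOG2 subtractor parameters are assumptions, not requirements of the claim.

```python
import cv2

def frame_difference_foreground(prev_frame, curr_frame, thresh=25):
    """Frame difference method: threshold the absolute difference of consecutive frames."""
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(curr_gray, prev_gray)
    _, mask = cv2.threshold(diff, thresh, 255, cv2.THRESH_BINARY)
    return mask

# Background elimination: maintain a background model and subtract it from each frame.
subtractor = cv2.createBackgroundSubtractorMOG2(history=200, varThreshold=16, detectShadows=False)

def background_elimination_foreground(frame):
    return subtractor.apply(frame)
```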
PCT/CN2021/134085 2021-11-25 2021-11-29 A scene adaptive target detection method based on motion foreground WO2023092582A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111416174.2 2021-11-25
CN202111416174.2A CN114399697A (en) 2021-11-25 2021-11-25 Scene self-adaptive target detection method based on moving foreground

Publications (1)

Publication Number Publication Date
WO2023092582A1 true WO2023092582A1 (en) 2023-06-01

Family

ID=81225521

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/134085 WO2023092582A1 (en) 2021-11-25 2021-11-29 A scene adaptive target detection method based on motion foreground

Country Status (2)

Country Link
CN (1) CN114399697A (en)
WO (1) WO2023092582A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115049870A (en) * 2022-05-07 2022-09-13 电子科技大学 Target detection method based on small sample

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110321813A (en) * 2019-06-18 2019-10-11 南京信息工程大学 Cross-domain pedestrian recognition methods again based on pedestrian's segmentation
US20200125925A1 (en) * 2018-10-18 2020-04-23 Deepnorth Inc. Foreground Attentive Feature Learning for Person Re-Identification
CN112183274A (en) * 2020-09-21 2021-01-05 深圳中兴网信科技有限公司 Mud car detection method and computer-readable storage medium
CN113052187A (en) * 2021-03-23 2021-06-29 电子科技大学 Global feature alignment target detection method based on multi-scale feature fusion
CN113052184A (en) * 2021-03-12 2021-06-29 电子科技大学 Target detection method based on two-stage local feature alignment
CN113158943A (en) * 2021-04-29 2021-07-23 杭州电子科技大学 Cross-domain infrared target detection method
CN113343989A (en) * 2021-07-09 2021-09-03 中山大学 Target detection method and system based on self-adaption of foreground selection domain

Also Published As

Publication number Publication date
CN114399697A (en) 2022-04-26

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application (Ref document number: 21965311; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)