CN113052184A - Target detection method based on two-stage local feature alignment - Google Patents

Target detection method based on two-stage local feature alignment Download PDF

Info

Publication number
CN113052184A
Authority
CN
China
Prior art keywords
feature
training
point
domain
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110270152.3A
Other languages
Chinese (zh)
Other versions
CN113052184B (en)
Inventor
贾海涛
莫超杰
鲜维富
刘博文
许文波
任利
周焕来
贾宇明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN202110270152.3A priority Critical patent/CN113052184B/en
Publication of CN113052184A publication Critical patent/CN113052184A/en
Application granted granted Critical
Publication of CN113052184B publication Critical patent/CN113052184B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a target detection technique based on two-stage local feature alignment. The method further enhances the generalization performance of target detection algorithms represented by Faster R-CNN across different application scenarios. Existing feature-alignment-based target detection techniques generally suffer from two problems: first, the foreground target feature regions are located inaccurately in the early stage of network training; second, feature alignment is not performed according to the foreground object classes, so the fine granularity of the algorithm is low. In the proposed two-stage feature alignment method, the first training stage uses standard candidate boxes generated from a feature center point map to address the inaccurate localization of foreground target feature regions, and the second training stage aligns the foreground target features according to their classification results, improving the fine granularity of the algorithm. The proposed method has a simple network structure and can be transplanted to other target detection algorithms.

Description

Target detection method based on two-stage local feature alignment
Technical Field
The invention relates to the field of transfer learning within deep learning, and concerns the application of a sub-technique of transfer learning, namely feature alignment (domain adaptation), to the target detection task.
Background
In recent years, deep learning has made breakthrough progress in the field of object detection: one-stage detection networks represented by SSD and the YOLO series, and two-stage detection networks represented by Faster R-CNN, perform excellently on the object detection task. However, this excellent performance depends on two conditions: first, a dataset with a sufficiently large number of samples and complete labels must be available for training the detection network; second, the feature spaces of the training scenario and the application scenario must be independently and identically distributed, or at least have approximately similar feature distributions. In practical applications, the application scenario (called the target domain in transfer learning) and the training scenario corresponding to the network's training samples (called the source domain) are often different. For example, a vehicle detector trained on vehicle images captured on roads in clear weather performs excellently in clear weather once training is finished, but its detection performance drops sharply in rain and fog. In summary, when the feature distributions of the source domain and the target domain differ greatly, that is, when the inter-domain difference is large, the performance of a deep neural network degrades significantly. This indicates that the generalization performance of deep neural networks is generally poor, and the same holds in the field of target detection. Although a dedicated dataset could be built for a specific application scenario, building an effective dataset is very costly: a huge number of samples must be collected and manually labeled. Moreover, in some specific application scenarios, such as enemy military facilities or medical images, it is difficult to collect enough samples for network training.
Inspired by the human ability to infer the general case from a single example when learning, transfer learning transfers knowledge from a source-domain dataset to a target domain, so that when a target detection network trained on the source-domain dataset is applied to a target domain whose feature space differs from that of the source domain, the generalization performance of the detection algorithm can be improved at low cost. The "knowledge" being transferred is what the source domain and the target domain have in common. Among current transfer learning algorithms, feature alignment (domain adaptation) methods work best; their core idea is to reduce the inter-domain difference so that the features extracted by the feature extractor of the target detection network are domain invariant, that is, the feature extractor can ignore the differences between the source and target domains in background and similar aspects and extract the feature components common to both domains. Existing feature-alignment-based target detection algorithms adopt the Faster R-CNN network as the detection framework, and they share two problems in the feature alignment process: 1) the foreground target regions are selected with the candidate boxes generated by the RPN, but in the early stage of training the accuracy of the RPN candidate boxes is low, and the feature regions corresponding to these boxes contain a large amount of background noise, which negatively affects feature alignment; 2) feature alignment is not performed on the foreground targets according to their classes, so features of quite different foreground targets are aligned against one another, which also harms the alignment effect and leaves the algorithm with low fine granularity. Because of these shortcomings, the performance improvement of existing feature-transfer-based target detection algorithms has always been limited. The invention aims to remedy the defects of existing algorithms and to further exploit feature transfer learning to improve the generalization performance of target detection algorithms.
Disclosure of Invention
In order to overcome the defects of existing algorithms, the invention provides a target detection algorithm based on two-stage local feature alignment. The algorithm uses Faster R-CNN as the detection framework and, through a two-stage feature alignment scheme, addresses the low localization accuracy of RPN candidate boxes in the early training stage, the excessive background noise, and the unclassified, coarse-grained feature alignment, thereby alleviating the poor generalization of target detection networks caused by inter-domain differences (as shown in FIG. 1, a detection network trained on samples like those in panel (a) performs poorly on panel (b)).
The technical scheme adopted by the invention is as follows:
Step 1: the invention takes Faster R-CNN as the target detection framework. The feature extraction backbone of Faster R-CNN is VGG16-D, comprising a first convolutional layer, a first downsampling layer, a second convolutional layer, a second downsampling layer, a third convolutional layer, a third downsampling layer, a fourth convolutional layer, a fourth downsampling layer and a fifth convolutional layer; the feature map output by the fifth convolutional layer is denoted F, and its size is M × N × 512.
Step 2: this step comprises the RPN module, the RoI Head module and the classification-regression module of the original Faster R-CNN. The RPN module generates candidate boxes and maps them onto the feature map F to obtain the feature candidate regions A_i, where i denotes the i-th feature candidate region; the RoI Head module scales the feature candidate regions of different sizes to 7 × 7, unfolds them into one-dimensional features, and passes them through two fully connected layers FC1 and FC2 to obtain the feature vectors A'_i; with A'_i as input, the classification and regression module outputs the classification result F_i and the regression offset O_i of the predicted box coordinates. In the two-stage feature alignment algorithm, the Faster R-CNN module participates in training in both the first and the second training stage.
Step 3: this step is one of the core contents of the patent. In the first training stage (the first 5 of 20 training epochs in total), on the basis of step 2, the feature map F output by the fifth convolutional layer is additionally fed into a 1 × 1 × 512 convolution whose number of output channels is 1, producing a new feature map F' of size M × N × 1. A sigmoid operation is applied to every feature point of F', and only the points whose result is greater than or equal to λ (λ = 0.8) are retained, giving the pre-clustering feature center point map C. The K-means++ algorithm is then used to determine which foreground target each retained feature point belongs to; the foreground targets are divided into N classes, and for each class only the 3 highest-probability feature points are kept as feature center points, while the remaining points are set to 0. The result is the clustered feature center point map, denoted P, on which each feature center point is denoted P_k.
Step 4: this step is one of the core contents of the patent. Based on step 3, for each feature center point P_k on the feature center point map P, a set of standard feature candidate boxes of sizes 1 × 1, 3 × 3 and 5 × 5 (the unit being one feature point of the feature map F) is generated on F, each centered at P_k. The three feature regions obtained by mapping these standard candidate boxes onto the feature map are then background-suppressed: every feature point in these regions whose feature value is smaller than 1/2 of the feature value of the central feature point is set to 0. This yields the background-suppressed feature regions used in step 5.
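An illustrative PyTorch sketch of this step follows; it handles a single feature center point and compares each channel against half of the center point's per-channel value, which is one possible reading of "the feature value" and an assumption rather than the claimed implementation.

    import torch

    def standard_regions(feature_map, center_yx, sizes=(1, 3, 5), ratio=0.5):
        """Cut the 1x1, 3x3 and 5x5 standard boxes around one feature center point P_k
        out of the conv5 feature map and zero out feature points whose value is below
        ratio (= 1/2) of the center point's value (background suppression)."""
        _, height, width = feature_map.shape           # feature_map: (512, M, N)
        y, x = center_yx
        center_value = feature_map[:, y, x]            # feature vector at P_k, shape (512,)
        suppressed = []
        for s in sizes:
            r = s // 2
            y0, y1 = max(0, y - r), min(height, y + r + 1)
            x0, x1 = max(0, x - r), min(width, x + r + 1)
            region = feature_map[:, y0:y1, x0:x1].clone()
            region[region < ratio * center_value[:, None, None]] = 0.0   # background suppression
            suppressed.append(region)
        return suppressed                              # background-suppressed feature regions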
Step 5: the background-suppressed feature regions generated in step 4 are unfolded and concatenated into a one-dimensional feature vector L_k of size (35 × 3 × N) × 1, which is fed into a domain classifier D1 that judges whether L_k comes from the source domain or the target domain. The structure of D1 is three fully connected layers, denoted FC3, FC4 and FC5; the input dimension of FC3 is 35 × 3 and its output dimension is 1024, FC4 maps 1024 to 1024, and FC5 maps 1024 to 1. The resulting domain classification loss is given by equation (1), which takes the same binary cross-entropy form as equation (7) below, with D1 as the classifier and its input being the feature vector of the feature point with coordinates (u, v) in the feature region corresponding to the k-th (k ∈ {0, 1, 2}) standard box of the j-th central feature point in the i-th image sample; D_i is the domain label, D_i = 0 indicating that the feature region comes from the source domain and D_i = 1 that it comes from the target domain.
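A minimal sketch of the domain classifier D1 and a binary cross-entropy domain loss in the spirit of equation (1) follows; the layer dimensions (105 → 1024 → 1024 → 1) are those stated above, while treating each flattened region vector as one classifier input and training D1 adversarially (for example through a gradient reversal layer) are assumptions not spelled out in the text.

    import torch
    import torch.nn as nn

    class DomainClassifierD1(nn.Module):
        """Three fully connected layers FC3, FC4, FC5 as stated in step 5."""
        def __init__(self, in_dim=35 * 3):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(in_dim, 1024), nn.ReLU(),   # FC3
                nn.Linear(1024, 1024), nn.ReLU(),     # FC4
                nn.Linear(1024, 1),                   # FC5, one logit per input vector
            )

        def forward(self, x):
            return self.net(x)

    d1 = DomainClassifierD1()
    bce = nn.BCEWithLogitsLoss()

    def stage1_domain_loss(region_vectors, domain_label):
        """region_vectors: (num_vectors, 105); domain_label: 0 = source, 1 = target."""
        logits = d1(region_vectors)
        targets = torch.full_like(logits, float(domain_label))
        return bce(logits, targets)       # binary cross-entropy form of equation (1)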
Step 6: this step is one of the core contents of the patent. In the last 15 training epochs, the contents of steps 3-5 no longer participate in training. On the basis of step 2, in every training round the feature vectors output by the RoI Head module are grouped by their classification results: the feature vectors classified as class c in the source-domain samples update the per-round class-c source feature vector, and the feature vectors classified as class c in the target-domain samples update the per-round class-c target feature vector, in the manner shown in equations (2) and (3), where S and T indicate that the sample features come from the source domain and the target domain respectively, and t denotes the t-th training round. The per-round class feature vectors of the current round are then used to update, class by class, the accumulated features of each class of foreground target gathered from the beginning of the second training stage up to the current t-th round, as shown in equations (4) and (5); feature alignment is then performed on these accumulated features.
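Because equations (2)-(5) are reproduced in the original only as formula images, the sketch below substitutes a per-round class mean and a running average for the per-round and accumulated class features; both are stand-in assumptions, not the patent's update rules. The accumulated history is detached so that only the current round carries gradients.

    import torch

    class ClassFeatureBank:
        """Per-class accumulated feature vectors for the source ('S') and target ('T') domains."""
        def __init__(self):
            self.acc = {}     # (domain, class) -> accumulated feature vector
            self.count = {}   # (domain, class) -> number of rounds accumulated

        def update(self, domain, feats, labels):
            """domain: 'S' or 'T'; feats: (n, dim) RoI Head vectors; labels: (n,) predicted classes."""
            for c in labels.unique().tolist():
                round_feat = feats[labels == c].mean(dim=0)        # per-round class-c feature (assumed mean)
                key = (domain, c)
                t = self.count.get(key, 0) + 1
                prev = self.acc.get(key, torch.zeros_like(round_feat)).detach()
                self.acc[key] = (prev * (t - 1) + round_feat) / t  # assumed running average
                self.count[key] = t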
Step 7: for the accumulated class features output by step 6 in every training round, the L2 distance is used to measure the difference between same-class target features in the source domain and the target domain; the resulting loss function L_cat of equation (6) sums, over the classes, the L2 distance between the source-domain and target-domain accumulated features of the same class.
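The sketch below shows one such category alignment loss built on the ClassFeatureBank above; summing the L2 distances over the classes present in both domains is an assumption about equation (6), which appears only as an image in the original.

    import torch

    def category_alignment_loss(bank):
        """Sum of L2 distances between source and target accumulated features of the same class."""
        loss = torch.tensor(0.0)
        shared = {c for (d, c) in bank.acc if d == "S"} & {c for (d, c) in bank.acc if d == "T"}
        for c in shared:
            loss = loss + torch.norm(bank.acc[("S", c)] - bank.acc[("T", c)], p=2)
        return loss       # L_cat in the spirit of equation (6)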
Step 8: the foreground target feature vectors p_{i,j} output by the RoI Head in every training round are fed into a domain classifier D2. The network structure of D2 is also three fully connected layers, consistent with the domain classifier D1, except that the input size of its first fully connected layer is 49. Its loss function is shown in equation (7), where i denotes the i-th image sample and j denotes the j-th foreground target:
L_ins = -∑_{i,j} [ D_i log D2(p_{i,j}) + (1 - D_i) log(1 - D2(p_{i,j})) ]    (7)
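A matching sketch of D2 and the instance-level loss of equation (7) follows; the 49-dimensional input and the three fully connected layers are taken from the text, and the rest mirrors the D1 sketch above.

    import torch
    import torch.nn as nn

    d2 = nn.Sequential(                       # same three-FC structure as D1, input size 49
        nn.Linear(49, 1024), nn.ReLU(),
        nn.Linear(1024, 1024), nn.ReLU(),
        nn.Linear(1024, 1),
    )

    def instance_domain_loss(p, domain_label):
        """p: (num_rois, 49) foreground feature vectors; domain_label: 0 = source, 1 = target."""
        logits = d2(p)
        targets = torch.full_like(logits, float(domain_label))
        return nn.functional.binary_cross_entropy_with_logits(logits, targets)   # L_ins, equation (7)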
Compared with the prior art, the invention has the beneficial effects that:
(1) in the local feature alignment process, locating the feature regions by first finding their center points avoids introducing excessive background noise while the feature candidate regions are still inaccurate in the early stage of training, reducing the adverse effect of background noise on feature alignment;
(2) in the local feature alignment process, the foreground target feature regions are aligned class by class, which improves the fine granularity of the algorithm, further reduces the inter-domain difference of same-class foreground target features, and enhances the feature alignment effect.
Drawings
FIG. 1: degradation of target detection performance under inter-domain differences.
FIG. 2: schematic diagram of the target detection algorithm based on two-stage feature alignment.
FIG. 3: schematic diagram of the feature region center points before clustering.
FIG. 4: schematic diagram of the feature region center points after clustering.
FIG. 5: schematic diagram of the standard region features after background suppression.
FIG. 6: schematic diagram of the network structure of the domain classifier D1.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
First, the structure of the target detection algorithm based on two-stage local feature alignment is shown in FIG. 2. The method is divided into a Faster R-CNN module and a two-stage feature alignment module; the first stage of feature alignment is applied in the first 5 training epochs, and the second stage in the last 15 epochs. The Faster R-CNN module participates in the whole training process, and its feature extraction network adopts VGG16, comprising 13 convolutional layers and 5 pooling layers.
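For orientation, a minimal PyTorch sketch of this Faster R-CNN part is given below. Torchvision's stock VGG16 stands in for the VGG16-D backbone (the last pooling layer is dropped so that the conv5 output serves as the feature map F), and a simple RoI Head follows the FC1/FC2/classification/regression layout of step 2; the 4096-dimensional fully connected layers, the use of torchvision.ops.roi_pool and the 1/16 spatial scale are assumptions.

    import torch
    import torch.nn as nn
    from torchvision.models import vgg16
    from torchvision.ops import roi_pool

    backbone = vgg16().features[:-1]          # VGG16 without the last pooling layer -> conv5 feature map F

    class RoIHead(nn.Module):
        def __init__(self, num_classes, fc_dim=4096):
            super().__init__()
            self.fc1 = nn.Linear(512 * 7 * 7, fc_dim)              # FC1
            self.fc2 = nn.Linear(fc_dim, fc_dim)                   # FC2 -> feature vector A'_i
            self.cls = nn.Linear(fc_dim, num_classes + 1)          # classification result F_i
            self.reg = nn.Linear(fc_dim, 4 * (num_classes + 1))    # box regression offsets O_i

        def forward(self, feature_map, rois):
            # rois: (K, 5) rows of (batch index, x1, y1, x2, y2) in image coordinates
            a = roi_pool(feature_map, rois, output_size=(7, 7), spatial_scale=1.0 / 16)
            a = torch.relu(self.fc1(a.flatten(1)))                 # A_i unfolded to one dimension
            a_prime = torch.relu(self.fc2(a))                      # A'_i
            return self.cls(a_prime), self.reg(a_prime), a_prime

    image = torch.randn(1, 3, 600, 800)
    feature_map = backbone(image)             # (1, 512, M, N) with M = 600 // 16, N = 800 // 16
    rois = torch.tensor([[0.0, 50.0, 60.0, 250.0, 300.0]])
    scores, offsets, a_prime = RoIHead(num_classes=5)(feature_map, rois)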
In the first stage of feature alignment, the feature map F generated by the feature extraction network is used. A 1 × 1 convolution followed by a sigmoid layer judges whether each feature point on F is a feature center point, and the points with probability greater than 0.8 are retained to obtain the pre-clustering feature center point map C, as shown in FIG. 3. These feature center points are clustered into N classes, and the k highest-probability center points of each class are selected, giving the clustered feature center point map P. Based on each feature center point on P, standard candidate boxes of sizes 1 × 1, 3 × 3 and 5 × 5 are generated on the feature map; the standard candidate boxes generated for one feature center point are shown in FIG. 4. Taking 1/2 of the feature value of the feature center point as the background suppression reference value, background suppression is performed on the feature regions mapped from the standard candidate boxes, and feature values smaller than the reference value are reset to 0, as shown in FIG. 5. After background suppression, the feature regions mapped from the standard candidate boxes of each feature center point are unfolded and concatenated into a 35 × 1 feature vector, which is input into the domain classifier D1 for domain discrimination; the network structure of D1 is shown in FIG. 6. In the early stage of training, the accuracy of the standard candidate boxes generated from feature center points is far higher than that of the RPN candidate boxes, and the feature regions mapped from the standard boxes are background-suppressed, which resolves the background-noise problem of conventional feature alignment.
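A sketch of this center-point selection follows. The 1 × 1 convolution, the 0.8 threshold, K-means++ clustering and keeping the 3 highest-probability points per cluster are taken from the text; clustering the (row, column) coordinates of the retained points and the scikit-learn implementation are assumptions.

    import numpy as np
    import torch
    import torch.nn as nn
    from sklearn.cluster import KMeans

    center_scorer = nn.Conv2d(512, 1, kernel_size=1)      # 1 x 1 x 512 convolution, one output channel

    def center_point_map(feature_map, n_classes=5, top_k=3, lam=0.8):
        """feature_map: (1, 512, M, N) conv5 output; returns the clustered center point map P."""
        score = torch.sigmoid(center_scorer(feature_map))[0, 0].detach()   # (M, N), detached for this sketch
        P = torch.zeros_like(score)
        ys, xs = torch.nonzero(score >= lam, as_tuple=True)                # pre-clustering map C
        if len(ys) < n_classes:
            return P
        coords = np.stack([ys.numpy(), xs.numpy()], axis=1)
        labels = KMeans(n_clusters=n_classes, init="k-means++", n_init=10).fit_predict(coords)
        for c in range(n_classes):
            idx = np.where(labels == c)[0]
            probs = score[ys[idx], xs[idx]]
            keep = idx[torch.topk(probs, min(top_k, len(idx))).indices.numpy()]
            P[ys[keep], xs[keep]] = score[ys[keep], xs[keep]]              # retained feature center points
        return P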
The second stage of feature alignment no longer uses the standard candidate boxes but relies on the RoI Head and the classification network. The per-class feature vectors of the source domain and the target domain are obtained in the manner of equations (2) and (3), and are used, in the manner of equations (4) and (5), to update the accumulated features of each class of foreground target, which are then input into the domain classifier D2. In the later stage of training, the accuracy of the RPN candidate boxes has risen to a satisfactory level; they come in more size specifications and fit the foreground targets better than the standard candidate boxes, so the RPN candidate boxes are adopted instead of the standard candidate boxes. Feature alignment is performed class by class on the feature vectors corresponding to the RPN candidate boxes, which improves the fine granularity of the algorithm and further reduces the feature difference of same-class targets between the two domains.
While the invention has been described with reference to specific embodiments, any feature disclosed in this specification may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise; all of the disclosed features, or all of the method or process steps, may be combined in any combination, except combinations where mutually exclusive features or/and steps are present.

Claims (3)

1. A target detection method based on two-stage feature alignment, characterized by comprising the following steps:
step 1: the invention takes Faster R-CNN as the target detection framework; the feature extraction backbone of Faster R-CNN is VGG16-D, comprising a first convolutional layer, a first downsampling layer, a second convolutional layer, a second downsampling layer, a third convolutional layer, a third downsampling layer, a fourth convolutional layer, a fourth downsampling layer and a fifth convolutional layer, the feature map output by the fifth convolutional layer being denoted F with size M × N × 512;
step 2: this step comprises the RPN module, the RoI Head module and the classification-regression module of the original Faster R-CNN; the RPN module generates candidate boxes and maps them onto the feature map F to obtain the feature candidate regions A_i, i denoting the i-th feature candidate region; the RoI Head module scales the feature candidate regions of different sizes to 7 × 7, unfolds them into one-dimensional features, and passes them through two fully connected layers FC1 and FC2 to obtain the feature vectors A'_i; with A'_i as input, the classification and regression module outputs the classification result F_i and the regression offset O_i of the predicted box coordinates; in the two-stage feature alignment algorithm, the Faster R-CNN module participates in training in both the first and the second training stage;
step 3: this step is one of the core contents of the patent; in the first training stage (the first 5 of 20 training epochs in total), on the basis of step 2, the feature map F output by the fifth convolutional layer is fed into a 1 × 1 × 512 convolution with one output channel to obtain a new feature map F' of size M × N × 1; a sigmoid operation is applied to every feature point of F', and only the points whose result is greater than or equal to a threshold λ are retained, giving the pre-clustering feature center point map C; a clustering algorithm is used to determine which foreground target each retained feature point belongs to, the foreground targets are divided into N classes, and for each class the k highest-probability feature points are kept as feature center points while the remaining points are set to 0, finally giving the clustered feature center point map, denoted P, on which each feature center point is denoted P_k;
step 4: this step is one of the core contents of the patent; based on step 3, for each feature center point P_k on the feature center point map P, a set of standard feature candidate boxes of sizes 1 × 1, 3 × 3 and 5 × 5 (the unit being one feature point of the feature map F) is generated on F, each centered at P_k; the three feature regions obtained by mapping these standard candidate boxes onto the feature map are then background-suppressed: every feature point in these regions whose feature value is smaller than δ times the feature value of the central feature point is set to 0, yielding the background-suppressed feature regions;
step 5: the background-suppressed feature regions generated in step 4 are unfolded and concatenated into a one-dimensional feature vector L_k of size (35 × 3 × N) × 1, which is input into a domain classifier D1 that judges whether L_k belongs to the source domain or the target domain; the structure of D1 is three fully connected layers, denoted FC3, FC4 and FC5, the input dimension of FC3 being 35 × 3 and its output dimension 1024, FC4 mapping 1024 to 1024 and FC5 mapping 1024 to 1; the resulting domain classification loss is given by equation (1), where D_i is the domain label, D_i = 0 indicates that the feature region comes from the source domain, D_i = 1 indicates that it comes from the target domain, and the classifier input is the feature vector of the feature point with coordinates (u, v) in the feature region corresponding to the k-th (k ∈ {0, 1, 2}) standard box of the j-th central feature point in the i-th image sample;
step 6: this step is one of the core contents of the patent; in the last 15 training epochs, the contents of steps 3-5 no longer participate in training; on the basis of step 2, in every training round the feature vectors output by the RoI Head module are grouped by their classification results: the feature vectors classified as class c in the source-domain samples update the per-round class-c source feature vector, and the feature vectors classified as class c in the target-domain samples update the per-round class-c target feature vector, in the manner shown in equations (2) and (3), where S and T indicate that the sample features come from the source domain and the target domain respectively, and t denotes the t-th training round; the per-round class feature vectors of the current round are used to update, class by class, the accumulated features of each class of foreground target gathered from the beginning of the second training stage up to the current t-th round,
as shown in equations (4) and (5), after which feature alignment is performed;
step 7: for the accumulated class features output by step 6 in every training round, the L2 distance is used to measure the difference between same-class target features in the source domain and the target domain, giving the loss function of formula (6);
step 8: the foreground target feature vectors p_{i,j} output by the RoI Head in every training round are input into a domain classifier D2, giving the loss function of formula (7).
2. The method according to claim 1, wherein in step 3 the threshold λ is 0.8, the clustering algorithm is K-means++, the number of classes N is 5, and k is 3.
3. The method according to claim 1, wherein the background suppression threshold δ is 1/2 of the feature value of the central feature point.
CN202110270152.3A 2021-03-12 2021-03-12 Target detection method based on two-stage local feature alignment Active CN113052184B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110270152.3A CN113052184B (en) 2021-03-12 2021-03-12 Target detection method based on two-stage local feature alignment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110270152.3A CN113052184B (en) 2021-03-12 2021-03-12 Target detection method based on two-stage local feature alignment

Publications (2)

Publication Number Publication Date
CN113052184A true CN113052184A (en) 2021-06-29
CN113052184B CN113052184B (en) 2022-11-18

Family

ID=76512631

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110270152.3A Active CN113052184B (en) 2021-03-12 2021-03-12 Target detection method based on two-stage local feature alignment

Country Status (1)

Country Link
CN (1) CN113052184B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113343989A (en) * 2021-07-09 2021-09-03 中山大学 Target detection method and system based on self-adaption of foreground selection domain
CN114821152A (en) * 2022-03-23 2022-07-29 湖南大学 Domain self-adaptive target detection method and system based on foreground-class perception alignment
CN115131590A (en) * 2022-09-01 2022-09-30 浙江大华技术股份有限公司 Training method of target detection model, target detection method and related equipment
WO2023092582A1 (en) * 2021-11-25 2023-06-01 Hangzhou Innovation Institute, Beihang University A scene adaptive target detection method based on motion foreground

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109977790A (en) * 2019-03-04 2019-07-05 浙江工业大学 A kind of video smoke detection and recognition methods based on transfer learning
CN109977812A (en) * 2019-03-12 2019-07-05 南京邮电大学 A kind of Vehicular video object detection method based on deep learning
CN110321891A (en) * 2019-03-21 2019-10-11 长沙理工大学 A kind of big infusion medical fluid foreign matter object detection method of combined depth neural network and clustering algorithm
CN110363122A (en) * 2019-07-03 2019-10-22 昆明理工大学 A kind of cross-domain object detection method based on multilayer feature alignment
CN110516671A (en) * 2019-08-27 2019-11-29 腾讯科技(深圳)有限公司 Training method, image detecting method and the device of neural network model
AU2019101224A4 (en) * 2019-10-05 2020-01-16 Shu, Zikai MR Method of Human detection research and implement based on deep learning
CN111832513A (en) * 2020-07-21 2020-10-27 西安电子科技大学 Real-time football target detection method based on neural network
CN111861978A (en) * 2020-05-29 2020-10-30 陕西师范大学 Bridge crack example segmentation method based on Faster R-CNN
CN111882055A (en) * 2020-06-15 2020-11-03 电子科技大学 Method for constructing target detection self-adaptive model based on cycleGAN and pseudo label
CN112465752A (en) * 2020-11-16 2021-03-09 电子科技大学 Improved Faster R-CNN-based small target detection method

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109977790A (en) * 2019-03-04 2019-07-05 浙江工业大学 A kind of video smoke detection and recognition methods based on transfer learning
CN109977812A (en) * 2019-03-12 2019-07-05 南京邮电大学 A kind of Vehicular video object detection method based on deep learning
CN110321891A (en) * 2019-03-21 2019-10-11 长沙理工大学 A kind of big infusion medical fluid foreign matter object detection method of combined depth neural network and clustering algorithm
CN110363122A (en) * 2019-07-03 2019-10-22 昆明理工大学 A kind of cross-domain object detection method based on multilayer feature alignment
CN110516671A (en) * 2019-08-27 2019-11-29 腾讯科技(深圳)有限公司 Training method, image detecting method and the device of neural network model
AU2019101224A4 (en) * 2019-10-05 2020-01-16 Shu, Zikai MR Method of Human detection research and implement based on deep learning
CN111861978A (en) * 2020-05-29 2020-10-30 陕西师范大学 Bridge crack example segmentation method based on Faster R-CNN
CN111882055A (en) * 2020-06-15 2020-11-03 电子科技大学 Method for constructing target detection self-adaptive model based on cycleGAN and pseudo label
CN111832513A (en) * 2020-07-21 2020-10-27 西安电子科技大学 Real-time football target detection method based on neural network
CN112465752A (en) * 2020-11-16 2021-03-09 电子科技大学 Improved Faster R-CNN-based small target detection method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
刘博文, 彭祝亮, 范程岸: "Pedestrian detection based on Cascade-RCNN", Wireless Internet Technology *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113343989A (en) * 2021-07-09 2021-09-03 中山大学 Target detection method and system based on self-adaption of foreground selection domain
WO2023092582A1 (en) * 2021-11-25 2023-06-01 Hangzhou Innovation Institute, Beihang University A scene adaptive target detection method based on motion foreground
CN114821152A (en) * 2022-03-23 2022-07-29 湖南大学 Domain self-adaptive target detection method and system based on foreground-class perception alignment
CN115131590A (en) * 2022-09-01 2022-09-30 浙江大华技术股份有限公司 Training method of target detection model, target detection method and related equipment
CN115131590B (en) * 2022-09-01 2022-12-06 浙江大华技术股份有限公司 Training method of target detection model, target detection method and related equipment

Also Published As

Publication number Publication date
CN113052184B (en) 2022-11-18

Similar Documents

Publication Publication Date Title
CN113052184B (en) Target detection method based on two-stage local feature alignment
CN113378632B (en) Pseudo-label optimization-based unsupervised domain adaptive pedestrian re-identification method
CN107301383B (en) Road traffic sign identification method based on Fast R-CNN
CN113408492B (en) Pedestrian re-identification method based on global-local feature dynamic alignment
CN107633226B (en) Human body motion tracking feature processing method
CN111881714A (en) Unsupervised cross-domain pedestrian re-identification method
CN109086777B (en) Saliency map refining method based on global pixel characteristics
CN108846404B (en) Image significance detection method and device based on related constraint graph sorting
CN111160407A (en) Deep learning target detection method and system
CN108230330B (en) Method for quickly segmenting highway pavement and positioning camera
CN113129335B (en) Visual tracking algorithm and multi-template updating strategy based on twin network
Liu et al. A deep fully convolution neural network for semantic segmentation based on adaptive feature fusion
CN111898665A (en) Cross-domain pedestrian re-identification method based on neighbor sample information guidance
Yang et al. A fast and effective video vehicle detection method leveraging feature fusion and proposal temporal link
CN113128308B (en) Pedestrian detection method, device, equipment and medium in port scene
Asgarian Dehkordi et al. Vehicle type recognition based on dimension estimation and bag of word classification
CN115661777A (en) Semantic-combined foggy road target detection algorithm
CN116229112A (en) Twin network target tracking method based on multiple attentives
CN114743126A (en) Lane line sign segmentation method based on graph attention machine mechanism network
CN113627481A (en) Multi-model combined unmanned aerial vehicle garbage classification method for smart gardens
Prabhakar et al. Cdnet++: Improved change detection with deep neural network feature correlation
CN113378704B (en) Multi-target detection method, equipment and storage medium
CN114067240A (en) Pedestrian single-target tracking method based on online updating strategy and fusing pedestrian characteristics
CN115131671A (en) Cross-domain high-resolution remote sensing image typical target fine-grained identification method
CN114187301A (en) X-ray image segmentation and classification prediction model based on deep neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant