CN110544269A - twin network infrared target tracking method based on characteristic pyramid - Google Patents


Info

Publication number
CN110544269A
CN110544269A (application CN201910720012.4A)
Authority
CN
China
Prior art keywords
scale
frame
feature
classification
branch
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910720012.4A
Other languages
Chinese (zh)
Inventor
周慧鑫
刘国均
周腾飞
宋江鲁奇
李欢
于跃
张嘉嘉
杜娟
吴娜娜
成宽洪
秦翰林
王炳健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian University of Electronic Science and Technology
Original Assignee
Xian University of Electronic Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian University of Electronic Science and Technology filed Critical Xian University of Electronic Science and Technology
Priority to CN201910720012.4A
Publication of CN110544269A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/20 - Analysis of motion
    • G06T7/246 - Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/248 - Analysis of motion using feature-based methods involving reference images or patches
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10016 - Video; image sequence
    • G06T2207/10048 - Infrared image
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20081 - Training; learning
    • G06T2207/20084 - Artificial neural networks [ANN]

Abstract

The invention discloses a twin network infrared target tracking method based on a feature pyramid, which comprises: respectively carrying out a bottom-up full convolution operation on a template frame and a detection frame, then performing top-down operations and lateral connections on the convolutional layers C5, C4, C3 and C2 to correspondingly generate the P5, P4, P3 and P2 scale feature layers; performing convolution operations on the corresponding scale feature layers of the detection frame using the classification weight and regression weight of each scale feature layer of the template frame, to determine the scale proposal corresponding to each scale feature layer of the detection frame; and carrying out non-maximum suppression on the scale proposals of all scale feature layers of the detection frame, retaining the highest-scoring proposal as the output tracking result. The method discriminates target deformation better and is better suited to complex scenes and deforming targets.

Description

Twin network infrared target tracking method based on characteristic pyramid
Technical Field
The invention belongs to the field of infrared target tracking, and particularly relates to a twin network infrared target tracking method based on a feature pyramid.
Background
In recent years, moving-target detection and tracking based on visible-light computer vision has developed rapidly and been widely applied in fields such as human-computer interaction, intelligent video surveillance and precision guidance. However, visible-light systems are ineffective in severe weather and at night. Infrared imaging systems, which acquire information by sensing the infrared radiation emitted by objects, have a relative advantage here: they work around the clock, offer good concealment, penetrate smoke well and resist interference strongly.
Infrared target tracking is the process of finding a target in an infrared image sequence and tracking it effectively. As a key technology of infrared imaging systems, it plays an important role in tasks such as defense, warning and countermeasures, so research on infrared target tracking has high application value.
At present, many scholars at home and abroad have studied infrared target detection and tracking intensively. Nevertheless, owing to the inherent limitations of infrared systems, many problems remain: infrared images lack color information, have a low signal-to-noise ratio and low resolution, and suffer from serious background clutter, so similar objects are hard to distinguish and the infrared target is easily submerged in the background. In addition, if objects similar to the target exist in the background, an infrared tracker easily loses the target, and re-detection after occlusion is difficult. The twin network tracking algorithm (SiamRPN), for example, cannot output an accurate target size and is highly error-prone when an object similar to the target appears in the background. Research on infrared target tracking under complex backgrounds is therefore a challenging subject of real significance.
Disclosure of Invention
In view of the above, the main objective of the present invention is to provide a twin network infrared target tracking method based on a feature pyramid.
In order to achieve the purpose, the technical scheme of the invention is realized as follows:
the embodiment of the invention provides a twin network infrared target tracking method based on a characteristic pyramid, which comprises the following steps:
Taking a current frame of an original infrared image sequence as a detection frame of a detection branch in the feature pyramid-based twin network, and taking a previous frame as a template frame of a template branch in the feature pyramid-based twin network;
Respectively carrying out a bottom-up full convolution operation on the template frame and the detection frame, then performing top-down operations and lateral connections on the convolutional layers C5, C4, C3 and C2 to correspondingly generate the P5, P4, P3 and P2 scale feature layers;
Performing 2k' and 4k' channel expansion on the P2, P3, P4 and P5 scale feature layers of the template frame to generate classification weights and regression weights respectively;
Performing convolution operations on the corresponding scale feature layers of the detection frame using the classification weight and regression weight of each scale feature layer of the template frame, to determine the scale proposal corresponding to each scale feature layer of the detection frame;
and carrying out non-maximum suppression on the scale proposals corresponding to all scale feature layers of the detection frame, retaining the highest-scoring proposal, and taking it as the output tracking result.
In the above scheme, the taking the current frame of the original infrared image sequence as the detection frame of the detection branch in the feature pyramid-based twin network and the taking the previous frame as the template frame of the template branch in the feature pyramid-based twin network specifically includes: the current frame of the original infrared image sequence is used as a detection frame of a detection branch in the feature extraction sub-network, and the previous frame is used as a template frame of a template branch in the feature extraction sub-network.
In the above scheme, the bottom-up full convolution operation is performed on the template frame and the detection frame respectively, and top-down operations and lateral connections are then performed on the convolutional layers C5, C4, C3 and C2 to correspondingly generate the P5, P4, P3 and P2 scale feature layers, specifically: the network of each branch of the feature extraction sub-network adopts three routes, namely bottom-up, top-down and lateral connection, and the bottom-up route adopts the same five-layer network as the region selection network, with the scale reduced layer by layer; upsampling on the top-down route is performed by deconvolution; the lateral connections fuse each upsampling result with the bottom-up feature map of the same size to generate the feature maps P2 to P5 in one-to-one correspondence with the bottom-up maps C2 to C5; and target prediction is performed on the P2, P3, P4 and P5 layers, which fuse the features of each level.
In the above scheme, the 2k' and 4k' channel expansion is performed on the P2, P3, P4 and P5 scale feature layers of the template frame to generate the classification weights and regression weights respectively, specifically: the region selection sub-network is divided into two branches, one for target-background classification and the other for regression of the target region; assuming that the region selection sub-network sets k' anchor points, 2k' channels need to be output for classification and 4k' channels for regression.
In the above scheme, the 2k' and 4k' channel expansion on the P2, P3, P4 and P5 scale feature layers of the template frame, generating the classification weights and regression weights respectively, is implemented by the following steps:
(1) The template feature map φp(z) is fed into two branches [φp(z)]cls and [φp(z)]reg, whose channel counts are expanded to 2k' and 4k' times through two convolutional layers respectively;
(2) The detection frame feature map φp(x) is likewise divided into two branches [φp(x)]cls and [φp(x)]reg by two convolutional layers, but the number of channels remains unchanged;
(3) The template and detection features on the classification branch and the regression branch are correlated by convolution, i.e.
A_cls = [φp(x)]cls ⋆ [φp(z)]cls, A_reg = [φp(x)]reg ⋆ [φp(z)]reg,
wherein [φp(z)]cls, [φp(z)]reg, [φp(x)]cls and [φp(x)]reg are the classification branch of the template frame, the regression branch of the template frame, the classification branch of the detection frame and the regression branch of the detection frame, and A_cls and A_reg are the classification-branch and regression-branch correlations (⋆ denotes the correlation operation).
In the above scheme, when multiple anchors are used to train the network, a regularized smooth L1 loss function is adopted, with the normalized distances
δ0 = (Tx − Ax)/Aw, δ1 = (Ty − Ay)/Ah, δ2 = ln(Tw/Aw), δ3 = ln(Th/Ah),
wherein Ax, Ay, Aw, Ah denote the center point, width and height of the anchor, Tx, Ty, Tw, Th denote the center-point coordinates, width and height of the real target frame, and δ0, δ1, δ2 and δ3 are the regularized distances of the abscissa, ordinate, width and height respectively;
the smooth L1 loss is
smoothL1(x, σ) = 0.5σ²x² if |x| < 1/σ², and |x| − 1/(2σ²) otherwise;
the regression loss is Lreg = Σi smoothL1(δi, σ), and the loss function of the network as a whole is
L = Lcls + λLreg,
where λ is a hyperparameter, Lreg is the regression loss function, and Lcls is the classification loss function, expressed using the cross-entropy loss function.
In the above scheme, the respectively performing convolution operation on the scale feature layer corresponding to the detection frame according to the classification weight and the regression weight of each scale feature layer of the template frame to respectively determine the scale proposal corresponding to each scale feature layer of the detection frame specifically includes: and dividing each layer of scale feature layer of the detection frame into a classification branch and a regression branch, respectively combining the classification weight and the regression weight determined by the template frame corresponding to the scale feature layer to obtain a classification confidence map and a regression confidence map of each anchor, and determining a scale proposal corresponding to the scale feature layer of the detection frame according to the correlation between the classification confidence map and the regression confidence map of each anchor.
In the above scheme, the dividing each layer of scale feature layer of the detection frame into a classification branch and a regression branch, obtaining a classification confidence map and a regression confidence map of each anchor by respectively combining the classification weight and the regression weight determined by the template frame corresponding to the scale feature layer, and determining the scale proposal corresponding to the layer of scale feature layer of the detection frame according to the correlation between the classification confidence map and the regression confidence map of each anchor specifically includes:
The classification and regression output feature maps are represented as point sets
CLSp = {(xi, yj, cl)}, wherein i ∈ [0, w), j ∈ [0, h), l ∈ [0, 2k') and p ∈ {2, 3, 4, 5};
REGp = {(xi, yj, dxm, dym, dwm, dhm)}, wherein i ∈ [0, w), j ∈ [0, h) and m ∈ [0, k').
Letting the variables i and j index the position of an anchor and l its index number, the anchor set ANC = {(x_an_i, y_an_j, w_an_l, h_an_l)} is obtained; the top-K classification responses are selected, and the regression outputs are applied to the corresponding anchors in ANC to obtain the refined coordinates
x_pro = x_an + dx · w_an, y_pro = y_an + dy · h_an, w_pro = w_an · e^dw, h_pro = h_an · e^dh.
Compared with the prior art, the method discriminates target deformation better and is better suited to complex scenes; it makes full use of the target's detailed features, giving good adaptability to infrared images with few such details; classification and scale regression are computed in parallel and the target's aspect ratio is predicted, making the tracking more accurate and real-time; and the method perceives infrared targets more intelligently, adapting better to occluded targets and to backgrounds similar to the target.
Drawings
FIG. 1 is a flowchart of the feature pyramid-based twin network infrared target tracking method according to an embodiment of the present invention;
FIG. 2 is a structure diagram of the twin network of the method according to an embodiment of the present invention;
FIG. 3 illustrates selecting, in the classification feature map, target boxes whose distance from the center does not exceed 7, according to an embodiment of the present invention;
FIG. 4 shows the tracking results on frames 1, 243, 700 and 944 of the Boat2 sequence;
FIG. 5 shows the accuracy plots and success-rate plots of six algorithms: the algorithm of the present invention (Gif-siamfn), the pyramid-based twin network algorithm (siamfn), the context-aware scale estimation algorithm (Gif-SECA), the fully-convolutional twin network algorithm (SiamFC), the discriminative scale space tracking algorithm (DSST), and the correlation filter network algorithm (CFNet).
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The embodiment of the invention provides a twin network infrared target tracking method based on a characteristic pyramid, which comprises the following steps of:
Step 1: taking a current frame of an original infrared image sequence as a detection frame of a detection branch in the feature pyramid-based twin network, and taking a previous frame as a template frame of a template branch in the feature pyramid-based twin network;
In particular, the feature extraction sub-network is a fully convolutional network without fully connected layers.
The feature extraction sub-network is divided into two branches: a template branch, which takes the target image block z in the first frame as input, and a detection branch, which takes the target candidate region x in the current frame as input; the two branches share parameters in the convolutional layers.
Single detection is treated as a discriminative task whose aim is to find the parameters W that minimize the average loss L of the prediction function ψ(xi; W), computed over n samples xi and corresponding labels yi:
min_W (1/n) Σi L(ψ(xi; W), yi).
The purpose of one-shot learning is to learn W from the target template z. Letting z denote the template frame, x the detection frame, φp the feature map of a given layer, and ζ the prediction function of the region selection sub-network, the one-shot detection task can be expressed as
min_W (1/n) Σi L(ζ(φp(xi; W); φp(zi; W)), yi).
Step 2: respectively carrying out a bottom-up full convolution operation on the template frame and the detection frame, then performing top-down operations and lateral connections on the convolutional layers C5, C4, C3 and C2 to correspondingly generate the P5, P4, P3 and P2 scale feature layers;
Specifically, the network of each branch of the feature extraction sub-network adopts three routes, namely bottom-up (red arrows), top-down (green arrows) and lateral connections (blue arrows); the bottom-up route adopts the same five-layer network as the region selection network (C1 to C5 in FIG. 2), with the scale reduced layer by layer; upsampling on the top-down route is performed by deconvolution; the lateral connections fuse each upsampling result with the bottom-up feature map of the same size, generating the feature maps P2 to P5 shown in FIG. 2 in one-to-one correspondence with the bottom-up maps C2 to C5; and target prediction is performed on the P2, P3, P4 and P5 layers, which fuse the features of each level.
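The top-down fusion described above can be sketched with toy 1-D "feature maps" (a hypothetical miniature, not the patent's actual network): nearest-neighbour upsampling stands in for deconvolution, and the lateral connection is an element-wise sum.

```python
def upsample2x(feat):
    """Nearest-neighbour 2x upsampling, a stand-in for deconvolution."""
    out = []
    for v in feat:
        out.extend([v, v])
    return out

def build_pyramid(c2, c3, c4, c5):
    """Fuse bottom-up maps C2..C5 into output maps P2..P5.

    P5 is taken from C5 directly; each lower P-level is the upsampled
    level above plus the bottom-up map of the same resolution.
    """
    p5 = list(c5)
    p4 = [a + b for a, b in zip(upsample2x(p5), c4)]
    p3 = [a + b for a, b in zip(upsample2x(p4), c3)]
    p2 = [a + b for a, b in zip(upsample2x(p3), c2)]
    return p2, p3, p4, p5

# Scales halve from C2 to C5: lengths 8, 4, 2, 1.
c5 = [1.0]
c4 = [0.5, 0.5]
c3 = [0.25] * 4
c2 = [0.125] * 8
p2, p3, p4, p5 = build_pyramid(c2, c3, c4, c5)
```

The example shows how each P-level carries both coarse semantic information (propagated downwards) and the finer bottom-up detail of its own resolution.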
For convenience, φp(z) and φp(x) denote the feature maps output by a given layer for the template frame and the detection frame respectively, where p ∈ {2, 3, 4, 5}.
Step 3: performing 2k' and 4k' channel expansion on the P2, P3, P4 and P5 scale feature layers of the template frame to generate classification weights and regression weights respectively;
Specifically, the region selection sub-network is divided into two branches, one for target-background classification and the other for regression of the target region. Assuming that k' anchor points are set, the network needs to output 2k' channels for classification and 4k' channels for regression.
The step is implemented as follows:
(1) The template feature map φp(z) is fed into two branches [φp(z)]cls and [φp(z)]reg, whose channel counts are expanded to 2k' and 4k' times through two convolutional layers respectively.
(2) The detection frame feature map φp(x) is likewise divided into two branches [φp(x)]cls and [φp(x)]reg by two convolutional layers, but the number of channels remains unchanged.
(3) The template and detection features on the classification branch and the regression branch are correlated by convolution, i.e.
A_cls = [φp(x)]cls ⋆ [φp(z)]cls, A_reg = [φp(x)]reg ⋆ [φp(z)]reg.
As shown in FIG. 2, the output of the classification branch contains 2k' channels, representing the positive and negative activations (scores) of each anchor at the corresponding position of the output map. Similarly, the output of the regression branch contains 4k' channels, representing the distances dx, dy, dw, dh between each anchor and the corresponding real target box.
Step 4: performing convolution operations on the corresponding scale feature layers of the detection frame using the classification weight and regression weight of each scale feature layer of the template frame, to determine the scale proposal corresponding to each scale feature layer of the detection frame;
Specifically, the output of the template branch is treated as the convolution kernel of a local detector and correlated with the output of the detection branch, yielding the classification and regression outputs from which proposals are derived.
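The correlation step, in which the template-branch output acts as a convolution kernel sliding over the detection-branch output, can be illustrated with a minimal single-channel "valid" cross-correlation (a sketch; the real network performs this per channel group on deep feature maps):

```python
def xcorr2d_valid(search, kernel):
    """'Valid' 2-D cross-correlation: slide the template-branch output
    (kernel) over the detection-branch output (search) and sum the
    element-wise products at each offset. Toy single-channel version."""
    sh, sw = len(search), len(search[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(sh - kh + 1):
        row = []
        for j in range(sw - kw + 1):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += search[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out
```

For example, a 4 × 4 search map of ones correlated with a 2 × 2 kernel of ones yields a 3 × 3 response map whose entries are all 4.0; high responses mark positions where the detection features match the template.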
The classification and regression output feature maps are represented as point sets
CLSp = {(xi, yj, cl)}, where i ∈ [0, w), j ∈ [0, h), l ∈ [0, 2k') and p ∈ {2, 3, 4, 5};
REGp = {(xi, yj, dxm, dym, dwm, dhm)}, where i ∈ [0, w), j ∈ [0, h) and m ∈ [0, k').
Letting the variables i and j index the position of an anchor and l its index number, the anchor set ANC = {(x_an_i, y_an_j, w_an_l, h_an_l)} is obtained; the top-K classification responses are selected, and the regression outputs are applied to the corresponding anchors in ANC to obtain the refined coordinates
x_pro = x_an + dx · w_an, y_pro = y_an + dy · h_an, w_pro = w_an · e^dw, h_pro = h_an · e^dh.
The scale proposals for the P2 to P5 scale feature layers of the detection frame are determined in the same manner.
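Refining an anchor with the regression output can be sketched as follows; the exponential parameterisation of width and height is the usual RPN convention and an assumption here:

```python
import math

def decode(anchor, delta):
    """Refine an anchor (cx, cy, w, h) with a regression output
    (dx, dy, dw, dh): the centre moves by a fraction of the anchor
    size, and width/height are rescaled exponentially (the standard
    RPN parameterisation, assumed here)."""
    ax, ay, aw, ah = anchor
    dx, dy, dw, dh = delta
    return (ax + dx * aw,
            ay + dy * ah,
            aw * math.exp(dw),
            ah * math.exp(dh))
```

A zero delta returns the anchor unchanged, so a well-trained regressor only needs small corrections around each anchor.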
Step 5: carrying out non-maximum suppression on the scale proposals corresponding to all scale feature layers of the detection frame, retaining the highest-scoring proposal as the output tracking result.
To adapt the one-shot detection method to the tracking task, the scale proposals are selected in two steps:
① Because the target does not move far between consecutive frames in the tracking problem, grid cells far from the center of the output classification response map are discarded: only a g' × g' sub-region is kept, yielding g' × g' × k' proposal boxes (g' = 7, k' = 5), which removes outliers. FIG. 3 illustrates selecting target boxes in the classification feature map whose distance from the center does not exceed 7.
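Keeping only the central g' × g' sub-region of the response map might look like this (a sketch for a square single-channel map):

```python
def central_crop(resp, g):
    """Keep only the g x g sub-region around the centre of a square
    classification response map, discarding proposals far from the
    previous target position."""
    n = len(resp)
    start = (n - g) // 2
    return [row[start:start + g] for row in resp[start:start + g]]
```

On a 17 × 17 response map with g' = 7, the crop keeps rows and columns 5 through 11, i.e. the cells within distance 7 of the border of the centre region.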
② The proposal scores are re-ranked using a cosine window and a scale-change penalty to obtain the best score. After discarding outliers, a cosine window is applied to suppress large displacements, and a penalty Pe is then applied to suppress large changes in size and aspect ratio:
Pe = exp(−k · (max(r/r', r'/r) · max(s/s', s'/s) − 1)),
where k is a hyperparameter, r is the height-to-width ratio of the proposal box, r' is the aspect ratio of the current frame target box, and s and s' are the overall scales of the proposal box and the current frame target, computed from
(w + p) × (h + p) = s²,
where w and h are the width and height of the target and p = (w + h)/2.
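A sketch of the scale-change penalty in the form given above, which equals 1 when shape and scale are unchanged and decays as they change; the value k = 0.055 is an assumed hyperparameter, not stated in the text:

```python
import math

def overall_scale(w, h):
    """s from (w + p) * (h + p) = s**2 with padding p = (w + h) / 2."""
    p = (w + h) / 2.0
    return math.sqrt((w + p) * (h + p))

def penalty(w, h, w_prev, h_prev, k=0.055):
    """Penalise large changes in aspect ratio r = h / w and overall
    scale s relative to the previous frame; k = 0.055 is a
    hypothetical hyperparameter value."""
    r, r_prev = h / w, h_prev / w_prev
    s, s_prev = overall_scale(w, h), overall_scale(w_prev, h_prev)
    return math.exp(-k * (max(r / r_prev, r_prev / r)
                          * max(s / s_prev, s_prev / s) - 1.0))
```

Each proposal's classification score is multiplied by this factor, so a proposal whose shape matches the previous target (penalty 1.0) beats an equally-scored proposal whose shape changed abruptly.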
After these operations, the proposals are re-ranked by the classification score multiplied by the penalty, and non-maximum suppression is then performed to obtain the final tracking bounding box; after the final bounding box is selected, the target size is updated by linear interpolation.
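Greedy non-maximum suppression over the re-ranked proposals can be sketched as follows (boxes as corner coordinates; the 0.5 overlap threshold is an assumed value):

```python
def iou(a, b):
    """Intersection over union of boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    iw, ih = max(0.0, ix2 - ix1), max(0.0, iy2 - iy1)
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, thresh=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring box,
    drop remaining boxes overlapping it by more than thresh, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= thresh]
    return keep
```

For tracking, only the first surviving index (the highest-scoring proposal) is used as the output bounding box.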
Setting network models and parameters:
1. When the network is trained with multiple anchors, a regularized smooth L1 loss function is adopted, with the normalized distances
δ0 = (Tx − Ax)/Aw, δ1 = (Ty − Ay)/Ah, δ2 = ln(Tw/Aw), δ3 = ln(Th/Ah),
where Ax, Ay, Aw, Ah denote the center point, width and height of the anchor, Tx, Ty, Tw, Th denote the center-point coordinates, width and height of the real target frame, and δ0, δ1, δ2 and δ3 are the regularized distances of the abscissa, ordinate, width and height respectively. The smooth L1 loss is
smoothL1(x, σ) = 0.5σ²x² if |x| < 1/σ², and |x| − 1/(2σ²) otherwise,
and the regression loss is Lreg = Σi smoothL1(δi, σ). The loss function of the network as a whole is
L = Lcls + λLreg,
where λ is a hyperparameter, Lreg is the regression loss function, and Lcls is the classification loss function, expressed using the cross-entropy loss.
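The regression loss above can be sketched in plain Python; the normalisation of the four distances follows the RPN convention stated above and is an assumption about the exact form used:

```python
import math

def smooth_l1(x, sigma=1.0):
    """Smooth L1: quadratic near zero, linear elsewhere."""
    s2 = sigma * sigma
    if abs(x) < 1.0 / s2:
        return 0.5 * s2 * x * x
    return abs(x) - 0.5 / s2

def reg_loss(anchor, target, sigma=1.0):
    """Regression loss over the normalised distances delta0..delta3
    between an anchor (Ax, Ay, Aw, Ah) and the real target frame
    (Tx, Ty, Tw, Th)."""
    ax, ay, aw, ah = anchor
    tx, ty, tw, th = target
    deltas = [(tx - ax) / aw, (ty - ay) / ah,
              math.log(tw / aw), math.log(th / ah)]
    return sum(smooth_l1(d, sigma) for d in deltas)
```

The total loss is then formed as `L = l_cls + lam * reg_loss(...)` for a chosen hyperparameter `lam`, with `l_cls` computed by cross-entropy over the classification channels.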
2. The intersection over union (IoU) of the predicted box and the true target box, together with two thresholds thhi and thlo, is used as the metric in the training phase: positive samples are anchors with IoU > thhi, and negative samples are anchors with IoU < thlo.
The threshold thlo is set to 0.3 and thhi is set to 0.6.
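Anchor labelling with the two IoU thresholds can be sketched as follows (corner-coordinate boxes; the +1/−1/0 encoding for positive/negative/ignored is an illustrative choice):

```python
def iou(a, b):
    """Intersection over union of boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    iw, ih = max(0.0, ix2 - ix1), max(0.0, iy2 - iy1)
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def label_anchor(anchor, gt, th_hi=0.6, th_lo=0.3):
    """Assign +1 (positive), -1 (negative) or 0 (ignored) to an anchor
    based on its IoU with the ground-truth box, using the thresholds
    thhi = 0.6 and thlo = 0.3 from the text."""
    v = iou(anchor, gt)
    if v > th_hi:
        return 1
    if v < th_lo:
        return -1
    return 0
```

Anchors falling between the two thresholds contribute to neither loss term, which keeps ambiguous examples out of training.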
3. For each scale feature layer, anchors with a base size of 8 × 8 pixels are set, each with 5 aspect ratios {0.33, 0.5, 1, 2, 3}, i.e. k' = 5; the entire feature pyramid therefore uses 20 anchors.
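One way to realise 5 anchors of base size 8 with the listed aspect ratios is an equal-area construction (an assumption; the text does not specify how the shapes are derived from the ratios):

```python
import math

def anchor_shapes(base=8, ratios=(0.33, 0.5, 1, 2, 3)):
    """Generate k' = 5 anchor (w, h) pairs, all of area base * base,
    whose height-to-width ratio h / w equals each listed ratio. The
    equal-area construction is an assumption."""
    shapes = []
    for r in ratios:
        w = base / math.sqrt(r)
        shapes.append((w, w * r))
    return shapes
```

With 5 shapes per location on each of the 4 pyramid levels (P2 to P5), the whole pyramid uses 20 anchor configurations.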
4. The side length g' of the sub-region used to choose the best proposal is 7. During training, the loss function is optimized by stochastic gradient descent (SGD), with the initial learning rate set to 0.01.
5. The bottom-up route of the fully convolutional network uses the 5 convolutional layers of AlexNet, with maximum pooling after the first two convolutional layers. Except for conv5, each convolutional layer employs a ReLU nonlinear activation function. The convolution kernel size, number of channels and stride of each layer, and the sizes of the detection frame and template frame, are listed in Table 1.
TABLE 1 convolutional layer parameters
6. FIG. 4 shows frames 1, 243, 700 and 944 of the Boat2 sequence, in which the target is a ship on the sea surface, initially about 8 × 20 pixels. The target first undergoes an out-of-plane flip, its appearance changing from FIG. 4(a) to FIG. 4(b), and all algorithms track it effectively; the target then flips in-plane again, so that its appearance in frame 700 matches the initial frame, but the GIF-SECA algorithm drifts owing to camera shake; around frame 944, the target moves rapidly and its scale grows quickly, and the algorithm of the invention remains the most accurate.
7. FIGS. 5(a) and 5(b) are the accuracy plot and the success-rate plot of six tracking algorithms over 16 infrared sequences: the algorithm of the present invention (Gif-siamfn), the pyramid-based twin network algorithm (siamfn), the context-aware scale estimation algorithm (Gif-SECA), the fully-convolutional twin network algorithm (SiamFC), the discriminative scale space tracking algorithm (DSST), and the correlation filter network algorithm (CFNet). At a position error threshold of 20 pixels, the accuracy of the proposed algorithm reaches 0.914; when the success-rate curves are ranked by area under the curve, the proposed method also achieves the highest success rate. The algorithm therefore surpasses the classical tracking methods of recent years in both tracking accuracy and overlap rate.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention.

Claims (8)

1. A twin network infrared target tracking method based on a characteristic pyramid is characterized by comprising the following steps:
taking a current frame of an original infrared image sequence as a detection frame of a detection branch in the feature pyramid-based twin network, and taking a previous frame as a template frame of a template branch in the feature pyramid-based twin network;
Respectively carrying out a bottom-up full convolution operation on the template frame and the detection frame, then performing top-down operations and lateral connections on the convolutional layers C5, C4, C3 and C2 to correspondingly generate the P5, P4, P3 and P2 scale feature layers;
Performing 2k' and 4k' channel expansion on the P2, P3, P4 and P5 scale feature layers of the template frame to generate classification weights and regression weights respectively;
Performing convolution operations on the corresponding scale feature layers of the detection frame using the classification weight and regression weight of each scale feature layer of the template frame, to determine the scale proposal corresponding to each scale feature layer of the detection frame;
and carrying out non-maximum suppression on the scale proposals corresponding to all scale feature layers of the detection frame, retaining the highest-scoring proposal, and taking it as the output tracking result.
2. The method for tracking the infrared target of the twin network based on the feature pyramid as claimed in claim 1, wherein the step of taking the current frame of the original infrared image sequence as the detection frame of the detection branch in the twin network based on the feature pyramid and taking the previous frame as the template frame of the template branch in the twin network based on the feature pyramid specifically comprises: the current frame of the original infrared image sequence is used as a detection frame of a detection branch in the feature extraction sub-network, and the previous frame is used as a template frame of a template branch in the feature extraction sub-network.
3. The feature pyramid-based twin network infrared target tracking method according to claim 1 or 2, wherein the bottom-up full convolution operation is performed on the template frame and the detection frame respectively, and top-down operations and lateral connections are then performed on the convolutional layers C5, C4, C3 and C2 to correspondingly generate the P5, P4, P3 and P2 scale feature layers, specifically: the network of each branch of the feature extraction sub-network adopts three routes, namely bottom-up, top-down and lateral connection, and the bottom-up route adopts the same five-layer network as the region selection network, with the scale reduced layer by layer; upsampling on the top-down route is performed by deconvolution; the lateral connections fuse each upsampling result with the bottom-up feature map of the same size to generate the feature maps P2 to P5 in one-to-one correspondence with the bottom-up maps C2 to C5; and target prediction is performed on the P2, P3, P4 and P5 layers, which fuse the features of each level.
4. The feature pyramid-based twin network infrared target tracking method according to claim 3, wherein the 2k' and 4k' channel expansion is performed on the P2, P3, P4 and P5 scale feature layers of the template frame to generate the classification weights and regression weights respectively, specifically: the region selection sub-network is divided into two branches, one for target-background classification and the other for regression of the target region; assuming that the region selection sub-network sets k' anchor points, 2k' channels need to be output for classification and 4k' channels for regression.
5. The feature pyramid-based twin network infrared target tracking method according to claim 4, wherein the channel expansion to 2k' and 4k' performed on the P2, P3, P4 and P5 scale feature layers of the template frame, generating classification weights and regression weights respectively, is implemented by the following steps:
(1) the template features φp(z) are split into two branches [φp(z)]cls and [φp(z)]reg, whose channel counts are expanded to 2k' and 4k' times through two convolutional layers, respectively;
(2) the detection-frame features φp(x) are likewise split into two branches [φp(x)]cls and [φp(x)]reg by two convolutional layers, but the number of channels remains unchanged;
(3) [φp(z)] and [φp(x)] on the classification branch and on the regression branch are correlated by a convolution operation, i.e.
Acls = [φp(x)]cls ⋆ [φp(z)]cls, Areg = [φp(x)]reg ⋆ [φp(z)]reg,
wherein [φp(z)]cls, [φp(z)]reg, [φp(x)]cls and [φp(x)]reg are the classification branch of the template frame, the regression branch of the template frame, the classification branch of the detection frame and the regression branch of the detection frame, ⋆ denotes the correlation operation, and Acls and Areg are the classification-branch and regression-branch correlations.
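A minimal numpy sketch of this correlation step: each group of expanded template channels acts as a convolution kernel slid over the detection-frame features, producing one response map per group. The shapes, k' = 3 anchors, and random features are illustrative assumptions, not values from the patent.

```python
import numpy as np

def xcorr(search, kernel):
    """Valid cross-correlation of one template kernel [C, hk, wk] over
    the detection-frame features [C, H, W] -> [H-hk+1, W-wk+1] map."""
    _, H, W = search.shape
    _, hk, wk = kernel.shape
    out = np.zeros((H - hk + 1, W - wk + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(search[:, i:i + hk, j:j + wk] * kernel)
    return out

def branch_correlation(phi_x, phi_z_groups):
    """Template features expanded to 2k' (or 4k') kernel groups are
    correlated with the detection features, whose channel count is
    unchanged; one response map per group is stacked."""
    return np.stack([xcorr(phi_x, k) for k in phi_z_groups])

rng = np.random.default_rng(0)
k_prime, C = 3, 4
phi_x_cls = rng.standard_normal((C, 8, 8))               # detection branch
phi_z_cls = rng.standard_normal((2 * k_prime, C, 4, 4))  # 2k' template kernels
a_cls = branch_correlation(phi_x_cls, phi_z_cls)
print(a_cls.shape)  # (6, 5, 5): one response map per classification channel
```

The same routine applied with 4k' kernel groups yields the regression-branch correlation Areg.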
6. The feature pyramid-based twin network infrared target tracking method of claim 5, wherein, when multiple anchors are used to train the network, a normalized smooth L1 loss function is adopted, with the normalized distances
δ[0] = (Tx − Ax)/Aw, δ[1] = (Ty − Ay)/Ah, δ[2] = ln(Tw/Aw), δ[3] = ln(Th/Ah),
wherein Ax, Ay, Aw, Ah denote the center point, width and height of the anchor, Tx, Ty, Tw, Th denote the center-point coordinates, width and height of the real target frame, and δ[0], δ[1], δ[2] and δ[3] are the normalized distances of the abscissa, ordinate, width and height, respectively;
the smooth L1 loss is
smooth_L1(x, σ) = 0.5σ²x² if |x| < 1/σ², and |x| − 1/(2σ²) otherwise,
so that the regression loss is Lreg = Σi smooth_L1(δ[i], σ);
the loss function of the network as a whole is
L = Lcls + λLreg,
where λ is a hyperparameter, Lreg is the regression loss function, and Lcls is the classification loss function, expressed using the cross-entropy loss.
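The normalized distances, the smooth L1 loss and the combined loss can be sketched as follows; σ, λ, the example classification-loss value and the box coordinates are illustrative assumptions.

```python
import numpy as np

def normalized_deltas(anchor, target):
    """delta[0..3]: offsets normalized by the anchor size and log ratios
    for width and height, matching the claim's normalized distances."""
    ax, ay, aw, ah = anchor
    tx, ty, tw, th = target
    return np.array([(tx - ax) / aw, (ty - ay) / ah,
                     np.log(tw / aw), np.log(th / ah)])

def smooth_l1(x, sigma=1.0):
    # Quadratic near zero, linear elsewhere (the usual smooth L1 form).
    x = np.abs(x)
    s2 = sigma ** 2
    return np.where(x < 1.0 / s2, 0.5 * s2 * x * x, x - 0.5 / s2)

def total_loss(l_cls, anchor, target, lam=1.0, sigma=1.0):
    """L = Lcls + lambda * Lreg, with Lreg summed over the four deltas."""
    l_reg = smooth_l1(normalized_deltas(anchor, target), sigma).sum()
    return l_cls + lam * l_reg

anchor = (50.0, 50.0, 20.0, 20.0)   # (Ax, Ay, Aw, Ah)
target = (52.0, 49.0, 22.0, 18.0)   # (Tx, Ty, Tw, Th)
print(round(total_loss(0.3, anchor, target), 4))  # 0.3163
```

All four deltas here fall in the quadratic region of smooth L1, so the regression term stays small when the anchor is close to the ground-truth box.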
7. The feature pyramid-based twin network infrared target tracking method according to claim 6, wherein determining the scale proposal corresponding to each scale feature layer of the detection frame by performing a convolution operation on the corresponding scale feature layer of the detection frame with the classification weight and the regression weight of each scale feature layer of the template frame specifically comprises: dividing each scale feature layer of the detection frame into a classification branch and a regression branch; combining them with the classification weight and the regression weight determined by the template frame for the corresponding scale feature layer to obtain a classification confidence map and a regression confidence map for each anchor; and determining the scale proposal corresponding to the scale feature layer of the detection frame according to the correlation between the classification confidence map and the regression confidence map of each anchor.
8. The feature pyramid-based twin network infrared target tracking method according to claim 7, wherein dividing each scale feature layer of the detection frame into a classification branch and a regression branch, combining them with the classification weight and the regression weight determined by the template frame for the corresponding scale feature layer to obtain a classification confidence map and a regression confidence map for each anchor, and determining the scale proposal corresponding to the scale feature layer of the detection frame according to the correlation between the classification confidence map and the regression confidence map of each anchor, specifically comprises:
representing the classification and regression output feature maps as point sets
CLS^p = {(x_i, y_j, c_l)}, wherein i ∈ [0, w), j ∈ [0, h), l ∈ [0, 2k') and p ∈ {2,3,4,5};
REG^p = {(x_i, y_j, dx_m, dy_m, dw_m, dh_m)}, wherein i ∈ [0, w), j ∈ [0, h) and m ∈ [0, k').
Letting the variables i and j denote the position of an anchor and l denote its index, the anchor set ANC is derived with the same resolution as the output maps; applying the regression outputs in REG to the anchors in ANC then yields the corresponding refined coordinates.
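Refining an anchor with its regression output can be sketched as below; the anchor values, scores and deltas are made up for illustration, and the parameterization is simply the inverse of the normalized distances used in the regression loss.

```python
import numpy as np

def refine(anchor, deltas):
    """Invert the regression parameterization: apply predicted
    (dx, dy, dw, dh) to an anchor (x, y, w, h)."""
    ax, ay, aw, ah = anchor
    dx, dy, dw, dh = deltas
    return (ax + dx * aw, ay + dy * ah, aw * np.exp(dw), ah * np.exp(dh))

def best_proposal(anchors, cls_scores, reg_deltas):
    """Pick the anchor with the highest classification confidence and
    refine it with its regression output."""
    idx = int(np.argmax(cls_scores))
    return refine(anchors[idx], reg_deltas[idx])

anchors = [(50.0, 50.0, 20.0, 20.0), (60.0, 40.0, 30.0, 15.0)]
cls_scores = np.array([0.2, 0.9])
reg_deltas = [(0.0, 0.0, 0.0, 0.0), (0.1, -0.2, np.log(1.2), 0.0)]
x, y, w, h = best_proposal(anchors, cls_scores, reg_deltas)
print(round(x, 4), round(y, 4), round(w, 4), round(h, 4))  # 63.0 37.0 36.0 15.0
```

The second anchor wins on classification confidence, shifts by (dx·Aw, dy·Ah) and rescales by exp(dw), exp(dh).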
CN201910720012.4A 2019-08-06 2019-08-06 twin network infrared target tracking method based on characteristic pyramid Pending CN110544269A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910720012.4A CN110544269A (en) 2019-08-06 2019-08-06 twin network infrared target tracking method based on characteristic pyramid

Publications (1)

Publication Number Publication Date
CN110544269A true CN110544269A (en) 2019-12-06

Family

ID=68710234

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180129934A1 (en) * 2016-11-07 2018-05-10 Qualcomm Incorporated Enhanced siamese trackers
CN109242019A (en) * 2018-09-01 2019-01-18 哈尔滨工程大学 A kind of water surface optics Small object quickly detects and tracking

Non-Patent Citations (2)

Title
Bo Li et al., "High performance visual tracking with Siamese region proposal network", 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) *
Tsung-Yi Lin et al., "Feature pyramid networks for object detection", 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) *

Cited By (13)

Publication number Priority date Publication date Assignee Title
CN111724409A (en) * 2020-05-18 2020-09-29 浙江工业大学 Target tracking method based on densely connected twin neural network
CN111640136B (en) * 2020-05-23 2022-02-25 西北工业大学 Depth target tracking method in complex environment
CN111640136A (en) * 2020-05-23 2020-09-08 西北工业大学 Depth target tracking method in complex environment
CN111696137A (en) * 2020-06-09 2020-09-22 电子科技大学 Target tracking method based on multilayer feature mixing and attention mechanism
CN111696137B (en) * 2020-06-09 2022-08-02 电子科技大学 Target tracking method based on multilayer feature mixing and attention mechanism
CN111797716A (en) * 2020-06-16 2020-10-20 电子科技大学 Single target tracking method based on Siamese network
CN111797716B (en) * 2020-06-16 2022-05-03 电子科技大学 Single target tracking method based on Siamese network
CN111915644A (en) * 2020-07-09 2020-11-10 苏州科技大学 Real-time target tracking method of twin guiding anchor frame RPN network
CN111915644B (en) * 2020-07-09 2023-07-04 苏州科技大学 Real-time target tracking method of twin guide anchor frame RPN network
CN112308013A (en) * 2020-11-16 2021-02-02 电子科技大学 Football player tracking method based on deep learning
CN113920159A (en) * 2021-09-15 2022-01-11 河南科技大学 Infrared aerial small target tracking method based on full convolution twin network
CN114862904A (en) * 2022-03-21 2022-08-05 哈尔滨工程大学 Twin network target continuous tracking method of underwater robot
CN114862904B (en) * 2022-03-21 2023-12-12 哈尔滨工程大学 Twin network target continuous tracking method of underwater robot

Similar Documents

Publication Publication Date Title
CN110544269A (en) twin network infrared target tracking method based on characteristic pyramid
WO2019101221A1 (en) Ship detection method and system based on multidimensional scene characteristics
CN109800689B (en) Target tracking method based on space-time feature fusion learning
CN103077539A (en) Moving object tracking method under complicated background and sheltering condition
CN112257569B (en) Target detection and identification method based on real-time video stream
CN111161309B (en) Searching and positioning method for vehicle-mounted video dynamic target
CN110414439A (en) Anti- based on multi-peak detection blocks pedestrian tracting method
CN113763427B (en) Multi-target tracking method based on coarse-to-fine shielding processing
Garg et al. Look no deeper: Recognizing places from opposing viewpoints under varying scene appearance using single-view depth estimation
CN113807188A (en) Unmanned aerial vehicle target tracking method based on anchor frame matching and Simese network
CN108256567A (en) A kind of target identification method and system based on deep learning
Tashlinskii et al. Pixel-by-pixel estimation of scene motion in video
CN112329764A (en) Infrared dim target detection method based on TV-L1 model
Liu et al. Self-correction ship tracking and counting with variable time window based on YOLOv3
CN116363694A (en) Multi-target tracking method of unmanned system crossing cameras matched with multiple pieces of information
CN115223056A (en) Multi-scale feature enhancement-based optical remote sensing image ship target detection method
CN113205494B (en) Infrared small target detection method and system based on adaptive scale image block weighting difference measurement
Liu et al. Target detection and tracking algorithm based on improved Mask RCNN and LMB
CN112101113B (en) Lightweight unmanned aerial vehicle image small target detection method
CN116563343A (en) RGBT target tracking method based on twin network structure and anchor frame self-adaptive thought
CN112184767A (en) Method, device, equipment and storage medium for tracking moving object track
CN116777956A (en) Moving target screening method based on multi-scale track management
CN110111358B (en) Target tracking method based on multilayer time sequence filtering
Duan Deep learning-based multitarget motion shadow rejection and accurate tracking for sports video
Xu et al. Moving target detection and tracking in FLIR image sequences based on thermal target modeling

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
