CN110458864A - Target tracking method and target tracker integrating semantic knowledge and instance features - Google Patents
Target tracking method and target tracker integrating semantic knowledge and instance features Download PDF Info
- Publication number
- CN110458864A (application number CN201910590225.XA)
- Authority
- CN
- China
- Prior art keywords
- target
- frame
- instance features
- semantic knowledge
- picture
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G06T7/251—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2210/00—Indexing scheme for image generation or computer graphics
- G06T2210/22—Cropping
Landscapes
- Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The present invention provides a target tracking method and a target tracker based on integrating semantic knowledge and instance features. The method includes the following steps: extracting the pictures of the 1st, (t-1)-th, and t-th frames; cropping the pictures of the 1st, (t-1)-th, and t-th frames from step 1, and taking the cropped pictures as the input of a convolutional neural network; constructing a neural network model based on Darknet-19, with minor modifications to its backbone network; training the entire tracker convolutional neural network; and finally, evaluating the performance of the trained model. Based on Darknet-19, the present invention proposes a new network architecture model, models the target tracking problem as a regression problem, and directly predicts the target position coordinates of the incoming frame. For specific object categories, the trained model achieves state-of-the-art performance at high speed.
Description
Technical field
The invention belongs to the technical field of image processing, and more particularly to a target tracking method and a target tracker integrating semantic knowledge and instance features.
Background technique
As an important component of many computer vision systems, target tracking has attracted the research interest of numerous researchers. Over the past decade, methods based on deep learning have shown powerful capability in the target tracking domain. Typical deep network structures, such as convolutional neural networks (CNNs), can extract representative visual features through end-to-end training. Unlike traditional hand-crafted feature representations, this rich description of image data can be stored in the model to track drastic changes in the target. Consequently, the best target trackers on benchmarks such as Visual Object Tracking (VOT) and the Object Tracking Benchmark (OTB) are all based on deep learning methods.
Unlike target detection or recognition, current target tracking research focuses primarily on the instance features of the target rather than on semantic knowledge. However, the human eye, as a high-performance tracker, can capture both low-level visual features and high-level semantic knowledge. When the human eye attempts to track a car, it always treats the features it sees as part of a vehicle traveling on the road. When detailed instance features are absent (e.g., under shaking, occlusion, or perspective change), this prior knowledge plays a key role in challenging conditions.
When handling a range of targets including pedestrians and vehicles, although structures using a region proposal network (RPN) can directly predict the position of the target, they only perform regression on different anchors, without any semantic hypothesis.
In view of this, it is necessary to design a target tracking method integrating semantic knowledge and instance features to solve the above problems.
Summary of the invention
The purpose of the present invention is to solve the problem that general target trackers focus only on the instance features of the target while ignoring semantic prior knowledge, by proposing a target tracking method that integrates semantic knowledge and instance features. Based on Darknet-19, the method proposes a new network architecture model, models the target tracking problem as a regression problem, and directly predicts the target position coordinates of the incoming frame.
In order to achieve the above object, the present invention provides a method comprising the following steps:
Step 1: extract the pictures of the 1st, (t-1)-th, and t-th frames;
Step 2: crop the pictures of the 1st, (t-1)-th, and t-th frames from step 1, and take the cropped pictures as the input of a convolutional neural network;
Step 3: construct a neural network model based on Darknet-19, with minor modifications to its backbone network;
Step 4: train the entire tracker convolutional neural network;
Step 5: evaluate the performance of the trained model.
A further improvement of the present invention is that, before step 4, the method further includes step 3.1: designing the network output, which comprises a classification branch and a regression branch; and before step 5, the method further includes step 4.1: designing the network loss function.
A further improvement of the present invention is that, in step 1, the 1st-frame picture is chosen as a standard template containing the target, and the t-th-frame picture is chosen as a candidate region where the target is likely to appear.
A further improvement of the present invention is that, in step 2, the standard template containing the target is extracted for initialization.
A further improvement of the present invention is that, assuming the size of the ground-truth bounding box is (w, h), the input 1st-frame picture is cropped around the center of the target with size S to obtain an exemplar image; this exemplar image is always used as the standard template during the entire tracking process, and the margin information satisfies the following relationship:
S² = (3w) · (3h)   (1).
A further improvement of the present invention is that, in step 3, based on the original structure of Darknet-19, three convolutional layers and two fully connected layers are used to replace the global pooling layer, for classification and localization respectively.
A further improvement of the present invention is that, at frame t, the network outputs through its fully connected layers a score vector w_t ∈ R^K as the classification result of the target, where K is the number of classes; this vector reflects the likelihood that the corresponding object appears in view. Meanwhile, the network outputs d_t ∈ R^{4K} as the deformation prediction for each class of target. Assuming the bounding box output at frame t-1 is p_{t-1} = (x_{t-1}, y_{t-1}, w_{t-1}, h_{t-1}), where x, y are the center coordinates of the box and w and h are the width and height of the box, the deformation regression d_t^k for class k consists of four coordinates:
d_t^k = (Δx_t^k, Δy_t^k, Δw_t^k, Δh_t^k)   (2)
where Δx_t^k, Δy_t^k, Δw_t^k, Δh_t^k represent the deformation of the target under the different semantic hypotheses. The final result p_t of frame t can be calculated by the following formula:
p_t = p_{t-1} + d_t^{k*}, where k* = argmax_k w_t^k   (3).
A further improvement of the present invention is that the cross-entropy loss function is used for the classification loss on w, and the L1 loss function is used for the bounding box regression loss on d:
L = L_CE(w_t, k*) + ||d_t^{k*} − d_t*||_1   (4)
where d_t* is the ground-truth deformation from the second input to the third input. The L1 loss imposes a higher penalty on small errors between the predicted and ground-truth bounding boxes, so the trained model produces more stable bounding boxes.
A further improvement of the present invention is that step 4 includes a first stage: pre-training the backbone network on the ImageNet classification dataset for 10 epochs, using the original image as the first input and standard random contrast and color variations of the image as the second and third inputs; and a second stage: training the entire target tracking network to obtain the trained model.
To achieve the object of the invention, the present invention also provides a target tracker implementing the aforementioned method.
The beneficial effects of the present invention are as follows: the present invention integrates semantic knowledge and instance features for target tracking; based on Darknet-19, it proposes a new network architecture model, models the target tracking problem as a regression problem, and directly predicts the target position coordinates of the incoming frame, achieving high accuracy and execution efficiency in everyday tracking tasks.
Detailed description of the invention
Fig. 1 shows the extraction of the pictures of the 1st, (t-1)-th, and t-th frames.
Fig. 2 shows the three pictures cropped from Fig. 1 so that each contains the target.
Fig. 3 shows the convolutional neural network model.
Fig. 4 shows the two output branches of the convolutional neural network.
Specific embodiment
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in detail below with reference to the drawings and specific embodiments.
It should be emphasized that, in the description of the present invention, the various formulas and constraint conditions are each distinguished by self-consistent labels, but the use of different labels to mark identical formulas and/or constraints is not excluded; this arrangement serves to illustrate the features of the present invention more clearly.
The CNN target tracking model proposed by the present invention is trained on a mixed dataset (ImageNet VID and ALOV300++). ImageNet VID contains 30 different types of targets, from which 8 common classes are chosen: airplane, bicycle, bird, bus, car, cat, horse, and motorcycle. Because this dataset contains no pedestrians, pedestrians are selected from ALOV300++, finally composing a mixed dataset containing 9 classes.
As shown in Fig. 1, the present invention first extracts three video-frame pictures as the input of the network, then crops them into three target pictures that are fed to the CNN network. As shown in Fig. 2, these three target pictures pass through the CNN convolutional neural network model to extract features (as shown in Fig. 3); the final output consists of two branches, as shown in Fig. 4: one is the classification branch, used to discriminate the class of the target, and the other is the regression branch, used for bounding box regression.
Table 1 lists the detailed parameters of the CNN network structure designed by the present invention.
As shown in Table 1, the present invention fine-tunes the Darknet-19 network model, using three convolutional layers and two fully connected layers to replace the global pooling layer for classification and localization respectively, and fine-tunes on the above mixed video dataset. In each video sequence, a first frame and a (t-1)-th frame are extracted every 100 frames. For data augmentation, the ground-truth bounding box at frame t is perturbed using a Gaussian distribution. The model is trained for more than 50 iterations on 4 NVIDIA Tesla P40 GPUs, with 800 batches (512 samples each) per iteration.
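The Gaussian augmentation described above (perturbing the ground-truth box at frame t) can be sketched as follows; the patent does not give the noise parameters, so `shift_std` and `scale_std` are illustrative assumptions:

```python
import random

def jitter_box(box, shift_std=0.1, scale_std=0.1, rng=None):
    """Perturb a ground-truth box (x, y, w, h) with Gaussian noise.

    (x, y) is the box center, (w, h) its width and height. The standard
    deviations are illustrative; the patent does not specify them.
    """
    rng = rng or random.Random(0)  # seeded for reproducibility
    x, y, w, h = box
    x += rng.gauss(0.0, shift_std) * w    # shift the center relative to box size
    y += rng.gauss(0.0, shift_std) * h
    w *= 1.0 + rng.gauss(0.0, scale_std)  # rescale width and height
    h *= 1.0 + rng.gauss(0.0, scale_std)
    return (x, y, max(w, 1.0), max(h, 1.0))
```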
Specifically, the method of the present invention for target tracking based on integrating semantic knowledge and instance features comprises the following steps:
Step 1: extract the pictures of the 1st, (t-1)-th, and t-th frames as input:
The first picture takes the first frame as the standard template of the target; the second picture is taken from frame t-1; and the last picture is selected from the candidate region of the current frame where the target is likely to appear.
Step 2: crop the three pictures input in step 1 so that each contains the target:
In the first frame, the standard template of the target is extracted for initialization. Assuming the size of the ground-truth bounding box is (w, h), the picture is cropped with size S around the center of the target; the square region provides the exemplar image, and the context margin information satisfies the following relationship:
S² = (3w) · (3h)   (1)
This exemplar image serves as the first input of the CNN network, with size 288×288, and is always used as the template during the entire tracking process. The hyperparameter 3 in equation (1) is retained from the video statistics of the VID dataset. This configuration covers the motion of almost all targets between consecutive frames while ensuring an acceptable resolution after scaling.
Assuming the tracking result of frame t-1 is p_{t-1}, frames t-1 and t are cropped centered at (x_{t-1}, y_{t-1}) with crop size (3w_{t-1}, 3h_{t-1}); after cropping, the pictures are also resized to 288×288 and serve as the second and third inputs of the CNN network. Note that the scale of the target in the first frame is preserved, because it encodes the features of the template target. In contrast, the target of frame t-1 is scaled to a size of 96 pixels, so that, by standardizing the deformation between boxes, the CNN network can learn bounding box regression more effectively.
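The two crops above can be sketched as follows: `exemplar_window` is the square template crop of equation (1) with side S = 3·sqrt(w·h), and `search_window` is the (3w, 3h) region cropped around the previous result. Border padding and the resize to 288×288 are omitted for brevity:

```python
import math

def exemplar_window(cx, cy, w, h):
    """Square crop around the target center with area S^2 = (3w)*(3h),
    i.e. side S = 3*sqrt(w*h) per equation (1).
    Returns (left, top, right, bottom)."""
    half = 1.5 * math.sqrt(w * h)
    return (cx - half, cy - half, cx + half, cy + half)

def search_window(prev_box):
    """Rectangular crop of size (3w, 3h) centered on the previous result p_{t-1}."""
    x, y, w, h = prev_box
    return (x - 1.5 * w, y - 1.5 * h, x + 1.5 * w, y + 1.5 * h)
```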
Step 3: construct a neural network model based on Darknet-19, with minor modifications to its backbone network:
To balance model capacity and efficiency, Darknet-19 is adopted as the backbone network. Darknet-19 has been proven to achieve high performance in related target detection tasks. The model is composed of 3×3 and 1×1 convolutional filters, with max pooling used between different scales, and the number of channels doubles at each scale. The model performs very well in tasks such as target classification and localization while using relatively few parameters. Based on the original structure of Darknet-19, the present invention uses three convolutional layers and two fully connected layers to replace the global pooling, for classification and localization respectively. Table 1 lists the detailed network architecture.
Step 4: design the network output, including the classification and regression branches:
At frame t, the network outputs through its fully connected layers a score vector w_t ∈ R^K as the classification result of the target, where K is the number of classes; this vector reflects the likelihood that the corresponding object appears in view. Meanwhile, the network outputs d_t ∈ R^{4K} as the deformation prediction for each class of target. Assuming the bounding box output at frame t-1 is p_{t-1} = (x_{t-1}, y_{t-1}, w_{t-1}, h_{t-1}), where x, y are the center coordinates of the box and w and h are the width and height of the box, the deformation regression d_t^k for class k consists of four coordinates:
d_t^k = (Δx_t^k, Δy_t^k, Δw_t^k, Δh_t^k)   (2)
which represent the deformation of the target under the different semantic hypotheses. The final result p_t of frame t can be calculated by the following formula:
p_t = p_{t-1} + d_t^{k*}, where k* = argmax_k w_t^k   (3)
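A minimal sketch of combining the two output branches: pick the class k* with the highest score and apply that class's four-coordinate deformation to the previous box. Because the patent's formula images are not reproduced here, the additive update below is an assumed reading of the final-result formula:

```python
def update_box(prev_box, scores, deformations):
    """Fuse the classification and regression branches for frame t.

    prev_box: p_{t-1} = (x, y, w, h).
    scores: length-K score vector w_t.
    deformations: K tuples (dx, dy, dw, dh), one per class (d_t).
    Returns the frame-t box p_t and the selected class index k*.
    The additive update is an assumption; offset parameterizations in the
    Faster R-CNN style would be an alternative reading.
    """
    k_star = max(range(len(scores)), key=lambda k: scores[k])  # argmax over classes
    dx, dy, dw, dh = deformations[k_star]
    x, y, w, h = prev_box
    return (x + dx, y + dy, w + dw, h + dh), k_star
```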
Step 5: design the network loss function:
The cross-entropy loss function is used for the classification loss on w, and the L1 loss function is used for the bounding box regression loss on d:
L = L_CE(w_t, k*) + ||d_t^{k*} − d_t*||_1   (4)
where d_t* is the ground-truth deformation from the second input to the third input. The L1 loss imposes a higher penalty on small errors between the predicted and ground-truth bounding boxes, so the trained model produces more stable bounding boxes.
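The loss can be sketched as a cross-entropy term on the score vector plus an L1 term on the deformation of the ground-truth class; the balancing weight `lam` is a hypothetical parameter, since the patent does not state how the two terms are weighted:

```python
import math

def tracking_loss(scores, deformations, true_class, true_deform, lam=1.0):
    """Cross-entropy classification loss plus L1 bounding-box regression loss.

    `lam` is a hypothetical balancing weight (not specified in the patent).
    """
    # numerically stable softmax cross-entropy on the score vector w_t
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    ce = log_z - scores[true_class]
    # L1 distance between predicted and true deformation of the true class
    l1 = sum(abs(p - t) for p, t in zip(deformations[true_class], true_deform))
    return ce + lam * l1
```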
Step 6: train the convolutional neural network model of the tracker:
First stage: pre-train the backbone network on the ImageNet classification dataset for 10 epochs, using the original image as the first input and standard random contrast and color variations of the image as the second and third inputs. The network achieves 72.5% top-1 accuracy and 91.0% top-5 accuracy on ImageNet.
Second stage: train the entire target tracking network to obtain the trained model.
Step 7: evaluate the performance of the trained model:
The trained model is evaluated on a subset of VOT 2016, which contains 15 video sequences.
The present invention integrates semantic knowledge and instance features for target tracking, proposes a new network architecture model based on Darknet-19, models the target tracking problem as a regression problem, and directly predicts the target position coordinates of the incoming frame, achieving high accuracy and execution efficiency in everyday tracking tasks.
The above embodiments are only used to illustrate the technical solution of the present invention and are not limiting. Although the present invention has been described in detail with reference to preferred embodiments, those skilled in the art should understand that the technical solution of the present invention may be modified or equivalently replaced without departing from the spirit and scope of the technical solution of the present invention.
Claims (10)
1. A target tracking method based on integrating semantic knowledge and instance features, characterized in that the method comprises the following steps:
Step 1: extract the pictures of the 1st, (t-1)-th, and t-th frames;
Step 2: crop the pictures of the 1st, (t-1)-th, and t-th frames from step 1, and take the cropped pictures as the input of a convolutional neural network;
Step 3: construct a neural network model based on Darknet-19, with minor modifications to its backbone network;
Step 4: train the entire tracker convolutional neural network;
Step 5: evaluate the performance of the trained model.
2. The target tracking method based on integrating semantic knowledge and instance features according to claim 1, characterized in that: before step 4, the method further includes step 3.1: designing the network output, which comprises a classification branch and a regression branch; and before step 5, the method further includes step 4.1: designing the network loss function.
3. The target tracking method based on integrating semantic knowledge and instance features according to claim 1, characterized in that: in step 1, the 1st-frame picture is chosen as a standard template containing the target, and the t-th-frame picture is chosen as a candidate region where the target is likely to appear.
4. The target tracking method based on integrating semantic knowledge and instance features according to claim 3, characterized in that: in step 2, the standard template containing the target is extracted for initialization.
5. The target tracking method based on integrating semantic knowledge and instance features according to claim 4, characterized in that: assuming the size of the ground-truth bounding box is (w, h), the input 1st-frame picture is cropped around the center of the target with size S to obtain an exemplar image; this exemplar image is always used as the standard template during the entire tracking process, and the margin information satisfies the following relationship:
S² = (3w) · (3h)   (1).
6. The target tracking method based on integrating semantic knowledge and instance features according to claim 1, characterized in that: in step 3, based on the original structure of Darknet-19, three convolutional layers and two fully connected layers are used to replace the global pooling layer, for classification and localization respectively.
7. The target tracking method based on integrating semantic knowledge and instance features according to claim 2, characterized in that: at frame t, the network outputs through its fully connected layers a score vector w_t ∈ R^K as the classification result of the target, where K is the number of classes; this vector reflects the likelihood that the corresponding object appears in view; meanwhile, the network outputs d_t ∈ R^{4K} as the deformation prediction for each class of target; assuming the bounding box output at frame t-1 is p_{t-1} = (x_{t-1}, y_{t-1}, w_{t-1}, h_{t-1}), where x, y are the center coordinates of the box and w and h are the width and height of the box, the deformation regression d_t^k for class k consists of four coordinates:
d_t^k = (Δx_t^k, Δy_t^k, Δw_t^k, Δh_t^k)   (2)
which represent the deformation of the target under the different semantic hypotheses; the final result p_t of frame t can be calculated by the following formula:
p_t = p_{t-1} + d_t^{k*}, where k* = argmax_k w_t^k   (3).
8. The target tracking method based on integrating semantic knowledge and instance features according to claim 7, characterized in that: the cross-entropy loss function is used for the classification loss on w, and the L1 loss function is used for the bounding box regression loss on d:
L = L_CE(w_t, k*) + ||d_t^{k*} − d_t*||_1   (4)
where d_t* is the ground-truth deformation from the second input to the third input; the L1 loss imposes a higher penalty on small errors between the predicted and ground-truth bounding boxes, so the trained model produces more stable bounding boxes.
9. The target tracking method based on integrating semantic knowledge and instance features according to claim 8, characterized in that step 4 includes two stages: a first stage of pre-training the backbone network on the ImageNet classification dataset for 10 epochs, using the original image as the first input and standard random contrast and color variations of the image as the second and third inputs; and a second stage of training the entire target tracking network to obtain the trained model.
10. A target tracker implementing the method according to any one of claims 1-9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910590225.XA CN110458864A (en) | 2019-07-02 | 2019-07-02 | Target tracking method and target tracker integrating semantic knowledge and instance features |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910590225.XA CN110458864A (en) | 2019-07-02 | 2019-07-02 | Target tracking method and target tracker integrating semantic knowledge and instance features |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110458864A true CN110458864A (en) | 2019-11-15 |
Family
ID=68482051
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910590225.XA Pending CN110458864A (en) | 2019-07-02 | 2019-07-02 | Target tracking method and target tracker integrating semantic knowledge and instance features |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110458864A (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111105442A (en) * | 2019-12-23 | 2020-05-05 | University of Science and Technology of China | Switching-type target tracking method |
CN111428567A (en) * | 2020-02-26 | 2020-07-17 | Shenyang University | Pedestrian tracking system and method based on affine multi-task regression |
CN112053384A (en) * | 2020-08-28 | 2020-12-08 | Xidian University | Target tracking method based on a bounding box regression model |
CN112232359A (en) * | 2020-09-29 | 2021-01-15 | PLA Army Academy of Artillery and Air Defense | Visual tracking method based on mixed-level filtering and complementary features |
CN112861652A (en) * | 2021-01-20 | 2021-05-28 | Institute of Automation, Chinese Academy of Sciences | Method and system for video target tracking and segmentation based on a convolutional neural network |
CN112966581A (en) * | 2021-02-25 | 2021-06-15 | Xiamen University | Video target detection method based on internal and external semantic aggregation |
CN113298142A (en) * | 2021-05-24 | 2021-08-24 | Nanjing University of Posts and Telecommunications | Target tracking method based on a deep spatio-temporal Siamese network |
CN117237402A (en) * | 2023-11-15 | 2023-12-15 | Beijing Zhongbing Tiangong Defense Technology Co., Ltd. | Target motion prediction method and system based on semantic information understanding |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106709936A (en) * | 2016-12-14 | 2017-05-24 | Beijing University of Technology | Single-target tracking method based on a convolutional neural network |
CN106845430A (en) * | 2017-02-06 | 2017-06-13 | Donghua University | Pedestrian detection and tracking based on accelerated region convolutional neural networks |
CN108027972A (en) * | 2015-07-30 | 2018-05-11 | Beijing SenseTime Technology Development Co., Ltd. | System and method for object tracking |
CN109255351A (en) * | 2018-09-05 | 2019-01-22 | South China University of Technology | Bounding box regression method, system, device, and medium based on a three-dimensional convolutional neural network |
CN109543754A (en) * | 2018-11-23 | 2019-03-29 | Sun Yat-sen University | Parallel method for target detection and semantic segmentation based on end-to-end deep learning |
-
2019
- 2019-07-02 CN CN201910590225.XA patent/CN110458864A/en active Pending
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108027972A (en) * | 2015-07-30 | 2018-05-11 | Beijing SenseTime Technology Development Co., Ltd. | System and method for object tracking |
CN106709936A (en) * | 2016-12-14 | 2017-05-24 | Beijing University of Technology | Single-target tracking method based on a convolutional neural network |
CN106845430A (en) * | 2017-02-06 | 2017-06-13 | Donghua University | Pedestrian detection and tracking based on accelerated region convolutional neural networks |
CN109255351A (en) * | 2018-09-05 | 2019-01-22 | South China University of Technology | Bounding box regression method, system, device, and medium based on a three-dimensional convolutional neural network |
CN109543754A (en) * | 2018-11-23 | 2019-03-29 | Sun Yat-sen University | Parallel method for target detection and semantic segmentation based on end-to-end deep learning |
Non-Patent Citations (2)
Title |
---|
SHAOQING REN et al.: "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks", 《IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (VOL. 39, NO. 6)》, 30 June 2017 (2017-06-30), pages 1137-1149, XP055705510, DOI: 10.1109/TPAMI.2016.2577031 *
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111105442B (en) * | 2019-12-23 | 2022-07-15 | University of Science and Technology of China | Switching-type target tracking method |
CN111105442A (en) * | 2019-12-23 | 2020-05-05 | University of Science and Technology of China | Switching-type target tracking method |
CN111428567A (en) * | 2020-02-26 | 2020-07-17 | Shenyang University | Pedestrian tracking system and method based on affine multi-task regression |
CN111428567B (en) * | 2020-02-26 | 2024-02-02 | Shenyang University | Pedestrian tracking system and method based on affine multi-task regression |
CN112053384A (en) * | 2020-08-28 | 2020-12-08 | Xidian University | Target tracking method based on a bounding box regression model |
CN112053384B (en) * | 2020-08-28 | 2022-12-02 | Xidian University | Target tracking method based on a bounding box regression model |
CN112232359B (en) * | 2020-09-29 | 2022-10-21 | PLA Army Academy of Artillery and Air Defense | Visual tracking method based on mixed-level filtering and complementary features |
CN112232359A (en) * | 2020-09-29 | 2021-01-15 | PLA Army Academy of Artillery and Air Defense | Visual tracking method based on mixed-level filtering and complementary features |
CN112861652A (en) * | 2021-01-20 | 2021-05-28 | Institute of Automation, Chinese Academy of Sciences | Method and system for video target tracking and segmentation based on a convolutional neural network |
CN112966581B (en) * | 2021-02-25 | 2022-05-27 | Xiamen University | Video target detection method based on internal and external semantic aggregation |
CN112966581A (en) * | 2021-02-25 | 2021-06-15 | Xiamen University | Video target detection method based on internal and external semantic aggregation |
CN113298142A (en) * | 2021-05-24 | 2021-08-24 | Nanjing University of Posts and Telecommunications | Target tracking method based on a deep spatio-temporal Siamese network |
CN113298142B (en) * | 2021-05-24 | 2023-11-17 | Nanjing University of Posts and Telecommunications | Target tracking method based on a deep spatio-temporal Siamese network |
CN117237402A (en) * | 2023-11-15 | 2023-12-15 | Beijing Zhongbing Tiangong Defense Technology Co., Ltd. | Target motion prediction method and system based on semantic information understanding |
CN117237402B (en) * | 2023-11-15 | 2024-02-20 | Beijing Zhongbing Tiangong Defense Technology Co., Ltd. | Target motion prediction method and system based on semantic information understanding |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110458864A (en) | Target tracking method and target tracker integrating semantic knowledge and instance features | |
Liu et al. | ABNet: Adaptive balanced network for multiscale object detection in remote sensing imagery | |
Baheti et al. | Eff-unet: A novel architecture for semantic segmentation in unstructured environment | |
Oršić et al. | Efficient semantic segmentation with pyramidal fusion | |
Yang et al. | Deeperlab: Single-shot image parser | |
Si et al. | Real-time semantic segmentation via multiply spatial fusion network | |
Raza et al. | Appearance based pedestrians’ head pose and body orientation estimation using deep learning | |
CN111598030A (en) | Method and system for detecting and segmenting vehicle in aerial image | |
CN108537824B (en) | Feature map enhanced network structure optimization method based on alternating deconvolution and convolution | |
Zhang et al. | Domain adaptive yolo for one-stage cross-domain detection | |
CN107463892A (en) | Pedestrian detection method in a kind of image of combination contextual information and multi-stage characteristics | |
CN111611895B (en) | OpenPose-based multi-view human skeleton automatic labeling method | |
Weng et al. | Deep multi-branch aggregation network for real-time semantic segmentation in street scenes | |
Lu et al. | A cnn-transformer hybrid model based on cswin transformer for uav image object detection | |
Weidmann et al. | A closer look at seagrass meadows: Semantic segmentation for visual coverage estimation | |
CN112733590A (en) | Pedestrian re-identification method based on second-order mixed attention | |
CN110517270A (en) | A kind of indoor scene semantic segmentation method based on super-pixel depth network | |
CN112288776A (en) | Target tracking method based on multi-time step pyramid codec | |
Safavi et al. | Comparative study of real-time semantic segmentation networks in aerial images during flooding events | |
Yu et al. | Frequency feature pyramid network with global-local consistency loss for crowd-and-vehicle counting in congested scenes | |
Zhang et al. | From Coarse Attention to Fine-Grained Gaze: A Two-stage 3D Fully Convolutional Network for Predicting Eye Gaze in First Person Video. | |
Sun et al. | An integration–competition network for bridge crack segmentation under complex scenes | |
Noman et al. | ELGC-Net: Efficient Local-Global Context Aggregation for Remote Sensing Change Detection | |
CN105956607B (en) | A kind of improved hyperspectral image classification method | |
CN117576149A (en) | Single-target tracking method based on attention mechanism |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||