CN110458864A - Target tracking method and target tracker based on integrated semantic knowledge and instance features - Google Patents
Target tracking method and target tracker based on integrated semantic knowledge and instance features
- Publication number
- CN110458864A CN110458864A CN201910590225.XA CN201910590225A CN110458864A CN 110458864 A CN110458864 A CN 110458864A CN 201910590225 A CN201910590225 A CN 201910590225A CN 110458864 A CN110458864 A CN 110458864A
- Authority
- CN
- China
- Prior art keywords
- target
- frame
- network
- semantic knowledge
- tracking method
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS; G06—COMPUTING; CALCULATING OR COUNTING; G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/11—Image analysis; Segmentation; Region-based segmentation
- G06T7/246—Image analysis; Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G06T7/251—Analysis of motion using feature-based methods involving models
- G06T2207/10016—Image acquisition modality; Video; Image sequence
- G06T2207/20081—Special algorithmic details; Training; Learning
- G06T2207/20084—Special algorithmic details; Artificial neural networks [ANN]
- G06T2210/22—Indexing scheme for image generation or computer graphics; Cropping
Abstract
The present invention provides a target tracking method and a target tracker based on integrated semantic knowledge and instance features. The method comprises the following steps: extracting the pictures of the 1st, (t-1)-th and t-th frames; cropping the pictures of the 1st, (t-1)-th and t-th frames from step 1, and taking the cropped pictures as the input of a convolutional neural network; constructing a neural network model based on Darknet-19 and slightly modifying its backbone network; training the entire tracker convolutional neural network; and finally, evaluating the performance of the trained model. The invention proposes a new network architecture model based on Darknet-19, models the target tracking problem as a regression problem, and directly predicts the target position coordinates of the incoming frame. For specific object classes, the trained model achieves state-of-the-art performance at high speed.
Description
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a target tracking method and a target tracker based on integrated semantic knowledge and example features.
Background
As an important component of a large number of computer vision systems, target tracking technology has attracted the interest of many researchers. In the last decade, deep learning based methods have shown great power in the field of target tracking. Typical deep network structures, such as Convolutional Neural Networks (CNNs), can extract representative visual features through end-to-end training. Unlike traditional representations built on hand-crafted features, this description of image data can store rich knowledge in the model for tracking drastic changes of the target. Therefore, the best-performing trackers on benchmarks such as Visual Object Tracking (VOT) and the Object Tracking Benchmark (OTB) are all based on deep learning.
Unlike object detection or recognition, current research in object tracking focuses primarily on instance features of the object rather than semantic knowledge. However, the human eye acts as a high-performance tracker that captures both low-level visual features and high-level semantic knowledge. When the human eye attempts to track a car, the features that are seen are always interpreted as part of a typical car. When detailed instance features are unreliable (e.g., under jitter, occlusion, or perspective change), this prior knowledge plays a key role in challenging conditions.
When a series of targets including pedestrians, vehicles, and the like is processed, a Region Proposal Network (RPN) structure can directly predict target positions, but it only performs regression against different anchor points without any semantic assumption.
In view of the above, there is a need to design a target tracking method based on integrating semantic knowledge and instance features to solve the above problems.
Disclosure of Invention
The invention provides a target tracking method based on integrated semantic knowledge and instance features, which aims to solve the problem that general target trackers focus only on instance features of the target and ignore semantic prior knowledge. The method proposes a new network architecture model based on Darknet-19, models the target tracking problem as a regression problem, and directly predicts the target position coordinates of the incoming frame.
To achieve the above object, the present invention provides a method comprising the steps of:
step 1: extracting the pictures of the 1st, (t-1)-th and t-th frames;
step 2: cropping the pictures of the 1st, (t-1)-th and t-th frames from step 1, and taking the cropped pictures as the input of the convolutional neural network;
step 3: constructing a neural network model based on Darknet-19, and slightly modifying its backbone network;
step 4: training the entire tracker convolutional neural network;
step 5: evaluating the performance of the trained model.
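For orientation, the following minimal sketch (in Python) shows how such a tracker could run at inference time. It is an illustration only: crop, net, and decode are hypothetical callables standing for the cropping, network, and output-decoding steps sketched in the detailed embodiments below, not the patented implementation itself.

```python
def track(frames, init_box, net, crop, decode):
    """Generic tracking loop over the method's three-input design.

    frames: sequence of images; init_box: (x, y, w, h) of the target in frame 1.
    net, crop, and decode are supplied by the embodiment (hypothetical here).
    """
    template = crop(frames[0], init_box)          # steps 1-2: fixed standard template
    box = init_box
    boxes = []
    for t in range(1, len(frames)):
        prev_crop = crop(frames[t - 1], box)      # second input
        curr_crop = crop(frames[t], box)          # third input
        scores, deforms = net(template, prev_crop, curr_crop)  # two output branches
        box = decode(box, scores, deforms)        # regression of the new box
        boxes.append(box)
    return boxes
```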
In a further improvement of the invention, before step 4 the method further comprises a step 3.1, in which the network output is designed, the network output comprising a classification branch and a regression branch; before step 5, it further comprises a step 4.1, in which the network loss function is designed.
In step 1, the picture of the 1st frame is selected as a standard template containing the target, and the picture of the t-th frame is selected as a candidate area where the target may appear.
A further refinement of the invention is that, in step 2, a standard template comprising the target is extracted for initialization.
A further improvement of the present invention is that, assuming that the size of the real bounding box is (w, h), the input picture of the 1st frame is cropped around the center of the target with size S to obtain an example image, which is always used as the standard template throughout the tracking process; the contextual margin information satisfies the following relationship:

S² = (3w) × (3h) (1).
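As a worked example of relation (1): for a hypothetical target of width w = 100 and height h = 50 pixels, S = sqrt((3 × 100) × (3 × 50)) ≈ 212, i.e. a 212 × 212 crop centered on the target.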
A further improvement of the invention is that, in step 3, based on the original structure of Darknet-19, three convolutional layers and two fully-connected layers are used instead of global pooling for classification and localization, respectively.
In a further development of the invention, in the t-th frame the network outputs from the fully-connected layer a score vector w_t ∈ R^K as the classification result of the target, where K is the number of classes; this vector reflects the likelihood of the corresponding object appearing in view. At the same time, the network outputs d_t ∈ R^{K×4} as a deformation prediction for each class of target. Assume that the bounding box output at frame t-1 is p_{t-1} = (x_{t-1}, y_{t-1}, w_{t-1}, h_{t-1}), where x, y are the center coordinates of the box and w and h are its width and height; the deformation regression for class k consists of four coordinates:

d_t^k = (d_x^k, d_y^k, d_w^k, d_h^k) (2)

where d_t^k represents the deformation of the target under the semantic assumption of class k. The final result p_t of the t-th frame can then be calculated as:

p_t = p_{t-1} + d_t^{k*}, where k* = arg max_k (w_t)_k (3)
the invention is further improved in that a cross-entropy loss function is adopted for the classification loss w, and an L1 loss function is adopted for the bounding box regression loss d:
wherein,being the true deformation of the second input to the third input, the L1 penalty is higher for slight errors in the predicted bounding box and the true bounding box. The trained model thus has a more stable bounding box.
A further development of the invention is that step 4 comprises a first stage: pre-training the backbone network for 10 epochs on the ImageNet classification dataset, taking the original image as the first input and standard random contrast-enhanced and color-varied images as the second and third inputs; and a second stage: training the entire target tracking network to obtain the trained model.
In order to achieve the purpose of the invention, the invention also provides a target tracker for realizing the method.
The invention has the following beneficial effects: it integrates semantic knowledge and instance features to track the target, proposes a new network architecture model based on Darknet-19, models the target tracking problem as a regression problem, directly predicts the target position coordinates of the incoming frame, and achieves high accuracy and execution efficiency in daily tracking tasks.
Drawings
FIG. 1 shows the extracted pictures of the 1st, (t-1)-th and t-th frames.
FIG. 2 shows the three pictures of FIG. 1 after cropping so that each includes the target.
Fig. 3 is a convolutional neural network model.
Fig. 4 shows two output branches of a convolutional neural network.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.
It should be emphasized that, in describing the present invention, formulas and constraints are identified with consistent labels; however, the use of different labels to identify the same formula and/or constraint is not precluded, and the labels are provided to illustrate the features of the present invention more clearly.
The CNN target tracking model provided by the invention is trained on a mixed dataset (ImageNet VID and ALOV300++). ImageNet VID contains 30 different classes of targets, from which eight commonly used classes are selected: airplanes, bicycles, birds, buses, cars, cats, horses, and motorcycles. Since this dataset contains no pedestrians, pedestrians are selected from ALOV300++, finally forming a mixed dataset containing 9 classes.
As shown in fig. 1, the invention first extracts three pictures from the video frames as the input of the network, and then obtains the three target pictures input to the CNN network by cropping. As shown in fig. 2, features are extracted from the three target pictures by the CNN convolutional neural network model (fig. 3), and two branches are finally output, as shown in fig. 4: a classification branch for distinguishing the category of the target, and a regression branch for bounding-box regression.
Table 1 shows detailed parameters of the CNN network structure designed by the present invention.
As shown in table 1, the invention fine-tunes the Darknet-19 network model, using three convolutional layers and two fully-connected layers instead of global pooling for classification and localization, respectively, and fine-tunes on the above-mentioned mixed video dataset. The first and (t-1)-th frames are extracted every 100 frames in a video sequence. For data augmentation, the real bounding box of the t-th frame is perturbed using a Gaussian distribution. The model is trained for 50 iterations on four NVIDIA Tesla P40 GPUs, with 800 batches (512 samples each) per iteration.
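As an illustration of this Gaussian perturbation of the real bounding box, a minimal sketch follows; the standard deviations are not disclosed in the text and are assumptions here.

```python
import numpy as np

def jitter_box(box, sigma_center=0.1, sigma_scale=0.05, rng=np.random):
    """Perturb a ground-truth box (x, y, w, h) with Gaussian noise for data
    augmentation at frame t; the sigma values are illustrative assumptions."""
    x, y, w, h = box
    x += rng.normal(0.0, sigma_center * w)     # shift the center in proportion to size
    y += rng.normal(0.0, sigma_center * h)
    w *= np.exp(rng.normal(0.0, sigma_scale))  # multiplicative noise keeps w, h positive
    h *= np.exp(rng.normal(0.0, sigma_scale))
    return (x, y, w, h)
```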
Specifically, the target tracking method based on integrated semantic knowledge and instance features comprises the following steps:
step 1, extracting pictures of the 1 st, t-1 st and t frames as input:
the first picture selects the first frame as a standard template of the target, the second picture is selected from the t-1 th frame, and the last picture is selected from a candidate area in which the target may appear in the current frame.
Step 2: cropping the three pictures input in step 1 so that each includes the target:
In the first frame, a standard template of the target is extracted for initialization. Assuming that the size of the real bounding box is (w, h), the picture is cropped around the center of the target with size S; the square region provides an example image, and the contextual margin information satisfies the following relationship:
S² = (3w) × (3h) (1)
This example image is the first input to the CNN network and is 288 × 288 in size. It is used as the template throughout the tracking process. The hyperparameter 3 in equation (1) is derived from video statistics of the VID dataset; this configuration covers the motion of almost all objects between adjacent frames while ensuring an acceptable resolution after scaling.
Assume that the tracking result of the (t-1)-th frame is p_{t-1}. The (t-1)-th and t-th frames are both cropped around the center (x_{t-1}, y_{t-1}) with size (3w_{t-1}, 3h_{t-1}); the cropped pictures are also resized to 288 × 288 and serve as the second and third inputs to the CNN network. Note that the aspect ratio of the object in the first frame is preserved, because it encodes the features of the template object. In contrast, the target of the (t-1)-th frame is scaled to a size of 96 pixels, which normalizes the deformation between frames and helps the CNN network learn the bounding-box regression more efficiently.
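A minimal sketch of this cropping, assuming OpenCV for resizing: for simplicity it always takes the square S × S window of equation (1), whereas the embodiment uses a (3w_{t-1}, 3h_{t-1}) window for the (t-1)-th and t-th frames, and the zero-padding at image borders is likewise an assumption.

```python
import cv2
import numpy as np

def crop_region(image, box, scale=3.0, out_size=288):
    """Crop a square of side S = scale * sqrt(w * h) centered on the box
    (equation (1): S^2 = (3w)(3h)) and resize it to out_size x out_size."""
    x, y, w, h = box                              # (x, y) is the box center
    s = int(round(scale * np.sqrt(w * h)))
    half = s // 2
    x0, y0 = int(round(x)) - half, int(round(y)) - half
    patch = np.zeros((s, s, image.shape[2]), dtype=image.dtype)
    # Intersect the crop window with the image and copy the overlapping part.
    ix0, iy0 = max(x0, 0), max(y0, 0)
    ix1, iy1 = min(x0 + s, image.shape[1]), min(y0 + s, image.shape[0])
    if ix1 > ix0 and iy1 > iy0:
        patch[iy0 - y0:iy1 - y0, ix0 - x0:ix1 - x0] = image[iy0:iy1, ix0:ix1]
    return cv2.resize(patch, (out_size, out_size))
```

The same helper can then produce the template crop of the first frame and the search-region crops of the (t-1)-th and t-th frames.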
Step 3: constructing a neural network model based on Darknet-19 and slightly modifying its backbone network:
to balance model capacity and efficiency, Darknet-19 was developed as a backbone network. Darknet-19 has been shown to enable high performance in relevant target detection tasks. The model consists of convolution filters of 3 x 3 and 1 x 1, connected between different scales using maximal pooling, doubling the number of channels per scale. The model performs very well in tasks such as object classification and localization and uses relatively few parameters. Based on the original structure of Darknet-19, the present invention uses three convolutional layers and two fully-connected layers instead of global pooling for classification and localization, respectively. Table 1 lists the detailed network architecture.
Step 4: designing the network output, including the classification and regression branches:
in the t-th frame, the network outputs a fractional vector w from the fully-connected layert∈RKK is the number of classes as a result of the classification of the target, this vector reflecting the likelihood that the corresponding object appears in the line of sight. At the same time, the network outputsAs a deformation prediction for each class target, assume that the t-1 frame output bounding box is pt-1=(xt-1,yt-1,wt-1,ht-1) Where x, y are the center coordinates of the box, w and h are the width and height of the box, and the regression of the deformation for class kConsists of four coordinates:
representing the deformation of the target under different semantic assumptions. Last result p of t-th frametCan be calculated from:
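A short sketch of how the two branches can be decoded into the next bounding box according to equation (3):

```python
import torch

def decode_box(prev_box, scores, deforms):
    """Pick the most likely class k* = argmax_k (w_t)_k and apply its predicted
    deformation to the previous box: p_t = p_{t-1} + d_t^{k*} (equation (3)).

    prev_box: (B, 4) tensor (x, y, w, h); scores: (B, K); deforms: (B, K, 4).
    """
    k = torch.argmax(scores, dim=1)                # best semantic hypothesis per sample
    d = deforms[torch.arange(deforms.size(0)), k]  # (B, 4) deformation of class k*
    return prev_box + d
```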
and 5: designing a network loss function:
The cross-entropy loss function is used for the classification loss on w, and the L1 loss function is used for the bounding-box regression loss on d:

L = L_ce(w_t, k*) + || d_t^{k*} - d_t* ||_1 (4)

where k* is the ground-truth class and d_t* is the true deformation from the second input to the third input. The L1 loss penalizes slight deviations between the predicted and the true bounding box relatively heavily, so the trained model has more stable bounding boxes.
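A sketch of the combined loss of equation (4), assuming an equal weighting of the two terms (the weighting is not disclosed in the text):

```python
import torch
import torch.nn.functional as F

def tracking_loss(scores, deforms, true_class, true_deform):
    """Cross-entropy on the class scores w_t plus an L1 penalty on the
    deformation predicted for the true class, as in equation (4)."""
    cls_loss = F.cross_entropy(scores, true_class)            # classification branch
    d = deforms[torch.arange(deforms.size(0)), true_class]    # (B, 4) for the true class
    reg_loss = F.l1_loss(d, true_deform)                      # regression branch
    return cls_loss + reg_loss
```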
Step 6: training the convolutional neural network model of the tracker:
the first stage is as follows: the backbone network was pre-trained on ImageNet classification datasets for 10 epochs, using the original image as the first input, and standard random contrast enhanced images and color variations as the second and third inputs. The network achieves 72.5% top-1 accuracy and 91.0% top-2 accuracy in ImageNet.
The second stage: the entire target tracking network is trained to obtain the trained model.
Step 7: evaluating the performance of the trained model:
the trained model is evaluated on a sub data set of the VOT 2016, which has 15 video sequences.
The invention integrates semantic knowledge and instance features to track the target, proposes a new network architecture model based on Darknet-19, models the target tracking problem as a regression problem, directly predicts the target position coordinates of the incoming frame, and achieves high accuracy and execution efficiency in daily tracking tasks.
Although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted for elements thereof without departing from the spirit and scope of the present invention.
Claims (10)
1. A target tracking method based on integrated semantic knowledge and instance features is characterized by comprising the following steps:
step 1: extracting the pictures of the 1st, (t-1)-th and t-th frames;
step 2: cropping the pictures of the 1st, (t-1)-th and t-th frames from step 1, and taking the cropped pictures as the input of the convolutional neural network;
step 3: constructing a neural network model based on Darknet-19, and slightly modifying its backbone network;
step 4: training the entire tracker convolutional neural network;
step 5: evaluating the performance of the trained model.
2. The integrated semantic knowledge and instance feature based target tracking method of claim 1, characterized in that: before step 4, a step 3.1 is also included, wherein the step 3.1 is to design a network output, and the network output comprises a classification branch and a regression branch; before step 5, a step 4.1 is also included, wherein the step 4.1 is to design a network loss function.
3. The integrated semantic knowledge and instance feature based target tracking method of claim 1, characterized in that: in step 1, the picture of the 1st frame is selected as a standard template containing the target, and the picture of the t-th frame is selected as a candidate area where the target may appear.
4. The integrated semantic knowledge and instance feature based target tracking method of claim 3, wherein: in step 2, a standard template including the target is extracted for initialization.
5. The integrated semantic knowledge and instance feature based target tracking method of claim 4, wherein: assuming that the size of the real bounding box is (w, h), the input picture of the 1st frame is cropped around the center of the target with size S to obtain an example image, which is always used as the standard template in the whole tracking process, and the contextual margin information satisfies the following relation:

S² = (3w) × (3h) (1).
6. the integrated semantic knowledge and instance feature based target tracking method of claim 1, characterized in that: in step 3, three convolutional layers and two fully-connected layers are used instead of global pooling for classification and localization, respectively, based on the original structure of Darknet-19.
7. The integrated semantic knowledge and instance feature based target tracking method of claim 2, wherein: in the t-th frame, the network outputs from the fully-connected layer a score vector w_t ∈ R^K as the classification result of the target, K being the number of classes, this vector reflecting the likelihood that the corresponding object appears in view; at the same time, the network outputs d_t ∈ R^{K×4} as a deformation prediction for each class of target; assuming that the bounding box output at frame t-1 is p_{t-1} = (x_{t-1}, y_{t-1}, w_{t-1}, h_{t-1}), where x, y are the center coordinates of the box and w and h are its width and height, the deformation regression for class k consists of four coordinates:

d_t^k = (d_x^k, d_y^k, d_w^k, d_h^k) (2)

where d_t^k represents the deformation of the target under different semantic assumptions, and the final result p_t of the t-th frame is calculated as:

p_t = p_{t-1} + d_t^{k*}, where k* = arg max_k (w_t)_k (3).
8. The integrated semantic knowledge and instance feature based target tracking method of claim 7, wherein: the cross-entropy loss function is used for the classification loss on w, and the L1 loss function is used for the bounding-box regression loss on d:

L = L_ce(w_t, k*) + || d_t^{k*} - d_t* ||_1 (4)

where k* is the ground-truth class and d_t* is the true deformation from the second input to the third input; the L1 loss penalizes slight deviations between the predicted and the true bounding box relatively heavily, so the trained model has more stable bounding boxes.
9. The integrated semantic knowledge and instance feature based object tracking method of claim 8, wherein: step 4 comprises two stages: the first stage is to pre-train the backbone network on the ImageNet classification dataset for 10 epochs, using the original image as the first input and standard random contrast-enhanced and color-varied images as the second and third inputs; the second stage is to train the entire target tracking network to obtain the trained model.
10. An object tracker implementing the method of any one of claims 1-9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910590225.XA CN110458864A (en) | 2019-07-02 | 2019-07-02 | Target tracking method and target tracker based on integrated semantic knowledge and instance features
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910590225.XA CN110458864A (en) | 2019-07-02 | 2019-07-02 | Target tracking method and target tracker based on integrated semantic knowledge and instance features
Publications (1)
Publication Number | Publication Date |
---|---|
CN110458864A (en) | 2019-11-15
Family
ID=68482051
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910590225.XA Pending CN110458864A (en) | 2019-07-02 | Target tracking method and target tracker based on integrated semantic knowledge and instance features
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110458864A (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111105442A (en) * | 2019-12-23 | 2020-05-05 | 中国科学技术大学 | Switching type target tracking method |
CN111428567A (en) * | 2020-02-26 | 2020-07-17 | 沈阳大学 | Pedestrian tracking system and method based on affine multi-task regression |
CN112053384A (en) * | 2020-08-28 | 2020-12-08 | 西安电子科技大学 | Target tracking method based on bounding box regression model |
CN112232359A (en) * | 2020-09-29 | 2021-01-15 | 中国人民解放军陆军炮兵防空兵学院 | Visual tracking method based on mixed level filtering and complementary characteristics |
CN112861652A (en) * | 2021-01-20 | 2021-05-28 | 中国科学院自动化研究所 | Method and system for tracking and segmenting video target based on convolutional neural network |
CN112966581A (en) * | 2021-02-25 | 2021-06-15 | 厦门大学 | Video target detection method based on internal and external semantic aggregation |
CN113298142A (en) * | 2021-05-24 | 2021-08-24 | 南京邮电大学 | Target tracking method based on deep space-time twin network |
CN117237402A (en) * | 2023-11-15 | 2023-12-15 | 北京中兵天工防务技术有限公司 | Target motion prediction method and system based on semantic information understanding |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108027972A (en) * | 2015-07-30 | 2018-05-11 | 北京市商汤科技开发有限公司 | System and method for Object tracking |
CN106709936A (en) * | 2016-12-14 | 2017-05-24 | 北京工业大学 | Single target tracking method based on convolution neural network |
CN106845430A (en) * | 2017-02-06 | 2017-06-13 | 东华大学 | Pedestrian detection and tracking based on acceleration region convolutional neural networks |
CN109255351A (en) * | 2018-09-05 | 2019-01-22 | 华南理工大学 | Bounding box homing method, system, equipment and medium based on Three dimensional convolution neural network |
CN109543754A (en) * | 2018-11-23 | 2019-03-29 | 中山大学 | The parallel method of target detection and semantic segmentation based on end-to-end deep learning |
Non-Patent Citations (1)
Title |
---|
SHAOQING REN et al.: "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 39, No. 6, 30 June 2017, pages 1137-1149, XP055705510, DOI: 10.1109/TPAMI.2016.2577031 *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20191115 |