CN111724409A - Target tracking method based on densely connected twin neural network - Google Patents
- Publication number
- CN111724409A (application CN202010420205.0A)
- Authority
- CN
- China
- Prior art keywords
- target
- branch
- frame
- tracking
- neural network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/10—Segmentation; Edge detection
- G06T7/11—Region-based segmentation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Abstract
A target tracking method based on a densely connected twin neural network comprises the following steps: 1) feature extraction from the input template-frame and detection-frame image pair, using a twin neural network with DenseNet as the backbone; 2) candidate-region generation: the feature maps are input into an RPN network, and a number of proposals are obtained through the classification branch and the regression branch; candidate-region screening: the final proposals are screened out using cosine-window suppression and a scale penalty; 3) similarity measurement between the proposals obtained by the template branch and the detection branch, with non-maximum suppression used to obtain the final bounding box and linear interpolation used to update the target size, giving the final predicted tracking position of the target in the detection frame. The invention achieves good results in tracking a specific target in a video sequence.
Description
Technical Field
The invention relates to the target tracking task in computer vision. It provides a twin neural network target tracking framework that can track and locate a target in a video sequence, and belongs to the field of image processing.
Background
Vision-based target tracking is an important branch of computer vision and is widely applied in fields such as intelligent security, intelligent transportation, and autonomous driving. Given the size and position of a target in the initial frame of a video sequence, the task of target tracking is to predict the target's size and position in subsequent frames, enabling analysis and understanding of the moving target's behavior so that subsequent higher-level tasks can be completed. Although researchers have explored target tracking for many years and the performance of tracking algorithms has gradually improved, real-time and stable tracking of moving targets under complex conditions remains a great challenge; the main difficulties include morphological changes, scale changes, occlusion and disappearance, and image blur.
With the application of deep learning to target tracking, algorithms based on twin (Siamese) neural networks have drawn wide attention thanks to their good balance of performance and speed. Tao et al. first introduced the twin neural network to target tracking and achieved good results on the OTB50 dataset, but the computational cost of processing the sampled candidates limited the tracker to only 2 FPS, too slow for real time. Bertinetto et al. reduced the amount of computation by removing the fully connected layers and training the similarity function with only five convolutional layers, achieving good tracking results at real-time speed, but the accuracy still could not surpass correlation-filter-based algorithms such as ECO.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention provides a target tracking method based on a densely connected twin neural network. It differs from existing twin-network tracking frameworks in that the deeper and better-performing densely connected network DenseNet is introduced in place of AlexNet as the backbone for feature extraction, further improving network performance. The invention applies a deep convolutional neural network to the twin tracking network and effectively improves target tracking accuracy.
The technical solution adopted by the invention to solve the above technical problem is as follows:
a target tracking method based on a densely connected twin neural network comprises the following steps:
step 1: features are extracted using a twin neural network, as follows:
step 1.1: the network consists of three Dense Blocks, with a 1 × 1 convolution layer and a 2 × 2 mean-pooling layer connected between every two adjacent Dense Blocks;
step 1.2: each Dense Block consists of a Batch Normalization layer, a ReLU layer and a convolution layer;
step 1.3: the initial frame of the video sequence, as the template frame, and the current frame, as the detection frame, are respectively input into the twin neural network to obtain the feature maps of the template frame and the detection frame;
step 2: inputting the feature map into an RPN network to generate a candidate region, wherein the process is as follows:
step 2.1: the RPN consists of two parts: a classification branch for distinguishing the target from the background, and a regression branch for fine-tuning the candidate regions;
step 2.2: for the classification branch, the RPN network receives the feature maps of the template frame and the detection frame generated in step 1 and applies a further convolution to each, producing new feature maps of reduced spatial size;
step 2.3: a convolution operation is performed on the detection-frame feature map using the template-frame feature map as the convolution kernel, obtaining a response map;
step 2.4: for the regression branch, the same operation is executed to obtain a response map; according to the outputs of the classification branch and the regression branch, the top-M scoring outputs are extracted, giving the position information of the M proposals;
and step 3: determining a tracking position by the following process:
step 3.1: performing similarity measurement on proposals of the template branch and proposals of the detection branch to obtain a bounding box of a tracking result;
step 3.2: screening the finally predicted output bounding boxes using non-maximum suppression (NMS) to obtain the final tracked target bounding box;
step 3.3: updating the target size using linear interpolation to obtain the final tracked position of the target in the detection frame;
Through the above steps, a specific target in a video sequence can be accurately tracked.
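The response-map computation of step 2.3 can be sketched as follows. This is a toy single-channel version with assumed shapes; the actual network cross-correlates multi-channel DenseNet feature maps. The template feature map is slid over the detection-frame feature map, and the peak of the resulting response map marks the most similar location:

```python
import numpy as np

def cross_correlate(template_feat, detect_feat):
    """Valid cross-correlation of a template feature map over a detection
    feature map; both are (H, W) single-channel arrays for illustration."""
    th, tw = template_feat.shape
    dh, dw = detect_feat.shape
    out = np.empty((dh - th + 1, dw - tw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Inner product of the template with each window of the detection map.
            out[i, j] = np.sum(template_feat * detect_feat[i:i + th, j:j + tw])
    return out

# Toy check: embed the template inside a zero background; the response
# peaks exactly where the template was placed.
rng = np.random.default_rng(0)
template = rng.standard_normal((4, 4))
detect = np.zeros((10, 10))
detect[3:7, 2:6] = template
response = cross_correlate(template, detect)
peak = np.unravel_index(np.argmax(response), response.shape)
print(peak)  # (3, 2)
```

By the Cauchy-Schwarz inequality, the fully aligned window maximizes the inner product, so the peak recovers the planted offset.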
Further, in step 2.4, anchors within 7 units around the target are selected instead of anchors over the whole feature map, proposals far from the target are deleted, a cosine window is used to suppress large displacements, and a scale-change penalty is used to suppress large changes in size and aspect ratio, yielding the five proposals with the highest final scores.
The invention has the following beneficial effects. It provides a target tracking method based on a densely connected twin neural network. First, the features of the template frame and the detection frame are extracted by a twin neural network with the densely connected network DenseNet as its backbone; then five proposals are generated through the RPN network, and similarity is measured between the proposals obtained by the template branch and the detection branch to obtain predicted target bounding boxes; the predicted bounding boxes are screened using non-maximum suppression to obtain the final target bounding box; finally, the target size is updated using linear interpolation to obtain the specific position of the target in the detection frame. The invention can accurately track a specific target in a video sequence.
Drawings
Fig. 1 is a schematic diagram of an algorithm framework of the tracking method of the present invention.
FIG. 2 is a schematic diagram of the overall structure of a densely connected twin tracking network.
FIG. 3 is a schematic diagram of the Dense Block structure.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Referring to FIGS. 1 to 3, a target tracking method based on a densely connected twin neural network includes three processes: feature extraction, candidate-region generation, and tracking-position determination. FIG. 2 shows the overall structure of the densely connected twin tracking network, which is divided into two parts: the twin-network feature-extraction part and the RPN candidate-region generation part. FIG. 3 is a schematic diagram of the Dense Block structure.
The target tracking method based on the dense connection twin neural network comprises the following steps:
step 1: features are extracted using a twin neural network, as follows:
step 1.1: the network is based on DenseNet and consists of two branches that share weights and network structure and respectively receive the template-frame and detection-frame images as input;
step 1.2: the main structure of each branch is a DenseNet with three Dense Blocks, and a 1 × 1 convolution layer and a 2 × 2 mean-pooling layer are connected between two adjacent Dense Blocks;
step 1.3: each Dense Block consists of three parts, namely a Batch Normalization layer, a ReLU layer and a convolution layer;
step 1.4: the template frame and the detection frame each obtain their own feature map through the twin network structure;
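To make the dense connectivity of steps 1.1–1.2 concrete, here is a minimal sketch in which each layer receives the concatenation of all preceding feature maps, and the same weights serve both the template and detection branches. The channel counts are toy values, and a plain matrix multiply with ReLU stands in for the BN/ReLU/convolution sequence; none of the sizes are taken from the patent:

```python
import numpy as np

rng = np.random.default_rng(42)

def dense_block(x, weights):
    """Each 'layer' sees the channel-wise concatenation of all previous
    outputs (dense connectivity); matmul + ReLU stand in for BN/ReLU/conv."""
    feats = [x]
    for w in weights:
        inp = np.concatenate(feats, axis=1)    # dense connection
        feats.append(np.maximum(inp @ w, 0.0))
    return np.concatenate(feats, axis=1)

c0, growth = 4, 2                               # toy input channels / growth rate
weights = [rng.standard_normal((c0 + i * growth, growth)) for i in range(3)]

# The twin property: the SAME weights process both branches.
template_feat = dense_block(rng.standard_normal((5, c0)), weights)
detect_feat = dense_block(rng.standard_normal((5, c0)), weights)
print(template_feat.shape)  # (5, 10): 4 input channels + 3 layers x growth 2
```

The growing channel count (4 → 6 → 8 input channels across the three layers) is exactly the feature-reuse property that motivates replacing AlexNet with DenseNet.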
step 2: inputting the feature map into an RPN network to generate a candidate region, wherein the process is as follows:
step 2.1: the RPN consists of two parts: a classification branch for distinguishing the target from the background, and a regression branch for fine-tuning the candidate regions;
step 2.2: for the classification branch, the RPN network receives the feature maps of the template frame and the detection frame generated in step 1 and applies a further convolution to each, producing new feature maps of reduced spatial size;
step 2.3: a convolution operation is performed on the detection-frame feature map using the template-frame feature map as the convolution kernel, obtaining a response map;
step 2.4: for the regression branch, the same operation is executed to obtain a response map; according to the outputs of the classification branch and the regression branch, the top-M scoring outputs are extracted, giving the position information of the M proposals;
The proposals are selected according to the following strategy: anchors within 7 units around the target are selected instead of anchors over the whole feature map; proposals far from the target are deleted, a cosine window is used to suppress large displacements, and a scale-change penalty is used to suppress large changes in size and aspect ratio, yielding the five proposals with the highest final scores;
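The screening strategy above (cosine-window suppression plus a scale-change penalty) can be sketched as follows. The parameter `penalty_k` and the exact form of the combined score are assumptions for illustration, not values given in the patent:

```python
import numpy as np

def rank_proposals(scores, positions, sizes, center, size_ref, k=5, penalty_k=0.1):
    """Re-rank RPN proposals: a cosine window damps proposals far from the
    previous target center, and a penalty damps large scale changes."""
    d = np.linalg.norm(positions - center, axis=1)
    cos_win = 0.5 * (1.0 + np.cos(np.pi * d / d.max()))  # 1 at center, 0 at the farthest point
    s = sizes / size_ref
    change = np.maximum(s, 1.0 / s)                      # >= 1; grows with scale drift
    penalty = np.exp(-penalty_k * (change - 1.0))
    final = scores * cos_win * penalty
    return np.argsort(final)[::-1][:k]                   # indices of the top-k proposals

# A proposal at the previous center with an unchanged size should win over
# equally scored proposals that are far away or badly rescaled.
positions = np.array([[50.0, 50.0], [90.0, 90.0], [10.0, 10.0]])
sizes = np.array([1.0, 3.0, 0.3])
top = rank_proposals(np.ones(3), positions, sizes,
                     center=np.array([50.0, 50.0]), size_ref=1.0, k=2)
print(top[0])  # 0
```

Both suppression terms are multiplicative, so a proposal must simultaneously stay near the previous position and keep a plausible size to rank among the final five.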
and step 3: determining a tracking position by the following process:
step 3.1: carrying out similarity measurement on proposals obtained by the template branch and the detection branch to obtain a bounding box of a tracking result;
step 3.2: screening the multiple predicted bounding boxes using non-maximum suppression (NMS) to obtain the final tracking-result bounding box;
step 3.3: updating the target size using linear interpolation to obtain the final tracked position of the target in the detection frame;
Through the above steps, the selected target in the video sequence can be accurately tracked.
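Steps 3.2–3.3 can be sketched as follows. The interpolation factor `lr` is an assumed smoothing value, not one given in the patent:

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Standard non-maximum suppression (step 3.2): greedily keep the
    highest-scoring box and drop boxes overlapping it above the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) < iou_thresh]
    return keep

def update_size(prev_size, new_size, lr=0.3):
    """Linear interpolation of the target size (step 3.3); lr is an assumed
    smoothing factor that damps abrupt size changes between frames."""
    return (1 - lr) * prev_size + lr * new_size

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))                  # [0, 2] — the near-duplicate box 1 is suppressed
print(update_size(10.0, 14.0, lr=0.5))     # 12.0
```

The interpolation step keeps the tracked box from oscillating when the regression branch briefly over- or under-estimates the target extent.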
The above detailed description is intended to illustrate the objects, aspects and advantages of the present invention, and it should be understood that the above detailed description is only exemplary of the present invention, and is not intended to limit the scope of the present invention, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the present invention should be included in the scope of the present invention.
Claims (2)
1. A target tracking method based on a densely connected twin neural network is characterized by comprising the following steps:
step 1: features are extracted using a twin neural network, as follows:
step 1.1: the network consists of three Dense Blocks, with a 1 × 1 convolution layer and a 2 × 2 mean-pooling layer connected between every two adjacent Dense Blocks;
step 1.2: each Dense Block consists of a Batch Normalization layer, a ReLU layer and a convolution layer;
step 1.3: respectively inputting an initial frame of a video sequence as a template frame and a current frame as a detection frame into a twin neural network to obtain feature maps of the template frame and the detection frame;
step 2: inputting the feature map into an RPN network to generate a candidate region, wherein the process is as follows:
step 2.1: the RPN consists of two parts: a classification branch for distinguishing the target from the background, and a regression branch for fine-tuning the candidate regions;
step 2.2: for the classification branch, the RPN network receives the feature maps of the template frame and the detection frame generated in step 1 and applies a further convolution to each, producing new feature maps of reduced spatial size;
step 2.3: a convolution operation is performed on the detection-frame feature map using the template-frame feature map as the convolution kernel, obtaining a response map;
step 2.4: for the regression branch, the same operation is executed to obtain a response map; according to the outputs of the classification branch and the regression branch, the top-M scoring outputs are extracted, giving the position information of the M proposals;
and step 3: determining a tracking position by the following process:
step 3.1: performing similarity measurement on proposals of the template branch and proposals of the detection branch to obtain a bounding box of a tracking result;
step 3.2: screening the finally predicted output bounding boxes using non-maximum suppression (NMS) to obtain the final tracked target bounding box;
step 3.3: updating the target size using linear interpolation to obtain the final tracked position of the target in the detection frame;
Through the above steps, accurate tracking of a specific target in a video sequence can be realized.
2. The method as claimed in claim 1, wherein, in step 2.4, anchors within 7 units around the target are selected instead of anchors over the whole feature map, proposals far from the target are deleted, a cosine window is used to suppress large displacements, and a scale-change penalty is used to suppress large changes in size and aspect ratio, yielding the five proposals with the highest final scores.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010420205.0A CN111724409A (en) | 2020-05-18 | 2020-05-18 | Target tracking method based on densely connected twin neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010420205.0A CN111724409A (en) | 2020-05-18 | 2020-05-18 | Target tracking method based on densely connected twin neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111724409A (en) | 2020-09-29
Family
ID=72564648
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010420205.0A Pending CN111724409A (en) | 2020-05-18 | 2020-05-18 | Target tracking method based on densely connected twin neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111724409A (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180129906A1 (en) * | 2016-11-07 | 2018-05-10 | Qualcomm Incorporated | Deep cross-correlation learning for object tracking |
CN109829934A (en) * | 2018-12-20 | 2019-05-31 | 北京以萨技术股份有限公司 | A kind of novel image tracking algorithm based on twin convolutional network |
CN110188753A (en) * | 2019-05-21 | 2019-08-30 | 北京以萨技术股份有限公司 | One kind being based on dense connection convolutional neural networks target tracking algorism |
CN110309729A (en) * | 2019-06-12 | 2019-10-08 | 武汉科技大学 | Tracking and re-detection method based on anomaly peak detection and twin network |
CN110443210A (en) * | 2019-08-08 | 2019-11-12 | 北京百度网讯科技有限公司 | A kind of pedestrian tracting method, device and terminal |
CN110544269A (en) * | 2019-08-06 | 2019-12-06 | 西安电子科技大学 | twin network infrared target tracking method based on characteristic pyramid |
CN110570458A (en) * | 2019-08-12 | 2019-12-13 | 武汉大学 | Target tracking method based on internal cutting and multi-layer characteristic information fusion |
CN111161311A (en) * | 2019-12-09 | 2020-05-15 | 中车工业研究院有限公司 | Visual multi-target tracking method and device based on deep learning |
- 2020-05-18: application CN202010420205.0A filed, published as CN111724409A, status Pending
Non-Patent Citations (1)
Title |
---|
BO LI等: "High Performance Visual Tracking with Siamese Region Proposal Network", 《2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION》 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112365462A (en) * | 2020-11-06 | 2021-02-12 | 华雁智科(杭州)信息技术有限公司 | Image-based change detection method |
CN112365462B (en) * | 2020-11-06 | 2022-10-04 | 华雁智科(杭州)信息技术有限公司 | Image-based change detection method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20200929 ||