CN108734151A - Robust long-range target tracking method based on correlation filtering and deep Siamese network - Google Patents
Robust long-range target tracking method based on correlation filtering and deep Siamese network
- Publication number: CN108734151A
- Application number: CN201810613931.7A
- Authority
- CN
- China
- Prior art keywords
- target
- correlation filtering
- model
- twin network
- tracking
- Prior art date: 2018-06-14
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
Abstract
A robust long-range target tracking method based on correlation filtering and a deep Siamese network, relating to computer vision. By combining correlation filtering and a deep Siamese network under a unified tracking framework, challenges in long videos such as target occlusion and disappearance from the field of view can be handled effectively. In this method, the proposed expert evaluation mechanism, based on a D-expert and a C-expert, evaluates and screens the candidate target positions generated jointly by the correlation filter and the deep Siamese network to obtain the best tracking result. This result is then used to update the correlation filter tracker, which effectively prevents the tracker from being updated with erroneous samples. The proposed target tracking method is robust to the various challenges in long videos and can track a target stably over long periods.
Description
Technical Field
The invention relates to computer vision, and in particular to a robust long-range target tracking method based on correlation filtering and a deep Siamese network.
Background
As a fundamental research topic in computer vision, target tracking is widely applied in video surveillance, human-computer interaction, virtual reality, intelligent robotics, autonomous driving and other fields. After years of research, a large number of excellent target tracking algorithms have emerged in this field. According to the length of the video to be processed, these algorithms can be divided into short-range and long-range target tracking algorithms. In practical applications, the target often undergoes long-time occlusion, rotation, illumination changes and similar challenges, so short-range tracking algorithms cannot track the target accurately over long periods. Studying a robust long-range tracking algorithm that can effectively handle occlusion, disappearance from the field of view and other challenges, so that the tracker can follow the target accurately for a long time, is therefore of significant practical importance.
In recent years, research on correlation-filtering-based target tracking has made remarkable progress. In 2010, Bolme first proposed the MOSSE tracking algorithm based on correlation filtering, transferring the ridge regression problem to the frequency domain via the Fourier transform and greatly increasing computation speed. In 2012, Henriques proposed the CSK algorithm, which constructs training samples with a cyclic shift matrix and further improves tracking speed. In 2014, Henriques proposed the KCF tracking algorithm, replacing the grayscale features used in CSK with multi-channel HOG features and effectively improving tracking accuracy. Although these methods achieve far beyond real-time tracking speed, their accuracy remains low and hardly meets practical requirements. To further improve tracking accuracy, several correlation filtering tracking algorithms based on CNN features have been proposed in the last few years. In 2015, Ma proposed the HCF tracking algorithm which, under the KCF tracking framework, replaces HOG features with more robust CNN features, greatly improving the accuracy of correlation filtering tracking. Danelljan made correlation filtering tracking more accurate by addressing its boundary-effect problem and introducing CNN features into the correlation filtering framework. In 2016, Qi proposed an improved Hedge algorithm to fuse multiple correlation filtering models trained with different CNN features, obtaining a more robust tracking result. In the same year, Danelljan proposed the C-COT tracking algorithm, which effectively fuses CNN feature maps of different resolutions by training continuous convolution kernels and achieves higher tracking accuracy. To further improve the accuracy and speed of C-COT, Danelljan proposed a more efficient convolution operation in 2017, solving the feature-sparsity problem caused by the original convolution operation and greatly improving both accuracy and speed. Although correlation filtering tracking algorithms based on CNN features show a certain robustness, they cannot handle challenges such as long-time occlusion and disappearance from the field of view in long videos: once the target is occluded, the tracker is wrongly updated over a long period and finally loses the target.
To handle the challenges of long videos more robustly, a representative algorithm is the TLD algorithm proposed by Kalal in 2010. Unlike conventional tracking algorithms, TLD consists of two parts, a tracker and a detector. The tracker localizes the target with an optical flow method, while the detector uses a random fern classifier; the former provides online training samples for the latter, and the latter relocates the target after a tracking failure and reinitializes the tracker. This structure solves, to a certain extent, short-time target occlusion, disappearance from the field of view and similar problems. In 2015, Ma proposed a long-range tracking algorithm (LCT) based on correlation filtering with a structure similar to TLD; LCT uses correlation filtering as the tracker and a random fern classifier as the detector, and because correlation filtering models appearance changes more effectively, it tracks non-rigid objects better than TLD. However, since the trackers and detectors in LCT and TLD must be updated online, when the target in a long video is occluded for a long time or leaves the field of view, both may fail after prolonged erroneous updates, causing the tracking to fail. How to design a robust long-range tracking algorithm that can effectively handle long-time occlusion or loss of the target in long videos is therefore of great significance.
Deep learning has been widely used in computer vision research in recent years. In 2012, Krizhevsky won the ImageNet competition by a wide margin with AlexNet, igniting enthusiasm for deep learning. In the following years, deep learning was successfully applied to object detection, saliency detection, semantic segmentation, metric learning, pedestrian re-identification and various other fields. By contrast, its application to target tracking has certain limitations, for two main reasons: (1) the lack of online training samples; and (2) the heavy time cost of training a model online. The earliest deep learning approach to target tracking was the DLT tracking algorithm proposed by Wang in 2013, which collects positive and negative samples online with particle filtering to train the network. In the following years, various target tracking algorithms based on convolutional neural networks, such as SO-DLT, FCNT, MDNet and SANet, were proposed in succession. These algorithms train the network by collecting positive and negative samples during tracking; although they can reach high accuracy, they are far from meeting the requirement of real-time tracking. In 2016, Bertinetto proposed using a deep Siamese network for target tracking, trained offline on the ILSVRC dataset. During tracking, only the target in the first frame is used as the template, and the region most similar to the template is found in the test frame as the tracking result. This method achieves far beyond real-time speed, but because it lacks online updating, its tracking performance is not ideal when the target appearance changes drastically.
Disclosure of Invention
The invention aims to provide a robust long-range target tracking method based on correlation filtering and a deep Siamese network. By combining correlation filtering and a deep Siamese network under a unified tracking framework, the method can effectively handle challenges in long videos such as target occlusion and disappearance from the field of view. An expert evaluation mechanism based on a D-expert and a C-expert evaluates and screens the candidate target positions generated jointly by the correlation filter and the deep Siamese network to obtain the best target tracking result, and this result is used to update the correlation filter tracker, effectively preventing the tracker from being corrupted by erroneous samples. The method is robust to the various challenges in long videos and can track a target stably for a long time.
The invention comprises the following steps:
1) A frame of training video is given and a training area is defined centered on the target; the training area completely contains the target and part of the background;
In step 1), the training area may be defined as follows: construct a rectangular training area centered on the target, with length and width set in proportion to the length and width of the target so that the area contains the target and part of the background; if the rectangular training area extends beyond the video frame, the missing part is filled with the average pixel value.
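By way of illustration only, a minimal Python/NumPy sketch of this cropping step follows. The helper name and the context factor `pad` are assumptions (the patent states only that the region contains the target plus part of the background); out-of-frame pixels are filled with the frame's average pixel value as described above.

```python
import numpy as np

def crop_training_region(frame, cx, cy, w, h, pad=2.0):
    # pad is an assumed context factor (> 1), not specified by the patent
    W, H = int(round(w * pad)), int(round(h * pad))
    x0, y0 = int(round(cx - W / 2)), int(round(cy - H / 2))
    # start from a canvas filled with the frame's mean pixel, so any part
    # of the crop that falls outside the frame stays mean-filled
    mean_pixel = frame.reshape(-1, frame.shape[2]).mean(axis=0)
    region = np.tile(mean_pixel, (H, W, 1)).astype(frame.dtype)
    fx0, fy0 = max(x0, 0), max(y0, 0)
    fx1 = min(x0 + W, frame.shape[1])
    fy1 = min(y0 + H, frame.shape[0])
    if fx1 > fx0 and fy1 > fy0:
        region[fy0 - y0:fy1 - y0, fx0 - x0:fx1 - x0] = frame[fy0:fy1, fx0:fx1]
    return region
```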
2) Extract CNN features from the training area obtained in step 1) using a pre-trained VGG-Net-19 model;
In step 2), the specific procedure may be: resize the rectangular training area obtained in step 1) with bilinear interpolation to the input size required by the network (224×224×3), and take the output of layer l (the conv3-4, conv4-4 and conv5-4 layers of the VGG-Net-19 model) as the extracted CNN feature, denoted x_l ∈ R^{M×N×D}, where M, N and D are the length, width and number of channels of the feature map, respectively.
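A hedged sketch of this step using torchvision's pre-trained VGG-19 is shown below; the layer indices 16/25/34 for conv3-4/conv4-4/conv5-4 refer to torchvision's `vgg19().features` ordering and should be verified against the model actually used.

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

VGG = models.vgg19(pretrained=True).features.eval()  # weights=... on newer torchvision
CONV_IDX = {16: "conv3-4", 25: "conv4-4", 34: "conv5-4"}
MEAN = torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1)  # ImageNet stats
STD = torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1)

@torch.no_grad()
def extract_cnn_features(region):
    # region: (H, W, 3) uint8 array; bilinearly resized to the 224x224x3 input
    x = torch.from_numpy(region).permute(2, 0, 1).float().unsqueeze(0) / 255.0
    x = F.interpolate(x, size=(224, 224), mode="bilinear", align_corners=False)
    x = (x - MEAN) / STD
    feats = {}
    for i, layer in enumerate(VGG):
        x = layer(x)
        if i in CONV_IDX:
            # x_l as an (M, N, D) array, as defined in step 2)
            feats[CONV_IDX[i]] = x.squeeze(0).permute(1, 2, 0).numpy()
        if i == 34:
            break
    return feats
```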
3) Train a correlation filtering model with the CNN features obtained in step 2), according to equation (1):

W* = argmin_W Σ_{m,n} ||W · x_l(m,n) − y(m,n)||² + λ||W||²   (1)

where λ is the regularization parameter and y(m,n) is a continuous Gaussian label, y(m,n) = exp(−((m − M/2)² + (n − N/2)²)/(2σ²)), with σ the bandwidth of the Gaussian kernel. Equation (1) is a typical ridge regression and has the closed-form solution

W^d = (Y ⊙ X̄^d) / (Σ_{i=1}^{D} X^i ⊙ X̄^i + λ)   (2)

where W^d is the trained correlation filter on channel d, X̄^d is the complex conjugate of X^d, X^d and Y are the discrete Fourier transforms of x_l^d and of the Gaussian label y respectively, and ⊙ denotes element-wise multiplication;
In step 3), λ = 10⁻⁴ and σ = 10⁻¹.
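The closed-form training of equations (1)-(2) can be sketched in NumPy as follows (an illustration, not the patent's reference implementation). Since the patent does not state whether σ = 10⁻¹ is absolute or relative to the feature-map size, σ is left as a parameter here.

```python
import numpy as np

def gaussian_label(M, N, sigma):
    # continuous Gaussian label y(m, n), peaking at the window centre
    m = np.arange(M)[:, None] - M / 2
    n = np.arange(N)[None, :] - N / 2
    return np.exp(-(m ** 2 + n ** 2) / (2 * sigma ** 2))

def train_correlation_filter(x, sigma, lam=1e-4):
    # x: (M, N, D) CNN feature map x_l from step 2)
    # equation (2): W^d = (Y * conj(X^d)) / (sum_i X^i * conj(X^i) + lam)
    M, N, D = x.shape
    X = np.fft.fft2(x, axes=(0, 1))
    Y = np.fft.fft2(gaussian_label(M, N, sigma))
    denom = (X * np.conj(X)).real.sum(axis=2, keepdims=True) + lam
    return Y[:, :, None] * np.conj(X) / denom
```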
4) Given a frame of test video, apply the trained correlation filtering model to the search area to obtain a response map, and take the position with the maximum response value as the initial target position;
In step 4), the specific procedure may be: extract the CNN features of the search region at layer l of the VGG-Net-19 model in the test video, denoted z_l, of the same size as x_l; the response map of the correlation filtering model on z_l is computed by equation (3):

f_l = F⁻¹(Σ_{d=1}^{D} W_l^d ⊙ Z_l^d)   (3)

where f_l is the response map of the correlation filtering model on the layer-l features, F⁻¹ denotes the inverse Fourier transform, and Z_l^d is the discrete Fourier transform of z_l^d. To improve tracking robustness, the VGG-Net-19 model can extract features of different layers for target localization. Given the feature maps of L layers in total, the response maps of the correlation filtering model on the different layer features are obtained through equation (3), denoted {f_l}_{l=1}^{L}. The initial target position estimated by the correlation filtering model may then be calculated as

p̂_CF = argmax_{(m,n)} Σ_{l=1}^{L} γ_l f_l(m, n)   (4)

where p̂_CF is the target position estimated by the correlation filtering model and γ_l is the weight of the response map on the layer-l features.
Estimating the target position with the correlation filtering model for a given frame of test video comprises the following sub-steps:
a. the number of CNN feature layers L used in formula (4) is set to 3, namely conv3-4, conv4-4 and conv5-4 layers in the VGG-Net-19 model;
b. the weights corresponding to the conv3-4, conv4-4 and conv5-4 layer features of equation (4) are set to 0.5, 1, 0.02, respectively.
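Under the conventions above, the per-layer response maps of equation (3) and their weighted fusion in equation (4) can be sketched as follows; upsampling the coarser layers to a common grid is implied by the patent but not spelled out, so it is marked as an assumption here.

```python
import numpy as np
from scipy.ndimage import zoom

GAMMAS = {"conv3-4": 0.5, "conv4-4": 1.0, "conv5-4": 0.02}  # sub-step b

def locate_target(filters, feats_z, out_shape, gammas=GAMMAS):
    # filters / feats_z: dicts mapping layer name to W_l and z_l
    fused = np.zeros(out_shape)
    for name, z in feats_z.items():
        Z = np.fft.fft2(z, axes=(0, 1))
        # equation (3): f_l = IFFT( sum_d W_l^d * Z_l^d )
        f = np.fft.ifft2((filters[name] * Z).sum(axis=2)).real
        # assumption: bilinearly upsample each response map to out_shape
        fy, fx = out_shape[0] / f.shape[0], out_shape[1] / f.shape[1]
        fused += gammas[name] * zoom(f, (fy, fx), order=1)
    # equation (4): position of the maximum of the weighted response sum
    return np.unravel_index(fused.argmax(), fused.shape)
```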
5) In the test video, construct a search scale pyramid centered on the target of the previous frame;
In step 5), the specific procedure may be: centered on the target of the previous frame, construct Q scale factors around the scale of the previous frame's target; multiplying these factors by the original target scale yields Q search areas of different scales, which are resized with bilinear interpolation to a common size of 255×255×3, denoted {z_q}_{q=1}^{Q}, with Q = 36.
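A sketch of this scale pyramid, reusing the crop helper sketched under step 1); the ratio between adjacent scale factors (`step`) is an assumed value, since the patent fixes only Q = 36.

```python
import cv2
import numpy as np

def build_scale_pyramid(frame, cx, cy, w, h, Q=36, step=1.02):
    # Q scale factors centred on the previous frame's scale
    factors = step ** (np.arange(Q) - (Q - 1) / 2)
    regions = []
    for s in factors:
        r = crop_training_region(frame, cx, cy, w * s, h * s, pad=2.0)
        # bilinear resize of every search area to the common 255x255x3 size
        regions.append(cv2.resize(r, (255, 255), interpolation=cv2.INTER_LINEAR))
    return factors, np.stack(regions)
```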
6) Using the pre-trained deep Siamese network and taking the target in the first video frame as the template, match the template on each scale obtained in step 5) to get the candidate target position with the highest confidence at each scale, then sort the candidates by confidence to obtain the K candidate target positions with the highest confidence; the computation is

ŝ_q = max_{(m,n)} S(o, z_q)(m, n),  q = 1, …, Q   (5)

where o is the target template, z_q the search region at scale q, and S(·,·) the similarity measurement function learned offline by the deep Siamese network, which returns a similarity map; ŝ_q is the best similarity value of the target template at the q-th search scale. Sorting {ŝ_q} yields the top K candidate target positions, denoted as the set E = {e_k}_{k=1}^{K}. Let e_0 = p̂_CF be the position estimated by the correlation filtering model; the set U = {e_k}_{k=0}^{K} then represents all candidate target positions.
In step 6), the parameter K of the deep Siamese network is set to 1 (see: Luca Bertinetto et al., 2016, ECCV Workshops).
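Given the Q similarity maps returned by the Siamese network, the candidate selection of equation (5) amounts to a peak search per scale followed by sorting, as in this sketch:

```python
import numpy as np

def top_k_candidates(sim_maps, factors, K=1):
    # sim_maps: the Q similarity maps S(o, z_q); factors: the Q scale factors
    peaks = []
    for q, sim in enumerate(sim_maps):
        idx = np.unravel_index(sim.argmax(), sim.shape)
        peaks.append((float(sim[idx]), idx, factors[q]))  # (s_q, position, scale)
    peaks.sort(key=lambda t: t[0], reverse=True)  # sort by confidence
    return peaks[:K]  # the set E of equation (5)
```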
7) Evaluate the candidate target positions obtained in step 6) with the D-expert based on depth similarity to obtain the best candidate target position;
In step 7), the evaluation may proceed as follows: construct an online target appearance model consisting mainly of three types of samples: (1) the target sample in the first frame; (2) target samples with high confidence collected during tracking; (3) the target sample in the previous frame. Extract the fully-connected-layer features of the three samples with the VGG-Net-19 model, denoted f(v_0), f(v_1) and f(v_2) respectively, and let the set V represent the three types of samples. Similarly, extract fully-connected features from the candidate targets in U, denoted f(e_k). For e_k ∈ U, the D-expert calculates its cumulative similarity distance on V:

d(e_k) = Σ_{v ∈ V} ||f(e_k) − f(v)||₂   (6)

By comparing the cumulative similarity distances, the best candidate target obtained by the deep Siamese network search can be calculated by equation (7):

e* = argmin_{e_k ∈ E} d(e_k)   (7)

The D-expert further evaluates the target position estimated by correlation filtering against the best candidate obtained by the Siamese network search over the different scale ranges:

r_D = sign(d(e_0) − d(e*))   (8)

where r_D is the evaluation value of the D-expert and sign(·) is the sign function. If r_D = 1, the cumulative distance on the appearance model of the best candidate obtained by the Siamese network search is smaller than that of the candidate estimated by correlation filtering, so the best candidate is more reliable and the subsequent evaluation is carried out; otherwise, the candidate estimated by correlation filtering is taken as the final tracking result;
the on-line target apparent model size can be set to | V0|=|V1|=|V2|=1。
8) Evaluate the target position obtained in step 4) and the best candidate target position obtained in step 7) with the C-expert based on correlation filtering, obtain the best target tracking result, and complete the tracking. The C-expert uses two correlation filtering models for the evaluation, denoted W_1 and W_t: the former is trained only on the first video frame and keeps the original target model, while the latter is trained and updated throughout the tracking process and accounts for object deformation. Let R_t(m, n) and R_1(m, n) denote the response values of W_t and W_1 at position (m, n), respectively. The C-expert evaluates the target position p̂_CF estimated by the correlation filtering model against the best target position e* obtained by the deep Siamese network search:

r_C = sign(max(R_t(e*) − R_t(p̂_CF), R_1(e*) − R_1(p̂_CF)))   (9)

where r_C is the evaluation value of the C-expert. If r_C = 1, e* is selected as the final tracking result; otherwise, p̂_CF is the final tracking result. If R_t(e*) > R_t(p̂_CF), the best position from the deep Siamese network search accounts for more object deformation and is more reliable; if R_1(e*) > R_1(p̂_CF), the correlation filtering model has probably been updated incorrectly, and the result obtained by the deep Siamese network, having the higher response value on the original model, has higher confidence and is therefore selected as the final tracking result.
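A sketch of the C-expert decision; note that equation (9) here is reconstructed from the textual description above, so this logic is an interpretation rather than a verbatim implementation.

```python
def c_expert(R_t, R_1, p_cf, p_s):
    # R_t, R_1: response maps of the continuously updated and the
    # first-frame-only correlation filters; p_cf, p_s: (m, n) positions
    if R_t[p_s] > R_t[p_cf]:   # Siamese result models more deformation
        return p_s, 1
    if R_1[p_s] > R_1[p_cf]:   # updated filter was likely corrupted
        return p_s, 1
    return p_cf, -1            # keep the correlation-filter estimate
```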
According to the invention, correlation filtering and a deep Siamese network are combined under a unified tracking framework, so that challenges in long videos such as target occlusion and disappearance from the field of view can be handled effectively. In the tracking method, the proposed expert evaluation mechanism based on the D-expert and C-expert effectively evaluates and screens the candidate target positions generated jointly by the correlation filter and the deep Siamese network to obtain the best target tracking result, and this result is used to update the correlation filter tracker, effectively preventing it from being updated with erroneous samples. The proposed target tracking method is robust to the various challenges in long videos and can track the target stably for a long time.
Drawings
Fig. 1 is a schematic overall flow chart of an embodiment of the present invention.
FIG. 2 is a diagram of qualitative tracking results on a video with target occlusion according to an embodiment of the present invention. The rectangular box marks the target tracking result obtained by the invention.
Detailed Description
The method of the present invention is described in detail below with reference to the accompanying drawings and examples.
Referring to fig. 1, an implementation of an embodiment of the invention includes the steps of:
1) Given a frame of training video, a training area is defined centered on the target; the training area completely contains the target and part of the background. The area is defined as follows: construct a rectangular training area centered on the target, with length and width set in proportion to the length and width of the target; if the rectangular area extends beyond the video frame, the missing part is filled with the average pixel value.
2) Extract the CNN features of the training area obtained in step 1) using the pre-trained VGG-Net-19 model. The specific process is as follows: resize the rectangular training area obtained in step 1) with bilinear interpolation so that it matches the input size required by the network (224×224×3), and take the output of layer l (the conv3-4, conv4-4 and conv5-4 layers of the VGG-Net-19 model), denoted x_l ∈ R^{M×N×D}, where M, N and D are the length, width and number of channels of the feature map, respectively.
3) Train the correlation filtering model using the CNN features obtained in step 2), according to equation (1):

W* = argmin_W Σ_{m,n} ||W · x_l(m,n) − y(m,n)||² + λ||W||²   (1)

where λ is the regularization parameter and y(m,n) is a continuous Gaussian label, y(m,n) = exp(−((m − M/2)² + (n − N/2)²)/(2σ²)), with σ the bandwidth of the Gaussian kernel. Equation (1) is a typical ridge regression with the closed-form solution

W^d = (Y ⊙ X̄^d) / (Σ_{i=1}^{D} X^i ⊙ X̄^i + λ)   (2)

where W^d is the trained correlation filter on channel d, X̄^d is the complex conjugate of X^d, X^d and Y are the discrete Fourier transforms of x_l^d and of the Gaussian label y respectively, and ⊙ denotes element-wise multiplication.
4) Given a frame of test video, apply the trained correlation filtering model to the search area to obtain a response map, and take the position with the maximum value in the response map as the initial target position. The specific process is as follows: extract the CNN features of the search region at layer l of the VGG-Net-19 model in the test video, denoted z_l, of the same size as x_l. The response map of the correlation filtering model on z_l is computed by equation (3):

f_l = F⁻¹(Σ_{d=1}^{D} W_l^d ⊙ Z_l^d)   (3)

where f_l is the response map of the correlation filtering model on the layer-l features, F⁻¹ denotes the inverse Fourier transform, and Z_l^d is the discrete Fourier transform of z_l^d. To improve the robustness of tracking, the target is localized with features from different layers. Given the feature maps of L layers in total, the response maps of the correlation filtering model on the different layer features are obtained through equation (3), denoted {f_l}_{l=1}^{L}. The initial target position estimated by the correlation filtering model is

p̂_CF = argmax_{(m,n)} Σ_{l=1}^{L} γ_l f_l(m, n)   (4)

where p̂_CF is the target position estimated by the correlation filtering model and γ_l is the weight of the response map on the layer-l features.
5) In the test video, construct a search scale pyramid centered on the target of the previous frame. The specific process is as follows: centered on the target of the previous frame, construct Q scale factors around its scale; multiplying these factors by the original target scale yields Q search areas of different scales, which are resized with bilinear interpolation to a common size of 255×255×3, denoted {z_q}_{q=1}^{Q}.
6) Using the pre-trained deep Siamese network and taking the target in the first video frame as the template, match the template on each scale obtained in step 5) to obtain the candidate target position with the highest confidence at each scale, then sort by confidence to obtain the K candidate target positions with the highest confidence. The computation is

ŝ_q = max_{(m,n)} S(o, z_q)(m, n),  q = 1, …, Q   (5)

where o is the target template, z_q the search region at scale q, and S(·,·) the similarity measurement function learned offline by the deep Siamese network, which returns a similarity map; ŝ_q is the best similarity value of the target template at the q-th search scale. Sorting {ŝ_q} yields the top K candidate target positions, denoted as the set E = {e_k}_{k=1}^{K}. Let e_0 = p̂_CF be the position estimated by the correlation filtering model; the set U = {e_k}_{k=0}^{K} then represents all candidate target positions.
7) Evaluate the candidate target positions obtained in step 6) with the D-expert based on depth similarity to obtain the best candidate target position. The specific process is as follows: construct an online target appearance model consisting mainly of three types of samples: (1) the target sample in the first frame; (2) target samples with high confidence collected during tracking; (3) the latest tracking result. Extract the fully-connected-layer features of the three samples with the VGG-Net-19 model, denoted f(v_0), f(v_1) and f(v_2) respectively, and let the set V represent the three types of samples. Similarly, extract fully-connected features from the candidate targets in U, denoted f(e_k). For e_k ∈ U, the D-expert calculates its cumulative similarity distance on V:

d(e_k) = Σ_{v ∈ V} ||f(e_k) − f(v)||₂   (6)

By comparing the cumulative similarity distances, the best candidate target obtained by the deep Siamese network search is calculated by equation (7):

e* = argmin_{e_k ∈ E} d(e_k)   (7)

The D-expert further evaluates the target position estimated by correlation filtering against the best candidate obtained by the Siamese network search over the different scale ranges:

r_D = sign(d(e_0) − d(e*))   (8)

where r_D is the evaluation value of the D-expert and sign(·) is the sign function. If r_D = 1, the cumulative distance on the appearance model of the best candidate obtained by the Siamese network search is smaller than that of the candidate estimated by correlation filtering, so the best candidate is more reliable and the subsequent evaluation is carried out; otherwise, the candidate estimated by correlation filtering is taken as the final tracking result.
8) Evaluate the target position obtained in step 4) and the best candidate target position obtained in step 7) with the C-expert based on correlation filtering to obtain the best target tracking result and complete the tracking. The C-expert uses two correlation filtering models for the evaluation, denoted W_1 and W_t: the former is trained only on the first video frame and keeps the original target model, while the latter is trained and updated throughout the tracking process and accounts for target deformation. Let R_t(m, n) and R_1(m, n) denote the response values of W_t and W_1 at position (m, n), respectively. The C-expert evaluates the target position p̂_CF estimated by the correlation filtering model against the best target position e* obtained by the deep Siamese network search:

r_C = sign(max(R_t(e*) − R_t(p̂_CF), R_1(e*) − R_1(p̂_CF)))   (9)

where r_C is the evaluation value of the C-expert. If r_C = 1, e* is selected as the final tracking result; otherwise, p̂_CF is the final tracking result. If R_t(e*) > R_t(p̂_CF), the best position from the deep Siamese network search accounts for more target deformation and is more reliable, and it is taken as the best tracking result; if R_1(e*) > R_1(p̂_CF), the correlation filtering model has probably been updated incorrectly, and the result obtained by the deep Siamese network, having the higher response value on the original model, has higher confidence and is therefore selected as the final tracking result.
The overall framework of the invention is shown in FIG. 1. FIG. 2 shows qualitative tracking results on a video with target occlusion according to an embodiment of the invention, where the rectangular box marks the result of the method. As can be seen from the figure, the method can effectively handle challenges such as target occlusion and disappearance from the field of view in long videos.
Table 1 compares the precision, success rate and speed of the invention with 11 other target tracking methods on the OTB-2013 data set. The invention obtains good tracking results on this mainstream data set.
TABLE 1

Method | Precision (%) | Success rate (%) | Speed (FPS)
---|---|---|---
The invention | 91.5 | 65.6 | 8.9
CF2 (2015) | 89.1 | 60.5 | 10.5
HDT (2016) | 88.9 | 60.3 | 11.1
SiamFC (2016) | 80.1 | 60.6 | 68.1
Staple (2016) | 79.3 | 60.0 | 62.4
SRDCF (2015) | 83.8 | 62.6 | 3.8
KCF (2015) | 74.1 | 51.3 | 205.3
DSST (2014) | 74.0 | 55.4 | 23.6
CSK (2012) | 54.5 | 39.8 | 458.0
IVT (2008) | 49.9 | 35.8 | 40.1
LCT (2015) | 84.8 | 62.8 | 21.0
CT (2012) | 40.6 | 30.6 | 53.9
In Table 1:
KCF corresponds to the method proposed by J. F. Henriques et al. (J. F. Henriques, R. Caseiro, P. Martins, and J. Batista, "High-Speed Tracking with Kernelized Correlation Filters," IEEE Trans. Pattern Anal. Mach. Intell., vol. 37, no. 3, pp. 583-596, 2015.)
DSST corresponds to the method proposed by M. Danelljan et al. (M. Danelljan, G. Häger, F. S. Khan, and M. Felsberg, "Discriminative Scale Space Tracking," IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 8, pp. 1561-1575, 2016.)
Staple corresponds to the method proposed by L. Bertinetto et al. (L. Bertinetto, J. Valmadre, S. Golodetz, O. Miksik, and P. H. S. Torr, "Staple: Complementary Learners for Real-Time Tracking," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 1401-1409.)
SRDCF corresponds to the method proposed by M. Danelljan et al. (M. Danelljan, G. Häger, F. S. Khan, and M. Felsberg, "Learning Spatially Regularized Correlation Filters for Visual Tracking," in Proc. IEEE Int. Conf. Comput. Vis., 2015, pp. 4310-4318.)
SiamFC corresponds to the method proposed by L. Bertinetto et al. (L. Bertinetto, J. Valmadre, J. Henriques, A. Vedaldi, and P. H. S. Torr, "Fully-Convolutional Siamese Networks for Object Tracking," in Proc. Eur. Conf. Comput. Vis. Workshops, 2016, pp. 850-865.)
CF2 corresponds to the method proposed by C. Ma et al. (C. Ma, J.-B. Huang, X. K. Yang, and M.-H. Yang, "Hierarchical Convolutional Features for Visual Tracking," in Proc. IEEE Int. Conf. Comput. Vis., 2015, pp. 3074-3082.)
HDT corresponds to the method proposed by Y. K. Qi et al. (Y. K. Qi, S. P. Zhang, L. Qin, H. X. Yao, Q. M. Huang, J. Lim, and M.-H. Yang, "Hedged Deep Tracking," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 4303-4311.)
LCT corresponds to the method proposed by C. Ma et al. (C. Ma, X. K. Yang, C. Y. Zhang, and M.-H. Yang, "Long-Term Correlation Tracking," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2015, pp. 5388-5396.)
CSK corresponds to the method proposed by J. F. Henriques et al. (J. F. Henriques, R. Caseiro, P. Martins, and J. Batista, "Exploiting the Circulant Structure of Tracking-by-Detection with Kernels," in Proc. Eur. Conf. Comput. Vis., 2012, pp. 702-715.)
CT corresponds to the method proposed by K. H. Zhang et al. (K. H. Zhang, L. Zhang, and M.-H. Yang, "Real-Time Compressive Tracking," in Proc. Eur. Conf. Comput. Vis., 2012, pp. 864-877.)
IVT corresponds to the method proposed by D. A. Ross et al. (D. A. Ross, J. Lim, R.-S. Lin, and M.-H. Yang, "Incremental Learning for Robust Visual Tracking," Int. J. Comput. Vis., vol. 77, no. 1, pp. 125-141, 2008.)
Claims (10)
1. A robust long-range target tracking method based on correlation filtering and a deep Siamese network, characterized by comprising the following steps:
1) a frame of training video is given, a training area is defined centered on the target, and the training area completely contains the target and part of the background;
2) CNN features are extracted from the training area obtained in step 1) using a pre-trained VGG-Net-19 model;
3) a correlation filtering model is trained using the CNN features obtained in step 2), according to equation (1):

W* = argmin_W Σ_{m,n} ||W · x_l(m,n) − y(m,n)||² + λ||W||²   (1)

wherein λ is the regularization parameter and y(m,n) is a continuous Gaussian label, y(m,n) = exp(−((m − M/2)² + (n − N/2)²)/(2σ²)), with σ the bandwidth of the Gaussian kernel; equation (1) is a typical ridge regression, solved in closed form as

W^d = (Y ⊙ X̄^d) / (Σ_{i=1}^{D} X^i ⊙ X̄^i + λ)   (2)

wherein W^d is the trained correlation filter on channel d, X̄^d is the complex conjugate of X^d, X^d and Y are the discrete Fourier transforms of x_l^d and of the Gaussian label y respectively, and ⊙ denotes element-wise multiplication;
4) a frame of test video is given, the trained correlation filtering model is applied to the search area to obtain a response map, and the position with the maximum response value in the response map is determined as the initial target position;
5) in the test video, a search scale pyramid is constructed centered on the target of the previous frame;
6) using the pre-trained deep Siamese network and taking the target in the first video frame as the template, the template is matched on each scale obtained in step 5) to obtain the candidate target position with the highest confidence at each scale, and the candidates are sorted by confidence to obtain the K candidate target positions with the highest confidence; the computation is

ŝ_q = max_{(m,n)} S(o, z_q)(m, n),  q = 1, …, Q   (5)

wherein o is the target template, z_q the search region at scale q, and S(·,·) the similarity measurement function learned offline by the deep Siamese network, which returns a similarity map; ŝ_q is the best similarity value of the target template at the q-th search scale; sorting {ŝ_q} yields the first K candidate target positions, denoted as the set E = {e_k}_{k=1}^{K}; letting e_0 = p̂_CF be the position estimated by the correlation filtering model, the set U = {e_k}_{k=0}^{K} represents all candidate target positions;
7) the candidate target positions obtained in step 6) are evaluated with the D-expert based on depth similarity to obtain the best candidate target position;
8) the target position obtained in step 4) and the best candidate target position obtained in step 7) are evaluated respectively with the C-expert based on correlation filtering to obtain the best target tracking result and complete the tracking; the C-expert uses two correlation filtering models for the evaluation, denoted W_1 and W_t, wherein the former is trained only on the first video frame and keeps the original target model, and the latter is trained and updated throughout the tracking process and accounts for object deformation; let R_t(m, n) and R_1(m, n) denote the response values of W_t and W_1 at position (m, n), respectively; the C-expert evaluates the target position p̂_CF estimated by the correlation filtering model against the best target position e* obtained by the deep Siamese network search:

r_C = sign(max(R_t(e*) − R_t(p̂_CF), R_1(e*) − R_1(p̂_CF)))   (9)

wherein r_C is the evaluation value of the C-expert; if r_C = 1, e* is selected as the final tracking result; otherwise, p̂_CF is the final tracking result; if R_t(e*) > R_t(p̂_CF), the best position from the deep Siamese network search accounts for more object deformation and is more reliable; if R_1(e*) > R_1(p̂_CF), the correlation filtering model has probably been updated incorrectly, and the result obtained by the deep Siamese network, having the higher response value on the original model, has higher confidence and is therefore selected as the final tracking result.
2. The robust long-range target tracking method based on correlation filtering and a deep Siamese network according to claim 1, wherein in step 1), the training area is defined as follows: a rectangular training area is constructed centered on the target, with length and width set in proportion to the length and width of the target; if the rectangular training area extends beyond the training video frame, the missing part is filled with the average pixel value.
3. The robust long-range target tracking method based on correlation filtering and a deep Siamese network according to claim 1, wherein in step 2), the specific process of extracting the CNN features from the training area obtained in step 1) with the pre-trained VGG-Net-19 model is: the rectangular training area obtained in step 1) is resized with bilinear interpolation to the input size required by the network (224×224×3), and the output of layer l is taken as the extracted CNN feature, denoted x_l ∈ R^{M×N×D}, wherein M, N and D are the length, width and number of channels of the feature map, respectively; the layers l correspond to the conv3-4, conv4-4 and conv5-4 layers of the VGG-Net-19 model.
4. The robust long-range target tracking method based on correlation filtering and a deep Siamese network according to claim 1, wherein in step 3), λ = 10⁻⁴ and σ = 10⁻¹.
5. The robust long-range target tracking method based on correlation filtering and a deep Siamese network according to claim 1, wherein in step 4), the specific process of determining the position with the largest response value in the response map as the initial target position is: the CNN features of the search region at layer l of the VGG-Net-19 model are extracted in the test video, denoted z_l, of the same size as x_l; the response map of the correlation filtering model on z_l is calculated by equation (3):

f_l = F⁻¹(Σ_{d=1}^{D} W_l^d ⊙ Z_l^d)   (3)

wherein f_l is the response map of the correlation filtering model on the layer-l features, F⁻¹ denotes the inverse Fourier transform, and Z_l^d is the discrete Fourier transform of z_l^d; to improve tracking robustness, the VGG-Net-19 model is used to extract features of different layers for target localization; given the feature maps of L layers in total, the response maps of the correlation filtering model on the different layer features are obtained through equation (3), denoted {f_l}_{l=1}^{L}; the initial target position estimated by the correlation filtering model is calculated as

p̂_CF = argmax_{(m,n)} Σ_{l=1}^{L} γ_l f_l(m, n)   (4)

wherein p̂_CF is the target position estimated by the correlation filtering model and γ_l is the weight of the response map on the layer-l features.
6. The robust long-range target tracking method based on correlation filtering and a deep Siamese network according to claim 1, wherein in step 4), the target position is estimated for the given frame of test video using the correlation filtering model, comprising the following sub-steps:
a. the number of CNN feature layers L used in formula (4) is set to 3, namely conv3-4, conv4-4 and conv5-4 layers in the VGG-Net-19 model;
b. the weights corresponding to the conv3-4, conv4-4 and conv5-4 layer features of equation (4) are set to 0.5, 1, 0.02, respectively.
7. The robust long-range target tracking method based on correlation filtering and a deep Siamese network according to claim 1, wherein in step 5), the specific process of constructing the search scale pyramid is: centered on the target of the previous frame, Q scale factors are constructed around the scale of the previous frame's target; multiplying them by the original target scale yields Q search areas of different scales, which are resized with bilinear interpolation to a common size of 255×255×3, denoted {z_q}_{q=1}^{Q}, with Q = 36.
8. The robust long-range target tracking method based on correlation filtering and a deep Siamese network according to claim 1, wherein in step 6), the parameter K of the deep Siamese network is set to 1.
9. The robust long-range target tracking method based on correlation filtering and a deep Siamese network according to claim 1, wherein in step 7), the candidate target positions obtained in step 6) are evaluated with the D-expert based on depth similarity, and the specific process of obtaining the best candidate target position is: an online target appearance model is constructed, consisting mainly of three types of samples: (1) the target sample in the first frame; (2) target samples with high confidence collected during tracking; (3) the target sample in the previous frame; the fully-connected-layer features of the three samples are extracted with the VGG-Net-19 model, denoted f(v_0), f(v_1) and f(v_2) respectively, and the set V represents the three types of samples; similarly, fully-connected features are extracted from the candidate targets in U, denoted f(e_k); for e_k ∈ U, the D-expert calculates its cumulative similarity distance on V:

d(e_k) = Σ_{v ∈ V} ||f(e_k) − f(v)||₂   (6)

by comparing the cumulative similarity distances, the best candidate target obtained by the deep Siamese network search is calculated by equation (7):

e* = argmin_{e_k ∈ E} d(e_k)   (7)

the D-expert further evaluates the target position estimated by correlation filtering against the best candidate obtained by the Siamese network search over the different scale ranges:

r_D = sign(d(e_0) − d(e*))   (8)

wherein r_D is the evaluation value of the D-expert and sign(·) is the sign function; if r_D = 1, the cumulative distance on the appearance model of the best candidate obtained by the Siamese network search is smaller than that of the candidate estimated by correlation filtering, so the best candidate is more reliable and the subsequent evaluation is carried out; otherwise, the candidate estimated by correlation filtering is taken as the final tracking result.
10. The robust long-range target tracking method based on correlation filtering and a deep Siamese network according to claim 9, wherein the sizes of the online target appearance model are set to |V_0| = |V_1| = |V_2| = 1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810613931.7A CN108734151B (en) | 2018-06-14 | 2018-06-14 | Robust long-range target tracking method based on correlation filtering and deep Siamese network
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810613931.7A CN108734151B (en) | 2018-06-14 | 2018-06-14 | Robust long-range target tracking method based on correlation filtering and deep Siamese network
Publications (2)
Publication Number | Publication Date |
---|---|
CN108734151A true CN108734151A (en) | 2018-11-02 |
CN108734151B CN108734151B (en) | 2020-04-14 |
Family
ID=63929665
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810613931.7A Active CN108734151B (en) | 2018-06-14 | 2018-06-14 | Robust long-range target tracking method based on correlation filtering and depth twin network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108734151B (en) |
Cited By (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109543615A (en) * | 2018-11-23 | 2019-03-29 | 长沙理工大学 | A kind of double learning model method for tracking target based on multi-stage characteristics |
CN109598684A (en) * | 2018-11-21 | 2019-04-09 | 华南理工大学 | In conjunction with the correlation filtering tracking of twin network |
CN109711316A (en) * | 2018-12-21 | 2019-05-03 | 广东工业大学 | A kind of pedestrian recognition methods, device, equipment and storage medium again |
CN109727272A (en) * | 2018-11-20 | 2019-05-07 | 南京邮电大学 | A kind of method for tracking target based on double branch's space-time regularization correlation filters |
CN109784155A (en) * | 2018-12-10 | 2019-05-21 | 西安电子科技大学 | Visual target tracking method, intelligent robot based on verifying and mechanism for correcting errors |
CN109859244A (en) * | 2019-01-22 | 2019-06-07 | 西安微电子技术研究所 | A kind of visual tracking method based on convolution sparseness filtering |
CN110033012A (en) * | 2018-12-28 | 2019-07-19 | 华中科技大学 | A kind of production method for tracking target based on channel characteristics weighted convolution neural network |
CN110033477A (en) * | 2019-04-04 | 2019-07-19 | 中设设计集团股份有限公司 | A kind of road vehicle LBP feature correlation filtering tracking suitable for blocking scene |
CN110033478A (en) * | 2019-04-12 | 2019-07-19 | 北京影谱科技股份有限公司 | Visual target tracking method and device based on depth dual training |
CN110070562A (en) * | 2019-04-02 | 2019-07-30 | 西北工业大学 | A kind of context-sensitive depth targets tracking |
CN110210503A (en) * | 2019-06-14 | 2019-09-06 | 厦门历思科技服务有限公司 | A kind of seal recognition methods and device and equipment |
CN110415271A (en) * | 2019-06-28 | 2019-11-05 | 武汉大学 | One kind fighting twin network target tracking method based on the multifarious generation of appearance |
CN110675429A (en) * | 2019-09-24 | 2020-01-10 | 湖南人文科技学院 | Long-range and short-range complementary target tracking method based on twin network and related filter |
CN110781778A (en) * | 2019-10-11 | 2020-02-11 | 珠海格力电器股份有限公司 | Access control method and device, storage medium and home system |
CN110942471A (en) * | 2019-10-30 | 2020-03-31 | 电子科技大学 | Long-term target tracking method based on space-time constraint |
CN111091585A (en) * | 2020-03-19 | 2020-05-01 | 腾讯科技(深圳)有限公司 | Target tracking method, device and storage medium |
CN111260687A (en) * | 2020-01-10 | 2020-06-09 | 西北工业大学 | Aerial video target tracking method based on semantic perception network and related filtering |
CN111931571A (en) * | 2020-07-07 | 2020-11-13 | 华中科技大学 | Video character target tracking method based on online enhanced detection and electronic equipment |
CN112085765A (en) * | 2020-09-15 | 2020-12-15 | 浙江理工大学 | Video target tracking method combining particle filtering and metric learning |
CN112348849A (en) * | 2020-10-27 | 2021-02-09 | 南京邮电大学 | Twin network video target tracking method and device |
CN112507859A (en) * | 2020-12-05 | 2021-03-16 | 西北工业大学 | Visual tracking method for mobile robot |
CN112560695A (en) * | 2020-12-17 | 2021-03-26 | 中国海洋大学 | Underwater target tracking method, system, storage medium, equipment, terminal and application |
CN112634316A (en) * | 2020-12-30 | 2021-04-09 | 河北工程大学 | Target tracking method, device, equipment and storage medium |
CN115061574A (en) * | 2022-07-06 | 2022-09-16 | 陈伟 | Human-computer interaction system based on visual core algorithm |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104616324A (en) * | 2015-03-06 | 2015-05-13 | 厦门大学 | Target tracking method based on adaptive appearance model and point-set distance metric learning |
CN105868789A (en) * | 2016-04-07 | 2016-08-17 | 厦门大学 | Object discovery method based on image area convergence measurement |
CN106952288A (en) * | 2017-03-31 | 2017-07-14 | 西北工业大学 | Based on convolution feature and global search detect it is long when block robust tracking method |
CN107154024A (en) * | 2017-05-19 | 2017-09-12 | 南京理工大学 | Dimension self-adaption method for tracking target based on depth characteristic core correlation filter |
CN107992826A (en) * | 2017-12-01 | 2018-05-04 | 广州优亿信息科技有限公司 | A kind of people stream detecting method based on the twin network of depth |
-
2018
- 2018-06-14 CN CN201810613931.7A patent/CN108734151B/en active Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104616324A (en) * | 2015-03-06 | 2015-05-13 | 厦门大学 | Target tracking method based on adaptive appearance model and point-set distance metric learning |
CN105868789A (en) * | 2016-04-07 | 2016-08-17 | 厦门大学 | Object discovery method based on image area convergence measurement |
CN106952288A (en) * | 2017-03-31 | 2017-07-14 | 西北工业大学 | Based on convolution feature and global search detect it is long when block robust tracking method |
CN107154024A (en) * | 2017-05-19 | 2017-09-12 | 南京理工大学 | Dimension self-adaption method for tracking target based on depth characteristic core correlation filter |
CN107992826A (en) * | 2017-12-01 | 2018-05-04 | 广州优亿信息科技有限公司 | A kind of people stream detecting method based on the twin network of depth |
Non-Patent Citations (1)
Title |
---|
Bing Liu et al.: "Convolutional neural networks based scale-adaptive kernelized correlation filter for robust visual object tracking", 2017 International Conference on Security, Pattern Analysis, and Cybernetics (SPAC) *
Cited By (41)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109727272A (en) * | 2018-11-20 | 2019-05-07 | 南京邮电大学 | A kind of method for tracking target based on double branch's space-time regularization correlation filters |
CN109727272B (en) * | 2018-11-20 | 2022-08-12 | 南京邮电大学 | Target tracking method based on double-branch space-time regularization correlation filter |
CN109598684A (en) * | 2018-11-21 | 2019-04-09 | 华南理工大学 | In conjunction with the correlation filtering tracking of twin network |
CN109543615A (en) * | 2018-11-23 | 2019-03-29 | 长沙理工大学 | A kind of double learning model method for tracking target based on multi-stage characteristics |
CN109543615B (en) * | 2018-11-23 | 2022-10-28 | 长沙理工大学 | Double-learning-model target tracking method based on multi-level features |
CN109784155B (en) * | 2018-12-10 | 2022-04-29 | 西安电子科技大学 | Visual target tracking method based on verification and error correction mechanism and intelligent robot |
CN109784155A (en) * | 2018-12-10 | 2019-05-21 | 西安电子科技大学 | Visual target tracking method, intelligent robot based on verifying and mechanism for correcting errors |
CN109711316B (en) * | 2018-12-21 | 2022-10-21 | 广东工业大学 | Pedestrian re-identification method, device, equipment and storage medium |
CN109711316A (en) * | 2018-12-21 | 2019-05-03 | 广东工业大学 | A kind of pedestrian recognition methods, device, equipment and storage medium again |
CN110033012A (en) * | 2018-12-28 | 2019-07-19 | 华中科技大学 | A kind of production method for tracking target based on channel characteristics weighted convolution neural network |
CN109859244A (en) * | 2019-01-22 | 2019-06-07 | 西安微电子技术研究所 | A kind of visual tracking method based on convolution sparseness filtering |
CN109859244B (en) * | 2019-01-22 | 2022-07-08 | 西安微电子技术研究所 | Visual tracking method based on convolution sparse filtering |
CN110070562A (en) * | 2019-04-02 | 2019-07-30 | 西北工业大学 | A kind of context-sensitive depth targets tracking |
CN110033477A (en) * | 2019-04-04 | 2019-07-19 | 中设设计集团股份有限公司 | A kind of road vehicle LBP feature correlation filtering tracking suitable for blocking scene |
CN110033477B (en) * | 2019-04-04 | 2021-05-11 | 华设设计集团股份有限公司 | Road vehicle LBP feature-dependent filtering tracking method suitable for occlusion scene |
CN110033478A (en) * | 2019-04-12 | 2019-07-19 | 北京影谱科技股份有限公司 | Visual target tracking method and device based on depth dual training |
CN110210503A (en) * | 2019-06-14 | 2019-09-06 | 厦门历思科技服务有限公司 | A kind of seal recognition methods and device and equipment |
CN110210503B (en) * | 2019-06-14 | 2021-01-01 | 厦门历思科技服务有限公司 | Seal identification method, device and equipment |
CN110415271A (en) * | 2019-06-28 | 2019-11-05 | 武汉大学 | One kind fighting twin network target tracking method based on the multifarious generation of appearance |
CN110415271B (en) * | 2019-06-28 | 2022-06-07 | 武汉大学 | Appearance diversity-based method for tracking generation twin-resisting network target |
CN110675429A (en) * | 2019-09-24 | 2020-01-10 | 湖南人文科技学院 | Long-range and short-range complementary target tracking method based on twin network and related filter |
CN110781778B (en) * | 2019-10-11 | 2021-04-20 | 珠海格力电器股份有限公司 | Access control method and device, storage medium and home system |
CN110781778A (en) * | 2019-10-11 | 2020-02-11 | 珠海格力电器股份有限公司 | Access control method and device, storage medium and home system |
CN110942471B (en) * | 2019-10-30 | 2022-07-01 | 电子科技大学 | Long-term target tracking method based on space-time constraint |
CN110942471A (en) * | 2019-10-30 | 2020-03-31 | 电子科技大学 | Long-term target tracking method based on space-time constraint |
CN111260687A (en) * | 2020-01-10 | 2020-06-09 | 西北工业大学 | Aerial video target tracking method based on semantic perception network and related filtering |
CN111260687B (en) * | 2020-01-10 | 2022-09-27 | 西北工业大学 | Aerial video target tracking method based on semantic perception network and related filtering |
CN111091585A (en) * | 2020-03-19 | 2020-05-01 | 腾讯科技(深圳)有限公司 | Target tracking method, device and storage medium |
CN111931571B (en) * | 2020-07-07 | 2022-05-17 | 华中科技大学 | Video character target tracking method based on online enhanced detection and electronic equipment |
CN111931571A (en) * | 2020-07-07 | 2020-11-13 | 华中科技大学 | Video character target tracking method based on online enhanced detection and electronic equipment |
CN112085765B (en) * | 2020-09-15 | 2024-05-31 | 浙江理工大学 | Video target tracking method combining particle filtering and metric learning |
CN112085765A (en) * | 2020-09-15 | 2020-12-15 | 浙江理工大学 | Video target tracking method combining particle filtering and metric learning |
CN112348849B (en) * | 2020-10-27 | 2023-06-20 | 南京邮电大学 | Twin network video target tracking method and device |
CN112348849A (en) * | 2020-10-27 | 2021-02-09 | 南京邮电大学 | Twin network video target tracking method and device |
CN112507859A (en) * | 2020-12-05 | 2021-03-16 | 西北工业大学 | Visual tracking method for mobile robot |
CN112507859B (en) * | 2020-12-05 | 2024-01-12 | 西北工业大学 | Visual tracking method for mobile robot |
CN112560695B (en) * | 2020-12-17 | 2023-03-24 | 中国海洋大学 | Underwater target tracking method, system, storage medium, equipment, terminal and application |
CN112560695A (en) * | 2020-12-17 | 2021-03-26 | 中国海洋大学 | Underwater target tracking method, system, storage medium, equipment, terminal and application |
CN112634316A (en) * | 2020-12-30 | 2021-04-09 | 河北工程大学 | Target tracking method, device, equipment and storage medium |
CN115061574B (en) * | 2022-07-06 | 2023-03-31 | 大连厚仁科技有限公司 | Human-computer interaction system based on visual core algorithm |
CN115061574A (en) * | 2022-07-06 | 2022-09-16 | 陈伟 | Human-computer interaction system based on visual core algorithm |
Also Published As
Publication number | Publication date |
---|---|
CN108734151B (en) | 2020-04-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108734151B (en) | Robust long-range target tracking method based on correlation filtering and deep Siamese network | |
CN110298404B (en) | Target tracking method based on triple twin Hash network learning | |
CN104574445B (en) | A kind of method for tracking target | |
CN110084836B (en) | Target tracking method based on deep convolution characteristic hierarchical response fusion | |
CN112184752A (en) | Video target tracking method based on pyramid convolution | |
CN110197502B (en) | Multi-target tracking method and system based on identity re-identification | |
CN111476817A (en) | Multi-target pedestrian detection tracking method based on yolov3 | |
CN109816689A (en) | A kind of motion target tracking method that multilayer convolution feature adaptively merges | |
CN108961308B (en) | Residual error depth characteristic target tracking method for drift detection | |
CN110210551A (en) | A kind of visual target tracking method based on adaptive main body sensitivity | |
CN109461172A (en) | Manually with the united correlation filtering video adaptive tracking method of depth characteristic | |
CN109035172B (en) | Non-local mean ultrasonic image denoising method based on deep learning | |
Wan et al. | Unmanned aerial vehicle video-based target tracking algorithm using sparse representation | |
CN111311647B (en) | Global-local and Kalman filtering-based target tracking method and device | |
CN106295564B (en) | A kind of action identification method of neighborhood Gaussian structures and video features fusion | |
CN108062531A (en) | A kind of video object detection method that convolutional neural networks are returned based on cascade | |
CN107103326A (en) | The collaboration conspicuousness detection method clustered based on super-pixel | |
CN107689052A (en) | Visual target tracking method based on multi-model fusion and structuring depth characteristic | |
CN111583300B (en) | Target tracking method based on enrichment target morphological change update template | |
CN109087337B (en) | Long-time target tracking method and system based on hierarchical convolution characteristics | |
CN104408760A (en) | Binocular-vision-based high-precision virtual assembling system algorithm | |
CN107622507B (en) | Air target tracking method based on deep learning | |
CN103985143A (en) | Discriminative online target tracking method based on videos in dictionary learning | |
CN111968155B (en) | Target tracking method based on segmented target mask updating template | |
Li et al. | Robust object tracking with discrete graph-based multiple experts |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |