CN114998628A - Template matching-based twin network long-term target tracking method - Google Patents

Template matching-based twin network long-term target tracking method

Info

Publication number
CN114998628A
Authority
CN
China
Prior art keywords
target
tracking
image
template
target tracking
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210630014.6A
Other languages
Chinese (zh)
Inventor
侯颖
李阳
胡鑫
吴琰
李娇
贺顺
张释如
王书朋
张红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian University of Science and Technology
Original Assignee
Xian University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian University of Science and Technology filed Critical Xian University of Science and Technology
Priority to CN202210630014.6A priority Critical patent/CN114998628A/en
Publication of CN114998628A publication Critical patent/CN114998628A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/751Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Abstract

The invention discloses a twin network long-term target tracking method based on template matching. A SiamFC++ target tracking method is adopted to determine a first target tracking result image and a tracking confidence score for the current frame; whether the target is lost is judged according to the tracking confidence score; when the target is lost, the target position in the current frame is updated by an NCC template matching search method; with the target position as the center, target tracking is performed on the current frame using a dynamic matching template image and the SiamFC++ target tracking method to obtain a second target tracking result image, which is taken as the target tracking result image of the current frame. On the basis of the SiamFC++ target tracking method, the NCC template matching search method provides a coarse predicted location of the target in the current frame, and target tracking is then performed around that position in combination with the dynamic matching template image, so that the re-detection time in the long-term target tracking process can be effectively reduced.

Description

Template matching-based twin network long-term target tracking method
Technical Field
The invention belongs to the technical field of long-term target tracking, and particularly relates to a twin network long-term target tracking method based on template matching.
Background
Target tracking plays an important role in fields such as intelligent monitoring, intelligent transportation, autonomous driving, military guidance and unmanned aerial vehicles. With the continuous improvement of deep-learning-based short-term target tracking, attention has gradually turned in recent years to long-term target tracking, which is closer to practical application scenarios.
Compared with short-term tracking, long-term tracking tasks require the tracker to capture the tracked object in long video sequences and to handle targets that frequently disappear and reappear, which poses additional challenges, mainly in two respects.
First, because long-term tracking video sequences are far longer than short-term ones, and problems such as target deformation, disappearance and reappearance are particularly prominent, directly applying a short-term target tracking algorithm cannot handle these difficulties and tracking performance drops sharply. Therefore, many long-term target trackers address the tendency to lose the target by adding a target-loss judgment mechanism and a re-detection mechanism on top of a short-term tracking algorithm.
Second, the target re-detection mechanism in a long-term tracker usually relies on sliding-window global detection or an expanded search area. Because of the large detection area, these methods are very time-consuming, and many deep-learning long-term target tracking algorithms therefore cannot achieve real-time tracking.
Disclosure of Invention
The invention aims to provide a twin network long-term target tracking method based on template matching, which reduces the time consumption of target re-detection in the long-term target tracking process so as to meet the real-time target tracking requirement.
The invention adopts the following technical scheme: a twin network long-term target tracking method based on template matching, comprising the following steps:
determining a first target tracking result image and a tracking confidence score of the current frame by adopting a SiamFC++ target tracking method;
judging whether the target is lost or not according to the tracking confidence score;
when the target is lost, updating the target position in the current frame by adopting an NCC template matching search method;
taking the target position as a center, and performing target tracking on the current frame by adopting a dynamic matching template image and the SiamFC++ target tracking method to obtain a second target tracking result image;
and taking the second target tracking result image as a target tracking result image of the current frame.
Further, updating the target position in the current frame by using the NCC template matching search method includes:
calculating a first similarity between the matching template and all sub-images in the current frame, with the dynamic matching template image taken as the matching template;
selecting the similar sub-image corresponding to the maximum first similarity;
and determining the target position according to the similar sub-images.
Further, before determining the target position according to the similar sub-images, the method further comprises:
calculating a second similarity of the similar sub-images and the dynamic matching template images;
and when the second similarity is larger than or equal to the similarity threshold, acquiring the position corresponding to the similar sub-image, and taking the position as the target position.
Further, when the second similarity value is less than the similarity threshold, the target position of the previous frame is taken as the target position of the current frame.
Further, the second similarity is calculated as:
SSIM(x, y) = [l(x, y)]^α · [c(x, y)]^β · [s(x, y)]^γ, with l(x, y) = (2·μ_x·μ_y + c_1)/(μ_x² + μ_y² + c_1), c(x, y) = (2·σ_x·σ_y + c_2)/(σ_x² + σ_y² + c_2) and s(x, y) = (σ_xy + c_2/2)/(σ_x·σ_y + c_2/2),
wherein SSIM(x, y) is the similarity between image x and image y, l(x, y) is the luminance similarity function, c(x, y) is the contrast similarity function, s(x, y) is the structure similarity function, α, β and γ are constants, μ_x and μ_y are the gray-level means of image x and image y, σ_x and σ_y are the gray-level standard deviations of image x and image y, σ_xy is the covariance of image x and image y, c_1 = (k_1·L)², c_2 = (k_2·L)², and k_1, k_2 and L are constants.
Further, after determining the tracking confidence score, the method further comprises:
comparing the tracking confidence score to a template update threshold;
and when the tracking confidence score is larger than or equal to the template updating threshold, taking the first target tracking result image corresponding to the tracking confidence score as a new dynamic matching template image.
Further, the template updating threshold is calculated according to the tracking confidence score of the second frame of the target tracking video.
Further, when the tracking confidence score is less than the template update threshold, the dynamic matching template image is not updated.
Further, judging whether the target is lost according to the tracking confidence score comprises:
comparing the tracking confidence score to a loss threshold;
when the tracking confidence score is larger than or equal to a loss threshold value, determining that the target is not lost;
when the tracking confidence score is less than a loss threshold, the target is determined to be lost.
Another technical scheme of the invention is as follows: a twin network long-term target tracking device based on template matching comprises a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the above twin network long-term target tracking method based on template matching when executing the computer program.
The invention has the beneficial effects that: on the basis of the SiamFC++ target tracking method, the invention performs coarse position prediction by an NCC template matching search method to find the target position in the current frame, and then performs target tracking centered on that position in combination with a dynamic matching template image, so that the re-detection time in the long-term target tracking process is effectively reduced and the requirement of real-time target tracking is met.
Drawings
FIG. 1 is a flowchart of a twin network long-term target tracking method based on template matching according to an embodiment of the present invention;
FIG. 2 is a schematic block diagram of a twin network long-term target tracking method based on template matching according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating updates of the dynamic matching template for the "skiing" video in the VOT2019_LT data set according to an embodiment of the present invention;
FIG. 4 is a flowchart illustrating a global search re-detection process based on dynamic template matching according to an embodiment of the present invention;
FIG. 5 is a long-term target tracking flow chart based on the SiamFC++ algorithm in the embodiment of the present invention;
FIG. 6 is a comparison diagram of visual results of four long-term target tracking methods for the LaSOT data set "elephant-16" video;
FIG. 7 is a comparison diagram of visual results of four long-term target tracking methods for the VOT2019_LT data set "wartup" video;
FIG. 8 is a comparison diagram of visual results of four long-term target tracking methods for the TLP data set "CarChase1" video;
FIG. 9 is a schematic structural diagram of a twin network long-term target tracking device based on template matching according to an embodiment of the present invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
In the long-term target tracking process, when a target is lost and must be recovered, the re-detection mechanisms of many current long-term target tracking algorithms are time-consuming, so these algorithms cannot achieve real-time tracking. For example, the MBMD algorithm designs a framework combining a regression network and a matching network for long-term target tracking and can retrieve the target fairly effectively with a sliding-window detection method, but it still falls short in real-time performance, with a tracking rate of only 2.5 fps on the VOT2018_LT data set. After determining that the target is lost, the DaSiamRPN_LT and SiamRPN++_LT algorithms perform target re-detection by expanding the target search area from 255×255 to 831×831, with tracking rates of 20.8 fps and 21.1 fps on the VOT2018_LT data set, respectively.
Therefore, the invention provides a twin network long-term target tracking method based on template matching (SiamTM_LT), which not only markedly improves long-term tracking performance but also reaches a tracking speed of about 45 fps, meeting the requirement of real-time target tracking.
Specifically, as shown in FIG. 1, an embodiment of the present invention discloses a template matching-based twin network long-term target tracking method, including the following steps: step S110, determining a first target tracking result image and a tracking confidence score of the current frame by adopting the SiamFC++ target tracking method; step S120, judging whether the target is lost according to the tracking confidence score; step S130, when the target is lost, updating the target position in the current frame by adopting the NCC template matching search method; step S140, with the target position as the center, performing target tracking on the current frame by adopting a dynamic matching template image and the SiamFC++ target tracking method to obtain a second target tracking result image; and step S150, taking the second target tracking result image as the target tracking result image of the current frame.
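As a purely illustrative aid (not part of the claimed subject matter), the following minimal Python sketch shows how steps S110-S150 could be composed per frame; the callables siamfcpp_track and redetect_position are hypothetical placeholders for the SiamFC++ tracker and the dynamic-template-matching global search re-detection described below, and loss_threshold corresponds to the target loss discrimination threshold Th_tl.

```python
# Illustrative sketch only; `siamfcpp_track` and `redetect_position` are
# hypothetical callables standing in for the SiamFC++ tracker and the
# dynamic-template-matching re-detection used in the embodiments.
def track_frame(frame, init_template, dyn_template, prev_pos,
                siamfcpp_track, redetect_position, loss_threshold):
    # Step S110: first SiamFC++ pass around the previous target position.
    result_img, score, pos = siamfcpp_track(frame, init_template, prev_pos)

    # Step S120: judge target loss from the tracking confidence score.
    if score < loss_threshold:
        # Step S130: coarse re-localization via NCC template matching,
        # falling back to the previous position if the match is rejected.
        pos = redetect_position(frame, dyn_template, fallback_pos=prev_pos)
        # Step S140: second SiamFC++ pass centred on the coarse position,
        # with the dynamic matching template as the tracking model.
        result_img, score, pos = siamfcpp_track(frame, dyn_template, pos)

    # Step S150: the (possibly re-detected) result is this frame's output.
    return result_img, score, pos
```

In an actual embodiment these callables would be backed by the trained SiamFC++ network and the NCC matcher of FIG. 4.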
As shown in FIG. 2, on the basis of the SiamFC++ target tracking method, the present invention performs coarse position prediction by the NCC template matching search method to find the target position in the current frame, and then performs target tracking centered on that position in combination with the dynamic matching template image, so that the re-detection time in the long-term target tracking process is effectively reduced and the requirement of real-time target tracking is satisfied.
The key to long-term target tracking is whether the target can be tracked stably and continuously, and whether a lost target can be detected and recovered. At present, many long-term tracking methods adopt target re-detection mechanisms such as sliding-window global detection or search-area expansion to handle target loss, but these methods all require feature-map extraction, which is time-consuming, so many deep-learning long-term target tracking methods cannot achieve real-time tracking. The invention instead introduces a template matching method for re-detection: template matching can complete coarse localization of the target using only simple image cues (such as gray-scale and texture) without extracting a feature map. Although its precision is relatively lower, the loss of precision is compensated by the two passes of SiamFC++ target tracking, and detection precision can even be improved, so the re-detection time is effectively reduced. A re-detection strategy and a dynamic matching template update strategy are further provided, so that a reappearing target can be located quickly and effectively, or a drifting result corrected, which effectively improves long-term tracking performance. In addition, the more accurate tracking of the method reduces the number of video frames that require re-detection, further reducing tracking time; the tracking frame rate reaches 40 fps, satisfying real-time target tracking.
In this embodiment, determining whether the target is lost according to the tracking confidence score includes: comparing the tracking confidence score to a loss threshold; when the tracking confidence score is larger than or equal to a loss threshold value, determining that the target is not lost; determining that the target is lost when the tracking confidence score is less than a loss threshold.
Specifically, in the embodiment of the present invention, updating the target position in the current frame by using the NCC template matching search method includes: calculating a first similarity between the matching template and all sub-images in the current frame, with the dynamic matching template image taken as the matching template; selecting the similar sub-image corresponding to the maximum first similarity; and determining the target position according to this similar sub-image.
More specifically, the inputs are a search image x, an initial target template z, a dynamic matching template T = z, and an initial target position pos_ini. First, with the initial target template z as the tracking model, SiamFC++ target tracking is performed to obtain the target tracking result image x_t(pos_opt) of the current frame and the tracking confidence score score_max(z, x). Then, whether the target is lost is determined according to score_max(z, x). If score_max(z, x) is less than the re-detection discrimination threshold Th_tl, the target is considered lost and the global search re-detection mechanism based on dynamic template matching is started; otherwise, if score_max(z, x) is greater than or equal to Th_tl, the target is considered tracked normally and not lost.
When target tracking is considered lost, the global search re-detection mechanism based on dynamic template matching is started, and whether to update the initial target position pos_ini is determined according to the SSIM similarity judgment result. Then, in a local area centered on the target initial position pos_ini, the dynamic matching template T is used as the tracking model for SiamFC++ target tracking to obtain a target tracking result x_t_T(pos_opt). Finally, the target tracking result is updated as x_t(pos_opt) = x_t_T(pos_opt).
As a specific implementation, when judging whether to update the target initial position, the inputs are the image X to be searched, the dynamic matching template T, the target initial position pos_ini and the target sub-image X(x_ini, y_ini). The dynamic matching template T is then used to perform normalized cross-correlation (NCC) global template matching on the image X to be searched, obtaining a coarse re-detection prediction position pos_tm.
In one embodiment, determining the target location from the similar sub-images further comprises: calculating a second similarity of the similar sub-images and the dynamic matching template images; and when the second similarity is larger than or equal to the similarity threshold, acquiring the position corresponding to the similar sub-image, and taking the position as the target position. And when the second similarity value is less than the similarity threshold value, taking the target position of the previous frame as the target position of the current frame.
The second similarity is calculated as follows:
SSIM(x, y) = [l(x, y)]^α · [c(x, y)]^β · [s(x, y)]^γ      (1)
with l(x, y) = (2·μ_x·μ_y + c_1)/(μ_x² + μ_y² + c_1), c(x, y) = (2·σ_x·σ_y + c_2)/(σ_x² + σ_y² + c_2) and s(x, y) = (σ_xy + c_2/2)/(σ_x·σ_y + c_2/2),
wherein SSIM(x, y) is the similarity between image x and image y, l(x, y) is the luminance similarity function, c(x, y) is the contrast similarity function, s(x, y) is the structure similarity function, α, β and γ are constants, μ_x and μ_y are the gray-level means of image x and image y, σ_x and σ_y are the gray-level standard deviations of image x and image y, σ_xy is the covariance of image x and image y, c_1 = (k_1·L)², c_2 = (k_2·L)², and k_1, k_2 and L are constants.
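As an illustrative sketch only, the second similarity of formula (1) could be computed on gray-scale crops with scikit-image's structural_similarity function, which implements SSIM with its default constants for k_1 and k_2; the helper name second_similarity and the resizing of the candidate to the template size are assumptions made for this example.

```python
import cv2
from skimage.metrics import structural_similarity

def second_similarity(candidate_bgr, template_bgr):
    """SSIM between a candidate sub-image and the dynamic matching template."""
    # Resize the candidate to the template size so the two images align.
    h, w = template_bgr.shape[:2]
    candidate = cv2.resize(candidate_bgr, (w, h))
    # Formula (1) computed on gray-scale images (library default k1, k2).
    gray_c = cv2.cvtColor(candidate, cv2.COLOR_BGR2GRAY)
    gray_t = cv2.cvtColor(template_bgr, cv2.COLOR_BGR2GRAY)
    return structural_similarity(gray_c, gray_t, data_range=255)
```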
That is, the template matching image is taken as the re-detection prediction target image X_tm, and the structural similarity between it and the dynamic matching template T is calculated as ssim_tm = SSIM(X_tm, T). If ssim_tm is greater than or equal to the similarity discrimination threshold Th_tm, the coarse prediction position pos_tm is retained; if ssim_tm is less than the threshold Th_tm, the re-detection prediction result is discarded.
If ssim_tm is greater than or equal to the similarity discrimination threshold Th_tm, the target initial position is updated to the coarse re-detection prediction position, pos_ini = pos_tm, the global search re-detection mechanism based on dynamic template matching ends, and the SiamFC++ tracker is subsequently used to obtain the accurate position of the target; a specific flow chart of the method is shown in FIG. 5.
If ssim_tm is less than the threshold Th_tm, the target initial position pos_ini is not updated, and the global search re-detection mechanism based on dynamic template matching ends.
In the embodiment of the present invention, in order to effectively combine the SiamFC++ target tracking method and the NCC template matching search method, after determining the tracking confidence score, the method further includes:
comparing the tracking confidence score with a template update threshold; when the tracking confidence score is greater than or equal to the template update threshold, taking the first target tracking result image corresponding to that tracking confidence score as the new dynamic matching template image. The template update threshold is calculated from the tracking confidence score of the second frame of the target tracking video. When the tracking confidence score is less than the template update threshold, the dynamic matching template image is not updated.
Specifically, NCC is widely used as an index for measuring the similarity between images in global-search template matching because of its high matching accuracy and strong robustness. In the template matching process of an embodiment, as shown in FIG. 4, for an M×N image X to be searched and a W×H matching template T, the template T is translated over the image X, the sub-images X_c(x, y) = {X(x+i, y+j) | i ∈ [1, …, W], j ∈ [1, …, H]} intercepted at (x, y) are obtained in turn, and the similarity is computed by formula (2).
NCC(x, y) = Σ_{i,j} X(x+i, y+j)·T(i, j) / sqrt( Σ_{i,j} X(x+i, y+j)² · Σ_{i,j} T(i, j)² ),   i ∈ [1, …, W], j ∈ [1, …, H]      (2)
The position corresponding to the maximum value in {NCC(x, y) | x ∈ [1, …, M], y ∈ [1, …, N]} is the template matching position pos_tm = (x_tm_opt, y_tm_opt), and the corresponding sub-image X_tm = X_c(x_tm_opt, y_tm_opt) is the dynamic template matching image. In this formula, x and y denote image coordinates. By this method, the template T is updated dynamically frame by frame for global template matching, which further improves robustness to interference.
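As an illustrative sketch, the NCC global search of formula (2) could be implemented with OpenCV's matchTemplate in TM_CCORR_NORMED mode, which evaluates the normalized cross-correlation for every placement of the template; the helper name ncc_global_template_match is an assumption made for this example.

```python
import cv2

def ncc_global_template_match(search_gray, template_gray):
    """Slide the W x H template over the M x N image and return the
    top-left corner of the best match together with its NCC score."""
    # cv2.TM_CCORR_NORMED evaluates the normalized cross-correlation of
    # formula (2) for every placement of the template.
    response = cv2.matchTemplate(search_gray, template_gray,
                                 cv2.TM_CCORR_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(response)
    # max_loc is (x_tm_opt, y_tm_opt); max_val is the best NCC score.
    return max_loc, max_val
```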
In addition, in order to avoid accumulated errors caused by inaccurate template matching, the embodiment provides a template-similarity discriminant analysis and localization optimization strategy, which supplies a more accurate predicted initial position to the subsequent target tracking algorithm. SSIM structural similarity is an image evaluation index that conforms well to human vision; it comprehensively measures image similarity from three characteristics, namely luminance, contrast and structure, as given in formula (1).
The closer the template matching result is to the real position of the target, the more accurate the tracking performance of the subsequent target tracking algorithm is, so the improved algorithm adopts the template matching similarity judgment criterion of formula (3) to classify the matching result.
pos_ini = pos_tm,   if ssim_tm ≥ Th_tm;    pos_ini unchanged (previous-frame tracking position),   if ssim_tm < Th_tm      (3)
For the dynamic matching template T and the output matching image X_tm, the similarity index ssim_tm = SSIM(X_tm, T) is obtained to measure the template matching accuracy. If ssim_tm is greater than the matching similarity threshold Th_tm, the template matching is considered accurate and the matched position is close to the tracked target position, so the initial position pos_ini of the subsequent target tracking algorithm is updated and replaced by the template matching position pos_tm; otherwise, if ssim_tm is less than the matching similarity threshold Th_tm, the template matching is considered insufficiently accurate, and the initial position of the subsequent target tracking algorithm is not updated and remains the target tracking position of the previous frame. Extensive experiments on multiple long-term standard data sets show that Th_tm = 0.15 is the optimal matching threshold.
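An illustrative sketch of the similarity-gated position update of formula (3), with Th_tm = 0.15 as reported above; the ssim_of argument is a hypothetical callable computing SSIM as in formula (1).

```python
TH_TM = 0.15  # matching-similarity threshold reported above

def accept_coarse_position(pos_prev, pos_tm, x_tm, dyn_template, ssim_of):
    """Formula (3): adopt the matched position only if the match is reliable."""
    ssim_tm = ssim_of(x_tm, dyn_template)
    if ssim_tm >= TH_TM:
        # Accurate match: hand pos_tm to the tracker as its initial position.
        return pos_tm
    # Otherwise keep the previous frame's tracking position.
    return pos_prev
```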
In addition, the maximum confidence score of the classification branch of the SiamFC++ algorithm reflects the similarity between the predicted target and the template, so the confidence peak is generally low when tracking is lost. Therefore, the invention uses the confidence peak to judge the target loss state and provides the re-detection triggering strategy of formula (4): the target tracking state is analyzed with the confidence peak to decide whether re-detection is needed.
re-detection flag = 0,   if score_max(z, x) ≥ Th_tl;    re-detection flag = 1,   if score_max(z, x) < Th_tl      (4)
As shown in formula (4), if the confidence peak score_max(z, x) is greater than or equal to the target loss discrimination threshold Th_tl, the target is considered accurately tracked and the target re-detection mechanism is not started, i.e., the re-detection flag is set to 0. Conversely, if score_max(z, x) is less than Th_tl, target tracking is considered inaccurate or the target is considered lost, the re-detection flag is updated to 1, and target re-detection is performed by the dynamic-template-matching global search method.
Extensive experiments on multiple long-term standard data sets show that the target loss discrimination threshold Th_tl = 0.3 is the optimal choice.
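An illustrative sketch of the re-detection triggering rule of formula (4), with Th_tl = 0.3 as reported above; the function name is an assumption made for this example.

```python
TH_TL = 0.3  # target-loss discrimination threshold reported above

def redetection_flag(score_max):
    """Formula (4): 0 = target tracked normally, 1 = start re-detection."""
    return 0 if score_max >= TH_TL else 1
```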
In a long video sequence, appearance deformation of the target seriously affects the accuracy of template-matching re-detection and localization, which would prevent the improved template matching strategy from working well. The embodiment of the invention therefore introduces a dynamic matching template update strategy to further improve robustness to interference: the target tracking state is analyzed with the confidence peak to judge whether the current frame's tracking result can serve as the matching template.
If the confidence peak score_max(z, x) is greater than or equal to the template update discrimination threshold Th_td, the target is considered accurately tracked and the current frame tracking result x(pos_opt) updates and replaces the dynamic matching template; conversely, if score_max(z, x) is less than Th_td, target tracking is considered inaccurate or the target lost, and the matching template is not updated.
Template updating is performed as in formula (5) after the SiamFC++ target tracking of each frame. Extensive experimental analysis shows that, once the tracking target is specified in the first frame, the second frame usually tracks the target position accurately, so the confidence peak of the second frame, score_max(z, x_2), is taken as the basis; a large number of experiments on multiple long-term standard data sets show that the template update discrimination threshold Th_td = score_max(z, x_2) − 0.15 is the optimal choice.
T = x(pos_opt),   if score_max(z, x) ≥ Th_td;    T unchanged,   if score_max(z, x) < Th_td      (5)
According to formula (5), the embodiment of the present invention evaluates and updates the dynamic matching template frame by frame; for example, FIG. 3 shows the dynamic matching template updates for the "skiing" video of the VOT2019_LT data set. The target in the first frame is used as the initial template, and in subsequent frames the moving target continuously changes in shape and scale, showing an obvious appearance difference from the initial target template. The dynamic matching template is continuously updated after each tracking step, and as seen in FIG. 3 the updated template is closer to the target appearance in the current video frame, providing a reliable precondition for tracking the target in the next frame.
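An illustrative sketch of the frame-by-frame dynamic template update of formula (5), with the threshold Th_td derived from the second frame's confidence peak as described above; the DynamicTemplate class name is an assumption made for this example.

```python
class DynamicTemplate:
    """Holds the matching template used by the re-detection step."""

    def __init__(self, initial_template):
        self.template = initial_template   # first-frame target crop
        self.threshold = None              # Th_td, fixed from the second frame

    def set_threshold_from_second_frame(self, score_second_frame):
        # Th_td = score_max(z, x_2) - 0.15, as determined experimentally above.
        self.threshold = score_second_frame - 0.15

    def maybe_update(self, result_crop, score_max):
        # Formula (5): replace the template only when tracking is confident.
        if self.threshold is not None and score_max >= self.threshold:
            self.template = result_crop
```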
To further demonstrate the effectiveness of the embodiments of the present invention, the following comparative validation experiments were also performed. The experiments were implemented in Python with PyTorch on an Intel i7-6700 3.40 GHz CPU and a GeForce GTX TITAN X 32 GB GPU. Performance is compared with a number of state-of-the-art target tracking algorithms on three video data sets: LaSOT, VOT2019_LT and TLP. The LaSOT data set uses three evaluation indexes: Success, Normalized Precision and Precision. The VOT2019_LT data set uses tracking Precision, Recall, F-score and tracking speed (Frames Per Second, FPS) as long-term tracking performance indexes. The TLP data set uses two evaluation indexes: Success and Precision. In the embodiment of the invention, the backbone network of the SiamTM_LT method adopts the AlexNet network to extract feature information.
(1) Quantitative results and analysis:
table 1, table 2, and table 3 respectively show comparison results of the method of the present invention and a plurality of advanced long-term target tracking algorithms in the LaSOT, VOT2019_ LT, and TLP data sets.
The LaSOT test set has 280 long-term video sequences with an average length of about 2512 frames. Compared with the LTMU, DIMP50, DIMP18, GlobalTrack, SiamCAR and ATOM algorithms, the success rate of the method is improved by 0.2%, 0.6%, 4.0%, 5.7%, 5.8% and 5.9% respectively, the normalized precision by 0.4%, 2.1%, 5.6%, 7.2%, 5.9% and 9.3%, and the precision by 1.4%, 2.2%, 5.3%, 5.8%, 6.2% and 8.1%. Compared with the other algorithms in the table, the method has obvious advantages in success rate, normalized precision and precision.
Table 1. Target tracking performance comparison results based on the LaSOT data set
The VOT2019_LT data set has 50 long-term video sequences with an average length of about 4296 frames. Compared with the SiamDW_LT algorithm, the method has lower tracking performance but a tracking speed 16.16 times higher. Compared with the SiamRPNs_LT algorithm, the method has lower precision, but the recall is improved by 14.2% and the F-score by 7.8%.
As shown in Table 2, compared with the SiamRPN++_LT, mbdet, Siamfcos_LT and FuCoLoT algorithms, the precision of the method of the present invention is improved by 4.9%, 7.8%, 19.8% and 18.4% respectively, the recall by 0.5%, 5.2%, 3.6% and 23.9%, and the F-score by 2.5%, 6.4%, 11.4% and 22.3%. Compared with the other algorithms in the table, the method has obvious advantages in precision, recall and F-score. In terms of tracking speed, the method of the present invention is 2.03, 20.2, 1.77, 26.93 and 5.77 times faster than the SiamRPN++_LT, mbdet, SiamRPNs_LT, Siamfcos_LT and FuCoLoT algorithms, respectively. Therefore, the method not only markedly improves tracking performance but also reaches a tracking frame rate of 40.4 fps, giving good real-time performance.
Table 2. Target tracking performance comparison results based on the VOT2019_LT data set
As shown in Table 3, the TLP data set contains 50 long-term video sequences with an average length of about 13529 frames. The method has tracking performance similar to the GlobalTrack algorithm, with a success rate only 0.5% higher and a precision only 0.4% lower. Compared with the other algorithms in the table, the method has obvious advantages in success rate and precision.
Table 3. Target tracking performance comparison results based on the TLP data set
Furthermore, FIGS. 6, 7 and 8 show qualitative visual comparison results of several excellent long-term target tracking methods on the LaSOT, VOT2019_LT and TLP data sets, respectively.
In the LaSOT data set "elephant-16" video in FIG. 6, the target is occluded in frames 540 and 1505; the method can quickly re-detect the target position and maintain stable tracking. Although the LTMU method can also detect the target, its tracking scale is inaccurate and it is easily misled by similar objects.
In FIG. 7, the target in frame 3795 of the VOT2019_LT data set "wartup" video is completely occluded and reappears in frame 3800, but its appearance differs greatly from the original target template. The method of the invention quickly and effectively detects the target through the template matching re-detection mechanism by frame 3805, so the SiamTM_LT method of the invention effectively follows the target in every subsequent frame. The mbdet method does not detect the target until frame 3815, and neither of the other two methods tracks the target accurately.
FIG. 8 shows the tracking results for the TLP data set "CarChase1" video. The target leaves the field of view in frames 2530 and 4800; the method of the present invention immediately starts the template matching re-detection mechanism, and after the target reappears in frames 2560 and 4830, the SiamTM_LT algorithm captures the reappearing target accurately, so the tracking in each subsequent frame is clearly more accurate.
In conclusion, the visual tracking results show that the method performs well in long-term tracking videos with attributes such as "out-of-view" and "target occlusion".
The present invention also discloses a twin network long-term target tracking apparatus based on template matching, as shown in FIG. 9, including a memory 210, a processor 220 and a computer program 230 stored in the memory 210 and executable on the processor 220; when the processor 220 executes the computer program 230, the above-mentioned twin network long-term target tracking method based on template matching is implemented.
The device can be a desktop computer, a notebook computer, a palm computer, a cloud server and other computing equipment. The apparatus may include, but is not limited to, a processor, a memory. Those skilled in the art will appreciate that the apparatus may include more or fewer components, or combine certain components, or different components, and may also include, for example, input output devices, network access devices, and the like.
The processor may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.
The storage may in some embodiments be an internal storage unit of the device, such as a hard disk or a memory of the device. The memory may also be an external storage device of the apparatus in other embodiments, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), etc. provided on the apparatus. Further, the memory may also include both an internal storage unit and an external storage device of the apparatus. The memory is used for storing an operating system, application programs, a BootLoader (BootLoader), data, and other programs, such as program codes of the computer programs. The memory may also be used to temporarily store data that has been output or is to be output.
It should be noted that, since the above apparatus is based on the same concept as the method embodiment of the present invention, its specific functions and technical effects can be found in the method embodiment section and are not repeated here.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment. Those of ordinary skill in the art will appreciate that the various illustrative modules and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

Claims (10)

1. A twin network long-term target tracking method based on template matching, characterized by comprising the following steps:
determining a first target tracking result image and a tracking confidence score of a current frame by adopting a SiamFC++ target tracking method;
judging whether the target is lost or not according to the tracking confidence score;
when the target is lost, updating the target position in the current frame by adopting an NCC template matching search method;
taking the target position as a center, and performing target tracking on the current frame by adopting a dynamic matching template image and the SiamFC++ target tracking method to obtain a second target tracking result image;
and taking the second target tracking result image as the target tracking result image of the current frame.
2. The twin network long-term target tracking method based on template matching according to claim 1, wherein updating the target position in the current frame by using an NCC template matching search method comprises:
calculating a first similarity between the matching template and all sub-images in the current frame, with the dynamic matching template image taken as the matching template;
selecting the similar sub-image corresponding to the maximum first similarity;
and determining the target position according to the similar sub-images.
3. The twin network long-term target tracking method based on template matching as claimed in claim 2, wherein before determining the target position according to the similar sub-image, the method further comprises:
calculating a second similarity of the similar sub-image and the dynamic matching template image;
and when the second similarity is larger than or equal to the similarity threshold, acquiring the position corresponding to the similar sub-image, and taking the position as the target position.
4. The twin network long-term target tracking method based on template matching as claimed in claim 3, wherein when the second similarity is less than the similarity threshold, the target position of the previous frame is taken as the target position of the current frame.
5. The twin network long-term target tracking method based on template matching as claimed in any one of claims 2-4, wherein the second similarity is calculated by:
SSIM(x, y) = [l(x, y)]^α · [c(x, y)]^β · [s(x, y)]^γ, with l(x, y) = (2·μ_x·μ_y + c_1)/(μ_x² + μ_y² + c_1), c(x, y) = (2·σ_x·σ_y + c_2)/(σ_x² + σ_y² + c_2) and s(x, y) = (σ_xy + c_2/2)/(σ_x·σ_y + c_2/2),
wherein SSIM(x, y) is the similarity between image x and image y, l(x, y) is the luminance similarity function, c(x, y) is the contrast similarity function, s(x, y) is the structure similarity function, α, β and γ are constants, μ_x and μ_y are the gray-level means of image x and image y, σ_x and σ_y are the gray-level standard deviations of image x and image y, σ_xy is the covariance of image x and image y, c_1 = (k_1·L)², c_2 = (k_2·L)², and k_1, k_2 and L are constants.
6. The twin network long-term target tracking method based on template matching as claimed in any one of claims 2-4, wherein after determining the tracking confidence score, the method further comprises:
comparing the tracking confidence score to a template update threshold;
and when the tracking confidence score is larger than or equal to the template updating threshold, taking the first target tracking result image corresponding to the tracking confidence score as a new dynamic matching template image.
7. The twin network long-term target tracking method based on template matching as claimed in claim 6, wherein the template update threshold is calculated according to the tracking confidence score of the second frame of the target tracking video.
8. The twin network long-term target tracking method based on template matching as claimed in claim 6, wherein the dynamic matching template image is not updated when the tracking confidence score is less than the template update threshold.
9. The twin network long-term target tracking method based on template matching as claimed in claim 2, 3, 4 or 8, wherein judging whether the target is lost according to the tracking confidence score comprises:
comparing the tracking confidence score to a loss threshold;
when the tracking confidence score is larger than or equal to the loss threshold, determining that the target is not lost;
determining that the target is lost when the tracking confidence score is less than the loss threshold.
10. A template matching-based twin network long-term target tracking apparatus comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the template matching-based twin network long-term target tracking method according to any one of claims 1 to 9.
CN202210630014.6A 2022-06-06 2022-06-06 Template matching-based twin network long-term target tracking method Pending CN114998628A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210630014.6A CN114998628A (en) 2022-06-06 2022-06-06 Template matching-based twin network long-term target tracking method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210630014.6A CN114998628A (en) 2022-06-06 2022-06-06 Template matching-based twin network long-term target tracking method

Publications (1)

Publication Number Publication Date
CN114998628A true CN114998628A (en) 2022-09-02

Family

ID=83033757

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210630014.6A Pending CN114998628A (en) 2022-06-06 2022-06-06 Template matching-based twin network long-term target tracking method

Country Status (1)

Country Link
CN (1) CN114998628A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116523970A (en) * 2023-07-05 2023-08-01 之江实验室 Dynamic three-dimensional target tracking method and device based on secondary implicit matching
CN116523970B (en) * 2023-07-05 2023-10-20 之江实验室 Dynamic three-dimensional target tracking method and device based on secondary implicit matching
CN117132623A (en) * 2023-10-26 2023-11-28 湖南苏科智能科技有限公司 Article tracking method, apparatus, electronic device and storage medium
CN117132623B (en) * 2023-10-26 2024-02-23 湖南苏科智能科技有限公司 Article tracking method, apparatus, electronic device and storage medium

Similar Documents

Publication Publication Date Title
Shen et al. Fast online tracking with detection refinement
CN107563313B (en) Multi-target pedestrian detection and tracking method based on deep learning
US7756296B2 (en) Method for tracking objects in videos using forward and backward tracking
CN114998628A (en) Template matching-based twin network long-term target tracking method
CN105913028B (en) Face + + platform-based face tracking method and device
WO2023065395A1 (en) Work vehicle detection and tracking method and system
CN112836639A (en) Pedestrian multi-target tracking video identification method based on improved YOLOv3 model
CN106952293B (en) Target tracking method based on nonparametric online clustering
US20190066311A1 (en) Object tracking
CN110660102B (en) Speaker recognition method, device and system based on artificial intelligence
WO2022198817A1 (en) Vehicle image clustering method and vehicle trajectory restoration method
CN111754548A (en) Multi-scale correlation filtering target tracking method and device based on response discrimination
CN111415370A (en) Embedded infrared complex scene target real-time tracking method and system
CN108694411A (en) A method of identification similar image
He et al. Variable scale learning for visual object tracking
CN112561956B (en) Video target tracking method and device, electronic equipment and storage medium
CN112053384B (en) Target tracking method based on bounding box regression model
CN108763265A (en) A kind of image-recognizing method based on block research
CN114155411A (en) Intelligent detection and identification method for small and weak targets
CN113129332A (en) Method and apparatus for performing target object tracking
CN113836980A (en) Face recognition method, electronic device and storage medium
CN116580066B (en) Pedestrian target tracking method under low frame rate scene and readable storage medium
Zhang et al. Robust visual tracker integrating adaptively foreground segmentation into multi-feature fusion framework
Zhao et al. Fast facial feature tracking with multi-cue particle filter
Zhang et al. An Improved Object Tracking Algorithm Combining Spatio-Temporal Context and Selection Update

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination