CN107274433B - Target tracking method and device based on deep learning and storage medium - Google Patents

Target tracking method and device based on deep learning and storage medium

Info

Publication number
CN107274433B
CN201710474118.1A, CN107274433B
Authority
CN
China
Prior art keywords
target
frame
current frame
area
frames
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710474118.1A
Other languages
Chinese (zh)
Other versions
CN107274433A (en)
Inventor
王欣
石祥文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jilin University
Original Assignee
Jilin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jilin University filed Critical Jilin University
Priority to CN201710474118.1A priority Critical patent/CN107274433B/en
Publication of CN107274433A publication Critical patent/CN107274433A/en
Application granted granted Critical
Publication of CN107274433B publication Critical patent/CN107274433B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/248Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving reference images or patches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20021Dividing image into blocks, subimages or windows
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30221Sports video; Sports image
    • G06T2207/30224Ball; Puck
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30236Traffic on road, railway or crossing

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

A target tracking method, device and storage medium based on deep learning. The method reads two consecutive frames of pictures; sets and crops a target area of the previous frame and a search area of the current frame, where the center point of the search area of the current frame is set by judging whether the object is moving rapidly or stably; inputs the target area and the search area into a convolutional neural network to calculate the target area of the current frame; calculates the inter-frame displacement of the current frame target relative to the previous frame; and judges whether the current frame is the last frame so as to decide whether iterative target tracking continues. By judging how rapidly the target object moves in the image, the invention predicts the center point position of the cropping area of the current frame; compared with existing algorithms it improves the target tracking accuracy and the target overlap while basically keeping the original high tracking speed, and has better tracking robustness.

Description

Target tracking method and device based on deep learning and storage medium
Technical Field
The present invention relates to the field of image processing, and in particular, to a target tracking method and apparatus based on deep learning in image processing, and a storage medium.
Background
Target tracking is a challenging research topic in the field of computer vision, and is a research hotspot because it is widely applied in many fields such as security, transportation, military affairs, virtual reality and medical imaging. The aim of target tracking is to determine the successive positions of a target object in an ordered image sequence so that further analysis and processing can be carried out, thereby analyzing and understanding the motion behavior of the target object. Since the beginning of the twenty-first century, information technology has developed rapidly, the computing performance of computers and the acquisition quality of image acquisition equipment such as cameras have gradually improved, and, as people attach increasing importance to personal and property safety, more and more experts and scholars have devoted themselves to research on target tracking technology.
Target tracking technology is one of the core research subjects in the field of computer vision and involves various technologies such as computer graphics, target recognition, artificial intelligence and automatic control. It originated in the 1950s; through more than 60 years of continuous development, various tracking algorithms have been proposed, such as the Mean Shift algorithm, the background difference method, background modeling, the optical flow method, the Kalman filter, the particle filter, and various improved algorithms based on them. However, these algorithms basically suffer from certain problems and defects, such as low tracking accuracy or poor real-time performance, and find it difficult to meet the various requirements of real-world applications.
Since the concept of Deep Learning was proposed in 2006, research on deep learning has become popular; more and more experts and scholars have devoted themselves to it, and deep learning has made breakthrough progress in many fields and is widely applied to computer vision, image processing, natural language processing, information classification, search, big data and other fields. Naturally, attempts have been made to solve the target tracking problem with deep learning methods. However, algorithms that address target tracking with deep learning are often slow due to the huge amount of computation and have poor real-time performance, so it is difficult for them to meet the requirements of practical applications.
Therefore, how to improve both tracking accuracy and tracking efficiency in target tracking is a technical problem that needs to be solved urgently in the prior art.
Disclosure of Invention
The invention aims to provide a target tracking method, device and storage medium based on deep learning, which process an input video frame by frame to achieve accurate tracking of a target object; through offline training with a large amount of labeled data, the neural network obtains stronger feature generalization capability and the tracking precision is improved; and through cropping, GPU acceleration and other means, the operation speed is accelerated and the tracking efficiency is improved.
In order to achieve the purpose, the invention adopts the following technical scheme:
a target tracking method based on deep learning comprises the following steps:
picture reading step S110: continuously reading two frames of pictures, including a previous frame of picture and a current frame of picture, wherein the previous frame of picture has a calculated target position, and the current frame of picture needs to calculate the target position;
area setting step S120: respectively setting and cutting a target area of a previous frame and a search area of a current frame;
the setting and cutting of the target area of the previous frame specifically comprises: the center point position c = (cx, cy) of the target in the previous frame is known; a rectangular frame centered at this point is used as a first bounding box to mark the target object, the height of the first bounding box being h and the width being w; the target area obtained after cutting has a height of k1·h and a width of k1·w, and the parameter k1 is used for controlling the size of the target area;
the setting and cutting of the search area of the current frame specifically comprises: judging whether the motion of the object in the image is stable; if the speed is stable, the center point position c' = (c'x, c'y) of the search area of the current frame is equal to the known target center point position c = (cx, cy) of the previous frame plus the inter-frame displacement S of the target between the previous two frames; if the speed changes drastically, for example decreases or increases rapidly, the center point position c' = (c'x, c'y) of the search area of the current frame is equal to the known target center point position c = (cx, cy) of the previous frame, i.e. the target center point position of the previous frame is used as the cutting center of the current frame; a rectangular frame is used as a second bounding box for marking, the height of the second bounding box being h and the width being w; the search area obtained after cutting has a height of k2·h and a width of k2·w, and the parameter k2 is used for controlling the size of the search area;
a feature extraction and comparison step S130: inputting the target area and the search area into a Convolutional Neural Network (CNN), performing feature extraction and feature comparison, and calculating to obtain the target area of the current frame;
interframe displacement calculation step S140: calculating to obtain the interframe displacement of the current frame relative to the target of the previous frame by using the target area of the current frame and the target area of the previous frame;
a judgment step S150: and judging whether the current frame is the last frame, if so, finishing the tracking, otherwise, entering a picture reading step S110, continuously reading two continuous frames of pictures, and continuously carrying out iterative target tracking.
Preferably, in the region setting step S120, the step of determining whether the target object moves smoothly in the image is: comparing the interframe displacement of the target of two adjacent frames in three continuous frames before the current frame, and if the interframe displacement difference of two adjacent frames in the three continuous frames is smaller, considering that the motion is stable; if the interframe displacement difference of two adjacent frames in the three continuous frames is large, the movement speed is considered to be changed violently.
Preferably, in the area setting step S120, it is determined whether the inter-frame displacement difference between two adjacent frames in the three consecutive frames is smaller than 1/3 of the inter-frame displacement between the two previous frames;
the parameters k2 and k1 for controlling the sizes of the regions both take the value 2.
Preferably, in the region setting step S120, in order to avoid the situation that the actual position of the current frame target exceeds the second bounding box due to too fast a change of the moving speed, the size of the second bounding box is increased when the speed changes drastically, i.e. the value of k2 is increased.
Preferably, the feature extraction and comparison step S130 first performs feature extraction on the target region and the search region in the convolutional layers, then inputs the extracted features into the fully connected layers, performs feature comparison of the target region and the search region in the fully connected layers, and finally obtains the target region of the current frame after calculation.
The invention further discloses a target tracking device based on deep learning, which comprises the following components:
a picture reading unit: continuously reading two frames of pictures, including a previous frame of picture and a current frame of picture, wherein the previous frame of picture has a calculated target position, and the current frame of picture needs to calculate the target position;
an area setting unit: respectively setting and cutting a target area of a previous frame and a search area of a current frame;
the setting and cutting of the target area of the previous frame specifically comprises: the center point position c = (cx, cy) of the target in the previous frame is known; a rectangular frame centered at this point is used as a first bounding box to mark the target object, the height of the first bounding box being h and the width being w; the target area obtained after cutting has a height of k1·h and a width of k1·w, and the parameter k1 is used for controlling the size of the target area;
the setting and cutting of the search area of the current frame specifically comprises: judging whether the motion of the object in the image is stable; if the speed is stable, the center point position c' = (c'x, c'y) of the search area of the current frame is equal to the known target center point position c = (cx, cy) of the previous frame plus the inter-frame displacement S of the target between the previous two frames; if the speed changes drastically, for example decreases or increases rapidly, the center point position c' = (c'x, c'y) of the search area of the current frame is equal to the known target center point position c = (cx, cy) of the previous frame, i.e. the target center point position of the previous frame is used as the cutting center of the current frame; a rectangular frame is used as a second bounding box for marking, the height of the second bounding box being h and the width being w; the search area obtained after cutting has a height of k2·h and a width of k2·w, and the parameter k2 is used for controlling the size of the search area;
a feature extraction and comparison unit: inputting the target area and the search area into a Convolutional Neural Network (CNN), performing feature extraction and feature comparison, and calculating to obtain the target area of the current frame;
an interframe displacement calculation unit: calculating to obtain the interframe displacement of the current frame relative to the target of the previous frame by using the target area of the current frame and the target area of the previous frame;
a judging unit: and judging whether the current frame is the last frame, if so, finishing the tracking, otherwise, continuously reading two continuous frames of pictures by the picture reading unit, and performing iterative target tracking.
Preferably, in the region setting unit, the determining whether the object moves smoothly in the image is: comparing the interframe displacement of the target of two adjacent frames in three continuous frames before the current frame, and if the interframe displacement difference of two adjacent frames in the three continuous frames is smaller, considering that the motion is stable; if the interframe displacement difference of two adjacent frames in the three continuous frames is large, the movement speed is considered to be changed violently.
Preferably, in the region setting unit, it is judged whether the inter-frame displacement difference between two adjacent frames in the three consecutive frames is smaller than 1/3 of the inter-frame displacement between the previous two frames;
the parameters k2 and k1 for controlling the sizes of the regions both take the value 2.
Preferably, in the region setting unit (220), in order to avoid the situation that the actual position of the current frame target exceeds the second bounding box due to too fast a change of the motion speed, the size of the second bounding box is increased when the speed changes drastically, i.e. the value of k2 is increased; and/or
the feature extraction and comparison unit (230) first performs feature extraction on the target area and the search area in the convolutional layers, then inputs the extracted features into the fully connected layers, performs feature comparison of the target area and the search area in the fully connected layers, and finally obtains the target area of the current frame after calculation.
A storage medium for storing computer-executable instructions,
the computer executable instructions, when executed by a processor, perform the object tracking method as described above.
By judging whether the object moves rapidly or stably in the image, the invention sets the center point position of the cutting area of the current frame from the target center point position of the previous frame; compared with existing algorithms it improves the target tracking accuracy and the target overlap, basically keeps the original high tracking speed, and has better algorithm robustness.
Drawings
FIG. 1 is a schematic diagram of a deep learning based target tracking method according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart diagram of a target tracking method based on deep learning according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a motion model of a deep learning based target tracking method according to an embodiment of the invention;
FIG. 4 is a comparative example of tracking robustness of a target tracking method according to a specific embodiment of the present invention;
FIG. 5 is another comparative example of tracking robustness of a target tracking method according to a specific embodiment of the present invention;
Fig. 6 is a block diagram of a target tracking apparatus based on deep learning according to the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Referring to fig. 1, a network architecture diagram of a deep learning based target tracking method according to the present invention is shown.
The invention is an iterative loop method. The position of the target in the previous frame, including the target center, is known; a rectangular frame centered at the target position is set as a first bounding box to mark the target object, and the target area is obtained by cutting after the box is expanded. The search position of the current frame is predicted from the target position of the previous frame; a rectangular frame centered at the search position is set as a second bounding box, and the search area is obtained by cutting after the box is expanded. The sizes of the target area and the search area may be the same or different. The two areas are input into a convolutional neural network (CNN) for calculation to obtain the target position of the current frame.
In the invention, the Caffe (Convolutional Architecture for Fast Feature Embedding) framework is preferably used for calculation. The convolutional layers of the network adopt the first 5 convolutional layers of CaffeNet; the rear 3 layers are fully connected layers, each with 4096 neural nodes; the final output layer has 4 neural nodes and outputs the two coordinate pairs of the upper-left and lower-right corners of the tracking target, from which the target position of the current frame is calculated.
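The patent itself specifies only the layer and node counts above (Caffe, five CaffeNet convolutional layers, three 4096-node fully connected layers, a 4-node output). Purely as an illustration of that shape, the following sketch re-expresses such a two-branch regression network in PyTorch; the use of torchvision's AlexNet as a stand-in for CaffeNet, the 227×227 input size and all names are assumptions, not the patent's implementation.

```python
# Minimal sketch (not the patent's Caffe prototxt) of the two-branch regression
# network: two image crops go through shared convolutional layers, their features
# are concatenated, and three fully connected layers of 4096 units regress 4
# values (top-left and bottom-right corners of the target in the search area).
import torch
import torch.nn as nn
import torchvision.models as models

class TrackerNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Stand-in for "the first 5 convolutional layers of CaffeNet":
        # AlexNet's feature extractor is the closest torchvision equivalent.
        self.conv = models.alexnet(weights=None).features
        feat_dim = 256 * 6 * 6              # conv output size for a 227x227 input
        self.fc = nn.Sequential(
            nn.Linear(2 * feat_dim, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, 4),             # x1, y1, x2, y2 in search-area coordinates
        )

    def forward(self, target_crop, search_crop):
        # Both crops are (N, 3, 227, 227); the convolutional weights are shared.
        f_t = self.conv(target_crop).flatten(1)
        f_s = self.conv(search_crop).flatten(1)
        return self.fc(torch.cat([f_t, f_s], dim=1))
```

Both crops pass through the same convolutional weights, and only the concatenated features are compared in the fully connected layers, which keeps the per-frame cost low.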
With further reference to fig. 2, there is shown a flow chart of the target tracking method based on deep learning according to the present invention, comprising the following steps:
picture reading step S110: and continuously reading two frames of pictures, including a previous frame of picture and a current frame of picture, wherein the previous frame of picture has a calculated target position, and the current frame of picture needs to calculate the target position.
As mentioned above, the invention is a loop iteration algorithm. In step S110, one of the two consecutive frames read each time was already read in the previous iteration. For example: this time the (t-1)-th frame and the t-th frame are read, where the target position of the (t-1)-th frame is known and the target position of the t-th frame needs to be calculated; next time the t-th frame and the (t+1)-th frame are read, and the cutting center of the (t+1)-th frame is then calculated.
Area setting step S120: respectively setting and cutting a target area of a previous frame and a search area of a current frame;
the setting and cutting of the target area of the previous frame specifically comprises: the center point position c = (cx, cy) of the target in the previous frame is known; a rectangular frame centered at this point is used as a first bounding box to mark the target object, the height of the first bounding box being h and the width being w; the target area obtained after cutting has a height of k1·h and a width of k1·w, and the parameter k1 is used for controlling the size of the target area;
the setting and cutting of the search area of the current frame specifically comprises: judging whether the motion of the object in the image is stable; if the motion is stable, the center point position c' = (c'x, c'y) of the search area of the current frame is equal to the known target center point position c = (cx, cy) of the previous frame plus the inter-frame displacement S of the target between the previous two frames; if the speed changes drastically, for example decreases or increases rapidly, the center point position c' = (c'x, c'y) of the search area of the current frame is equal to the known target center point position c = (cx, cy) of the previous frame, i.e. the target center point position of the previous frame is used as the cutting center of the current frame; a rectangular frame is used as a second bounding box for marking, the height of the second bounding box being h and the width being w; the search area obtained after cutting has a height of k2·h and a width of k2·w, and the parameter k2 is used for controlling the size of the search area.
In one embodiment, k2 and k1 both take the value 2.
Further, judging whether the object moves stably in the image is done as follows: the inter-frame displacements of the target between adjacent frames in the three consecutive frames before the current frame are compared; if the inter-frame displacement difference between two adjacent frame pairs is small, for example smaller than 1/3 of the inter-frame displacement between the previous two frames, the motion speed is considered stable; if the difference is large, for example greater than 1/3 of the inter-frame displacement between the previous two frames, the speed is considered to change drastically. The inter-frame displacement refers to the change of the position of the target in the image between two consecutive frames.
Specifically, the picture of the previous frame (the (t-1)-th frame) is cropped first, so that the tracking target is located in the middle of the cropped image block. In the tracking process, the target object is marked with a rectangular frame serving as the first bounding box; the coordinates of the center point of the bounding box are c = (cx, cy), its height and width are h and w, and the height and width of the cropped picture are k1·h and k1·w respectively. The parameter k1 controls the size of the target area and determines the amount of background information in the cropped picture: the larger the value of k1, the larger the area of the cropped picture and the more background information it contains; likewise, the smaller the value of k1, the smaller the cropped picture and the less background information it contains. For objects whose motion speed changes strongly, k1 should be increased to enlarge the target region; in the experimental environment of the invention k1 takes the value 2.
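As a concrete illustration of the cropping just described, the sketch below cuts a k1·h by k1·w window around the known target center. It is a minimal NumPy version written for this description; zero-padding at the image border is one possible choice, since the patent does not specify the border handling.

```python
import numpy as np

def crop_region(image, center, box_w, box_h, k=2.0):
    """Crop a (k*box_h) x (k*box_w) window of `image` centered at `center` = (cx, cy)."""
    cx, cy = center
    crop_w, crop_h = int(round(k * box_w)), int(round(k * box_h))
    x1 = int(round(cx - crop_w / 2.0))
    y1 = int(round(cy - crop_h / 2.0))
    x2, y2 = x1 + crop_w, y1 + crop_h

    img_h, img_w = image.shape[:2]
    # Pad with zeros when the window extends past the image border.
    pad_l, pad_t = max(0, -x1), max(0, -y1)
    pad_r, pad_b = max(0, x2 - img_w), max(0, y2 - img_h)
    pad = [(pad_t, pad_b), (pad_l, pad_r)] + [(0, 0)] * (image.ndim - 2)
    padded = np.pad(image, pad, mode="constant")
    return padded[y1 + pad_t : y2 + pad_t, x1 + pad_l : x2 + pad_l]
```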
For the current frame, different objects in a real scene generally have different movement speeds; some objects move very fast and their speed may also change drastically (decrease or increase rapidly). When a fast-moving target object is captured by a camera, recorded into a video and split into frames, there is a certain inter-frame difference in the position of the target object in the pictures (not its absolute position in the scene) between two consecutive frames: the lower the movement speed, the smaller the inter-frame difference, and the higher the movement speed, the larger the inter-frame difference.
Referring first to fig. 3, a schematic diagram of a motion model of a deep learning based target tracking method according to an embodiment of the present invention is shown.
Suppose the target of the current frame (the t-th frame) is located at position x_t, the target of the (t-1)-th frame at x_{t-1}, that of the (t-2)-th frame at x_{t-2}, that of the (t-3)-th frame at x_{t-3}, and that of the (t+1)-th frame at x_{t+1}. Let:

s_{t-2} = x_{t-2} - x_{t-3}    (1)

s_{t-1} = x_{t-1} - x_{t-2}    (2)

where s_{t-2} denotes the displacement between the (t-3)-th frame and the (t-2)-th frame, directed from x_{t-3} to x_{t-2}, and s_{t-1} denotes the displacement between the (t-2)-th frame and the (t-1)-th frame, directed from x_{t-2} to x_{t-1}.
The following will discuss the moving speed of the target object in two processes, namely deceleration and acceleration, respectively:
(1) When the movement of the target object is in a deceleration process, as shown by the motion trajectory of the segment from x_{t-3} to x_{t+1}.

In the segment from x_{t-3} to x_{t-1} the speed does not vary significantly, i.e. s_{t-2} and s_{t-1} do not differ much in size; in the segment from x_{t-1} to x_{t+1} the speed drops rapidly to 0. As the criterion for how sharply the target motion speed changes, the invention adopts, based on a number of experiments,

|s_{t-1} - s_{t-2}| ≤ (1/3)·|s_{t-1}|

When |s_{t-1} - s_{t-2}| ≤ (1/3)·|s_{t-1}|, the motion speed of the target object changes little, i.e. the displacement differences of the target over three consecutive frames are small, as in the segment from x_{t-3} to x_t. In this case the clipping center x_t' of the current frame is obtained as follows:

x_t' = x_{t-1} + s_{t-1}    (3)

As can be seen in FIG. 3, the distance between the clipping center x_t' and the actual position x_t of the current frame (the t-th frame) is much smaller than the distance between the actual position x_{t-1} of the previous frame (the (t-1)-th frame) and the actual position x_t of the current frame, which shows that the motion model proposed by the invention has a more obvious advantage when tracking fast-moving target objects.

When |s_{t-1} - s_{t-2}| > (1/3)·|s_{t-1}|, the displacements of two consecutive frame pairs differ greatly, indicating that the motion speed of the target object changes drastically, as in the segment from x_{t-1} to x_{t+1}. In this case the clipping center x_t' of the current frame is obtained by:

x_t' = x_{t-1}    (4)

That is, when the speed changes drastically, the target center of the previous frame (the (t-1)-th frame) is taken as the clipping center of the current frame (the t-th frame). In addition, the value range of t in the invention is t ≥ 4; formula (4) is also applied to the tracking of the 2nd and 3rd frames.
(2) When the movement of the target object is in an acceleration process, as shown by the motion trajectory of the segment from x_{t+1} to x_{t+5}. In the segment from x_{t+1} to x_{t+3} the speed increases rapidly from 0, and the clipping center is solved in the same way as for the segment from x_{t-1} to x_{t+1}; in the segment from x_{t+3} to x_{t+5} the speed does not change significantly, and the clipping center is solved in the same way as for the segment from x_{t-3} to x_{t-1}.
Assume that the center point coordinates of the target object in the current frame picture (the t-th frame) are c' = (c'x, c'y). The clipping center of the current frame is calculated according to formulas (3) and (4); a second bounding box with this position as center, height h and width w is set; the search area is then set with height k2·h and width k2·w, where k2, like k1, takes the value 2.
Therefore, in this step, whether the motion of the object is stable is first judged through the inter-frame displacements of the three adjacent frames. If the difference of the inter-frame displacements is small, i.e. the object moves stably, the clipping center of the current frame (the t-th frame) is obtained by adding the target position of the previous frame (the (t-1)-th frame) and the displacement S between the previous two frames (the (t-2)-th and (t-1)-th frames). When the speed changes drastically (decreases or increases rapidly), the inter-frame displacement changes greatly, and predicting the clipping center of the current frame by adding the target position and the displacement S between the previous two frames is no longer meaningful and may bring a larger error.
Further, in order to avoid the situation that the actual position of the current frame target exceeds the second bounding box due to too fast a change of the motion speed, the size of the second bounding box can be increased when the speed changes drastically, i.e. the value of k2 can be increased, thereby enlarging the search comparison area and avoiding the above situation.
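The following sketch puts formulas (1)-(4) together as they would be used in the area setting step. It assumes the 1/3 criterion is applied to the norm of the displacement difference, and the enlarged k2 value used when the speed changes drastically (k2_fast) is an illustrative choice, since the text only says k2 is increased.

```python
import numpy as np

def predict_search_center(x_tm3, x_tm2, x_tm1, k2=2.0, k2_fast=3.0):
    """Return (predicted crop center of frame t, k2 to use for the search area)."""
    s_tm2 = np.asarray(x_tm2, float) - np.asarray(x_tm3, float)   # formula (1)
    s_tm1 = np.asarray(x_tm1, float) - np.asarray(x_tm2, float)   # formula (2)

    stable = np.linalg.norm(s_tm1 - s_tm2) <= np.linalg.norm(s_tm1) / 3.0
    if stable:
        return np.asarray(x_tm1, float) + s_tm1, k2               # formula (3): extrapolate
    # Drastic speed change: reuse the previous target center (formula (4))
    # and enlarge the search area so the target stays inside it.
    return np.asarray(x_tm1, float), k2_fast
```

Only a handful of coordinate additions and one comparison are involved, which is why this motion model adds almost no computational cost to the tracker.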
A feature extraction and comparison step S130: and inputting the target area and the search area into a Convolutional Neural Network (CNN), performing feature extraction and feature comparison, and calculating to obtain the target area of the current frame.
Specifically, firstly, feature extraction is carried out on a target area and a search area in the convolutional layer, then the extracted features are input into a full connection layer, feature comparison is carried out on the target area and the search area in the full connection layer, and finally the target area of the current frame is obtained after calculation.
This step is to use a convolutional neural network for the acquisition of the current frame target region, and before using, the convolutional neural network should use video and/or pictures for deep learning, i.e., training.
Interframe displacement calculation step S140: and calculating to obtain the interframe displacement of the current frame relative to the target of the previous frame by using the target area of the current frame and the target area of the previous frame.
The result of this step is used in the iterative calculation: in the region setting step it is used to judge whether the moving speed of the object changes drastically and to calculate the center position of the search area.
A judgment step S150: and judging whether the current frame is the last frame, if so, finishing the tracking, otherwise, entering a picture reading step S110, continuously reading two continuous frames of pictures, and continuously carrying out iterative target tracking.
This step is for determining whether target tracking has ended or should continue.
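To show how steps S110-S150 chain together, the sketch below runs the whole loop over a list of frames, reusing the hypothetical helpers sketched earlier (crop_region, predict_search_center, TrackerNet). Resizing crops to 227×227 with OpenCV and interpreting the network output as corner coordinates normalized to the search crop are assumptions made for illustration only.

```python
import cv2
import numpy as np
import torch

def to_tensor(crop, size=227):
    crop = cv2.resize(crop, (size, size)).astype(np.float32) / 255.0
    return torch.from_numpy(crop).permute(2, 0, 1).unsqueeze(0)

def track(frames, init_center, box_w, box_h, net):
    """Return the predicted target center for every frame of `frames`."""
    centers = [np.asarray(init_center, dtype=float)]
    for t in range(1, len(frames)):                        # S110: frames t-1 and t
        prev, cur = frames[t - 1], frames[t]
        # S120: target area of the previous frame, search area of the current frame
        target = crop_region(prev, centers[-1], box_w, box_h, k=2.0)
        if len(centers) >= 3:
            search_center, k2 = predict_search_center(centers[-3], centers[-2], centers[-1])
        else:                                              # frames 2 and 3: formula (4)
            search_center, k2 = centers[-1], 2.0
        search = crop_region(cur, search_center, box_w, box_h, k=k2)
        # S130: the CNN regresses the target box; its output is assumed here to be
        # (x1, y1, x2, y2) normalized to [0, 1] within the search crop.
        with torch.no_grad():
            x1, y1, x2, y2 = net(to_tensor(target), to_tensor(search))[0].tolist()
        top_left = search_center - np.array([k2 * box_w, k2 * box_h]) / 2.0
        new_center = top_left + np.array([(x1 + x2) / 2.0 * k2 * box_w,
                                          (y1 + y2) / 2.0 * k2 * box_h])
        # S140: the inter-frame displacement is new_center - centers[-1]; keeping the
        # history of centers is what the next iteration's motion model needs.
        centers.append(new_center)
    # S150: the loop ends once the last frame has been processed.
    return centers
```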
The network training of the invention adopts the following method:
1. training set
The training set includes two parts, video from the ALOV300+ + dataset and pictures from the ImageNet2012 dataset.
The ALOV300++ dataset is a video dataset often used to test the performance of various target tracking algorithms (address: http://alov300pp.). There are 314 video segments in the ALOV300++ dataset, covering 14 types of video: Light, Surface Cover, Specularity, Transparency, Shape, Motion Smoothness, Motion Coherence, Clutter, Confusion, Low Contrast, Occlusion, Moving Camera, Zooming Camera and Long Duration. These types are organized around problems such as illumination change, occlusion, target deformation and camera movement, so the neural network can be trained effectively for these problems and handle them better. Except for the 14th type, Long Duration, which contains 10 long videos of 1-2 minutes, the videos are relatively short, with an average duration of 9.2 seconds per segment and a maximum of 35 seconds. These videos are split into frames and provided as pictures, about 150,000 frames in total, containing 314 different types of target objects; the positions of the target objects in all pictures are manually annotated with Ground Truth.
The invention divides the 314 video sequences into two parts by extracting 1 segment out of every 5. For example, of the 33 Light-type videos, the 7 segments numbered 1, 6, 11, 16, 21, 26 and 31 are extracted, and the other types of video are divided in the same way, as sketched below. After the division, the first part of 251 video sequences contains about 118,000 pictures and is used to train the network; the second part of 64 video sequences contains about 32,000 pictures and is used as a validation set for hyper-parameter tuning of the neural network.
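A small sketch of that 1-in-5 split, under the assumption that each type's videos are given as an ordered list and that the extracted segments (numbers 1, 6, 11, ...) form the validation part:

```python
def split_alov(video_names):
    """Split a list of same-type ALOV300++ videos into (train, validation)."""
    validation = video_names[::5]                      # segments 1, 6, 11, 16, ...
    train = [v for i, v in enumerate(video_names) if i % 5 != 0]
    return train, validation
```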
The ImageNet2012 dataset is a massive picture dataset containing 1.35 million pictures, of which 1.2 million are in the training set, 50,000 in the validation set and 100,000 in the test set. In view of the huge data volume of ImageNet2012, the full dataset cannot be used to train the network; instead, the 100,000 test-set pictures of ImageNet2012 are used as a training set of the invention. This picture training set is used to pre-train the neural network, making full use of the massive image information of the ImageNet2012 dataset, improving the classification and recognition ability of the neural network and letting the network learn the appearance model of the target object.
2. Test set
The test set uses the VOT2016 dataset, which is also a video dataset; it contains 60 video segments with about 21,000 pictures, and the positions of the target objects in all pictures are manually annotated with Ground Truth (website: http://www.votchallenge.net/vot2016/dataset). The VOT2016 dataset is a standard dataset for object tracking and can be used for comparison and quantitative evaluation against various state-of-the-art object tracking algorithms. It contains rich object types, and specific labels are provided for problems such as occlusion, illumination change, target deformation and camera movement, so this dataset is adopted for testing the neural network of the algorithm.
3. Training strategy
The neural network is first pre-trained with part of the pictures in the ImageNet2012 dataset, training its ability to accurately locate the position of a target object in image B when the features of that target object in image A are known, so that the network learns the appearance model of the target object. Then the 251 video sequences in the training set are used to train the neural network, so that it learns the continuous motion of different types of objects, acquires the ability to track moving objects in video sequences, and learns the motion model of the target object. Finally, the neural network is trained again with the 64 video sequences in the validation set, and its hyper-parameters are continuously adjusted (hyper-parameter tuning) so that it obtains excellent target recognition and tracking capability.
Example 1:
in the present embodiment, a comparative example of the method of the present invention with other target tracking methods is shown.
At present, most algorithms that address the target tracking problem with deep learning methods are slow; the fastest is GOTURN (Generic Object Tracking Using Regression Networks), a regression-network-based generic target tracking algorithm proposed in 2016. To evaluate the performance of the algorithm more accurately and objectively, the invention designs several groups of comparison experiments against the GOTURN algorithm and evaluates the performance of the target tracking algorithm in three aspects: accuracy, real-time performance and robustness. Tracking accuracy and overlap ratio quantify the tracking precision, tracking speed quantifies the real-time performance, and the robustness evaluation experiment is analyzed qualitatively.
The configuration of the PC used in the comparative experiments designed by the present invention is shown in Table 1:
TABLE 1 Experimental apparatus parameter configuration
(1) Difficulties and challenges of target tracking
The test set VOT2016 contains 60 video sequences; due to space limitations, the invention does not list all 60 of them but picks 8 challenging video segments for presentation. These 8 video sequences include various challenges and difficulties that are present in most target tracking problems, such as camera shake, illumination change, motion blur, occlusion and target scale change, as shown in Table 2:
TABLE 2 various challenges and difficulties in video sequences
(2) Tracking accuracy
The target tracking accuracy defined by the invention is calculated as follows: first the center point error S_error between the tracking result and the Ground Truth is calculated; then the number of frames F_t in which the center point error S_error is smaller than a set threshold t_0 (t_0 = 20 pixels in the invention) is counted; the ratio E of F_t to the total number of video frames F is called the target tracking accuracy:

E = F_t / F    (5)

The center point error S_error is calculated from the Euclidean distance between the tracking result and the Ground Truth:

S_error = sqrt((x - x_g)^2 + (y - y_g)^2)    (6)

In the above formula, x and y represent the coordinate values of the tracking result in the x and y directions, respectively, and x_g and y_g represent the coordinate values of the Ground Truth of the tracking target in the x and y directions.
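A sketch of how formulas (5) and (6) would be evaluated over a tracked sequence; treating S_error as a per-frame center error that is thresholded at t_0 = 20 pixels follows the description above and is otherwise an assumption:

```python
import numpy as np

def tracking_accuracy(pred_centers, gt_centers, t0=20.0):
    pred = np.asarray(pred_centers, dtype=float)       # shape (F, 2)
    gt = np.asarray(gt_centers, dtype=float)           # shape (F, 2)
    s_error = np.linalg.norm(pred - gt, axis=1)        # formula (6), per frame
    return float(np.mean(s_error < t0))                # formula (5): F_t / F
```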
The data of 8-group comparison experiments performed on the test set VOT2016 of the present invention are shown in Table 3:
TABLE 3 tracking accuracy (%)
Video sequence name GOTURN algorithm Algorithm of the invention
ball1 87.34 91.75
gymnastics2 89.91 95.06
gymnastics3 50.07 81.40
hand 37.85 77.28
leaves 24.60 63.18
motocross1 92.59 94.30
road 71.36 83.86
soccer2 45.77 71.56
Table 3 shows partial statistical results of the tracking accuracy comparison experiments between the GOTURN algorithm and the algorithm of the invention on the test set VOT2016. For the three sequences ball1, gymnastics2 and motocross1 the GOTURN algorithm already performs well, but the algorithm of the invention performs even better, improving the tracking accuracy by a few percentage points. For the remaining 5 videos the GOTURN algorithm performs poorly; in particular, tracking of the hand and leaves sequences shows a severe frame-loss phenomenon. Because the target object is small, the cropped search area is relatively small, and for a fast-moving object the inter-frame displacement can be so large that the target runs out of the search area, causing the tracking of the GOTURN algorithm to fail. After taking the influence of inter-frame displacement into account, the algorithm of the invention improves the tracking accuracy by a large margin; the tracking accuracy on the hand and leaves sequences is improved by nearly 40%.
(3) Tracking overlap ratio
The tracking overlap ratio defined by the invention refers to the overlap between the tracking frame of the target object and the marking frame of the Ground Truth, and is calculated as:

S = (R_t ∩ R_g) / (R_t ∪ R_g)    (7)

In the above formula, S represents the tracking overlap ratio, R_t represents the region covered by the tracking frame, and R_g represents the region covered by the Ground Truth marking frame. According to formula (7), the higher the tracking overlap ratio, the higher the tracking accuracy of the algorithm. Table 4 lists the tracking overlap ratios of the two algorithms on 8 different video sequences of the test set VOT2016.
TABLE 4 Tracking overlap ratio (%)
Table 4 shows partial statistical results of the tracking overlap comparison experiments between the GOTURN algorithm and the algorithm of the invention on the test set VOT2016. The tracking accuracy introduced in the previous section measures the distance between the tracking result and the Ground Truth, whereas the tracking overlap in this section measures the degree of overlap between the tracking frame of the target and the marking frame of the Ground Truth. Generally, the closer the distance, the higher the overlap, so the data in Table 4 are largely consistent with Table 3. For the three sequences ball1, gymnastics2 and motocross1 both algorithms perform relatively well, while for the hand and leaves sequences neither performs ideally; nevertheless the tracking overlap of the algorithm of the invention is higher than that of the GOTURN algorithm, which shows that it is superior to GOTURN.
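Formula (7) is the standard intersection-over-union between two axis-aligned boxes; a minimal sketch, assuming both boxes are given as (x1, y1, x2, y2) corner coordinates:

```python
def overlap_ratio(box_t, box_g):
    """Intersection-over-union of a tracking box and a Ground Truth box."""
    ix1, iy1 = max(box_t[0], box_g[0]), max(box_t[1], box_g[1])
    ix2, iy2 = min(box_t[2], box_g[2]), min(box_t[3], box_g[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_t = (box_t[2] - box_t[0]) * (box_t[3] - box_t[1])
    area_g = (box_g[2] - box_g[0]) * (box_g[3] - box_g[1])
    union = area_t + area_g - inter
    return inter / union if union > 0 else 0.0
```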
(4) Tracking speed
The tracking speed defined by the invention refers to the ratio of the total number of the tracked video frames to the tracking time, and the calculation formula is as follows:
V = N / T    (8)
in the above formula, V represents a tracking speed; n represents the total frame number of a certain section of tracked video; t denotes the duration of tracking the video. Table 5 lists the tracking speed of the different algorithms in the test set VOT 2016.
TABLE 5 tracking speed (Frames/sec)
The algorithm of the invention differs from the GOTURN algorithm in the motion model adopted: the motion model of GOTURN is too simple and performs poorly when tracking fast-moving targets, whereas the motion model constructed by the invention is designed mainly to solve the tracking problem of fast-moving targets. The designed motion model involves only some simple inter-frame coordinate operations, no complex image operations, and adds little algorithmic complexity, so the tracking accuracy is improved while the tracking speed is essentially unchanged and remains comparable to that of the GOTURN algorithm.
(5) Tracking robustness
For video sequences with slow motion or relatively large target objects, the algorithm of the invention performs comparably to the GOTURN algorithm. To illustrate the tracking effect of the algorithm of the invention while saving space, 2 sets of tracking results are selected for video sequences with faster motion or relatively smaller target objects, where the solid-line boxes represent the tracking results of the GOTURN algorithm and the dashed-line boxes represent the tracking results of the algorithm of the invention.
The two video sequences, a football and a moving motorcycle, contain most of the common difficulties in target tracking; at the same time the target objects are relatively small and move fast, which poses a great challenge to correct tracking. For the football sequence, the ball is relatively small and, after being kicked or hitting the ground, moves fast, so the target object "runs out" of the search area and the GOTURN algorithm fails to track it. For the moving motorcycle, the motion speed is relatively fast and the shooting distance relatively long, so the target area in the picture is relatively small; these problems are quite challenging for the GOTURN algorithm. The algorithm of the invention takes the influence of the inter-frame difference on target tracking into account and constructs a motion model based on the inter-frame difference; experiments show that it has better robustness than the GOTURN algorithm.
Referring to fig. 6, the invention further discloses a target tracking device based on deep learning, comprising the following components:
the picture reading unit 210: continuously reading two frames of pictures, including a previous frame of picture and a current frame of picture, wherein the previous frame of picture has a calculated target position, and the current frame of picture needs to calculate the target position;
area setting section 220: respectively setting and cutting a target area of a previous frame and a search area of a current frame;
the setting and cutting of the target area of the previous frame specifically comprises: the center point position c = (cx, cy) of the target in the previous frame is known; a rectangular frame centered at this point is used as a first bounding box to mark the target object, the height of the first bounding box being h and the width being w; the target area obtained after cutting has a height of k1·h and a width of k1·w, and the parameter k1 is used for controlling the size of the target area;
the setting and cutting of the search area of the current frame specifically comprises: judging whether the motion of the object in the image is stable; if the speed is stable, the center point position c' = (c'x, c'y) of the search area of the current frame is equal to the known target center point position c = (cx, cy) of the previous frame plus the inter-frame displacement S of the target between the previous two frames; if the speed changes drastically, for example decreases or increases rapidly, the center point position c' = (c'x, c'y) of the search area of the current frame is equal to the known target center point position c = (cx, cy) of the previous frame, i.e. the target center point position of the previous frame is used as the cutting center of the current frame; a rectangular frame is used as a second bounding box for marking, the height of the second bounding box being h and the width being w; the search area obtained after cutting has a height of k2·h and a width of k2·w, and the parameter k2 is used for controlling the size of the search area;
feature extraction and comparison unit 230: inputting the target area and the search area into a Convolutional Neural Network (CNN), performing feature extraction and feature comparison, and calculating to obtain the target area of the current frame;
the interframe displacement calculation unit 240: calculating to obtain the interframe displacement of the current frame relative to the target of the previous frame by using the target area of the current frame and the target area of the previous frame;
the judgment unit 250: and judging whether the current frame is the last frame, if so, finishing the tracking, otherwise, continuously reading two continuous frames of pictures by the picture reading unit, and performing iterative target tracking.
Further, in the region setting unit 220, determining whether the object moves smoothly in the image is: comparing the interframe displacement of the target of two adjacent frames in three continuous frames before the current frame, and if the interframe displacement difference of two adjacent frames in the three continuous frames is smaller, considering that the motion is stable; if the interframe displacement difference of two adjacent frames in the three continuous frames is large, the movement speed is considered to be changed violently.
Further, in the area setting unit 220, it is determined whether the inter-frame displacement difference between two adjacent frames in the three consecutive frames is smaller than 1/3 of the inter-frame displacement between the two previous frames;
the parameters k2 and k1 for controlling the sizes of the regions both take the value 2.
Further, in the region setting unit 220, in order to avoid the situation that the actual position of the current frame target exceeds the second bounding box due to too fast a change of the moving speed, the size of the second bounding box is increased when the speed changes drastically, i.e. the value of k2 is increased; and/or
The feature extraction and comparison unit 230 first performs feature extraction on the target region and the search region in the convolutional layer, then performs feature comparison on the target region and the search region in the fully connected layer, and finally obtains the target region of the current frame after calculation.
The present invention still further discloses a storage medium for storing computer-executable instructions,
the computer executable instructions, when executed by a processor, perform the method described above.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, various aspects of the present invention may take the form of: an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a "circuit," module "or" system. Further, aspects of the invention may take the form of: a computer program product embodied in one or more computer readable media having computer readable program code embodied thereon.
Any combination of one or more computer-readable media may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to: electromagnetic, optical, or any suitable combination thereof. The computer readable signal medium may be any of the following computer readable media: is not a computer readable storage medium and may communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including: object oriented programming languages such as Java, Smalltalk, C++, and the like; and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package; executing in part on a user computer and in part on a remote computer; or entirely on a remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
While the invention has been described in further detail with reference to specific preferred embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (6)

1. A target tracking method based on deep learning comprises the following steps:
picture reading step S110: continuously reading two frames of pictures, including a previous frame of picture and a current frame of picture, wherein the previous frame of picture has a calculated target position, and the current frame of picture needs to calculate the target position;
area setting step S120: respectively setting and cutting a target area of a previous frame and a search area of a current frame;
the setting and cutting of the target area of the previous frame specifically comprises: the center point position c = (cx, cy) of the target in the previous frame is known; a rectangular frame centered at this point is used as a first bounding box to mark the target object, the height of the first bounding box being h and the width being w; the target area obtained after cutting has a height of k1·h and a width of k1·w, and the parameter k1 is used for controlling the size of the target area;
the setting and cutting of the search area of the current frame specifically comprises: judging whether the motion of the object in the image is stable; if it is stable, the center point position c' = (c'x, c'y) of the search area of the current frame is equal to the known target center point position c = (cx, cy) of the previous frame plus the inter-frame displacement S of the target between the previous two frames; if the speed changes greatly, the center point position c' = (c'x, c'y) of the search area of the current frame is equal to the known target center point position c = (cx, cy) of the previous frame, i.e. the target center point position of the previous frame is used as the cutting center of the current frame; a rectangular frame is used as a second bounding box for marking, the height of the second bounding box being h and the width being w; the search area obtained after cutting has a height of k2·h and a width of k2·w, and the parameter k2 is used for controlling the size of the search area;
a feature extraction and comparison step S130: inputting the target area and the search area into a Convolutional Neural Network (CNN), performing feature extraction and feature comparison, and calculating to obtain the target area of the current frame;
an inter-frame displacement calculation step S140: calculating the inter-frame displacement of the target in the current frame relative to the previous frame from the target area of the current frame and the target area of the previous frame;
a judgment step S150: judging whether the current frame is the last frame; if so, the tracking ends; otherwise, returning to the picture reading step S110 to read the next two consecutive frames of pictures and continue iterative target tracking;
in the region setting step S120, whether the target object moves smoothly in the image is determined as follows: the inter-frame displacements of the target between adjacent frames in the three consecutive frames before the current frame are compared; if the difference between the inter-frame displacements of adjacent frames in the three consecutive frames is small, the motion is considered stable; if the difference between the inter-frame displacements of adjacent frames in the three consecutive frames is large, the movement speed is considered to change drastically;
in the region setting step S120, it is determined whether the inter-frame displacement difference between two adjacent frames in the three consecutive frames is smaller than 1/3 of the inter-frame displacement between the previous two frames;
the region size control parameters k2 and k1 both take the value 2.
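
For readers outside the formal claim language, the following minimal Python sketch illustrates the geometry of the region setting step S120 of claim 1: cropping around the previous-frame target center, testing motion stability with the 1/3 displacement rule, and choosing the search-area center accordingly. The function names, the NumPy dependency, the border clipping, and the (cx, cy) coordinate convention are assumptions added for illustration only; the claim itself specifies only the geometric relations and k1 = k2 = 2.

```python
import numpy as np

# Illustrative sketch of the region setting step S120 (claim 1). All names,
# the NumPy dependency, and the border clipping are assumptions added for
# illustration; the claim only fixes the geometry and k1 = k2 = 2.

K1 = 2.0  # controls the size of the target area cropped from the previous frame
K2 = 2.0  # controls the size of the search area cropped from the current frame

def crop_region(image, center, box_h, box_w, k):
    """Crop a (k*h) x (k*w) window around center = (cx, cy), clipped to the image."""
    cx, cy = center
    half_h, half_w = k * box_h / 2.0, k * box_w / 2.0
    y0, y1 = int(max(cy - half_h, 0)), int(min(cy + half_h, image.shape[0]))
    x0, x1 = int(max(cx - half_w, 0)), int(min(cx + half_w, image.shape[1]))
    return image[y0:y1, x0:x1]

def motion_is_stable(displacements):
    """1/3 rule of claim 1: compare the two inter-frame displacements of the
    three consecutive frames before the current frame; the motion is stable
    when they differ by less than 1/3 of the earlier displacement."""
    d_prev, d_last = np.linalg.norm(displacements[-2]), np.linalg.norm(displacements[-1])
    return abs(d_last - d_prev) < d_prev / 3.0

def search_center(prev_center, displacements):
    """Search-area center of the current frame: previous target center plus the
    last inter-frame displacement S when motion is stable, otherwise the
    previous target center alone (claim 2 would additionally enlarge k2)."""
    if motion_is_stable(displacements):
        return (prev_center[0] + displacements[-1][0],
                prev_center[1] + displacements[-1][1])
    return prev_center
```

With these helpers, the target area of the previous frame would be crop_region(prev_frame, c, h, w, K1) and the search area of the current frame crop_region(cur_frame, search_center(c, displacements), h, w, K2).
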
2. The target tracking method of claim 1, wherein:
in the region setting step S120, in order to avoid the actual position of the target in the current frame exceeding the second bounding box when the movement speed changes too fast, the size of the second bounding box is increased when the speed changes drastically, that is, the value of k2 is increased.
3. The target tracking method of claim 1, wherein:
the feature extraction and comparison step S130 first extracts features of the target region and the search region in the convolutional layers, then inputs the extracted features into the fully connected layers, compares the features of the target region and the search region in the fully connected layers, and calculates the target region of the current frame.
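
The split in claim 3 between convolutional feature extraction and comparison in the fully connected layers can be pictured with the minimal PyTorch sketch below. The layer sizes, the shared backbone for both crops, and the four-value box output are assumptions for illustration; the claim does not prescribe a specific architecture.

```python
import torch
import torch.nn as nn

class TrackerNet(nn.Module):
    """Hypothetical network in the spirit of claim 3: convolutional layers
    extract features of the target crop and the search crop; fully connected
    layers compare the two feature vectors and output the target box of the
    current frame inside the search area."""

    def __init__(self, crop_size=128):
        super().__init__()
        # Convolutional layers (feature extraction), shared by both inputs.
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        feat_dim = 64 * (crop_size // 8) ** 2
        # Fully connected layers (feature comparison and box regression).
        self.compare = nn.Sequential(
            nn.Linear(2 * feat_dim, 1024), nn.ReLU(),
            nn.Linear(1024, 4),  # (x1, y1, x2, y2) relative to the search crop
        )

    def forward(self, target_crop, search_crop):
        f_target = self.features(target_crop).flatten(1)
        f_search = self.features(search_crop).flatten(1)
        return self.compare(torch.cat([f_target, f_search], dim=1))
```

The inter-frame displacement of step S140 then follows directly as the difference between the center of the predicted box, mapped back to image coordinates, and the previous target center.
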
4. A target tracking device based on deep learning comprises the following components:
a picture reading unit (210): reading two consecutive frames of pictures, namely a previous frame and a current frame, wherein the target position of the previous frame has already been calculated and the target position of the current frame is yet to be calculated;
a region setting unit (220): respectively setting and cropping a target area of the previous frame and a search area of the current frame;
the setting and cropping of the target area of the previous frame specifically comprises: the center point position c = (cx, cy) of the target is known from the previous frame; taking this point as the center, the target object is marked with a rectangular frame serving as a first bounding box, the height of the first bounding box being h and its width being w; the height and width of the target area obtained after cropping are k1·h and k1·w respectively, where the parameter k1 controls the size of the target area;
the setting and cropping of the search area of the current frame specifically comprises: judging whether the motion of the object in the image is stable; if the motion is stable, the center point position c′ = (c′x, c′y) of the search area of the current frame equals the center point position c = (cx, cy) of the known target in the previous frame plus the inter-frame displacement S of the image target between the previous two frames; if the speed changes drastically, the center point position c′ = (c′x, c′y) of the search area of the current frame equals the center point position c = (cx, cy) of the known target in the previous frame, that is, the target center point of the previous frame is used as the cropping center of the current frame; the target is marked with a rectangular frame serving as a second bounding box, the height of the second bounding box being h and its width being w, and the height and width of the cropped search area are k2·h and k2·w respectively, where the parameter k2 controls the size of the search area;
a feature extraction and comparison unit (230): inputting the target area and the search area into a Convolutional Neural Network (CNN), performing feature extraction and feature comparison, and calculating the target area of the current frame;
an inter-frame displacement calculation unit (240): calculating the inter-frame displacement of the target in the current frame relative to the previous frame from the target area of the current frame and the target area of the previous frame;
a determination unit (250): judging whether the current frame is the last frame; if so, the tracking ends; otherwise, the picture reading unit reads the next two consecutive frames of pictures and iterative target tracking continues;
in the region setting unit (220), whether the object moves smoothly in the image is determined as follows: the inter-frame displacements of the target between adjacent frames in the three consecutive frames before the current frame are compared; if the difference between the inter-frame displacements of adjacent frames in the three consecutive frames is small, the motion is considered stable; if the difference between the inter-frame displacements of adjacent frames in the three consecutive frames is large, the movement speed is considered to change drastically;
in the region setting unit (220), it is determined whether the inter-frame displacement difference between two adjacent frames in the three consecutive frames is smaller than 1/3 of the inter-frame displacement between the previous two frames;
the region size control parameters k2 and k1 both take the value 2.
5. The object tracking device of claim 4, wherein:
in the region setting unit (220), in order to avoid the actual position of the target in the current frame exceeding the second bounding box when the movement speed changes too fast, the size of the second bounding box is increased when the speed changes drastically, that is, the value of k2 is increased; and/or
the feature extraction and comparison unit (230) first extracts features of the target area and the search area in the convolutional layers, then inputs the extracted features into the fully connected layers, compares the features of the target area and the search area in the fully connected layers, and finally calculates the target area of the current frame.
6. A storage medium for storing computer-executable instructions,
the computer executable instructions, when executed by a processor, perform the target tracking method of any one of claims 1-3.
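
Putting the pieces together, the loop below sketches the iteration of steps S110 through S150 as described in claims 1 and 6. It reuses the crop_region and search_center helpers from the first sketch; predict_box is a hypothetical callable that runs a model such as TrackerNet on the two crops and returns the new box in image coordinates. Frame I/O, preprocessing, and crop resizing are omitted, and all names are assumptions for illustration rather than the patented implementation.

```python
def track(frames, init_box, predict_box, k1=2.0, k2=2.0):
    """Iterate the tracking steps over a frame sequence.

    frames:      list of H x W x 3 images (frame 0 has a known target)
    init_box:    (cx, cy, w, h) of the target in frames[0]
    predict_box: hypothetical callable (target_crop, search_crop, search_origin)
                 -> (cx, cy, w, h) in image coordinates, e.g. a TrackerNet wrapper
    """
    cx, cy, w, h = init_box
    boxes = [init_box]
    displacements = [(0.0, 0.0), (0.0, 0.0)]  # padded history for the 1/3 rule
    for prev_frame, cur_frame in zip(frames[:-1], frames[1:]):    # S110: read two frames
        target = crop_region(prev_frame, (cx, cy), h, w, k1)      # S120: target area
        c_search = search_center((cx, cy), displacements)         # S120: search center
        search = crop_region(cur_frame, c_search, h, w, k2)       # S120: search area
        origin = (max(c_search[0] - k2 * w / 2.0, 0.0),           # top-left of the search crop
                  max(c_search[1] - k2 * h / 2.0, 0.0))
        new_cx, new_cy, w, h = predict_box(target, search, origin)  # S130: CNN comparison
        displacements.append((new_cx - cx, new_cy - cy))            # S140: displacement
        cx, cy = new_cx, new_cy
        boxes.append((cx, cy, w, h))                                # S150: loop to last frame
    return boxes
```
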
CN201710474118.1A 2017-06-21 2017-06-21 Target tracking method and device based on deep learning and storage medium Active CN107274433B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710474118.1A CN107274433B (en) 2017-06-21 2017-06-21 Target tracking method and device based on deep learning and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710474118.1A CN107274433B (en) 2017-06-21 2017-06-21 Target tracking method and device based on deep learning and storage medium

Publications (2)

Publication Number Publication Date
CN107274433A CN107274433A (en) 2017-10-20
CN107274433B true CN107274433B (en) 2020-04-03

Family

ID=60068118

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710474118.1A Active CN107274433B (en) 2017-06-21 2017-06-21 Target tracking method and device based on deep learning and storage medium

Country Status (1)

Country Link
CN (1) CN107274433B (en)

Families Citing this family (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107766821B (en) * 2017-10-23 2020-08-04 江苏鸿信系统集成有限公司 Method and system for detecting and tracking full-time vehicle in video based on Kalman filtering and deep learning
CN109754412B (en) * 2017-11-07 2021-10-01 北京京东乾石科技有限公司 Target tracking method, target tracking apparatus, and computer-readable storage medium
CN108021883B (en) * 2017-12-04 2020-07-21 深圳市赢世体育科技有限公司 Method, device and storage medium for recognizing movement pattern of sphere
CN108171752A (en) * 2017-12-28 2018-06-15 成都阿普奇科技股份有限公司 A kind of sea ship video detection and tracking based on deep learning
CN108510523A (en) * 2018-03-16 2018-09-07 新智认知数据服务有限公司 It is a kind of to establish the model for obtaining object feature and object searching method and device
CN108805907B (en) * 2018-06-05 2022-03-29 中南大学 Pedestrian posture multi-feature intelligent identification method
CN110830846B (en) * 2018-08-07 2022-02-22 阿里巴巴(中国)有限公司 Video clipping method and server
CN109086725B (en) * 2018-08-10 2021-01-05 北京华捷艾米科技有限公司 Hand tracking method and machine-readable storage medium
CN109087510B (en) * 2018-09-29 2021-09-07 讯飞智元信息科技有限公司 Traffic monitoring method and device
CN109446978B (en) * 2018-10-25 2022-01-07 哈尔滨工程大学 Method for tracking moving target of airplane based on staring satellite complex scene
CN111127510B (en) * 2018-11-01 2023-10-27 杭州海康威视数字技术股份有限公司 Target object position prediction method and device
CN109726683B (en) * 2018-12-29 2021-06-22 北京市商汤科技开发有限公司 Target object detection method and device, electronic equipment and storage medium
CN109816014A (en) * 2019-01-22 2019-05-28 天津大学 Generate method of the deep learning target detection network training with labeled data collection
US10943132B2 (en) * 2019-04-10 2021-03-09 Black Sesame International Holding Limited Distant on-road object detection
CN110189364B (en) * 2019-06-04 2022-04-01 北京字节跳动网络技术有限公司 Method and device for generating information, and target tracking method and device
CN110378938A (en) * 2019-06-24 2019-10-25 杭州电子科技大学 A kind of monotrack method based on residual error Recurrent networks
CN110276739B (en) * 2019-07-24 2021-05-07 中国科学技术大学 Video jitter removal method based on deep learning
CN110533699B (en) * 2019-07-30 2024-05-24 平安科技(深圳)有限公司 Dynamic multi-frame velocity measurement method for pixel change based on optical flow method
CN110647836B (en) * 2019-09-18 2022-09-20 中国科学院光电技术研究所 Robust single-target tracking method based on deep learning
CN111274914B (en) * 2020-01-13 2023-04-18 目骉资讯有限公司 Horse speed calculation system and method based on deep learning
CN110956165B (en) * 2020-02-25 2020-07-21 恒大智慧科技有限公司 Intelligent community unbundling pet early warning method and system
CN111311643B (en) * 2020-03-30 2023-03-24 西安电子科技大学 Video target tracking method using dynamic search
CN111627046A (en) * 2020-05-15 2020-09-04 北京百度网讯科技有限公司 Target part tracking method and device, electronic equipment and readable storage medium
CN112037257B (en) * 2020-08-20 2023-09-29 浙江大华技术股份有限公司 Target tracking method, terminal and computer readable storage medium thereof
CN112184770A (en) * 2020-09-28 2021-01-05 中国电子科技集团公司第五十四研究所 Target tracking method based on YOLOv3 and improved KCF
CN112188212B (en) * 2020-10-12 2024-02-13 杭州电子科技大学 Intelligent transcoding method and device for high-definition monitoring video

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102750522A (en) * 2012-06-18 2012-10-24 吉林大学 Method for tracking targets
CN105741316A (en) * 2016-01-20 2016-07-06 西北工业大学 Robust target tracking method based on deep learning and multi-scale correlation filtering
CN106875425A (en) * 2017-01-22 2017-06-20 北京飞搜科技有限公司 A kind of multi-target tracking system and implementation method based on deep learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Learning to Track at 100 FPS with Deep Regression Networks; David Held et al.; European Conference on Computer Vision, ECCV 2016; 2016-08-16; pp. 749-765, Section 3, Figure 2 *
Moving Target Detection and Tracking System Based on a PTZ Active Camera; Zhang Yongxia; China Master's Theses Full-text Database, Information Science and Technology; 2014-01-15 (No. 01); p. I138-1875, pp. 30-31, Figures 3-8 *
Research on Detection and Tracking of Ground Moving Targets against a Rotating Background; Chu Linzhen; China Master's Theses Full-text Database, Information Science and Technology; 2015-01-15 (No. 01); p. I138-1475, pp. 56, 60-61, Figures 5.3 and 5.6 *

Also Published As

Publication number Publication date
CN107274433A (en) 2017-10-20

Similar Documents

Publication Publication Date Title
CN107274433B (en) Target tracking method and device based on deep learning and storage medium
US11176381B2 (en) Video object segmentation by reference-guided mask propagation
Wen et al. Detection, tracking, and counting meets drones in crowds: A benchmark
CN107481270B (en) Table tennis target tracking and trajectory prediction method, device, storage medium and computer equipment
Zhao et al. Spatio-temporal autoencoder for video anomaly detection
US20220417590A1 (en) Electronic device, contents searching system and searching method thereof
Felsberg et al. The thermal infrared visual object tracking VOT-TIR2015 challenge results
JP7147078B2 (en) Video frame information labeling method, apparatus, apparatus and computer program
Lai et al. Semantic-driven generation of hyperlapse from 360 degree video
WO2017096949A1 (en) Method, control device, and system for tracking and photographing target
Wen et al. Visdrone-sot2018: The vision meets drone single-object tracking challenge results
TWI777185B (en) Robot image enhancement method, processor, electronic equipment, computer readable storage medium
Zhu et al. Multi-drone-based single object tracking with agent sharing network
WO2021027543A1 (en) Monocular image-based model training method and apparatus, and data processing device
Martin et al. Optimal choice of motion estimation methods for fine-grained action classification with 3d convolutional networks
WO2023109361A1 (en) Video processing method and system, device, medium and product
CN113160283A (en) Target tracking method based on SIFT under multi-camera scene
CN111833378A (en) Multi-unmanned aerial vehicle single-target tracking method and device based on proxy sharing network
Wu et al. Multi‐camera 3D ball tracking framework for sports video
Liu et al. MBA-VO: Motion blur aware visual odometry
Rozumnyi et al. Fmodetect: Robust detection of fast moving objects
Gao et al. A joint local–global search mechanism for long-term tracking with dynamic memory network
Xu et al. Fast and accurate object detection using image Cropping/Resizing in multi-view 4K sports videos
Abulwafa et al. A fog based ball tracking (FB2T) system using intelligent ball bees
Kart et al. Evaluation of Visual Object Trackers on Equirectangular Panorama.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant