CN110660082A - Target tracking method based on graph convolution and trajectory convolution network learning - Google Patents

Target tracking method based on graph convolution and trajectory convolution network learning

Info

Publication number: CN110660082A (application); CN110660082B (grant)
Authority: CN (China)
Prior art keywords: target, frame, convolution, track, network
Prior art date: 2019-09-25
Legal status: Granted
Application number: CN201910908419.XA
Other languages: Chinese (zh)
Other versions: CN110660082B (en)
Inventor
卢学民
权伟
刘跃平
张卫华
周宁
邹栋
郭少鹏
郑丹阳
侯思帧
郭永成
彭宇晨
陈锦雄
Current Assignee: Southwest Jiaotong University
Original Assignee: Southwest Jiaotong University
Priority date: 2019-09-25
Filing date: 2019-09-25
Publication date: 2020-01-07
Application filed by Southwest Jiaotong University
Priority to CN201910908419.XA
Publication of CN110660082A; application granted; publication of CN110660082B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]

Abstract

The invention discloses a target tracking method based on graph convolution and trajectory convolution network learning, and relates to the technical fields of computer vision and target tracking. The network comprises a dual-stream feature extraction layer, a target candidate trajectory extraction layer, and a target localization layer. The network outputs a confidence for each target candidate trajectory, selects the candidate trajectory with the highest confidence as the target motion trajectory, and takes the target block of the last frame of that trajectory as the target image block; after training, the network has an initial target localization capability. During tracking, the spatial features and motion trajectory features of the target over 16 consecutive frames are extracted and concatenated into dual-stream features, target candidate trajectories that follow the target motion pattern are obtained through an LSTM structure, and the feature extraction can assign larger weights to the more discriminative parts of the target for tracking.

Description

Target tracking method based on graph convolution and trajectory convolution network learning
Technical Field
The invention relates to the technical fields of computer vision, machine learning, and target tracking.
Background
Visual target tracking is a very active research topic in computer vision. Given a video segment, a target object is identified automatically or specified manually in the sequence, and its position, appearance, motion, and other information are then predicted in subsequent frames. Target tracking is widely applied in military and civilian fields such as intelligent surveillance, human-computer interaction, and traffic monitoring, and has strong practical value. Although the topic has been studied for decades, it remains challenging: in real scenes the target object is susceptible to many factors, such as illumination changes, pose changes, and occlusion, so developing a consistently robust tracking system is a very difficult problem. Over the past two to three decades visual target tracking technology has advanced greatly; in recent years, in particular, tracking methods based on deep learning have achieved satisfying results, bringing breakthrough progress to the field.
Deep learning, a hot spot of machine learning research in recent years, has achieved surprising success in many areas, such as speech recognition, image recognition, object detection, and video classification, owing to its powerful feature representation capability together with large data sets and strong hardware and software support. Its development in target tracking is also rapid, but because tracking provides little prior knowledge and demands real-time performance, deep learning techniques that rely on large amounts of training data and parameter computation are difficult to exploit fully in this setting, and there remains considerable room for exploration. Compared with traditional hand-crafted feature extraction, deep learning offers deeper semantic features and stronger representation capability, making it more accurate and reliable for the target tracking problem.
At present, deep-learning-based target tracking algorithms fall into three main categories: tracking algorithms based on template matching, algorithms based on machine-learning regression, and algorithms based on machine-learning classification. Current deep learning trackers, however, still do not completely solve the problems that arise in practice, where the target may undergo various kinds of interference, such as deformation, occlusion, and illumination changes, which increase the uncertainty of the target motion. The spatial position relationships among the parts of a target and the target motion trajectory information, on the other hand, play an extremely important role in accurate and robust tracking. Recently, graph convolutional neural networks have made notable progress in visual target tracking. Zhen Cui et al. proposed a spectral filter tracking method that uses spectral filters to encode and extract features of the local image structure, and regresses the target position with the filter parameters and a feature projection function. Junyu Gao et al. proposed a graph convolutional tracking method that simultaneously performs spatio-temporal appearance modeling and context-aware adaptive learning of the target, achieving robust target localization. The target motion trajectory, as an important information feature of a continuously moving target, is widely used in target tracking and action recognition. Chenge Li et al. proposed a method for real-time target tracking in video that detects three-dimensional tracks from spatio-temporal convolutional features of the target. For the video action recognition task, Yue Zhao et al. proposed an end-to-end trajectory convolution network that extracts dynamic trajectory features of the target by introducing a trajectory convolution operation, thereby combining the target's appearance and motion information. Unlike temporal convolution, trajectory convolution takes the target's position offsets and motion patterns into account, aggregating appearance features along the motion path and thus expressing the target's continuous motion over time more accurately.
Disclosure of Invention
The invention aims to provide a target tracking method based on graph convolution and trajectory convolution network learning that can effectively solve the technical problem of tracking a target object in complex motion scenes accurately, robustly, and over long periods.
The purpose of the invention is realized by the following technical scheme: a target tracking method based on graph convolution and trajectory convolution network learning comprises the following steps:
step one, target selection
A target object to be tracked is selected and determined from the initial image sequence; the target object is either extracted automatically by a moving-target detection method or specified manually by a human-computer interaction method;
step two, generation of training data set
The generation of the training data set has two steps: first the data set is selected, and then the training samples are constructed. The large classification and recognition video data set ImageNet Video is selected, in which all images are annotated with the position coordinates of the corresponding target object, and the training data set is then built from these known labels. The data set contains 4500 videos in total, and training samples are drawn from each video according to two different selection rules: 16 consecutive frames (I_1, I_2, ..., I_16) are taken as one group of training data, and one frame out of every two, i.e. (I_1, I_3, ..., I_31), is taken as another group, where I denotes a frame image and each sampled group contains 16 frames. In total, 56250 groups of training data are generated, and the image frames are normalized to 224 x 224 pixels;
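The two sampling rules above can be made concrete with a short sketch. The following Python snippet is a minimal illustration only, assuming each video is available as an ordered list of frame arrays; the helper name make_training_sets and the use of OpenCV for resizing are assumptions, not part of the patent.

```python
import cv2  # assumption: OpenCV is used for frame resizing

def make_training_sets(frames, clip_len=16):
    """Build the two kinds of 16-frame training groups described in step two.

    frames: ordered list of frame arrays for one video.
    Returns a list of groups, each containing clip_len frames.
    """
    groups = []
    # Rule 1: 16 consecutive frames (I_1, I_2, ..., I_16).
    for start in range(0, len(frames) - clip_len + 1, clip_len):
        groups.append(frames[start:start + clip_len])
    # Rule 2: every other frame, 16 frames spanning 31 (I_1, I_3, ..., I_31).
    span = 2 * clip_len - 1  # 31 source frames per group
    for start in range(0, len(frames) - span + 1, span):
        groups.append(frames[start:start + span:2])
    # Normalize every frame to 224 x 224 pixels, as required by the network.
    return [[cv2.resize(f, (224, 224)) for f in g] for g in groups]
```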
step three, constructing and training the graph convolution and trajectory convolution network
The network model is divided into three parts: a dual-stream feature extraction layer, a candidate trajectory extraction layer, and a target localization layer. The dual-stream feature extraction layer extracts features jointly with a graph convolution structure and a trajectory convolution structure. The specific operation of the graph convolution is as follows: the target object is first divided into graph nodes, also called parts; specifically, it is divided into M grids of equal size, each grid forming a graph node of identical structure, and an undirected weighted graph G(v, W) is constructed, consisting of the graph nodes v and the weights W of the edges connecting them. The edges between graph nodes of consecutive frames in the 16-frame sequence are weight-initialized with values in {0,1}, i.e. W_ij ∈ {0,1}, where i is a graph node of frame t and j is a graph node of frame t+1. Each graph node is connected only to its four directly adjacent graph nodes, with weight 1 on these edges and 0 elsewhere. The network structure adopts the first five layers of an AlexNet pre-trained on ImageNet, followed by two graph convolution layers; the output feature is computed as F = WX, where X is the feature of each graph node after the five AlexNet layers. Each frame yields h x w x 256 graph convolution features, and the 16 consecutive frames finally yield T x h x w x 256 graph convolution features, where T is the number of frames in the video image sequence (here T = 16), h is the feature height, w is the feature width, and 256 is the number of feature channels;
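As an illustration of the grid graph and the F = WX propagation described above, the following sketch builds the {0,1} four-neighbour adjacency for an m_h x m_w grid of graph nodes (M = m_h * m_w) and applies one propagation step; the function name and the toy feature matrix are assumptions for demonstration.

```python
import numpy as np

def grid_adjacency(m_h, m_w):
    """Undirected {0,1} weight matrix W for an m_h x m_w grid of graph nodes.

    W[i, j] = 1 only when nodes i and j are directly adjacent
    (up/down/left/right), matching the initialization in step three.
    """
    M = m_h * m_w
    W = np.zeros((M, M), dtype=np.float32)
    for r in range(m_h):
        for c in range(m_w):
            i = r * m_w + c
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < m_h and 0 <= cc < m_w:
                    W[i, rr * m_w + cc] = 1.0
    return W

# X: per-node features from the first five AlexNet layers (toy values here).
m_h, m_w = 3, 3                      # M = 9 graph nodes
X = np.random.randn(m_h * m_w, 256).astype(np.float32)
W = grid_adjacency(m_h, m_w)
F = W @ X                            # one propagation step, F = WX
print(F.shape)                       # (9, 256)
```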
the specific operation of the trajectory convolution is: knowing the target position of each frame of image in the 16 frames of images, wherein each target position is represented as x, y, w, h, the x, y, w, h respectively represent the central abscissa, the central ordinate, the width and the height of the target position, and connecting the target positions between the front frame and the rear frame of the continuous 16 frames of images to obtain a target motion track; the trajectory convolution is adopted inTop five layers of a pre-trained C3D network on ImageNet, given an input profile x at time tt(p) the output characteristic map is yt(p), convolution kernel parameters of trajectory convolution { Wτ:τ∈[0,Δt]And kernel parameter size Δ t-1, where Δ t is 16, output profile yt(p) is calculated as
Figure BDA0002213973050000021
The graph convolution features obtained from each frame are input into the trajectory convolution, finally producing the trajectory convolution features of the 16 consecutive frames, whose dimension is T x h x w x 256; the graph convolution features and trajectory convolution features are then concatenated to form a T x h x w x 512 dimensional feature, where T is the number of frames in the video image sequence (T = 16), h is the feature height, w is the feature width, and 512 is the number of feature channels;
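The trajectory convolution formula can be illustrated with a simplified single-channel sketch: a temporal convolution whose sampling positions follow the target motion trajectory instead of a fixed location. This is not the C3D-based implementation of the patent; traj (per-frame integer target centres) and the border handling via np.roll are simplifying assumptions.

```python
import numpy as np

def trajectory_convolution(feature_maps, traj, kernel):
    """Single-channel sketch of y_t(p) = sum_tau W_tau * x_{t-tau}(p-tilde).

    feature_maps: array (T, H, W), per-frame feature maps x_t.
    traj: int array (T, 2), per-frame (row, col) target centre positions.
    kernel: array (dt + 1,), temporal weights W_tau.
    Returns y: (T, H, W), features aggregated along the motion path.
    """
    T, H, W = feature_maps.shape
    dt = len(kernel) - 1
    y = np.zeros_like(feature_maps)
    for t in range(T):
        for tau in range(dt + 1):
            if t - tau < 0:
                continue  # zero padding before the start of the clip
            # Position p in frame t corresponds to p + d in frame t - tau,
            # where d is the trajectory displacement between the two frames.
            dr = traj[t - tau, 0] - traj[t, 0]
            dc = traj[t - tau, 1] - traj[t, 1]
            # np.roll wraps at the borders; a real implementation would use
            # bilinear sampling with proper boundary handling.
            shifted = np.roll(feature_maps[t - tau], (-dr, -dc), axis=(0, 1))
            y[t] += kernel[tau] * shifted
    return y
```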
taking the target position of the previous frame image as the center, forming a target attention area in the current input frame by taking the target position of the previous frame image as the center and taking 4 times of the target, obtaining target candidate blocks in the target attention area by adopting a sliding search window method, wherein the length-width ratios of the adopted search windows are respectively 1:1, 1:2 and 2:1, moving from the initial coordinate position of the target attention area until the target attention area is searched, taking the image blocks selected by the search window as the target candidate blocks, normalizing the scales of the image blocks into the size same as that of a target object, connecting each target candidate block with the target positions of the previous 16 frames of images to form a new target motion track, then passing the double-flow characteristics of the continuous 17 frames of images through an LSTM structure to obtain N target candidate tracks, wherein the dimensions are Nx 4, N is the number of the target candidate tracks, and 4 represents 4 position coordinates of the target position of each frame of image, in particular, setting the loss function of the target candidate trajectory network as
Figure BDA0002213973050000031
T is the image frame number, Delta theta is the deviation of the predicted value and the true value of the coordinate, and the position coordinate of the target candidate block of the current input frame is represented as x0,y0,w0,h0Wherein x is0,y0,w0,h0Respectively representing the center abscissa of the target candidate block,Center ordinate, width and height, while the offset value for coordinate prediction is Δ x0,Δy0,Δw0,Δh0Then the coordinates of each target candidate block are x0+Δx0,y0+Δy0,w0+Δw0,h0+Δh0Connecting target motion tracks of continuous 16-frame images with each target candidate block to form target candidate tracks, finally obtaining N target candidate tracks by learning a target motion rule, inputting double-current characteristics of the N target candidate tracks into a full-connection layer for classification, and setting a network classification loss function as cross entropy loss;
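The sliding-window generation of target candidate blocks inside the 4x attention region can be sketched as follows; the stride value and the interpretation of the 1:1, 1:2, 2:1 aspect ratios relative to the target size are assumptions for illustration.

```python
def candidate_blocks(prev_box, stride=8):
    """Enumerate target candidate blocks inside the 4x attention region.

    prev_box: (cx, cy, w, h), the target position in the previous frame.
    Search windows of aspect ratio 1:1, 1:2 and 2:1 (taken here relative to
    the previous target size) slide over the region with the given stride.
    Returns a list of candidate boxes as (cx, cy, w, h) tuples.
    """
    cx, cy, w, h = prev_box
    region_w, region_h = 4 * w, 4 * h              # attention region size
    x0, y0 = cx - region_w / 2, cy - region_h / 2  # region top-left corner
    boxes = []
    for ww, wh in ((w, h), (w, 2 * h), (2 * w, h)):  # 1:1, 1:2, 2:1
        y = y0 + wh / 2
        while y + wh / 2 <= y0 + region_h:
            x = x0 + ww / 2
            while x + ww / 2 <= x0 + region_w:
                # Each selected block is later rescaled to the target size.
                boxes.append((x, y, ww, wh))
                x += stride
            y += stride
    return boxes
```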
after the network is constructed, training the network by using the training data set generated in the second step, wherein the training method adopts a classical random gradient descent method, after the training is finished, the network outputs the confidence (namely, the similarity) of each target candidate track, then selects the target candidate track with the maximum confidence as a target motion track, and then takes the target position of the last frame image of the target motion track as a target image block, so as to obtain the initial capability of target positioning;
step four, inputting image sequence
After the graph convolution and trajectory convolution network has been trained, in the case of real-time processing, the video images captured by the camera and saved in the storage area are extracted as the input images to be tracked; in the case of offline processing, the acquired video file is decomposed into an image sequence of individual frames, from which 16 consecutive frames are extracted in temporal order as the input image sequence; if a full sequence of 16 input frames cannot be formed, the whole process stops;
step five, generating target candidate trajectories
The target object in the 16 consecutive frames is divided into M graph nodes according to the method of step three, and the target object positions between consecutive frames of the 16 images are connected to obtain the target motion trajectory; these are input into the dual-stream feature extraction layer, which extracts dual-stream features of dimension T x h x w x 512, and the candidate trajectory extraction layer of the graph convolution and trajectory convolution network then yields N target candidate trajectories of dimension N x 4, where N is the number of target candidate trajectories and 4 denotes the 4 position coordinates of the target in each frame;
step six, target positioning
The target candidate trajectories obtained in step five are classified by the fully connected layer; the network outputs the confidence of each candidate trajectory, the candidate trajectory with the highest confidence is selected as the target motion trajectory, and the target position in its last frame is taken as the target image block, at which point the target localization is completed;
step seven, network online updating
After the tracked target result is successfully determined, the target object and position coordinates of the current input frame obtained in step six are appended to the end of the 16-frame image sequence of the initial training set, while the first frame of the sequence is deleted, updating it to a new training set denoted (I_2, ..., I_17); the process then jumps to step four, a new training set of 16 consecutive frames is obtained, the target motion trajectory is dynamically adjusted in real time, and online learning fine-tunes and updates the network before a new round of target localization is performed.
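The step-seven sliding-window update of the 16-frame training sequence can be sketched with a deque; fine_tune below stands in for the network's online learning step and is an assumed helper, not defined in the patent.

```python
from collections import deque

# Sliding 16-frame window of (frame, target_box) pairs, e.g. (I_1, ..., I_16).
window = deque(maxlen=16)  # appending frame 17 drops frame 1 automatically

def online_update(window, new_frame, new_box, fine_tune):
    """Append the newly localized frame, drop the oldest, then fine-tune.

    After the update the window holds (I_2, ..., I_17), matching step seven;
    fine_tune performs the network's online learning on the updated set.
    """
    window.append((new_frame, new_box))
    if len(window) == window.maxlen:
        fine_tune(list(window))
    return window
```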
The advantages and positive effects of the invention are as follows. A target tracking method based on graph convolution and trajectory convolution network learning is provided. The method trains a graph convolution and trajectory convolution network model offline on a training data set; the network comprises a dual-stream feature extraction layer, a target candidate trajectory extraction layer, and a target localization layer. Graph convolution extracts the spatial features of the target in each frame, trajectory convolution extracts the trajectory features over consecutive video frames, a long short-term memory (LSTM) recurrent structure produces target candidate trajectories that follow the target motion pattern, and a fully connected layer classifies them. The network outputs the confidence of each candidate trajectory, selects the one with the highest confidence as the target motion trajectory, and takes the target block of its last frame as the target image block; after training, the network has an initial target localization capability. During tracking, the spatial features and motion trajectory features of the target over 16 consecutive frames are extracted and concatenated into dual-stream features, candidate trajectories following the target motion pattern are obtained through the LSTM structure, the network outputs the confidence of each candidate trajectory, the highest-confidence candidate is selected as the target motion trajectory, and the target block of its last frame becomes the target image block, completing target localization and thereby tracking the target object. During online learning, the network model is fine-tuned with the tracked target image blocks, so that it can dynamically adjust the target motion trajectory and better adapt to the current image sequence.
The network model fully extracts the features of a continuously moving target, including its spatial position features and motion trajectory features. In the feature extraction, larger weights can be assigned to the more discriminative parts of the target for tracking. Meanwhile, candidate trajectories are generated within the motion constraint range implied by the target motion trajectory, which reduces the probability of target drift or even target loss, greatly reduces the computation required for target localization, and improves the robustness and accuracy of tracking. The invention can handle complex tracking scenes, achieve accurate long-term real-time tracking, and cope with occlusion, drift, and similar problems during tracking. In addition, the method can be used for both single-target and multi-target tracking in complex scenes.
Drawings
FIG. 1 is a schematic diagram of the graph node connections of the present invention
FIG. 2 is a block diagram of the present invention
FIG. 3 is a flow chart of the present invention
Detailed Description
The method can be used in many visual target tracking applications, both military and civilian: military fields such as unmanned aircraft, precision guidance, and air early warning; civilian fields such as mobile robots, intelligent video surveillance of traction substations, intelligent transportation systems, and intelligent security. Take intelligent video surveillance of a traction substation as an example. Such surveillance involves several important automatic analysis tasks, including intrusion detection, behavior analysis, and abnormality alarms, all of which depend on real-time, stable target tracking. The tracking method of the invention can be adopted for this purpose. Specifically, a graph convolution and trajectory convolution network model is first constructed, comprising a dual-stream feature extraction layer, a target candidate trajectory extraction layer, and a target localization layer, as shown in FIG. 2. Targets in the substation surveillance video are then manually annotated to obtain a corresponding training data set, and the network is trained on this set with stochastic gradient descent; after training, the network initially has the corresponding target localization capability. During tracking, the spatial features and motion trajectory features of the target over 16 consecutive frames are extracted and concatenated into dual-stream features; candidate trajectories following the target motion pattern are obtained through the LSTM structure and classified by the fully connected layer; the network outputs the confidence of each candidate trajectory, selects the highest-confidence one as the target motion trajectory, and takes the target block of its last frame as the target image block, completing localization and thus tracking. During online learning, the model is fine-tuned with the tracked target image blocks so that it dynamically adjusts the target motion trajectory, better adapts to the actual surveillance image sequence in the substation, and effectively improves the robustness and accuracy of tracking. The invention can handle complex tracking scenes, achieve accurate long-term real-time tracking, and cope with occlusion and drift during tracking; it can also be used for single-target and multi-target tracking in complex scenes.
The method can be implemented by programming in any computer programming language (such as C), and tracking system software based on the method can realize real-time target tracking applications on any PC or embedded system.

Claims (1)

1. A target tracking method based on graph convolution and trajectory convolution network learning comprises the following steps:
step one, target selection
A target object to be tracked is selected and determined from the initial image sequence; the target object is either extracted automatically by a moving-target detection method or specified manually by a human-computer interaction method;
step two, generation of training data set
The generation of the training data set has two steps: first the data set is selected, and then the training samples are constructed. The large classification and recognition video data set ImageNet Video is selected, in which all images are annotated with the position coordinates of the corresponding target object, and the training data set is then built from these known labels. The data set contains 4500 videos in total, and training samples are drawn from each video according to two different selection rules: 16 consecutive frames (I_1, I_2, ..., I_16) are taken as one group of training data, and one frame out of every two, i.e. (I_1, I_3, ..., I_31), is taken as another group, where I denotes a frame image and each sampled group contains 16 frames. In total, 56250 groups of training data are generated, and the image frames are normalized to 224 x 224 pixels;
step three, constructing and training the graph convolution and trajectory convolution network
The network model is divided into three parts: a dual-stream feature extraction layer, a candidate trajectory extraction layer, and a target localization layer. The dual-stream feature extraction layer extracts features jointly with a graph convolution structure and a trajectory convolution structure. The specific operation of the graph convolution is as follows: the target object is first divided into graph nodes, also called parts; specifically, it is divided into M grids of equal size, each grid forming a graph node of identical structure, and an undirected weighted graph G(v, W) is constructed, consisting of the graph nodes v and the weights W of the edges connecting them. The edges between graph nodes of consecutive frames in the 16-frame sequence are weight-initialized with values in {0,1}, i.e. W_ij ∈ {0,1}, where i is a graph node of frame t and j is a graph node of frame t+1. Each graph node is connected only to its four directly adjacent graph nodes, with weight 1 on these edges and 0 elsewhere. The network structure adopts the first five layers of an AlexNet pre-trained on ImageNet, followed by two graph convolution layers; the output feature is computed as F = WX, where X is the feature of each graph node after the five AlexNet layers. Each frame yields h x w x 256 graph convolution features, and the 16 consecutive frames finally yield T x h x w x 256 graph convolution features, where T is the number of frames in the video image sequence (here T = 16), h is the feature height, w is the feature width, and 256 is the number of feature channels;
the specific operation of the trajectory convolution is: knowing the target position of each frame of image in the 16 frames of images, wherein each target position is represented as x, y, w, h, the x, y, w, h respectively represent the central abscissa, the central ordinate, the width and the height of the target position, and connecting the target positions between the front frame and the rear frame of the continuous 16 frames of images to obtain a target motion track; trace convolution takes the first five layers of a C3D network pre-trained on ImageNet, given an input feature map x at time tt(p) the output characteristic map is yt(p), convolution kernel parameters of trajectory convolution { Wτ:τ∈[0,Δt]And kernel parameter size Δ t-1, where Δ t is 16, output profile yt(p) is calculated as
Figure FDA0002213973040000011
The graph convolution features obtained from each frame are input into the trajectory convolution, finally producing the trajectory convolution features of the 16 consecutive frames, whose dimension is T x h x w x 256; the graph convolution features and trajectory convolution features are then concatenated to form a T x h x w x 512 dimensional feature, where T is the number of frames in the video image sequence (T = 16), h is the feature height, w is the feature width, and 512 is the number of feature channels;
taking the target position of the image of the previous frame as the center, forming a target attention area in the current input frame by taking the target position as 4 times of the target, acquiring target candidate blocks in the target attention area by adopting a sliding search window method, wherein the aspect ratios of the adopted search windows are respectively1:1, 1:2 and 2:1, moving from an initial coordinate position of a target attention area until the target attention area is searched, taking an image block selected by a search window as a target candidate block, normalizing the dimension of the image block to be the same as that of a target object, connecting each target candidate block with the target position of the previous 16 frames of images to form a new target motion track, then passing the double-flow characteristics of the continuous 17 frames of images through an LSTM (local Scale invariant feature) structure to obtain N target candidate tracks with dimensions of Nx 4, wherein N is the number of the target candidate tracks, 4 represents 4 position coordinates of the target position of each frame of images, and specifically, setting the loss function of a target candidate track network as the loss function
Figure FDA0002213973040000021
T is the image frame number, Delta theta is the deviation of the predicted value and the true value of the coordinate, and the position coordinate of the target candidate block of the current input frame is represented as x0,y0,w0,h0Wherein x is0,y0,w0,h0Respectively representing the center abscissa, center ordinate, width and height of the target candidate block, and the offset value of coordinate prediction is Δ x0,Δy0,Δw0,Δh0Then the coordinates of each target candidate block are x0+Δx0,y0+Δy0,w0+Δw0,h0+Δh0Connecting target motion tracks of continuous 16-frame images with each target candidate block to form target candidate tracks, finally obtaining N target candidate tracks by learning a target motion rule, inputting double-current characteristics of the N target candidate tracks into a full-connection layer for classification, and setting a network classification loss function as cross entropy loss;
after the network is constructed, training the network by using the training data set generated in the second step, wherein the training method adopts a classical random gradient descent method, after the training is finished, the network outputs the confidence coefficient of each target candidate track, then selects the target candidate track with the maximum confidence coefficient as a target motion track, and then takes the target position of the last frame image of the target motion track as a target image block, so as to obtain the initial capability of target positioning;
step four, inputting image sequence
After the graph convolution and trajectory convolution network has been trained, in the case of real-time processing, the video images captured by the camera and saved in the storage area are extracted as the input images to be tracked; in the case of offline processing, the acquired video file is decomposed into an image sequence of individual frames, from which 16 consecutive frames are extracted in temporal order as the input image sequence; if a full sequence of 16 input frames cannot be formed, the whole process stops;
step five, generating target candidate trajectories
The target object in the 16 consecutive frames is divided into M graph nodes according to the method of step three, and the target object positions between consecutive frames of the 16 images are connected to obtain the target motion trajectory; these are input into the dual-stream feature extraction layer, which extracts dual-stream features of dimension T x h x w x 512, and the candidate trajectory extraction layer of the graph convolution and trajectory convolution network then yields N target candidate trajectories of dimension N x 4, where N is the number of target candidate trajectories and 4 denotes the 4 position coordinates of the target in each frame;
step six, target positioning
The target candidate trajectories obtained in step five are classified by the fully connected layer; the network outputs the confidence of each candidate trajectory, the candidate trajectory with the highest confidence is selected as the target motion trajectory, and the target position in its last frame is taken as the target image block, at which point the target localization is completed;
step seven, network online updating
After the tracked target result is successfully determined, the target object and position coordinates of the current input frame obtained in step six are appended to the end of the 16-frame image sequence of the initial training set, while the first frame of the sequence is deleted, updating it to a new training set denoted (I_2, ..., I_17); the process then jumps to step four, a new training set of 16 consecutive frames is obtained, the target motion trajectory is dynamically adjusted in real time, and online learning fine-tunes and updates the network before a new round of target localization is performed.
Priority Applications (1)

Application Number: CN201910908419.XA; Priority Date: 2019-09-25; Filing Date: 2019-09-25; Title: Target tracking method based on graph convolution and trajectory convolution network learning; Status: Active; Granted as: CN110660082B

Publications (2)

CN110660082A, published 2020-01-07
CN110660082B, published 2022-03-08

Family ID: 69039024 (Family Applications: 1)

Country Status (1)

CN: CN110660082B (en)

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111291631A (en) * 2020-01-17 2020-06-16 北京市商汤科技开发有限公司 Video analysis method and related model training method, device and apparatus
CN111339449A (en) * 2020-03-24 2020-06-26 青岛大学 User motion trajectory prediction method, device, equipment and storage medium
CN111382318A (en) * 2020-03-14 2020-07-07 平顶山学院 Dynamic community detection method based on information dynamics
CN111524164A (en) * 2020-04-21 2020-08-11 北京爱笔科技有限公司 Target tracking method and device and electronic equipment
CN111626121A (en) * 2020-04-24 2020-09-04 上海交通大学 Complex event identification method and system based on multi-level interactive reasoning in video
CN111862153A (en) * 2020-07-10 2020-10-30 电子科技大学 Long-time multi-target tracking method for pedestrians
CN111881840A (en) * 2020-07-30 2020-11-03 北京交通大学 Multi-target tracking method based on graph network
CN112465006A (en) * 2020-11-24 2021-03-09 中国人民解放军海军航空大学 Graph neural network target tracking method and device
CN112541449A (en) * 2020-12-18 2021-03-23 天津大学 Pedestrian trajectory prediction method based on unmanned aerial vehicle aerial photography view angle
CN112597796A (en) * 2020-11-18 2021-04-02 中国石油大学(华东) Robust point cloud representation learning method based on graph convolution
CN112598698A (en) * 2021-03-08 2021-04-02 南京爱奇艺智能科技有限公司 Long-time single-target tracking method and system
CN113221676A (en) * 2021-04-25 2021-08-06 中国科学院半导体研究所 Target tracking method and device based on multi-dimensional features
CN113253684A (en) * 2021-05-31 2021-08-13 杭州蓝芯科技有限公司 Multi-AGV (automatic guided vehicle) scheduling method and device based on graph convolution neural network and electronic equipment
CN113362368A (en) * 2021-07-26 2021-09-07 北京邮电大学 Crowd trajectory prediction method based on multi-level space-time diagram neural network
CN113435356A (en) * 2021-06-30 2021-09-24 吉林大学 Track prediction method for overcoming observation noise and perception uncertainty
CN113505812A (en) * 2021-06-11 2021-10-15 国网浙江省电力有限公司嘉兴供电公司 High-voltage circuit breaker track action identification method based on double-current convolutional network
CN113910224A (en) * 2021-09-30 2022-01-11 达闼科技(北京)有限公司 Robot following method and device and electronic equipment
CN114789440A (en) * 2022-04-22 2022-07-26 深圳市正浩创新科技股份有限公司 Target docking method, device, equipment and medium based on image recognition
CN114897941A (en) * 2022-07-13 2022-08-12 长沙超创电子科技有限公司 Target tracking method based on Transformer and CNN
CN117079196A (en) * 2023-10-16 2023-11-17 长沙北斗产业安全技术研究院股份有限公司 Unmanned aerial vehicle identification method based on deep learning and target motion trail


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102663492A (en) * 2012-03-19 2012-09-12 南京理工大学常熟研究院有限公司 Maneuvering target tracking system based on nerve network data fusion
CN104794737A (en) * 2015-04-10 2015-07-22 电子科技大学 Depth-information-aided particle filter tracking method
CN104778699A (en) * 2015-04-15 2015-07-15 西南交通大学 Adaptive object feature tracking method
CN107818571A (en) * 2017-12-11 2018-03-20 珠海大横琴科技发展有限公司 Ship automatic tracking method and system based on deep learning network and average drifting
US10176405B1 (en) * 2018-06-18 2019-01-08 Inception Institute Of Artificial Intelligence Vehicle re-identification techniques using neural networks for image analysis, viewpoint-aware pattern recognition, and generation of multi- view vehicle representations
CN109800689A (en) * 2019-01-04 2019-05-24 西南交通大学 A kind of method for tracking target based on space-time characteristic fusion study

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party

Title
Limin Wang et al., "Action recognition with trajectory-pooled deep-convolutional descriptors", 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) *
闵召阳 et al., "Single-camera multi-target tracking algorithm based on convolutional neural network detection" (基于卷积神经网络检测的单镜头多目标跟踪算法), Ship Electronic Engineering (舰船电子工程) *


Also Published As

Publication Number: CN110660082B (en); Publication Date: 2022-03-08

Similar Documents

Publication Publication Date Title
CN110660082B (en) Target tracking method based on graph convolution and trajectory convolution network learning
CN109800689B (en) Target tracking method based on space-time feature fusion learning
CN110298404B (en) Target tracking method based on triple twin Hash network learning
CN107122736B (en) Human body orientation prediction method and device based on deep learning
CN110569793A (en) Target tracking method for unsupervised similarity discrimination learning
CN111161315B (en) Multi-target tracking method and system based on graph neural network
CN107146237B (en) Target tracking method based on online state learning and estimation
CN105528794A (en) Moving object detection method based on Gaussian mixture model and superpixel segmentation
CN106570490B (en) A kind of pedestrian's method for real time tracking based on quick clustering
CN107833239B (en) Optimization matching target tracking method based on weighting model constraint
CN109993770B (en) Target tracking method for adaptive space-time learning and state recognition
CN108537825B (en) Target tracking method based on transfer learning regression network
CN112052802A (en) Front vehicle behavior identification method based on machine vision
CN109493370B (en) Target tracking method based on space offset learning
CN109272036B (en) Random fern target tracking method based on depth residual error network
CN113096159B (en) Target detection and track tracking method, model and electronic equipment thereof
CN114820765A (en) Image recognition method and device, electronic equipment and computer readable storage medium
CN112507859B (en) Visual tracking method for mobile robot
Gong et al. Multi-target trajectory tracking in multi-frame video images of basketball sports based on deep learning
CN110197121A (en) Moving target detecting method, moving object detection module and monitoring system based on DirectShow
CN110111358B (en) Target tracking method based on multilayer time sequence filtering
CN113379795A (en) Multi-target tracking and segmenting method based on conditional convolution and optical flow characteristics
Casagrande et al. Abnormal motion analysis for tracking-based approaches using region-based method with mobile grid
Elbaşi Fuzzy logic-based scenario recognition from video sequences
CN110378938A (en) A kind of monotrack method based on residual error Recurrent networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant