CN111161315A - Multi-target tracking method and system based on graph neural network - Google Patents

Multi-target tracking method and system based on graph neural network

Info

Publication number
CN111161315A
CN111161315A (application CN201911312114.9A)
Authority
CN
China
Prior art keywords
neural network
appearance
motion
similarity matrix
edge
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911312114.9A
Other languages
Chinese (zh)
Other versions
CN111161315B (en)
Inventor
蒋婷婷
高旭
李佳河
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University filed Critical Peking University
Priority to CN201911312114.9A priority Critical patent/CN111161315B/en
Publication of CN111161315A publication Critical patent/CN111161315A/en
Application granted granted Critical
Publication of CN111161315B publication Critical patent/CN111161315B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/20 Analysis of motion
    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]

Abstract

The application discloses a multi-target tracking method and system based on a graph neural network. The method comprises: preprocessing a training set to obtain a node set, an edge set and a global variable; inputting the node set, the edge set and the global variable into a graph neural network to obtain an appearance similarity matrix and a motion similarity matrix; training the graph neural network with an optimization algorithm according to a loss function, the appearance similarity matrix and the motion similarity matrix, determining the setting parameters of the graph neural network, and then retraining to obtain the trained graph neural network; processing a data set with the trained graph neural network to obtain a similarity matrix; and calculating a matching result of the targets in the data set from the similarity matrix with a matching algorithm. The graph neural network updates the node set, the edge set and the global variable through the optimization algorithm and the loss function, and a global variable is introduced and updated in addition to the nodes and edges, so the graph neural network can capture global information and improve the performance of multi-target tracking.

Description

Multi-target tracking method and system based on graph neural network
Technical Field
The application relates to the field of multi-target tracking, in particular to a multi-target tracking method and system based on a graph neural network.
Background
With the development of multimedia technology and the spread of video acquisition equipment, the volume of video and other multimedia content keeps growing, and such content plays an ever larger role in daily life. Tracking multiple targets in a video is a basic and important task in fields such as motion analysis, human-computer interaction, video surveillance (e.g. abnormal-behavior recognition) and autonomous driving. Video multi-target tracking therefore has great value both in theoretical research and in practical applications.
Tracking tasks fall into two classes: single-target tracking and multi-target tracking. Single-target tracking follows one initially specified target, whereas multi-target tracking follows all targets in the field of view. Both must cope with occlusion of the tracked targets and with appearance changes caused by motion blur. Although single-target tracking methods can in principle be applied to the multi-target tracking task, naively doing so is not practical. Unlike single-target tracking, the number of tracked targets in multi-target tracking changes over time: the method must determine how many targets are present in the current frame, which targets have left the field of view and which have entered it. Targets in multi-target tracking are also occluded more frequently, and more of them have similar appearance. Simply reusing a single-target tracking method for the multi-target task therefore tends to give poor tracking performance.
Video multi-target tracking methods are generally classified into offline, online and near-online methods. Offline methods use the information of all frames of the video. In practice, however, targets usually need to be tracked in real time rather than after the whole video has been acquired; this calls for online methods, which use only the current frame and the frames before it. Because offline methods use more information, their tracking performance is generally better than that of online methods. Near-online methods were devised to balance tracking performance against real-time requirements: they sit between offline and online methods. Compared with online methods, they also use information from a short period after the current frame, so they use more information and can achieve better tracking performance; compared with offline methods, they do not have to wait for the entire video and can track in real time with a fixed delay.
Multi-target tracking strategies comprise the detect-then-track (tracking-by-detection) strategy and the model-free strategy. In detect-then-track, targets are first detected in every frame of the video and given as bounding boxes, and tracking is performed afterwards; the goal is then to link the bounding boxes that belong to the same target across frames. In the model-free strategy, only the bounding boxes of the targets to be tracked in the first frame are given, and the tracking method must then track these targets itself. During tracking, a target may leave the field of view at any time and new targets may enter it. Although the model-free strategy better matches the tracking scenario, it still requires an initial bounding box for every newly entered target: fewer annotated boxes are needed than for detect-then-track, but an initial box is always needed to start tracking a target. For this reason, most multi-target tracking methods adopt the detect-then-track strategy.
Existing multi-target tracking methods mainly use three kinds of feature information: appearance features, motion features and interaction features. Appearance features comprise traditional features and depth features; traditional appearance features include color histograms (RGB histograms, HSV histograms, etc.), shape and texture, while depth features are extracted by a pre-trained deep neural network. Motion features model the movement of the target; motion models comprise linear and nonlinear models, where a linear model assumes that the target moves at constant speed. Interaction features model the relationships among multiple targets in a video, e.g. that members of a group move in the same direction at the same speed. All three kinds of features can provide useful information for multi-target tracking.
Multi-target tracking methods fall into two broad categories: traditional methods and methods based on deep neural networks. With the growth of basic data sets and of computing power, deep neural networks have been put into practice, and on many visual tasks their performance has surpassed traditional methods and even humans; on image classification, for example, deep methods classify more accurately than humans. Methods based on deep neural networks have therefore become the mainstream of multi-target tracking. Most of them adopt the detect-then-track strategy and turn the multi-target tracking task into a bipartite-graph matching problem, so they must compute the similarity between two bounding boxes accurately. That similarity is computed from the features of the bounding boxes: more accurate features give more accurate similarities, hence a more accurate similarity matrix and better tracking performance. To obtain better features, deep multi-target tracking methods have incorporated single-target trackers, attention mechanisms, LSTMs, Re-ID features and the like into their designs.
The multi-target tracking task can also be completed with graph models such as minimum-cost flow and conditional random fields. In these graph models, however, the structure of the graph is fixed, so the nodes and edges are fixed as well, and nodes and edges with inaccurate features cannot be corrected.
In view of the foregoing, it is desirable to provide an online or near-online multi-target tracking method and system that can update nodes and edges and achieve high accuracy.
Disclosure of Invention
In order to solve the problems, the application provides a multi-target tracking method and system based on a graph neural network.
In one aspect, the application provides a multi-target tracking method based on a graph neural network, including:
preprocessing a training set to obtain a node set, an edge set and a global variable;
inputting the node set, the edge set and the global variable to a graph neural network to obtain an appearance similarity matrix and a motion similarity matrix;
training the graph neural network according to a loss function, the appearance similarity matrix and the motion similarity matrix by using an optimization algorithm, determining the setting parameters of the graph neural network, and then retraining to obtain the trained graph neural network;
processing a data set by using the trained graph neural network to obtain a similarity matrix;
and calculating the similarity matrix by using a matching algorithm to obtain a matching result of the targets in the data set.
Preferably, preprocessing the training set to obtain the node set, the edge set and the global variable comprises:
dividing the training set into a first training set and a second training set;
extracting depth features of the bounding boxes in every frame of the first training set by using a preprocessing neural network to obtain an appearance node set;
calculating the overlapping degree between the bounding boxes of adjacent frames in the first training set to obtain an edge set, with one edge between each pair of boxes;
extracting the position, size and displacement of the bounding boxes in all frames of the first training set to obtain the motion features of each bounding box;
normalizing all motion features to obtain a motion node set;
and randomly initializing a one-dimensional vector as the global variable.
Preferably, inputting the node set, the edge set and the global variable into the graph neural network to obtain the appearance similarity matrix and the motion similarity matrix comprises:
inputting an appearance node set, an edge set and a global variable into an appearance-feature graph neural network to obtain an appearance similarity matrix;
and inputting a motion node set, the edge set and the global variable into a motion-feature graph neural network to obtain a motion similarity matrix.
Preferably, inputting the appearance node set, the edge set and the global variable into the appearance-feature graph neural network to obtain the appearance similarity matrix comprises:
inputting each edge in the edge set, the two appearance nodes connected by that edge, and the global variable into a first edge neural network to obtain first updated edges;
inputting each first updated edge, the two appearance nodes it connects, and the global variable into a node neural network to obtain first updated nodes;
fusing all first updated edges to obtain a second updated edge;
fusing all sending nodes among the appearance nodes with the obtained first updated nodes to obtain a second updated node;
inputting the global variable, the second updated node and the second updated edge into a first global neural network to obtain a first updated global variable;
and inputting the first updated global variable, all sending nodes among the appearance nodes, the first updated nodes and the first updated edges into a second edge neural network to obtain the appearance similarity matrix.
Preferably, inputting the motion node set, the edge set and the global variable into the motion-feature graph neural network to obtain the motion similarity matrix comprises:
fusing all edges in the edge set to obtain a third updated edge;
fusing all motion nodes in the motion node set to obtain a third updated node;
inputting the global variable, the third updated edge and the third updated node into a second global neural network to obtain a second updated global variable;
and inputting each edge in the edge set, the two motion nodes it connects, and the second updated global variable into a third edge neural network to obtain the motion similarity matrix.
Preferably, training the graph neural network with an optimization algorithm according to a loss function, the appearance similarity matrix and the motion similarity matrix, determining the setting parameters of the graph neural network, and then retraining to obtain the trained graph neural network comprises:
training an appearance-feature graph neural network with an optimization algorithm according to an appearance loss function and the appearance similarity matrix;
if the appearance loss is lower than an appearance loss threshold, determining the setting parameters of the appearance-feature graph neural network with a second training set;
retraining the appearance-feature graph neural network with the determined setting parameters according to the appearance loss function and the training set, and, if the resulting appearance retraining loss is lower than the appearance loss threshold, obtaining the trained appearance-feature graph neural network;
training a motion-feature graph neural network with the optimization algorithm according to a motion loss function and the motion similarity matrix;
if the motion loss is lower than a motion loss threshold, determining the setting parameters of the motion-feature graph neural network with the second training set;
retraining the motion-feature graph neural network with the determined setting parameters according to the motion loss function and the training set, and, if the resulting motion retraining loss is lower than the motion loss threshold, obtaining the trained motion-feature graph neural network;
and when both the appearance-feature graph neural network and the motion-feature graph neural network have been trained, obtaining the trained graph neural network.
Preferably, determining the appearance loss function comprises:
determining a first loss function according to the target similarity and the appearance similarity matrix;
determining a node difference according to each receiving node among the appearance nodes and the first updated node corresponding to that receiving node;
determining a second loss function according to the node difference and the target similarity;
and determining the appearance loss function according to the first loss function and the second loss function.
Preferably, processing the data set with the trained graph neural network to obtain the similarity matrix comprises:
processing the data set with the trained graph neural network to obtain an appearance similarity matrix and a motion similarity matrix of the data set;
and taking a weighted sum of the appearance similarity matrix and the motion similarity matrix of the data set to obtain the similarity matrix.
Preferably, after calculating the matching result of the targets in the data set from the similarity matrix with the matching algorithm, the method further comprises:
recovering lost targets in the data set with a single-target tracker and/or a linear motion model.
In a second aspect, the present application provides a multi-target tracking system based on a graph neural network, including:
the preprocessing module is used for preprocessing the training set to obtain a node set, an edge set and a global variable;
the training module is used for inputting the node set, the edge set and the global variable into a graph neural network to obtain an appearance similarity matrix and a motion similarity matrix, training the graph neural network with an optimization algorithm according to a loss function, the appearance similarity matrix and the motion similarity matrix, determining the setting parameters of the graph neural network, and then retraining to obtain the trained graph neural network;
the processing module is used for processing a data set with the trained graph neural network to obtain a similarity matrix, and calculating a matching result of the targets in the data set from the similarity matrix with a matching algorithm.
The advantages of the application are as follows: the graph neural network updates the node set, the edge set and the global variable through an optimization algorithm and a loss function, and is retrained after its setting parameters have been determined, which improves its accuracy; introducing and updating a global variable in addition to the nodes and edges enriches the feature types and lets the graph neural network capture global information, which improves the accuracy of the similarities used for multi-target tracking and hence the tracking performance.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to denote like parts throughout the drawings. In the drawings:
FIG. 1 is a schematic diagram illustrating steps of a multi-target tracking method based on a graph neural network according to the present application;
FIG. 2 is a schematic diagram of the appearance-feature graph neural network of the multi-target tracking method based on a graph neural network provided by the present application;
FIG. 3 is a schematic diagram of the motion-feature graph neural network of the multi-target tracking method based on a graph neural network provided by the present application;
FIG. 4 is a schematic tracking flow diagram of a multi-target tracking method based on a graph neural network provided by the present application;
fig. 5 is a schematic diagram of a multi-target tracking system based on a graph neural network provided in the present application.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
In a first aspect, according to an embodiment of the present application, a multi-target tracking method based on a graph neural network is provided, as shown in FIG. 1, comprising:
S101, preprocessing a training set to obtain a node set, an edge set and a global variable;
S102, inputting the node set, the edge set and the global variable into a graph neural network (GNN) to obtain an appearance similarity matrix and a motion similarity matrix;
S103, training the graph neural network with an optimization algorithm according to a loss function, the appearance similarity matrix and the motion similarity matrix, determining the setting parameters of the graph neural network, and then retraining to obtain the trained graph neural network;
S104, processing a data set with the trained graph neural network to obtain a similarity matrix;
and S105, calculating a matching result of the targets in the data set from the similarity matrix with a matching algorithm.
Preprocessing the training set to obtain the node set, the edge set and the global variable comprises the following steps (a minimal code sketch is given after the list):
dividing the training set into a first training set and a second training set;
extracting depth features of the bounding boxes in every frame of the first training set with a preprocessing neural network to obtain the appearance node set;
calculating the overlapping degree (IoU) between the bounding boxes of adjacent frames in the first training set to obtain the edge set, with one edge between each pair of boxes;
extracting the position, size and displacement of the bounding boxes in all frames of the first training set to obtain the motion features of each bounding box;
normalizing all motion features to obtain the motion node set;
and randomly initializing a one-dimensional vector as the global variable.
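The following Python sketch illustrates this preprocessing: IoU-initialized edges, normalized motion nodes, and a randomly initialized global variable. It is a minimal sketch rather than the patented implementation; the `(x, y, w, h)` box format, the feature dimensions and the normalization by frame size are assumptions for illustration (the appearance nodes would come from a pre-trained CNN, which is omitted here).

```python
import numpy as np

def iou(box_a, box_b):
    # Boxes as (x, y, w, h); IoU = intersection area / union area.
    xa, ya, wa, ha = box_a
    xb, yb, wb, hb = box_b
    ix = max(0.0, min(xa + wa, xb + wb) - max(xa, xb))
    iy = max(0.0, min(ya + ha, yb + hb) - max(ya, yb))
    inter = ix * iy
    union = wa * ha + wb * hb - inter
    return inter / union if union > 0 else 0.0

def build_edges(boxes_prev, boxes_cur):
    # One edge per (previous box, current box) pair, initialized with IoU.
    return np.array([[iou(a, b) for b in boxes_cur] for a in boxes_prev])

def motion_node(box, velocity, frame_w, frame_h):
    # Motion feature: position, size and displacement, normalized by picture size.
    x, y, w, h = box
    vx, vy = velocity
    return np.array([x / frame_w, y / frame_h, w / frame_w, h / frame_h,
                     vx / frame_w, vy / frame_h])

rng = np.random.default_rng(0)
u = rng.normal(size=16)  # randomly initialized one-dimensional global variable
```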
Inputting the node set, the edge set and the global variable into the graph neural network to obtain the appearance similarity matrix and the motion similarity matrix comprises:
inputting the appearance node set, the edge set and the global variable into the appearance-feature graph neural network to obtain the appearance similarity matrix;
and inputting the motion node set, the edge set and the global variable into the motion-feature graph neural network to obtain the motion similarity matrix.
Inputting the appearance node set, the edge set and the global variable into the appearance-feature graph neural network to obtain the appearance similarity matrix comprises:
inputting each edge in the edge set, the two appearance nodes connected by that edge, and the global variable into a first edge neural network to obtain the first updated edges;
inputting each first updated edge, the two appearance nodes it connects, and the global variable into a node neural network to obtain the first updated nodes;
fusing all first updated edges to obtain the second updated edge;
fusing all sending nodes among the appearance nodes with the obtained first updated nodes to obtain the second updated node;
inputting the global variable, the second updated node and the second updated edge into a first global neural network to obtain the first updated global variable;
and inputting the first updated global variable, all sending nodes among the appearance nodes, the first updated nodes and the first updated edges into a second edge neural network to obtain the appearance similarity matrix.
Inputting the motion node set, the edge set and the global variable into the motion-feature graph neural network to obtain the motion similarity matrix comprises:
fusing all edges in the edge set to obtain the third updated edge;
fusing all motion nodes in the motion node set to obtain the third updated node;
inputting the global variable, the third updated edge and the third updated node into a second global neural network to obtain the second updated global variable;
and inputting each edge in the edge set, the two motion nodes it connects, and the second updated global variable into a third edge neural network to obtain the motion similarity matrix.
Training the graph neural network with an optimization algorithm according to the loss function, the appearance similarity matrix and the motion similarity matrix, determining the setting parameters of the graph neural network, and then retraining to obtain the trained graph neural network comprises:
training the appearance-feature graph neural network with an optimization algorithm according to the appearance loss function and the appearance similarity matrix;
if the appearance loss is lower than the appearance loss threshold, determining the setting parameters of the appearance-feature graph neural network with the second training set;
retraining the appearance-feature graph neural network with the determined setting parameters according to the appearance loss function and the training set, and, if the resulting appearance retraining loss is lower than the appearance loss threshold, obtaining the trained appearance-feature graph neural network;
training the motion-feature graph neural network with the optimization algorithm according to the motion loss function and the motion similarity matrix;
if the motion loss is lower than the motion loss threshold, determining the setting parameters of the motion-feature graph neural network with the second training set;
retraining the motion-feature graph neural network with the determined setting parameters according to the motion loss function and the training set, and, if the resulting motion retraining loss is lower than the motion loss threshold, obtaining the trained motion-feature graph neural network;
when both the appearance-feature graph neural network and the motion-feature graph neural network have been trained, the trained graph neural network is obtained.
Determining the appearance loss function comprises:
determining a first loss function according to the target similarity and the appearance similarity matrix;
determining a node difference according to each receiving node among the appearance nodes and the first updated node corresponding to that receiving node;
determining a second loss function according to the node difference and the target similarity;
and determining the appearance loss function according to the first loss function and the second loss function.
Determining the motion loss function comprises:
determining the motion loss function according to the target similarity and the motion similarity matrix.
Processing the data set with the trained graph neural network to obtain the similarity matrix comprises:
processing the data set with the trained graph neural network to obtain the appearance similarity matrix and the motion similarity matrix of the data set;
and taking a weighted sum of the two matrices to obtain the similarity matrix.
After the matching result of the targets in the data set has been obtained from the similarity matrix with the matching algorithm, the method further comprises:
recovering lost targets in the data set with a single-target tracker and/or a linear motion model.
A target lost in the current frame (the frame currently being processed) is recovered with the single-target tracker. Given a first recovery threshold, the single-target tracker tracks the target lost in the current frame; when it reports a tracking result whose confidence is higher than the first recovery threshold, that result is used to recover the lost target.
A target that was lost before the current frame and is found again in the current frame is recovered with a linear motion model, under the assumptions that a second recovery threshold exists and that a tracked target moves at constant speed while it is lost. If the number of lost frames is smaller than the second recovery threshold, the target in the lost frames is recovered from the two linked bounding boxes that bracket the gap; if it is larger, the target is considered occluded and is not recovered.
The global variable is a one-dimensional vector that represents the global information captured by the graph neural network during updating.
The extracted depth features serve as the appearance features (appearance nodes); they contain no position or motion information.
According to the temporal order of their frames, the appearance nodes comprise sending nodes and receiving nodes: the node in the earlier frame is the sending node and the node in the later frame is the receiving node. An edge connects one node in each of two adjacent frames; whenever an edge is input, the two nodes it connects (a sending node and a receiving node) are input with it.
The first updated node is the receiving node updated using the first updated edge, the two appearance nodes connected by that edge, and the global variable.
Difference functions for computing the node (node-feature) difference include mean-square error (MSE), cosine distance, and the like.
The bounding boxes in the training set are manually annotated.
Take two adjacent frames a and b as an example. Suppose frame a is the earlier frame and contains 3 bounding boxes, and frame b is the later frame and contains 2 bounding boxes. After frames a and b are input into the graph neural network there are 3 × 2 = 6 edges in total, one between each bounding box in frame a and each bounding box in frame b. Each edge represents the similarity between the two targets (bounding boxes) it connects; a sketch of this construction is given below.
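The sketch below makes the example concrete: it enumerates the 6 directed edges between 3 boxes in frame a and 2 boxes in frame b, reusing the `iou` helper from the earlier sketch. The box coordinates are made up for illustration.

```python
# Three boxes in frame a (earlier), two in frame b (later); (x, y, w, h).
boxes_a = [(10, 20, 50, 100), (200, 30, 40, 90), (400, 25, 45, 95)]
boxes_b = [(15, 22, 50, 100), (395, 28, 45, 95)]

edges = []
for i, box_a in enumerate(boxes_a):        # sending nodes
    for j, box_b in enumerate(boxes_b):    # receiving nodes
        edges.append(((i, j), iou(box_a, box_b)))

print(len(edges))  # 6 edges, one per (sending, receiving) pair
```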
According to the characteristics of the multi-target tracking task, the appearance-feature graph neural network and the motion-feature graph neural network each have their own sub-modules and their own way of updating the edges of the graph.
This embodiment is explained further below with reference to FIG. 2 and FIG. 3.
First, an appearance-feature graph neural network based on appearance features and a motion-feature graph neural network based on motion features are constructed.
For the appearance-feature graph neural network, the appearance features of each bounding box are extracted with a pre-trained neural network (the preprocessing neural network), and the appearance nodes are represented by the extracted features, giving the appearance node set. The edge between two bounding boxes in adjacent frames is initialized with their overlapping degree (IoU), giving the edge set. The overlapping degree is the ratio of the area common to the two bounding boxes to the area covered by both together.
The motion features used by the motion-feature graph neural network comprise position, size, displacement and the like. The motion nodes are represented by the normalized motion features, giving the motion node set; the edges are likewise initialized with the overlapping degrees of the bounding boxes, giving the edge set.
The sub-modules of the appearance-feature graph neural network comprise: two neural networks that update the edges $e$ in the edge set, namely a first edge neural network $\phi^a_{e1}$ and a second edge neural network $\phi^a_{e2}$; a first global neural network $\phi^a_u$ that updates the global variable $u$; and a node neural network $\phi^a_v$ that updates the receiving node $v_r$. A node in the node set is denoted $v$, a sending node $v_s$, and a receiving node $v_r$. Symbols of the appearance-feature graph neural network carry the superscript $a$.
The sub-modules of the motion-feature graph neural network comprise: a second global neural network $\phi^m_u$ that updates the global variable, and a third edge neural network $\phi^m_e$ that updates the edges. Symbols of the motion-feature graph neural network carry the superscript $m$.
The neural networks inside the appearance-feature and motion-feature graph neural networks may be multi-layer perceptrons and/or RNNs, among others.
The appearance-feature graph neural network is updated as follows, as shown in FIG. 2:
$e^{a\prime} = \phi^a_{e1}(u^a, v^a_s, v^a_r, e^a)$: the global variable $u^a$, the two nodes connected by an edge ($v^a_s$, the sending node, and $v^a_r$, the receiving node) and the edge $e^a$ are used to update that edge, giving the first updated edge $e^{a\prime}$.
$v^{a\prime}_r = \phi^a_v(u^a, v^a_s, v^a_r, e^{a\prime})$: the global variable $u^a$, the two nodes connected by the edge, and the first updated edge $e^{a\prime}$ obtained after updating are used to update the receiving node, giving the first updated node $v^{a\prime}_r$.
$\bar{v}^{a\prime}$ denotes the fusion of all sending nodes with the updated receiving nodes, giving the second updated node; $\bar{e}^{a\prime}$ denotes the fusion of all updated edges, giving the second updated edge.
$u^{a\prime} = \phi^a_u(u^a, \bar{v}^{a\prime}, \bar{e}^{a\prime})$: the global variable $u^a$, the second updated edge $\bar{e}^{a\prime}$ and the second updated node $\bar{v}^{a\prime}$ obtained after fusion are used to update the global variable, giving the first updated global variable $u^{a\prime}$.
$e^{a\prime\prime} = \phi^a_{e2}(u^{a\prime}, v^a_s, v^{a\prime}_r, e^{a\prime})$: the first updated global variable $u^{a\prime}$, the nodes connected by the edge and the first updated edge $e^{a\prime}$ are used to update the edge again, giving the appearance edge matrix $e^{a\prime\prime}$, i.e. the appearance similarity matrix.
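The appearance update above can be written as a small PyTorch module. This is a minimal sketch under assumptions the text permits but does not fix: MLP sub-networks, mean fusion, and example feature dimensions.

```python
import torch
import torch.nn as nn

def mlp(in_dim, out_dim, hidden=64):
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                         nn.Linear(hidden, out_dim))

class AppearanceGNN(nn.Module):
    # Node features: d_v, edge features: d_e, global variable: d_u.
    def __init__(self, d_v=128, d_e=1, d_u=16):
        super().__init__()
        self.phi_e1 = mlp(d_u + 2 * d_v + d_e, d_e)   # first edge network
        self.phi_v = mlp(d_u + 2 * d_v + d_e, d_v)    # node network
        self.phi_u = mlp(d_u + d_v + d_e, d_u)        # first global network
        self.phi_e2 = mlp(d_u + 2 * d_v + d_e, 1)     # second edge network

    def forward(self, u, v_s, v_r, e):
        # v_s, v_r: (E, d_v) endpoints of each edge; e: (E, d_e); u: (d_u,).
        uu = u.expand(e.size(0), -1)
        e1 = self.phi_e1(torch.cat([uu, v_s, v_r, e], dim=1))   # first updated edges
        v1 = self.phi_v(torch.cat([uu, v_s, v_r, e1], dim=1))   # first updated nodes
        e_bar = e1.mean(dim=0)                                  # second updated edge
        v_bar = torch.cat([v_s, v1], dim=0).mean(dim=0)         # second updated node
        u1 = self.phi_u(torch.cat([u, v_bar, e_bar], dim=0))    # first updated global
        uu1 = u1.expand(e.size(0), -1)
        sim = self.phi_e2(torch.cat([uu1, v_s, v1, e1], dim=1)) # final edges
        return torch.sigmoid(sim)  # entries of the appearance similarity matrix
```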
The motion-feature graph neural network is updated as follows, as shown in FIG. 3:
$\bar{e}^m$ denotes the fusion of all edges, giving the third updated edge; $\bar{v}^m$ denotes the fusion of all sending nodes and receiving nodes, giving the third updated node.
$u^{m\prime} = \phi^m_u(u^m, \bar{v}^m, \bar{e}^m)$: the global variable $u^m$, the third updated node $\bar{v}^m$ and the third updated edge $\bar{e}^m$ obtained after fusion are used to update the global variable, giving the second updated global variable $u^{m\prime}$.
$e^{m\prime} = \phi^m_e(u^{m\prime}, v^m_s, v^m_r, e^m)$: the second updated global variable $u^{m\prime}$, the two nodes connected by an edge ($v^m_s$ and $v^m_r$) and the edge $e^m$ are used to update the edge, giving the motion similarity matrix $e^{m\prime}$.
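A matching sketch of the motion-feature block, reusing the imports and the `mlp` helper from the previous sketch and the same assumptions (MLP sub-networks, mean fusion). Note that here the global variable is updated first and the edges afterwards, as described above.

```python
class MotionGNN(nn.Module):
    def __init__(self, d_v=6, d_e=1, d_u=16):
        super().__init__()
        self.phi_u = mlp(d_u + d_v + d_e, d_u)        # second global network
        self.phi_e = mlp(d_u + 2 * d_v + d_e, 1)      # third edge network

    def forward(self, u, v_s, v_r, e):
        e_bar = e.mean(dim=0)                                   # third updated edge
        v_bar = torch.cat([v_s, v_r], dim=0).mean(dim=0)        # third updated node
        u1 = self.phi_u(torch.cat([u, v_bar, e_bar], dim=0))    # second updated global
        uu1 = u1.expand(e.size(0), -1)
        sim = self.phi_e(torch.cat([uu1, v_s, v_r, e], dim=1))  # updated edges
        return torch.sigmoid(sim)  # entries of the motion similarity matrix
```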
The fusion modes include addition and averaging, RNN-based fusion, and the like.
The graph neural network is trained with a multi-target tracking training data set.
The training data set is divided into a training set (the first training set) and a validation set (the second training set).
An appearance loss function and a motion loss function are determined.
For the graph neural network, the first loss function is the cross entropy between the true similarity (target similarity) $p$ and the similarity $p'$ output by the appearance-feature or motion-feature graph neural network (the appearance or motion similarity matrix): $L_{match} = -p \log p' - (1-p) \log(1-p')$. For the node update of the appearance-feature graph neural network, updating the node features is meaningful only when the two nodes belong to the same target; when they do not, the receiving node should not be updated. The second loss function for the node update is therefore $L_{NodeUpdate} = (1-p)\,\mathrm{Difference}(v_r, v'_r)$, where $v_r$ denotes a receiving node, $v'_r$ denotes the corresponding updated receiving node belonging to the same target, and Difference is a function measuring the difference between node features, such as mean-square error or cosine distance. The appearance loss function of the appearance-feature graph neural network is thus $L^a = L_{match} + L_{NodeUpdate}$, and the motion loss function of the motion-feature graph neural network is $L^m = L_{match}$. When the first loss function is used for the appearance-feature graph neural network, the similarity $p'$ is the appearance similarity matrix; when it is used for the motion-feature graph neural network, $p'$ is the motion similarity matrix.
The target similarity $p$ is the ground-truth similarity matrix between the bounding boxes of adjacent frames; its entries are 1 or 0, i.e. 1 when two boxes are the same target and 0 otherwise. The similarity $p'$ output by the graph neural network is a similarity matrix whose values lie between 0 and 1.
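The two losses translate directly into code. A minimal sketch; using binary cross entropy over the predicted similarities and mean-square error as the Difference function are choices consistent with, but not mandated by, the text.

```python
import torch.nn.functional as F

def match_loss(p, p_pred):
    # L_match = -p log p' - (1 - p) log(1 - p'), averaged over all edges;
    # p is the 0/1 target similarity, p_pred the predicted similarity in (0, 1).
    return F.binary_cross_entropy(p_pred, p)

def node_update_loss(p, v_r, v_r_updated):
    # L_NodeUpdate = (1 - p) * Difference(v_r, v_r'); MSE as the Difference.
    diff = ((v_r - v_r_updated) ** 2).mean(dim=1)
    return ((1.0 - p.squeeze(-1)) * diff).mean()

def appearance_loss(p, p_pred, v_r, v_r_updated):
    # L^a = L_match + L_NodeUpdate; the motion loss L^m is match_loss alone.
    return match_loss(p, p_pred) + node_update_loss(p, v_r, v_r_updated)
```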
Based on the loss functions, the appearance-feature graph neural network and the motion-feature graph neural network are trained separately with the first training set and an optimization algorithm. The optimization algorithm is preferably gradient-based, e.g. stochastic gradient descent (SGD), Adam or AdaGrad.
When the appearance loss obtained from the appearance loss function is smaller than the appearance loss threshold, the trained appearance-feature graph neural network is obtained; its setting parameters are then determined with the validation set.
When the motion loss obtained from the motion loss function is smaller than the motion loss threshold, the trained motion-feature graph neural network is obtained; its setting parameters are then determined with the validation set.
The setting parameters are mainly hyper-parameters, such as $\alpha$ and various thresholds. The configurations of the appearance-feature and motion-feature graph neural networks with the best tracking performance are selected by evaluating the tracking performance on the validation set (the standard indices include MOTA, IDF1, FP, FN, IDsw, and the like).
Training data comprising the training set and the validation set are input into the appearance-feature graph neural network with the determined setting parameters, which is retrained with the optimization algorithm according to the appearance loss function; when the resulting appearance retraining loss is lower than the appearance loss threshold, the trained appearance-feature graph neural network is obtained.
Training data comprising the training set and the validation set are likewise input into the motion-feature graph neural network with the determined setting parameters, which is retrained with the optimization algorithm according to the motion loss function; when the resulting motion retraining loss is lower than the motion loss threshold, the trained motion-feature graph neural network is obtained.
When the appearance-feature and motion-feature graph neural networks with the determined setting parameters have been trained, the trained graph neural network is obtained.
During network training the global variable is first obtained by random initialization and is then updated by training. Once the trained network has been obtained, the updated global variable is used as the global variable of the trained network for multi-target tracking.
Multi-target tracking is then performed on the test data set with the trained graph neural network, as shown in FIG. 4.
In the stage of constructing the nodes and edges of the graph neural network, each bounding box in each frame is taken as a node and represented by its appearance features or motion features. The depth features are extracted with a pre-trained network; the motion features are the position, size and speed of a bounding box, normalized by the picture size.
The connection from a bounding box in the previous frame to a bounding box in the current frame is represented by a directed edge, and the IoU of the two connected bounding boxes initializes the corresponding edge.
The global variables of the appearance-feature graph neural network and the motion-feature graph neural network are updated separately.
In the edge-updating stage of the graph neural network, the appearance-feature and motion-feature graph neural networks update their edges separately, and the appearance similarity matrix $M_a$ and the motion similarity matrix $M_m$ are obtained from the last updated edges.
In the tracking stage, the two similarity matrices are combined by a weighted sum into the final similarity matrix $M = \alpha M_a + (1-\alpha) M_m$, where $\alpha$ is a hyper-parameter (weight) whose optimal value is chosen on the validation set.
Based on the final similarity matrix, a matching algorithm yields the optimal matching result (the result with the maximum probability), which is the tracking result of the graph neural network. Matching algorithms include the Hungarian algorithm, greedy algorithms, and the like. A sketch of this stage follows.
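The sketch below fuses the two similarity matrices by the weighted sum above and applies the Hungarian algorithm via `scipy.optimize.linear_sum_assignment`. The value of `alpha` and the matrices are made-up examples; in the method, the weight comes from the validation set.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_targets(M_a, M_m, alpha=0.6):
    # Final similarity M = alpha * M_a + (1 - alpha) * M_m.
    M = alpha * M_a + (1.0 - alpha) * M_m
    # The Hungarian algorithm maximizes total similarity (minimizes -M).
    rows, cols = linear_sum_assignment(-M)
    return list(zip(rows, cols)), M

M_a = np.array([[0.9, 0.1], [0.2, 0.8], [0.1, 0.3]])  # 3 prev boxes x 2 cur boxes
M_m = np.array([[0.8, 0.2], [0.1, 0.9], [0.2, 0.1]])
pairs, M = match_targets(M_a, M_m)
print(pairs)  # [(0, 0), (1, 1)]: box 2 of the previous frame stays unmatched
```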
For a target that is not tracked in the current frame, recovery is first attempted with the single-target tracker. Target nodes lost for more than $\tau_{out}$ frames are removed from the graph neural network, and a bounding box in the current frame that is not connected to any target in the preceding frames is added to the graph neural network as a new target. Here $\tau_{out}$ is a hyper-parameter whose optimal value is chosen on the validation set.
For targets in some earlier frames that the single-target tracker failed to recover but that the graph neural network tracks again later, the linear motion model recovers the lost targets in the intervening frames. In the tracking result, a bounding box in a frame before the current frame that has no match in its preceding frame is recovered with a single-target tracker such as DaSiamRPN; specifically, when the confidence given by the single-target tracker is higher than the first recovery threshold $\tau_{conf}$, the target is recovered with the tracking result given by the tracker. When the number of frames in which the bounding box is lost inside a linked track is smaller than the second recovery threshold $\tau_{gap}$, the linear motion model performs the recovery; otherwise the missing part is considered occluded and is not recovered. $\tau_{conf}$ and $\tau_{gap}$ are hyper-parameters whose optimal settings are chosen on the validation set.
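Under the constant-speed assumption, the linear-motion recovery is a linear interpolation of the box between the two frames that bracket the gap. A minimal sketch; the `(x, y, w, h)` box format and the example value of `tau_gap` are assumptions.

```python
def recover_by_linear_motion(box_before, box_after, n_lost, tau_gap=10):
    # Interpolate the boxes of a target lost for n_lost frames between
    # box_before and box_after, assuming constant speed while lost.
    if n_lost >= tau_gap:
        return []  # considered occluded; do not recover
    steps = n_lost + 1
    return [tuple(b + (a - b) * k / steps
                  for b, a in zip(box_before, box_after))
            for k in range(1, steps)]

# Example: a target lost for 2 frames between these two boxes.
print(recover_by_linear_motion((10, 20, 50, 100), (40, 20, 50, 100), 2))
# [(20.0, 20.0, 50.0, 100.0), (30.0, 20.0, 50.0, 100.0)]
```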
In a second aspect, according to an embodiment of the present application, there is further provided a multi-target tracking system based on a graph neural network, as shown in FIG. 5, comprising:
the preprocessing module 101, configured to preprocess the training set to obtain the node set, the edge set and the global variable;
the training module 102, configured to input the node set, the edge set and the global variable into the graph neural network to obtain the appearance similarity matrix and the motion similarity matrix, to train the graph neural network with an optimization algorithm according to the loss function, the appearance similarity matrix and the motion similarity matrix, to determine the setting parameters of the graph neural network, and then to retrain it to obtain the trained graph neural network;
the processing module 103, configured to process the data set with the trained graph neural network to obtain the similarity matrix, and to calculate the matching result of the targets in the data set from the similarity matrix with a matching algorithm.
According to the present application, the graph neural network updates the node set, the edge set and the global variable through an optimization algorithm and a loss function, and is retrained after its setting parameters have been determined, which improves its accuracy. Introducing and updating a global variable in addition to the nodes and edges enriches the feature types and lets the graph neural network capture global information, which improves the accuracy of the similarities used for multi-target tracking and hence the tracking performance. The similarities between nodes are updated inside the graph neural network, so more accurate similarities and better tracking performance are obtained. Classical graph models cannot update their nodes and edges and do not model global information; in the graph neural network the nodes and edges can be updated, and the introduced global variable lets the network capture global information. The application is suitable for online and near-online multi-target tracking. Among existing multi-target tracking methods, there has been no method based on a graph neural network.
The above description is only for the preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A multi-target tracking method based on a graph neural network is characterized by comprising the following steps:
preprocessing a training set to obtain a node set, an edge set and a global variable;
inputting the node set, the edge set and the global variable to a graph neural network to obtain an appearance similarity matrix and a motion similarity matrix;
training the graph neural network according to a loss function, the appearance similarity matrix and the motion similarity matrix by using an optimization algorithm, determining the setting parameters of the graph neural network, and then retraining to obtain the trained graph neural network;
processing a data set by using the trained graph neural network to obtain a similarity matrix;
and calculating the similarity matrix by using a matching algorithm to obtain a matching result of the targets in the data set.
2. The method of claim 1, wherein preprocessing the training set to obtain a set of nodes, a set of edges, and global variables comprises:
dividing the training set into a first training set and a second training set;
extracting depth features of the bounding boxes in every frame of the first training set by using a preprocessing neural network to obtain an appearance node set;
calculating the overlapping degree between the bounding boxes of adjacent frames in the first training set to obtain an edge set, with one edge between each pair of boxes;
extracting the position, size and displacement of the bounding boxes in all frames of the first training set to obtain the motion features of each bounding box;
normalizing all motion features to obtain a motion node set;
and randomly initializing a one-dimensional vector as the global variable.
3. The method of claim 1, wherein said inputting the set of nodes, the set of edges, and the global variables into a graph neural network, resulting in an appearance similarity matrix and a motion similarity matrix, comprises:
inputting an appearance node set, an edge set and a global variable into an appearance-feature graph neural network to obtain an appearance similarity matrix;
and inputting a motion node set, the edge set and the global variable into a motion-feature graph neural network to obtain a motion similarity matrix.
4. The method of claim 3, wherein inputting the appearance node set, the edge set and the global variable into the appearance-feature graph neural network to obtain the appearance similarity matrix comprises:
inputting each edge in the edge set, the two appearance nodes connected by that edge, and the global variable into a first edge neural network to obtain first updated edges;
inputting each first updated edge, the two appearance nodes it connects, and the global variable into a node neural network to obtain first updated nodes;
fusing all first updated edges to obtain a second updated edge;
fusing all sending nodes among the appearance nodes with the obtained first updated nodes to obtain a second updated node;
inputting the global variable, the second updated node and the second updated edge into a first global neural network to obtain a first updated global variable;
and inputting the first updated global variable, all sending nodes among the appearance nodes, the first updated nodes and the first updated edges into a second edge neural network to obtain the appearance similarity matrix.
5. The method of claim 3, wherein inputting the motion node set, the edge set and the global variable into the motion-feature graph neural network to obtain the motion similarity matrix comprises:
fusing all edges in the edge set to obtain a third updated edge;
fusing all motion nodes in the motion node set to obtain a third updated node;
inputting the global variable, the third updated edge and the third updated node into a second global neural network to obtain a second updated global variable;
and inputting each edge in the edge set, the two motion nodes it connects, and the second updated global variable into a third edge neural network to obtain the motion similarity matrix.
6. The method of claim 1, wherein the using an optimization algorithm to train the graph neural network according to a loss function, the appearance similarity matrix and the motion similarity matrix, determine setting parameters of the graph neural network, and train again to obtain the trained graph neural network comprises:
training an appearance feature map neural network according to an appearance loss function and the appearance similarity matrix by using an optimization algorithm;
if the appearance loss is lower than the appearance loss threshold value, determining the setting parameters of the appearance feature map neural network by using a second training set;
training an appearance feature map neural network with set parameters according to an appearance loss function and a training set by using an optimization algorithm, and if the obtained appearance retraining loss is lower than an appearance loss threshold value, obtaining the trained appearance feature map neural network;
training a motion characteristic diagram neural network according to a motion loss function and the motion similarity matrix by using an optimization algorithm;
if the motion loss is lower than the motion loss threshold value, determining the setting parameters of the neural network of the motion characteristic diagram by using a second training set;
training a motion characteristic diagram neural network with set parameters according to a motion loss function and a training set by using an optimization algorithm, and if the obtained motion retraining loss is lower than a motion loss threshold value, obtaining the trained motion characteristic diagram neural network;
and finishing training of the appearance characteristic diagram neural network and the motion characteristic diagram neural network to obtain the trained diagram neural network.
7. The method of claim 6, wherein determining the appearance loss function comprises:
determining a first loss function according to a target similarity and the appearance similarity matrix;
determining a node difference according to each receiving node among the appearance nodes and the first updated node corresponding to that receiving node;
determining a second loss function according to the node difference and the target similarity;
and determining the appearance loss function according to the first loss function and the second loss function.
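Claim 7 does not commit to concrete loss terms. The sketch below assumes a binary cross-entropy term for the first loss function and a similarity-weighted squared node difference for the second, purely as one plausible reading.

```python
import torch
import torch.nn.functional as F

def appearance_loss(pred_sim, target_sim, recv_nodes, first_updated_nodes):
    # first loss function: predicted vs. target (ground-truth) similarity
    loss1 = F.binary_cross_entropy_with_logits(pred_sim, target_sim)
    # node difference between each receiving node and its first updated node
    node_diff = (recv_nodes - first_updated_nodes).pow(2).sum(dim=-1)
    # second loss function: weight the node difference by the target similarity,
    # pulling updated features toward the receiving node for true matches
    loss2 = (target_sim * node_diff).mean()
    # appearance loss: combination of the two terms
    return loss1 + loss2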
8. The method of claim 1, wherein processing the data set by using the trained graph neural network to obtain the similarity matrix comprises:
processing the data set by using the trained graph neural network to obtain an appearance similarity matrix and a motion similarity matrix of the data set;
and computing a weighted sum of the appearance similarity matrix and the motion similarity matrix of the data set to obtain the similarity matrix.
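The fusion of claim 8 is a plain weighted sum; in code it could look like the following, with the weight alpha left as a free hyperparameter.

```python
import numpy as np

def fuse_similarity(appearance_sim: np.ndarray, motion_sim: np.ndarray, alpha: float = 0.5):
    # weighted sum of the two (tracks x detections) similarity matrices
    return alpha * appearance_sim + (1.0 - alpha) * motion_sim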
9. The method of claim 1, further comprising, after calculating the similarity matrix by using a matching algorithm to obtain a matching result for the targets in the data set:
using a single-target tracker and/or a linear motion model to recover missing targets in the data set.
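One common instantiation of claim 9's matching step is the Hungarian algorithm (here via scipy.optimize.linear_sum_assignment), with a constant-velocity linear motion model as the recovery fallback; both choices, and the min_sim gate, are assumptions of this sketch.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_targets(similarity: np.ndarray, min_sim: float = 0.3):
    # Hungarian assignment: maximizing total similarity = minimizing its negation
    rows, cols = linear_sum_assignment(-similarity)
    return [(r, c) for r, c in zip(rows, cols) if similarity[r, c] >= min_sim]

def predict_linear(track_positions: np.ndarray):
    # constant-velocity fallback: extrapolate the next position of an unmatched
    # track from its last two observed positions
    p_prev, p_last = track_positions[-2], track_positions[-1]
    return p_last + (p_last - p_prev)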
10. A multi-target tracking system based on a graph neural network, comprising:
a preprocessing module, configured to preprocess a training set to obtain a node set, an edge set, and a global variable;
a training module, configured to input the node set, the edge set, and the global variable into a graph neural network to obtain an appearance similarity matrix and a motion similarity matrix; train the graph neural network according to a loss function, the appearance similarity matrix, and the motion similarity matrix by using an optimization algorithm; determine the setting parameters of the graph neural network; and train again to obtain the trained graph neural network;
and a processing module, configured to process a data set by using the trained graph neural network to obtain a similarity matrix, and to calculate the similarity matrix by using a matching algorithm to obtain a matching result for the targets in the data set.
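Mapping claim 10's three modules onto code, a skeleton might look as follows; all names are illustrative and the method bodies would reuse the sketches above.

```python
class GNNMultiTargetTracker:
    """Illustrative module layout for the system of claim 10."""

    def preprocess(self, training_set):
        # preprocessing module: build node set, edge set and global variable
        raise NotImplementedError

    def train(self, nodes, edges, global_var):
        # training module: compute both similarity matrices, then run the
        # threshold-gated two-stage training of claim 6
        raise NotImplementedError

    def process(self, data_set):
        # processing module: fused similarity matrix + matching (claims 8-9)
        raise NotImplementedError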
CN201911312114.9A 2019-12-18 2019-12-18 Multi-target tracking method and system based on graph neural network Active CN111161315B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911312114.9A CN111161315B (en) 2019-12-18 2019-12-18 Multi-target tracking method and system based on graph neural network

Publications (2)

Publication Number Publication Date
CN111161315A 2020-05-15
CN111161315B CN111161315B (en) 2023-01-03

Family

ID=70557249

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911312114.9A Active CN111161315B (en) 2019-12-18 2019-12-18 Multi-target tracking method and system based on graph neural network

Country Status (1)

Country Link
CN (1) CN111161315B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105976400A (en) * 2016-05-10 2016-09-28 北京旷视科技有限公司 Object tracking method and device based on neural network model
CN107861123A (en) * 2017-10-24 2018-03-30 武汉大学 A kind of through-wall radar is under complex environment to the method for multiple mobile object real-time tracking
US20190217476A1 (en) * 2018-01-12 2019-07-18 Futurewei Technologies, Inc. Robot navigation and object tracking
CN108682022A (en) * 2018-04-25 2018-10-19 清华大学 Based on the visual tracking method and system to anti-migration network
CN109508000A (en) * 2018-12-16 2019-03-22 西南电子技术研究所(中国电子科技集团公司第十研究所) Isomery multi-sensor multi-target tracking method
CN110276783A (en) * 2019-04-23 2019-09-24 上海高重信息科技有限公司 A kind of multi-object tracking method, device and computer system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CONG MA et al.: "Deep Association: End-to-end Graph-Based Learning for Multiple Object Tracking with Conv-Graph Neural Network", ICMR '19 *
XIAOLONG JIANG et al.: "Graph Neural Based End-to-end Data Association Framework for Online Multiple-Object Tracking", arXiv:1907.05315v1 *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111665861A (en) * 2020-05-19 2020-09-15 中国农业大学 Trajectory tracking control method, apparatus, device and storage medium
CN111626194A (en) * 2020-05-26 2020-09-04 佛山市南海区广工大数控装备协同创新研究院 Pedestrian multi-target tracking method using depth correlation measurement
CN111626194B (en) * 2020-05-26 2024-02-02 佛山市南海区广工大数控装备协同创新研究院 Pedestrian multi-target tracking method using depth correlation measurement
CN115668877A (en) * 2020-06-03 2023-01-31 华为技术有限公司 Method and system for generating a network configuration using a graph neural network
CN111862156A (en) * 2020-07-17 2020-10-30 中南民族大学 Multi-target tracking method and system based on graph matching
CN111862156B (en) * 2020-07-17 2021-02-26 中南民族大学 Multi-target tracking method and system based on graph matching
CN112215869A (en) * 2020-10-12 2021-01-12 华中科技大学 Group target tracking method and system based on graph similarity constraint
CN112414401A (en) * 2020-11-06 2021-02-26 北京理工大学 Unmanned aerial vehicle cooperative positioning system and method based on graph neural network
CN112414401B (en) * 2020-11-06 2023-02-28 北京理工大学 Unmanned aerial vehicle cooperative positioning system and method based on graph neural network
CN112465006A (en) * 2020-11-24 2021-03-09 中国人民解放军海军航空大学 Graph neural network target tracking method and device
CN112465006B (en) * 2020-11-24 2022-08-05 中国人民解放军海军航空大学 Target tracking method and device for graph neural network
CN112561963A (en) * 2020-12-18 2021-03-26 北京百度网讯科技有限公司 Target tracking method and device, road side equipment and storage medium
CN113362368A (en) * 2021-07-26 2021-09-07 北京邮电大学 Crowd trajectory prediction method based on multi-level space-time diagram neural network

Also Published As

Publication number Publication date
CN111161315B (en) 2023-01-03

Similar Documents

Publication Publication Date Title
CN111161315B (en) Multi-target tracking method and system based on graph neural network
CN110414432B (en) Training method of object recognition model, object recognition method and corresponding device
US8345984B2 (en) 3D convolutional neural networks for automatic human action recognition
CN107145862B (en) Multi-feature matching multi-target tracking method based on Hough forest
CN110728698B (en) Multi-target tracking system based on composite cyclic neural network system
CN111583263A (en) Point cloud segmentation method based on joint dynamic graph convolution
CN112801019B (en) Method and system for eliminating re-identification deviation of unsupervised vehicle based on synthetic data
CN112507859B (en) Visual tracking method for mobile robot
CN112507778B (en) Loop detection method of improved bag-of-words model based on line characteristics
CN114092517A (en) Multi-target tracking method based on traditional and deep learning algorithm
KR20230171966A (en) Image processing method and device and computer-readable storage medium
Yang et al. Learning relation by graph neural network for SAR image few-shot learning
CN104778699A (en) Adaptive object feature tracking method
Hammam et al. DeepPet: A pet animal tracking system in internet of things using deep neural networks
Getahun et al. A deep learning approach for lane detection
CN112507893A (en) Distributed unsupervised pedestrian re-identification method based on edge calculation
Yin Object Detection Based on Deep Learning: A Brief Review
CN103500456A (en) Object tracking method and equipment based on dynamic Bayes model network
Sriram et al. Analytical review and study on object detection techniques in the image
Chu et al. Illumination-guided transformer-based network for multispectral pedestrian detection
CN113129336A (en) End-to-end multi-vehicle tracking method, system and computer readable medium
Feng et al. DAMUN: A domain adaptive human activity recognition network based on multimodal feature fusion
Lu et al. Hybrid deep learning based moving object detection via motion prediction
Pang et al. Target tracking based on siamese convolution neural networks
Cheng et al. Class attendance checking system based on deep learning and global optimization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant