CN111814604A - Pedestrian tracking method based on twin neural network - Google Patents
Pedestrian tracking method based on twin neural network
- Publication number
- CN111814604A (application CN202010584083.9A)
- Authority
- CN
- China
- Prior art keywords
- pedestrian
- pedestrians
- short
- neural network
- track
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06V40/103 — Static body considered as a whole, e.g. static pedestrian or occupant recognition
- G06F18/241 — Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06N3/045 — Combinations of networks
- G06N3/048 — Activation functions
- G06N3/084 — Backpropagation, e.g. using gradient descent
- G06T7/246 — Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G06T2207/10016 — Video; image sequence
- G06T2207/20081 — Training; learning
- G06T2207/20084 — Artificial neural networks [ANN]
- G06T2207/30196 — Human being; person
- G06T2207/30232 — Surveillance
- G06T2207/30241 — Trajectory
Abstract
The invention belongs to the field of computer vision and relates to a pedestrian tracking method based on a twin neural network, comprising the following steps: inputting a video; marking pedestrians; acquiring pedestrian spatio-temporal groups; establishing and training a twin neural network, and saving the trained network; acquiring short tracks of pedestrians; and acquiring long tracks of pedestrians. Tracking pedestrians with this method effectively improves tracking accuracy.
Description
Technical Field
The invention belongs to the field of computer vision, and particularly relates to a pedestrian tracking method based on a twin neural network.
Background
With the development of artificial intelligence, computer vision has been widely applied in daily life, for example in smart homes, video surveillance and intelligent transportation, and pedestrian tracking is one of the key problems in these fields. Because a pedestrian target changes in posture, size and appearance, and may be occluded while moving, accurate pedestrian tracking is very difficult.
Early target tracking algorithms were mainly based on modeling the target or tracking target features. The main methods are: (1) feature matching, which first extracts target features and then locates the target by finding the most similar features in subsequent frames; (2) search-based methods, which add a prediction algorithm to the tracker and search for the target near the predicted position, thereby reducing the search range. These methods no longer meet current pedestrian tracking requirements and urgently need to be replaced; moreover, traditional methods are strongly affected by factors such as illumination change, pedestrian posture change and image noise. The rapid development of deep learning has greatly advanced the field of pedestrian tracking. Among deep models, the twin (Siamese) neural network can judge whether two inputs are similar, and can therefore be used in pedestrian tracking to judge whether the appearances of pedestrians are similar. Accordingly, there is a need for improvements in the art.
Disclosure of Invention
The invention aims to provide a pedestrian motion trajectory calculation method based on a twin neural network, which is used to track pedestrians and effectively improves the accuracy of pedestrian tracking.
In order to solve the above problems, the present invention provides a pedestrian tracking method based on a twin neural network, comprising the following steps:
step 1, video input:
capturing video outdoors or indoors with monitoring equipment (a camera or other monitoring device) to obtain a video file, and inputting the captured video file into a computer, wherein the video file contains every frame of video image of the video sequence;
step 2, pedestrian marking:
for each frame of video image input in the step 1, detecting and marking the position of a pedestrian on each frame of video image by using a DPM pedestrian detection technology to obtain a video sequence with a pedestrian position mark;
step 3, acquiring pedestrian space-time groups:
splitting the video sequence with pedestrian position marks obtained in step 2 into n segments at intervals of 1 second, each segment being a video image i; for each video image i, obtaining pedestrian spatio-temporal groups G_j^i by hierarchical clustering according to the pedestrian positions, wherein n is a positive integer, i is not greater than n, and j is the serial number of the spatio-temporal group in the i-th video image;
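The grouping in step 3 can be sketched with a simple single-linkage agglomerative clustering on pedestrian positions. The plain-Python implementation, the Euclidean metric and the distance `threshold` are illustrative assumptions; the patent does not specify the linkage criterion or cut-off.

```python
from math import hypot

def spatiotemporal_groups(positions, threshold):
    """Single-linkage agglomerative clustering of pedestrian positions.

    positions: list of (x, y) pedestrian centers within one 1-second segment.
    threshold: merge two clusters while their closest members are nearer
               than this distance (an assumed parameter).
    Returns a list of clusters, each a list of indices into `positions`.
    """
    clusters = [[i] for i in range(len(positions))]
    while True:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single linkage: distance between closest members
                d = min(hypot(positions[i][0] - positions[j][0],
                              positions[i][1] - positions[j][1])
                        for i in clusters[a] for j in clusters[b])
                if d < threshold and (best is None or d < best[0]):
                    best = (d, a, b)
        if best is None:
            return clusters
        _, a, b = best
        clusters[a] += clusters.pop(b)
```

Two pedestrians within the threshold end up in one spatio-temporal group; an isolated pedestrian forms its own group.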
step 4, establishing and training a twin neural network, and storing the trained twin neural network;
step 5, acquiring a short track of the pedestrian:
step 5-1, for each spatio-temporal group G_j^i in each video image i obtained in step 3, combining the pedestrians in the group in pairs, each combination being two pictures of size 128 × 64 × 3, each containing a pedestrian;
step 5-2, stacking the two 128 × 64 × 3 pedestrian pictures of each combination obtained in step 5-1 into a 128 × 64 × 6 form and inputting them into the twin neural network trained in step 4, obtaining the appearance similarity P_AS of each pedestrian pair within each spatio-temporal group G_j^i;
Step 5-3, calculating the pedestrian movement speed:
For the pedestrian position p in the t-th frame, find the detection p_k closest to p in the adjacent frames by computing distances, where k is the frame index; the pedestrian speed v_pedestrian within each spatio-temporal group G_j^i is calculated as
v_pedestrian = (p_k − p) / (k − t);
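A minimal sketch of the step 5-3 speed estimate. Since the patent's own formula survives only as an image, the sketch assumes the natural reading: the speed is the displacement to the nearest adjacent-frame detection divided by the frame gap, v_pedestrian = (p_k − p)/(k − t).

```python
from math import hypot

def pedestrian_speed(p, t, detections):
    """Estimate a pedestrian's velocity from the nearest detection in an
    adjacent frame: v = (p_k - p) / (k - t).

    p: (x, y) position in frame t.
    detections: list of (k, (x, y)) candidate detections in nearby frames.
    Returns (vx, vy) in pixels per frame.
    """
    k, pk = min(detections,
                key=lambda d: hypot(d[1][0] - p[0], d[1][1] - p[1]))
    dt = k - t
    return ((pk[0] - p[0]) / dt, (pk[1] - p[1]) / dt)
```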
Step 5-4, calculating the motion affinity:
Using the pedestrian movement speed obtained in step 5-3, the motion affinity P_MA of each pedestrian pair within each spatio-temporal group G_j^i is calculated as follows:
P_MA = max(1 − β·e(P_1, P_2), 0),
e(P_1, P_2) = min(e_forward(P_1, P_2), e_backward(P_1, P_2)),
wherein P_1 and P_2 are two pedestrians and β is a parameter to be tuned; e_Xforward, e_Yforward, e_Xbackward and e_Ybackward are the forward and backward errors of the pedestrian motion in the X and Y directions, which combine into the forward and backward errors e_forward and e_backward of the pedestrian motion; P_1x, P_1y, P_2x and P_2y are the X and Y coordinates of the two pedestrian positions, t_1 and t_2 are the frame indices of the two pedestrians, v_pedestrian obtained in step 5-3 is decomposed into X- and Y-direction speeds, and v_1x, v_1y, v_2x and v_2y are the X- and Y-direction speeds of the two pedestrians;
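The motion affinity of step 5-4 can be sketched as follows. The error formulas themselves survive only as images, so the sketch assumes the forward error is the absolute X plus Y error of extrapolating pedestrian 1 to pedestrian 2's frame, and the backward error the reverse; the default `beta` is likewise an assumed placeholder.

```python
def motion_affinity(p1, t1, v1, p2, t2, v2, beta=0.1):
    """Motion affinity P_MA = max(1 - beta * e, 0) between two pedestrian
    detections, where e = min(e_forward, e_backward).

    The forward error extrapolates pedestrian 1 to frame t2 and compares
    with pedestrian 2; the backward error extrapolates pedestrian 2 back
    to frame t1.  Summing absolute X and Y prediction errors is an
    assumption; the patent gives the error formulas only as images.
    """
    dt = t2 - t1
    e_forward = (abs(p1[0] + v1[0] * dt - p2[0]) +
                 abs(p1[1] + v1[1] * dt - p2[1]))
    e_backward = (abs(p2[0] - v2[0] * dt - p1[0]) +
                  abs(p2[1] - v2[1] * dt - p1[1]))
    return max(1.0 - beta * min(e_forward, e_backward), 0.0)
```

Consistent motion (pedestrian 2 sits exactly where pedestrian 1's velocity predicts) yields affinity 1; large prediction errors clamp to 0.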
step 5-5, establishing a pedestrian correlation matrix
Using a sigmoid function, the appearance similarity P_AS of each pedestrian pair obtained in step 5-2 and the motion affinity P_MA of each pedestrian pair obtained in step 5-4 within each spatio-temporal group G_j^i are fused into the pedestrian correlation matrix PeC as follows:
PeC = 1 / (1 + e^(−(λ·P_AS + μ·P_MA))),
where λ and μ are adjustable parameters;
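The sigmoid fusion of step 5-5 can be sketched as below. The weighted-sum form inside the sigmoid is an assumption consistent with the patent's description (the embodiment names λ and μ as adjustable parameters); the default weights are placeholders.

```python
from math import exp

def fuse_correlation(p_as, p_ma, lam=1.0, mu=1.0):
    """Fuse appearance similarity P_AS and motion affinity P_MA into one
    pedestrian correlation value with a sigmoid.  lam and mu are the
    adjustable weights; the weighted-sum form is an assumption.
    """
    return 1.0 / (1.0 + exp(-(lam * p_as + mu * p_ma)))
```

The output lies in (0, 1) and increases monotonically in both inputs, so a pair that scores well on both appearance and motion receives a high correlation.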
step 5-6, solving classification
Using the maximum correlation principle, the pedestrian correlation matrix PeC within each spatio-temporal group G_j^i obtained in step 5-5 is solved and classified, and each resulting class represents pedestrians with the same identity. Let a graph G = (V, E, W) be given, where each vertex in V represents a pedestrian detection, E represents the edges connecting pedestrian detections, and W represents the correlation between two pedestrians. The maximum correlation principle is:
maximize Σ_(e∈E) w(e)·x(e), subject to x(v_1, v_2) + x(v_2, v_3) − x(v_1, v_3) ≤ 1,
wherein v_1, v_2, v_3 represent pedestrian detections in V, w represents the correlation between pedestrians, and x takes the value 0 or 1;
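The maximum-correlation classification of step 5-6 is a 0/1 optimization over the graph; a greedy stand-in is sketched below. The greedy average-linkage merging and the 0.5 merge threshold are assumptions for illustration, not the patent's exact solver.

```python
def greedy_correlation_clustering(corr, threshold=0.5):
    """Greedy stand-in for the maximum-correlation classification:
    repeatedly merge the two clusters whose members have the highest
    average pairwise correlation, while that average exceeds `threshold`.

    corr: symmetric matrix (list of lists), corr[i][j] in [0, 1].
    Returns clusters as lists of detection indices; each cluster stands
    for one pedestrian identity.
    """
    clusters = [[i] for i in range(len(corr))]
    while True:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                avg = (sum(corr[i][j] for i in clusters[a] for j in clusters[b])
                       / (len(clusters[a]) * len(clusters[b])))
                if avg > threshold and (best is None or avg > best[0]):
                    best = (avg, a, b)
        if best is None:
            return clusters
        _, a, b = best
        clusters[a] += clusters.pop(b)
```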
step 5-7, obtaining short tracks of pedestrians
Merging the pedestrians in each class obtained in the step 5-6, so as to obtain short tracks of the pedestrians in each video image i;
step 6, acquiring long tracks of pedestrians
6-1, selecting key pedestrians
Within a sliding time window, the short tracks of the pedestrians in each video image i obtained in step 5-7 are processed: for all short tracks in one sliding time window, a key pedestrian is selected for each short track in each video image i according to the appearance similarity of the pedestrians, and this key pedestrian is used to represent the short track;
step 6-2, combining every two key pedestrians in short tracks
In a sliding time window, the key pedestrian of each short track in a given video image i selected in step 6-1 is paired with the key pedestrian of each short track in every other video image j; each combination is two pictures of size 128 × 64 × 3 containing the key pedestrians;
step 6-3, calculating the similarity of the short tracks
Inputting the two 128 × 64 × 3 pictures containing the key pedestrians into the twin neural network trained in step 4, the output K_AS is the appearance similarity of the two short tracks' key pedestrians; the similarity Tr_s of the two short tracks is then computed as
Tr_s = (1/n) · Σ_(K_AS ∈ N) K_AS,
wherein K_AS represents the appearance similarity of two short tracks' key pedestrians, N represents the set of K_AS values meeting the condition, and n represents the number of elements in N;
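The step 6-3 averaging can be sketched as below. The qualifying condition defining the set N is not legible in the source, so the `>= 0.5` filter used here is purely a placeholder assumption.

```python
def tracklet_similarity(k_as_values, condition=lambda k: k >= 0.5):
    """Short-track similarity Tr_s = (1/n) * sum of the K_AS values in N,
    where N is the subset of key-pedestrian appearance similarities that
    meet the condition and n is its size.  The condition itself is not
    legible in the source; a >= 0.5 filter is an assumed placeholder.
    """
    n_set = [k for k in k_as_values if condition(k)]
    if not n_set:
        return 0.0
    return sum(n_set) / len(n_set)
```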
step 6-4, calculating the movement speed v_tracklet of each short track
The movement speed of each short track is calculated as
v_tracklet = (P_e − P_S) / (T_e − T_S),
where P_S denotes the start point of the short track, P_e its end point, T_S its start frame and T_e its end frame;
step 6-5, calculating the motion affinity Tr_MA between the short tracks
Tr_MA = max(1 − β·e(Tr_1, Tr_2), 0)
e(Tr_1, Tr_2) = e_forward(Tr_1, Tr_2) + e_backward(Tr_1, Tr_2)
wherein Tr_1 and Tr_2 are two short tracks and β is a parameter to be tuned; e_Xforward, e_Yforward, e_Xbackward and e_Ybackward are the forward and backward errors of the short track in the X and Y directions, and e_forward and e_backward are the forward and backward errors of the short-track motion; the error terms use the X and Y coordinates of one short track's end point and of the other short track's start point; t_1 and t_2 are the frame indices of one short track's end point and the other short track's start point; v_tracklet obtained in step 6-4 is decomposed into X- and Y-direction speeds, and v_1x, v_1y, v_2x and v_2y are the X- and Y-direction speeds of the two short tracks;
step 6-6, establishing a short track correlation matrix
Using a sigmoid function, the short-track similarity Tr_s obtained in step 6-3 and the short-track motion affinity Tr_MA obtained in step 6-5 are fused into the short track correlation matrix TrC:
TrC = 1 / (1 + e^(−(λ·Tr_s + μ·Tr_MA))),
where λ and μ are adjustable parameters;
step 6-7, solving classification
The short track correlation matrix TrC obtained in step 6-6 is solved and classified using the maximum correlation principle, and each resulting class represents short tracks with the same identity. Let a graph G = (V, E, W) be given, where each vertex in V represents a short track, E represents the edges connecting short tracks, and W represents the correlation between two short tracks; the maximum correlation principle is as in step 5-6,
wherein v_1, v_2, v_3 represent short tracks in V, w represents the correlation between short tracks, and x takes the value 0 or 1;
step 6-8, merging into long tracks
Merging the short tracks within each class obtained by the classification in step 6-7, the short tracks of pedestrians with the same identity are merged into a new track; as the sliding time window advances, complete long trajectories of all pedestrians are finally obtained.
As an improvement of the pedestrian tracking method based on a twin neural network according to the invention, the establishment and training of the twin neural network in step 4 comprises the following steps:
step 4-1, creating a twin neural network
A twin neural network is built using the PyTorch framework, with the following structure: two 128 × 64 × 3 pedestrian pictures are stacked into a 128 × 64 × 6 form and used as the input of the twin neural network, which passes through three convolution layers: the first layer has 12 convolution kernels of size 9 × 9, the second layer 16 kernels of size 5 × 5, and the third layer 24 kernels of size 5 × 5; each convolution layer includes batch normalization, a ReLU activation function and 2 × 2 pooling; the output of the three convolution layers is flattened into a 1 × 1152 vector, which passes through two fully connected layers of sizes 1 × 150 and 1 × 2; the final output is the appearance similarity of the two pedestrian pictures;
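The stated layer sizes are consistent with unpadded ("valid") convolutions: the dimension bookkeeping below recovers the 1 × 1152 flattened vector (12 × 4 spatial positions × 24 channels). Reading the kernel specifications as 9 × 9 with 12 channels, 5 × 5 with 16, and 5 × 5 with 24 is an inference from that 1152 figure, not an explicit statement in the source.

```python
def conv_pool_out(h, w, k, pool=2):
    """Spatial size after one valid (unpadded) k x k convolution
    followed by pool x pool pooling."""
    return (h - k + 1) // pool, (w - k + 1) // pool

# Input: two 128 x 64 x 3 pictures stacked into a 128 x 64 x 6 tensor.
h, w = 128, 64
h, w = conv_pool_out(h, w, 9)   # layer 1: 12 kernels of 9 x 9
h, w = conv_pool_out(h, w, 5)   # layer 2: 16 kernels of 5 x 5
h, w = conv_pool_out(h, w, 5)   # layer 3: 24 kernels of 5 x 5
flat = h * w * 24               # flattened feature-vector length
```

With these assumptions the three stages give 60 × 28, 28 × 12 and 12 × 4 feature maps, and `flat` equals the 1152 reported in the patent.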
step 4-2, training twin neural network
Data set setup: the CUHK02 pedestrian data set is downloaded from the Internet; it contains 1816 pedestrian identities, each pedestrian captured from 2 camera views with 2 photos per view; the first 80% of the pedestrians are used as the training set and the remaining pedestrians as the test set;
training process: two rounds of twin neural network training established in the step 4-1 are carried out, in the first round, training data are pictures of a first shooting angle of pedestrians in a training set, sample combinations are randomly generated in the training process, and the proportion of positive samples to negative samples is controlled to be 1: 1, the positive samples are two pedestrians with the same identity, and the negative samples are two pedestrians with different identities; in the second round, the picture of the second camera angle of the pedestrian in the training set is used as training data, and the generation mode of the positive sample and the negative sample is the same as that of the first round;
Training test: after training, the trained twin neural network is tested with the test set; the output of the network is the appearance similarity of two pedestrians. A pair of pedestrians with the same identity is counted as correct when their appearance similarity is at least 0.5; conversely, a pair with different identities is counted as correct when their appearance similarity is below 0.5. The final test accuracy exceeds 98%, and the trained twin neural network is saved.
The invention has the following technical advantages:
1. The invention uses the twin neural network to obtain the appearance similarity between pedestrians; it is little affected by factors such as illumination change, pedestrian posture change and image noise, and can still track pedestrians accurately under these conditions;
2. The invention realizes pedestrian tracking hierarchically: short tracks are first generated from pedestrian detections, and long tracks are then generated from the short tracks, with the twin neural network used in both steps;
3. Pedestrian information is fused with a sigmoid function: the twin-network-based pedestrian (short track) similarity and the motion affinity are fused into the pedestrian (short track) correlation;
4. The invention fully considers both appearance similarity and motion affinity when generating short tracks, and both short-track similarity and motion affinity when generating long tracks; considering both aspects at every step effectively avoids tracking failures caused by pedestrian occlusion, and solving and classifying each step by the maximum correlation principle ensures correct tracking.
The invention was supported by the following funds: Zhejiang Provincial Natural Science Foundation project (LQ19F030014) and Zhejiang University youth innovation project (2019Q035).
Drawings
The following detailed description of embodiments of the invention is provided in conjunction with the appended drawings:
FIG. 1 is a schematic diagram of an algorithm flow of a twin neural network-based pedestrian tracking method;
fig. 2 is a pedestrian position image in the PETS2009-S2L1 dataset detected using the DPM detector in step 2 of experiment 1;
fig. 3 is an image of the pedestrian location in the Town Center data set detected using the DPM detector in step 2 of experiment 1;
fig. 4 is a schematic diagram of the error between the pedestrian trajectories in the PETS2009-S2L1 data set obtained in step 5 of experiment 1 and the trajectories annotated in the groundtruth file;
FIG. 5 is a schematic diagram of the error between the pedestrian trajectories in the Town Center data set obtained in step 5 of experiment 1 and the trajectories annotated in the groundtruth file;
FIG. 6 is a graph of the experimental effect of pedestrian trajectories for the PETS2009-S2L1 dataset obtained at step 5 in experiment 1;
fig. 7 is a graph showing the experimental effect of the pedestrian trajectory of the Town Center data set obtained in step 5 of experiment 1.
Detailed Description
The invention will be further described with reference to specific examples, but the scope of the invention is not limited thereto.
Embodiment 1, a pedestrian tracking method based on a twin neural network, as shown in fig. 1, includes the following steps:
s01, video input
Inputting a video file containing an object to be tracked into a computer, wherein the video file comprises each frame of video image and is acquired outdoors or indoors by a camera or other monitoring equipment;
s02, pedestrian marking:
for each frame of video image input in the step S01, detecting and marking the position of a pedestrian on each frame of video image by using a DPM pedestrian detection technology, to obtain a video sequence with a pedestrian position mark;
The DPM pedestrian detection technique is conventional; see, for example, the DPM pedestrian detection algorithm by Felzenszwalb et al., IEEE Transactions on Pattern Analysis & Machine Intelligence, 2010.
S03, acquiring pedestrian space-time groups:
splitting the video sequence with the pedestrian position mark in S02 into n sections every 1 second, wherein each section is a video image i;
for each video image i, a plurality of pedestrian spatio-temporal groups G_j^i are obtained by hierarchical clustering according to the pedestrian positions, wherein n is a positive integer, i is not greater than n, and j is the serial number of the spatio-temporal group in the i-th video image;
s04, creating and training twin neural network
S0401, creation of twin neural networks:
A twin neural network is built using the PyTorch framework, with the following structure: two 128 × 64 × 3 pedestrian pictures are stacked into a 128 × 64 × 6 form and used as the input of the twin neural network, which passes through three convolution layers: the first layer has 12 convolution kernels of size 9 × 9, the second layer 16 kernels of size 5 × 5, and the third layer 24 kernels of size 5 × 5. Each convolution layer includes batch normalization, a ReLU activation function and 2 × 2 pooling; the output of the three convolution layers is flattened into a 1 × 1152 vector, which passes through two fully connected layers of sizes 1 × 150 and 1 × 2; the final output is the appearance similarity of the two pedestrian pictures;
s0402, training twin neural network
Data set setup: the CUHK02 pedestrian data set is downloaded from the Internet; it contains 1816 pedestrian identities, each pedestrian captured from 2 camera views with 2 photos per view; the first 80% of the pedestrians are used as the training set and the remaining pedestrians as the test set;
Training process: the twin neural network established in S0401 is trained in two rounds. In the first round, the training data are the pictures from the first camera view of the pedestrians in the training set; sample combinations are generated randomly during training, with the ratio of positive to negative samples controlled at 1:1, where a positive sample is two pictures of the same pedestrian identity and a negative sample is two pictures of different pedestrian identities. In the second round, the pictures from the second camera view of the training-set pedestrians are used as training data, with positive and negative samples generated in the same way as in the first round;
Training test: after training, the trained twin neural network is tested with the test set; the output of the network is the appearance similarity of two pedestrians. A pair of pedestrians with the same identity is counted as correct when their appearance similarity is at least 0.5; conversely, a pair with different identities is counted as correct when their appearance similarity is below 0.5. The accuracy of the final test result exceeds 98%, and the trained twin neural network is saved;
s05, acquiring short tracks of pedestrians
For each video image i obtained in S03 and each pedestrian spatio-temporal group G_j^i therein, the following processing is carried out:
S0501, combining the pedestrians within each spatio-temporal group G_j^i in pairs, each combination being two pictures of size 128 × 64 × 3, each containing a pedestrian;
S0502, stacking the two 128 × 64 × 3 pedestrian pictures of each combination obtained in S0501 into a 128 × 64 × 6 form and inputting them into the twin neural network trained in S0402, obtaining the appearance similarity P_AS of each pedestrian pair within each spatio-temporal group G_j^i;
S0503, calculating the pedestrian movement speed:
The pedestrian position in the t-th frame is p; the detection p_k closest to p is found in the adjacent frames by computing distances (k is the frame index), and the pedestrian speed v_pedestrian within each spatio-temporal group G_j^i is calculated as
v_pedestrian = (p_k − p) / (k − t);
S0504, calculating the motion affinity:
Using the pedestrian movement speed v_pedestrian obtained in S0503, the motion affinity P_MA of each pedestrian pair within each spatio-temporal group G_j^i is calculated as follows:
P_MA = max(1 − β·e(P_1, P_2), 0),
e(P_1, P_2) = min(e_forward(P_1, P_2), e_backward(P_1, P_2)),
wherein P_1 and P_2 are two pedestrians and β is a parameter that can be adjusted; e_Xforward, e_Yforward, e_Xbackward and e_Ybackward are the forward and backward errors of the pedestrian motion in the X and Y directions, which combine into the forward and backward errors e_forward and e_backward of the pedestrian motion; P_1x, P_1y, P_2x and P_2y are the X and Y coordinates of the two pedestrian positions, t_1 and t_2 are the frame indices of the two pedestrians, v_pedestrian obtained in S0503 is decomposed into X- and Y-direction speeds, and v_1x, v_1y, v_2x and v_2y are the X- and Y-direction speeds of the two pedestrians;
s0505, establishing pedestrian correlation matrix
Using a sigmoid function, the appearance similarity P_AS of each pedestrian pair obtained in S0502 and the motion affinity P_MA of each pedestrian pair obtained in S0504 within each spatio-temporal group G_j^i are fused into the pedestrian correlation matrix PeC as follows:
PeC = 1 / (1 + e^(−(λ·P_AS + μ·P_MA))),
where λ and μ are adjustable parameters;
s0506, solve classification
Using the maximum correlation principle, the pedestrian correlation matrix PeC within each spatio-temporal group G_j^i obtained in S0505 is solved and classified, and each resulting class represents pedestrians with the same identity. Let a graph G = (V, E, W) be given, where each vertex in V represents a pedestrian detection, E represents the edges connecting pedestrian detections, and W represents the correlation between two pedestrians. The maximum correlation principle is:
maximize Σ_(e∈E) w(e)·x(e), subject to x(v_1, v_2) + x(v_2, v_3) − x(v_1, v_3) ≤ 1,
wherein v_1, v_2, v_3 represent pedestrian detections in V, w represents the correlation between pedestrians, and x takes the value 0 or 1;
s0507, obtaining short track
Merging the pedestrians in each class obtained in the step S0506, namely obtaining short tracks of the pedestrians in each video image i;
s06, acquiring long tracks of pedestrians
In the sliding time window, the short trajectory of the pedestrian in each segment of video image i obtained in S0507 is processed:
s0601, selecting key pedestrians
For all short tracks in a sliding time window, a key pedestrian is selected for each short track in each video image i according to the appearance similarity of the pedestrians, and this key pedestrian is used to represent the short track;
s0602, two-by-two combination of key pedestrians between short tracks
In a sliding time window, the key pedestrian of each short track in a given video image i selected in S0601 is paired with the key pedestrian of each short track in every other video image j; each combination is two pictures of size 128 × 64 × 3 containing the key pedestrians;
s0603, calculating similarity of short tracks
In the same way as S0502, the two 128 × 64 × 3 pictures containing the key pedestrians are input into the twin neural network trained in S0402; the output K_AS is the appearance similarity of the two short tracks' key pedestrians. The similarity Tr_s of the two short tracks is then computed as
Tr_s = (1/n) · Σ_(K_AS ∈ N) K_AS,
wherein K_AS represents the appearance similarity of two short tracks' key pedestrians, N represents the set of K_AS values meeting the condition, and n represents the number of elements in N;
S0604, calculating the motion speed v_tracklet of each short track
The motion speed of each short track is calculated according to the following formula:
v_tracklet = (P_e - P_s) / (T_e - T_s)
wherein P_s indicates the start point of a short track, P_e indicates the end point of the short track, T_s indicates the start frame of the short track, and T_e indicates the termination frame of the short track;
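Under the definitions above, the speed formula is displacement over elapsed frames, computed per axis so that it can later be decomposed into the X-direction and Y-direction speeds used in S0605:

```python
def tracklet_velocity(p_s, p_e, t_s, t_e):
    """Motion speed of a short track.

    p_s, p_e: (x, y) start and end points of the short track.
    t_s, t_e: start frame and termination frame.
    Returns the per-frame (vx, vy) speed.
    """
    frames = t_e - t_s
    return ((p_e[0] - p_s[0]) / frames, (p_e[1] - p_s[1]) / frames)
```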
S0605, calculating the short-track motion affinity Tr_MA
Using the motion speed of the short tracks obtained in S0604, the motion affinity Tr_MA between the short tracks is calculated as follows:
Tr_MA = max(1 - β*e(Tr_1, Tr_2), 0)
e(Tr_1, Tr_2) = e_forward(Tr_1, Tr_2) + e_backward(Tr_1, Tr_2)
wherein Tr_1 and Tr_2 are the two short tracks, β is an adjustable parameter, e_Xforward, e_Yforward, e_Xbackward and e_Ybackward are the forward and backward errors of the short track in the X and Y directions, e_forward and e_backward are the forward error and the backward error of the short-track motion; the end point coordinates of one short track and the start point coordinates of the other short track are used, t_1 and t_2 are the frame indices of the end point coordinate of one short track and of the start point coordinate of the other short track, the v_tracklet obtained in S0604 can be decomposed into X-direction and Y-direction speeds, and v_1x, v_1y, v_2x and v_2y are the moving speeds of the two short tracks in the X direction and the Y direction, respectively;
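A sketch of the motion affinity Tr_MA: the forward error extrapolates one track's end point with its own velocity to the other track's start frame, and the backward error extrapolates the other track's start point backwards with its velocity. The Euclidean error norm and the default β are assumptions, since the error formula images are not reproduced in the text.

```python
import math

def motion_affinity(end1, t1, v1, start2, t2, v2, beta=0.1):
    """Motion affinity Tr_MA between two short tracks (a sketch).

    end1: (x, y) end point of track 1 at frame t1, with velocity v1.
    start2: (x, y) start point of track 2 at frame t2, with velocity v2.
    beta and the Euclidean error norm are assumptions.
    """
    dt = t2 - t1
    # forward error: extrapolate track 1's end with track 1's velocity
    fx = end1[0] + v1[0] * dt - start2[0]
    fy = end1[1] + v1[1] * dt - start2[1]
    # backward error: extrapolate track 2's start back with track 2's velocity
    bx = start2[0] - v2[0] * dt - end1[0]
    by = start2[1] - v2[1] * dt - end1[1]
    e = math.hypot(fx, fy) + math.hypot(bx, by)  # e_forward + e_backward
    return max(1 - beta * e, 0.0)
```

Two tracks whose motions line up perfectly give zero error and affinity 1; the affinity decays linearly with error and is clipped at 0.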
S0606, establishing a short-track correlation matrix
Using a sigmoid function, the short-track similarity Tr_s obtained in S0603 and the short-track motion affinity Tr_MA obtained in S0605 are fused into a short-track correlation matrix TrC:
where λ and μ are adjustable parameters;
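The fusion step can be sketched as follows; the exact fused form is not reproduced in the text, so a sigmoid over a λ/μ-weighted sum of the two scores is assumed:

```python
import math

def fuse(tr_s, tr_ma, lam=1.0, mu=1.0):
    """Fuse appearance similarity Tr_s and motion affinity Tr_MA into
    one correlation entry of TrC.

    The weighted-sum-through-sigmoid form and the default lambda/mu
    are assumptions, since the patent's formula image is not
    reproduced in the text.
    """
    return 1.0 / (1.0 + math.exp(-(lam * tr_s + mu * tr_ma)))
```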
S0607, solving classification
Solving and classifying the short-track correlation matrix TrC obtained in S0606 by using the maximum correlation principle, wherein each obtained class represents the short tracks of a pedestrian with the same identity; a graph G = (V, E, W) is provided, wherein each vertex in V represents a short track, E represents the edges connecting short tracks, and W represents the correlation between two short tracks; the maximum correlation principle is as follows:
wherein v_1, v_2 and v_3 represent short tracks in V, w represents the degree of correlation between short tracks, and x takes the value 0 or 1;
S0608, merging into long tracks
Merging the short tracks of each type obtained by solving and classifying in the S0607, namely merging the short tracks of pedestrians with the same identity into a new track; as the sliding time window progresses, a complete long trajectory for all pedestrians is finally obtained.
Experiment 1, online tracking using the pedestrian tracking method based on the twin neural network established in embodiment 1:
step 1, video input
Selecting the video data set file of an object to be tracked and inputting it into a computer, wherein the video data set file comprises each frame of video image of a video sequence and a groundtruth file, and the groundtruth file comprises pedestrian tracks marked manually;
step 2, pedestrian marking:
for each frame of video image input in step 1, detecting and marking the positions of pedestrians on each frame of video image by using a DPM pedestrian detector; as shown in fig. 2 and fig. 3, video sequences with pedestrian position marks are obtained for the PETS2009-S2L1 data set and the TownCenter data set;
step 3, acquiring pedestrian space-time groups:
splitting the video sequence with the pedestrian position marks in step 2 into n segments, one segment per second, wherein each segment is a video image i; for each video image i, a plurality of pedestrian spatio-temporal groups are obtained by using a hierarchical clustering method according to the positions of the pedestrians, wherein n is a positive integer, i ≤ n, and j represents the spatio-temporal group serial number in the ith video image;
step 4, acquiring short tracks of pedestrians
Step 4-1, for each spatio-temporal group in each video image i obtained in step 3, the pedestrians therein are combined in pairs, each combination being two pictures of size 128 × 64 × 3 containing pedestrians; the two pictures of size 128 × 64 × 3 in each combination are stacked into 128 × 64 × 6 and input into the twin neural network trained in embodiment 1, so as to obtain the appearance similarity P_AS of each pair of pedestrians within each spatio-temporal group;
Step 4-2, calculating the pedestrian movement speed:
for the pedestrian position p in the t-th frame, several detections p_k closest to p are found by calculating distances in the adjacent frames (k is the frame index), and the pedestrian movement speed v_pedestrian within each spatio-temporal group is calculated according to the following formula:
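The speed estimate can be sketched as an average of the per-frame displacements towards the nearest detections in adjacent frames; the averaging form is an assumption, since the patent's formula image is not reproduced in the text.

```python
def pedestrian_velocity(p, t, neighbors):
    """Estimate the movement speed v_pedestrian at position p, frame t.

    neighbors: list of (p_k, k) pairs, where p_k is the nearest
    detection in an adjacent frame k (k != t).  Averaging the
    per-frame displacements is an assumption.
    """
    vx = sum((p[0] - pk[0]) / (t - k) for pk, k in neighbors) / len(neighbors)
    vy = sum((p[1] - pk[1]) / (t - k) for pk, k in neighbors) / len(neighbors)
    return vx, vy
```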
Step 4-3, calculating the motion affinity
Using the pedestrian movement speeds obtained in step 4-2, the motion affinity P_MA of each pedestrian pair within each spatio-temporal group is calculated according to the following formula:
P_MA = max(1 - β*e(P_1, P_2), 0)
e(P_1, P_2) = min(e_forward(P_1, P_2), e_backward(P_1, P_2))
Step 4-4, establishing a pedestrian correlation matrix
Using a sigmoid function, the appearance similarity P_AS of each pedestrian pair within each spatio-temporal group obtained in step 4-1 and the motion affinity P_MA of each pedestrian pair within each spatio-temporal group obtained in step 4-3 are fused into a pedestrian correlation matrix PeC as follows:
step 4-5, solving the classification
Solving and classifying the pedestrian correlation matrix PeC of each spatio-temporal group obtained in step 4-4 by using the maximum correlation principle, wherein each obtained class represents pedestrians with the same identity; a graph G = (V, E, W) is provided, wherein each vertex in V represents a pedestrian detection, E represents the edges connecting pedestrian detections, and W represents the correlation between two pedestrians; the maximum correlation principle is as follows:
step 4-6, obtaining short tracks of pedestrians
Merging the pedestrians in each class, and obtaining short tracks of the pedestrians in each video image i;
step 5, acquiring long tracks of pedestrians
Step 5-1, selecting key pedestrians
for all short tracks in a sliding time window, selecting a key pedestrian in the short tracks in each video image i by using the appearance similarity of the pedestrians, and using the key pedestrian to represent each short track;
step 5-2, combining every two key pedestrians in short tracks
in a sliding time window, pairing the key pedestrian of each short track in a certain segment of video image i selected in step 5-1 with the key pedestrian of each short track in every other segment of video image j, wherein each combination is two pictures of size 128 × 64 × 3 containing the key pedestrians;
step 5-3, calculating the similarity of the short tracks
Inputting the two pictures of size 128 × 64 × 3 containing key pedestrians into the twin neural network trained in embodiment 1; the output result of the trained twin neural network is the appearance similarity K_AS of the two short-track key pedestrians; then the similarity Tr_s of the two short tracks is further calculated according to the following formula:
Step 5-4, calculating the motion speed v_tracklet of each short track
The motion speed of each short track is calculated according to the following formula:
v_tracklet = (P_e - P_s) / (T_e - T_s)
Step 5-5, calculating the motion affinity Tr_MA between the short tracks
The motion affinity Tr_MA between the short tracks is calculated by using the motion speed of the short tracks obtained in step 5-4:
Tr_MA = max(1 - β*e(Tr_1, Tr_2), 0)
e(Tr_1, Tr_2) = e_forward(Tr_1, Tr_2) + e_backward(Tr_1, Tr_2)
Step 5-6, establishing a short track correlation matrix
Similarity Tr of short tracks obtained in step 5-3 by using sigmoid functionsAnd the kinematic affinity Tr of the short trajectory obtained in step 5-5MAAnd (3) fusing into a short track correlation matrix TrC:
step 5-7, solving and classifying
Solving and classifying the short-track correlation matrix TrC obtained in step 5-6 by using the maximum correlation principle, wherein each obtained class represents the short tracks with the same identity; a graph G = (V, E, W) is provided, wherein each vertex in V represents a short track, E represents the edges connecting short tracks, and W represents the correlation between two short tracks; the maximum correlation principle is as follows:
Step 5-8, merging the short tracks in each class obtained by solving and classifying in step 5-7, namely merging the short tracks of pedestrians with the same identity into a new track; as shown in fig. 6 and fig. 7, a complete long track of all pedestrians is finally obtained as the sliding time window advances.
Comparing the complete long tracks of the pedestrians obtained in experiment 1 with the manually marked pedestrian tracks in the groundtruth file gives the final experimental error and effect. As shown in fig. 4 and fig. 5, the algorithm can accurately and quickly mark the motion tracks of the pedestrians, and the tracks coincide closely with the actual tracks: the pedestrian tracking accuracy reaches 93.23% on the PETS2009-S2L1 data set and 78.85% on the TownCenter data set. Likewise, the invention verified batch data using the steps of experiment 1, and the results were also valid; thus, the procedure of experiment 1 also verifies that the same procedure is applicable to other online tracking scenarios.
Finally, it is also noted that the above-mentioned lists merely illustrate a few specific embodiments of the invention. It is obvious that the invention is not limited to the above embodiments, but that many variations are possible. All modifications which can be derived or suggested by a person skilled in the art from the disclosure of the present invention are to be considered within the scope of the invention.
Claims (4)
1. A pedestrian tracking method based on a twin neural network is characterized by comprising the following steps:
step 1, video input:
acquiring a video file outdoors or indoors by monitoring equipment, and inputting the acquired video file into a computer, wherein the acquired video file comprises each frame of video image of a video sequence;
step 2, pedestrian marking:
for each frame of video image input in the step 1, detecting and marking the position of a pedestrian on each frame of video image by using a DPM pedestrian detection technology to obtain a video sequence with a pedestrian position mark;
step 3, acquiring pedestrian space-time groups:
splitting the video sequence with the pedestrian position marks in step 2 into n segments, one segment per second, wherein each segment is a video image i; for each video image i, pedestrian spatio-temporal groups are obtained by using a hierarchical clustering method according to the pedestrian positions, wherein n is a positive integer, i ≤ n, and j represents the spatio-temporal group serial number in the ith video image;
step 4, establishing and training a twin neural network, and storing the trained twin neural network;
step 5, acquiring a short track of the pedestrian;
step 6, acquiring a long track of the pedestrian.
2. The twin neural network-based pedestrian tracking method according to claim 1, further characterized in that the step 5 comprises the steps of:
step 5-1, for each spatio-temporal group in each video image i obtained in step 3, combining the pedestrians therein in pairs, wherein each combination is two pictures of size 128 × 64 × 3 containing pedestrians;
step 5-2, stacking the two pictures of size 128 × 64 × 3 containing pedestrians in each combination obtained in step 5-1 into the form of 128 × 64 × 6, and inputting them into the twin neural network trained in step 4 to obtain the appearance similarity P_AS of each pair of pedestrians within each spatio-temporal group;
Step 5-3, calculating the pedestrian movement speed
For the pedestrian position p in the t-th frame, the detections p_k closest to p are found by calculating distances in the adjacent frames, wherein k is the frame index, and the pedestrian movement speed v_pedestrian within each spatio-temporal group is calculated according to the following formula:
Step 5-4, calculating the motion affinity
Using the pedestrian movement speeds obtained in step 5-3, the motion affinity P_MA of each pedestrian pair within each spatio-temporal group is calculated according to the following formula:
P_MA = max(1 - β*e(P_1, P_2), 0),
e(P_1, P_2) = min(e_forward(P_1, P_2), e_backward(P_1, P_2)),
wherein P_1 and P_2 are two pedestrians, β is an adjustable parameter, e_Xforward, e_Yforward, e_Xbackward and e_Ybackward are the forward and backward errors of the pedestrian movement in the X and Y directions, e_forward and e_backward are the forward error and the backward error of the pedestrian movement, P_1x, P_1y, P_2x and P_2y are the X and Y coordinates of the two pedestrian positions, t_1 and t_2 are the frame indices of the two pedestrians, the v_pedestrian obtained in step 5-3 is decomposed into X-direction and Y-direction speeds, and v_1x, v_1y, v_2x and v_2y are the moving speeds of the two pedestrians in the X direction and the Y direction, respectively;
step 5-5, establishing a pedestrian correlation matrix
Using a sigmoid function, the appearance similarity P_AS of each pedestrian pair within each spatio-temporal group obtained in step 5-2 and the motion affinity P_MA of each pedestrian pair within each spatio-temporal group obtained in step 5-4 are fused into a pedestrian correlation matrix PeC as follows:
step 5-6, solving classification
Solving and classifying the pedestrian correlation matrix PeC of each spatio-temporal group obtained in step 5-5 by using the maximum correlation principle, wherein each obtained class represents pedestrians with the same identity; a graph G = (V, E, W) is provided, wherein each vertex in V represents a pedestrian detection, E represents the edges connecting pedestrian detections, and W represents the correlation between two pedestrians; the maximum correlation principle is as follows:
wherein v_1, v_2 and v_3 represent pedestrian detections in V, w represents the degree of correlation between pedestrians, and x takes the value 0 or 1;
step 5-7, obtaining short tracks of pedestrians
And merging the pedestrians in each class obtained in the step 5-6, so as to obtain the short track of the pedestrian in each video image i.
3. The twin neural network-based pedestrian tracking method according to claim 2, further characterized in that the step 6 comprises the steps of:
6-1, selecting key pedestrians
Processing, in a sliding time window, the short tracks of the pedestrians in each segment of video image i obtained in step 5-7; for all short tracks in one sliding time window, selecting a key pedestrian in the short tracks in each segment of video image i by using the appearance similarity of the pedestrians, and using the key pedestrian to represent each short track;
step 6-2, combining every two key pedestrians in short tracks
in a sliding time window, pairing the key pedestrian of each short track in a certain segment of video image i selected in step 6-1 with the key pedestrian of each short track in every other segment of video image j, wherein each combination is two pictures of size 128 × 64 × 3 containing the key pedestrians;
step 6-3, calculating the similarity of the short tracks
Inputting the two pictures of size 128 × 64 × 3 containing key pedestrians into the twin neural network trained in step 4; the output result is the appearance similarity K_AS of the two short-track key pedestrians; then the operation is carried out according to the following formula, and the similarity of the two short tracks is output:
wherein K_AS represents the appearance similarity of two short-track key pedestrians, N represents the set of K_AS values satisfying the condition, and |N| represents the number of elements in N;
step 6-4, calculating the motion speed v_tracklet of each short track
The motion speed of each short track is calculated according to the following formula:
v_tracklet = (P_e - P_s) / (T_e - T_s)
wherein P_s indicates the start point of a short track, P_e indicates the end point of the short track, T_s indicates the start frame of the short track, and T_e indicates the termination frame of the short track;
step 6-5, calculating the motion affinity Tr_MA between the short tracks
Tr_MA = max(1 - β*e(Tr_1, Tr_2), 0)
e(Tr_1, Tr_2) = e_forward(Tr_1, Tr_2) + e_backward(Tr_1, Tr_2)
wherein Tr_1 and Tr_2 are the two short tracks, β is the parameter to be adjusted, e_Xforward, e_Yforward, e_Xbackward and e_Ybackward are the forward and backward errors of the short track in the X and Y directions, e_forward and e_backward are the forward error and the backward error of the short-track motion; the end point coordinates of one short track and the start point coordinates of the other short track are used, t_1 and t_2 are the frame indices of the end point coordinate of one short track and of the start point coordinate of the other short track, the v_tracklet obtained in step 6-4 is decomposed into X-direction and Y-direction speeds, and v_1x, v_1y, v_2x and v_2y are the moving speeds of the two short tracks in the X direction and the Y direction, respectively;
step 6-6, establishing a short-track correlation matrix
Using a sigmoid function, the short-track similarity Tr_s obtained in step 6-3 and the short-track motion affinity Tr_MA obtained in step 6-5 are fused into a short-track correlation matrix TrC:
wherein λ and μ are adjustable parameters;
step 6-7, solving classification
Solving and classifying the short-track correlation matrix TrC obtained in step 6-6 by using the maximum correlation principle, wherein each obtained class represents the short tracks with the same identity; a graph G = (V, E, W) is provided, wherein each vertex in V represents a short track, E represents the edges connecting short tracks, and W represents the correlation between two short tracks; the maximum correlation principle is as follows:
wherein v_1, v_2 and v_3 represent short tracks in V, w represents the degree of correlation between short tracks, and x takes the value 0 or 1;
step 6-8, merging into long tracks
Merging the short tracks of each type obtained by solving and classifying in the step 6-7, so as to merge the short tracks of the pedestrians with the same identity into a new track; as the sliding time window progresses, a complete long trajectory for all pedestrians is finally obtained.
4. A twin neural network based pedestrian tracking method according to claim 3, further characterized by: the step 4 of establishing and training the twin neural network comprises the following steps:
step 4-1, creating a twin neural network
A twin neural network is built by using the PyTorch framework, and its structure is as follows: two 128 × 64 × 3 pedestrian pictures are stacked into 128 × 64 × 6 and then used as the input of the twin neural network, which passes through three convolution layers, wherein the first layer has 12 convolution kernels of size 9 × 9, the second layer has 16 convolution kernels of size 5 × 5, and the third layer has 24 convolution kernels of size 5 × 5; each convolution layer contains batch normalization, a ReLU activation function and 2 × 2 pooling; the output of the three convolution layers is flattened into a 1 × 1152 vector, which then passes through two fully-connected layers of sizes 1 × 150 and 1 × 2, and the final output result is the appearance similarity of the two pedestrian pictures;
step 4-2, training twin neural network
Data set setting: downloading a CUHK02 pedestrian tracking data set from the Internet, wherein the data set comprises 1816 pedestrian identities, each pedestrian is shot by 2 camera angles, each camera angle has 2 photos, the first 80% of pedestrians are taken as a training set, and the rest pedestrians are taken as a testing set;
training process: the twin neural network established in step 4-1 is trained for two rounds; in the first round, the training data are the pictures from the first camera angle of the pedestrians in the training set, sample combinations are randomly generated during training, and the ratio of positive samples to negative samples is controlled at 1:1, wherein a positive sample is two pictures of the same pedestrian identity and a negative sample is two pictures of different pedestrian identities; in the second round, the pictures from the second camera angle of the pedestrians in the training set are used as training data, and the positive and negative samples are generated in the same way as in the first round;
training and testing: after training is finished, the trained twin neural network is tested with the test set; the output of the twin neural network is the appearance similarity of two pedestrians; a pair of pedestrians with the same identity is considered correct when the appearance similarity is greater than or equal to 0.5, and conversely a pair of pedestrians with different identities is considered correct when the appearance similarity is less than 0.5; the accuracy on the test set finally reaches more than 98%, and the trained twin neural network is stored.
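The network structure described in step 4-1 of claim 4 can be sketched in PyTorch. This is a minimal sketch, not the patent's exact implementation: stride 1 and no padding are assumptions inferred from the stated 1 × 1152 flattened size (the kernel sizes are read as 9 × 9 with 12 kernels, 5 × 5 with 16, and 5 × 5 with 24), and the final softmax over the 1 × 2 output as the similarity score is likewise an assumption.

```python
import torch
import torch.nn as nn

class TwinNet(nn.Module):
    """Sketch of the described twin network; hyperparameters not stated
    in the text (stride 1, no padding, softmax head) are assumptions."""

    def __init__(self):
        super().__init__()

        def block(cin, cout, k):
            # conv + batch normalization + ReLU + 2 x 2 pooling, per the text
            return nn.Sequential(
                nn.Conv2d(cin, cout, k),
                nn.BatchNorm2d(cout),
                nn.ReLU(),
                nn.MaxPool2d(2),
            )

        self.features = nn.Sequential(
            block(6, 12, 9),   # first layer: 12 kernels of 9 x 9
            block(12, 16, 5),  # second layer: 16 kernels of 5 x 5
            block(16, 24, 5),  # third layer: 24 kernels of 5 x 5
        )
        # 128 x 64 input shrinks to 12 x 4 x 24 channels = 1152 features
        self.fc = nn.Sequential(
            nn.Linear(1152, 150),
            nn.ReLU(),
            nn.Linear(150, 2),
        )

    def forward(self, x):                     # x: (B, 6, 128, 64) stacked pair
        f = self.features(x).flatten(1)       # -> (B, 1152)
        return self.fc(f).softmax(dim=1)      # assumed 2-way similarity score

pair = torch.randn(1, 6, 128, 64)  # one stacked 128 x 64 x 6 picture pair
score = TwinNet()(pair)
```

With stride-1 unpadded convolutions and 2 × 2 pooling, the spatial size goes 128 × 64 → 60 × 28 → 28 × 12 → 12 × 4, which reproduces the 1 × 1152 flattened vector stated in the claim.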
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010584083.9A CN111814604A (en) | 2020-06-23 | 2020-06-23 | Pedestrian tracking method based on twin neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111814604A true CN111814604A (en) | 2020-10-23 |
Family
ID=72845520
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010584083.9A Pending CN111814604A (en) | 2020-06-23 | 2020-06-23 | Pedestrian tracking method based on twin neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111814604A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114845119A (en) * | 2022-07-04 | 2022-08-02 | 光谷技术有限公司 | Thing allies oneself with gateway and verifies and compression system |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150286873A1 (en) * | 2014-04-03 | 2015-10-08 | Bruce L. Davis | Smartphone-based methods and systems |
US20190147602A1 (en) * | 2017-11-13 | 2019-05-16 | Qualcomm Technologies, Inc. | Hybrid and self-aware long-term object tracking |
CN109784155A (en) * | 2018-12-10 | 2019-05-21 | 西安电子科技大学 | Visual target tracking method, intelligent robot based on verifying and mechanism for correcting errors |
CN110135314A (en) * | 2019-05-07 | 2019-08-16 | 电子科技大学 | A kind of multi-object tracking method based on depth Trajectory prediction |
CN110298404A (en) * | 2019-07-02 | 2019-10-01 | 西南交通大学 | A kind of method for tracking target based on triple twin Hash e-learnings |
AU2019101133A4 (en) * | 2019-09-30 | 2019-10-31 | Bo, Yaxin MISS | Fast vehicle detection using augmented dataset based on RetinaNet |
CN110619655A (en) * | 2019-08-23 | 2019-12-27 | 深圳大学 | Target tracking method and device integrating optical flow information and Simese framework |
CN110942471A (en) * | 2019-10-30 | 2020-03-31 | 电子科技大学 | Long-term target tracking method based on space-time constraint |
KR102095685B1 (en) * | 2019-12-02 | 2020-04-01 | 주식회사 넥스파시스템 | vehicle detection method and device |
Non-Patent Citations (2)
Title |
---|
ZHANG HAN ET AL.: "Visual tracking using Siamese convolutional neural network with region proposal and domain specific updating", 《NEUROCOMPUTING》 * |
宗家辉: "基于TLD框架的目标跟踪算法的研究", 《中国优秀硕士学位论文全文数据库信息科技辑》 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107832672B (en) | Pedestrian re-identification method for designing multi-loss function by utilizing attitude information | |
CN106709449B (en) | Pedestrian re-identification method and system based on deep learning and reinforcement learning | |
Yang et al. | An online learned CRF model for multi-target tracking | |
US20200226421A1 (en) | Training and using a convolutional neural network for person re-identification | |
Yan et al. | To track or to detect? an ensemble framework for optimal selection | |
Li et al. | Tracking in low frame rate video: A cascade particle filter with discriminative observers of different life spans | |
CN111709311B (en) | Pedestrian re-identification method based on multi-scale convolution feature fusion | |
Chen et al. | Aggregate tracklet appearance features for multi-object tracking | |
CN110197502B (en) | Multi-target tracking method and system based on identity re-identification | |
Kaâniche et al. | Recognizing gestures by learning local motion signatures of HOG descriptors | |
CN110660082A (en) | Target tracking method based on graph convolution and trajectory convolution network learning | |
Gupta et al. | Nose, eyes and ears: Head pose estimation by locating facial keypoints | |
CN109191497A (en) | A kind of real-time online multi-object tracking method based on much information fusion | |
CN112989889B (en) | Gait recognition method based on gesture guidance | |
Wong et al. | Track everything: Limiting prior knowledge in online multi-object recognition | |
CN109325546A (en) | A kind of combination footwork feature at time footprint recognition method | |
CN114283355A (en) | Multi-target endangered animal tracking method based on small sample learning | |
Arantes et al. | Human gait recognition using extraction and fusion of global motion features | |
CN113436231B (en) | Pedestrian track generation method, device, equipment and storage medium | |
CN111814604A (en) | Pedestrian tracking method based on twin neural network | |
Mademlis et al. | Exploiting stereoscopic disparity for augmenting human activity recognition performance | |
Cheheb et al. | Investigating the use of autoencoders for gait-based person recognition | |
Hashmi et al. | GAIT analysis: 3D pose estimation and prediction in defence applications using pattern recognition | |
CN114038011A (en) | Method for detecting abnormal behaviors of human body in indoor scene | |
Zhang et al. | What makes for good multiple object trackers? |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||