CN112581496A - Multi-target pedestrian trajectory tracking method based on reinforcement learning

Info

Publication number: CN112581496A
Application number: CN201910934151.7A
Authority: CN
Inventors: 卿粼波; 许盛宇; 何小海; 苏婕; 吴晓红; 牛通
Original assignee: Sichuan University
Current assignee: Sichuan University
Priority date: 2019-09-29
Filing date: 2019-09-29
Publication date: 2021-03-30

Abstract

The invention provides a multi-target pedestrian trajectory tracking method based on reinforcement learning, which tracks the trajectories of multiple pedestrians in complex-scene video by means of deep reinforcement learning. The method comprises the following steps: assigning to each tracked target a single-target tracker trained by deep reinforcement learning, so as to obtain the trajectory of each target; detecting the positions of targets in the current frame with a high-precision target detector; and associating the detection results with the tracked trajectories using the Hungarian algorithm, based on the current appearance information and position information of each target, thereby achieving continuous multi-target pedestrian trajectory tracking throughout the video sequence. The invention combines the advantages of deep learning and reinforcement learning to track target positions more accurately. In addition, data association uses a cost matrix that fuses appearance and position features, which effectively avoids problems such as occlusion and missed detection and improves multi-target trajectory tracking accuracy.

Description

Multi-target pedestrian trajectory tracking method based on reinforcement learning

Technical Field

The invention relates to the multi-target tracking problem in the field of machine learning, and in particular to a multi-target pedestrian trajectory tracking method based on reinforcement learning.

Background

As artificial intelligence research has been elevated to a matter of national strategy, the state has called for stronger research, development, and application of new-generation artificial intelligence, for expanding intelligent living through the development of intelligent industries, and for upgrading traditional industries with new technologies, new business forms, and new models. In recent years, target tracking, an intensively studied topic in computer vision, has received wide attention from scholars at home and abroad.

Target tracking is the process of continuously inferring the state of a target in a video sequence: the target is located in each frame of the video, and the per-frame locations are then associated across frames to form a pedestrian motion trajectory. Target tracking can be divided into single-target tracking and multi-target tracking. Multi-target tracking is the more complex problem, because each target must be tracked effectively and the mutual interference among different targets must also be resolved. Although it poses great challenges, multi-target tracking is in high demand in many scenarios, and multi-pedestrian tracking in particular has outstanding practical value and application prospects. It is widely applied in fields such as intelligent surveillance, autonomous driving, robot visual navigation, and human-computer interaction.

Traditional multi-target tracking algorithms include multiple-hypothesis multi-target tracking, multi-target tracking based on correlation filtering, approximate multi-target tracking based on local flow features, and the like. Most of these methods cannot handle multi-target occlusion, trajectory drift, and similar problems in complex scenes. With the rapid development of deep learning in recent years, target detection accuracy has improved steadily, which has to some extent advanced detection-based multi-target tracking. Such methods, however, are limited by the accuracy of target position prediction and struggle to achieve very good results. Meanwhile, deep reinforcement learning, which combines deep learning with reinforcement learning, has achieved many excellent results on decision-making problems and has become a new research direction for the multi-target tracking problem.

Disclosure of Invention

The invention aims to provide a multi-target pedestrian trajectory tracking method based on reinforcement learning, which formulates the tracking task as a Markov decision process, trains a neural network by combining deep learning with reinforcement learning, and uses the trained network to predict and track the position of a target.

For convenience of explanation, the following concepts are first introduced:

Markov Decision Process (MDP): a Markov decision process is a mathematical model of sequential decision-making, used to model the stochastic policies and returns achievable by an agent in an environment whose system state has the Markov property. An MDP is built on a pair of interacting objects, the agent and the environment, and its elements include states, actions, policies, and rewards. In an MDP, the agent perceives the current system state and acts on the environment according to its policy, thereby changing the state of the environment and receiving a reward; the accumulation of rewards over time is called the return.

Reinforcement Learning (RL): also called evaluative learning, reinforcement learning is one of the paradigms and methodologies of machine learning. It describes and solves the problem of an agent that, while interacting with its environment, learns a policy in order to maximize the return or achieve a specific goal.
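The two concepts above can be written compactly. The notation below is an assumption made for illustration and does not appear in the source: r_t is the reward received at step t, \gamma is a discount factor, and \pi is the agent's policy. The first line is the accumulated reward (the return), and the second is the learning goal of finding a policy that maximizes the expected return.

```latex
% Assumed notation (not from the source): r_t reward, \gamma discount factor, \pi policy.
\begin{align}
  G_t     &= \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1}, \qquad 0 \le \gamma \le 1, \\
  \pi^{*} &= \arg\max_{\pi}\; \mathbb{E}_{\pi}\!\left[ G_0 \right].
\end{align}
```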

The invention specifically adopts the following technical scheme:

A multi-target pedestrian trajectory tracking method based on reinforcement learning, characterized by comprising the following steps:

a. converting the tracking task into a Markov decision process for solving;

b. training a single-target tracking network by using a mode of combining supervised learning and reinforcement learning;

c. fusing target appearance information and position information to perform data association on the multi-target tracking track;

The method mainly comprises the following steps:

(1) training a single-target tracking network by combining deep learning with reinforcement learning;

(2) assigning a single-target tracker obtained in step (1) to each tracked target in the current video frame, and tracking the positions of the multiple targets simultaneously;

(3) detecting targets in the current video frame with a high-precision target detector, and extracting their appearance features and position information;

(4) generating a cost matrix from the appearance similarity and position similarity between the set of tracked trajectories and the set of detection results, and performing multi-target data association with the Hungarian algorithm to obtain the tracking result for the current frame;

(5) taking the tracking result of the current frame as the input for the next round of tracking, and repeating steps (2) to (4) to track the multi-target pedestrian trajectories throughout the whole video sequence.
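The association step (4) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the source does not say how the two similarities are measured or fused, so cosine similarity for appearance, intersection over union (IoU) for position, the equal-weight sum `w_app=0.5`, and the gating threshold `max_cost` are all hypothetical choices, as are the dictionary keys `feat` and `box`. The assignment itself is solved with a standard Hungarian (Kuhn-Munkres) routine written in plain Python.

```python
import math

def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def cosine(u, v):
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0

def cost_matrix(tracks, dets, w_app=0.5):
    """cost[i][j] fuses appearance and position distance (assumed weighted sum)."""
    return [[w_app * (1.0 - cosine(t["feat"], d["feat"]))
             + (1.0 - w_app) * (1.0 - iou(t["box"], d["box"]))
             for d in dets] for t in tracks]

def hungarian(cost):
    """Minimum-cost assignment; returns sorted (row, col) pairs."""
    if not cost or not cost[0]:
        return []
    transposed = len(cost) > len(cost[0])
    if transposed:  # the routine below needs rows <= columns
        cost = [list(r) for r in zip(*cost)]
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (m + 1)   # row and column potentials
    p, way = [0] * (m + 1), [0] * (m + 1)     # p[j]: row matched to column j
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (m + 1), [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip the augmenting path back to the virtual column 0
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    pairs = [(p[j] - 1, j - 1) for j in range(1, m + 1) if p[j]]
    if transposed:
        pairs = [(c, r) for r, c in pairs]
    return sorted(pairs)

def associate(tracks, dets, max_cost=0.7):
    """Return (matches, unmatched track indices, unmatched detection indices)."""
    cost = cost_matrix(tracks, dets)
    matches = [(i, j) for i, j in hungarian(cost) if cost[i][j] <= max_cost]
    matched_t = {i for i, _ in matches}
    matched_d = {j for _, j in matches}
    lost = [i for i in range(len(tracks)) if i not in matched_t]
    new = [j for j in range(len(dets)) if j not in matched_d]
    return matches, lost, new

tracks = [{"box": (0, 0, 10, 20), "feat": (1.0, 0.0)},
          {"box": (50, 0, 60, 20), "feat": (0.0, 1.0)}]
dets = [{"box": (51, 0, 61, 20), "feat": (0.1, 0.9)},
        {"box": (1, 0, 11, 20), "feat": (0.9, 0.1)}]
matches, lost, new = associate(tracks, dets)
```

In a per-frame loop, matched pairs would update the corresponding trajectories, while unmatched trajectories and unmatched detections would be handled by whatever track-termination and track-creation rules the system uses; the source does not specify those rules.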

The invention has the beneficial effects that:

(1) The reward-driven strategy of reinforcement learning is fully exploited: the machine learns the optimal decisions automatically, which improves the target tracking performance.

(2) Detection is performed frame by frame on the video sequence during tracking, which effectively avoids problems such as occlusion and trajectory drift.

(3) The appearance features and position information of the target are fused for data association, which guards against false detections, missed detections, and trajectory loss caused by target occlusion.

(4) Supervised learning and reinforcement learning are combined, which addresses the low accuracy of traditional methods and enhances the research value of the method.

Drawings

FIG. 1 is a diagram of a single target tracker network architecture based on reinforcement learning.

Detailed Description

The present invention is described in further detail below with reference to the drawings and examples. It should be noted that the following examples only illustrate the present invention and should not be construed as limiting its scope of protection; insubstantial modifications and adaptations made by those skilled in the art on the basis of the above disclosure still fall within the scope of protection of the present invention.

The multi-target pedestrian trajectory tracking method based on reinforcement learning specifically comprises the following steps:

(1) pre-training a single-target tracking network by supervised learning so that the network is able to select correct actions, and then further optimizing the network parameters with a reinforcement learning policy gradient algorithm so that the network can predict and track the target position;

(2) assigning a single-target tracker obtained in step (1) to each tracked target in the current video frame, and tracking the positions of the multiple targets to obtain the appearance features and position information of each target in the current frame;

(3) detecting targets in the current video frame with a high-precision target detector, and extracting the appearance features and position information of the target within each detected bounding box;

(4) generating a cost matrix from the appearance similarity and position similarity between the set of tracked trajectories and the set of detection results, performing multi-target data association with the Hungarian algorithm so that tracked trajectories and detections are matched one to one, and finally obtaining the tracking result for the current frame;
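The policy gradient optimization mentioned in step (1) can be illustrated on a toy problem. The sketch below is not the tracking network of FIG. 1 and omits the supervised pre-training stage. It assumes a hypothetical one-dimensional setting in which the state is the sign of the offset between the target centre and the box centre, the actions shift the box left, keep it, or shift it right, and each episode lasts one step. A tabular softmax policy is updated with the basic REINFORCE rule; the reward values, learning rate, and all names are illustrative assumptions.

```python
import math
import random

ACTIONS = (-1, 0, 1)  # hypothetical actions: shift box left, stay, shift right
STATES = (-1, 0, 1)   # sign of (target centre - box centre)

def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    top = max(prefs)
    exps = [math.exp(x - top) for x in prefs]
    total = sum(exps)
    return [e / total for e in exps]

def reward(state, action):
    """Assumed reward: +1 if the move matches the offset direction, else -1."""
    return 1.0 if action == state else -1.0

def train(episodes=3000, lr=0.1, seed=0):
    """REINFORCE with one-step episodes on a tabular softmax policy."""
    rng = random.Random(seed)
    theta = {s: [0.0] * len(ACTIONS) for s in STATES}
    for _ in range(episodes):
        s = rng.choice(STATES)
        probs = softmax(theta[s])
        k = rng.choices(range(len(ACTIONS)), weights=probs)[0]
        r = reward(s, ACTIONS[k])
        for j in range(len(ACTIONS)):
            # d log pi(a_k | s) / d theta[s][j] = 1[j == k] - pi(a_j | s)
            grad_log = (1.0 if j == k else 0.0) - probs[j]
            theta[s][j] += lr * r * grad_log
    return theta

def greedy_policy(theta):
    """Map each state to the action with the highest learned preference."""
    return {s: ACTIONS[max(range(len(ACTIONS)), key=lambda j: theta[s][j])]
            for s in STATES}

policy = greedy_policy(train())
```

In the method described here the policy would be a neural network acting on image features of the tracked target, and the update would be applied to the network weights rather than to a table; the rule for turning rewards into parameter updates is the part this sketch is meant to show.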

Claims

1. A multi-target pedestrian trajectory tracking method based on reinforcement learning is characterized by comprising the following steps:

a. converting the tracking task into a Markov decision process for solving;

the method mainly comprises the following steps:

2. The reinforcement learning-based multi-target pedestrian trajectory tracking method according to claim 1, characterized in that in step (1), each single-target tracker acts as an independent agent for which a Markov decision process is constructed, the mapping from the state of the tracked target to the action to be taken is learned through a reward and punishment mechanism, and the tracking policy is optimized in combination with temporal information.

3. The reinforcement learning-based multi-target pedestrian trajectory tracking method according to claim 1, characterized in that in step (1), a single-target tracking network is trained by combining supervised learning with reinforcement learning, the network parameters are optimized repeatedly, and the accuracy of position prediction and tracking is improved.

4. The reinforcement learning-based multi-target pedestrian trajectory tracking method according to claim 1, characterized in that in step (4), the appearance features and position information of the tracked trajectories and the detection results are extracted and fused to form a similarity cost matrix, so that the feature differences between different targets are fully exploited, target confusion in complex scenes is reduced, and the tracking algorithm is robust to a certain degree against problems such as target occlusion, false detection, and missed detection.