CN111539979B

CN111539979B - Human body front tracking method based on deep reinforcement learning

Info

Publication number: CN111539979B
Application number: CN202010341730.3A
Authority: CN
Inventors: 张雅帆; 张堃博; 孙哲南; 胡清华
Original assignee: Tianjin Zhongke Intelligent Identification Industry Technology Research Institute Co ltd; Tianjin University
Current assignee: Tianjin Zhongke Intelligent Identification Co ltd; Tianjin University
Priority date: 2020-04-27
Filing date: 2020-04-27
Publication date: 2022-12-27
Anticipated expiration: 2040-04-27
Also published as: CN111539979A

Abstract

The invention discloses a human body front tracking method based on deep reinforcement learning, which comprises the following steps: s1: building a plurality of non Engine 4 virtual environments for training and testing; s2: constructing a convolutional neural network and an Actor-Critic network; s3: the input of the convolutional neural network is the observation visual angle of a tracker, and a network model is trained until the model converges; s4: testing a tracking effect in a UE4 virtual test scene; s5: and migrating the model meeting the requirements after the test to a real scene. Different from the traditional tracking work that two functional modules of human body detection and camera control must be respectively realized, the invention integrates the two modules by using an end-to-end active tracking method, does not need human body detection, inputs the video stream of the visual angle of a tracker, directly outputs the most effective action for tracking, and saves the complex flow of the traditional human body tracking.

Description

Human body front tracking method based on deep reinforcement learning

Technical Field

The invention relates to the technical field of computer vision and machine learning, in particular to the field of human body tracking, which can be used for a series of intelligent service robots, human face or iris acquisition systems without artificial cooperation and the like, and particularly relates to a human body front tracking method based on deep reinforcement learning.

Background

Human body tracking is a process of accurately detecting and tracking the position of a human body from a complex environment by using a continuous video sequence as an input. In the fields of market monitoring, traffic control and the like in real life, a camera is generally in a static state, namely a tracking background does not change within a certain time period, which is called static human body tracking. In recent years, social development has made new demands for human body tracking, and when a camera is mounted on a mobile robot and the position of the camera changes, the background of an image captured by the camera changes, which is called dynamic human body tracking. The latter is a main overcoming difficulty in the current human body tracking field. The dynamic human body tracking technology has significance in scientific research and practical value in a plurality of social fields.

The explosion of computer technology will likely replace more and more of the less intelligent, less human intervention-free service industry with intelligent robots. Like market shopping guide: the mobile intelligent robot captures a real-time video sequence, firstly obtains the accurate position of a customer through human body detection, then carries out human body tracking, considers the comfort of human-to-human communication, generally needs the robot to move to the front of a person to be served for face-to-face communication, can improve the service quality and increase the happiness of the person to be served. Meanwhile, other applications such as a nursing robot, an educational service robot, a home service robot, and the like are also being widely developed.

With the continuous development of the computing power of hardware such as a GPU, the deep learning-based method gradually shows incomparable advantages. If the deep reinforcement learning can be applied to human body tracking, the real-time performance and the high efficiency of the human body tracking can be further improved.

Disclosure of Invention

The invention aims to provide a human body front tracking method based on deep reinforcement learning, which directly outputs actions required by a tracker without human body detection by taking a video sequence as input so as to realize end-to-end active front human body tracking. The invention utilizes the low-cost virtual environment without labels for training, and abundant data effectively inhibits the overfitting problem which is easy to occur in the process of training the convolutional neural network, thereby obtaining better generalization capability and coping with the application in an uncontrollable scene. .

In order to achieve the purpose of the invention, the invention provides a human body front tracking method based on deep reinforcement learning, which comprises the following specific steps:

s1, establishing a UE4 virtual training and testing environment, including abundant illumination changes, changes of backgrounds and human body surface textures;

s2, constructing a convolutional neural network and connecting an Actor-Critic network;

s3, inputting the visual angle of the tracker into the network constructed in the step S2 in a video stream mode until the successful tracking time reaches more than 300 seconds, namely, the model converges;

the reinforcement learning algorithm realizes positive tracking through automatic learning to maximize the final reward value, and the following formula is a set reward and punishment function:

r is a reward and punishment value given by each execution action of the tracker by the model, A is an artificially set reward and punishment value upper limit, deltax and Delay represent offsets between a target and the tracker on an x axis and a y axis, d is an ideal distance expected to be maintained between the target and the tracker, omega and theta are angles at which the tracker and the target need to rotate in a positive opposite direction respectively, and c, lambda and beta are normalized parameters;

s4, testing the trained model according to the method of the S3 by using a UE4 virtual test environment, and outputting successful tracking time;

and S5, in order to verify the performance of the model in the real world, migrating the model meeting the requirements after testing in the virtual environment into a real scene, and evaluating the tracking effect by matching with artificial observation according to the output successful tracking time.

The data input into the neural network constructed in the step S2 is a video stream of a continuous tracker view, so that the neural network is connected with an LSTM structure behind a multilayer convolutional neural network, and a subsequent Actor-Critic network module directly outputs an action to be taken by a tracker to maintain the action on the front of a target human body.

The purpose that the tracker always tracks the front of the human body target is achieved by simultaneously reducing the difference value between the actual distance and the expected distance between the human body target and the tracker and the two rotation angles.

Compared with the prior art, the method has the beneficial effects that the method is beneficial to improving the high efficiency, convenience and comfort of life, and the beneficial effects are embodied in the following aspects:

1. the invention uses the deep reinforcement learning in the human body front tracking for the first time, and can automatically learn the action which a tracker needs to take to maintain the front tracking without manual participation.

2. The invention uses an end-to-end tracking method, thereby omitting the complex flow of the traditional human body tracking, needing no two modules of processing target detection and camera control respectively, and being capable of directly obtaining the action required to be taken by a tracker from an input video sequence.

3. Different from the prior art that a great amount of labeled data are needed for training the convolutional neural network, the method effectively inhibits the overfitting problem which is easy to occur in the training of the convolutional neural network by using the UE4 virtual environment training model and abundant data.

The invention can positively track the human body target, can be applied to the fields of mobile intelligent robots or biological characteristic collection and identification and the like, and effectively improves the comfort of communication and the happiness of service by face-to-face tracking.

Drawings

FIG. 1 is a flow chart of an end-to-end active human body front tracking method based on deep reinforcement learning according to the present invention;

FIG. 2 is an example of a virtual training environment used in the present invention, where there are texture changes of illumination, characters and background during actual training, and the lower right window is the view angle of the tracker, and the tracker faces the target at this time, and this state is a successful tracking;

FIG. 3 is a graphical illustration of the calculation of the angle at which the tracker and target need to rotate, respectively, to the opposite side in the present invention;

fig. 4 is a flow chart for front tracking of human targets in the real world using the proposed method of the invention.

Detailed Description

It should be noted that, in the present application, the embodiments and features of the embodiments may be combined with each other without conflict.

The invention is described in further detail below with reference to the figures and specific examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the invention.

In practical applications, human body tracking technology has many challenges, especially in the field of mobile human body tracking where the image background can change in real time. If the mobile robot needs to take account of the image processing module and the mobile control module, the image processing module needs to process illumination and distance change, and the mobile control module needs to pay attention to the real-time performance of sending and executing the mobile instruction.

The traditional moving human body tracking method generally separates an image processing module and a camera control module for processing, and the depth reinforcement learning method designed by the invention uniformly considers the image processing module and the camera control module, so that a complicated middle step is omitted by using an end-to-end learning method. Meanwhile, the virtual training and testing environment is set up by using UE4 software, and the overfitting problem is avoided due to abundant data volume.

As shown in fig. 1, the human body front tracking method based on deep reinforcement learning provided by the invention comprises the following steps:

s1, establishing a UE4 virtual training and testing environment which comprises abundant illumination changes and changes of backgrounds and human body surface textures, so that overfitting can be effectively inhibited; as shown in fig. 2.

The step S1 specifically comprises the following steps: in view of the problem of difficult data acquisition and labeling in a real scene, a virtual environment similar to the real scene and built by UE4 software is used for training and testing the model. In order to effectively inhibit the overfitting phenomenon during training, the invention adds abundant illumination change in the virtual environment, randomly replaces different human body targets and various background textures, and on the other hand, the virtual environment provides abundant low-cost training data.

the step S2 specifically comprises the following steps: because the data input into the neural network is a continuous video stream, the neural network is connected with an LSTM structure suitable for learning serialized data after a plurality of layers of convolutional neural networks, and a subsequent Actor-Critic network module takes the characteristics extracted by the convolutional neural networks as input and directly outputs the action which a tracker should take to realize that the data are always maintained on the front of a target human body.

S3, inputting the visual angle of the tracker into the network constructed in the S2 in a video stream mode until successful tracking time reaches more than 300 seconds, namely, model convergence;

the step S3 specifically includes: the reinforcement learning algorithm realizes positive tracking by maximizing a final reward value, and the following formula is a reward and punishment function set by the invention:

the method includes the steps that r is a reward and penalty value given by a model to each execution action of a tracker, A is an artificially set reward and penalty value upper limit, Δ x and Δ y represent offsets between a target and the tracker on an x axis and a y axis, d is an ideal distance expected to be maintained between the target and the tracker, ω and θ are angles at which the tracker and the target need to rotate opposite to each other, respectively, see fig. 3, arrows a and B in fig. 3 represent orientations of the tracker and the target, respectively, and c, λ and β are normalized parameters. The invention can obtain the real-time position coordinates and Euler angles of the target and the tracker in the virtual environment by calling the method function provided by the UE4 software and interacting with the virtual environment, and then respectively calling arctan (delta x, delta y) and arctan (delta y, delta x) to calculate omega and theta. The purpose that the tracker always tracks the front of the human body target is achieved by simultaneously reducing the difference value between the actual distance and the expected distance between the human body target and the tracker and two rotation angles. The criterion of whether the model converges is whether the human target and the tracker are in a face-to-face state and maintain a proper distance for 300 seconds.

S4, testing the model trained according to the method in the step S3 by using a UE4 virtual test environment, and outputting successful tracking time;

the step S4 specifically comprises the following steps: in the stage, the test environment which is completely different from the training environment is used for testing the generalization capability of the converged model, and by means of an API document disclosed by UE4 software, a method function is called to determine whether the human body target and the tracker are in a face-to-face state or not and maintain a proper ideal distance, and the total time for successful tracking is output.

And S5, in order to verify the performance of the model in the real world, migrating the model meeting the requirements after being tested in the virtual environment into a real scene, and evaluating the tracking effect according to the output successful tracking time and artificial observation.

Specifically, in a real scene, a camera is mounted on a four-wheel-drive autonomous mobile platform, and a video stream acquired by the camera is used as the input of a trained network model. The flow chart at this stage is shown in figure 4.

Examples of applications of the invention are listed below:

application example 1: the human body front tracking method based on deep reinforcement learning is applied to a mobile intelligent service robot.

The invention can be applied to mobile intelligent service robot technology, such as a family service robot, a teaching service robot, an accompanying robot and the like. By using the technology, the intelligent robot always moves face to face along with the target, provides service or responds to other instructions at any time, reduces the time of manpower input, improves the service quality, and increases the interest of service and the happiness of the serviced personnel. Different from the traditional moving human body tracking work that image processing and camera control are divided into two modules for processing, the end-to-end active human body front tracking method based on deep reinforcement learning uniformly considers the two modules, fully utilizes the advantages of the deep learning in the aspect of image processing and the excellent performance of the reinforcement learning in the aspect of processing complex, multi-aspect and serialized data, and solves the human body front tracking problem by using the end-to-end method.

Application example 2: the human body front tracking method based on deep reinforcement learning is applied to human face or iris acquisition and recognition equipment without human cooperation.

The invention can be applied to biological characteristic acquisition equipment without manual cooperation. Face or iris recognition is a biometric technology for identity recognition based on biometric features of a person. And acquiring an image or video stream containing the face or the iris by using a camera, and automatically carrying out subsequent identity recognition. Biometric identification requires acquisition of high-quality biometric information, requires natural light of images without overexposure, and requires no occlusion of biometric features and resolution. The traditional biological characteristic acquisition equipment needs manual matching equipment to stand in a limited area, and the tracking method used by the invention can ensure that the moving robot automatically moves to the front of the human body and maintains a proper distance without manual matching, thereby being quite convenient for biological characteristic acquisition and identity identification.

The technical means not described in detail in the present application are known techniques.

The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and decorations can be made without departing from the principle of the present invention, and these modifications and decorations should be regarded as the protection scope of the present invention.

Claims

1. A human body front tracking method based on deep reinforcement learning is characterized by comprising the following specific steps:

s3, inputting the visual angle of the tracker into the network constructed in the S2 in a video stream mode until the successful tracking time reaches more than 300 seconds, namely, the model is converged;

the reinforcement learning algorithm realizes positive tracking by automatically learning to maximize the final reward value, and the following formula is a set reward and punishment function:

r is a reward and penalty value given by the model to each execution action of the tracker, A is an artificially set reward and penalty value upper limit, deltax and Deltay represent offsets between a target and the tracker on an x axis and a y axis, d is an ideal distance expected to be maintained between the target and the tracker, omega and theta are angles which are required to be rotated by the tracker and the target to face the opposite side respectively, and c, lambda and beta are normalized parameters; calling a method function provided by UE4 software and interacting with a virtual environment to obtain real-time position coordinates and Euler angles of a target and a tracker in the virtual environment, calling arctan (delta x, delta y) and arctan (delta y, delta x) respectively at the moment to calculate omega and theta, and simultaneously reducing the difference value between the actual distance and the expected distance of the human target and the tracker and two rotation angles to achieve the purpose that the tracker always tracks the front of the human target;

s5, in order to verify the performance of the model in the real world, the model meeting the requirements after being tested in the virtual environment is transferred to a real scene, and the tracking effect is evaluated by matching with artificial observation according to the output successful tracking time;

because the data input into the neural network constructed in the step S2 is a continuous video stream of the view angle of the tracker, the neural network is connected with an LSTM structure after a plurality of layers of convolutional neural networks, and a subsequent Actor-Critic network module takes the characteristics extracted by the convolutional neural networks as input and directly outputs the action which the tracker should take to realize that the data is always maintained on the front of the target human body.