WO2021259055A1 - Human body tracking method and device based on RGB-D image

Human body tracking method and device based on RGB-D image

Info

Publication number
WO2021259055A1
Authority
WO
WIPO (PCT)
Prior art keywords
trajectory
pedestrian
center
state
depth image
Application number
PCT/CN2021/098724
Other languages
French (fr)
Chinese (zh)
Inventor
冀怀远
蔡忠强
王文光
刘江
Original Assignee
苏宁易购集团股份有限公司
Application filed by 苏宁易购集团股份有限公司
Publication of WO2021259055A1


Classifications

    • G06T 7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T 7/277 Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • G06T 7/292 Multi-camera tracking
    • G06T 2207/10016 Video; Image sequence
    • G06T 2207/10024 Color image
    • G06T 2207/20032 Median filtering
    • G06T 2207/30196 Human being; Person
    • G06T 2207/30241 Trajectory

Definitions

  • The invention relates to the technical field of human body tracking, in particular to a human body tracking method and device based on RGB-D images.
  • Multi-target multi-camera tracking is a very important research topic. The technology can be widely applied to criminal investigation, warehouse management, unmanned retail, autonomous driving, and other scenarios, and has high practical value.
  • Cross-camera multi-target tracking mainly addresses the continuous localization and track confirmation of pedestrians across different cameras.
  • Mature cross-camera multi-target tracking techniques mostly track targets from near-parallel viewpoints in open scenes, whereas real surveillance scenes are constrained by environmental factors.
  • For example, to obtain a wide indoor field of view, the camera is usually installed at an oblique shooting angle; pedestrian occlusion follows, which causes large differences in pedestrian posture across viewpoints.
  • These problems directly affect the tracking quality of pedestrian trajectories, and solving them is of great significance for moving cross-camera multi-target tracking from academic research to actual production.
  • The object of the present invention is to provide a human body tracking method and device based on RGB-D images. Multiple RGB-D depth cameras shoot the monitored area from overhead, and by tracking the 3D center of gravity of the pedestrian's head the method avoids the large deformation and frequent occlusion that affect human-body-frame tracking, improving the accuracy of pedestrian trajectory tracking.
  • The first aspect of the present invention provides a human body tracking method based on RGB-D images, including:
  • The predicted position of the 3D center of gravity of each pedestrian trajectory is tracking-matched against the actual position of the 3D center of gravity of the corresponding head frame in each depth image, and the trajectory tracking state of each pedestrian trajectory is updated according to the matching result.
  • The updated tracking state is one of the new, normal, lost, and deleted states;
  • The updated area state is one of the initial, entering, registered, and leaving states;
  • When the trajectory tracking state of any pedestrian trajectory is the lost state in x consecutive depth-image frames and its trajectory area state is the registered or entering state, the ReID method is used to match and retrieve the lost pedestrian trajectory and update it accordingly; otherwise the pedestrian trajectory is updated according to the position coordinates of the 3D center of gravity of the head frame matched by tracking in the depth image, where x>0 and x is an integer.
  • Detecting the human body frame, human head frame, and region position of the pedestrian in each depth image, and binding the human body frame and human head frame of the same pedestrian in the depth image, includes:
  • Based on the inclusion degree corresponding to each depth image, the bipartite-graph maximum matching algorithm is used to screen out the human body frame and head frame belonging to the same pedestrian in each depth image and bind them to each other.
  • The method of using the bipartite-graph maximum matching algorithm to screen out, based on the inclusion degree corresponding to each depth image, the human body frame and human head frame belonging to the same pedestrian in each depth image includes:
  • The method for calculating the predicted position of the 3D center of gravity of a pedestrian trajectory includes:
  • Multi-dimensional modeling is performed on the spatial position of the 3D center of gravity of the pedestrian trajectory.
  • The state vector of the model is (x, y, z, h, V_x, V_y, V_z), where x, y, z are the three-dimensional coordinates of the 3D center of gravity, V_x, V_y, V_z are the movement speeds of the 3D center of gravity along the corresponding coordinate axes, and h is the height of the pedestrian to which the 3D center of gravity belongs;
  • The method of tracking-matching the predicted position of the 3D center of gravity of each pedestrian trajectory against the actual position of the 3D center of gravity of the corresponding head frame in each depth image, and updating the trajectory tracking state of each pedestrian trajectory according to the matching result, includes:
  • The Kalman-filter tracking algorithm is used to track the corresponding head frame's 3D center of gravity in each depth image of the current frame and obtain the actual position of the 3D center of gravity;
  • Based on the cost matrix, the bipartite-graph maximum matching algorithm is used to screen out initial pairings between each pedestrian trajectory and the actual positions of the 3D centers of gravity in each depth image of the current frame;
  • Initial pairings whose cost metric is less than or equal to the cost threshold are accepted as successful matches, and initial pairings whose cost metric is greater than the cost threshold are treated as unmatched;
  • The unsuccessful pairings comprise the remaining unpaired head-frame 3D centers of gravity and the remaining unpaired pedestrian trajectories; for the remaining unpaired pedestrian trajectories, the trajectory tracking state is updated to the lost state;
  • For pedestrian trajectories whose paired head-frame 3D center of gravity lies in the area outside the target and whose trajectory tracking state has been the lost state for n consecutive frames, and for pedestrian trajectories whose trajectory tracking state is the initial state and has been the lost state for m consecutive frames, the trajectory tracking state of the pedestrian trajectory is updated to the deleted state, where n>0, m>0, and n and m are both integers.
  • It also includes:
  • The method of updating the trajectory area state of each pedestrian trajectory based on the region position of the 3D center of gravity in each depth image includes:
  • The method for judging that the trajectory tracking state of a pedestrian trajectory is the lost state includes:
  • If a pedestrian trajectory cannot be matched with the head-frame 3D center of gravity in any depth image, the trajectory tracking state of the pedestrian trajectory is considered to be the lost state.
  • The second aspect of the present invention provides a human body tracking device based on RGB-D images, applied to the RGB-D image-based human body tracking method described in the above technical solution. The device includes:
  • A partition setting unit, used to divide the monitored area in sequence along the travel route into the area outside the target, the registration area, and the area inside the target, and to collect depth images in real time with multiple distributed overhead depth cameras;
  • A detection-frame binding unit, used to detect the human body frame, human head frame, and region position of the pedestrian in each depth image, and to bind the human body frame and human head frame of the same pedestrian in the depth image;
  • A trajectory-tracking-state detection unit, used to tracking-match the predicted position of the 3D center of gravity of each pedestrian trajectory against the actual position of the 3D center of gravity of the corresponding head frame in each depth image, and to update the trajectory tracking state of each pedestrian trajectory according to the matching result,
  • where the updated tracking state is one of the new, normal, lost, and deleted states;
  • A trajectory-area-state detection unit, which updates the trajectory area state of each pedestrian trajectory based on the region position of the 3D center of gravity in each depth image,
  • where the updated area state is one of the initial, entering, registered, and leaving states;
  • A trajectory tracking unit: when the trajectory tracking state of any pedestrian trajectory is the lost state in x consecutive depth-image frames and its trajectory area state is the registered or entering state, the ReID method is used to match and retrieve the lost pedestrian trajectory and update it accordingly; otherwise the pedestrian trajectory is updated according to the position coordinates of the 3D center of gravity of the head frame matched by tracking in the depth image, where x>0 and x is an integer.
  • A third aspect of the present invention provides a computer-readable storage medium on which a computer program is stored.
  • When the computer program is run by a processor, the steps of the above human body tracking method based on RGB-D images are executed.
  • The present invention has the following beneficial effects:
  • The monitored area is divided in sequence along the travel route into the area outside the target, the registration area, and the area inside the target; that is, the first area a pedestrian enters is the area outside the target, from which the pedestrian passes through the registration area into the area inside the target.
  • The route of a pedestrian leaving the monitored area is the reverse of the entry route above.
  • The human body frame, head frame, and region position of the pedestrian in each depth image are detected, and the human body frame and head frame of the same pedestrian in the depth image are bound to each other. The actual position of the 3D center of gravity of the head frame in each depth image is then tracking-matched against the predicted position of the 3D center of gravity of the pedestrian trajectory, the tracking state of each pedestrian trajectory is updated according to the matching result, and the trajectory area state of the pedestrian trajectory is updated according to the region position of the 3D center of gravity of the head frame in each depth image of the current frame. Thus, when tracking and matching of the head frame's 3D center of gravity is normal, the position-information method (the position coordinates of the head frame's 3D center of gravity) is used to update the pedestrian trajectory.
  • When matching fails, the ReID strategy is used to match, retrieve, and update the lost pedestrian trajectory.
  • The 3D center of gravity of the head frame is effectively tracked and matched by capturing overhead depth images and computing by polling, which solves the tracking failures caused by occlusion in cross-camera pedestrian tracking.
  • By setting a registration area, a base-library feature data table can be registered automatically for a pedestrian when the pedestrian enters the registration area, so that when position-coordinate matching of the pedestrian trajectory fails, the method automatically switches to the deep-learning ReID strategy to match and retrieve the pedestrian trajectory, improving the accuracy and reliability of the trajectory tracking results.
  • FIG. 1 is a schematic flowchart of the human body tracking method based on RGB-D images in Embodiment 1 of the present invention;
  • FIG. 2 is a schematic flowchart of the process in FIG. 1 of binding the human body frame and human head frame of the same pedestrian in each depth image of the current frame.
  • This embodiment provides a human body tracking method based on RGB-D images, including:
  • The predicted position of the 3D center of gravity of each pedestrian trajectory is tracking-matched against the actual position of the 3D center of gravity of the corresponding head frame in each depth image, and the trajectory tracking state of each pedestrian trajectory is updated according to the matching result.
  • The updated tracking state is one of the new, normal, lost, and deleted states;
  • The updated area state is one of the initial, entering, registered, and leaving states;
  • When matching fails, the lost pedestrian trajectory is retrieved and updated by the ReID method; otherwise the pedestrian trajectory is updated according to the position coordinates of the 3D center of gravity of the head frame matched by tracking in the depth image.
  • The monitored area is divided in sequence along the travel route into the area outside the target, the registration area, and the area inside the target; that is, the first area a pedestrian enters is the area outside the target, from which the pedestrian passes through the registration area into the area inside the target.
  • The route of a pedestrian leaving the monitored area is the reverse of the entry route above.
  • The human body frame, head frame, and region position of the pedestrian in each depth image are detected by polling, and the human body frame and head frame of the same pedestrian in the depth image are bound to each other. The actual position of the 3D center of gravity of the head frame in each depth image is then tracking-matched against the predicted position of the 3D center of gravity of the pedestrian trajectory, the tracking state of each pedestrian trajectory is updated according to the matching result, and the trajectory area state of the pedestrian trajectory is updated according to the region position of the 3D center of gravity of the head frame in each depth image of the current frame. Thus, when tracking and matching of the head frame's 3D center of gravity is normal, the position-information method (the position coordinates of the head frame's 3D center of gravity) is used to update the pedestrian trajectory.
  • When matching of the head frame's 3D center of gravity fails, the ReID strategy is used to match, retrieve, and update the lost pedestrian trajectory.
  • The 3D center of gravity of the head frame is effectively tracked and matched by capturing overhead depth images and computing by polling, which solves the tracking failures caused by occlusion in cross-camera pedestrian tracking.
  • By setting a registration area, a base-library feature data table can be registered automatically for a pedestrian when the pedestrian enters the registration area, so that when position-coordinate matching of the pedestrian trajectory fails, the method automatically switches to the deep-learning ReID strategy to match and retrieve the pedestrian trajectory, improving the accuracy and reliability of the trajectory tracking results.
  • The above-mentioned area outside the target, registration area, and area inside the target are divided by setting 3D coordinate range boundaries over the monitored area, and the registration area is a part of the area inside the target.
  • At least one depth camera is installed above each of the area outside the target, the registration area, and the area inside the target, and is used to collect depth images of the captured area in real time; with multiple depth cameras, depth images can be collected from multiple angles in real time.
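Since the three areas are separated by 3D coordinate boundaries, assigning a pedestrian's head centroid to an area reduces to simple range tests. Below is a minimal Python sketch; the axis-aligned rectangular regions and their coordinates are illustrative assumptions, since the text does not specify the boundary geometry.

```python
# Axis-aligned (x, y) floor-plan ranges standing in for the 3D coordinate
# boundaries that divide the monitored area; all coordinates are assumed.
REGIONS = {
    "registration": ((2.0, 3.0), (0.0, 6.0)),   # part of the area inside the target
    "inside":       ((2.0, 10.0), (0.0, 6.0)),  # area inside the target
    "outside":      ((0.0, 2.0), (0.0, 6.0)),   # area outside the target
}

def classify_region(point_3d):
    """Return the area containing a 3D head centroid (x, y, z)."""
    x, y, _z = point_3d  # height does not affect the floor-plan areas
    for name, ((x0, x1), (y0, y1)) in REGIONS.items():
        if x0 <= x < x1 and y0 <= y < y1:
            return name  # registration is checked first since it lies inside the target
    return "outside"     # fall back for points beyond all configured ranges

print(classify_region((2.5, 1.0, 1.7)))  # -> "registration"
```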
  • The human body frame, human head frame, and region position of the pedestrian in each depth image are detected, and the method of binding the human body frame and human head frame of the same pedestrian in the depth image includes:
  • Based on the inclusion degree corresponding to each depth image, the bipartite-graph maximum matching algorithm is used to screen out the human body frame and head frame belonging to the same pedestrian in each depth image and bind them to each other.
  • The depth images of the current frame are the k depth images collected at the current frame in total; the areas of the human body frames and human head frames appearing in each depth image are calculated in turn.
  • The pairings here include all paired combinations of human body frames and human head frames in the depth image; then, for each depth image, the bipartite-graph maximum matching algorithm is applied to the corresponding inclusion degrees to screen out the human body frame and head frame belonging to the same pedestrian and bind them to each other.
  • The inclusion degree is calculated by dividing the overlapping area of a pair of human body frame and human head frame by the area of the human head frame, giving the inclusion degree of that pair.
  • The method of using the bipartite-graph maximum matching algorithm to screen out, based on the inclusion degree corresponding to each depth image, the human body frame and human head frame belonging to the same pedestrian in each depth image includes:
  • The RGB-D target detection technique is used to detect, by polling, the human body frame, head frame, and region position of the pedestrian in the depth image of each depth camera, and the binding result of the human body frames and head frames in each depth image is computed by polling.
  • The human head is part of the human body, so the human body frame and human head frame of each instance output by target detection overlap to a large degree, but the prior art cannot directly output the binding between head frames and body frames. The matching of human body frames and human head frames within the same depth image can therefore be modeled as an assignment problem for binding.
  • This embodiment uses the bipartite-graph maximum matching algorithm (the KM algorithm) to solve the assignment problem, and designs a new cost metric, the inclusion degree, to compute the assignment cost.
  • The cost metric (the inclusion degree) can be expressed as D = S_(h∩b) / S_h, where S_(h∩b) is the overlap area between the human body detection frame and the human head detection frame, S_h is the area of the human head detection frame, and D is the resulting inclusion value.
  • For two human body frames B_body1, B_body2 and two human head frames B_head1, B_head2, the inclusion-degree set is {D_body1head1, D_body1head2, D_body2head1, D_body2head2}.
  • The optimization goal of the KM algorithm is to match as many human body frames and human head frames as possible while making the sum of the inclusion degrees of the matched results as large as possible.
  • The result assigned by the KM algorithm is that B_body1 and B_head2 are the body frame and head frame of one pedestrian, and B_body2 and B_head1 are the body frame and head frame of another pedestrian, with a total cost value of 1.4.
  • A coincidence threshold is set to filter the assignment result, which can be expressed as M(D_matchedBodyN_HeadM) = 1 if D_matchedBodyN_HeadM ≥ Filter_Thresh, and 0 otherwise,
  • where D_matchedBodyN_HeadM is the inclusion degree of the human body frame B_bodyN and human head frame B_headM paired by the KM algorithm,
  • and Filter_Thresh is the inclusion threshold.
  • A matching result with M(D_matchedBodyN_HeadM) = 0 is below the threshold and its pairing is cancelled;
  • a matching result with M(D_matchedBodyN_HeadM) = 1 is at or above the threshold and its pairing is kept as a legal output.
  • In this way, the human head frame and human body frame of the same pedestrian instance in each depth image of the current frame can be bound to each other, providing reliable preprocessing input for subsequent tracking and matching.
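The binding step above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: scipy's Hungarian solver stands in for the KM algorithm (maximizing total inclusion equals minimizing its negation), boxes are assumed to be (x1, y1, x2, y2) tuples, and the 0.5 threshold is a placeholder for Filter_Thresh.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def inclusion(body, head):
    """Inclusion degree D = area(head ∩ body) / area(head)."""
    ix1, iy1 = max(body[0], head[0]), max(body[1], head[1])
    ix2, iy2 = min(body[2], head[2]), min(body[3], head[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    head_area = (head[2] - head[0]) * (head[3] - head[1])
    return inter / head_area if head_area > 0 else 0.0

def bind_body_head(bodies, heads, filter_thresh=0.5):
    """Pair body and head boxes of the same pedestrian in one depth image."""
    D = np.array([[inclusion(b, h) for h in heads] for b in bodies])
    rows, cols = linear_sum_assignment(-D)  # maximize the total inclusion
    # Threshold filter: keep pairings with M(D) = 1, cancel the rest.
    return [(r, c) for r, c in zip(rows, cols) if D[r, c] >= filter_thresh]

bodies = [(0, 0, 4, 10), (6, 0, 10, 10)]   # two body boxes
heads = [(7, 0, 9, 2), (1, 0, 3, 2)]       # two head boxes, in swapped order
print(bind_body_head(bodies, heads))        # -> [(0, 1), (1, 0)]
```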
  • The method for calculating the predicted position of the 3D center of gravity of a pedestrian trajectory includes:
  • Multi-dimensional modeling is performed on the spatial position of the 3D center of gravity of the pedestrian trajectory.
  • The state vector of the model is (x, y, z, h, V_x, V_y, V_z), where x, y, z are the three-dimensional coordinates of the 3D center of gravity,
  • V_x, V_y, V_z are the movement speeds of the 3D center of gravity along the corresponding coordinate axes,
  • and h is the height of the pedestrian to which the 3D center of gravity belongs.
  • This embodiment updates the multi-target tracking trajectory states by polling the cameras one at a time, which simply and effectively solves the problem of deduplicating human bodies in the overlapping regions of cross-camera tracking.
  • The depth image of a single camera is a two-dimensional image.
  • The pre-calibrated internal and external parameters of the depth camera give the coordinate-system conversion formula, by which points in the two-dimensional RGB-D image coordinates are converted into three-dimensional coordinate points.
  • The coordinates of the 3D center of gravity are obtained by projecting the average center of gravity of the head frame in the depth map into the three-dimensional coordinate system.
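The 2D-to-3D conversion described above is a standard pinhole back-projection. A minimal sketch follows, assuming known intrinsics (fx, fy, cx, cy) and a camera-to-world extrinsic matrix; all calibration values here are placeholders, not the patent's parameters.

```python
import numpy as np

# Assumed pre-calibrated parameters of one depth camera (placeholder values).
FX, FY, CX, CY = 580.0, 580.0, 320.0, 240.0   # intrinsics, in pixels
T_CAM_TO_WORLD = np.eye(4)                     # extrinsic 4x4 transform

def head_centroid_3d(depth_patch, u0, v0):
    """Project the mean depth of a head-box patch to a world-frame 3D point.

    depth_patch: depth values (meters) inside the head box;
    (u0, v0): pixel coordinates of the head-box center.
    """
    z = float(np.mean(depth_patch))            # average depth of the head box
    x = (u0 - CX) * z / FX                     # pinhole back-projection
    y = (v0 - CY) * z / FY
    p_cam = np.array([x, y, z, 1.0])
    return (T_CAM_TO_WORLD @ p_cam)[:3]        # camera -> world coordinates

patch = np.full((8, 8), 2.4)                   # synthetic 2.4 m head depths
print(head_centroid_3d(patch, 330.0, 250.0))
```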
  • A 3D Kalman filter is used to model the pedestrian trajectory of the head point moving in space, and the spatial state vector (x, y, z, h, V_x, V_y, V_z) is used to describe the pedestrian trajectory: x, y, z are the three spatial coordinates of the 3D center of gravity of the pedestrian's head, h is the pedestrian's height, and V_x, V_y, V_z are the pedestrian's speeds along the corresponding dimensions.
  • The predicted position of the pedestrian in the current frame is obtained from the 3D Kalman filter by the following formula: x̂ = x + V_x·t, ŷ = y + V_y·t, ẑ = z + V_z·t,
  • where variables with a hat represent the predicted pedestrian position output by the 3D Kalman filter in the current frame,
  • x, y, z and V_x, V_y, V_z are the state parameters of the 3D Kalman filter,
  • and t is the time between two adjacent frames.
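Under this constant-velocity model, the prediction step is a single linear transition. A minimal NumPy sketch of just this step is below (the full Kalman covariance prediction and update are omitted); the state values and frame interval are placeholders.

```python
import numpy as np

def predict_centroid(state, t):
    """Predict the next 3D head-centroid state from a track state.

    state = (x, y, z, h, vx, vy, vz); t is the time between adjacent frames.
    Height h and the velocities are assumed constant between frames.
    """
    # Constant-velocity transition matrix F, so that x' = x + vx * t, etc.
    F = np.eye(7)
    F[0, 4] = F[1, 5] = F[2, 6] = t
    return F @ np.asarray(state, dtype=float)

state = (1.0, 2.0, 1.7, 1.75, 0.5, -0.2, 0.0)  # placeholder track state
print(predict_centroid(state, t=0.04))          # 25 fps -> t = 0.04 s
```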
  • The method of tracking-matching the predicted position of the 3D center of gravity of each pedestrian trajectory against the actual position of the 3D center of gravity of the corresponding head frame in each depth image, and updating the trajectory tracking state of each pedestrian trajectory according to the matching result, includes:
  • The Kalman-filter tracking algorithm is used to track the corresponding head frame's 3D center of gravity in each depth image of the current frame and obtain the actual position of the 3D center of gravity;
  • Based on the cost matrix, the bipartite-graph maximum matching algorithm is used to screen out initial pairings between each pedestrian trajectory and the actual positions of the 3D centers of gravity in each depth image of the current frame;
  • Initial pairings whose cost metric is less than or equal to the cost threshold are accepted as successful matches, and initial pairings whose cost metric is greater than the cost threshold are treated as unmatched;
  • The unsuccessful pairings comprise the remaining unpaired head-frame 3D centers of gravity and the remaining unpaired pedestrian trajectories; for the remaining unpaired pedestrian trajectories, the trajectory tracking state is updated to the lost state;
  • For pedestrian trajectories whose paired head-frame 3D center of gravity lies in the area outside the target and whose trajectory tracking state has been the lost state for n consecutive frames, and for pedestrian trajectories whose trajectory tracking state is the initial state and has been the lost state for m consecutive frames, the trajectory tracking state of the pedestrian trajectory is updated to the deleted state, where n>0, m>0, and n and m are both integers.
  • The 3D Kalman filter updates its parameters according to the coordinate position of the 3D center of gravity of the pedestrian trajectory in the previous frame in order to compute the predicted position of the paired head frame's 3D center of gravity in the next frame.
  • The parameters of the 3D Kalman filter are updated repeatedly, realizing continuous prediction of the position of the paired head frame's 3D center of gravity in the next frame.
  • All pedestrian trajectories from the previous frame's depth images are assigned to the head-frame 3D centers of gravity in each depth image of the current frame.
  • The assignment cost metric can be the Mahalanobis distance: the cost matrix is computed from the predicted position of the 3D center of gravity of each pedestrian trajectory and the actual position of the 3D center of gravity of each head frame detected in the current frame of each depth image.
  • The KM algorithm is used as the assignment algorithm to obtain the best assignment, and the assignment result is filtered by a threshold to obtain reliable matching results.
  • Adding a pedestrian-height verification mechanism to the threshold-filtered matching results further prevents pedestrian trajectories from being mismatched; finally, the trajectories and states of the successfully matched pedestrian trajectories are updated.
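A minimal sketch of this assignment step is shown below. Euclidean distance stands in for the Mahalanobis cost (the true Mahalanobis form needs the Kalman innovation covariance, which is omitted here), scipy's Hungarian solver stands in for the KM algorithm, and the cost threshold and height tolerance are placeholder values.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_tracks(pred_pts, det_pts, track_heights, det_heights,
                 cost_thresh=0.5, height_tol=0.15):
    """Assign predicted track centroids to detected head centroids.

    pred_pts, det_pts: (N, 3) and (M, 3) arrays of 3D positions; the
    height check rejects pairings whose pedestrian heights disagree.
    Returns (matches, unmatched_tracks, unmatched_detections).
    """
    cost = np.linalg.norm(pred_pts[:, None, :] - det_pts[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)    # minimum-cost assignment
    matches, used_r, used_c = [], set(), set()
    for r, c in zip(rows, cols):
        if (cost[r, c] <= cost_thresh                       # cost threshold filter
                and abs(track_heights[r] - det_heights[c]) <= height_tol):
            matches.append((r, c))
            used_r.add(r); used_c.add(c)
    unmatched_tracks = [r for r in range(len(pred_pts)) if r not in used_r]
    unmatched_dets = [c for c in range(len(det_pts)) if c not in used_c]
    return matches, unmatched_tracks, unmatched_dets

preds = np.array([[1.0, 2.0, 1.7], [4.0, 1.0, 1.6]])
dets = np.array([[1.1, 2.0, 1.7], [9.0, 9.0, 1.6]])
print(match_tracks(preds, dets, [1.75, 1.68], [1.74, 1.60]))
```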
  • For example, m is 3 and n is an integer greater than or equal to 5. That is, when the trajectory tracking state is the initial state and the trajectory tracking state in each of 3 consecutive depth images is the lost state, the pedestrian trajectory is deleted as noise.
  • When the paired head-frame 3D center of gravity lies in the area outside the target and the trajectory tracking state has been the lost state for at least 5 consecutive frames, the pedestrian trajectory is likewise deleted as noise; and pedestrian trajectories whose trajectory area state is the leaving state are deleted as noise,
  • regardless of how many consecutive frames the trajectory tracking state in each depth image has remained the same.
  • The method of updating the trajectory area state of each pedestrian trajectory based on the region position of the 3D center of gravity in each depth image includes:
  • The method for judging that the trajectory tracking state of a pedestrian trajectory is the lost state includes:
  • If a pedestrian trajectory cannot be matched with the head-frame 3D center of gravity in any depth image of the current frame, the trajectory tracking state of the pedestrian trajectory is considered to be the lost state.
  • This embodiment can continuously track all pedestrian trajectories detected in the depth images, and can also adopt different processing strategies for pedestrian trajectories in different locations according to actual needs. However, considering the coverage of the depth cameras and the characteristics of the application scenario, some depth cameras may capture pedestrians outside the target area, and such pedestrians may interfere with pedestrian tracking inside the target area. This embodiment therefore designs a set of pedestrian-trajectory area-state management strategies, elaborated as follows:
  • The area state of a pedestrian trajectory outside the target area is set to the initial state. No additional processing is performed on the lost trajectories of pedestrians in this state; their lost trajectories are not retrieved by the ReID method, and the remaining unmatched head-frame 3D centers of gravity in the area outside the target can create new pedestrian trajectories;
  • When a pedestrian enters the registration area, the area state of the pedestrian trajectory is set to the registered state.
  • A pedestrian in this state completes registration of the base-library feature data table without being aware of it, and after registration the pedestrian trajectory in the target area always remains in the entering state;
  • The registration area is part of the area inside the target, and this area is only used to realize registration of the pedestrian base-library feature data table (such as ReID base-library pictures); after registration is completed,
  • the area state of the pedestrian trajectory can be set to the entering state.
  • The trajectory tracking state in this embodiment is divided into the following four states: the new state, the normal state, the lost state, and the deleted state.
  • When a pedestrian trajectory is first created, its trajectory tracking state is the new state. After the pedestrian trajectory has successfully tracked its target for m frames, the trajectory tracking state is set to the normal state. When a pedestrian trajectory in the normal state cannot be matched with any head frame in the current frame,
  • its trajectory tracking state is set to the lost state. When the trajectory tracking state is the lost state in x consecutive depth-image frames and the trajectory area state is the registered or entering state, the ReID method is used to match and retrieve the lost pedestrian trajectory. If the pedestrian trajectory fails to be retrieved for a long time, or its trajectory area state is updated to the leaving state, its trajectory tracking state is set to the deleted state; in this state, the corresponding pedestrian trajectory and its base-library feature data table are deleted.
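Taken together, the two state sets behave like a small per-trajectory state machine. The sketch below is a simplified reading of the rules above, with placeholder values for m, n, and x and simplified trigger conditions:

```python
class PedestrianTrack:
    """Toy trajectory holding the two state variables described above."""
    M_CONFIRM, N_NOISE, X_REID = 3, 5, 5  # placeholder frame counts

    def __init__(self):
        self.tracking_state = "new"       # new / normal / lost / deleted
        self.area_state = "initial"       # initial / entering / registered / leaving
        self.hits = 0                     # consecutive successful matches
        self.misses = 0                   # consecutive lost frames

    def on_match(self):
        self.hits += 1
        self.misses = 0
        if self.tracking_state in ("new", "lost") and self.hits >= self.M_CONFIRM:
            self.tracking_state = "normal"

    def on_miss(self):
        if self.tracking_state == "deleted":
            return
        self.hits = 0
        self.misses += 1
        self.tracking_state = "lost"
        # Outside-the-target tracks that stay lost are deleted as noise, as
        # are tracks whose area state says the pedestrian has left.
        if ((self.area_state == "initial" and self.misses >= self.N_NOISE)
                or self.area_state == "leaving"):
            self.tracking_state = "deleted"

    def needs_reid(self):
        return (self.tracking_state == "lost" and self.misses >= self.X_REID
                and self.area_state in ("registered", "entering"))

t = PedestrianTrack()
for _ in range(3):
    t.on_match()
print(t.tracking_state)   # -> "normal"
```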
  • During normal tracking, a Kalman filter based on spatial position information is used for tracking.
  • When tracking is lost, this embodiment uses a matching strategy based on deep-learning features to match lost pedestrian trajectories against unmatched detections.
  • The cosine distance is used as the cost metric.
  • The KM algorithm is used to solve the assignment problem between the lost pedestrian trajectories and the unmatched detections.
  • The matched pedestrian trajectories are then updated.
  • ReID mainly relies on the feature data in the base-library feature data table, together with the region position and area state of the pedestrian trajectory in the pedestrian-trajectory data table, to achieve pedestrian tracking.
  • The specific implementation is well known to those skilled in the art, and this embodiment does not elaborate on it.
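A minimal sketch of this ReID fallback is given below: cosine distance between base-library features and the features of unmatched detections, solved as an assignment problem with a threshold. The 512-dimensional features and the 0.3 distance threshold are assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def cosine_distance(a, b):
    """1 - cosine similarity between two feature vectors."""
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def reid_retrieve(gallery, queries, dist_thresh=0.3):
    """Match lost tracks' base-library features against unmatched detections.

    gallery: {track_id: feature} from the registered base-library table;
    queries: list of appearance features of unmatched detections.
    Returns {track_id: query_index} for pairs under the distance threshold.
    """
    ids = list(gallery)
    cost = np.array([[cosine_distance(gallery[i], q) for q in queries]
                     for i in ids])
    rows, cols = linear_sum_assignment(cost)   # KM-style optimal assignment
    return {ids[r]: c for r, c in zip(rows, cols) if cost[r, c] <= dist_thresh}

rng = np.random.default_rng(0)
feat = rng.normal(size=512)                    # registered feature of track 42
gallery = {42: feat}
queries = [rng.normal(size=512),               # unrelated detection
           feat + 0.01 * rng.normal(size=512)] # near-duplicate of track 42
print(reid_retrieve(gallery, queries))         # -> {42: 1}
```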
  • The application scenarios of this embodiment are very rich, such as unmanned supermarkets, smart factories, and warehouse anti-theft and loss monitoring.
  • The human body tracking method based on RGB-D images provided by this embodiment ensures the reliability and continuity of pedestrian trajectory tracking in the target area; at the same time, trajectory-area-state management provides strong technical support for practical applications, reducing labor costs while improving management efficiency, and has strong application value and rich application scenarios.
  • This embodiment provides a human body tracking device based on RGB-D images, including:
  • A partition setting unit, used to divide the monitored area in sequence along the travel route into the area outside the target, the registration area, and the area inside the target, and to collect depth images in real time with multiple distributed overhead depth cameras;
  • A detection-frame binding unit, used to detect the human body frame, human head frame, and region position of the pedestrian in each depth image, and to bind the human body frame and human head frame of the same pedestrian in the depth image;
  • A trajectory-tracking-state detection unit, used to tracking-match the predicted position of the 3D center of gravity of each pedestrian trajectory against the actual position of the 3D center of gravity of the corresponding head frame in each depth image, and to update the trajectory tracking state of each pedestrian trajectory according to the matching result,
  • where the updated tracking state is one of the new, normal, lost, and deleted states;
  • A trajectory-area-state detection unit, which updates the trajectory area state of each pedestrian trajectory based on the region position of the 3D center of gravity in each depth image,
  • where the updated area state is one of the initial, entering, registered, and leaving states;
  • A trajectory tracking unit: when the trajectory tracking state of any pedestrian trajectory is the lost state in x consecutive depth-image frames and its trajectory area state is the registered or entering state, the ReID method is used to match and retrieve the lost pedestrian trajectory and update it accordingly; otherwise the pedestrian trajectory is updated according to the position coordinates of the 3D center of gravity of the head frame matched by tracking in the depth image, where x>0 and x is an integer.
  • The beneficial effects of the RGB-D image-based human body tracking device provided by this embodiment are the same as those of the RGB-D image-based human body tracking method provided in Embodiment 1, and are not repeated here.
  • This embodiment provides a computer-readable storage medium on which a computer program is stored.
  • When the computer program is run by a processor, the steps of the above human body tracking method based on RGB-D images are executed.
  • The beneficial effects of the computer-readable storage medium provided in this embodiment are the same as those of the RGB-D image-based human body tracking method provided by the above technical solutions, and are not repeated here.
  • The above-mentioned program can be stored in a computer-readable storage medium.
  • The storage medium may be a ROM/RAM, a magnetic disk, an optical disc, a memory card, or the like.


Abstract

Disclosed in the present invention are a human body tracking method and device based on an RGB-D image, capable of improving the accuracy of pedestrian trajectory tracking. The method comprises: dividing a monitored area into an area outside the target, a registration area, and an area inside the target; detecting the human body frame, human head frame, and region position of each pedestrian in each depth image, and binding the human body frame and human head frame of the same pedestrian in the depth images; tracking-matching the predicted position of the 3D center of gravity of each pedestrian trajectory against the actual position of the 3D center of gravity of the corresponding human head frame in each depth image, and updating the trajectory tracking state of each pedestrian trajectory according to the matching result; updating the trajectory area state of each pedestrian trajectory on the basis of the region position of the 3D center of gravity in each depth image; and, if the trajectory tracking state of any pedestrian trajectory in x consecutive frames of depth images is the lost state and the trajectory area state is the registered state or the entering state, using the ReID method to match and retrieve the lost pedestrian trajectory and update it correspondingly; otherwise, updating the pedestrian trajectory according to the position coordinates of the 3D center of gravity of the human head frame tracked and matched in each depth image.

Description

Human body tracking method and device based on RGB-D image
Technical field
The invention relates to the technical field of human body tracking, in particular to a human body tracking method and device based on RGB-D images.
Background art
Driven by information technology, all walks of life have undergone tremendous changes, and concepts such as smart cities, smart industry, and smart retail have emerged. Using vision technology to liberate people from heavy repetitive work has become a trend, and video surveillance is an important application field of vision technology. In video surveillance, multi-target multi-camera tracking (MTMC tracking) is a very important research topic; the technology can be widely applied to criminal investigation, warehouse management, unmanned retail, autonomous driving, and other scenarios, and has high practical value.
Technical problem
Cross-camera multi-target tracking mainly addresses the continuous localization and track confirmation of pedestrians across different cameras. At present, mature cross-camera multi-target tracking techniques mostly track targets from near-parallel viewpoints in open scenes, whereas real surveillance scenes are constrained by environmental factors. For example, to obtain a wide indoor field of view, the camera is usually installed at an oblique shooting angle; pedestrian occlusion follows, which causes large differences in pedestrian posture across viewpoints. These problems directly affect the tracking quality of pedestrian trajectories, and solving them is of great significance for moving cross-camera multi-target tracking from academic research to actual production.
Technical solution
The object of the present invention is to provide a human body tracking method and device based on RGB-D images. Multiple RGB-D depth cameras shoot the monitored area from overhead, and by tracking the 3D center of gravity of the pedestrian's head the method avoids the large deformation and frequent occlusion that affect human-body-frame tracking, improving the accuracy of pedestrian trajectory tracking.
In order to achieve the above objective, the first aspect of the present invention provides a human body tracking method based on RGB-D images, including:
dividing the monitored area in sequence along the travel route into an area outside the target, a registration area, and an area inside the target, and collecting depth images in real time with multiple distributed overhead depth cameras;
detecting the human body frame, human head frame, and region position of the pedestrian in each depth image, and binding the human body frame and human head frame of the same pedestrian in the depth image to each other;
tracking-matching the predicted position of the 3D center of gravity of each pedestrian trajectory against the actual position of the 3D center of gravity of the corresponding head frame in each depth image, and updating the trajectory tracking state of each pedestrian trajectory according to the matching result, the updated state being one of the new, normal, lost, and deleted states;
updating the trajectory area state of each pedestrian trajectory based on the region position of the 3D center of gravity in each depth image, the updated state being one of the initial, entering, registered, and leaving states;
when the trajectory tracking state of any pedestrian trajectory is the lost state in x consecutive depth-image frames and its trajectory area state is the registered or entering state, using the ReID method to match and retrieve the lost pedestrian trajectory and update it accordingly; otherwise updating the pedestrian trajectory according to the position coordinates of the 3D center of gravity of the head frame matched by tracking in the depth image, where x>0 and x is an integer.
Preferably, detecting the human body frame, human head frame, and region position of the pedestrian in each depth image and binding the human body frame and human head frame of the same pedestrian in the depth image includes:
polling each depth image corresponding to the current frame, and using an RGB-D target detection method to obtain the human body frame, human head frame, and region position of the pedestrian in each depth image;
polling the areas of the human body frames and human head frames appearing in each depth image, and traversing the inclusion degree of every pair of human body frame and human head frame;
based on the inclusion degree corresponding to each depth image, using the bipartite-graph maximum matching algorithm to screen out the human body frame and human head frame belonging to the same pedestrian in each depth image and bind them to each other.
Preferably, using the bipartite-graph maximum matching algorithm to screen out, based on the inclusion degree corresponding to each depth image, the human body frame and human head frame belonging to the same pedestrian in each depth image includes:
according to the inclusion degrees corresponding to each depth image, using the bipartite-graph maximum matching algorithm to screen out the human body frames and human head frames in each depth image as initial pairings;
comparing the inclusion degree of each initial pairing in each depth image with the coincidence threshold, confirming the binding of initial pairings whose inclusion degree is greater than or equal to the coincidence threshold, and releasing the binding of initial pairings whose inclusion degree is less than the coincidence threshold.
Preferably, the method for calculating the predicted position of the 3D center of gravity of a pedestrian trajectory includes:
converting each depth image into three-dimensional coordinates, and calculating the 3D center of gravity of the human head frame in the depth image;
performing multi-dimensional modeling of the spatial position of the 3D center of gravity of the pedestrian trajectory, the state vector of the model being (x, y, z, h, V_x, V_y, V_z), where x, y, z are the three-dimensional coordinates of the 3D center of gravity, V_x, V_y, V_z are the movement speeds of the 3D center of gravity along the corresponding coordinate axes, and h is the height of the pedestrian to which the 3D center of gravity belongs;
based on the current x-axis, y-axis, and z-axis coordinates of the 3D center of gravity of the pedestrian trajectory and the corresponding movement speeds V_x in the x-axis direction, V_y in the y-axis direction, and V_z in the z-axis direction, respectively calculating the predicted positions of the current pedestrian trajectory's 3D center of gravity in the x-axis, y-axis, and z-axis directions in the next frame of depth image.
Preferably, tracking-matching the predicted position of the 3D center of gravity of each pedestrian trajectory against the actual position of the 3D center of gravity of the corresponding head frame in each depth image, and updating the trajectory tracking state of each pedestrian trajectory according to the matching result, includes:
using the Kalman-filter tracking algorithm to track the corresponding head frame's 3D center of gravity in each depth image of the current frame and obtain the actual position of the 3D center of gravity;
traversing and calculating the cost metric between the actual position of each 3D center of gravity in each depth image of the current frame and the predicted position of the 3D center of gravity of each pedestrian trajectory, to obtain a cost matrix;
after the traversal calculation of each depth image of the current frame is completed, using the bipartite-graph maximum matching algorithm on the cost matrix to screen out initial pairings between each pedestrian trajectory and the actual positions of the 3D centers of gravity in each depth image of the current frame;
accepting as successful the initial pairings whose cost metric is less than or equal to the cost threshold, and treating as unmatched the initial pairings whose cost metric is greater than the cost threshold;
the unsuccessful initial pairings comprising the remaining unpaired head-frame 3D centers of gravity and the remaining unpaired pedestrian trajectories: for head-frame 3D centers of gravity that remain unpaired in each depth image of the current frame and lie in the area outside the target, creating a new pedestrian trajectory, setting its trajectory tracking state to the new state, and setting the trajectory area state of the new pedestrian trajectory to the initial state; and/or, for pedestrian trajectories that remain unpaired in each depth image of the current frame, updating the trajectory tracking state of the pedestrian trajectory to the lost state;
after threshold filtering of the successful initial pairings, updating the trajectory tracking state of each paired pedestrian trajectory to the normal state, and updating the actual position of the paired head frame's 3D center of gravity as the current 3D center-of-gravity position of the pedestrian trajectory;
for pedestrian trajectories whose paired head-frame 3D center of gravity lies in the area outside the target and whose trajectory tracking state has been the lost state for n consecutive frames, and/or pedestrian trajectories whose trajectory area state is the leaving state, and/or pedestrian trajectories whose trajectory tracking state is the initial state and has been the lost state for m consecutive frames, updating the trajectory tracking state of the pedestrian trajectory to the deleted state, where n>0, m>0, and n and m are both integers.
Further, the method also includes:
when the trajectory tracking state corresponding to a pedestrian trajectory is the deleted state, deleting the pedestrian trajectory and its corresponding base-library feature data table.
Preferably, the method also includes:
the method of updating the trajectory area state of each pedestrian trajectory based on the region position of the 3D center of gravity in each depth image includes:
traversing the head-frame 3D centers of gravity in each depth image of the current frame, identifying those that appear in the area outside the target, and setting the trajectory area state of the corresponding pedestrian trajectories to the initial state;
traversing the head-frame 3D centers of gravity in each depth image of the current frame, identifying those that appear in the registration area, setting the trajectory area state of the corresponding pedestrian trajectories to the registered state, and registering and updating the base-library feature data table in real time;
traversing the head-frame 3D centers of gravity in each depth image of the current frame, identifying those that appear in the area inside the target, and setting the trajectory area state of the corresponding pedestrian trajectories to the entering state;
traversing the head-frame 3D centers of gravity in each depth image of the current frame, identifying those that leave the area inside the target and enter the area outside the target, and setting the trajectory area state of the corresponding pedestrian trajectories to the leaving state.
Preferably, the method for judging that the trajectory tracking state of a pedestrian trajectory is the lost state includes:
identifying the head-frame 3D centers of gravity in each depth image of the current frame; if a pedestrian trajectory cannot be matched with the head-frame 3D center of gravity in any depth image, the trajectory tracking state of the pedestrian trajectory is considered to be the lost state.
The second aspect of the present invention provides a human body tracking device based on RGB-D images, applied to the RGB-D image-based human body tracking method described in the above technical solution. The device includes:
a partition setting unit, used to divide the monitored area in sequence along the travel route into an area outside the target, a registration area, and an area inside the target, and to collect depth images in real time with multiple distributed overhead depth cameras;
a detection-frame binding unit, used to detect the human body frame, human head frame, and region position of the pedestrian in each depth image, and to bind the human body frame and human head frame of the same pedestrian in the depth image;
a trajectory-tracking-state detection unit, used to tracking-match the predicted position of the 3D center of gravity of each pedestrian trajectory against the actual position of the 3D center of gravity of the corresponding head frame in each depth image, and to update the trajectory tracking state of each pedestrian trajectory according to the matching result, the updated state being one of the new, normal, lost, and deleted states;
a trajectory-area-state detection unit, which updates the trajectory area state of each pedestrian trajectory based on the region position of the 3D center of gravity in each depth image, the updated state being one of the initial, entering, registered, and leaving states;
a trajectory tracking unit: when the trajectory tracking state of any pedestrian trajectory is the lost state in x consecutive depth-image frames and its trajectory area state is the registered or entering state, the ReID method is used to match and retrieve the lost pedestrian trajectory and update it accordingly; otherwise the pedestrian trajectory is updated according to the position coordinates of the 3D center of gravity of the head frame matched by tracking in the depth image, where x>0 and x is an integer.
The third aspect of the present invention provides a computer-readable storage medium on which a computer program is stored; when the computer program is run by a processor, the steps of the above human body tracking method based on RGB-D images are executed.
Beneficial effects
Compared with the prior art, the present invention has the following beneficial effects:
In the human body tracking method based on RGB-D images provided by the present invention, the monitored area is divided in sequence along the travel route into an area outside the target, a registration area, and an area inside the target; that is, the first area a pedestrian enters is the area outside the target, from which the pedestrian passes through the registration area into the area inside the target, and the route of a pedestrian leaving the monitored area is the reverse of this entry route. Multiple overhead depth cameras are distributed over the monitored area to collect depth images of each area in real time. The human body frame, head frame, and region position of the pedestrian in each depth image are detected by polling, and the human body frame and head frame of the same pedestrian in the depth image are bound to each other. The actual position of the 3D center of gravity of the head frame in each depth image is then tracking-matched against the predicted position of the 3D center of gravity of the pedestrian trajectory, the trajectory tracking state of each pedestrian trajectory is updated according to the matching result, and the trajectory area state of the pedestrian trajectory is updated according to the region position of the 3D center of gravity of the head frame in each depth image of the current frame. Thus, when tracking and matching of the head frame's 3D center of gravity is normal, the position-information method (the position coordinates of the head frame's 3D center of gravity) is used to update the pedestrian trajectory; and when tracking and matching of the head frame's 3D center of gravity fails, that is, when the trajectory tracking state of a pedestrian trajectory is the lost state in x consecutive depth-image frames and its trajectory area state is the registered or entering state, the ReID strategy is used to match, retrieve, and update the lost pedestrian trajectory.
It can be seen that the present invention effectively tracks and matches the 3D center of gravity of the head frame by capturing overhead depth images and computing by polling, solving the tracking failures caused by occlusion in cross-camera pedestrian tracking. In addition, by setting a registration area, a base-library feature data table can be registered automatically for a pedestrian when the pedestrian enters the registration area, so that when position-coordinate matching of the pedestrian trajectory fails, the method automatically switches to the deep-learning ReID strategy to match and retrieve the pedestrian trajectory, improving the accuracy and reliability of the trajectory tracking results.
Description of the drawings
The drawings described here are provided for a further understanding of the present invention and constitute a part of it; the exemplary embodiments and their description explain the present invention and do not unduly limit it. In the drawings:
Fig. 1 is a schematic flowchart of the human body tracking method based on RGB-D images in Embodiment 1 of the present invention;
Fig. 2 is a schematic flowchart of binding the body frame and head frame of the same pedestrian in each depth image of the current frame in Fig. 1.
Embodiments of the present invention
To make the above objectives, features and advantages of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art without creative work based on these embodiments fall within the protection scope of the present invention.
Embodiment 1
Referring to Fig. 1, this embodiment provides a human body tracking method based on RGB-D images, comprising:
dividing the monitored area, in order along the travel route, into an outside-target area, a registration area and an inside-target area, and capturing depth images in real time with multiple distributed overhead depth cameras;
detecting the body frame, head frame and region position of each pedestrian in every depth image, and binding the body frame and head frame of the same pedestrian within a depth image to each other;
matching the predicted position of the 3D center-of-gravity point of each pedestrian trajectory against the actual position of the 3D center-of-gravity point of the head frame in each depth image, and updating the trajectory tracking state of every pedestrian trajectory according to the matching result, the updated state being one of a new state, a normal state, a lost state and a deleted state;
updating the trajectory region state of every pedestrian trajectory based on the region in which the 3D center-of-gravity point lies in each depth image, the updated state being one of an initial state, an entered state, a registered state and a leaving state;
when the trajectory tracking state of any pedestrian trajectory is the lost state and its trajectory region state is the registered state, retrieving the lost pedestrian trajectory by ReID matching and updating it accordingly; otherwise, updating the pedestrian trajectory according to the position coordinates of the 3D center-of-gravity point of the head frame matched by tracking in the depth image.
In the human body tracking method based on RGB-D images provided by this embodiment, the monitored area is divided, in order along the travel route, into an outside-target area, a registration area and an inside-target area. That is, the first area a pedestrian enters is the outside-target area, from which the pedestrian passes through the registration area into the inside-target area; a pedestrian leaving the monitored area follows the reverse route. Multiple overhead depth cameras are distributed over the monitored area to capture depth images of each area in real time. The body frame, head frame and region position of each pedestrian in every depth image are detected by polling, and the body frame and head frame of the same pedestrian within a depth image are bound to each other. The actual position of the 3D center-of-gravity point of the head frame in each depth image is then matched against the predicted position of the 3D center-of-gravity point of each pedestrian trajectory, the trajectory tracking state of every pedestrian trajectory is updated according to the matching result, and the trajectory region state of every pedestrian trajectory is updated according to the region in which the 3D center-of-gravity point of the head frame lies in the current frame. Thus, while tracking of the 3D center-of-gravity point of the head frame is normal, the pedestrian trajectory is updated by the position-information method (the position coordinates of the 3D center-of-gravity point of the head frame); when that tracking fails, that is, when the trajectory tracking state of a pedestrian trajectory remains the lost state in x consecutive frames of depth images and its trajectory region state is the registered state or the entered state, the ReID strategy is used to match, retrieve and update the lost pedestrian trajectory.
It can be seen that this embodiment performs effective tracking and matching of the 3D center-of-gravity point of the head frame by means of overhead depth images and polling computation, which solves the tracking failures caused by occlusion in cross-camera pedestrian tracking. In addition, by providing a registration area, a gallery feature data table can be registered automatically for a pedestrian upon entering the registration area, so that when matching a pedestrian trajectory by position coordinates fails, the system automatically switches to a deep-learning ReID strategy to match and retrieve the trajectory, improving the accuracy and reliability of the tracking results.
In specific implementation, the outside-target area, registration area and inside-target area are delimited by 3D coordinate boundaries set over the monitored area, and the registration area is a part of the inside-target area. At least one overhead depth camera is arranged above each of the three areas to capture depth images of that area in real time; with multiple depth cameras, multi-angle depth images can be collected in real time.
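For illustration only (not part of the original disclosure), the region partition by 3D coordinate boundaries could be represented as follows; the rectangular-region assumption, all names and all coordinate values are hypothetical:

```python
# Minimal sketch of region partitioning by 3D coordinate boundaries.
# Axis-aligned rectangles and the example coordinates are assumptions.
from dataclasses import dataclass

@dataclass
class Region:
    name: str        # "registration", "inside" or "outside"
    x_range: tuple   # (x_min, x_max) in world coordinates, meters
    y_range: tuple   # (y_min, y_max)

def classify_point(x, y, regions):
    """Return the name of the first region containing (x, y).

    The registration region is listed before the inside-target region,
    since the patent states it is a sub-area of the target interior.
    """
    for r in regions:
        if r.x_range[0] <= x <= r.x_range[1] and r.y_range[0] <= y <= r.y_range[1]:
            return r.name
    return "outside"

regions = [
    Region("registration", (2.0, 3.0), (0.0, 4.0)),
    Region("inside", (2.0, 10.0), (0.0, 4.0)),
    Region("outside", (0.0, 2.0), (0.0, 4.0)),
]
```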
Referring to Fig. 2, in the above embodiment, detecting the body frame, head frame and region position of each pedestrian in every depth image and binding the body frame and head frame of the same pedestrian comprises:
polling each depth image corresponding to the current frame, and obtaining the body frame, head frame and region position of each pedestrian in every depth image with an RGB-D object detection method;
polling the body-frame areas and head-frame areas appearing in each depth image, and traversing the inclusion degree of every pair of body frame and head frame;
based on the inclusion degrees of each depth image, screening out the body frames and head frames belonging to the same pedestrian with a maximum bipartite matching algorithm and binding them to each other.
It can be understood that, assuming k depth cameras are deployed, the depth images of the current frame are the k depth images captured in that frame. The area of every body frame and head frame appearing in each depth image is computed in turn, and the inclusion degree of every pairwise combination of body frame and head frame within each depth image is computed, the pairwise combinations covering all body-frame/head-frame pairs in the image; the maximum bipartite matching algorithm then screens out, per depth image, the body frames and head frames belonging to the same pedestrian for binding. The inclusion degree is computed by dividing the overlap area of a body frame and a head frame by the area of that head frame.
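As a minimal sketch of this computation (not part of the original disclosure), the inclusion degree of one body frame and one head frame can be written as follows; the (x1, y1, x2, y2) box convention is an assumption:

```python
# Inclusion degree D_inclusion = S_hb / S_h for one body box and one head box,
# both given as (x1, y1, x2, y2) pixel rectangles.
def inclusion_degree(body_box, head_box):
    # Overlap rectangle between the two boxes.
    x1 = max(body_box[0], head_box[0])
    y1 = max(body_box[1], head_box[1])
    x2 = min(body_box[2], head_box[2])
    y2 = min(body_box[3], head_box[3])
    overlap = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    head_area = (head_box[2] - head_box[0]) * (head_box[3] - head_box[1])
    return overlap / head_area if head_area > 0 else 0.0
```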
Specifically, screening out and binding the body frame and head frame of the same pedestrian in each depth image with the maximum bipartite matching algorithm, based on the inclusion degrees, comprises:
according to the inclusion degrees of each depth image, screening out initial pairings between the body frames and head frames of every depth image with the maximum bipartite matching algorithm;
comparing the inclusion degree of each initial pairing with a coincidence threshold, confirming the binding of initial pairings whose inclusion degree is greater than or equal to the threshold, and releasing the binding of initial pairings whose inclusion degree is below the threshold.
In specific implementation, RGB-D object detection is used to poll and detect the body frame, head frame and region position of each pedestrian in the depth image of every depth camera, and the binding results of body frames and head frames in each depth image are computed by polling. From prior knowledge, a human head is part of a human body, so the body frame and head frame of the same instance output by object detection overlap to a large degree; however, existing detectors do not directly output the binding between an instance's head frame and body frame. The matching of body frames and head frames within one depth image can therefore be modeled as an assignment problem. This embodiment adopts the maximum bipartite matching algorithm (KM algorithm) to solve the assignment problem and designs a new cost metric, the inclusion degree, to compute the assignment cost. The metric is expressed by the following formula:
D_inclusion = S_hb / S_h
where S_hb denotes the overlap area of the body detection frame and the head detection frame, S_h is the area of the head detection frame, and D_inclusion is the inclusion degree.
For example, suppose a depth image captured by one depth camera contains two body frames {B_body1, B_body2} and two head frames {B_head1, B_head2}, giving four candidate binding pairs. Traversing the inclusion degree of every body-frame/head-frame pair yields the set {D_body1head1, D_body1head2, D_body2head1, D_body2head2}. The KM algorithm aims to match as many body frames and head frames as possible while maximizing the sum of inclusion degrees of the matched pairs. If the inclusion degrees {D_body1head1, D_body1head2, D_body2head1, D_body2head2} are {0.4, 0.5, 0.9, 0.1}, the KM assignment binds B_body1 with B_head2 and B_body2 with B_head1 as the two pedestrians' body and head frames, with a total value of 1.4.
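The 2×2 example above can be reproduced with an off-the-shelf assignment solver; the sketch below (not from the patent) uses SciPy's linear_sum_assignment, negating the inclusion matrix because SciPy minimizes cost while the KM objective here maximizes it:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Rows: body frames; columns: head frames.
inclusion = np.array([[0.4, 0.5],
                      [0.9, 0.1]])
rows, cols = linear_sum_assignment(-inclusion)  # maximum-weight matching
for b, h in zip(rows, cols):
    print(f"body{b + 1} <-> head{h + 1}, inclusion = {inclusion[b, h]}")
# Prints: body1 <-> head2 (0.5) and body2 <-> head1 (0.9), total 1.4.
```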
In the specific implementation, since detection may produce false head frames or body frames, a coincidence threshold is set to filter the assignment result, which can be expressed by the following formula:
M(D_matchedBodyN_HeadM) = 1 if D_matchedBodyN_HeadM ≥ Filter_Thresh, and 0 otherwise
where D_matchedBodyN_HeadM is the inclusion degree of a body frame B_bodyN and a head frame B_HeadM paired by the KM algorithm, and Filter_Thresh is the inclusion threshold. A pairing below the threshold has a result M(D_matchedBodyN_HeadM) of 0 and its pairing relation is released; a pairing above the threshold has a result of 1 and is kept as a legal output.
Through the above binding strategy, the head frame and body frame of the same pedestrian instance in each depth image of the current frame are bound to each other, providing reliable preprocessed input for subsequent tracking and matching.
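A sketch of the threshold filter just described is given below (illustrative only); the threshold value is an assumed example, since the patent does not fix a number:

```python
FILTER_THRESH = 0.3  # assumed example value for Filter_Thresh

def filter_pairs(pairs, inclusion):
    """pairs: list of (body_idx, head_idx); inclusion: nested list of degrees."""
    confirmed, released = [], []
    for b, h in pairs:
        if inclusion[b][h] >= FILTER_THRESH:
            confirmed.append((b, h))   # M(...) = 1: keep the binding
        else:
            released.append((b, h))    # M(...) = 0: release the binding
    return confirmed, released
```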
In the above embodiment, computing the predicted position of the 3D center-of-gravity point of a pedestrian trajectory comprises:
converting each depth image into three-dimensional coordinates, and computing the 3D center-of-gravity point of the head frame in the depth image;
modeling the spatial position of the 3D center-of-gravity point of the pedestrian trajectory multi-dimensionally, the dimension vector of the model being (x, y, z, h, Vx, Vy, Vz), where x, y, z are the three-dimensional coordinates of the 3D center-of-gravity point, Vx, Vy, Vz are its velocities along the corresponding coordinate axes, and h is the height of the pedestrian to whom the point belongs;
based on the current x, y and z coordinates of the 3D center-of-gravity point of the pedestrian trajectory and the corresponding velocities Vx, Vy and Vz along the x, y and z axes, computing the predicted positions of the point along the x, y and z axes in the next frame of depth images.
In specific implementation, this embodiment updates the multi-target tracking trajectory states by polling single cameras, which simply and effectively solves the deduplication of human bodies in the overlapping regions of cross-camera tracking. Moreover, since the depth image of a single camera is a two-dimensional picture, the coordinate transform is obtained from the pre-calibrated intrinsic and extrinsic parameters of the depth camera, converting points in the two-dimensional RGB-D picture into three-dimensional points; the 3D center-of-gravity coordinates of the head frame are obtained by projecting the average center of gravity of the head-frame depth map into the three-dimensional coordinate system.
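For illustration (not part of the original disclosure), the pinhole-camera back-projection underlying this conversion could look as follows; the intrinsics (fx, fy, cx, cy) and the extrinsic pair (R, t) are the calibrated parameters the text refers to, and all function names are hypothetical:

```python
import numpy as np

def pixel_to_camera(u, v, d, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth d to camera coordinates."""
    x = (u - cx) * d / fx
    y = (v - cy) * d / fy
    return np.array([x, y, d])

def camera_to_world(p_cam, R, t):
    """Apply the extrinsic rotation R (3x3) and translation t (3,)."""
    return R @ p_cam + t

# One possible choice: back-project the average depth over the head-frame
# depth map at the box centroid to obtain the head's 3D center of gravity.
```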
This embodiment uses a 3D Kalman filter to model the pedestrian trajectory of the head point moving in space, describing the trajectory with a spatial position state vector (x, y, z, h, Vx, Vy, Vz): x, y, z are the three spatial coordinates of the 3D center-of-gravity point of the pedestrian's head, h is the pedestrian's height, and Vx, Vy, Vz are the pedestrian's velocities along the corresponding axes. The 3D Kalman filter obtains the predicted pedestrian position in the current frame by the following formulas:
x_estimate = x + Vx * t,  y_estimate = y + Vy * t,  z_estimate = z + Vz * t
In the above formulas, the variables with the subscript "estimate" are the predicted pedestrian position output by the 3D Kalman filter for the current frame; x, y, z and Vx, Vy, Vz are the state parameters of the filter, and t is the time between two adjacent frames.
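A minimal constant-velocity predict step for this state vector is sketched below (illustrative; a full Kalman filter would also carry a covariance and an update step, which are omitted here):

```python
import numpy as np

def predict(state, t):
    """state: length-7 vector (x, y, z, h, Vx, Vy, Vz); t: frame interval."""
    F = np.eye(7)
    F[0, 4] = F[1, 5] = F[2, 6] = t   # x += Vx*t, y += Vy*t, z += Vz*t
    return F @ state

state = np.array([1.0, 2.0, 1.7, 1.7, 0.5, 0.0, 0.0])
print(predict(state, t=1 / 30))  # x advances by 0.5/30 m over one 30 FPS frame
```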
In the above implementation, matching the predicted position of the 3D center-of-gravity point of each pedestrian trajectory against the actual position of the 3D center-of-gravity point of the head frame in each depth image, and updating the trajectory tracking state of every pedestrian trajectory according to the matching result, comprises:
tracking the actual position of the 3D center-of-gravity point of the corresponding head frame in each depth image of the current frame with the Kalman-filter tracking algorithm, to obtain the actual position of the 3D center-of-gravity point;
traversing and computing the cost metric between the actual position of every 3D center-of-gravity point in each depth image of the current frame and the predicted position of the 3D center-of-gravity point of every pedestrian trajectory, obtaining a cost matrix;
after the traversal of all depth images of the current frame is complete, screening out, based on the cost matrix and with the maximum bipartite matching algorithm, initial pairings between each pedestrian trajectory and the actual positions of the 3D center-of-gravity points in each depth image of the current frame;
regarding initial pairings whose cost metric is less than or equal to a cost threshold as successfully paired, and those whose cost metric exceeds the threshold as unpaired;
the unpaired initial pairings comprising the remaining unpaired head-frame 3D center-of-gravity points and the remaining unpaired pedestrian trajectories: for a remaining unpaired head-frame 3D center-of-gravity point located in the outside-target area in the current frame, creating a new pedestrian trajectory, updating its trajectory tracking state to the new state and its trajectory region state to the initial state; and/or, for a remaining unpaired pedestrian trajectory in the current frame, updating its trajectory tracking state to the lost state;
after threshold-filtering the successful initial pairings, updating the trajectory tracking state of the paired pedestrian trajectories to the normal state, and taking the actual position of the paired head frame's 3D center-of-gravity point as the updated 3D center-of-gravity position of the current pedestrian trajectory;
for a pedestrian trajectory whose paired head-frame 3D center-of-gravity point lies in the outside-target area and whose trajectory tracking state has been the lost state for n consecutive frames, and/or a pedestrian trajectory whose trajectory region state is the leaving state, and/or a pedestrian trajectory whose trajectory region state is the initial state and whose trajectory tracking state has been the lost state for m consecutive frames, updating the trajectory tracking state to the deleted state, where n > 0, m > 0, and n and m are integers.
In specific implementation, the 3D Kalman filter updates its algorithm parameters according to the coordinate position of the 3D center-of-gravity point of the pedestrian trajectory in the previous frame, so as to compute the predicted position of the paired head frame's 3D center-of-gravity point in the next frame. The algorithm parameters of the 3D Kalman filter are updated repeatedly in this loop, continuously predicting the position of the paired head frame's 3D center-of-gravity point frame after frame.
This embodiment assigns all pedestrian trajectories from the previous frame's depth images to the head-frame 3D center-of-gravity points of the current frame's depth images. The assignment cost metric can be the Mahalanobis distance: the cost matrix is computed from the predicted positions of the trajectories' 3D center-of-gravity points and the actual positions of the head-frame 3D center-of-gravity points detected in the current frame. The KM algorithm then performs the optimal assignment, and the assignment result is threshold-filtered to obtain a reliable matching result. For assignments that succeed and pass the threshold filter, a pedestrian-height verification mechanism can further prevent mismatched trajectories; the successfully matched trajectories are finally updated in position and state.
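A sketch of this matching step is given below (illustrative only); COST_THRESH, HEIGHT_TOL and the inverse covariance are assumed inputs, not values from the patent:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

COST_THRESH, HEIGHT_TOL = 3.0, 0.15  # assumed gate values

def match(track_preds, track_heights, det_points, det_heights, inv_cov):
    """Mahalanobis cost matrix + assignment + threshold and height checks."""
    cost = np.zeros((len(track_preds), len(det_points)))
    for i, p in enumerate(track_preds):
        for j, q in enumerate(det_points):
            diff = p - q
            cost[i, j] = np.sqrt(diff @ inv_cov @ diff)  # Mahalanobis distance
    rows, cols = linear_sum_assignment(cost)
    matches = []
    lost_tracks = set(range(len(track_preds)))
    new_dets = set(range(len(det_points)))
    for i, j in zip(rows, cols):
        # Keep only pairs within the cost gate whose heights also agree.
        if cost[i, j] <= COST_THRESH and abs(track_heights[i] - det_heights[j]) <= HEIGHT_TOL:
            matches.append((i, j))
            lost_tracks.discard(i)
            new_dets.discard(j)
    return matches, lost_tracks, new_dets
```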
Preferably, m is 3 and n is an integer greater than or equal to 5. That is, a pedestrian trajectory whose trajectory region state is the initial state and whose trajectory tracking state is the lost state in 3 consecutive frames of depth images is deleted as noise; a pedestrian trajectory whose head-frame 3D center-of-gravity point lies in the outside-target area and whose trajectory tracking state is the lost state in 5 consecutive frames is likewise deleted as noise; and a pedestrian trajectory whose trajectory region state is the leaving state is also deleted. In other, more complex cases, for example when the head-frame 3D center-of-gravity point lies inside the target area, the trajectory is never deleted no matter for how many consecutive frames its tracking state is lost; matching continues until it succeeds. This design reflects the unmanned-store tracking scenario: a person inside the target area cannot vanish into thin air, and tracking failures are mostly caused by technical problems, so tracking and retrieval must continue.
The method further comprises: when the trajectory tracking state of a pedestrian trajectory is the deleted state, deleting the pedestrian trajectory and its corresponding gallery feature data table.
In the above embodiment, updating the trajectory region state of every pedestrian trajectory based on the region in which the 3D center-of-gravity point lies in each depth image comprises:
traversing the head-frame 3D center-of-gravity points in each depth image of the current frame, identifying those appearing in the outside-target area, and setting the trajectory region state of their corresponding pedestrian trajectories to the initial state;
traversing the head-frame 3D center-of-gravity points in each depth image of the current frame, identifying those appearing in the registration area, setting the trajectory region state of their corresponding pedestrian trajectories to the registered state, and registering and updating their gallery feature data tables in real time;
traversing the head-frame 3D center-of-gravity points in each depth image of the current frame, identifying those appearing in the inside-target area, and setting the trajectory region state of their corresponding pedestrian trajectories to the entered state;
traversing the head-frame 3D center-of-gravity points in each depth image of the current frame, identifying those leaving the inside-target area for the outside-target area, and setting the trajectory region state of their corresponding pedestrian trajectories to the leaving state.
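For illustration, the four region-state rules above can be condensed into one update function; the state names, the Track class and the region labels mirror the earlier sketches and are assumptions, not the patent's own identifiers:

```python
from dataclasses import dataclass

@dataclass
class Track:
    region_state: str = "initial"

def update_region_state(track, region):
    """region: 'outside', 'registration' or 'inside' for the current centroid."""
    if region == "registration":
        track.region_state = "registered"     # triggers gallery registration
    elif region == "inside":
        track.region_state = "entered"
    elif track.region_state in ("entered", "registered"):
        track.region_state = "leaving"        # inside -> outside transition
    else:
        track.region_state = "initial"        # pedestrian still outside
```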
In the above embodiment, judging that the trajectory tracking state of a pedestrian trajectory is the lost state comprises:
identifying the head-frame 3D center-of-gravity points in each depth image of the current frame; if a pedestrian trajectory cannot be matched with any head-frame 3D center-of-gravity point in any depth image of the current frame, its trajectory tracking state is deemed the lost state.
In specific implementation, this embodiment can continuously track all pedestrian trajectories detected in the depth images, and can also apply different processing strategies to trajectories in different regions according to actual needs. Considering the coverage of the depth cameras and the characteristics of the application scenario, some depth cameras may capture pedestrians in the outside-target area, who may interfere with the tracking of pedestrians inside the target area. This embodiment therefore designs a set of region-state management strategies for pedestrian trajectories, as follows:
1. The region state of a pedestrian trajectory in the outside-target area is set to the initial state. A pedestrian in this state receives no extra processing when the trajectory is lost, i.e. the lost trajectory is not retrieved by ReID; a remaining unpaired head-frame 3D center-of-gravity point in the outside-target area may start a new pedestrian trajectory.
2. The region state of a pedestrian trajectory entering the inside-target area from the outside-target area is set to the entered state; pedestrians in this state are the focus of tracking.
3. After a pedestrian enters the registration area, the region state of the trajectory is set to the registered state; a pedestrian in this state completes the registration of the gallery feature data table without being aware of it, and after registration the trajectory remains in the entered state while inside the target area.
4. For a pedestrian walking from the inside-target area to the outside-target area, the region state of the trajectory is updated to the leaving state; the trajectory in this state is deleted, together with its gallery feature data table, to avoid affecting other tracked targets.
5. Illegal behavior not covered by the normal tracking flow above, such as a pedestrian trajectory whose initial position appears inside the target area, can trigger a corresponding alarm according to actual needs.
Regarding strategy 3, it should be further noted that the registration area is a part of the inside-target area and is only used to register the pedestrian's gallery feature data table (such as ReID gallery images); after registration, the region state of the trajectory can be set to the entered state.
The trajectory tracking states in this embodiment fall into four kinds: the new state, the normal state, the lost state and the deleted state. A pedestrian trajectory is in the new state when first created; after successfully tracking the target for m consecutive frames, its tracking state is set to the normal state. When a trajectory in the normal state cannot be matched with any head-frame 3D center-of-gravity point in any depth image of the current frame, the tracking state is set to the lost state. When the tracking state remains lost in x consecutive frames of depth images and the trajectory region state is the registered or entered state, ReID matching is used to retrieve the lost trajectory. If retrieval fails for a long time, or the region state is updated to the leaving state, the tracking state is set to the deleted state, in which the trajectory and its gallery feature data table are deleted.
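A compact sketch of these transitions is shown below (illustrative only); the constants use the preferred values m = 3 and n = 5 mentioned above, and everything else is an assumed simplification:

```python
from dataclasses import dataclass

@dataclass
class TrackState:
    state: str = "new"
    region_state: str = "initial"
    hit_count: int = 0
    miss_count: int = 0

M_CONFIRM = 3   # frames to confirm new -> normal (preferred m)
N_OUTSIDE = 5   # lost frames outside the target before deletion (n >= 5)

def step(track, matched):
    if matched:
        track.hit_count += 1
        track.miss_count = 0
        if track.state == "new" and track.hit_count >= M_CONFIRM:
            track.state = "normal"
        elif track.state == "lost":
            track.state = "normal"
    else:
        track.miss_count += 1
        track.state = "lost"
        if track.region_state == "leaving":
            track.state = "deleted"
        elif track.region_state == "initial" and track.miss_count >= M_CONFIRM:
            track.state = "deleted"   # noise outside the target area
```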
In summary, the initial tracking stage of this embodiment uses a Kalman filter based on spatial position information. However, errors in estimating the 3D coordinates of the head center-of-gravity point, missed detections and occlusion by dense crowds may cause the trajectories of normally walking pedestrians inside the target area to be lost. In this case, this embodiment adopts a matching strategy based on deep-learning features to detect and match lost trajectories against unmatched trajectories: the distance between deep features, such as the cosine distance, serves as the cost metric, the KM algorithm solves the assignment between lost and unmatched trajectories, and threshold filtering of the assignment result yields a reliable match with which the lost and unmatched trajectories are updated.
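A minimal sketch of this ReID fallback is given below (not from the patent); feature extraction is out of scope, and the cosine-distance threshold is an assumed example:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

REID_THRESH = 0.4  # assumed cosine-distance threshold

def reid_match(gallery_feats, query_feats):
    """gallery_feats: features of lost tracks; query_feats: unmatched detections."""
    g = gallery_feats / np.linalg.norm(gallery_feats, axis=1, keepdims=True)
    q = query_feats / np.linalg.norm(query_feats, axis=1, keepdims=True)
    cost = 1.0 - g @ q.T                      # cosine distance matrix
    rows, cols = linear_sum_assignment(cost)  # KM-style optimal assignment
    return [(i, j) for i, j in zip(rows, cols) if cost[i, j] <= REID_THRESH]
```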
It should be noted that ReID achieves pedestrian tracking mainly by relying on the feature data of the gallery feature data table and on the region position and region state of the pedestrian trajectories in the trajectory data table; its concrete implementation is well known to those skilled in the art and is not detailed here.
The application scenarios of this embodiment are abundant, such as unmanned supermarkets, smart factories, and warehouse theft-and-loss monitoring. The human body tracking method based on RGB-D images provided by this embodiment ensures reliable and continuous trajectory tracking of pedestrians inside the target area, while trajectory-region state management provides extensible technical support for practical applications, reducing labor costs while improving management efficiency.
Embodiment 2
This embodiment provides a human body tracking device based on RGB-D images, comprising:
a partition setting unit, configured to divide the monitored area, in order along the travel route, into an outside-target area, a registration area and an inside-target area, and to capture depth images in real time with multiple distributed overhead depth cameras;
a detection-frame binding unit, configured to detect the body frame, head frame and region position of each pedestrian in every depth image, and to bind the body frame and head frame of the same pedestrian within a depth image to each other;
a trajectory tracking state detection unit, configured to match the predicted position of the 3D center-of-gravity point of each pedestrian trajectory against the actual position of the 3D center-of-gravity point of the head frame in each depth image, and to update the trajectory tracking state of every pedestrian trajectory according to the matching result, the updated state being one of a new state, a normal state, a lost state and a deleted state;
a trajectory region state detection unit, configured to update the trajectory region state of every pedestrian trajectory based on the region in which the 3D center-of-gravity point lies in each depth image, the updated state being one of an initial state, an entered state, a registered state and a leaving state;
a trajectory tracking unit, configured to, when the trajectory tracking state of any pedestrian trajectory remains the lost state in x consecutive frames of depth images and the trajectory region state is the registered state or the entered state, retrieve the lost pedestrian trajectory by ReID matching and update it accordingly, and otherwise update the pedestrian trajectory according to the position coordinates of the 3D center-of-gravity point of the head frame matched by tracking in the depth images, where x > 0 and x is an integer.
Compared with the prior art, the beneficial effects of the human body tracking device based on RGB-D images provided by this embodiment are the same as those of the method of Embodiment 1 and are not repeated here.
Embodiment 3
This embodiment provides a computer-readable storage medium storing a computer program which, when run by a processor, executes the steps of the above human body tracking method based on RGB-D images.
Compared with the prior art, the beneficial effects of the computer-readable storage medium provided by this embodiment are the same as those of the method provided by the above technical solution and are not repeated here.
Those of ordinary skill in the art can understand that all or part of the steps of the above method can be completed by a program instructing relevant hardware; the program can be stored in a computer-readable storage medium and, when executed, includes the steps of the method of the above embodiments. The storage medium can be ROM/RAM, a magnetic disk, an optical disc, a memory card, etc.
The above are only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or substitution readily conceivable by a person skilled in the art within the technical scope disclosed herein shall be covered by the protection scope of the present invention, which shall therefore be subject to the protection scope of the claims.

Claims (10)

  1. A human body tracking method based on RGB-D images, characterized by comprising:
    dividing the monitored area, in order along the travel route, into an outside-target area, a registration area and an inside-target area, and capturing depth images in real time with multiple distributed overhead depth cameras;
    detecting the body frame, head frame and region position of each pedestrian in every depth image, and binding the body frame and head frame of the same pedestrian within a depth image to each other;
    matching the predicted position of the 3D center-of-gravity point of each pedestrian trajectory against the actual position of the 3D center-of-gravity point of the head frame in each depth image, and updating the trajectory tracking state of every pedestrian trajectory according to the matching result, the updated state being one of a new state, a normal state, a lost state and a deleted state;
    updating the trajectory region state of every pedestrian trajectory based on the region in which the 3D center-of-gravity point lies in each depth image, the updated state being one of an initial state, an entered state, a registered state and a leaving state;
    when the trajectory tracking state of any pedestrian trajectory remains the lost state in x consecutive frames of depth images and the trajectory region state is the registered state or the entered state, retrieving the lost pedestrian trajectory by ReID matching and updating it accordingly; otherwise, updating the pedestrian trajectory according to the position coordinates of the 3D center-of-gravity point of the head frame matched by tracking in the depth image, where x > 0 and x is an integer.
  2. The method according to claim 1, characterized in that detecting the body frame, head frame and region position of each pedestrian in every depth image, and binding the body frame and head frame of the same pedestrian, comprises:
    polling each depth image corresponding to the current frame, and obtaining the body frame, head frame and region position of each pedestrian in every depth image with an RGB-D object detection method;
    polling the body-frame areas and head-frame areas appearing in each depth image, and traversing the inclusion degree of every pair of body frame and head frame;
    based on the inclusion degrees of each depth image, screening out the body frames and head frames belonging to the same pedestrian with a maximum bipartite matching algorithm and binding them to each other.
  3. The method according to claim 2, characterized in that screening out and binding the body frame and head frame of the same pedestrian in each depth image with the maximum bipartite matching algorithm, based on the inclusion degrees, comprises:
    according to the inclusion degrees of each depth image, screening out initial pairings between the body frames and head frames of every depth image with the maximum bipartite matching algorithm;
    comparing the inclusion degree of each initial pairing with a coincidence threshold, confirming the binding of initial pairings whose inclusion degree is greater than or equal to the threshold, and releasing the binding of initial pairings whose inclusion degree is below the threshold.
  4. The method according to claim 1, characterized in that computing the predicted position of the 3D center-of-gravity point of a pedestrian trajectory comprises:
    converting each depth image into three-dimensional coordinates, and computing the 3D center-of-gravity point of the head frame in the depth image;
    modeling the spatial position of the 3D center-of-gravity point of the pedestrian trajectory multi-dimensionally, the dimension vector of the model being (x, y, z, h, Vx, Vy, Vz), where x, y, z are the three-dimensional coordinates of the 3D center-of-gravity point, Vx, Vy, Vz are its velocities along the corresponding coordinate axes, and h is the height of the pedestrian to whom the point belongs;
    based on the current x, y and z coordinates of the 3D center-of-gravity point of the pedestrian trajectory and the corresponding velocities Vx, Vy and Vz along the x, y and z axes, computing the predicted positions of the point along the x, y and z axes in the next frame of depth images.
  5. The method according to claim 4, characterized in that matching the predicted position of the 3D center-of-gravity point of each pedestrian trajectory against the actual position of the 3D center-of-gravity point of the head frame in each depth image, and updating the trajectory tracking state of every pedestrian trajectory according to the matching result, comprises:
    tracking the actual position of the 3D center-of-gravity point of the corresponding head frame in each depth image of the current frame with the Kalman-filter tracking algorithm, to obtain the actual position of the 3D center-of-gravity point;
    traversing and computing the cost metric between the actual position of every 3D center-of-gravity point in each depth image of the current frame and the predicted position of the 3D center-of-gravity point of every pedestrian trajectory, obtaining a cost matrix;
    after the traversal of all depth images of the current frame is complete, screening out, based on the cost matrix and with the maximum bipartite matching algorithm, initial pairings between each pedestrian trajectory and the actual positions of the 3D center-of-gravity points in each depth image of the current frame;
    regarding initial pairings whose cost metric is less than or equal to a cost threshold as successfully paired, and those whose cost metric exceeds the threshold as unpaired;
    the unpaired initial pairings comprising the remaining unpaired head-frame 3D center-of-gravity points and the remaining unpaired pedestrian trajectories: for a remaining unpaired head-frame 3D center-of-gravity point located in the outside-target area in the current frame, creating a new pedestrian trajectory, updating its trajectory tracking state to the new state and its trajectory region state to the initial state; and/or, for a remaining unpaired pedestrian trajectory in the current frame, updating its trajectory tracking state to the lost state;
    after threshold-filtering the successful initial pairings, updating the trajectory tracking state of the paired pedestrian trajectories to the normal state, and taking the actual position of the paired head frame's 3D center-of-gravity point as the updated 3D center-of-gravity position of the current pedestrian trajectory;
    for a pedestrian trajectory whose paired head-frame 3D center-of-gravity point lies in the outside-target area and whose trajectory tracking state has been the lost state for n consecutive frames, and/or a pedestrian trajectory whose trajectory region state is the leaving state, and/or a pedestrian trajectory whose trajectory region state is the initial state and whose trajectory tracking state has been the lost state for m consecutive frames, updating the trajectory tracking state to the deleted state, where n > 0, m > 0, and n and m are integers.
  6. The method according to claim 5, characterized by further comprising:
    when the trajectory tracking state of a pedestrian trajectory is the deleted state, deleting the pedestrian trajectory and its corresponding gallery feature data table.
  7. The method according to claim 6, characterized in that updating the trajectory region state of every pedestrian trajectory based on the region in which the 3D center-of-gravity point lies in each depth image comprises:
    traversing the head-frame 3D center-of-gravity points in each depth image of the current frame, identifying those appearing in the outside-target area, and setting the trajectory region state of their corresponding pedestrian trajectories to the initial state;
    traversing the head-frame 3D center-of-gravity points in each depth image of the current frame, identifying those appearing in the registration area, setting the trajectory region state of their corresponding pedestrian trajectories to the registered state, and registering and updating their gallery feature data tables in real time;
    traversing the head-frame 3D center-of-gravity points in each depth image of the current frame, identifying those appearing in the inside-target area, and setting the trajectory region state of their corresponding pedestrian trajectories to the entered state;
    traversing the head-frame 3D center-of-gravity points in each depth image of the current frame, identifying those leaving the inside-target area for the outside-target area, and setting the trajectory region state of their corresponding pedestrian trajectories to the leaving state.
  8. The method according to claim 1, characterized in that judging that the trajectory tracking state of a pedestrian trajectory is the lost state comprises:
    identifying the head-frame 3D center-of-gravity points in each depth image of the current frame; if a pedestrian trajectory cannot be matched with the head-frame 3D center-of-gravity point in any depth image, its trajectory tracking state is deemed the lost state.
  9. A human body tracking device based on RGB-D images, characterized in that it comprises:
    a partition setting unit, configured to divide a monitored area, in sequence along the walking route, into an outside-target area, a registration area and an inside-target area, and to collect depth images in real time from overhead views using a plurality of distributed depth cameras;
    a detection frame binding unit, configured to detect, in each depth image, each pedestrian's body frame, head frame and area location, and to bind the body frame and head frame of the same pedestrian in a depth image to each other;
    a trajectory tracking state detection unit, configured to match the predicted position of each pedestrian trajectory's 3D center-of-gravity point against the actual positions of the 3D center-of-gravity points of the head frames in each depth image, and to update the trajectory tracking state of each pedestrian trajectory according to the matching result, the updated states comprising a new state, a normal state, a lost state and a deleted state;
    a trajectory area state detection unit, configured to update the trajectory area state of each pedestrian trajectory based on the area location of each 3D center-of-gravity point in the depth images, the updated states comprising an initial state, an entering state, a registered state and a leaving state;
    a trajectory tracking unit, configured such that, when the trajectory tracking state of any pedestrian trajectory is the lost state in x consecutive frames of depth images and its trajectory area state is the registered state or the entering state, the lost pedestrian trajectory is retrieved by ReID matching and updated accordingly; otherwise, the pedestrian trajectory is updated according to the position coordinates of the 3D center-of-gravity point of the head frame matched during tracking in the depth image, where x > 0 and x is an integer.
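To make the trajectory tracking unit's fallback rule concrete, the following sketch shows one per-frame update of a single trajectory. X, Track and reid_match are illustrative stand-ins; the claim fixes only the rule itself (ReID retrieval after x consecutive lost frames while the trajectory is in the registered or entering state), not these names or the value of x.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

X = 5  # illustrative; the claim requires only that x > 0 and x is an integer

@dataclass
class Track:
    position: Tuple[float, float, float]  # last head-frame 3D center of gravity
    region_state: str = "initial"         # initial / entering / registered / leaving
    lost_frames: int = 0

def reid_match(track: Track, gallery) -> Optional[Tuple[float, float, float]]:
    """Stub for appearance-based re-identification against the registered
    base-library features; returns a recovered 3D position or None."""
    return None

def step_track(track: Track, matched_cog, gallery) -> None:
    """One frame of the trajectory tracking unit's logic."""
    if matched_cog is not None:
        # Normal case: update from the matched head frame's 3D center of gravity.
        track.position = matched_cog
        track.lost_frames = 0
        return
    track.lost_frames += 1
    if track.lost_frames >= X and track.region_state in ("registered", "entering"):
        recovered = reid_match(track, gallery)  # try to retrieve the lost trajectory
        if recovered is not None:
            track.position = recovered
            track.lost_frames = 0
```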
  10. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when run by a processor, performs the steps of the method according to any one of claims 1 to 8.
PCT/CN2021/098724 2020-06-22 2021-06-07 Human body tracking method and device based on rgb-d image WO2021259055A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010573284.9A CN111709974B (en) 2020-06-22 2020-06-22 Human body tracking method and device based on RGB-D image
CN202010573284.9 2020-06-22

Publications (1)

Publication Number Publication Date
WO2021259055A1 (en)

Family

ID=72541731

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/098724 WO2021259055A1 (en) 2020-06-22 2021-06-07 Human body tracking method and device based on rgb-d image

Country Status (2)

Country Link
CN (1) CN111709974B (en)
WO (1) WO2021259055A1 (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111709974B (en) * 2020-06-22 2022-08-02 苏宁云计算有限公司 Human body tracking method and device based on RGB-D image
CN112446355B (en) * 2020-12-15 2023-10-17 中电海康集团有限公司 Pedestrian recognition method and people stream statistics system in public place
WO2022144601A1 (en) * 2020-12-29 2022-07-07 Sensetime International Pte. Ltd. Method and apparatus for detecting associated objects
WO2022144600A1 (en) * 2020-12-29 2022-07-07 Sensetime International Pte. Ltd. Object detection method and apparatus, and electronic device
AU2021203870A1 (en) * 2020-12-29 2022-07-14 Sensetime International Pte. Ltd. Method and apparatus for detecting associated objects
CN112686178B (en) * 2020-12-30 2024-04-16 中国电子科技集团公司信息科学研究院 Multi-view target track generation method and device and electronic equipment
CN112581507A (en) * 2020-12-31 2021-03-30 北京澎思科技有限公司 Target tracking method, system and computer readable storage medium
CN113034544A (en) * 2021-03-19 2021-06-25 奥比中光科技集团股份有限公司 People flow analysis method and device based on depth camera
CN113377192B (en) * 2021-05-20 2023-06-20 广州紫为云科技有限公司 Somatosensory game tracking method and device based on deep learning
CN113963375A (en) * 2021-10-20 2022-01-21 中国石油大学(华东) Region-based multi-feature matching multi-target tracking method for speed skating athletes
CN114240997B (en) * 2021-11-16 2023-07-28 南京云牛智能科技有限公司 Intelligent building online trans-camera multi-target tracking method
CN114708304B (en) * 2022-06-06 2022-10-28 苏州浪潮智能科技有限公司 Cross-camera multi-target tracking method, device, equipment and medium
WO2024004197A1 (en) * 2022-07-01 2024-01-04 日本電気株式会社 Information processing device, information processing method, and recording medium
CN116528062B (en) * 2023-07-05 2023-09-15 合肥中科类脑智能技术有限公司 Multi-target tracking method

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9098110B2 (en) * 2011-06-06 2015-08-04 Microsoft Technology Licensing, Llc Head rotation tracking from depth-based center of mass
US8855369B2 (en) * 2012-06-22 2014-10-07 Microsoft Corporation Self learning face recognition using depth based tracking for database generation and update
JP6448223B2 (en) * 2014-06-12 2019-01-09 キヤノン株式会社 Image recognition system, image recognition apparatus, image recognition method, and computer program
GB201501311D0 (en) * 2015-01-27 2015-03-11 Apical Ltd Method, system and computer program product
CN104751491B (en) * 2015-04-10 2018-01-23 中国科学院宁波材料技术与工程研究所 A kind of crowd's tracking and people flow rate statistical method and device
CN107180435B (en) * 2017-05-09 2020-05-26 杭州电子科技大学 Human body target tracking method suitable for depth image
CN108053427B (en) * 2017-10-31 2021-12-14 深圳大学 Improved multi-target tracking method, system and device based on KCF and Kalman
CN109146923B (en) * 2018-07-13 2021-08-24 高新兴科技集团股份有限公司 Processing method and system for target tracking broken frame
CN109583373B (en) * 2018-11-29 2022-08-19 成都索贝数码科技股份有限公司 Pedestrian re-identification implementation method
CN109919974B (en) * 2019-02-21 2023-07-14 上海理工大学 Online multi-target tracking method based on R-FCN frame multi-candidate association
CN110490161B (en) * 2019-08-23 2022-01-07 安徽农业大学 Captive animal behavior analysis method based on deep learning
CN110728702B (en) * 2019-08-30 2022-05-20 深圳大学 High-speed cross-camera single-target tracking method and system based on deep learning
CN111079600A (en) * 2019-12-06 2020-04-28 长沙海格北斗信息技术有限公司 Pedestrian identification method and system with multiple cameras
CN111027462A (en) * 2019-12-06 2020-04-17 长沙海格北斗信息技术有限公司 Pedestrian track identification method across multiple cameras

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130121526A1 (en) * 2011-11-11 2013-05-16 Microsoft Corporation Computing 3d shape parameters for face animation
CN110163889A (en) * 2018-10-15 2019-08-23 腾讯科技(深圳)有限公司 Method for tracking target, target tracker, target following equipment
CN110443116A (en) * 2019-06-19 2019-11-12 平安科技(深圳)有限公司 Video pedestrian detection method, device, server and storage medium
CN111709974A (en) * 2020-06-22 2020-09-25 苏宁云计算有限公司 Human body tracking method and device based on RGB-D image

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"Master's Thesis of Shenzhen University", 15 July 2017, SHENZHEN UNIVERSITY, CN, article LIU LEI: "Study on the Pedestrians Detection and Statistical Method with the Depth Sequence Images", XP055885793 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114972417A (en) * 2022-04-02 2022-08-30 江南大学 Multi-target tracking method for dynamic track quality quantification and feature re-planning
CN115019241A (en) * 2022-08-05 2022-09-06 江西中业智能科技有限公司 Pedestrian identification and tracking method and device, readable storage medium and equipment
CN115019241B (en) * 2022-08-05 2022-11-01 江西中业智能科技有限公司 Pedestrian identification and tracking method and device, readable storage medium and equipment
CN115994925A (en) * 2023-02-14 2023-04-21 成都理工大学工程技术学院 Multi-pedestrian rapid tracking method based on key point detection
CN115994925B (en) * 2023-02-14 2023-09-29 成都理工大学工程技术学院 Multi-pedestrian rapid tracking method based on key point detection

Also Published As

Publication number Publication date
CN111709974B (en) 2022-08-02
CN111709974A (en) 2020-09-25

Similar Documents

Publication Publication Date Title
WO2021259055A1 (en) Human body tracking method and device based on rgb-d image
Cohen et al. Detecting and tracking moving objects for video surveillance
US10664706B2 (en) System and method for detecting, tracking, and classifying objects
Huitl et al. TUMindoor: An extensive image and point cloud dataset for visual indoor localization and mapping
CN105405154B (en) Target object tracking based on color-structure feature
Ermis et al. Activity based matching in distributed camera networks
US20120209514A1 (en) Change invariant scene recognition by an agent
CN101344965A (en) Tracking system based on binocular camera shooting
US11348276B2 (en) Mobile robot control method
Zhang et al. A fast and robust people counting method in video surveillance
Jiao et al. Anatomy of a multicamera video surveillance system
Prokaj et al. Using 3d scene structure to improve tracking
KR101456172B1 (en) Localization of a mobile robot device, method and mobile robot
Liu et al. Multi-camera vehicle tracking based on occlusion-aware and inter-vehicle information
CN113989761A (en) Object tracking method and device, electronic equipment and storage medium
Amer Voting-based simultaneous tracking of multiple video objects
Liu et al. A person-following method based on monocular camera for quadruped robots
Susarla et al. Human weapon-activity recognition in surveillance videos using structural-RNN
LaLonde et al. Fully convolutional deep neural networks for persistent multi-frame multi-object detection in wide area aerial videos
CN115731287A (en) Moving target retrieval method based on set and topological space
Peng et al. Accurate visual-inertial slam by feature re-identification
Vokhmintcev et al. Real-time visual loop-closure detection using fused iterative close point algorithm and extended Kalman filter
Vasuhi et al. Object detection and tracking in secured area with wireless and multimedia sensor network
Saedan et al. Omnidirectional image matching for vision-based robot localization
Zhou et al. An anti-occlusion tracking system for UAV imagery based on Discriminative Scale Space Tracker and Optical Flow

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21830221

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21830221

Country of ref document: EP

Kind code of ref document: A1
