CN112767442A - Pedestrian three-dimensional detection tracking method and system based on top view angle - Google Patents

Pedestrian three-dimensional detection tracking method and system based on top view angle

Info

Publication number
CN112767442A
Authority
CN
China
Prior art keywords
dimensional
pedestrian
height
map
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110064121.2A
Other languages
Chinese (zh)
Other versions
CN112767442B (en)
Inventor
郑慧诚 (Zheng Huicheng)
苏志荣 (Su Zhirong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University filed Critical Sun Yat Sen University
Priority to CN202110064121.2A
Publication of CN112767442A
Application granted
Publication of CN112767442B
Legal status: Active

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/10 Segmentation; Edge detection
    • G06T7/13 Edge detection
    • G06T7/194 Segmentation; Edge detection involving foreground-background segmentation
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/50 Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
    • G06V10/56 Extraction of image or video features relating to colour
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10016 Video; Image sequence
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a pedestrian three-dimensional detection and tracking method and system based on a top view angle, wherein the method comprises the following steps: performing stereo correction and stereo matching on the left and right images acquired by a binocular camera; filling the holes in the resulting depth map and converting it into a height map; modeling the height map to obtain a two-dimensional height foreground map; projecting the foreground map into a three-dimensional point cloud, converting it into a two-dimensional plane point cloud projection map under the top view angle, and detecting the three-dimensional coordinate points of pedestrian heads; mapping the head coordinate points back to the two-dimensional image coordinate system and locating a bounding box for each pedestrian; and performing inter-frame pedestrian matching and tracking by combining the head coordinate points with the located bounding boxes. The invention effectively overcomes missed detections caused by occlusion between pedestrians and improves the pedestrian detection recall rate, and can be widely applied in the field of pedestrian detection.

Description

Pedestrian three-dimensional detection and tracking method and system based on top view angle
Technical Field
The invention relates to the field of video image processing, in particular to a pedestrian three-dimensional detection and tracking method and system based on a top view angle.
Background
Pedestrian detection is a popular research direction in computer vision and is widely applied in daily life. According to the position and viewing angle of the camera, the pedestrian detection problem can be divided into top-view pedestrian detection and oblique-view pedestrian detection. Most current mainstream pedestrian detection research addresses the more flexible oblique view, whose drawback is that pedestrians severely occlude and adhere to one another and are therefore easily missed. Pedestrian detection from a top view angle is a special application scenario of the pedestrian detection task, but little research has so far been devoted to it.
Disclosure of Invention
In order to solve the above technical problems, the invention aims to provide a pedestrian three-dimensional detection and tracking method based on a top view angle, which effectively overcomes missed detections caused by occlusion between pedestrians and improves the pedestrian detection recall rate.
The first technical scheme adopted by the invention is as follows: a pedestrian three-dimensional detection and tracking method based on a top view angle, comprising the following steps:
performing stereo correction and stereo matching processing on left and right images acquired by a binocular camera to obtain a depth map;
filling holes in the depth map, and converting the depth map to obtain a height map;
modeling the height map based on an MOG2 background modeling method to obtain a two-dimensional height foreground map;
Specifically, the number of frames used to model the background is set to 500, the learning rate to 0.002, the variance threshold to 16, and the number of Gaussian components to 5, as in the configuration sketch below.
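A minimal sketch of this configuration follows, assuming OpenCV's MOG2 implementation (whose parameter names match those listed above) and a height map quantized to an 8-bit single-channel image; the function name is ours.

```python
import cv2

# Sketch of the MOG2 background modeling step; parameter values follow the
# embodiment: 500-frame history, variance threshold 16, 5 Gaussian
# components, learning rate 0.002.
mog2 = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                          detectShadows=False)
mog2.setNMixtures(5)

def height_foreground(height_map_u8):
    """Return the two-dimensional height foreground map for one frame."""
    fg_mask = mog2.apply(height_map_u8, learningRate=0.002)
    # Keep the height values only at pixels classified as foreground.
    return cv2.bitwise_and(height_map_u8, height_map_u8, mask=fg_mask)
```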
Projecting the two-dimensional height foreground image into a three-dimensional height foreground point cloud image, converting the three-dimensional height foreground image into a two-dimensional plane point cloud projection image under a top view angle, and detecting to obtain a three-dimensional coordinate point of the head of the pedestrian;
mapping the three-dimensional coordinate points of the heads of the pedestrians back to a two-dimensional image coordinate system, and positioning a boundary frame of each pedestrian in the two-dimensional image by combining the three-dimensional coordinate points of the heads;
and performing interframe pedestrian matching tracking by combining the three-dimensional coordinate point of the head of the pedestrian and the positioned pedestrian boundary frame and utilizing the interframe distance of the central point of the pedestrian and the color histogram characteristics in the boundary frame.
Further, before the step of performing stereo correction and stereo matching processing on the left and right images acquired by the binocular camera to obtain the depth map, the method further includes:
and carrying out three-dimensional calibration on the binocular camera to obtain internal and external parameters and a perspective transformation matrix of the camera.
Specifically, a perspective transformation matrix Q of 4 × 4 can be calculated by the stereo calibration step.
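A minimal sketch of this calibration step, assuming OpenCV, chessboard correspondences collected beforehand, and per-lens intrinsics obtained from prior monocular calibration; the function name and pipeline split are assumptions.

```python
import cv2

def stereo_params(objpoints, imgpoints_l, imgpoints_r, K1, D1, K2, D2, size):
    """Estimate the extrinsics R, T and the 4x4 perspective transformation
    matrix Q for a binocular camera.

    objpoints/imgpoints_* are chessboard correspondences; K*, D* are the
    per-lens intrinsics, held fixed during the stereo optimization.
    """
    ret, K1, D1, K2, D2, R, T, E, F = cv2.stereoCalibrate(
        objpoints, imgpoints_l, imgpoints_r, K1, D1, K2, D2, size,
        flags=cv2.CALIB_FIX_INTRINSIC)
    # Rectification returns, among others, the reprojection matrix Q that
    # later maps (x, y, disparity, 1) pixels to 3D world coordinates.
    R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(K1, D1, K2, D2,
                                                      size, R, T)
    return R1, R2, P1, P2, Q
```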
Further, the step of performing stereo correction and stereo matching processing on the left and right images acquired by the binocular camera to obtain a depth map specifically includes:
acquiring images based on a binocular camera and performing stereo correction on left and right images acquired by the binocular camera to obtain corrected images;
and carrying out stereo matching on the corrected image based on an SGBM algorithm to obtain a depth map.
Further, the step of filling the holes in the depth map and converting the depth map into the height map specifically includes:
selecting reliable depth points near the cavity as reference values to fill the cavity to obtain a depth map after filling;
and converting the filled depth map into a height map according to the installation height of the binocular camera.
Further, the perspective transformation matrix Q has the following expression:

$$Q = \begin{bmatrix} 1 & 0 & 0 & -c_x \\ 0 & 1 & 0 & -c_y \\ 0 & 0 & 0 & f \\ 0 & 0 & -1/T & 0 \end{bmatrix}$$

In the above formula, $c_x$ and $c_y$ denote the principal point coordinates of the left and right lenses of the binocular camera (coincident after stereo rectification), $f$ is the normalized focal length, and $T$ is the baseline distance between the optical centers of the left and right lenses. Applying Q to the homogeneous vector $(x, y, d, 1)^T$ formed by a pixel coordinate and its disparity $d$ yields $(X, Y, Z, W)^T$; dividing the first three components by $W$ gives the physical world coordinates, so each pixel coordinate point in the two-dimensional height foreground map can be projected into the three-dimensional height foreground point cloud map.
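A minimal sketch of this projection, assuming OpenCV and that the SGBM disparity map from the matching step is kept alongside the foreground mask; the function name and data layout are ours.

```python
import cv2
import numpy as np

def foreground_point_cloud(disparity_fixed, fg_mask, Q):
    """Project foreground pixels into the 3D height foreground point cloud
    and build the 2D-pixel -> 3D-world mapping-relation table.

    disparity_fixed is the raw SGBM output (fixed point, scaled by 16);
    fg_mask is the two-dimensional height foreground map (nonzero = FG).
    """
    disparity = disparity_fixed.astype(np.float32) / 16.0
    # reprojectImageTo3D multiplies every (x, y, d, 1) by Q and divides by
    # the homogeneous coordinate, yielding an HxWx3 array of (X, Y, Z).
    points_3d = cv2.reprojectImageTo3D(disparity, Q)
    ys, xs = np.nonzero(fg_mask)
    cloud = points_3d[ys, xs]                      # Nx3 foreground points
    mapping = {(int(x), int(y)): tuple(p)          # pixel -> world table
               for x, y, p in zip(xs, ys, cloud)}
    return cloud, mapping
```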
Further, the step of projecting the two-dimensional height foreground map into the three-dimensional height foreground point cloud map and converting the two-dimensional height foreground map into a two-dimensional plane point cloud projection map under a top view angle, and detecting to obtain the three-dimensional coordinate point of the head of the pedestrian specifically comprises:
projecting the two-dimensional height foreground map to form a three-dimensional height foreground point cloud map based on a perspective transformation matrix, and establishing a mapping relation table of two-dimensional pixel coordinate points and three-dimensional physical world coordinate points;
and converting the three-dimensional height foreground point cloud image into a two-dimensional plane point cloud projection image under a top view angle, and detecting and positioning the head of the pedestrian based on the two-dimensional plane point cloud projection image under the top view angle to obtain the three-dimensional coordinate point of the head of the pedestrian.
Specifically, the eight-neighborhood height information of each point is utilized to detect and locate the head of the pedestrian.
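One way to realize this eight-neighborhood test is sketched below, assuming the top-view projection has been binned into a 2D grid holding the maximum foreground height per ground-plane cell; the height threshold is an assumed value, not the embodiment's.

```python
import cv2
import numpy as np

def detect_heads(top_height, min_head_height=1.0):
    """Locate pedestrian heads as 8-neighborhood height maxima.

    top_height: float32 grid, foreground height (metres) per top-view cell.
    min_head_height: assumed threshold rejecting low, non-pedestrian peaks.
    """
    # Dilation with a 3x3 kernel gives each cell the maximum over its
    # 8-neighborhood; a head cell is one that attains that maximum itself.
    neigh_max = cv2.dilate(top_height, np.ones((3, 3), np.uint8))
    peaks = (top_height >= neigh_max) & (top_height > min_head_height)
    ys, xs = np.nonzero(peaks)
    # (x, y, height) triples, i.e. pedestrian head 3D coordinate points.
    return [(x, y, float(top_height[y, x])) for x, y in zip(xs, ys)]
```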
Further, the step of mapping the three-dimensional coordinate point of the head of the pedestrian back to the coordinate system of the two-dimensional image and positioning the bounding box of each pedestrian in the two-dimensional image by combining the three-dimensional coordinate point of the head specifically includes:
traversing the whole image, and calculating each pixel point P through a perspective transformation matrix Q to obtain a three-dimensional physical world coordinate point W of each pixel point P;
assuming the set of pedestrian head points located in the top-view two-dimensional point cloud projection map is $S = \{H_1, H_2, \ldots, H_k\}$, where $H_1, H_2, \ldots, H_k$ denote the three-dimensional head coordinates of k pedestrians;
calculating the distance between the physical point W and each head point $H_i$ ($i \in [1, k]$) in the plane formed by the x and y axes; for a point W whose distance to the nearest head $H_i$ is smaller than a set distance threshold, marking its two-dimensional pixel point P as belonging to the bounding box of the i-th pedestrian, thereby obtaining the pixel point set of each pedestrian's bounding box;
and calculating the minimum area circumscribed rectangle of the pixel point set of each pedestrian boundary frame to obtain the boundary frame of each pedestrian.
The second technical scheme adopted by the invention is as follows: a pedestrian three-dimensional detection and tracking system based on top view comprises:
the depth map module is used for performing stereo correction and stereo matching processing on the left image and the right image acquired by the binocular camera to obtain a depth map;
the height map module is used for filling holes in the depth map and converting the holes to obtain a height map;
the two-dimensional height foreground map module is used for modeling the height map based on a MOG2 background modeling method to obtain a two-dimensional height foreground map;
the head three-dimensional coordinate point module is used for projecting the two-dimensional height foreground image into the three-dimensional height foreground point cloud image, converting the two-dimensional height foreground image into a two-dimensional plane point cloud projection image under a top view angle, and detecting to obtain a pedestrian head three-dimensional coordinate point;
the boundary frame module is used for mapping the three-dimensional coordinate points of the heads of the pedestrians back to a two-dimensional image coordinate system and positioning the boundary frame of each pedestrian in the two-dimensional image by combining the coordinates of the head points;
and the matching tracking module is used for combining the three-dimensional coordinate point of the head of the pedestrian and the positioned pedestrian boundary frame and carrying out interframe pedestrian matching tracking by utilizing the interframe distance of the central point of the pedestrian and the color histogram characteristics in the boundary frame.
The method and the system have the following beneficial effects: the invention projects the filtered two-dimensional pixel points of the height foreground map into a three-dimensional point cloud map and constructs a top-view two-dimensional point cloud height projection map in which to detect and locate pedestrian heads; compared with pedestrian detection with a traditional monocular camera, this better overcomes missed detections caused by occlusion between pedestrians and improves the pedestrian detection recall rate.
Drawings
FIG. 1 is a flow chart of steps of a pedestrian three-dimensional detection and tracking method based on top view according to the invention;
fig. 2 is a structural block diagram of a pedestrian three-dimensional detection and tracking system based on a top view angle.
Detailed Description
The invention is described in further detail below with reference to the figures and the specific embodiments. The step numbers in the following embodiments are provided only for convenience of illustration, the order between the steps is not limited at all, and the execution order of each step in the embodiments can be adapted according to the understanding of those skilled in the art.
Referring to fig. 1, the invention provides a pedestrian three-dimensional detection and tracking method based on a top view angle, which comprises the following steps:
s1, performing stereo correction and stereo matching processing on the left image and the right image acquired by the binocular camera to obtain a depth map;
s2, filling holes in the depth map, and converting the holes to obtain a height map;
s3, modeling the height map based on an MOG2 background modeling method to obtain a two-dimensional height foreground map;
s4, projecting the two-dimensional height foreground image into the three-dimensional height foreground point cloud image, converting the two-dimensional height foreground image into a two-dimensional plane point cloud projection image under a top view angle, and detecting to obtain a three-dimensional coordinate point of the head of the pedestrian;
s5, mapping the three-dimensional coordinate points of the heads of the pedestrians back to a two-dimensional image coordinate system, and positioning a boundary frame of each pedestrian in the two-dimensional image by combining the three-dimensional coordinate points of the heads;
S6, combining the pedestrian head three-dimensional coordinate points with the located pedestrian bounding boxes, and performing inter-frame pedestrian matching and tracking by using the inter-frame distance of the pedestrian center points and the color histogram features inside the bounding boxes; a matching sketch is given after these steps.
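For step S6, a minimal sketch follows; the track/detection record layout, the thresholds, and the greedy nearest-cost assignment are all assumptions, since the embodiment names only the two cues (center-point distance and in-box color histogram).

```python
import cv2
import numpy as np

def match_pedestrians(tracks, detections, frame,
                      max_center_dist=0.5, max_hist_dist=0.5):
    """Greedy inter-frame matching using center distance + color histogram.

    tracks: dicts with 'center' (3D head point) and 'hist' (stored from the
    previous frame); detections: dicts with 'center' and 'box' (x, y, w, h).
    """
    def box_hist(box):
        x, y, w, h = box
        roi = frame[y:y + h, x:x + w]
        hist = cv2.calcHist([roi], [0, 1, 2], None, [8, 8, 8],
                            [0, 256, 0, 256, 0, 256])
        return cv2.normalize(hist, hist).flatten()

    det_hists = [box_hist(d['box']) for d in detections]
    matches, unmatched = [], list(range(len(detections)))
    for ti, tr in enumerate(tracks):
        best, best_cost = None, np.inf
        for di in unmatched:
            d_center = np.linalg.norm(np.subtract(tr['center'],
                                                  detections[di]['center']))
            d_hist = cv2.compareHist(tr['hist'], det_hists[di],
                                     cv2.HISTCMP_BHATTACHARYYA)
            if (d_center < max_center_dist and d_hist < max_hist_dist
                    and d_center + d_hist < best_cost):
                best, best_cost = di, d_center + d_hist
        if best is not None:
            matches.append((ti, best))
            unmatched.remove(best)
    return matches, unmatched   # unmatched detections start new tracks
```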
As a further preferred embodiment of the method, before the step of performing stereo correction and stereo matching on the left and right images acquired by the binocular camera to obtain the depth map, the method further includes:
and carrying out three-dimensional calibration on the binocular camera to obtain internal and external parameters and a perspective transformation matrix of the camera.
Further as a preferred embodiment of the method, the step of performing stereo correction and stereo matching on the left and right images acquired by the binocular camera to obtain the depth map specifically includes:
acquiring images based on a binocular camera and performing stereo correction on left and right images acquired by the binocular camera to obtain corrected images;
Specifically, the left and right images acquired by the binocular camera are first scaled to a resolution of 320 × 240, and stereo correction is then performed using the camera's internal and external parameters to obtain the corrected binocular images.
Stereo matching is then carried out on the corrected images based on the SGBM algorithm to obtain a depth map, as in the sketch below.
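A sketch of this rectify-and-match step, assuming OpenCV, rectification maps precomputed for the 320 × 240 working resolution with cv2.initUndistortRectifyMap, and typical SGBM parameters (the specific values are assumptions, not the embodiment's).

```python
import cv2
import numpy as np

# Typical SGBM configuration for grayscale input (assumed values).
sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=5,
                             P1=8 * 5 ** 2, P2=32 * 5 ** 2,
                             uniquenessRatio=10, speckleWindowSize=100,
                             speckleRange=2)

def depth_map(left, right, maps_l, maps_r, f, T):
    """Rectify the scaled stereo pair, run SGBM, convert disparity to depth.

    maps_l/maps_r: (map1, map2) rectification maps for 320x240 images;
    f: rectified focal length in pixels; T: baseline in metres.
    """
    left = cv2.resize(left, (320, 240))
    right = cv2.resize(right, (320, 240))
    left_r = cv2.remap(left, maps_l[0], maps_l[1], cv2.INTER_LINEAR)
    right_r = cv2.remap(right, maps_r[0], maps_r[1], cv2.INTER_LINEAR)
    gray_l = cv2.cvtColor(left_r, cv2.COLOR_BGR2GRAY)
    gray_r = cv2.cvtColor(right_r, cv2.COLOR_BGR2GRAY)
    disp = sgbm.compute(gray_l, gray_r).astype(np.float32) / 16.0  # SGBM x16
    depth = np.zeros_like(disp)
    valid = disp > 0
    depth[valid] = f * T / disp[valid]   # classic stereo: Z = f * T / d
    return depth, disp
```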
Further, as a preferred embodiment of the method, the step of filling the holes in the depth map and converting the depth map into the height map specifically includes:
selecting reliable depth points near the cavity as reference values to fill the cavity to obtain a depth map after filling;
and converting the filled depth map into a height map according to the installation height of the binocular camera.
Specifically, when filling the holes in the depth map, reliable depth points near each hole are selected as reference values. Holes of different areas are filled by filtering with mean filtering kernels of different sizes: small kernels for small holes and large kernels for large holes. Each depth value in the depth map is then subtracted from the camera mounting height to convert the depth map into a height map. A filling sketch is given below.
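A minimal sketch of this filling-and-conversion step; the area split and the two kernel sizes are assumed values, and zero depth is taken to mark a hole.

```python
import cv2
import numpy as np

def fill_holes(depth, small_area=50, small_k=5, large_k=15):
    """Fill zero-depth holes from the mean of nearby reliable depths,
    choosing the mean-filter kernel size by each hole's area."""
    depth = depth.astype(np.float32)
    valid = (depth > 0).astype(np.float32)
    holes = (depth == 0).astype(np.uint8)
    n, labels, stats, _ = cv2.connectedComponentsWithStats(holes)
    filled = depth.copy()
    for k in (small_k, large_k):
        # Valid-aware mean: average only over reliable (nonzero) neighbors.
        num = cv2.blur(depth * valid, (k, k))
        den = cv2.blur(valid, (k, k))
        mean_k = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
        for i in range(1, n):
            small = stats[i, cv2.CC_STAT_AREA] <= small_area
            if small == (k == small_k):          # small hole -> small kernel
                filled[labels == i] = mean_k[labels == i]
    return filled

def to_height_map(filled_depth, camera_height):
    # Height above ground = camera mounting height minus measured depth.
    return np.clip(camera_height - filled_depth, 0.0, None)
```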
Further as a preferred embodiment of the method, the perspective transformation matrix Q has the following expression:

$$Q = \begin{bmatrix} 1 & 0 & 0 & -c_x \\ 0 & 1 & 0 & -c_y \\ 0 & 0 & 0 & f \\ 0 & 0 & -1/T & 0 \end{bmatrix}$$

In the above formula, $c_x$ and $c_y$ denote the principal point coordinates of the left and right lenses of the binocular camera (coincident after stereo rectification), $f$ is the normalized focal length, and $T$ is the baseline distance between the optical centers of the left and right lenses.
Further, as a preferred embodiment of the method, the step of projecting the two-dimensional height foreground map into the three-dimensional height foreground point cloud map and converting the two-dimensional height foreground map into a two-dimensional plane point cloud projection map under a top view angle to detect and obtain the three-dimensional coordinate point of the head of the pedestrian specifically includes:
projecting the two-dimensional height foreground map to form a three-dimensional height foreground point cloud map based on a perspective transformation matrix, and establishing a mapping relation table of two-dimensional pixel coordinate points and three-dimensional physical world coordinate points;
and converting the three-dimensional height foreground point cloud image into a two-dimensional plane point cloud projection image under a top view angle, and detecting and positioning the head of the pedestrian based on the two-dimensional plane point cloud projection image under the top view angle to obtain the three-dimensional coordinate point of the head of the pedestrian.
Further, as a preferred embodiment of the method, the step of mapping the three-dimensional coordinate point of the head of the pedestrian back to the coordinate system of the two-dimensional image and positioning the bounding box of each pedestrian in the two-dimensional image by combining the three-dimensional coordinate point of the head specifically includes:
traversing the whole image, and calculating each pixel point P through a perspective transformation matrix Q to obtain a three-dimensional physical world coordinate point W of each pixel point P;
assuming the set of pedestrian head points located in the top-view two-dimensional point cloud projection map is $S = \{H_1, H_2, \ldots, H_k\}$, where $H_1, H_2, \ldots, H_k$ denote the three-dimensional head coordinates of k pedestrians;
calculating the distance between the physical point W and each head point $H_i$ ($i \in [1, k]$) in the plane formed by the x and y axes; for a point W whose distance to the nearest head $H_i$ is smaller than a set distance threshold, marking its two-dimensional pixel point P as belonging to the bounding box of the i-th pedestrian, thereby obtaining the pixel point set of each pedestrian's bounding box;
and calculating the minimum-area circumscribed rectangle of the pixel point set of each pedestrian's bounding box to obtain the bounding box of each pedestrian, as in the sketch below.
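A sketch of this final localization step, assuming each pedestrian's pixel set has been collected as above; cv2.minAreaRect computes exactly the minimum-area circumscribed rectangle.

```python
import cv2
import numpy as np

def pedestrian_boxes(pixel_sets):
    """Compute the minimum-area circumscribed rectangle per pedestrian.

    pixel_sets: dict mapping pedestrian index i -> list of (x, y) pixels
    marked as belonging to pedestrian i by the distance test above.
    """
    boxes = {}
    for i, pts in pixel_sets.items():
        pts = np.asarray(pts, dtype=np.float32)
        rect = cv2.minAreaRect(pts)        # ((cx, cy), (w, h), angle)
        boxes[i] = cv2.boxPoints(rect)     # the 4 corners of the bounding box
    return boxes
```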
As shown in fig. 2, a pedestrian three-dimensional detection and tracking system based on top view includes:
the depth map module is used for performing stereo correction and stereo matching processing on the left image and the right image acquired by the binocular camera to obtain a depth map;
the height map module is used for filling holes in the depth map and converting the holes to obtain a height map;
the two-dimensional height foreground map module is used for modeling the height map based on a MOG2 background modeling method to obtain a two-dimensional height foreground map;
the head three-dimensional coordinate point module is used for projecting the two-dimensional height foreground image into the three-dimensional height foreground point cloud image, converting the two-dimensional height foreground image into a two-dimensional plane point cloud projection image under a top view angle, and detecting to obtain a pedestrian head three-dimensional coordinate point;
the boundary frame module is used for mapping the three-dimensional coordinate points of the heads of the pedestrians back to a two-dimensional image coordinate system and positioning the boundary frame of each pedestrian in the two-dimensional image by combining the coordinates of the head points;
and the matching tracking module is used for combining the three-dimensional coordinate point of the head of the pedestrian and the positioned pedestrian boundary frame and carrying out interframe pedestrian matching tracking by utilizing the interframe distance of the central point of the pedestrian and the color histogram characteristics in the boundary frame.
The contents in the above method embodiments are all applicable to the present system embodiment, the functions specifically implemented by the present system embodiment are the same as those in the above method embodiment, and the beneficial effects achieved by the present system embodiment are also the same as those achieved by the above method embodiment.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (8)

1. A pedestrian three-dimensional detection and tracking method based on a top view angle, characterized by comprising the following steps:
performing stereo correction and stereo matching processing on left and right images acquired by a binocular camera to obtain a depth map;
filling holes in the depth map, and converting the depth map to obtain a height map;
modeling the height map based on an MOG2 background modeling method to obtain a two-dimensional height foreground map;
projecting the two-dimensional height foreground image into a three-dimensional height foreground point cloud image, converting the three-dimensional height foreground image into a two-dimensional plane point cloud projection image under a top view angle, and detecting to obtain a three-dimensional coordinate point of the head of the pedestrian;
mapping the three-dimensional coordinate points of the heads of the pedestrians back to a two-dimensional image coordinate system, and positioning a boundary frame of each pedestrian in the two-dimensional image by combining the three-dimensional coordinate points of the heads;
and performing interframe pedestrian matching tracking by combining the three-dimensional coordinate point of the head of the pedestrian and the positioned pedestrian boundary frame and utilizing the interframe distance of the central point of the pedestrian and the color histogram characteristics in the boundary frame.
2. The pedestrian three-dimensional detection and tracking method based on the top view angle of claim 1, wherein before the step of performing stereo correction and stereo matching on the left and right images acquired by the binocular camera to obtain the depth map, the method further comprises:
and carrying out three-dimensional calibration on the binocular camera to obtain internal and external parameters and a perspective transformation matrix of the camera.
3. The pedestrian three-dimensional detection and tracking method based on the top view angle of claim 2, wherein the step of performing stereo correction and stereo matching on the left and right images acquired by the binocular camera to obtain the depth map specifically comprises:
acquiring images based on a binocular camera and performing stereo correction on left and right images acquired by the binocular camera to obtain corrected images;
and carrying out stereo matching on the corrected image based on an SGBM algorithm to obtain a depth map.
4. The pedestrian three-dimensional detection and tracking method based on the top view angle as claimed in claim 3, wherein the step of filling the hole in the depth map and converting the depth map into the height map specifically comprises:
selecting reliable depth points near the cavity as reference values to fill the cavity to obtain a depth map after filling;
and converting the filled depth map into a height map according to the installation height of the binocular camera.
5. The pedestrian three-dimensional detection and tracking method based on the top view angle as claimed in claim 4, wherein the perspective transformation matrix Q has the following expression:

$$Q = \begin{bmatrix} 1 & 0 & 0 & -c_x \\ 0 & 1 & 0 & -c_y \\ 0 & 0 & 0 & f \\ 0 & 0 & -1/T & 0 \end{bmatrix}$$

in the above formula, $c_x$ and $c_y$ denote the principal point coordinates of the left and right lenses of the binocular camera, $f$ is the normalized focal length, and $T$ is the baseline distance between the optical centers of the left and right lenses.
6. The pedestrian three-dimensional detection and tracking method based on the top view angle of claim 5, wherein the step of projecting the two-dimensional height foreground map into the three-dimensional height foreground point cloud map and converting the three-dimensional height foreground map into the two-dimensional plane point cloud projection map under the top view angle to detect the three-dimensional coordinate point of the head of the pedestrian specifically comprises:
projecting the two-dimensional height foreground map to form a three-dimensional height foreground point cloud map based on a perspective transformation matrix, and establishing a mapping relation table of two-dimensional pixel coordinate points and three-dimensional physical world coordinate points;
and converting the three-dimensional height foreground point cloud image into a two-dimensional plane point cloud projection image under a top view angle, and detecting and positioning the head of the pedestrian based on the two-dimensional plane point cloud projection image under the top view angle to obtain the three-dimensional coordinate point of the head of the pedestrian.
7. The pedestrian three-dimensional detection and tracking method based on the top view angle as claimed in claim 6, wherein the step of mapping the three-dimensional coordinate point of the head of the pedestrian back to the coordinate system of the two-dimensional image and locating the bounding box of each pedestrian in the two-dimensional image in combination with the three-dimensional coordinate point of the head specifically comprises:
traversing the whole image, and calculating each pixel point P through a perspective transformation matrix Q to obtain a three-dimensional physical world coordinate point W of each pixel point P;
assuming the set of pedestrian head points located in the top-view two-dimensional point cloud projection map is $S = \{H_1, H_2, \ldots, H_k\}$, where $H_1, H_2, \ldots, H_k$ denote the three-dimensional head coordinates of k pedestrians;
calculating the distance between the physical point W and each head point $H_i$ ($i \in [1, k]$) in the plane formed by the x and y axes; for a point W whose distance to the nearest head $H_i$ is smaller than a set distance threshold, marking its two-dimensional pixel point P as belonging to the bounding box of the i-th pedestrian, thereby obtaining the pixel point set of each pedestrian's bounding box;
and calculating the minimum area circumscribed rectangle of the pixel point set of each pedestrian boundary frame to obtain the boundary frame of each pedestrian.
8. A pedestrian three-dimensional detection and tracking system based on a top view angle, characterized by comprising:
the depth map module is used for performing stereo correction and stereo matching processing on the left image and the right image acquired by the binocular camera to obtain a depth map;
the height map module is used for filling holes in the depth map and converting the holes to obtain a height map;
the two-dimensional height foreground map module is used for modeling the height map based on a MOG2 background modeling method to obtain a two-dimensional height foreground map;
the head three-dimensional coordinate point module is used for projecting the two-dimensional height foreground image into the three-dimensional height foreground point cloud image, converting the two-dimensional height foreground image into a two-dimensional plane point cloud projection image under a top view angle, and detecting to obtain a pedestrian head three-dimensional coordinate point;
the boundary frame module is used for mapping the three-dimensional coordinate points of the heads of the pedestrians back to a two-dimensional image coordinate system and positioning the boundary frame of each pedestrian in the two-dimensional image by combining the coordinates of the head points;
and the matching tracking module is used for combining the three-dimensional coordinate point of the head of the pedestrian and the positioned pedestrian boundary frame and carrying out interframe pedestrian matching tracking by utilizing the interframe distance of the central point of the pedestrian and the color histogram characteristics in the boundary frame.
CN202110064121.2A 2021-01-18 2021-01-18 Pedestrian three-dimensional detection tracking method and system based on top view angle Active CN112767442B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110064121.2A CN112767442B (en) 2021-01-18 2021-01-18 Pedestrian three-dimensional detection tracking method and system based on top view angle

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110064121.2A CN112767442B (en) 2021-01-18 2021-01-18 Pedestrian three-dimensional detection tracking method and system based on top view angle

Publications (2)

Publication Number Publication Date
CN112767442A 2021-05-07
CN112767442B CN112767442B (en) 2023-07-21

Family

ID=75702924

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110064121.2A Active CN112767442B (en) 2021-01-18 2021-01-18 Pedestrian three-dimensional detection tracking method and system based on top view angle

Country Status (1)

Country Link
CN (1) CN112767442B (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101894375A (en) * 2009-05-21 2010-11-24 富士胶片株式会社 Person tracking method and person tracking apparatus
CN102622769A (en) * 2012-03-19 2012-08-01 厦门大学 Multi-target tracking method by taking depth as leading clue under dynamic scene
CN102646309A (en) * 2012-05-18 2012-08-22 成都百威讯科技有限责任公司 Intelligent video perimeter rail system and control method thereof
CN103868460A (en) * 2014-03-13 2014-06-18 桂林电子科技大学 Parallax optimization algorithm-based binocular stereo vision automatic measurement method
CN108230351A (en) * 2016-12-15 2018-06-29 上海杰轩智能科技有限公司 Sales counter evaluation method and system based on binocular stereo vision pedestrian detection
CN109325963A (en) * 2018-08-07 2019-02-12 长安大学 A kind of bus passenger three-dimensional track classification method based on SVM
CN111223053A (en) * 2019-11-18 2020-06-02 北京邮电大学 Data enhancement method based on depth image

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116402857A (en) * 2023-04-14 2023-07-07 北京天睿空间科技股份有限公司 Moving target cross-lens tracking method based on three-dimensional calibration
CN116402857B (en) * 2023-04-14 2023-11-07 北京天睿空间科技股份有限公司 Moving target cross-lens tracking method based on three-dimensional calibration

Also Published As

Publication number Publication date
CN112767442B (en) 2023-07-21

Similar Documents

Publication Publication Date Title
CN110569704B (en) Multi-strategy self-adaptive lane line detection method based on stereoscopic vision
CN107292965B (en) Virtual and real shielding processing method based on depth image data stream
JP3868876B2 (en) Obstacle detection apparatus and method
JP6955783B2 (en) Information processing methods, equipment, cloud processing devices and computer program products
CN103198488B (en) PTZ surveillance camera realtime posture rapid estimation
CN108470356B (en) Target object rapid ranging method based on binocular vision
CN109472820B (en) Monocular RGB-D camera real-time face reconstruction method and device
CN112801074B (en) Depth map estimation method based on traffic camera
US20060050788A1 (en) Method and device for computer-aided motion estimation
CN109889799B (en) Monocular structure light depth perception method and device based on RGBIR camera
CN110099268B (en) Blind area perspective display method with natural color matching and natural display area fusion
CN112184793B (en) Depth data processing method and device and readable storage medium
US20240202975A1 (en) Data processing
CN110909571B (en) High-precision face recognition space positioning method
CN112767442A (en) Pedestrian three-dimensional detection tracking method and system based on top view angle
Petrovai et al. Obstacle detection using stereovision for Android-based mobile devices
CN112509110A (en) Automatic image data set acquisition and labeling framework for land confrontation intelligent agent
CN113723432B (en) Intelligent identification and positioning tracking method and system based on deep learning
CN116597488A (en) Face recognition method based on Kinect database
TW201516965A (en) Method of detecting multiple moving objects
CN115497073A (en) Real-time obstacle camera detection method based on fusion of vehicle-mounted camera and laser radar
CN115147868A (en) Human body detection method of passenger flow camera, device and storage medium
KR101668649B1 (en) Surrounding environment modeling method and apparatus performing the same
CN111630569B (en) Binocular matching method, visual imaging device and device with storage function
CN112229381A (en) Smart phone ranging method using arm length and camera

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant