CN116740160A - Millisecond level multi-plane real-time extraction method and device in complex traffic scene - Google Patents


Info

Publication number: CN116740160A
Application number: CN202310652717.3A
Authority: CN (China)
Prior art keywords: point, points, neighborhood, depth, seed
Legal status: Pending
Application number: CN202310652717.3A
Other languages: Chinese (zh)
Inventors: 张新钰, 熊一瑾, 郭世纯, 侯翊良, 肖哲丰, 王超
Current Assignee: Tsinghua University
Original Assignee: Tsinghua University
Application filed by Tsinghua University
Priority to CN202310652717.3A
Publication of CN116740160A
Legal status: Pending


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/50: Depth or shape recovery
    • G06T7/55: Depth or shape recovery from multiple images
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Image Processing (AREA)

Abstract

The application provides a millisecond-level multi-plane real-time extraction method and device for complex traffic scenes, relating to the technical field of three-dimensional reconstruction. The method comprises the following steps: for each pixel point of the depth image, search within a preset range, in the up, down, left and right directions respectively, for pixel points whose depth values are not null, taking them as neighborhood points; if a neighborhood point can be found in all four directions, the pixel point is a seed point. For each seed point, judge whether it and its four neighborhood points satisfy the following conditions: the position points of the seed point and its left and right neighborhood points in the camera coordinate system lie on a straight line, and the position points of the seed point and its upper and lower neighborhood points in the camera coordinate system lie on a straight line. If so, calculate the plane equation of the facet on which the spatial position point corresponding to the seed point lies. Finally, cluster the facets according to the parameters of their plane equations to obtain the plane equation of each main plane in the depth image. The method improves both the speed and the accuracy of extracting multiple planes from a depth image.

Description

Millisecond level multi-plane real-time extraction method and device in complex traffic scene
Technical Field
The application relates to the technical field of three-dimensional reconstruction, in particular to a millisecond level multi-plane real-time extraction method and device in a complex traffic scene.
Background
The plane, as a third type of advanced feature following feature points and feature lines, can be used for state estimation and feature matching; it is both robust and general, and is widely used in three-dimensional reconstruction problems.
Lidar, commonly adopted for early data acquisition, suffers from drawbacks such as high cost, which limited its adoption; the appearance of depth cameras alleviated this problem. Various plane detection algorithms for point cloud data have thus emerged, such as RANSAC-based three-point plane detection and region-growing-based algorithms. However, given the real-time requirements of autonomous driving, these algorithms running on a CPU struggle to meet real-time demands.
With the rapid development of GPUs, parallel computing can greatly improve computing speed, but some traditional plane detection algorithms are not suitable for parallel computation, so their time overhead remains difficult to reduce.
Disclosure of Invention
In view of the above, the present application provides a millisecond level multi-plane real-time extraction method and apparatus in a complex traffic scene to solve the above technical problems.
In a first aspect, an embodiment of the present application provides a millisecond level multi-plane real-time extraction method in a complex traffic scene, including:
acquiring a depth image acquired by a depth camera;
for each pixel point of the depth image, searching within a preset search range, in the up, down, left and right directions respectively, for pixel points whose depth values are not null, taking them as neighborhood points; if a neighborhood point can be found in all four directions, the pixel point is a seed point;
for each seed point, judging whether the seed points and the four neighborhood points meet the following conditions: the seed point and the position points of the left neighborhood point and the right neighborhood point under the camera coordinate system are on a straight line, and the seed point and the position points of the upper neighborhood point and the lower neighborhood point under the camera coordinate system are on a straight line; if so, calculating a plane equation of a facet where the spatial position point corresponding to the seed point is located according to the spatial position coordinates of the seed point and the four neighborhood points;
and clustering the facets according to the parameters of the plane equations of the facets of the spatial position points corresponding to each seed point, to obtain the plane equations of all main planes in the depth image.
Further, for each pixel point of the depth image, searching within a preset search range in the up, down, left and right directions for pixel points whose depth values are not null, taking them as neighborhood points, where the pixel point is a seed point if a neighborhood point can be found in all four directions, includes:
taking each pixel point p of the depth image as the center point, searching leftward among the pixels between the minimum search distance search_min and the maximum search distance search_max for the first pixel point p1 whose depth value is not null; pixel point p1 is the neighborhood point of the center point in the left direction;
taking pixel point p as the center point, searching upward between search_min and search_max for the first pixel point p2 whose depth value is not null; pixel point p2 is the neighborhood point of the center point in the upward direction;
taking pixel point p as the center point, searching rightward between search_min and search_max for the first pixel point p3 whose depth value is not null; pixel point p3 is the neighborhood point of the center point in the rightward direction;
taking pixel point p as the center point, searching downward between search_min and search_max for the first pixel point p4 whose depth value is not null; pixel point p4 is the neighborhood point of the center point in the downward direction;
if pixel point p can find all four neighborhood points p1, p2, p3, p4, then pixel point p is a seed point.
Further, for each seed point, judging whether it and its four neighborhood points satisfy the conditions that the position points of the seed point and its left and right neighborhood points in the camera coordinate system lie on a straight line, and that the position points of the seed point and its upper and lower neighborhood points in the camera coordinate system lie on a straight line, includes:
for seed point p0 and its neighborhood points p1 and p3 in the left and right directions, calculating the constraint horizontal in the horizontal direction:
horizontal = left * left_depth * (d0 - right_depth) + right * right_depth * (d0 - left_depth)
where d0 is the depth value of seed point p0; left is the distance between neighborhood point p1 and seed point p0; left_depth is the depth value of neighborhood point p1; right is the distance between neighborhood point p3 and seed point p0; right_depth is the depth value of neighborhood point p3;
if horizontal < alpha, the position points of seed point p0 and its left and right neighborhood points p1 and p3 in the camera coordinate system lie on a straight line; alpha is a preset threshold;
for seed point p0 and its neighborhood points p2 and p4 in the upward and downward directions, calculating the constraint vertical in the vertical direction:
vertical = up * up_depth * (d0 - down_depth) + down * down_depth * (d0 - up_depth)
where up is the distance between neighborhood point p2 and seed point p0; up_depth is the depth value of neighborhood point p2; down is the distance between neighborhood point p4 and seed point p0; down_depth is the depth value of neighborhood point p4;
if vertical < alpha, the position points of seed point p0 and its upper and lower neighborhood points p2 and p4 in the camera coordinate system lie on a straight line.
Further, calculating, according to the spatial position coordinates of the seed point and its four neighborhood points, the plane equation of the facet on which the spatial position point corresponding to the seed point lies, includes:
converting the coordinates of pixel points p0, p1, p2, p3, p4 from position points in the camera coordinate system into spatial position point coordinates in the geocentric rectangular coordinate system, obtaining the corresponding spatial position points P0, P1, P2, P3, P4;
calculating the normal vector of the plane equation of the facet on which the spatial position point P0 lies, where A, B and C are the three elements of the normal vector;
calculating the distance between the plane and the origin of the camera coordinate system to obtain the depth value d;
the parameters of the plane equation of the facet on which the spatial position point P0 lies include A, B, C and d.
Further, clustering the facets according to the parameters of the plane equations of the facets of the spatial position points corresponding to each seed point, to obtain the plane equations of the main planes in the depth image, includes:
splicing the three elements of the normal vector of the facet of each seed point, taking the spliced value as the abscissa and the depth value as the ordinate to obtain a coordinate plane, and plotting all the points on this plane;
gridding the coordinate plane at preset intervals, dividing it into a plurality of grids;
counting the number of points in each grid, and putting the grids whose point count is larger than a preset threshold into a set to be clustered;
randomly selecting a grid from the set to be clustered and clustering the surrounding valid grids to generate a cluster, where a valid grid is one whose point count is larger than the preset threshold; thereby obtaining a plurality of clusters;
acquiring the point counts m1, m2, ..., mn of all grids in each cluster, where n is the number of grids, and acquiring the parameters A1, B1, C1, A2, B2, C2, ..., An, Bn, Cn of the plane equations of the facets of the grid center points, together with the depth values d1, d2, ..., dn;
calculating the normal vector (A0, B0, C0) and the depth value d0 of the cluster from these grid parameters and point counts;
thereby obtaining the plane equation of each main plane in the depth camera data.
In a second aspect, an embodiment of the present application provides a millisecond level multi-plane real-time extraction device in a complex traffic scene, including:
the acquisition unit is used for acquiring the depth image acquired by the depth camera;
the neighborhood point acquisition unit is used for searching, for each pixel point of the depth image, within a preset search range in the up, down, left and right directions for pixel points whose depth values are not null, taking them as neighborhood points, where the pixel point is a seed point if a neighborhood point can be found in all four directions;
the calculating unit is used for judging whether each seed point and four neighborhood points meet the following conditions: the seed point and the position points of the left neighborhood point and the right neighborhood point under the camera coordinate system are on a straight line, and the seed point and the position points of the upper neighborhood point and the lower neighborhood point under the camera coordinate system are on a straight line; if so, calculating a plane equation of a facet where the spatial position point corresponding to the seed point is located according to the spatial position coordinates of the seed point and the four neighborhood points;
and the clustering unit is used for clustering the facets according to the parameters of the plane equations of the facets of the spatial position points corresponding to each seed point, to obtain the plane equation of each main plane in the depth image.
In a third aspect, an embodiment of the present application provides an electronic device, including: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of the embodiments of the application when executing the computer program.
In a fourth aspect, embodiments of the present application provide a computer readable storage medium storing computer instructions that, when executed by a processor, perform a method of embodiments of the present application.
The application improves the speed and precision of extracting a plurality of planes of the depth image.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present application, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of a millisecond level multi-plane real-time extraction method in a complex traffic scene provided by an embodiment of the application;
FIG. 2 is a schematic diagram of a seed point and four neighboring points according to an embodiment of the present application;
FIG. 3 is a schematic view of a facet provided by an embodiment of the present application;
fig. 4 is a functional block diagram of a millisecond level multi-plane real-time extraction device in a complex traffic scene according to an embodiment of the present application;
fig. 5 is a functional block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. The components of the embodiments of the present application generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the application, as presented in the figures, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
First, the design idea of the embodiment of the present application will be briefly described.
Currently, cameras are widely used in many scenarios involving autonomous vehicles and robots, such as autonomous driving, object classification, object tracking, and SLAM. The images captured by a camera provide a rich representation of the environment but cannot express distance information. Lidar is commonly used to measure range, but its high cost and other drawbacks have limited its adoption. Using lidar and a camera together requires calibration, and once the relative position between the two sensors changes, the perception result is no longer reliable. The advent of depth cameras ameliorates these problems; most depth cameras today are designed as RGB-D cameras, which collect RGB information and distance information simultaneously, and their integrated design avoids the problem of an uncertain positional relationship between sensors. However, because the sub-processes of past plane extraction pipelines depend heavily on one another, the plane computation cannot be parallelized, resulting in a significant loss of time.
Existing methods for extracting planes from depth data have limitations. For example, RANSAC first extracts the largest plane in the data and then searches for other planes in the remaining data, while extraction methods based on the Manhattan-world assumption fail when, as is common, planes in the real environment do not satisfy orthogonality. As a result, neither real-time performance nor accuracy is achieved.
Aiming at the shortcomings of existing plane extraction techniques with respect to parallelization, the application provides a millisecond-level multi-plane real-time extraction method for complex traffic scenes, which can accurately and effectively extract all planes in a depth image. Each step of the implementation can be computed in parallel; that is, each step is decomposed into a series of repeated sub-steps that can be executed concurrently, ensuring that every step can be completed using multiple threads simultaneously.
The method performs targeted optimization of facet detection by exploiting the characteristics of depth camera data, adds the calculation of facet equations and facet clustering, and thereby achieves plane detection. To further improve detection efficiency, a grid clustering approach is adopted, which accelerates the clustering step and reduces the influence of noise on the result. In addition, exploiting the fact that the pixels above, below, left and right of each pixel of the depth image are fixed, the GPU can extract plane information from the image in real time through parallel computation.
The method can detect all planes in the depth camera data and can adopt GPU acceleration to further improve the plane detection speed, realizing real-time detection for autonomous driving scenes. Compared with the traditional approach of selecting three random points and computing the distances from all other points to the candidate plane, facet clustering reduces the amount of computation and increases the number of detected planes; the parallel design optimizes the detection speed while making the detection result finer.
After the application scenario and the design idea of the embodiment of the present application are introduced, the technical solution provided by the embodiment of the present application is described below.
As shown in fig. 1, the embodiment of the application provides a millisecond level multi-plane real-time extraction method in a complex traffic scene, which comprises the following steps:
step 101: acquiring a depth image acquired by a depth camera;
step 102: for each pixel point of the depth image, respectively searching the pixel points with depth values of which the upper, lower, left and right directions are not null in a preset searching range, wherein the pixel points are used as neighborhood points, and if the neighborhood points can be searched in all the four directions, the pixel points are seed points;
the neighborhood searching method needs to consider both searching efficiency and accuracy. Searching for a neighborhood point is prepared for extracting a facet, and the neighborhood point needs to be selected to represent the plane change condition around the pixel point p as much as possible. The selection of four nearby points needs to represent the plane variation around p-points as much as possible. If the selected neighborhood points are too close or too far, there is no way to indicate the pixel change around p. A valid search range is first given. A search for valid nearby points is then performed within range. Outliers and invalid points are removed in the searching process, so that the effectiveness of facet extraction is guaranteed. An outlier is considered to be a point where the depth of the current point is large from all four points around. When more than one of the four points in the vicinity does not exist, this point is considered as an invalid point.
Specifically, the method comprises the following steps:
taking each pixel point p of the depth image as the center point, searching leftward among the pixels between the minimum search distance search_min and the maximum search distance search_max for the first pixel point p1 whose depth value is not null; pixel point p1 is the neighborhood point of the center point in the left direction;
taking pixel point p as the center point, searching upward between search_min and search_max for the first pixel point p2 whose depth value is not null; pixel point p2 is the neighborhood point of the center point in the upward direction;
taking pixel point p as the center point, searching rightward between search_min and search_max for the first pixel point p3 whose depth value is not null; pixel point p3 is the neighborhood point of the center point in the rightward direction;
taking pixel point p as the center point, searching downward between search_min and search_max for the first pixel point p4 whose depth value is not null; pixel point p4 is the neighborhood point of the center point in the downward direction;
if pixel point p can find all four neighborhood points p1, p2, p3, p4, then pixel point p is a seed point.
This procedure ensures that the selection of neighborhood points adapts to the density of points around the center. In the process, most noise points (outliers and null values) are eliminated.
The selection of the neighborhood points [p1, p2, p3, p4] of p is flexible: when the depth information differs greatly, a suitable distance (an empirical value) can be explored along the width or height direction. If a pixel with similar depth information exists within this distance, that pixel point is preferred for the facet calculation. This eliminates the influence of partial noise and yields a relatively more accurate facet equation. Meanwhile, judging whether p is a noise point is a necessary step: only if valid neighborhood points exist in the up, down, left and right directions of a pixel point is it considered a seed point.
For each pixel point, the neighborhood points in four directions can be searched in parallel.
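As a concrete illustration, the four-direction neighborhood search above can be sketched as follows. This is a minimal single-threaded sketch: the names `find_neighbors` and `is_seed`, the NaN encoding of empty depth values, and the default distances are illustrative, not taken from the patent; in the described method each pixel would instead be handled by its own GPU thread.

```python
import numpy as np

def find_neighbors(depth, row, col, search_min=2, search_max=8):
    """Return the four neighborhood points [p1, p2, p3, p4] of pixel (row, col),
    searching left, up, right, down; an entry is None if no valid point exists."""
    h, w = depth.shape
    directions = [(0, -1), (-1, 0), (0, 1), (1, 0)]  # left, up, right, down
    neighbors = []
    for dr, dc in directions:
        found = None
        for step in range(search_min, search_max + 1):
            r, c = row + dr * step, col + dc * step
            if not (0 <= r < h and 0 <= c < w):
                break
            if not np.isnan(depth[r, c]):  # first pixel whose depth is not empty
                found = (r, c)
                break
        neighbors.append(found)
    return neighbors

def is_seed(depth, row, col, **kw):
    # A pixel is a seed point only if all four directions yield a neighborhood point.
    return all(n is not None for n in find_neighbors(depth, row, col, **kw))
```

Because each pixel's search is independent, the outer per-pixel loop maps directly onto one GPU thread per pixel.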
Step 103: for each seed point, judging whether the seed points and the four neighborhood points meet the following conditions: the seed point and the position points of the left neighborhood point and the right neighborhood point under the camera coordinate system are on a straight line, and the seed point and the position points of the upper neighborhood point and the lower neighborhood point under the camera coordinate system are on a straight line; if so, calculating a plane equation of a facet where the spatial position point corresponding to the seed point is located according to the spatial position coordinates of the seed point and the four neighborhood points;
As shown in fig. 2, the coordinates of a pixel point in the image plane are two-dimensional; after adding the depth value, the pixel point has three-dimensional coordinates in the camera coordinate system.
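The patent does not spell out how a pixel plus depth becomes a camera-coordinate point, but for a standard pinhole depth camera it is the usual intrinsics-based back-projection; the sketch below assumes that model, with fx, fy (focal lengths) and cx, cy (principal point) as assumed camera intrinsics.

```python
import numpy as np

def pixel_to_camera(u, v, z, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth z into camera coordinates,
    assuming an ideal pinhole model with intrinsics fx, fy, cx, cy."""
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])
```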
For seed point p0 and its neighborhood points p1 and p3 in the left and right directions, calculating the constraint horizontal in the horizontal direction:
horizontal = left * left_depth * (d0 - right_depth) + right * right_depth * (d0 - left_depth)
where d0 is the depth value of seed point p0; left is the distance between neighborhood point p1 and seed point p0; left_depth is the depth value of neighborhood point p1; right is the distance between neighborhood point p3 and seed point p0; right_depth is the depth value of neighborhood point p3;
if horizontal < alpha, the position points of seed point p0 and its left and right neighborhood points p1 and p3 in the camera coordinate system lie on a straight line; alpha is a preset threshold;
for seed point p0 and its neighborhood points p2 and p4 in the upward and downward directions, calculating the constraint vertical in the vertical direction:
vertical = up * up_depth * (d0 - down_depth) + down * down_depth * (d0 - up_depth)
where up is the distance between neighborhood point p2 and seed point p0; up_depth is the depth value of neighborhood point p2; down is the distance between neighborhood point p4 and seed point p0; down_depth is the depth value of neighborhood point p4;
if vertical < alpha, the position points of seed point p0 and its upper and lower neighborhood points p2 and p4 in the camera coordinate system lie on a straight line.
For each seed point, horizontal and vertical may be calculated in parallel and then judged.
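The two constraints can be written directly from the formulas above; the sketch below combines them into one predicate. The function name is illustrative, and taking the absolute value before comparing against alpha is our interpretation (the text compares the raw constraint value to the threshold).

```python
def is_facet_seed(d0, left, left_depth, right, right_depth,
                  up, up_depth, down, down_depth, alpha=1e-3):
    """Return True if the seed point and its four neighborhood points satisfy
    both the horizontal and the vertical collinearity constraints."""
    horizontal = (left * left_depth * (d0 - right_depth)
                  + right * right_depth * (d0 - left_depth))
    vertical = (up * up_depth * (d0 - down_depth)
                + down * down_depth * (d0 - up_depth))
    # abs() is an interpretation; the text compares the raw value to alpha.
    return abs(horizontal) < alpha and abs(vertical) < alpha
```

Note that for points on a plane the inverse depth 1/z varies linearly along a scanline, which is exactly when each constraint evaluates to zero; this is why the test detects collinearity without an explicit 3D line fit.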
Specifically, calculating, according to the spatial position coordinates of the seed point and its four neighborhood points, the plane equation of the facet on which the spatial position point corresponding to the seed point lies, includes:
converting the coordinates of pixel points p0, p1, p2, p3, p4 from position points in the camera coordinate system into spatial position point coordinates in the geocentric rectangular coordinate system, obtaining the corresponding spatial position points P0, P1, P2, P3, P4;
calculating the normal vector of the plane equation of the facet on which the spatial position point P0 lies, where A, B and C are the three elements of the normal vector;
calculating the distance between the plane and the origin of the camera coordinate system to obtain the depth value d;
the parameters of the plane equation of the facet on which the spatial position point P0 lies include A, B, C and d.
The plane equations for all facets can be computed using multithreading in parallel, with the effect shown in fig. 3.
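The normal-vector and distance formulas appear as images in the source and are not reproduced in the text, so the sketch below uses the standard construction consistent with the five-point facet: the normal is the cross product of the horizontal diagonal (P1 to P3) and the vertical diagonal (P2 to P4), and d is the distance from the plane through P0 to the origin. This is a hedged reconstruction, not necessarily the patent's exact formula.

```python
import numpy as np

def facet_plane(P0, P1, P2, P3, P4):
    """Return (A, B, C, d): the unit normal of the facet through the five
    spatial points, and d, the distance from the plane to the origin."""
    P0 = np.asarray(P0, dtype=float)
    # Horizontal diagonal (p1 -> p3) and vertical diagonal (p2 -> p4).
    v_h = np.asarray(P3, dtype=float) - np.asarray(P1, dtype=float)
    v_v = np.asarray(P4, dtype=float) - np.asarray(P2, dtype=float)
    n = np.cross(v_h, v_v)
    n = n / np.linalg.norm(n)        # unit normal: elements (A, B, C)
    d = float(abs(np.dot(n, P0)))    # distance from the origin to the plane
    return float(n[0]), float(n[1]), float(n[2]), d
```

Each seed point's facet depends only on its own five points, so this computation parallelizes trivially across threads.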
Step 104: clustering the facets according to the parameters of the plane equations of the facets of the spatial position points corresponding to each seed point, to obtain the plane equations of all main planes in the depth image;
because of the common clustering of points into planes, the clustering process of different points is not independent of each other. This makes the clustering process more time consuming. And the different selection orders of the points to be clustered also affects the reliability of the extracted plane. In order to increase the calculation speed and to ensure the robustness of the extracted plane. In this embodiment, the facets are clustered, and the clustering steps may be calculated in parallel.
Specifically, the method comprises the following steps:
splicing the three elements of the normal vector of the facet of each seed point, taking the spliced value as the abscissa and the depth value as the ordinate to obtain a coordinate plane, and plotting all the points on this plane;
gridding the coordinate plane at preset intervals, dividing it into a plurality of grids;
counting the number of points in each grid, and putting the grids whose point count is larger than a preset threshold into a set to be clustered;
randomly selecting a grid from the set to be clustered and clustering the surrounding valid grids to generate a cluster, where a valid grid is one whose point count is larger than the preset threshold; thereby obtaining a plurality of clusters;
acquiring the point counts m1, m2, ..., mn of all grids in each cluster, where n is the number of grids, and acquiring the parameters A1, B1, C1, A2, B2, C2, ..., An, Bn, Cn of the plane equations of the facets of the grid center points, together with the depth values d1, d2, ..., dn;
calculating the normal vector (A0, B0, C0) and the depth value d0 of the cluster from these grid parameters and point counts;
thereby obtaining the plane equation of each main plane in the depth camera data.
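The gridding-and-merging part of Step 104 can be sketched as follows. The cell size, density threshold, 8-connected neighborhood, and the assumption that the spliced-normal/depth coordinates are already computed are all illustrative choices, not specified by the patent.

```python
import numpy as np
from collections import deque

def grid_cluster(points_2d, cell=0.05, min_count=5):
    """points_2d: (N, 2) array of (spliced_normal, depth) facet coordinates.
    Grids the plane, keeps cells with more than min_count points, and merges
    neighboring dense cells by flood-fill. Returns a list of clusters, each a
    list of cell indices (ix, iy)."""
    cells = {}
    for x, y in points_2d:
        key = (int(x // cell), int(y // cell))
        cells[key] = cells.get(key, 0) + 1
    dense = {k for k, c in cells.items() if c > min_count}  # valid grids
    clusters, seen = [], set()
    for start in dense:
        if start in seen:
            continue
        queue, cluster = deque([start]), []
        seen.add(start)
        while queue:  # flood-fill over 8-connected dense neighbors
            cx, cy = queue.popleft()
            cluster.append((cx, cy))
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in dense and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        clusters.append(cluster)
    return clusters
```

Per-cell counting is embarrassingly parallel (one atomic increment per facet), which is what makes this clustering scheme GPU-friendly; the final per-cluster plane would then be combined from the grid parameters and point counts as the text describes.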
Based on the above embodiments, the embodiment of the present application provides a millisecond-level multi-plane real-time extraction device in a complex traffic scene, and referring to fig. 4, the millisecond-level multi-plane real-time extraction device 200 in a complex traffic scene provided by the embodiment of the present application at least includes:
an acquiring unit 201, configured to acquire a depth image acquired by a depth camera;
a neighborhood point obtaining unit 202, configured to search, for each pixel point of the depth image, for a pixel point whose depth values in up, down, left, and right directions are not null in a preset search range, as a neighborhood point, and if all the four directions can search for the neighborhood point, the pixel point is a seed point;
the calculating unit 203 is configured to determine, for each seed point, whether the seed point and the four neighboring points satisfy: the seed point and the position points of the left neighborhood point and the right neighborhood point under the camera coordinate system are on a straight line, and the seed point and the position points of the upper neighborhood point and the lower neighborhood point under the camera coordinate system are on a straight line; if so, calculating a plane equation of a facet where the spatial position point corresponding to the seed point is located according to the spatial position coordinates of the seed point and the four neighborhood points;
and a clustering unit 204, configured to cluster the facets according to the parameters of the plane equations of the facets at the spatial position points corresponding to the seed points, so as to obtain the plane equation of each main plane in the depth image.
It should be noted that the millisecond-level multi-plane real-time extraction device 200 in a complex traffic scene solves the technical problem on the same principle as the millisecond-level multi-plane real-time extraction method provided by the embodiments of the present application. Therefore, for the implementation of the device 200, reference may be made to the implementation of the method, and repeated description is omitted.
Based on the foregoing embodiments, the embodiment of the present application further provides an electronic device. As shown in fig. 5, the electronic device 300 provided in the embodiment of the present application at least includes: a processor 301, a memory 302, and a computer program stored in the memory 302 and executable on the processor 301; the processor 301, when executing the computer program, implements the millisecond-level multi-plane real-time extraction method in a complex traffic scene provided by the embodiments of the present application.
The electronic device 300 provided by embodiments of the present application may also include a bus 303 that connects the different components, including the processor 301 and the memory 302. Bus 303 represents one or more of several types of bus structures, including a memory bus, a peripheral bus, a local bus, and so forth.
The memory 302 may include readable media in the form of volatile memory, such as a random access memory (RAM) 3021 and/or a cache memory 3022, and may further include a read-only memory (ROM) 3023.
The memory 302 may also include a program tool 3025 having a set (at least one) of program modules 3024, the program modules 3024 including, but not limited to: an operating system, one or more application programs, other program modules, and program data; each or some combination of these examples may include an implementation of a network environment.
The electronic device 300 may also communicate with one or more external devices 304 (e.g., a keyboard, a remote control, etc.), with one or more devices that enable a user to interact with the electronic device 300 (e.g., a cell phone, a computer, etc.), and/or with any device that enables the electronic device 300 to communicate with one or more other electronic devices 300 (e.g., a router, a modem, etc.). Such communication may occur through an input/output (I/O) interface 305. Also, the electronic device 300 may communicate with one or more networks, such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet, via a network adapter 306. As shown in fig. 5, the network adapter 306 communicates with the other modules of the electronic device 300 over the bus 303. It should be appreciated that, although not shown in fig. 5, other hardware and/or software modules may be used in connection with the electronic device 300, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, redundant arrays of independent disks (RAID) subsystems, tape drives, data backup storage subsystems, and the like.
It should be noted that the electronic device 300 shown in fig. 5 is only an example, and should not be construed as limiting the function and the application scope of the embodiment of the present application.
The embodiment of the present application also provides a computer-readable storage medium storing computer instructions that, when executed by a processor, implement the method provided by the embodiments of the present application. Specifically, the executable program may be built into or installed in the electronic device 300, so that the electronic device 300 may implement the millisecond-level multi-plane real-time extraction method in a complex traffic scene by executing the built-in or installed executable program.
The method provided by the embodiments of the present application may also be implemented as a program product including program code that, when the program product runs on the electronic device 300, causes the electronic device 300 to execute the millisecond-level multi-plane real-time extraction method in a complex traffic scene provided by the embodiments of the present application.
The program product provided by the embodiments of the present application may employ any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection having one or more wires, a portable disk, a hard disk, a RAM, a ROM, an erasable programmable read-only memory (EPROM), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The program product provided by the embodiments of the present application may be implemented as a CD-ROM including program code, and may also be run on a computing device. However, the program product of the embodiments of the present application is not limited thereto; in the embodiments of the present application, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
It should be noted that although several units or sub-units of the apparatus are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, the features and functions of two or more of the elements described above may be embodied in one element in accordance with embodiments of the present application. Conversely, the features and functions of one unit described above may be further divided into a plurality of units to be embodied.
Furthermore, although the operations of the methods of the present application are depicted in the drawings in a particular order, this should not be understood as requiring that the operations be performed in that particular order, or that all of the illustrated operations be performed, to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps may be combined into one step, and/or one step may be decomposed into multiple steps.
Finally, it should be noted that the above embodiments are only for illustrating the technical solution of the present application and are not limiting. Although the present application has been described in detail with reference to the embodiments, it should be understood by those skilled in the art that modifications and equivalents may be made thereto without departing from the spirit and scope of the present application, which is intended to be covered by the appended claims.

Claims (8)

1. A millisecond-level multi-plane real-time extraction method in a complex traffic scene, characterized by comprising the following steps:
acquiring a depth image acquired by a depth camera;
for each pixel point of the depth image, searching, within a preset search range, in each of the up, down, left, and right directions for a pixel point whose depth value is not empty, as a neighborhood point; if a neighborhood point can be found in all four directions, the pixel point is a seed point;
for each seed point, judging whether the seed point and its four neighborhood points satisfy the following conditions: the seed point and the position points of its left and right neighborhood points in the camera coordinate system lie on one straight line, and the seed point and the position points of its upper and lower neighborhood points in the camera coordinate system lie on one straight line; if so, calculating the plane equation of the facet at the spatial position point corresponding to the seed point from the spatial position coordinates of the seed point and the four neighborhood points;
and clustering the facets according to the parameters of the plane equations of the facets at the spatial position points corresponding to the seed points, to obtain the plane equation of each main plane in the depth image.
2. The millisecond-level multi-plane real-time extraction method in a complex traffic scene according to claim 1, wherein searching, for each pixel point of the depth image, within a preset search range for pixel points whose depth values in the up, down, left, and right directions are not empty, as neighborhood points, and, if a neighborhood point can be found in all four directions, taking the pixel point as a seed point, comprises:
taking each pixel point p of the depth image as a center point, searching for pixel points along the left direction between the minimum search distance search_min and the maximum search distance search_max, and taking the first pixel point p1 whose depth value is not empty as the neighborhood point of the center point in the left direction;
taking the pixel point p as a center point, searching for pixel points along the upward direction between the minimum search distance search_min and the maximum search distance search_max, and taking the first pixel point p2 whose depth value is not empty as the neighborhood point of the center point in the upward direction;
taking the pixel point p as a center point, searching for pixel points along the rightward direction between the minimum search distance search_min and the maximum search distance search_max, and taking the first pixel point p3 whose depth value is not empty as the neighborhood point of the center point in the rightward direction;
taking the pixel point p as a center point, searching for pixel points along the downward direction between the minimum search distance search_min and the maximum search distance search_max, and taking the first pixel point p4 whose depth value is not empty as the neighborhood point of the center point in the downward direction;
if the pixel point p can find all four neighborhood points p1, p2, p3, p4, the pixel point p is a seed point.
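The four-direction search described in claim 2 can be sketched as follows, assuming the depth image is a 2-D array in which an empty depth value is encoded as 0 (the function name `find_neighbors` and the 0-encoding are illustrative assumptions):

```python
import numpy as np

def find_neighbors(depth, r, c, search_min, search_max):
    """Search up/down/left/right from pixel (r, c) for the first pixel
    whose depth value is not empty (here: > 0), at a distance within
    [search_min, search_max]. Returns the four neighbor coordinates
    [left, up, right, down], or None if any direction fails, i.e. when
    (r, c) does not qualify as a seed point."""
    h, w = depth.shape
    directions = [(0, -1), (-1, 0), (0, 1), (1, 0)]  # left, up, right, down
    neighbors = []
    for dr, dc in directions:
        found = None
        for step in range(search_min, search_max + 1):
            rr, cc = r + dr * step, c + dc * step
            if 0 <= rr < h and 0 <= cc < w and depth[rr, cc] > 0:
                found = (rr, cc)
                break  # first non-empty depth value wins
        if found is None:
            return None  # one direction failed: not a seed point
        neighbors.append(found)
    return neighbors
```

On a fully valid 5x5 depth map, the center pixel (2, 2) finds its four immediate neighbors, while a corner pixel fails (no pixels beyond the image border) and is rejected as a seed.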
3. The millisecond-level multi-plane real-time extraction method in a complex traffic scene according to claim 2, wherein judging, for each seed point, whether the seed point and its four neighborhood points satisfy: the seed point and the position points of its left and right neighborhood points in the camera coordinate system lie on one straight line, and the seed point and the position points of its upper and lower neighborhood points in the camera coordinate system lie on one straight line, comprises:
for the seed point p0 and the neighborhood points p1 and p3 in the left and right directions, calculating the constraint horizontal in the horizontal direction:
horizontal = left * left_depth * (d0 - right_depth) + right * right_depth * (d0 - left_depth)
wherein d0 is the depth value of the seed point p0; left is the distance between the neighborhood point p1 and the seed point p0; left_depth is the depth value of the neighborhood point p1; right is the distance between the neighborhood point p3 and the seed point p0; right_depth is the depth value of the neighborhood point p3;
if horizontal < alpha, the position points of the seed point p0 and the neighborhood points p1 and p3 in the left and right directions lie on one straight line in the camera coordinate system, wherein alpha is a preset threshold value;
for the seed point p0 and the neighborhood points p2 and p4 in the upper and lower directions, calculating the constraint vertical in the vertical direction:
vertical = up * up_depth * (d0 - down_depth) + down * down_depth * (d0 - up_depth)
wherein up is the distance between the neighborhood point p2 and the seed point p0; up_depth is the depth value of the neighborhood point p2; down is the distance between the neighborhood point p4 and the seed point p0; down_depth is the depth value of the neighborhood point p4;
if vertical < alpha, the position points of the seed point p0 and the neighborhood points p2 and p4 in the upper and lower directions lie on one straight line in the camera coordinate system.
4. The millisecond-level multi-plane real-time extraction method in a complex traffic scene according to claim 3, wherein calculating the plane equation of the facet at the spatial position point corresponding to the seed point from the spatial position coordinates of the seed point and the four neighborhood points comprises:
converting the coordinates of the pixel points p0, p1, p2, p3, p4 in the camera coordinate system into spatial position point coordinates in the geocentric rectangular coordinate system, obtaining the spatial position points P0, P1, P2, P3, P4 corresponding to the pixel points p0, p1, p2, p3, p4;
calculating the normal vector (A, B, C) of the plane equation of the facet at the spatial position points, wherein A, B, and C are the three elements of the normal vector;
calculating the distance between the plane and the origin of the camera coordinate system to obtain the depth value d;
the parameters of the plane equation of the facet at the spatial position points include A, B, C, and d.
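The normal-vector formula itself is elided in the text above. One natural construction, assumed here purely for illustration (the function name `facet_plane` and the cross-product choice are not taken from the source), takes the cross product of the left-right and up-down direction vectors between the five spatial points:

```python
import numpy as np

def facet_plane(p0, p1, p2, p3, p4):
    """Fit the facet plane at spatial point P0 from its four neighbors
    P1 (left), P2 (up), P3 (right), P4 (down). The normal is assumed to
    be the cross product of the horizontal and vertical direction
    vectors; d is the distance from the plane to the coordinate origin."""
    p0, p1, p2, p3, p4 = map(np.asarray, (p0, p1, p2, p3, p4))
    n = np.cross(p3 - p1, p2 - p4)   # horizontal direction x vertical direction
    n = n / np.linalg.norm(n)        # unit normal (A, B, C)
    d = abs(n @ p0)                  # origin-to-plane distance for A*x + B*y + C*z = +-d
    return tuple(n), d
```

For five coplanar points on the plane z = 2, this yields the unit normal (0, 0, 1) and depth d = 2.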
5. The millisecond-level multi-plane real-time extraction method in a complex traffic scene according to claim 4, wherein clustering the facets according to the parameters of the plane equations of the facets at the spatial position points corresponding to the seed points, to obtain the plane equation of each main plane in the depth image, comprises:
concatenating the three elements of the normal vector of the facet at each point, taking the concatenated value as the abscissa and the depth value as the ordinate to obtain a coordinate plane, and plotting all the points on the coordinate plane;
gridding the coordinate plane according to preset intervals, and dividing the coordinate plane into a plurality of grids;
counting the number of points in each grid, and putting the grids with the number of points larger than a preset threshold value into a set to be clustered;
randomly selecting a grid from the set to be clustered and clustering the surrounding valid grids with it to generate a cluster, wherein a valid grid is a grid whose number of points is larger than the preset threshold value, thereby obtaining a plurality of clusters;
acquiring the number of points of all grids in each cluster, m1, m2, …, mn, where n is the number of grids, and acquiring the parameters A1, B1, C1, A2, B2, C2, …, An, Bn, Cn of the plane equations of the facets at the center points of all the grids, together with the depth values d1, d2, …, dn;
calculating from these the normal vector (A0, B0, C0) and the depth value d0 of the cluster;
thereby obtaining the plane equation of each principal plane in the depth camera data.
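The gridding-and-clustering steps of claim 5 can be sketched as a flood fill over adjacent valid grid cells in the 2-D parameter plane (x = concatenated normal value, y = depth). The cell size, the 8-neighbor adjacency, and the function name `grid_cluster` are illustrative assumptions:

```python
from collections import deque

def grid_cluster(points, cell, min_count):
    """Cluster 2-D parameter-plane points by flood-filling adjacent
    valid grid cells; a cell is valid if it holds more than min_count
    points. Returns a list of clusters, each a list of point indices."""
    cells = {}
    for i, (x, y) in enumerate(points):           # grid the coordinate plane
        cells.setdefault((int(x // cell), int(y // cell)), []).append(i)
    valid = {c for c, idx in cells.items() if len(idx) > min_count}
    clusters, seen = [], set()
    for start in valid:
        if start in seen:
            continue
        group, queue = [], deque([start])
        seen.add(start)
        while queue:                              # flood fill over 8-neighbor cells
            cx, cy = queue.popleft()
            group.extend(cells[(cx, cy)])
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in valid and nb not in seen:
                        seen.add(nb)
                        queue.append(nb)
        clusters.append(group)
    return clusters
```

Two well-separated blobs of facet parameters fall into non-adjacent cells and so produce two clusters.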
6. A millisecond-level multi-plane real-time extraction device in a complex traffic scene, characterized by comprising:
the acquisition unit is used for acquiring the depth image acquired by the depth camera;
a neighborhood point acquisition unit, configured to search, for each pixel point of the depth image, within a preset search range in each of the up, down, left, and right directions for a pixel point whose depth value is not empty, as a neighborhood point; if a neighborhood point can be found in all four directions, the pixel point is a seed point;
a calculating unit, configured to judge, for each seed point, whether the seed point and its four neighborhood points satisfy the following conditions: the seed point and the position points of its left and right neighborhood points in the camera coordinate system lie on one straight line, and the seed point and the position points of its upper and lower neighborhood points in the camera coordinate system lie on one straight line; if so, calculate the plane equation of the facet at the spatial position point corresponding to the seed point from the spatial position coordinates of the seed point and the four neighborhood points;
and a clustering unit, configured to cluster the facets according to the parameters of the plane equations of the facets at the spatial position points corresponding to the seed points, to obtain the plane equation of each main plane in the depth image.
7. An electronic device, comprising: memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method according to any one of claims 1-5 when the computer program is executed.
8. A computer readable storage medium storing computer instructions which, when executed by a processor, implement the method of any one of claims 1-5.
CN202310652717.3A 2023-06-02 2023-06-02 Millisecond level multi-plane real-time extraction method and device in complex traffic scene Pending CN116740160A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310652717.3A CN116740160A (en) 2023-06-02 2023-06-02 Millisecond level multi-plane real-time extraction method and device in complex traffic scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310652717.3A CN116740160A (en) 2023-06-02 2023-06-02 Millisecond level multi-plane real-time extraction method and device in complex traffic scene

Publications (1)

Publication Number Publication Date
CN116740160A true CN116740160A (en) 2023-09-12

Family

ID=87900486

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310652717.3A Pending CN116740160A (en) 2023-06-02 2023-06-02 Millisecond level multi-plane real-time extraction method and device in complex traffic scene

Country Status (1)

Country Link
CN (1) CN116740160A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116993797A (en) * 2023-09-26 2023-11-03 埃洛克航空科技(北京)有限公司 Depth map invalid point estimation method and device
CN116993797B (en) * 2023-09-26 2023-12-22 埃洛克航空科技(北京)有限公司 Depth map invalid point estimation method and device

Similar Documents

Publication Publication Date Title
US8199977B2 (en) System and method for extraction of features from a 3-D point cloud
CN108319655B (en) Method and device for generating grid map
CN111325796B (en) Method and apparatus for determining pose of vision equipment
WO2020108311A1 (en) 3d detection method and apparatus for target object, and medium and device
CN108509820B (en) Obstacle segmentation method and device, computer equipment and readable medium
CN111582054B (en) Point cloud data processing method and device and obstacle detection method and device
US9165179B1 (en) Feature reduction based on local densities for bundle adjustment of images
CN111209978B (en) Three-dimensional visual repositioning method and device, computing equipment and storage medium
CN111488812B (en) Obstacle position recognition method and device, computer equipment and storage medium
CN111311611B (en) Real-time three-dimensional large-scene multi-object instance segmentation method
WO2022133770A1 (en) Method for generating point cloud normal vector, apparatus, computer device, and storage medium
CN111126116A (en) Unmanned ship river channel garbage identification method and system
CN111553946A (en) Method and device for removing ground point cloud and obstacle detection method and device
CN116563493A (en) Model training method based on three-dimensional reconstruction, three-dimensional reconstruction method and device
CN116740160A (en) Millisecond level multi-plane real-time extraction method and device in complex traffic scene
CN116503803A (en) Obstacle detection method, obstacle detection device, electronic device and storage medium
CN117132649A (en) Ship video positioning method and device for artificial intelligent Beidou satellite navigation fusion
CN115239899B (en) Pose map generation method, high-precision map generation method and device
CN114648639B (en) Target vehicle detection method, system and device
CN113763468B (en) Positioning method, device, system and storage medium
CN115565072A (en) Road garbage recognition and positioning method and device, electronic equipment and medium
CN111462321B (en) Point cloud map processing method, processing device, electronic device and vehicle
CN110399892B (en) Environmental feature extraction method and device
CN115836322A (en) Image cropping method and device, electronic equipment and storage medium
KR20230006628A (en) method and device for processing image, electronic equipment, storage medium and computer program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination