CN117765503A - 2D vision and 3D point cloud post-fusion sensing method, equipment and mobile device - Google Patents

2D vision and 3D point cloud post-fusion sensing method, equipment and mobile device

Info

Publication number
CN117765503A
Authority
CN
China
Prior art keywords
obstacle
segmentation
picture
dimensional coordinates
point cloud
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211139894.3A
Other languages
Chinese (zh)
Inventor
李慧慧 (Li Huihui)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Idriverplus Technologies Co Ltd
Original Assignee
Beijing Idriverplus Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Idriverplus Technologies Co Ltd filed Critical Beijing Idriverplus Technologies Co Ltd
Priority to CN202211139894.3A priority Critical patent/CN117765503A/en
Publication of CN117765503A publication Critical patent/CN117765503A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The embodiment of the invention provides a 2D vision and 3D point cloud post-fusion sensing method, equipment and a mobile device. The method comprises the following steps: performing 2D visual segmentation on the picture; performing first clustering on the 3D point cloud to obtain clustered 3D obstacle point coordinates; converting the 3D obstacle point coordinates into 3D obstacle two-dimensional coordinates and projecting them onto the 2D multi-path segmented pictures to obtain the post-fusion result of the 2D vision and the 3D point cloud, so as to perform edge precision optimization and second clustering of the 3D obstacle two-dimensional coordinates through the multi-classification segmented picture, perform speed correction at the pixel positions of the 3D obstacle two-dimensional coordinates through the dynamic/static binary classification picture, and determine the 3D ground points belonging to lane lines in the 3D point cloud through the lane line segmentation picture. The fused perception result is more accurate; resource consumption and the abnormal time consumption caused by resource preemption are reduced; more effective information in the automatic driving scene can be extracted; and the fusion perception effect is improved overall.

Description

2D vision and 3D point cloud post-fusion sensing method, equipment and mobile device
Technical Field
The invention relates to the field of automatic driving, in particular to a 2D vision and 3D point cloud post-fusion sensing method, equipment and a mobile device.
Background
An autonomous vehicle perceives its surroundings in real time through on-board sensors. The environment includes movable road targets such as vehicles, pedestrians and riders of non-motorized vehicles, static obstacles such as traffic cones, water-filled barriers and warning triangles, and important irregular regions such as the road surface and lane lines. Identifying this effective information accurately, comprehensively and in real time helps the decision and planning module of the autonomous driving system predict the driving route and driving behavior, and thus better control the motion state of the autonomous vehicle.
The environment perception system of an autonomous vehicle relies on multiple sensors, each of which provides specific capabilities according to its hardware characteristics, and various sensor combinations are used in practice. The multi-line lidar and the camera are currently the two main sensors in the perception system of an autonomous vehicle: the multi-line lidar provides accurate three-dimensional positioning information, but its sparsity and the lack of color and texture of scanned objects mean that it cannot meet the diverse requirements of a perception system on its own; the camera acquires rich semantic information such as color and texture, but cannot determine the precise position of objects in the environment. The characteristics of the multi-line lidar and the camera complement each other, making them an important sensor combination.
Lidar-camera fusion schemes in automatic driving systems mainly fall into two categories. In the first, the 3D point cloud is projected into the 2D image based on the intrinsic and extrinsic parameters, channel features containing both 2D and 3D information are extracted and fed into a deep learning network for training, and the model output is the final perception result; this is called pre-fusion. In the second, the 2D and 3D perception of the environment are processed separately, and the respective results are then fused together according to a certain strategy; this is called post-fusion.
In the process of implementing the present invention, the inventor finds that at least the following problems exist in the related art:
Pre-fusion: it requires accurate alignment between the image content and the projected 3D point cloud, which in turn demands tight time and space synchronization between the sensors as well as accurate calibration of the intrinsic and extrinsic parameters of the lidar and the camera; otherwise the trained model performs far below expectation in application. In practice, however, the space-time synchronization of lidar and camera and the calibration of intrinsic and extrinsic parameters always carry errors, so these requirements are difficult to meet.
Post-fusion: in complex scenes or corner cases, 2D detection easily fails on irregular obstacles such as traffic cones and piles of water-filled barriers, leading to more false detections and missed detections; on the 3D side, over-segmentation and under-segmentation are common, and the recognition of the categories of such irregular obstacles is poor. Meanwhile, existing perception schemes pay little attention to irregular regions such as the road surface and road edges, and the scene obstacle detection/segmentation network and the lane line detection/segmentation network are split into independent modules, which leads to resource preemption, abnormal time consumption, and even failure to run when resource allocation is insufficient.
Disclosure of Invention
The embodiments of the invention aim at least to solve the prior-art problems that post-fusion easily performs poorly on complex scenes, corner cases and irregular regions, and that resources are preempted and time consumption becomes abnormal. In a first aspect, an embodiment of the present invention provides a 2D vision and 3D point cloud post-fusion sensing method, applied to a mobile device equipped with a camera and a lidar, comprising:
performing 2D visual segmentation on the picture acquired by the camera to obtain 2D multi-path segmented pictures, wherein the multi-path segmented pictures at least comprise: a multi-classification segmentation picture for obstacles in the scene, a dynamic/static binary classification picture for distinguishing the obstacles, and a lane line segmentation picture;
performing first clustering on the 3D point cloud acquired by the lidar to obtain clustered 3D obstacle point coordinates;
converting the 3D obstacle point coordinates into 3D obstacle two-dimensional coordinates and projecting them onto the 2D multi-path segmented pictures to obtain the post-fusion result of the 2D vision and the 3D point cloud, so as to at least
- perform edge precision optimization and second clustering of the 3D obstacle two-dimensional coordinates through the multi-classification segmented picture,
- perform speed correction at the pixel positions of the 3D obstacle two-dimensional coordinates through the dynamic/static binary classification picture,
- determine the 3D ground points belonging to lane lines in the 3D point cloud through the lane line segmentation picture.
In a second aspect, an embodiment of the present invention provides a 2D vision and 3D point cloud post-fusion perception executing device, including:
a picture segmentation module, configured to perform 2D visual segmentation on the picture acquired by the camera to obtain 2D multi-path segmented pictures, wherein the multi-path segmented pictures at least comprise: a multi-classification segmentation picture for obstacles in the scene, a dynamic/static binary classification picture for distinguishing the obstacles, and a lane line segmentation picture;
a clustering module, configured to perform first clustering on the 3D point cloud acquired by the lidar to obtain clustered 3D obstacle point coordinates;
a fusion module, configured to convert the 3D obstacle point coordinates into 3D obstacle two-dimensional coordinates and project them onto the 2D multi-path segmented pictures to obtain the post-fusion result of the 2D vision and the 3D point cloud, so as to at least
- perform edge precision optimization and second clustering of the 3D obstacle two-dimensional coordinates through the multi-classification segmented picture,
- perform speed correction at the pixel positions of the 3D obstacle two-dimensional coordinates through the dynamic/static binary classification picture,
- determine the 3D ground points belonging to lane lines in the 3D point cloud through the lane line segmentation picture.
In a third aspect, there is provided an electronic device, comprising: the system comprises at least one processor and a memory communicatively connected with the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the 2D vision and 3D point cloud post fusion awareness method of any of the embodiments of the present invention.
In a fourth aspect, an embodiment of the present invention provides a mobile device, including a body and an electronic apparatus according to any one of the embodiments of the present invention mounted on the body.
In a fifth aspect, an embodiment of the present invention provides a storage medium having a computer program stored thereon, where the program when executed by a processor implements the steps of the 2D vision and 3D point cloud post-fusion awareness method of any embodiment of the present invention.
In a sixth aspect, embodiments of the present invention further provide a computer program product, which when run on a computer, causes the computer to perform the 2D vision and 3D point cloud post fusion awareness method according to any one of the embodiments of the present invention.
The embodiments of the invention have the following beneficial effects: the errors caused by the space-time synchronization and extrinsic calibration that are difficult to align perfectly between multiple sensors can be adaptively adjusted, the error-free channel features are fed into the deep learning network, and the output result is more accurate; multiple network models are combined into a single multi-branch network model, which reduces resource consumption and the abnormal time consumption caused by resource preemption. Compared with most existing lidar-camera post-fusion perception schemes, the method can extract almost all effective information in the automatic driving scene, such as road surfaces, road edges, irregular obstacles, road users and lane lines, can acquire the effective information required by the automatic driving scene more comprehensively and accurately, and improves the fusion perception effect as a whole.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a 2D vision and 3D point cloud post-fusion perception method according to an embodiment of the present invention;
fig. 2 is a schematic diagram of an overall structure of a 2D vision and 3D point cloud post-fusion sensing method according to an embodiment of the present invention;
FIG. 3 is a multi-path segmentation network construction structure diagram of a 2D vision and 3D point cloud post-fusion perception method according to an embodiment of the present invention;
fig. 4 is a schematic diagram of secondary clustering and speed restoration of a 3D point cloud of a 2D vision and 3D point cloud post-fusion perception method according to an embodiment of the present invention;
fig. 5 is a schematic diagram of 3D lane line point cloud extraction of a 2D vision and 3D point cloud post-fusion perception method according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a 2D vision and 3D point cloud post-fusion sensing execution device according to an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of an embodiment of an electronic device with 2D vision and 3D point cloud post-fusion perception according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Those skilled in the art will appreciate that embodiments of the present application may be implemented as a system, apparatus, device, method, or computer program product. Accordingly, the present disclosure may be embodied in the following forms, namely: complete hardware, complete software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
For ease of understanding, the technical terms referred to in this application are explained as follows:
the term "mobile device" as used herein includes, but is not limited to, six classes of automated driving technology vehicles, such as those specified by the International society of automaton (Society of Automotive Engineers International, SAE International) or the national Standard for automotive Automation Classification, L0-L5.
In some embodiments, the mobile device may be a vehicle device or a robotic device having various functions:
(1) Manned functions such as home cars, buses, etc.;
(2) Cargo functions such as common trucks, van type trucks, swing trailers, closed trucks, tank trucks, flatbed trucks, container trucks, dump trucks, special structure trucks, and the like;
(3) Tool functions such as logistics distribution vehicles, automatic guided vehicles AGVs, patrol vehicles, cranes, excavators, bulldozers, shovels, road rollers, loaders, off-road engineering vehicles, armored engineering vehicles, sewage treatment vehicles, sanitation vehicles, dust collection vehicles, floor cleaning vehicles, watering vehicles, floor sweeping robots, meal delivery robots, shopping guide robots, mowers, golf carts, and the like;
(4) Entertainment functions such as recreational vehicles, casino autopilots, balance cars, etc.;
(5) Special rescue functions such as fire trucks, ambulances, electric power emergency vehicles, engineering emergency vehicles and the like.
Fig. 1 is a flowchart of a 2D vision and 3D point cloud post-fusion sensing method according to an embodiment of the present invention, including the following steps:
S11: performing 2D visual segmentation on the picture acquired by the camera to obtain 2D multi-path segmented pictures, wherein the multi-path segmented pictures at least comprise: a multi-classification segmentation picture for obstacles in the scene, a dynamic/static binary classification picture for distinguishing the obstacles, and a lane line segmentation picture;
S12: performing first clustering on the 3D point cloud acquired by the lidar to obtain clustered 3D obstacle point coordinates;
S13: converting the 3D obstacle point coordinates into 3D obstacle two-dimensional coordinates and projecting them onto the 2D multi-path segmented pictures to obtain the post-fusion result of the 2D vision and the 3D point cloud, so as to at least
- perform edge precision optimization and second clustering of the 3D obstacle two-dimensional coordinates through the multi-classification segmented picture,
- perform speed correction at the pixel positions of the 3D obstacle two-dimensional coordinates through the dynamic/static binary classification picture,
- determine the 3D ground points belonging to lane lines in the 3D point cloud through the lane line segmentation picture.
In the present embodiment, the method can be applied to various types of mobile devices, for example an automated guided vehicle, an automatic floor-scrubbing vehicle or a large road sweeper, each equipped with a camera and a lidar. Any automatic driving system that uses a lidar and a camera as its main perception sensors can apply the method. It is assumed that the 3D point cloud obstacle clustering and tracking results and the 3D ground points have already been extracted; the overall scheme is shown in fig. 2.
For step S11, taking an autonomous vehicle as an example: during driving, the lidar and the camera mounted on the vehicle collect information about the road in real time, yielding time-aligned pictures and 3D point cloud data. The pictures contain the various road conditions and obstacles present during travel, including, for example, the road surface, road edges, fences, walls, road barriers, traffic signs, road markings, vehicles, pedestrians and riders.
The 2D visual segmentation performed on the pictures acquired by the camera in real time mainly produces a multi-classification segmentation picture for obstacles in the scene, a dynamic/static binary classification picture for distinguishing the obstacles, and a lane line segmentation picture (separating lane lines from the road surface), which together cover the various real situations on the road. In the driving environment there are also oncoming vehicles and pedestrians on the road, and the signs, roadblocks and the like on the road are likewise elements that fusion perception needs to focus on.
As an embodiment, the 2D visual segmentation of the picture acquired by the camera includes: performing 2D visual segmentation on the picture acquired by the camera using a 2D visual segmentation model, wherein the 2D visual segmentation model performs multi-classification scene segmentation, movable/static binary classification and lane line segmentation.
The 2D visual segmentation model is built with a deep learning neural network and comprises: an input layer, a feature extraction layer, branch-specific sub-module layers, output layers and training heads;
the input layer is used for receiving the pictures acquired by the camera;
the feature extraction layer is used for extracting road-related features from the pictures;
the branch-specific sub-module layers include a scene-branch-specific sub-module layer and a lane-line-segmentation-specific sub-module layer, used for performing the corresponding segmentation operations on the features according to the respective sub-module layer;
the output layers include the multi-classification segmentation output, the binary classification segmentation output and the lane line segmentation output, used for outputting the segmentation results of the different functions;
the training heads include the multi-classification segmentation training head, the binary classification segmentation training head and the lane line segmentation training head, corresponding to the different segmentation training tasks.
In this embodiment, a 2D visual segmentation model with a multi-branch segmentation network can be built using deep learning. Specifically, the segmentation comprises a multi-classification scene segmentation branch, a movable/static binary classification branch and a lane line segmentation branch. The network structure comprises an input layer, a feature extraction layer, branch-specific sub-module layers, output layers and training heads. The branch-specific sub-module layers comprise a scene-branch-specific sub-module layer and a lane-line-segmentation-specific sub-module layer. Different branches rely on different features, such as obstacle colors (tree colors are usually brown and green), textures (e.g. the smooth texture of a signboard) and context associations (the same obstacle across adjacent consecutive frames); after features are extracted from the whole picture, they need to be processed according to the function of each branch, for instance with different channel feature fusion modes and different sampling modes; in particular, the scene-branch-specific sub-module layer can process signboards in a targeted way. The output layers comprise the multi-classification segmentation output, the binary classification segmentation output and the lane line segmentation output, corresponding to the different segmentation functions. The training heads comprise the multi-classification segmentation training head, the binary classification segmentation training head and the lane line segmentation training head, corresponding to the different training tasks. A specific network structure is shown in fig. 3.
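A minimal PyTorch sketch of such a shared-encoder, multi-head network is given below for orientation only; the layer widths, class counts and module names are illustrative assumptions and are not taken from the patent or from fig. 3.

```python
# Sketch of a multi-branch segmentation network: one shared feature extractor,
# branch-specific sub-modules, and three output heads (scene multi-class,
# movable/static binary, lane line). All sizes are placeholder assumptions.
import torch
import torch.nn as nn


class MultiBranchSegNet(nn.Module):
    def __init__(self, num_scene_classes: int = 10, num_lane_classes: int = 3):
        super().__init__()
        # Shared feature extraction layer (encoder), downsampling by 4.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # Branch-specific sub-module layers: scene branches vs. lane-line branch.
        self.scene_branch = nn.Sequential(
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True))
        self.lane_branch = nn.Sequential(
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(inplace=True))
        # Output layers: multi-class scene, movable/static binary, lane line.
        self.scene_head = nn.Conv2d(64, num_scene_classes, 1)
        self.motion_head = nn.Conv2d(64, 2, 1)
        self.lane_head = nn.Conv2d(64, num_lane_classes, 1)

    def forward(self, x: torch.Tensor) -> dict:
        feat = self.encoder(x)
        scene_feat = self.scene_branch(feat)
        lane_feat = self.lane_branch(feat)
        return {
            "scene": self.scene_head(scene_feat),    # multi-class logits
            "motion": self.motion_head(scene_feat),  # movable/static logits
            "lane": self.lane_head(lane_feat),       # lane-line logits
        }


if __name__ == "__main__":
    net = MultiBranchSegNet()
    out = net(torch.randn(1, 3, 320, 640))  # 640x320 input, as in the text
    print({k: v.shape for k, v in out.items()})
```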
As an embodiment, the training of the 2D visual segmentation model includes: performing different types of segmentation labeling on a 2D training picture data set to obtain a training set and a validation set, wherein the different types comprise the visual multi-classification segmentation categories of obstacles and the lane line categories;
and inputting the training set, after cropping and scaling pre-processing, into the 2D visual segmentation model for training, wherein the deep learning framework of the 2D visual segmentation model may be PyTorch, Caffe or TensorFlow.
In the present embodiment, different types of segmentation labels are applied to a 2D training picture data set prepared in advance. The labeling can be done by professional annotators, producing a training set and a validation set. The specific categories include the visual multi-classification segmentation categories and the lane line categories: the visual multi-classification segmentation categories include road surface, road edge, fence, wall, roadblock, traffic sign, road warning board, vehicle, pedestrian, rider and so on, and the lane line categories include left, right and so on; neither set of categories is particularly limited.
The training set comprises input pictures and the corresponding reference segmentation results; the 2D training picture data set can be pre-processed before training, for example by cropping and scaling.
The 640x320 pictures of the training set are input into the 2D visual segmentation model to obtain predicted segmentation results, comprising a scene multi-classification segmentation result picture, a movable/static binary classification segmentation result picture and a lane line segmentation result picture. The 2D visual segmentation model is trained using the errors between the reference segmentation results and the predicted segmentation results, and the trained model is then tested on the validation set so that its predicted segmentation results approach the reference segmentation results, completing the training. Training can use a mainstream deep learning framework such as PyTorch, Caffe or TensorFlow; for example, the method may use the PyTorch framework.
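For orientation, a hedged sketch of one training step with a per-head cross-entropy loss follows; it reuses the hypothetical MultiBranchSegNet above, and the equal loss weighting and optimizer handling are assumptions for illustration, not the patent's training recipe.

```python
# Sketch of a single multi-task training step: one cross-entropy loss per head,
# summed with equal (assumed) weights.
import torch
import torch.nn as nn
import torch.nn.functional as F


def training_step(net: nn.Module, optimizer: torch.optim.Optimizer,
                  image: torch.Tensor, scene_gt: torch.Tensor,
                  motion_gt: torch.Tensor, lane_gt: torch.Tensor) -> dict:
    """image: (B, 3, 320, 640); *_gt: (B, 320, 640) integer label maps."""
    out = net(image)
    losses = {}
    for name, gt in (("scene", scene_gt), ("motion", motion_gt), ("lane", lane_gt)):
        # Upsample each head's logits to the label resolution before the loss.
        logits = F.interpolate(out[name], size=gt.shape[-2:],
                               mode="bilinear", align_corners=False)
        losses[name] = F.cross_entropy(logits, gt)
    total = losses["scene"] + losses["motion"] + losses["lane"]
    optimizer.zero_grad()
    total.backward()
    optimizer.step()
    return {k: float(v) for k, v in losses.items()}
```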
As an embodiment, after the obtaining the 2D multi-divided picture, the method further includes:
and carrying out post-processing of pixel filling and picture size adjustment on the 2D multi-path divided pictures.
In the present embodiment, the post-processing of the segmentation result pictures is as follows: since the original picture is 1280x720, the 640x320 result picture is scaled to 1280x640 by upsampling, and the top of the picture is then filled with black pixels, so that the three output result pictures (i.e., the scene multi-classification segmentation result picture, the movable/static binary classification segmentation result picture and the lane line segmentation result picture) are all restored to the original picture size. Note that this post-processing step is not mandatory: the method was tested with bilinear interpolation, nearest-neighbour interpolation and custom interpolation upsampling, and whether to perform the step is determined by actual requirements, since abnormal interpolated pixels or jagged edges may appear at the boundaries of segmented targets.
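When the post-processing is applied, it amounts to the following sketch (OpenCV-based; the nearest-neighbour choice and the helper name are illustrative assumptions, not the patent's prescribed implementation):

```python
# Sketch of the optional post-processing: upsample the 640x320 result to
# 1280x640, then pad the top with black pixels back to the 1280x720 original.
import cv2
import numpy as np


def restore_to_original(seg: np.ndarray, orig_w: int = 1280,
                        orig_h: int = 720) -> np.ndarray:
    """seg: 320x640 label map (H x W). Returns an orig_h x orig_w label map."""
    scale = orig_w // seg.shape[1]             # 1280 / 640 = 2
    up_h, up_w = seg.shape[0] * scale, orig_w  # 640 x 1280
    # Nearest-neighbour keeps class ids intact (no interpolated labels).
    up = cv2.resize(seg, (up_w, up_h), interpolation=cv2.INTER_NEAREST)
    pad_top = orig_h - up_h                    # 80 rows of black pixels on top
    return cv2.copyMakeBorder(up, pad_top, 0, 0, 0,
                              cv2.BORDER_CONSTANT, value=0)
```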
For step S12, while the picture is being processed, the first clustering of obstacles is performed on the 3D point cloud acquired by the lidar, producing the preliminarily clustered 3D obstacle point coordinates.
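The patent does not fix a particular clustering algorithm; the sketch below uses DBSCAN from scikit-learn as an illustrative stand-in for the first (Euclidean) clustering, with placeholder parameters.

```python
# Sketch of the first clustering of the non-ground point cloud; DBSCAN and its
# eps/min_samples values are assumptions standing in for the actual 3D pipeline.
import numpy as np
from sklearn.cluster import DBSCAN


def first_clustering(points_xyz: np.ndarray, eps: float = 0.5,
                     min_pts: int = 5) -> list:
    """points_xyz: (N, 3) lidar points with ground already removed."""
    labels = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(points_xyz)
    # Each entry is the 3D obstacle point coordinates of one cluster; label -1 is noise.
    return [points_xyz[labels == k] for k in np.unique(labels) if k >= 0]
```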
In step S13, since the 3D obstacle point coordinates are three-dimensional and cannot be placed directly into the 2D multi-path segmented pictures, they must first be converted to two dimensions.
As an implementation, converting the 3D obstacle point coordinates into 3D obstacle two-dimensional coordinates and projecting them onto the 2D multi-path segmented pictures includes:
performing 2D picture coordinate conversion on the 3D obstacle point coordinates using the extrinsic calibration between the lidar and the camera and the intrinsic parameters of the camera, to obtain the 3D obstacle two-dimensional coordinates.
Specifically, the conversion from point cloud coordinates to image pixel coordinates has two steps:
1. convert the point cloud coordinates to camera coordinates (using the lidar-camera extrinsic parameters); the camera coordinates are still 3D and become 2D only after conversion to pixel coordinates;
2. convert the camera coordinates to image pixel coordinates (using the camera's own intrinsic parameters).
Through these two steps, the point cloud coordinates are converted to pixel coordinates.
In the present embodiment, since the mobile device carries both the lidar and the camera, the lidar-camera extrinsic calibration and the camera intrinsic parameters are recorded when the sensors are installed. Specifically, the 3D obstacle point coordinates of the point cloud are converted into the camera coordinate system according to the extrinsic rotation matrix R and translation matrix T from the point cloud to the camera, and then converted from the camera coordinate system into the image coordinate system according to the camera's intrinsic projection matrix K and image distortion matrix D, giving the positions of the 3D obstacle points in the image.
The coordinate conversion from the 3D obstacle point coordinates of the point cloud to image pixel coordinates is calculated as follows.
The lidar coordinates are (x1, y1, z1) and the camera coordinates are (x2, y2, z2). The camera coordinates are first normalized onto the image plane as [x2′, y2′], with x2′ = x2/z2 and y2′ = y2/z2. Distortion is then applied; the distortion vector D comprises 5 parameters k1, k2, k3, p1, p2, where k1, k2, k3 are the radial distortion coefficients and p1, p2 are the tangential distortion coefficients. The distorted terms are computed as:

u′ = x2′·(1 + k1·r² + k2·r⁴ + k3·r⁶)
v′ = y2′·(1 + k1·r² + k2·r⁴ + k3·r⁶)
u″ = 2·p1·x2′·y2′ + p2·(r² + 2·x2′²)
v″ = p1·(r² + 2·y2′²) + 2·p2·x2′·y2′

where r² = x2′² + y2′², r⁴ = r²·r², r⁶ = r⁴·r².
Finally, the pixel values (pixel coordinates) u, v are calculated based on the intrinsic matrix K.
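The two-step conversion above corresponds to the following sketch, where R and T are the lidar-to-camera extrinsic rotation and translation, K the camera intrinsic matrix and D = (k1, k2, k3, p1, p2) the distortion coefficients; the helper name and array shapes are illustrative assumptions.

```python
# Sketch of the point-to-pixel projection using extrinsics, distortion and intrinsics.
import numpy as np


def lidar_to_pixel(pts_lidar: np.ndarray, R: np.ndarray, T: np.ndarray,
                   K: np.ndarray, D) -> np.ndarray:
    """pts_lidar: (N, 3) points in front of the camera. Returns (N, 2) pixels (u, v)."""
    k1, k2, k3, p1, p2 = D
    # 1. lidar -> camera coordinates (still 3D): p_cam = R * p_lidar + T.
    pts_cam = pts_lidar @ R.T + T
    x2p = pts_cam[:, 0] / pts_cam[:, 2]
    y2p = pts_cam[:, 1] / pts_cam[:, 2]
    # 2. radial (k1, k2, k3) and tangential (p1, p2) distortion, as in the formulas above.
    r2 = x2p ** 2 + y2p ** 2
    radial = 1 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    xd = x2p * radial + 2 * p1 * x2p * y2p + p2 * (r2 + 2 * x2p ** 2)
    yd = y2p * radial + p1 * (r2 + 2 * y2p ** 2) + 2 * p2 * x2p * y2p
    # 3. camera -> pixel coordinates through the intrinsic matrix K.
    u = K[0, 0] * xd + K[0, 2]
    v = K[1, 1] * yd + K[1, 2]
    return np.stack([u, v], axis=1)
```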
after the 3D obstacle is projected to the 2D multi-path divided pictures, corresponding post-fusion sensing is carried out by utilizing the multi-path output result (multi-classification divided pictures for the obstacle in the scene, dynamic/static classification pictures for distinguishing the obstacle and lane line divided pictures) of the 2D visual divided model.
As an embodiment, the edge precision optimization and second clustering of the 3D obstacle two-dimensional coordinates through the multi-classification segmented picture include:
optimizing the obstacle category and obstacle segmentation result determined from the 3D obstacle two-dimensional coordinates, based on the two-dimensional coordinates of each obstacle in the multi-classification segmented picture, and optimizing the assignment of edge points of the 3D obstacle two-dimensional coordinates using the distance/position characteristics of the 3D obstacle point coordinates, so as to perform the second clustering.
In the present embodiment, the 3D obstacle two-dimensional coordinates obtained from the first clustering of the 3D point cloud are clustered a second time. Before the second clustering, the 3D obstacle point coordinates of the first clusters are converted onto the 2D multi-path segmented pictures using the lidar-camera extrinsic calibration and the camera intrinsic parameters, giving the 3D obstacle two-dimensional coordinates. The visual region and category information to which each point belongs are then determined from the position of each pixel in the 3D obstacle two-dimensional coordinates and the position of each obstacle pixel in the multi-classification segmented picture obtained in step S11. (For example, it is checked whether the visual regions are the same: after projection, some 3D obstacle coordinates may fall into other regions because of viewing-angle and similar problems, and comparing pixel positions determines whether the region is the same, while the pixel class indicates, for instance, that the region is the same signboard in the visual multi-classification segmentation.)
On this basis, first, the segmentation edge precision of the 3D obstacle two-dimensional coordinates is optimized based on the visual segmentation result (i.e., the two-dimensional coordinates of each obstacle in the multi-classification segmented picture), which avoids errors in the 3D point cloud obstacle category as well as under-segmentation and over-segmentation of 3D point cloud obstacles. Second, because the alignment of the 2D picture (the multi-path segmented pictures obtained by 2D visual segmentation) and the 3D projection picture (the picture after two-dimensional projection of the 3D obstacles) may contain errors, the category of obstacle edge points is ambiguous; the final secondary clustering result of the 3D point cloud is obtained from the point cloud's own clustering result together with the distance/position characteristics of the points (the 3D point cloud provides accurate 3D position information and reflects the spatial relationships of obstacles in the picture). In this method, the point cloud's own clustering result is already known from step S12; the obstacle clustering preferentially recalls all targets at the cost of precision, and the precision is then optimized with the visual segmentation result, realizing the secondary clustering of the point cloud. A sketch of this edge refinement is given below.
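The sketch below illustrates one way such refinement could work; the majority-vote rule, the centroid-distance test and the threshold are assumptions for illustration, not the exact strategy of the embodiment.

```python
# Sketch of edge refinement for one cluster: points whose projected pixel class
# disagrees with the cluster's majority class are kept or split off based on
# their 3D distance to the cluster centre.
import numpy as np


def refine_cluster(cluster_xyz: np.ndarray, cluster_uv: np.ndarray,
                   class_map: np.ndarray, dist_thresh: float = 1.0):
    """cluster_xyz: (N, 3) points of one first-stage cluster;
    cluster_uv: (N, 2) integer pixel coords already inside the image;
    class_map: (H, W) multi-class segmentation labels (non-negative ints)."""
    pix_cls = class_map[cluster_uv[:, 1], cluster_uv[:, 0]]
    # The cluster category is taken as the majority class of its projected points.
    main_cls = np.bincount(pix_cls).argmax()
    centroid = cluster_xyz.mean(axis=0)
    keep = np.ones(len(cluster_xyz), dtype=bool)
    for i, c in enumerate(pix_cls):
        if c != main_cls:
            # Ambiguous edge point: fall back on its 3D distance to the cluster centre.
            keep[i] = np.linalg.norm(cluster_xyz[i] - centroid) < dist_thresh
    # Returns the refined cluster, its class, and the points split off from it.
    return cluster_xyz[keep], int(main_cls), cluster_xyz[~keep]
```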
As an embodiment, the speed correction at the pixel positions of the 3D obstacle two-dimensional coordinates through the dynamic/static binary classification picture includes:
setting to 0 the speed of any static 3D obstacle two-dimensional coordinates in the 3D obstacle two-dimensional coordinates that have been assigned a non-zero speed, based on the dynamic/static binary classification picture.
In this embodiment, before the speed correction, the 3D obstacle point coordinates of the first clusters are converted onto the 2D multi-path segmented pictures using the lidar-camera extrinsic calibration and the camera intrinsic parameters, giving the 3D obstacle two-dimensional coordinates. The visual region and the movable/static category to which the projected 3D obstacle two-dimensional coordinates belong in the multi-path segmented pictures are then determined from the pixel positions after projection and the pixel labels given by the 2D image segmentation. First, the speed of a 3D obstacle two-dimensional coordinate point is defined as the speed of the obstacle it belongs to; taking the dynamic/static binary classification picture for distinguishing obstacles as the reference, static 3D points that carry a non-zero speed have their speed set to 0 (in real scenes many irregular obstacles are wrongly assigned a speed, which the fusion otherwise cannot handle). Second, because the alignment of the 2D picture and the 3D projection picture may contain errors, the speed of a 3D obstacle is taken to be the speed supported by the majority of its interior points. The specific process of the secondary clustering and speed repair of the 3D point cloud (the 3D obstacle two-dimensional coordinates) is shown in fig. 4, and a sketch of the speed repair follows.
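A minimal sketch of this repair follows; the 0.5 majority threshold and the array layout are assumptions for illustration.

```python
# Sketch of the speed repair for one clustered obstacle: if most of its projected
# points fall on "static" pixels, any assigned speed is reset to 0.
import numpy as np


def correct_speed(obstacle_uv: np.ndarray, obstacle_speed: float,
                  motion_map: np.ndarray) -> float:
    """obstacle_uv: (N, 2) integer pixel coords of one clustered obstacle;
    motion_map: (H, W) with 1 = movable, 0 = static."""
    movable = motion_map[obstacle_uv[:, 1], obstacle_uv[:, 0]]
    # Majority of points on static pixels: a non-zero speed is considered spurious.
    if movable.mean() < 0.5:
        return 0.0
    return obstacle_speed
```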
For the extraction of the 3D lane line point cloud, the ground point coordinates in the 3D point cloud are converted onto the 2D visual lane line segmentation result picture using the lidar-camera extrinsic calibration and the camera intrinsic parameters. The 3D ground points belonging to lane lines, and the lane line category corresponding to each point, are determined from the pixel positions after projection of the point cloud and the lane line pixel positions given by the image segmentation. The 3D lane line point cloud extraction is shown in fig. 5, and a sketch follows.
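The sketch reuses the hypothetical lidar_to_pixel helper from the projection sketch above; the lane class encoding of the segmentation map is an assumption for illustration.

```python
# Sketch of 3D lane-line point extraction: project the 3D ground points into the
# lane-line segmentation result and keep those landing on lane-line pixels,
# together with the lane class at that pixel.
import numpy as np


def extract_lane_points(ground_xyz: np.ndarray, lane_map: np.ndarray,
                        R: np.ndarray, T: np.ndarray, K: np.ndarray, D):
    """ground_xyz: (N, 3) ground points; lane_map: (H, W) with 0 = background
    and other values = lane-line class (e.g. 1 = left, 2 = right)."""
    uv = np.round(lidar_to_pixel(ground_xyz, R, T, K, D)).astype(int)
    h, w = lane_map.shape
    inside = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    cls = np.zeros(len(ground_xyz), dtype=int)
    cls[inside] = lane_map[uv[inside, 1], uv[inside, 0]]
    keep = cls > 0
    return ground_xyz[keep], cls[keep]  # 3D lane points and their lane classes
```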
Performing the secondary clustering and speed correction on the 3D obstacle two-dimensional coordinates of the 3D point cloud and extracting the 3D lane line points yields the perception output after multi-sensor fusion.
Because obstacles sampled in the picture may show abnormal pixels or jagged edges, which would affect the fusion of the 2D vision and the 3D point cloud, the method does not rely on the post-processing step of the 2D visual segmentation model; instead, the 3D points are projected onto the picture after the same cropping and scaling used for the 2D training pictures, producing a pixel-level correspondence with the 2D segmentation result. The errors caused by the space-time synchronization and extrinsic calibration that are difficult to align perfectly between multiple sensors can be adaptively adjusted, the error-free RGB channel features are fed into the deep learning network, and the output result is more accurate; multiple network models are combined into a single multi-branch network model, which reduces resource consumption and the abnormal time consumption caused by resource preemption. Compared with most existing lidar-camera post-fusion perception schemes, the method can extract almost all effective information in the automatic driving scene, such as road surfaces, road edges, irregular obstacles, road users and lane lines, can acquire the effective information required by the automatic driving scene more comprehensively and accurately, and improves the fusion perception effect as a whole.
Fig. 6 is a schematic structural diagram of a 2D vision and 3D point cloud post-fusion sensing execution device according to an embodiment of the present invention, where the execution device may execute the 2D vision and 3D point cloud post-fusion sensing method according to any of the foregoing embodiments and is configured in a terminal.
The 2D vision and 3D point cloud post-fusion perception executing device 10 provided in this embodiment includes: the system comprises a picture segmentation module 11, a clustering module 12 and a fusion module 13.
The picture segmentation module 11 is configured to perform 2D visual segmentation on the picture acquired by the camera to obtain 2D multi-path segmented pictures, where the multi-path segmented pictures at least include: a multi-classification segmentation picture for obstacles in the scene, a dynamic/static binary classification picture for distinguishing the obstacles, and a lane line segmentation picture. The clustering module 12 is configured to perform first clustering on the 3D point cloud acquired by the lidar to obtain clustered 3D obstacle point coordinates. The fusion module 13 is configured to convert the 3D obstacle point coordinates into 3D obstacle two-dimensional coordinates and project them onto the 2D multi-path segmented pictures to obtain the post-fusion result of the 2D vision and the 3D point cloud, so as to at least perform edge precision optimization and second clustering of the 3D obstacle two-dimensional coordinates through the multi-classification segmented picture, perform speed correction at the pixel positions of the 3D obstacle two-dimensional coordinates through the dynamic/static binary classification picture, and determine the 3D ground points belonging to lane lines in the 3D point cloud through the lane line segmentation picture.
The embodiment of the invention also provides a nonvolatile computer storage medium, wherein the computer storage medium stores computer executable instructions, and the computer executable instructions can execute the 2D vision and 3D point cloud post-fusion perception method in any method embodiment;
as one embodiment, the non-volatile computer storage medium of the present invention stores computer-executable instructions configured to:
performing 2D visual segmentation on the picture acquired by the camera to obtain 2D multi-path segmented pictures, wherein the multi-path segmented pictures at least comprise: a multi-classification segmentation picture for obstacles in the scene, a dynamic/static binary classification picture for distinguishing the obstacles, and a lane line segmentation picture;
performing first clustering on the 3D point cloud acquired by the lidar to obtain clustered 3D obstacle point coordinates;
converting the 3D obstacle point coordinates into 3D obstacle two-dimensional coordinates and projecting them onto the 2D multi-path segmented pictures to obtain the post-fusion result of the 2D vision and the 3D point cloud, so as to at least
- perform edge precision optimization and second clustering of the 3D obstacle two-dimensional coordinates through the multi-classification segmented picture,
- perform speed correction at the pixel positions of the 3D obstacle two-dimensional coordinates through the dynamic/static binary classification picture,
- determine the 3D ground points belonging to lane lines in the 3D point cloud through the lane line segmentation picture.
As a non-volatile computer readable storage medium, it may be used to store a non-volatile software program, a non-volatile computer executable program, and modules, such as program instructions/modules corresponding to the methods in the embodiments of the present invention. One or more program instructions are stored in a non-transitory computer readable storage medium that, when executed by a processor, perform the 2D vision and 3D point cloud post fusion awareness method of any of the method embodiments described above.
The embodiment of the invention also provides electronic equipment, which comprises: the system comprises at least one processor and a memory communicatively connected with the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a 2D vision and 3D point cloud post fusion awareness method.
In some embodiments, the present disclosure further provides a mobile device, including a body and the electronic apparatus according to any one of the foregoing embodiments mounted on the body. The mobile device may be an unmanned vehicle, such as an unmanned sweeper, an unmanned ground washing vehicle, an unmanned logistics vehicle, an unmanned passenger vehicle, an unmanned sanitation vehicle, an unmanned trolley/bus, a truck, a mine car, etc., or may be a robot, etc.
In some embodiments, the present invention further provides a computer program product, which when run on a computer, causes the computer to perform the 2D vision and 3D point cloud post fusion awareness method according to any one of the embodiments of the present invention.
Fig. 7 is a schematic hardware structure diagram of an electronic device of a 2D vision and 3D point cloud post-fusion perception method according to another embodiment of the present application, and as shown in fig. 7, the device includes:
one or more processors 710, and a memory 720, one processor 710 being illustrated in fig. 7. The device of the 2D vision and 3D point cloud post-fusion perception method may further include: an input device 730 and an output device 740.
Processor 710, memory 720, input device 730, and output device 740 may be connected by a bus or other means, for example in fig. 7.
The memory 720 is used as a non-volatile computer readable storage medium, and can be used to store non-volatile software programs, non-volatile computer executable programs, and modules, such as program instructions/modules corresponding to the 2D vision and 3D point cloud post-fusion awareness method in the embodiments of the present application. The processor 710 executes various functional applications and data processing of the server by running non-volatile software programs, instructions and modules stored in the memory 720, i.e. implementing the above-described method embodiment 2D vision and 3D point cloud post fusion awareness method.
Memory 720 may include a storage program area that may store an operating system, at least one application program required for functionality, and a storage data area; the storage data area may store data, etc. In addition, memory 720 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, memory 720 may optionally include memory located remotely from processor 710, which may be connected to the mobile device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 730 may receive input numerical or character information. The output device 740 may include a display device such as a display screen.
The one or more modules are stored in the memory 720 that, when executed by the one or more processors 710, perform the 2D vision and 3D point cloud post fusion awareness method of any of the method embodiments described above.
The product can execute the method provided by the embodiment of the application, and has the corresponding functional modules and beneficial effects of the execution method. Technical details not described in detail in this embodiment may be found in the methods provided in the embodiments of the present application.
The non-transitory computer readable storage medium may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the device, etc. Further, the non-volatile computer-readable storage medium may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the non-transitory computer readable storage medium may optionally include memory remotely located relative to the processor, which may be connected to the apparatus via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The embodiment of the invention also provides electronic equipment, which comprises: the system comprises at least one processor and a memory communicatively connected with the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the 2D vision and 3D point cloud post fusion awareness method of any of the embodiments of the present invention.
The electronic device of the embodiments of the present application exist in a variety of forms including, but not limited to:
(1) Mobile communication devices, which are characterized by mobile communication functionality and are aimed at providing voice, data communication. Such terminals include smart phones, multimedia phones, functional phones, low-end phones, and the like.
(2) Ultra mobile personal computer equipment, which belongs to the category of personal computers, has the functions of calculation and processing and generally has the characteristic of mobile internet surfing. Such terminals include PDA, MID, and UMPC devices, etc., such as tablet computers.
(3) Portable entertainment devices such devices can display and play multimedia content. The device comprises an audio player, a video player, a palm game machine, an electronic book, an intelligent toy and a portable vehicle navigation device.
(4) Other mobile devices having data processing capabilities.
In this document, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action, without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article or apparatus that comprises a list of elements includes not only those elements but also other elements not expressly listed or inherent to such process, method, article or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises the element.
The apparatus embodiments described above are merely illustrative, wherein the elements illustrated as separate elements may or may not be physically separate, and the elements shown as elements may or may not be physical elements, may be located in one place, or may be distributed over a plurality of network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art will understand and implement the present invention without undue burden.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (12)

1. A2D vision and 3D point cloud post-fusion perception method is applied to a mobile device loaded with a camera and a laser radar, and comprises the following steps:
performing 2D visual segmentation on the picture acquired by the camera to obtain 2D multi-path segmented pictures, wherein the multi-path segmented pictures at least comprise: a multi-classification segmentation picture for obstacles in the scene, a dynamic/static binary classification picture for distinguishing the obstacles, and a lane line segmentation picture;
performing first clustering on the 3D point cloud acquired by the lidar to obtain clustered 3D obstacle point coordinates;
converting the 3D obstacle point coordinates into 3D obstacle two-dimensional coordinates and projecting them onto the 2D multi-path segmented pictures to obtain the post-fusion result of the 2D vision and the 3D point cloud, so as to at least
- perform edge precision optimization and second clustering of the 3D obstacle two-dimensional coordinates through the multi-classification segmented picture,
- perform speed correction at the pixel positions of the 3D obstacle two-dimensional coordinates through the dynamic/static binary classification picture,
- determine the 3D ground points belonging to lane lines in the 3D point cloud through the lane line segmentation picture.
2. The method of claim 1, wherein the edge precision optimization and second clustering of the 3D obstacle two-dimensional coordinates through the multi-classification segmented picture comprise:
optimizing the obstacle category and obstacle segmentation result determined from the 3D obstacle two-dimensional coordinates, based on the two-dimensional coordinates of each obstacle in the multi-classification segmented picture, and optimizing the assignment of edge points of the 3D obstacle two-dimensional coordinates using the distance/position characteristics of the 3D obstacle point coordinates, so as to perform the second clustering.
3. The method of claim 1, wherein the speed correction at the pixel positions of the 3D obstacle two-dimensional coordinates through the dynamic/static binary classification picture comprises:
setting to 0 the speed of any static 3D obstacle two-dimensional coordinates in the 3D obstacle two-dimensional coordinates that have been assigned a non-zero speed, based on the dynamic/static binary classification picture.
4. The method of claim 1, wherein converting the 3D obstacle point coordinates into 3D obstacle two-dimensional coordinates and projecting them onto the 2D multi-path segmented pictures comprises:
performing 2D picture coordinate conversion on the 3D obstacle point coordinates using the extrinsic calibration between the lidar and the camera and the intrinsic parameters of the camera, to obtain the 3D obstacle two-dimensional coordinates.
5. The method of claim 1, wherein the 2D visual segmentation of the picture acquired by the camera comprises: performing 2D visual segmentation on the picture acquired by the camera using a 2D visual segmentation model, wherein the 2D visual segmentation model performs multi-classification scene segmentation, movable/static binary classification and lane line segmentation.
6. The method of claim 5, wherein the 2D visual segmentation model is built with a deep learning neural network and comprises: an input layer, a feature extraction layer, branch-specific sub-module layers, output layers and training heads;
the input layer is used for receiving pictures acquired by the camera;
the feature extraction layer is used for extracting features related to roads in the pictures;
the branch-specific sub-module layer includes: the scene branch network special sub-module layer and the lane line segmentation network special sub-module layer are used for carrying out corresponding segmentation operation on the characteristics according to different sub-module layers;
The output layer includes: the multi-classification segmentation output, the two-classification segmentation output and the lane line segmentation output are used for outputting segmentation results with different functions;
the training head includes: the multi-classification segmentation training head, the two-classification segmentation training head and the lane line segmentation training head are used for corresponding to different segmentation training tasks.
7. The method of claim 6, wherein the training of the 2D visual segmentation model comprises:
and carrying out different types of segmentation labeling on the 2D training picture data set to obtain a training set and a verification set, wherein the different types comprise: visual multi-classification segmentation category and lane line category of the obstacle;
inputting the training set after clipping and scaling pretreatment into the 2D visual segmentation model for training, wherein a deep learning framework of the 2D visual segmentation model comprises: pytorch, caffe, tensorflow.
8. The method of claim 1, wherein after the obtaining the 2D multi-divided picture, the method further comprises:
and carrying out post-processing of pixel filling and picture size adjustment on the 2D multi-path divided pictures.
9. A 2D vision and 3D point cloud post-fusion awareness execution device, comprising:
The image segmentation module is used for carrying out 2D visual segmentation on the image acquired by the camera to obtain a 2D multi-path segmented image, wherein the multi-path segmented image at least comprises: the system comprises a multi-classification segmentation picture for obstacles in a scene, a dynamic/static classification picture for distinguishing the obstacles and a lane line segmentation picture;
the clustering module is used for carrying out first clustering on the 3D point cloud acquired by the laser radar to obtain clustered 3D obstacle point coordinates;
the fusion module is used for converting the 3D obstacle point coordinates into 3D obstacle two-dimensional coordinates, projecting the 3D obstacle two-dimensional coordinates to the 2D multi-path divided pictures to obtain a fusion result after 2D vision and 3D point cloud so as to at least
Edge precision optimization and second clustering of the 3D obstacle two-dimensional coordinates by the multi-class segmented picture,
-velocity correction of the pixel position of the 3D obstacle two-dimensional coordinates by the moving/static binary pictures,
-determining 3D ground points belonging to a lane line in the 3D point cloud by the lane line segmentation picture.
10. An electronic device, comprising: at least one processor, and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the steps of the method of any one of claims 1-8.
11. A mobile device comprising a body and the electronic apparatus of claim 10 mounted on the body.
12. A storage medium having stored thereon a computer program, which when executed by a processor performs the steps of the method according to any of claims 1-8.
CN202211139894.3A 2022-09-19 2022-09-19 2D vision and 3D point cloud post-fusion sensing method, equipment and mobile device Pending CN117765503A (en)

Priority Applications (1)

Application Number: CN202211139894.3A; Priority Date: 2022-09-19; Filing Date: 2022-09-19; Title: 2D vision and 3D point cloud post-fusion sensing method, equipment and mobile device (CN117765503A, en)

Applications Claiming Priority (1)

Application Number: CN202211139894.3A; Priority Date: 2022-09-19; Filing Date: 2022-09-19; Title: 2D vision and 3D point cloud post-fusion sensing method, equipment and mobile device (CN117765503A, en)

Publications (1)

Publication Number: CN117765503A; Publication Date: 2024-03-26

Family

ID=90314876

Family Applications (1)

Application Number: CN202211139894.3A; Priority Date: 2022-09-19; Filing Date: 2022-09-19; Title: 2D vision and 3D point cloud post-fusion sensing method, equipment and mobile device (CN117765503A, en)

Country Status (1)

Country Link
CN (1) CN117765503A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination