CN117935209A - Obstacle detection method, device, equipment and storage medium

Obstacle detection method, device, equipment and storage medium

Info

Publication number
CN117935209A
Authority
CN
China
Prior art keywords
result
data
obstacle
target
detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311746615.4A
Other languages
Chinese (zh)
Inventor
钱承军
王宇
陈�光
郭昌野
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Faw Nanjing Technology Development Co ltd
FAW Group Corp
Original Assignee
Faw Nanjing Technology Development Co ltd
FAW Group Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Faw Nanjing Technology Development Co ltd, FAW Group Corp filed Critical Faw Nanjing Technology Development Co ltd
Priority to CN202311746615.4A
Publication of CN117935209A
Legal status: Pending


Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses an obstacle detection method, device, equipment and storage medium. The method comprises the following steps: obtaining perception data and preprocessing it to obtain voxel data; inputting the voxel data into a pre-trained target model to obtain a first detection result with 3D markers of obstacles and a semantic segmentation result in which foreground points and background points are distinguished; removing preset perception data from the first detection result and clustering the remaining perception data to obtain a result to be fused; fusing the result to be fused with the first detection result to obtain a result to be processed; and determining a target result containing a target obstacle based on a historical result generated before the result to be processed, and displaying the target obstacle distinctively in the target result. The scheme solves the problem of low obstacle detection accuracy and improves obstacle detection accuracy.

Description

Obstacle detection method, device, equipment and storage medium
Technical Field
The present invention relates to the field of autonomous driving technology, and in particular, to a method, apparatus, device, and storage medium for detecting an obstacle.
Background
In intelligent or autonomous driving, the vehicle acquires surrounding environment information, including traffic signs, road conditions, weather conditions and traffic flow, through various sensors mounted around the vehicle body, such as cameras, lidar or 4D millimeter-wave radar. Based on this information, it performs dynamic perception tasks such as ground extraction, obstacle detection, lane line and road boundary prediction, and drivable area detection, which support path planning and in turn enable adaptive cruise, automatic avoidance, automatic lane changing, ramp entry and exit, automatic parking, and the like. The validity and correctness of the perception information must therefore be ensured to guarantee driving safety.
With the rapid development of autonomous driving technology in recent years, obstacle perception has become both a key point and a difficulty of the technology, and is a precondition for realizing autonomous driving. However, related obstacle detection schemes perform detection based on conventional convolutional neural networks, which leads to missed detections and false detections. On the one hand, false detections can cause unnecessary braking and degrade the riding experience; on the other hand, missed detections may leave the autonomous vehicle unable to recognize a target, so neither the detection rate nor the classification accuracy of obstacle targets can be effectively guaranteed, potentially causing traffic accidents. In addition, during real-time driving the detection frame can jitter and deflect abnormally, abnormal obstacles such as overturned vehicles cannot be identified, and in special weather, such as rain or snow, the images or laser point clouds acquired by the sensors contain a large amount of noise, all of which result in low obstacle detection accuracy.
Disclosure of Invention
The invention provides an obstacle detection method, device, equipment and storage medium to improve obstacle detection accuracy.
According to an aspect of the present invention, there is provided a method of detecting an obstacle, the method comprising:
Obtaining perception data, preprocessing the perception data, and obtaining voxel data corresponding to the perception data, wherein the perception data comprises images or point cloud data;
Inputting the voxel data into a target model obtained by pre-training to obtain a first detection result of 3D marking of the obstacle and a semantic segmentation result, wherein foreground points and background points are distinguished in the semantic segmentation result;
Removing the preset perception data from the first detection result, and clustering the perception data remaining in the first detection result to obtain a result to be fused;
Obtaining a result to be processed by fusing the result to be fused with the first detection result;
And determining a target result comprising a target obstacle based on a historical result generated before the result to be processed, and displaying the target obstacle distinctively in the target result.
According to another aspect of the present invention, there is provided an obstacle detecting apparatus including:
The voxel data determining module is used for acquiring sensing data and preprocessing the sensing data to obtain voxel data corresponding to the sensing data, wherein the sensing data comprises image or point cloud data;
The model detection module is used for inputting the voxel data into a target model obtained through pre-training to obtain a first detection result of 3D marking of the obstacle and a semantic segmentation result, wherein foreground points and background points are distinguished in the semantic segmentation result;
The clustering module is used for removing the preset perception data from the first detection result, and clustering the perception data remaining in the first detection result to obtain a result to be fused;
the fusion module is used for obtaining a result to be processed by carrying out fusion processing on the result to be fused and the first detection result;
and the obstacle display module is used for determining a target result comprising a target obstacle based on the historical result generated before the result to be processed, and displaying the target obstacle distinctively in the target result.
According to another aspect of the present invention, there is provided an electronic device including:
At least one processor; and
A memory communicatively coupled to the at least one processor; wherein,
The memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the obstacle detection method of any one of the embodiments of the invention.
According to another aspect of the present invention, there is provided a computer-readable storage medium storing computer instructions for causing a processor to execute the obstacle detection method of any one of the embodiments of the present invention.
According to the technical scheme of the invention, perception data are obtained and preprocessed into corresponding voxel data, where the perception data comprise image or point cloud data; the voxel data are input into a pre-trained target model to obtain a first detection result with 3D markers of obstacles and a semantic segmentation result in which foreground points and background points are distinguished; the preset perception data in the first detection result are removed based on the semantic segmentation result, and the remaining perception data are clustered to obtain a result to be fused; a result to be processed is obtained by fusing the result to be fused with the first detection result; and a target result comprising a target obstacle is determined based on a historical result generated before the result to be processed, with the target obstacle displayed distinctively in the target result. This solves the problem of low obstacle detection accuracy and improves obstacle detection accuracy.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the invention or to delineate the scope of the invention. Other features of the present invention will become apparent from the description that follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of an obstacle detection method provided according to an embodiment of the present invention;
FIG. 2 is a flow chart of another obstacle detection method provided in accordance with an embodiment of the present invention;
FIG. 3 is a block diagram of a specific obstacle detection system provided in accordance with an embodiment of the invention;
FIG. 4 is a schematic diagram of obstacle detection based on an obstacle detection system according to an embodiment of the invention;
FIG. 5 is a schematic diagram of obstacle detection based on a target detection module according to an embodiment of the invention;
FIG. 6 is a schematic diagram of specific point cloud data provided according to an embodiment of the present invention;
FIG. 7 is a first detection result including a 3D marker in accordance with an embodiment of the present invention;
FIG. 8 is a particular panorama semantic segmentation result provided according to an embodiment of the present invention;
fig. 9 is a block diagram of a structure of an obstacle detecting apparatus provided according to an embodiment of the present invention;
fig. 10 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
It is noted that the terms "comprises" and "comprising," and any variations thereof, in the description and claims of the present invention and in the foregoing figures, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus.
Fig. 1 is a flowchart of an obstacle detection method according to an embodiment of the present invention, where the embodiment is applicable to a scene of obstacle detection based on perceived data, and the obstacle detection method may be performed by an obstacle detection device, where the obstacle detection device may be implemented in a form of hardware and/or software and configured in a processor of an electronic device.
As shown in fig. 1, the obstacle detection method includes the steps of:
S110, obtaining the perception data, and preprocessing the perception data to obtain voxel data corresponding to the perception data.
Wherein the perception data comprises image or point cloud data.
It will be appreciated that, for intelligent or autonomous driving, environmental perception is the first link: the system depends largely on perception data, including image or point cloud data, to convert information about the environment in which the vehicle is located into digital signals for data analysis and processing, thereby providing the basis for decision-making and control.
Voxel is short for volume element (also volume pixel) and is the smallest unit used for detecting an obstacle in three-dimensional space.
Optionally, preprocessing the perception data to obtain the corresponding voxel data may include: presetting a three-dimensional voxel grid, and mapping the perception data into the grid based on the coordinate transformation between the sensing device and the grid, thereby obtaining the voxel data corresponding to the perception data.
In this embodiment, obtaining the sensing data, and preprocessing the sensing data to obtain voxel data corresponding to the sensing data, including: acquiring point cloud data to be processed or images to be fused in the running process of the vehicle based on at least one sensing device arranged on the running vehicle; processing point cloud data to be processed and images to be fused, which are acquired at the same acquisition moment, based on deployment information of at least one sensing device to obtain fused point cloud data and fused images; and respectively carrying out voxel division on the fused point cloud data and the fused image according to a preset voxel division rule to obtain at least one group of voxel data.
It can be understood that in order to obtain multi-angle sensing data, a plurality of sensing devices can be arranged at each position of the vehicle, so that sensing capability can be enhanced, a richer scale transformation space can be provided, the integrity of environment information is ensured, and the accuracy of the environment information is improved.
The sensing device may comprise an image acquisition device and a radar device, for example, the image acquisition device may be a monocular camera or a binocular camera, the corresponding sensing data comprising image data; the radar device may be at least one of a lidar, a millimeter wave radar and an ultrasonic radar, and the corresponding perception data includes point cloud data.
In this embodiment, collecting the point cloud data to be processed or the images to be fused during driving, based on at least one sensing device arranged on the moving vehicle, includes: arranging at least one camera around the vehicle and, for each camera, perceiving the surrounding environment with the current camera while the vehicle is moving to obtain a current image as an image to be fused; and/or arranging at least one lidar around the vehicle and, for each lidar, perceiving the surrounding environment with the current lidar while the vehicle is moving to obtain current point cloud data as the point cloud data to be processed.
Wherein the deployment information comprises location information of the sensing device and/or location information of the sensing device relative to the vehicle, for example, the deployment information may be coordinates of the sensing device in a world coordinate system and/or coordinates in a vehicle coordinate system.
In this embodiment, for a plurality of sensing devices disposed at different positions, fusion processing may be performed on sensing data based on position information of the devices, to obtain fused point cloud data and fused images, including: determining a coordinate conversion relation between the sensing data based on the position information of each sensing device; based on the coordinate conversion relation, converting the sensing data into a coordinate system corresponding to one of the sensing data to obtain fused point cloud data and fused images; or determining a coordinate conversion relation between the sensing data and a vehicle coordinate system based on the position information of each sensing device; based on the coordinate conversion relation, all the sensing data are converted into a vehicle coordinate system to obtain fused point cloud data and fused images.
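As an illustration of the fusion step above, the following Python sketch (the function names and the 4x4 homogeneous extrinsic representation are assumptions for illustration, not the patented implementation) maps each sensor's point cloud into the vehicle coordinate system and concatenates the results:

```python
import numpy as np

def to_vehicle_frame(points, T_vehicle_from_sensor):
    """points: (N, 3) in the sensor frame; T_vehicle_from_sensor: 4x4 extrinsic."""
    homo = np.hstack([points, np.ones((points.shape[0], 1))])  # (N, 4) homogeneous
    return (homo @ T_vehicle_from_sensor.T)[:, :3]

def fuse_clouds(clouds_with_extrinsics):
    """clouds_with_extrinsics: iterable of (points, T) pairs, one per sensor."""
    return np.vstack([to_vehicle_frame(p, T) for p, T in clouds_with_extrinsics])
```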
The voxel division rules include at least one of the size, number, and position of voxels. Correspondingly, performing voxel division on the fused point cloud data and the fused image according to the preset voxel division rule to obtain at least one group of voxel data includes: presetting at least one of the size, number, and position of voxels, and dividing the fused point cloud data and the fused image accordingly so that the resulting voxel data satisfy the division rule. Illustratively, the size, number and position of voxels are preset, and the fused point cloud data and fused image are divided at the preset positions to obtain groups of voxel data meeting the preset voxel size and number. This has the advantage that the data become regular, reducing the difficulty of further feature extraction.
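A minimal voxelization sketch under assumed parameters (the voxel size, region-of-interest bounds and per-voxel point cap are illustrative, not values specified by the patent):

```python
import numpy as np

def voxelize(points, voxel_size=(0.2, 0.2, 0.4),
             roi_min=(-50.0, -50.0, -3.0), roi_max=(50.0, 50.0, 3.0),
             max_points_per_voxel=32):
    """points: (N, 3) point cloud in the vehicle (ego) frame."""
    roi_min, roi_max = np.asarray(roi_min), np.asarray(roi_max)
    voxel_size = np.asarray(voxel_size)

    # Keep only points inside the preset region of interest.
    mask = np.all((points >= roi_min) & (points < roi_max), axis=1)
    pts = points[mask]

    # Integer voxel coordinates for every remaining point.
    coords = np.floor((pts - roi_min) / voxel_size).astype(np.int32)

    # Group points by voxel coordinate, capping the number of points per voxel.
    voxels = {}
    for p, c in zip(pts, map(tuple, coords)):
        bucket = voxels.setdefault(c, [])
        if len(bucket) < max_points_per_voxel:
            bucket.append(p)
    return voxels  # {(ix, iy, iz): [points in that voxel]}
```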
S120, inputting the voxel data into a target model obtained through pre-training, and obtaining a first detection result and a semantic segmentation result of the 3D marker of the obstacle.
The semantic segmentation result distinguishes foreground points and background points. The foreground point is a point on an object, the background point is a point on a non-object, and each position (each pixel in an image or each point in a point cloud) in the semantic segmentation result has a corresponding category label, for example, the category label can be a vehicle, a pedestrian, a road, a building and the like.
The 3D mark is a 3D object frame corresponding to the obstacle, for example, the 3D mark may be a rectangular parallelepiped including all pixels or point clouds of the obstacle.
It can be understood that the first detection result realizes object-level detection of the environment, adding a corresponding 3D object frame as a marker. However, scenes such as autonomous driving require spatial information of higher accuracy than the 3D object frame alone can provide, so in addition to object detection, semantic segmentation is needed to determine the category label of each position in the scene (each pixel in an image, or each point in a point cloud).
Semantic segmentation includes point-based, grid-based and projection-based semantic segmentation. Note that for scenes such as autonomous driving the point cloud data volume is large, so point-based semantic segmentation cannot adapt to complex driving scenes (its learning ability or receptive field is insufficient); therefore, this embodiment uses grid-based or projection-based semantic segmentation.
It will be appreciated that before training the target model, a training sample set corresponding to the target model needs to be constructed, so that the target model can be trained on each training sample in the set to obtain the trained model.
In this embodiment, constructing a training sample set includes: acquiring a point cloud map, carrying out voxel division on the point cloud map, and determining voxels corresponding to each obstacle; extracting characteristics of point cloud data in all voxels corresponding to each obstacle to obtain point cloud characteristics; for each voxel, determining the category of the obstacle corresponding to the current voxel, and taking the category as a voxel label corresponding to the voxel, wherein all point clouds in the voxel correspond to the voxel label; meanwhile, adding a minimum-size 3D mark comprising the voxel to each voxel, further obtaining a minimum 3D mark comprising all voxels of the obstacle, and taking the minimum 3D mark as a 3D label of all point clouds of the obstacle; and taking the point cloud data, the voxel labels and the 3D labels as a training sample set to obtain the training sample set.
Further, for each voxel label, determining whether an obstacle corresponding to the current voxel label is a static object, taking point cloud data corresponding to the static object label as a background point, and taking point cloud data corresponding to the dynamic object label as a foreground point, so as to display the background point and the foreground point in a semantic segmentation image in a distinguishing way. Illustratively, in the semantically segmented image, the background points are displayed in a first color and/or a first shape and the foreground points are displayed in a second color and/or a second shape.
The point cloud features may be feature indicators associated with the number, density, volume, and standard deviation of the point cloud; for example, the point cloud features may be at least one of Fast Point Feature Histograms (FPFH), Signature of Histograms of Orientations (SHOT), Local Surface Patches, and Intrinsic Shape Signatures (ISS).
In this embodiment, training the target model based on each training sample in the training sample set includes: for each training sample, inputting the voxel data of the current sample into the target model to obtain a to-be-processed voxel label and a to-be-processed 3D label for each voxel, and determining a loss value by comparing them with the voxel label and 3D label of the current sample, so as to correct the model parameters based on the loss value; training continues with convergence of the loss function as the training target, yielding the trained target model.
Further, voxel data are input into a target model obtained through training in advance, and a first detection result comprising 3D marks corresponding to each obstacle and a semantic segmentation result for distinguishing a foreground point from a background point are obtained.
In this embodiment, the target model includes an encoder, a decoder, a semantic segmentation sub-model and a marking-frame sub-model. Inputting the voxel data into the pre-trained target model to obtain the first detection result with 3D markers of obstacles and the semantic segmentation result includes: performing feature extraction on the input voxel data with the encoder to obtain the voxel feature data corresponding to the voxel data; determining the target tensor data corresponding to the voxel feature data with the decoder; inputting the target tensor data into the semantic segmentation sub-model to obtain a semantic segmentation image; and inputting the target tensor data into the marking-frame sub-model to obtain a first detection result with 3D markers of obstacles.
The encoder maps each voxel in the voxel data to a low-dimensional encoding space, and the decoder maps vectors in that space back to the original data space. By minimizing the reconstruction error during training, the encoder learns a valid data representation. The semantic segmentation sub-model performs semantic segmentation on the voxel data based on the features output by the encoder, determining the category of each item of voxel data (point cloud data/image data). The marking-frame sub-model determines the 3D marker corresponding to each voxel and, from all 3D markers belonging to the same obstacle, determines a 3D marker enclosing all of that obstacle's voxel data, yielding the first detection result with 3D markers of obstacles.
Specifically, inputting the voxel data into the pre-trained target model to obtain the first detection result and the semantic segmentation result includes: inputting the voxel data into the encoder, which maps each voxel to a low-dimensional encoding space to obtain the feature data of each voxel, yielding the voxel feature data; inputting the voxel feature data into the decoder, which maps them from the low-dimensional encoding space back to the original data space and determines the tensor data corresponding to each voxel feature, yielding the target tensor data; inputting the target tensor data into the semantic segmentation sub-model, which determines the category of each voxel's tensor data and displays voxel data of different categories distinctively, yielding the semantic segmentation result; and, in parallel, inputting the target tensor data into the marking-frame sub-model, which determines the 3D marker for each voxel and, from all markers of the same obstacle, the minimal 3D marker enclosing all of that obstacle's voxel data, yielding the first detection result.
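A schematic PyTorch sketch of such a two-head model (layer sizes, channel counts and the seven-parameter box encoding are illustrative assumptions; the patent does not specify the network architecture):

```python
import torch
import torch.nn as nn

class TargetModel(nn.Module):
    def __init__(self, in_ch=4, feat_ch=64, num_classes=5, box_params=7):
        super().__init__()
        # Encoder: maps voxel features into a lower-dimensional code.
        self.encoder = nn.Sequential(
            nn.Conv3d(in_ch, feat_ch, 3, padding=1), nn.ReLU(),
            nn.Conv3d(feat_ch, feat_ch, 3, stride=2, padding=1), nn.ReLU())
        # Decoder: maps the code back toward the original resolution.
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(feat_ch, feat_ch, 4, stride=2, padding=1), nn.ReLU())
        # Semantic segmentation head: per-voxel class logits.
        self.seg_head = nn.Conv3d(feat_ch, num_classes, 1)
        # Marking-frame head: per-voxel 3D box parameters (x, y, z, l, w, h, yaw).
        self.box_head = nn.Conv3d(feat_ch, box_params, 1)

    def forward(self, voxels):
        dense = self.decoder(self.encoder(voxels))
        return self.box_head(dense), self.seg_head(dense)

# Both heads share the encoder-decoder trunk, as in the description above.
boxes, seg = TargetModel()(torch.randn(1, 4, 32, 32, 8))
```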
S130, eliminating the preset perception data in the first detection result, and clustering the perception data in the first detection result after the eliminating process to obtain a result to be fused.
The preset sensing data may be point cloud data/pixel data corresponding to a background point.
Optionally, removing the preset perception data from the first detection result and clustering the remaining perception data to obtain the result to be fused includes: removing the preset perception data from the first detection result based on the semantic segmentation result, so as to discard perception data lying on the static scene, such as point cloud data corresponding to the road surface and tree trunks, and updating the semantic segmentation result. Keeping only the perception data corresponding to the dynamic scene reduces the computation load and speeds up obstacle detection.
Optionally, the preset perception data are the perception data corresponding to obstacles outside the road area. Since the vehicle travels on the road, the preset perception data corresponding to obstacles outside the road area can be removed. Specifically, removing the preset perception data from the first detection result based on the semantic segmentation result includes: determining the area in which each obstacle is located based on the semantic segmentation image, so as to identify the obstacles relative to the road area; and determining the perception data corresponding to the obstacles outside the road area in the first detection result and removing them.
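A sketch of this filtering step (the label convention, with 0 marking background, and the polygonal road boundary are assumptions for illustration):

```python
import numpy as np
from matplotlib.path import Path

def filter_points(points, labels, road_polygon):
    """points: (N, 3) cloud; labels: (N,) per-point semantic labels, 0 = background;
    road_polygon: (M, 2) bird's-eye-view vertices of the road area."""
    foreground = labels > 0                                      # drop background points
    on_road = Path(road_polygon).contains_points(points[:, :2])  # drop off-road points
    return points[foreground & on_road]
```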
Further, clustering the 3D mark in the first detection result after the rejection processing, to obtain a to-be-fused result includes: and clustering the perception data corresponding to the 3D mark in the first detection result after the elimination processing based on a preset clustering algorithm to obtain a result to be fused.
The preset clustering algorithms include Euclidean clustering, hierarchical clustering, density clustering, and the like.
Illustratively, based on a Euclidean clustering algorithm, the point cloud data corresponding to the 3D markers in the first detection result after removal are clustered to obtain the result to be fused. Specifically, for the point cloud data of all 3D markers in the first detection result after removal, the distance between each pair of points is determined; if the distance is smaller than a preset distance threshold, the two points are assigned to the same class; each class of point cloud data forms a cluster, and different clusters are given different colors and/or shapes so that they are displayed distinctively in the result to be fused.
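A minimal Euclidean clustering sketch over the retained points (the distance threshold and minimum cluster size are illustrative assumptions):

```python
import numpy as np
from scipy.spatial import cKDTree

def euclidean_cluster(points, dist_thresh=0.5, min_points=5):
    """points: (N, 3); returns a list of index arrays, one per cluster."""
    tree = cKDTree(points)
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        frontier, cluster = [seed], [seed]
        while frontier:
            idx = frontier.pop()
            # Points closer than the threshold join the current cluster.
            for nb in tree.query_ball_point(points[idx], dist_thresh):
                if nb in unvisited:
                    unvisited.remove(nb)
                    frontier.append(nb)
                    cluster.append(nb)
        if len(cluster) >= min_points:
            clusters.append(np.array(cluster))
    return clusters
```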
In this embodiment, removing the preset perception data from the first detection result and clustering the remaining perception data to obtain the result to be fused includes: removing the perception data corresponding to the ground of the road area from the first detection result and updating the first detection result; and clustering the perception data in the first detection result based on a preset geometric clustering algorithm and determining the target 3D markers corresponding to the clustering result, yielding the result to be fused.
The preset geometric clustering algorithms include density-based clustering algorithms and partition-based clustering algorithms. Density-based algorithms include Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and Ordering Points To Identify the Clustering Structure (OPTICS); partition-based algorithms include the k-means algorithm. The preset geometric clustering algorithm may also be a graph neural network such as a Graph Convolutional Network (GCN).
Specifically, the perception data are processed by the target detection model to obtain the perception data corresponding to the road area; the perception data corresponding to the ground of the road area are removed from the first detection result and the first detection result is updated; the perception data in the first detection result are clustered with the preset geometric clustering algorithm to obtain the clustered perception data of each type as the clustering result; and, based on the perception data of each cluster type in the clustering result, the 3D marker corresponding to each cluster type is determined and taken as a target 3D marker, yielding the result to be fused. This has the advantage that the structure of the scene in which the vehicle is located can be determined more intuitively, and the shape of obstacles can be analyzed for detection.
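Where a density-based algorithm is preferred, the same step might use scikit-learn's DBSCAN (the eps and min_samples values below are illustrative assumptions):

```python
from sklearn.cluster import DBSCAN

def cluster_labels(points, eps=0.7, min_samples=8):
    """points: (N, 3) retained perception points; returns one label per point, -1 = noise."""
    return DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
```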
And S140, obtaining a result to be processed through fusion processing of the result to be fused and the first detection result.
Optionally, when the perception data are image data, the image to be fused and the semantically segmented image are fused based on an image fusion method to obtain the image to be processed. The image fusion method includes at least one of pixel-level fusion, feature-level fusion, and model-level fusion. Pixel-level fusion applies operations such as weighted averaging, maximum or minimum to the pixels of the image to be fused and the semantically segmented image; it is simple and intuitive but can lose some information. Feature-level fusion extracts features, for example edges, textures and colors, from the image to be fused and the semantically segmented image respectively, matches the features, and fuses the images based on the matching result; it better preserves image details and features but demands higher accuracy in feature extraction and matching. Model-level fusion models and optimizes the information of the two images based on wavelet transforms, multi-scale analysis, deep learning and similar techniques. In summary, the image fusion method may be selected according to the specific application scenario.
In this embodiment, the result to be fused includes the 3D markers corresponding to the perception data of each cluster type, and fusing the result to be fused with the first detection result to obtain the result to be processed includes: fusing the 3D markers in the first detection result with the 3D markers corresponding to the perception data of each cluster type to obtain a fusion result; for the fusion result, determining the overlapping region between each cluster type's 3D marker and the 3D markers in the first detection result; and processing the fusion result based on the size data of the overlapping region and a preset size threshold to obtain the result to be processed.
Wherein determining 3D markers corresponding to the perceptual data of each cluster type comprises: clustering is carried out on the perception data in the first detection result, so as to obtain perception data corresponding to each clustering type; and for the perception data corresponding to each cluster type, determining 3D marks comprising all the perception data corresponding to the cluster type, and obtaining a to-be-fused result comprising the 3D marks corresponding to the perception data of each cluster type.
Optionally, fusing the 3D markers in the first detection result with the 3D markers corresponding to the perception data of each cluster type to obtain the fusion result includes: superimposing the two sets of 3D markers based on the position information of each marker. That is, based on the positions of the 3D markers in the first detection result and the positions of the cluster-type markers in the result to be fused, either the first detection result's markers are superimposed onto the result to be fused, which then serves as the fusion result, or the result to be fused's markers are superimposed onto the first detection result, which then serves as the fusion result.
The overlapping area is an area corresponding to an overlapping portion of the two 3D markers in the fusion result. The size data may include at least one of an area, a side length, or a diagonal length of the overlapping region, and the corresponding preset size threshold may include an area threshold and/or a length threshold.
Optionally, for the fusion result, determining the overlapping region between each cluster type's 3D marker and the markers in the first detection result, and processing the fusion result based on the size data of the overlapping region and a preset size threshold to obtain the result to be processed, includes: determining, in the fusion result, whether an overlapping region exists between a 3D marker from the first detection result and a cluster-type 3D marker; if overlapping regions exist, determining the area of each; and, if the overlap area exceeds the preset area threshold, for the two markers concerned, either determining a 3D marker that encloses both, or retaining just one of the first-detection-result marker and the cluster-type marker, yielding the result to be processed. If no overlapping region exists, both the first detection result's markers and the cluster-type markers are retained in the fusion result, yielding the result to be processed.
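A simplified fusion sketch using axis-aligned bird's-eye-view boxes (the actual 3D markers are oriented, so this overlap test and the area threshold are approximations for illustration):

```python
def overlap_area(a, b):
    """a, b: (x_min, y_min, x_max, y_max) axis-aligned BEV boxes."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0.0) * max(h, 0.0)

def fuse_boxes(model_boxes, cluster_boxes, area_thresh=0.5):
    """Keep all model boxes; add a cluster box only when it does not strongly
    overlap any model box, i.e. when it likely marks a missed obstacle."""
    fused = list(model_boxes)
    for cb in cluster_boxes:
        if all(overlap_area(cb, mb) <= area_thresh for mb in model_boxes):
            fused.append(cb)
    return fused
```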
And S150, determining a target result comprising the target obstacle based on a history result generated before the result to be processed, and distinguishing and displaying the target obstacle in the target result.
Wherein the target obstacle may be the nearest obstacle to the vehicle.
The historical result is the result to be processed corresponding to the last moment.
Considering that the time interval between the acquisition time of the historical result and that of the result to be processed is short, the obstacle state is unlikely to change much. To avoid obstacles in the result to be processed being lost, suddenly changing size, or deflecting in orientation due to occlusion or missed detection, the obstacle state information in the result to be processed can be corrected based on the obstacles in the historical result, so that obstacles are detected comprehensively.
The obstacle state may include information such as obstacle speed, size, and orientation, among others.
Specifically, for each obstacle, state estimation is performed with a target tracking algorithm based on the obstacle state information in the historical result generated before the result to be processed, yielding predicted obstacle state information for the result to be processed; the obstacle state information in the result to be processed is corrected based on the predicted state information to obtain the target result; the obstacle nearest to the vehicle is determined from the position information of each obstacle in the target result and taken as the target obstacle; and the target obstacle is displayed in the target result with a shape and/or color that distinguishes it from other obstacles.
The target tracking algorithms include early classical tracking algorithms, tracking algorithms based on kernel correlation filtering, and tracking algorithms based on deep learning. Early classical tracking algorithms include particle filtering, mean shift, optical flow, and Kalman filtering algorithms.
According to the technical scheme, obstacle detection is performed based on voxel data, the data are regular, and the difficulty of further feature extraction is reduced. Based on mutual correction between the 3D mark and the semantic segmentation result, the accuracy of obstacle detection and the integrity of a detection target are improved, and missed detection is reduced; and correcting the obstacle state information in the to-be-processed result based on the obstacle in the historical result so as to comprehensively detect the obstacle.
Fig. 2 is a flowchart of another obstacle detection method according to an embodiment of the present invention, and the embodiment is applicable to a scene of obstacle detection based on perception data. The present embodiment and the obstacle detection method in the foregoing embodiment belong to the same inventive concept, and on the basis of the foregoing embodiment, a process of inputting voxel data into a target model obtained by training in advance to obtain a first detection result and a semantic segmentation result for performing 3D marking on an obstacle is further described.
As shown in fig. 2, the obstacle detection method includes:
S210, obtaining the perception data, and preprocessing the perception data to obtain voxel data corresponding to the perception data.
S220, processing the voxel feature data based on the global context pooling module to obtain bird's-eye view feature data, and performing feature extraction on the bird's-eye view feature data to update the bird's-eye view feature data.
The global context pooling module (Global Context Pooling, GCP) is arranged between the encoder and the decoder and is used for extracting global context characteristics from voxel characteristic data so as to supplement local characteristics, so that semantic segmentation performance is improved, and obstacle detection accuracy is further improved.
Specifically, the global context pooling module maps the global context features of the three-dimensional sparse tensor data (i.e., the voxel feature data) onto a two-dimensional dense bird's-eye view (BEV) feature map, then converts the two-dimensional dense BEV feature map back into three-dimensional sparse tensor data with a two-dimensional multi-scale feature extractor, and takes the sparse tensor data as the updated bird's-eye view feature data. This has the advantage of enlarging the receptive field and enhancing the feature expression of the target model.
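A sketch of the BEV scatter at the core of this step (tensor shapes and the choice of sum-pooling over the height axis are illustrative assumptions):

```python
import torch

def voxels_to_bev(coords, feats, grid_hw=(128, 128)):
    """coords: (N, 3) integer voxel indices (ix, iy, iz); feats: (N, C) voxel features."""
    H, W = grid_hw
    C = feats.shape[1]
    bev = torch.zeros(C, H, W)
    flat = (coords[:, 0] * W + coords[:, 1]).long()  # collapse the height index iz
    bev.view(C, -1).index_add_(1, flat, feats.t())   # sum-pool voxel features per BEV cell
    return bev

bev = voxels_to_bev(torch.randint(0, 128, (1000, 3)), torch.randn(1000, 64))  # (64, 128, 128)
```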
S230, inputting the aerial view characteristic data into a marking frame marking sub-model and a decoder respectively to obtain a first detection result and a semantic segmentation result.
Specifically, the bird's-eye view feature data are input into the decoder, which maps them back to the original data space corresponding to the voxel data and determines the tensor data for each voxel feature, yielding the target tensor data; the target tensor data are input into the semantic segmentation sub-model, which determines the category of each voxel's tensor data and displays voxel data of different categories distinctively, yielding the semantic segmentation result; and, in parallel, the bird's-eye view feature data are input into the marking-frame sub-model, which determines the 3D marker for each voxel and, from all markers of the same obstacle, the minimal 3D marker enclosing all of that obstacle's voxel data, yielding the first detection result.
S240, eliminating the preset perception data in the first detection result based on the semantic segmentation result, and clustering the 3D marks in the first detection result after eliminating to obtain a to-be-fused result.
S250, obtaining a result to be processed through fusion processing of the result to be fused and the semantic segmentation result.
S260, carrying out state estimation on each obstacle in the to-be-processed result based on the Kalman filtering algorithm and the historical state information corresponding to each obstacle in the historical result, and obtaining the prediction state information corresponding to each obstacle.
For obstacle detection, obstacles must be determined from the perception data, but the perception data of an obstacle generally contain noise. To remove this influence, in this embodiment the Kalman filtering algorithm uses the historical state information of the obstacle to remove the influence of noise from the perception data at the current time.
The historical result may be a to-be-processed result corresponding to a time before the current time, for example, the historical result may be a to-be-processed result of a previous time. The state information comprises an obstacle orientation angle, an obstacle size, position information of the obstacle in a vehicle coordinate system, position information of the obstacle in a world coordinate system, an obstacle speed and an obstacle motion state, the predicted state information corresponds to sensing data at the current moment, and the historical state information corresponds to sensing data at the moment before the current moment.
Since the historical time corresponding to the historical result can be determined from the acquisition time of the historical perception data, and the current time corresponding to the result to be processed from the acquisition time of the current perception data, the orientation angle, size, position in the vehicle coordinate system, position in the world coordinate system, speed and motion state of each obstacle in the result to be processed can be determined based on the state information of each obstacle in the historical result and in the result to be processed, and this information serves as the predicted state information of each obstacle.
Optionally, for each obstacle, the state information at the current moment is predicted based on the state transition matrix of the Kalman filtering algorithm and the historical state information of the obstacle in the image to be processed at the previous moment, thereby determining the predicted state information of the obstacle.
Illustratively, for each obstacle, the state transition matrix is determined and multiplied with the speed, detection frame size and orientation of that obstacle in the image to be processed at the previous moment, yielding the predicted speed, detection frame size and orientation of the obstacle.
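A constant-velocity Kalman prediction sketch for a single obstacle (the state layout and noise magnitude are illustrative assumptions; the state described in this embodiment also covers size and orientation):

```python
import numpy as np

def kalman_predict(x, P, dt, q=0.1):
    """x: state [px, py, vx, vy]; P: 4x4 state covariance; dt: time since the last frame."""
    F = np.array([[1.0, 0.0, dt, 0.0],   # constant-velocity state transition:
                  [0.0, 1.0, 0.0, dt],   # position advances by velocity * dt
                  [0.0, 0.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0, 1.0]])
    Q = q * np.eye(4)                    # process noise
    return F @ x, F @ P @ F.T + Q

x_pred, P_pred = kalman_predict(np.array([1.0, 2.0, 0.5, 0.0]), np.eye(4), dt=0.1)
```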
S270, smoothing obstacle results corresponding to each obstacle in the to-be-processed results based on the prediction state information so as to update the to-be-processed results.
Considering that the state information at the current moment should not differ greatly from the historical state information, a preset difference threshold is set, and an obstacle whose difference exceeds the threshold is treated as an obstacle to be corrected. This corrects problems such as a reduced obstacle count, incomplete obstacle sizes and deflected orientations caused by occlusion or missed detection, ensuring comprehensive obstacle detection.
Specifically, for each obstacle, the area perception data corresponding to the obstacle in the result to be processed are determined and taken as the obstacle result; the real state information of the obstacle in the obstacle result is determined; the difference between the predicted state information and the real state information is determined, and obstacles whose difference exceeds the preset difference threshold are taken as obstacles to be smoothed; and the obstacle results corresponding to the obstacles to be smoothed are updated based on the predicted state information, so as to update the result to be processed.
Illustratively, for each obstacle, the area image corresponding to the obstacle in the image to be processed is determined as the obstacle image; the real state information of the obstacle in the obstacle image is determined; the difference between the predicted and real state information is determined, and obstacles whose difference exceeds the preset difference threshold are taken as obstacles to be corrected; and the real state information in the obstacle images of the obstacles to be corrected is replaced with the predicted state information to obtain updated obstacle images, thereby updating the image to be processed.
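A sketch of the threshold-based smoothing described above (the threshold value and the Euclidean norm over the state vector are illustrative assumptions):

```python
import numpy as np

def smooth_state(measured, predicted, diff_thresh=2.0):
    """Replace an implausible measurement with the Kalman prediction."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    if np.linalg.norm(measured - predicted) > diff_thresh:
        return predicted  # deviation too large: treat as occlusion or missed detection
    return measured
```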
S280, determining area sensing data corresponding to the target obstacle in the to-be-processed result based on the position information of each obstacle in the to-be-processed result, obtaining a target result comprising the target obstacle, and distinguishing and displaying the target obstacle in the target result.
The position information may be the coordinate information of each obstacle in the result to be processed. The area perception data are all the perception data corresponding to the target obstacle in the result to be processed. The target result may be the perception data containing only the obstacles.
Specifically, for each obstacle, the coordinate information of all its perception data in the result to be processed is determined; the area position information of the obstacle's area perception data is determined from the coordinate information; the distance corresponding to the area position information is determined based on the built-in coordinate transformation of the sensing device corresponding to the perception data, and the minimum such distance is found; the obstacle corresponding to the minimum distance is taken as the target obstacle; and the target obstacle is displayed in a shape and/or color different from other obstacles in the result to be processed, which is then taken as the target result.
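A sketch of selecting the obstacle closest to the vehicle (the CIPV candidate), assuming obstacle centers are already expressed in the ego frame:

```python
import numpy as np

def closest_obstacle(centers):
    """centers: (N, 3) obstacle centers in the vehicle coordinate system."""
    dists = np.linalg.norm(centers[:, :2], axis=1)  # planar distance to the ego origin
    idx = int(np.argmin(dists))
    return idx, float(dists[idx])
```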
According to the technical scheme of this embodiment, the global context pooling module supplements the local features, enlarging the receptive field, enhancing the feature expression of the target model, improving semantic segmentation performance and thus obstacle detection accuracy. State estimation is performed for each obstacle in the result to be processed based on the Kalman filtering algorithm and the historical state information of each obstacle in the historical result, yielding predicted state information that corrects problems such as a reduced obstacle count, incomplete obstacle sizes and deflected orientations caused by occlusion or missed detection, ensuring comprehensive obstacle detection.
Fig. 3 is a block diagram of a specific obstacle detection system according to an embodiment of the invention, and fig. 4 is a schematic diagram of obstacle detection based on the obstacle detection system according to an embodiment of the invention. As shown in fig. 3, the obstacle detection system includes: a preprocessing module, a target detection module, a data structuring module, a geometric cluster segmentation module, a target tracking module, a post-processing module and a drivable area detection module.
The preprocessing module performs motion compensation on the perception data acquired by each sensing device, fuses the data from the devices, determines a region of interest (ROI), retains only the perception data within the ROI, converts them into the vehicle (egocentric, EGO) coordinate system, and divides them into voxels to obtain the data voxels corresponding to the perception data. The perception data may be point cloud data acquired by a lidar, and the region of interest may be the area within a set distance range around the vehicle.
The target detection module performs feature extraction, feature fusion, data filtering, target detection, semantic segmentation, and the like. Specifically, referring to fig. 5, the voxel features are processed by a multi-scale convolutional neural network; multi-scale high-dimensional semantic and spatial features are then extracted by an encoder-decoder, and the generated features are processed with an attention mechanism to enhance the model's feature expression. The multi-scale features are then fused to obtain multi-scale fusion features. A bird's-eye view feature extraction sub-model generates the corresponding BEV features, which are processed by a two-dimensional multi-scale feature extraction sub-model to obtain the position, size, category, orientation angle and other information of obstacles in the vehicle's environment. The resulting data are then fed simultaneously into the marking-frame sub-model and the semantic segmentation sub-model, and the semantic segmentation result can be corrected by a 2-stage correction sub-model to generate an accurate panoramic segmentation result and a corrected semantic segmentation result. Illustratively, the final outputs include the input point cloud data (see fig. 6), the first detection result with 3D markers (see fig. 7), and the panoramic semantic segmentation result (see fig. 8). Through 2-stage or multi-stage correction using the point cloud semantic segmentation result, the size and orientation of the target detection frame are further corrected so that the frame effectively envelops all point clouds of the obstacle and its orientation is essentially correct. Finally, a corresponding loss function and branch weight are set for each prediction branch. This module outputs the target detection frame, the target semantic-segmentation instance point cloud (foreground points) and the background points.
The data structuring module obtains structured data such as road area surfaces, road boundary lines and lane lines, which can be polygonal area data or mask pixel data.
The geometric cluster segmentation module filters the region of interest based on the road area obtained by the data structuring module, retains the background points within the road area other than the ground, and performs clustering and segmentation to generate geometric-cluster obstacles and their 3D segmentation frames. These are then fused with the detection frames generated by the target detection module, so that obstacles missed within the road area are supplementarily detected in this module, all obstacles are detected comprehensively, and finally only obstacles of the target size are retained.
The target tracking module performs state estimation, smoothing and updating on the speed, detection frame size, orientation and other information of obstacles based on Kalman filtering and the obstacle information of historical frames, reducing problems such as obstacles missing in some frames, incomplete sizes or deflected orientations caused by occlusion or missed detection, and further ensuring comprehensive obstacle detection.
The post-processing module uniformly processes the obstacles finally determined after tracking, detects and removes mirror-image ghost obstacles, detects and marks the Closest In-Path Vehicle (CIPV) obstacle, and assigns unique IDs to all obstacles and lane lines.
The drivable area detection module detects the area within the road range that can be directly reached from the vehicle's current position, according to the road area and the sizes of obstacles, and outputs the polygon point sequence of the drivable area for further processing by other modules.
According to the technical scheme of this embodiment, obstacle detection and segmentation results and prediction results such as lane lines can be output. Performing 2-stage or multi-stage correction based on the 3D detection frames and the semantically segmented perception data improves the accuracy of the detection frame size and orientation and the integrity of detection targets, reducing missed detections. Supplementing the detection results with methods such as geometric clustering captures and identifies abnormal obstacles and incomplete obstacle features, ensuring that no obstacle is missed. The target tracking module adjusts and smooths the speed, size, orientation and other information of obstacles output across consecutive frames, ensuring that obstacles remain intact across frames and that obstacle information stays consistent. The scheme is applicable to a single type of sensing device (e.g., only cameras or only lidar), a single device (e.g., a single camera or a single lidar), multiple devices (e.g., multiple cameras or multiple lidars), and combinations of device types (e.g., camera, lidar or 4D millimeter-wave radar combinations); it has strong applicability and perception capability, effectively ensures the accuracy of obstacle recognition, and thus ensures the safety of autonomous driving.
Fig. 9 is a block diagram of an obstacle detecting device according to an embodiment of the present invention, where the embodiment is applicable to a scene of obstacle detection based on sensing data, and the device may be implemented in hardware and/or software, and integrated into a processor of an electronic device with an application development function.
As shown in fig. 9, the obstacle detecting apparatus includes: the voxel data determining module 901, configured to obtain sensing data and preprocess it to obtain corresponding voxel data, where the sensing data includes image or point cloud data; the model detection module 902, configured to input the voxel data into a pre-trained target model to obtain a first detection result with 3D markings of obstacles and a semantic segmentation result, where foreground points and background points are distinguished in the semantic segmentation result; the clustering module 903, configured to remove the preset sensing data from the first detection result and cluster the sensing data in the first detection result after the removal processing to obtain a to-be-fused result; the fusion module 904, configured to obtain a to-be-processed result by fusing the to-be-fused result with the first detection result; and the obstacle display module 905, configured to determine a target result including a target obstacle based on historical results generated before the to-be-processed result, and to display the target obstacle distinctively in the target result. This solves the problem of low obstacle detection accuracy and improves obstacle detection accuracy.
Optionally, the voxel data determination module 901 is specifically configured to:
acquiring point cloud data to be processed or images to be fused in the running process of a vehicle based on at least one sensing device arranged on the running vehicle;
Processing point cloud data to be processed and images to be fused, which are acquired at the same acquisition moment, based on deployment information of the at least one sensing device to obtain fused point cloud data and fused images;
And respectively carrying out voxel division on the fused point cloud data and the fused image according to a preset voxel division rule to obtain at least one group of voxel data.
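A minimal voxelization sketch is given below, assuming the fused point cloud arrives as an (N, 4) array of x, y, z, intensity. The grid range, voxel size and per-voxel point cap stand in for the "preset voxel division rule" and are illustrative values, not ones taken from the patent.

```python
# Illustrative voxel division of a fused point cloud.
import numpy as np

def voxelize(points, voxel_size=(0.2, 0.2, 0.2),
             pc_range=(-50, -50, -3, 50, 50, 3), max_pts=32):
    """Groups points into voxels; returns {(ix, iy, iz): [points]}."""
    mins = np.array(pc_range[:3])
    maxs = np.array(pc_range[3:])
    size = np.array(voxel_size)
    mask = np.all((points[:, :3] >= mins) & (points[:, :3] < maxs), axis=1)
    pts = points[mask]
    coords = ((pts[:, :3] - mins) // size).astype(np.int32)
    voxels = {}
    for c, p in zip(map(tuple, coords), pts):
        bucket = voxels.setdefault(c, [])
        if len(bucket) < max_pts:           # cap the points kept per voxel
            bucket.append(p)
    return voxels
```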
Optionally, the model detection module 902 is specifically configured to:
Performing feature extraction processing on the input voxel data based on the encoder to obtain voxel feature data corresponding to the voxel data;
determining target tensor data corresponding to the voxel feature data based on the decoder;
Inputting the target tensor data into the semantic segmentation sub-model to obtain a semantic segmentation result;
And inputting the target tensor data into the marking frame marking sub-model to obtain a first detection result of 3D marking of the obstacle.
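The following PyTorch sketch shows the overall shape of such a dual-head target model: an encoder compresses the voxel features, a decoder produces the target tensor, and two parallel heads emit the semantic segmentation result and the 3D marking frames. All channel sizes, and the 7-parameter box encoding (x, y, z, dx, dy, dz, yaw), are assumptions for illustration only.

```python
# Hedged sketch of the encoder/decoder backbone with two output heads.
import torch
import torch.nn as nn

class TargetModel(nn.Module):
    def __init__(self, in_ch=64, num_classes=2, box_params=7):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, 128, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU())
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 2, stride=2), nn.ReLU())
        self.seg_head = nn.Conv2d(64, num_classes, 1)   # foreground/background
        self.box_head = nn.Conv2d(64, box_params, 1)    # per-cell 3D box

    def forward(self, voxel_features):
        tensor = self.decoder(self.encoder(voxel_features))  # target tensor
        return self.box_head(tensor), self.seg_head(tensor)
```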
Optionally, the model detection module 902 is further configured to:
Processing the voxel feature data based on the global context pooling module to obtain bird's-eye-view (BEV) feature data, and performing feature extraction on the BEV feature data to update it;
And inputting the BEV feature data into the marking frame marking sub-model and the decoder respectively to obtain the first detection result and the semantic segmentation result.
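A hedged sketch of the global context pooling step follows, assuming the voxel features form a dense (B, C, D, H, W) tensor: pooling over the height axis yields BEV features, which a small convolutional stack then refines before they are routed to the marking frame head and the decoder. The pooling choice and channel count are assumptions.

```python
# Illustrative global context pooling: 3D voxel features -> refined BEV map.
import torch
import torch.nn as nn

class GlobalContextPooling(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU())

    def forward(self, voxel_feats):              # (B, C, D, H, W)
        bev = voxel_feats.max(dim=2).values      # pool over height -> (B, C, H, W)
        return self.refine(bev)                  # updated BEV feature data
```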
Optionally, the clustering module 903 is specifically configured to:
Removing the perceived data corresponding to the road-area ground from the first detection result, and updating the first detection result;
And clustering the perception data in the first detection result based on a preset geometric clustering algorithm, and determining a target 3D mark corresponding to the clustering result to obtain a result to be fused.
Optionally, the fusion module 904 is specifically configured to:
Performing fusion processing on the 3D marks in the first detection result and the 3D marks corresponding to the perception data of each cluster type to obtain a fusion result;
For the fusion result, determining the overlapping region between the 3D mark corresponding to the perception data of each cluster and the 3D marks in the first detection result;
And processing the fusion result based on the size data of the overlapped area and a preset size threshold value to obtain a to-be-processed result.
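A minimal sketch of this fusion step follows: all model-detected boxes are kept, and a cluster-derived box is added only when its bird's-eye-view overlap with every model box stays below a threshold, so missed obstacles are complemented without duplicating existing detections. The axis-aligned IoU and the 0.3 threshold stand in for the patent's overlap-region and "preset size threshold" test and are assumptions.

```python
# Hedged sketch of fusing model boxes with geometric-cluster boxes.
def bev_iou(a, b):
    """a, b: (cx, cy, cz, dx, dy, dz) axis-aligned boxes; IoU in the BEV plane."""
    ax1, ay1 = a[0] - a[3] / 2, a[1] - a[4] / 2
    ax2, ay2 = a[0] + a[3] / 2, a[1] + a[4] / 2
    bx1, by1 = b[0] - b[3] / 2, b[1] - b[4] / 2
    bx2, by2 = b[0] + b[3] / 2, b[1] + b[4] / 2
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[3] * a[4] + b[3] * b[4] - inter
    return inter / union if union > 0 else 0.0

def fuse(model_boxes, cluster_boxes, iou_thresh=0.3):
    """Keeps all model boxes; adds cluster boxes that overlap none of them."""
    kept = list(model_boxes)
    for cb in cluster_boxes:
        if all(bev_iou(cb, mb) < iou_thresh for mb in kept):
            kept.append(cb)   # complementary detection of a missed obstacle
    return kept
```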
Optionally, the obstacle display module 905 is specifically configured to:
Performing state estimation on each obstacle in the to-be-processed result based on a Kalman filtering algorithm and the historical state information corresponding to each obstacle in the historical result, to obtain predicted state information for each obstacle. The state information includes the obstacle orientation angle, the obstacle size, the position of the obstacle in the vehicle coordinate system, the position of the obstacle in the world coordinate system, the obstacle speed and the obstacle motion state;
smoothing obstacle results corresponding to each obstacle in the to-be-processed results based on the prediction state information so as to update the to-be-processed results;
and determining area sensing data corresponding to the target obstacle in the to-be-processed result based on the position information of each obstacle in the to-be-processed result, and obtaining a target result comprising the target obstacle.
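As an illustration of the Kalman filtering step, the sketch below smooths one obstacle's bird's-eye-view position and velocity with a constant-velocity model. The (x, y, vx, vy) state layout and the noise magnitudes are assumptions; the patent's tracker covers a richer state (size, orientation, motion state, etc.).

```python
# Minimal constant-velocity Kalman filter for one tracked obstacle.
import numpy as np

class ObstacleKF:
    def __init__(self, x0, dt=0.1):
        self.x = np.array(x0, dtype=float)           # state [x, y, vx, vy]
        self.P = np.eye(4)                           # state covariance
        self.F = np.eye(4); self.F[0, 2] = self.F[1, 3] = dt
        self.H = np.eye(2, 4)                        # we observe position only
        self.Q = np.eye(4) * 0.01                    # process noise
        self.R = np.eye(2) * 0.1                     # measurement noise

    def predict(self):
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x

    def update(self, z):
        y = np.asarray(z) - self.H @ self.x          # innovation
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)     # Kalman gain
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x
```

A predict() call fills in a frame where the obstacle was occluded or missed, and update() folds the next available detection back into the track.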
The obstacle detection device provided by the embodiment of the invention can execute the obstacle detection method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
Fig. 10 is a block diagram of an electronic device according to an embodiment of the present invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only and are not meant to limit implementations of the invention described and/or claimed herein.
As shown in fig. 10, the electronic device 10 includes at least one processor 11 and a memory communicatively connected to the at least one processor 11, such as a Read Only Memory (ROM) 12 and a Random Access Memory (RAM) 13. The memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the ROM 12 or loaded from the storage unit 18 into the RAM 13. In the RAM 13, various programs and data required for the operation of the electronic device 10 may also be stored. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to the bus 14.
Various components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, etc.; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be any of various general and/or special purpose processing components having processing and computing capabilities. Some examples of the processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, microcontroller, etc. The processor 11 performs the respective methods and processes described above, such as the obstacle detection method.
In some embodiments, the obstacle detection method may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as the storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into the RAM 13 and executed by the processor 11, one or more steps of the obstacle detection method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the obstacle detection method by any other suitable means (e.g. by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs, which may be executed and/or interpreted on a programmable system including at least one programmable processor; the programmable processor may be special purpose or general purpose, and may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable obstacle detection device, such that the computer programs, when executed by the processor, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The computer program may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) through which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.
The computing system may include clients and servers. A client and a server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system and overcomes the defects of high management difficulty and weak service scalability found in traditional physical hosts and virtual private server (VPS) services.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved, and the present invention is not limited herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (10)

1. An obstacle detection method, comprising:
Obtaining perception data, preprocessing the perception data, and obtaining voxel data corresponding to the perception data, wherein the perception data comprises image or point cloud data;
Inputting the voxel data into a target model obtained by pre-training to obtain a first detection result of 3D marking of the obstacle and a semantic segmentation result, wherein foreground points and background points are distinguished in the semantic segmentation result;
Removing the preset perception data from the first detection result, and clustering the perception data in the first detection result after the removal processing to obtain a to-be-fused result;
The result to be processed is obtained through fusion processing of the result to be fused and the first detection result;
And determining a target result comprising a target obstacle based on a history result generated before the result to be processed, and distinguishing and displaying the target obstacle in the target result.
2. The method of claim 1, wherein the obtaining the perceptual data and preprocessing the perceptual data to obtain voxel data corresponding to the perceptual data comprises:
acquiring point cloud data to be processed or images to be fused in the running process of a vehicle based on at least one sensing device arranged on the running vehicle;
Processing point cloud data to be processed and images to be fused, which are acquired at the same acquisition moment, based on deployment information of the at least one sensing device to obtain fused point cloud data and fused images;
And respectively carrying out voxel division on the fused point cloud data and the fused image according to a preset voxel division rule to obtain at least one group of voxel data.
3. The method of claim 1, wherein the object model comprises an encoder, a decoder, a semantic segmentation sub-model, and a marker-frame marker sub-model,
Inputting the voxel data into a target model obtained by training in advance to obtain a first detection result of 3D marking of the obstacle and a semantic segmentation result comprises the following steps:
Performing feature extraction processing on the input voxel data based on the encoder to obtain voxel feature data corresponding to the voxel data;
determining target tensor data corresponding to the voxel feature data based on the decoder;
Inputting the target tensor data into the semantic segmentation sub-model to obtain a semantic segmentation result;
And inputting the target tensor data into the marking frame marking sub-model to obtain a first detection result of 3D marking of the obstacle.
4. The method of claim 3, wherein the object model further comprises a global context pooling module disposed between the encoder and the decoder,
Inputting the voxel data into a target model obtained by training in advance to obtain a first detection result and a semantic segmentation result of the 3D marker of the obstacle, wherein the method comprises the following steps:
Processing the voxel feature data based on the global context pooling module to obtain bird's-eye-view (BEV) feature data, and performing feature extraction on the BEV feature data to update it;
And inputting the BEV feature data into the marking frame marking sub-model and the decoder respectively to obtain the first detection result and the semantic segmentation result.
5. The method of claim 1, wherein the removing the preset perceived data in the first detection result, and clustering the perceived data in the removed first detection result, to obtain the result to be fused, includes:
Removing the perceived data corresponding to the road-area ground from the first detection result, and updating the first detection result;
And clustering the perception data in the first detection result based on a preset geometric clustering algorithm, and determining a target 3D mark corresponding to the clustering result to obtain a result to be fused.
6. The method of claim 1, wherein the results to be fused comprise 3D labels corresponding to the perceptual data of each cluster type,
And obtaining a result to be processed by fusing the result to be fused and the first detection result, wherein the result to be processed comprises the following steps:
Performing fusion processing on the 3D marks in the first detection result and the 3D marks corresponding to the perception data of each cluster type to obtain a fusion result;
For the fusion result, determining the overlapping region between the 3D mark corresponding to the perception data of each cluster and the 3D marks in the first detection result;
And processing the fusion result based on the size data of the overlapped area and a preset size threshold value to obtain a to-be-processed result.
7. The method of claim 1, wherein the determining a target result comprising a target obstacle based on historical results generated before the result to be processed comprises:
Performing state estimation on each obstacle in the to-be-processed result based on a Kalman filtering algorithm and the historical state information corresponding to each obstacle in the historical result, to obtain predicted state information for each obstacle, wherein the state information comprises the obstacle orientation angle, the obstacle size, the position of the obstacle in the vehicle coordinate system, the position of the obstacle in the world coordinate system, the obstacle speed and the obstacle motion state;
smoothing obstacle results corresponding to each obstacle in the to-be-processed results based on the prediction state information so as to update the to-be-processed results;
and determining area sensing data corresponding to the target obstacle in the to-be-processed result based on the position information of each obstacle in the to-be-processed result, and obtaining a target result comprising the target obstacle.
8. An obstacle detecting apparatus, comprising:
The voxel data determining module is used for acquiring sensing data and preprocessing the sensing data to obtain voxel data corresponding to the sensing data, wherein the sensing data comprises image or point cloud data;
The model detection module is used for inputting the voxel data into a target model obtained through pre-training to obtain a first detection result of 3D marking of the obstacle and a semantic segmentation result, wherein foreground points and background points are distinguished in the semantic segmentation result;
The clustering module is used for eliminating the preset perception data in the first detection result, and clustering the perception data in the first detection result after the elimination treatment to obtain a result to be fused;
the fusion module is used for obtaining a result to be processed by carrying out fusion processing on the result to be fused and the first detection result;
and the obstacle display module is used for determining a target result comprising a target obstacle based on the historical result generated before the result to be processed and displaying the target obstacle in the target result in a distinguishing way.
9. An electronic device, the electronic device comprising:
At least one processor; and
A memory communicatively coupled to the at least one processor; wherein,
The memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the obstacle detection method of any one of claims 1-7.
10. A computer readable storage medium storing computer instructions for causing a processor to perform the obstacle detection method of any one of claims 1-7.
CN202311746615.4A 2023-12-18 2023-12-18 Obstacle detection method, device, equipment and storage medium Pending CN117935209A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311746615.4A CN117935209A (en) 2023-12-18 2023-12-18 Obstacle detection method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117935209A 2024-04-26

Family

ID=90767651



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination