CN114862901A - Road-end multi-source sensor fusion target sensing method and system for surface mine - Google Patents

Road-end multi-source sensor fusion target sensing method and system for surface mine

Info

Publication number
CN114862901A
CN114862901A (application CN202210441815.8A)
Authority
CN
China
Prior art keywords
point cloud
target
fusion
data
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210441815.8A
Other languages
Chinese (zh)
Inventor
郭叙森
李静
蔡杰
周彤
王俊晓
朱亚琛
袁胜
张睿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongke Huituo Shaanxi Technology Co ltd
Qingdao Vehicle Intelligence Pioneers Inc
Original Assignee
Zhongke Huituo Shaanxi Technology Co ltd
Qingdao Vehicle Intelligence Pioneers Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongke Huituo Shaanxi Technology Co ltd, Qingdao Vehicle Intelligence Pioneers Inc filed Critical Zhongke Huituo Shaanxi Technology Co ltd
Priority to CN202210441815.8A
Publication of CN114862901A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/215Motion-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/251Analysis of motion using feature-based methods, e.g. the tracking of corners or segments involving models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/277Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30241Trajectory

Abstract

The invention discloses a road-end multi-source sensor fusion target sensing method and system for a surface mine, which specifically comprise the following steps: acquiring data, reading the data from the sensor devices and converting their format; performing instance segmentation and depth estimation on the images; fusing point cloud data, clustering the foreground point cloud data, and constructing three-dimensional detection boxes for the point cloud clusters; performing heterogeneous information fusion, associating the fused detection results with tracks, optimally estimating the target observations, and outputting future motion trajectories. The method is adapted to the cooperative vehicle-road sensing environment, achieves full coverage of the scene by fusing multi-camera and multi-lidar sensing results, improves the robustness of the sensing algorithm in severe weather, and accurately predicts future target trajectories by combining high-precision map information; it features low power consumption, good performance and high robustness, and is well suited to large-scale commercial deployment.

Description

Road-end multi-source sensor fusion target sensing method and system for surface mine
Technical Field
The invention relates to a method and a system for sensing a fusion target of a road-end multi-source sensor, in particular to a method and a system for sensing a fusion target of a road-end multi-source sensor of an open-pit mine.
Background
With the development of deep learning technology and the growing computing power of edge hardware platforms, automatic driving is becoming a core technology of future vehicle travel. Most existing automatic driving algorithms focus on single-vehicle intelligence, that is, various sensors and AI algorithms are deployed on a vehicle to make the vehicle itself intelligent. Single-vehicle intelligence works well in simple road environments but cannot cope well with complex intersection environments. At intersections there are complex interactions among various types of vehicles and pedestrians, and because the on-vehicle sensors are mounted low, the field of view is easily blocked by obstacles such as large vehicles, bridges and houses. These problems lead to systematic shortcomings of single-vehicle intelligent systems in complex intersection environments, and roadside cooperative sensing systems have been developed to address them.
The roadside intelligent unit is a system deployed at an intersection to monitor dynamic targets over the whole area; it can provide intelligent driving vehicles in the area with information such as each target's global pose, speed and predicted trajectory, offering an additional safeguard for safe driving. Most existing three-dimensional perception frameworks are adapted to the vehicle end rather than the roadside environment. Migrating a vehicle-end perception algorithm to the roadside is difficult in three main respects: 1. the vehicle-end sensor is always in motion and the background of the acquired data changes constantly, whereas the roadside end is fixed, so algorithm pipelines developed on vehicle-end data are not suitable for the roadside; 2. the viewing angle of the vehicle end differs greatly from that of the roadside end, and the scene information seen from the roadside viewpoint is more comprehensive and abundant, which also means roadside perception is more easily disturbed by the environment (for example by dust, rain and snow), bringing new challenges to a roadside perception algorithm; 3. the computing resources available at the vehicle end are comparatively abundant, whereas the roadside end needs more efficient, lower-power algorithms to guarantee real-time operation. Therefore, a dedicated set of environment perception algorithms needs to be developed for roadside scenes.
Patent CN112990129A proposes a three-dimensional object detection method combining vision and laser radar. It first acquires the current point cloud frame and video frame, performs visual detection on the video frame to obtain a visual detection result, and then performs depth judgment on the visual detection result to obtain an approximate visual depth; for the point cloud frame data, it first converts them into a sparse depth map, extracts candidate depth boxes in the sparse depth map according to the visual detection result, constructs candidate point clusters of the current point cloud frame according to the candidate depth boxes and the approximate visual depth, and finally builds the three-dimensional object detection result from the point clusters. The method uses the image detection result to extract target point cloud clusters and an image depth-estimation model to estimate the depth of detected targets for further point cloud segmentation. Its performance is therefore limited by image detection: it cannot handle occluded image targets well, and because the image field of view is limited, the algorithm cannot exploit point cloud information beyond that field of view.
Patent CN113095172A provides a point cloud three-dimensional object detection method based on deep learning. It first extracts feature representations of non-empty voxels in point-sparse and point-dense areas of the point cloud scene through a layered voxel coding module, and then fuses voxel features through an attention module to obtain effective voxel feature representations; in addition, it introduces a bird's-eye view of the point cloud through a height-information supplementing module to supplement the height information of the voxel feature map, and extracts useful information from the masked feature map through a channel attention module to improve the network's perception of geometric structure. The method constructs a feature-learning network to mine high-level semantic information in the feature map, and adds a voxel segmentation task at the output to judge whether a non-empty voxel belongs to a target object. The model proposed by this method has a very complex structure, which imposes many limitations on engineering deployment. In addition, like all detection methods based on deep learning, it is data-driven: model performance depends heavily on the data set, the network generalizes poorly, and whenever the scene changes, data must be collected, labeled and used for retraining, a process that consumes time, material and labor and is very unfavorable for deployment. Current deep-learning methods also face interpretability problems, and handling long-tail cases is laborious. Finally, network-based three-dimensional detection models consume considerable computational resources, making real-time detection on an edge computing platform difficult to achieve.
In the field of trajectory prediction, patent CN113763434A proposes a target trajectory prediction method based on switching among multiple Kalman filtering motion models. It first establishes the Kalman filtering multi-motion models, then collects motion information of a target over a period of time (at least including initial coordinates, real-time speed and real-time acceleration), derives the target's motion state from this information (at least including decelerating straight motion, constant-speed straight motion, decelerating lane change, constant-speed lane change, accelerating lane change and the like), switches the Kalman filtering motion model according to changes of the motion state, and computes the predicted trajectory of the target. This method hard-codes the target's motion models into different Kalman filters and has difficulty coping with the complexity of driving states in real scenes. It does not use prior road information in the scene; relying only on Kalman filtering it can predict the target's trajectory for a short time but cannot predict its long-term motion trend.
Disclosure of Invention
The invention aims to provide a method and a system for sensing a road-end multi-source sensor fusion target of an open-pit mine, which realize accurate and reliable three-dimensional target detection and tracking in a mining area by using a multi-source sensor fusion technology, can predict future travel track information of the target by combining a target tracking result and high-precision map information, and solve the defects in the prior art.
The invention provides the following scheme:
a method for sensing a fusion target of a road-end multi-source sensor of a surface mine specifically comprises the following steps:
step one), data acquisition: data acquisition is carried out, data are read from the sensor equipment, and format conversion is carried out on the data;
step two) image multitask perception: carrying out instance segmentation and depth estimation on the collected road target image to obtain an instance segmentation result and a depth map;
step three), point cloud target detection: performing multi-radar fusion on the point cloud data to obtain fused point cloud data with unified coordinates, then obtaining foreground point cloud data through background filtering, performing clustering processing on the foreground point cloud data through a clustering algorithm, and constructing a three-dimensional detection frame on the point cloud cluster obtained through clustering processing to obtain a three-dimensional target frame;
step four), heterogeneous information fusion: carrying out heterogeneous information fusion on image instance segmentation, the depth map and the three-dimensional target frame, and outputting a target 3D detection result;
step five), multi-target tracking: establishing a track according to a first frame result after heterogeneous information fusion, associating a fusion detection result with the track, and performing optimal estimation on a target observation value;
step six) track prediction: and outputting a future motion trail according to the high-precision map information of the road side area and the tracking result of the target.
Further, in the step one), time synchronization is performed while data are acquired, a trigger signal is sent by a unified clock source to trigger different sensors to acquire data, time stamps at trigger moments are given to all data, and nanosecond time synchronization of different sensor data is achieved.
Further, in the second step), an industrial camera is used for collecting road target images at a roadside viewing angle, instance segmentation and depth estimation are carried out on the collected road target images, and a multitask deep learning network is trained on a data set of the specific application scene to realize the instance segmentation and the depth estimation.
Further, in step three):
transferring the point cloud frames under the local coordinate systems output by different laser radar sensors to a unified coordinate system to obtain a frame of fused point cloud frame with a complete view field range;
receiving fused point cloud frame data and performing filtering operations, wherein the filtering operations comprise down-sampling, invalid-point removal, outlier removal and region-of-interest filtering, and outputting the processed point cloud data;
filtering background point clouds in the point cloud frames, dividing a detection range into different voxels, collecting a plurality of point cloud frames on line, counting the point cloud density of each voxel, making a background table, and setting a threshold value to filter background points according to the point cloud density value of the voxel corresponding to a current point in the background table;
clustering foreground point clouds is completed by using a DBSCAN algorithm, and the point clouds of the same target are clustered into one class;
and constructing a three-dimensional detection frame of each point cloud cluster by using an OBB (oriented bounding box) algorithm.
Further, in the fourth step), image instance segmentation, depth estimation and multi-radar point cloud fusion data are fused, and a target 3D detection result is output, wherein the 3D detection result comprises semantic category, size and 3D pose information;
in the fifth step), a track is established according to the first frame result after heterogeneous information fusion, when the next frame fusion result comes, the fusion detection result is associated with the track by using a Hungarian bipartite graph matching algorithm, and the target observation value is optimally estimated by using a Kalman filtering technology.
Further, in the sixth step), a future movement track is output according to the tracking result of the target and high-precision map information, wherein the map information comprises lane guide lines, flatness and gradient information, and the movement track comprises position information, speed and acceleration.
A road-end multi-source sensor fusion target sensing system of a surface mine specifically comprises:
the data acquisition module is used for acquiring data, reading the data from the sensor equipment and converting the format of the data;
the image multi-task perception module is used for carrying out instance segmentation and depth estimation on the collected road target image to obtain an instance segmentation result and a depth map;
the point cloud target detection module is used for performing multi-radar fusion on point cloud data to obtain fused point cloud data with unified coordinates, obtaining foreground point cloud data through background filtering, performing clustering processing on the foreground point cloud data through a clustering algorithm, and constructing a three-dimensional detection frame for the point cloud cluster obtained through clustering processing to obtain a three-dimensional target frame;
the heterogeneous information fusion module is used for carrying out heterogeneous information fusion on the image instance segmentation, the depth map and the three-dimensional target frame and outputting a target 3D detection result;
the multi-target tracking module is used for establishing a track according to a first frame result after heterogeneous information fusion, associating a fusion detection result with the track and carrying out optimal estimation on a target observation value;
and the track prediction module is used for outputting a future motion track according to the target tracking result and high-precision map information, wherein the high-precision map information comprises lane guide lines, flatness and gradient information, and the motion track comprises position information, speed and acceleration.
Further, the point cloud target detection module comprises a preprocessing module, a background filtering module, a point cloud clustering module and a 3D frame reconstruction module;
the preprocessing module is used for receiving either raw point cloud frame data or the fused point cloud frame data output by the multi-point cloud fusion module, performing filtering operations on it, wherein the filtering operations comprise down-sampling, invalid-point removal, outlier removal and region-of-interest filtering, and outputting the processed point cloud data;
the background filtering module is used for filtering background point clouds in the point cloud frames, formalizing a detection range into three-dimensional voxel representation, collecting a plurality of point cloud frames on line, counting the point cloud density of each voxel, making a background table, and filtering background points according to a threshold value and the point cloud density value of the voxel corresponding to the current point in the background table;
the point cloud clustering module is used for finishing clustering of foreground point clouds by using a DBSCAN algorithm and clustering the point clouds of the same target into one type;
the 3D frame reconstruction module is used for constructing a three-dimensional detection frame of each point cloud cluster by using an OBB (oriented bounding box) algorithm, and each point cloud cluster obtained by the point cloud clustering module needs a corresponding 3D frame constructed by the 3D frame reconstruction module;
in the multi-target tracking module, the fusion result of the heterogeneous information fusion module is finally input into the multi-target tracking module, which first establishes tracks according to the first-frame fusion result; when the next frame's fusion result arrives, the Hungarian bipartite graph matching algorithm is used to associate the fused detection results with the tracks, and Kalman filtering is then used to optimally estimate the target observations.
An electronic device, comprising: the system comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory complete mutual communication through the communication bus; the memory has stored therein a computer program that, when executed by the processor, causes the processor to perform the steps of a road-end multi-source sensor fusion target perception method for a surface mine.
A computer-readable storage medium, characterized in that it stores a computer program executable by an electronic device, which when run on the electronic device, causes the electronic device to perform the steps of a road-end multi-source sensor fusion target perception method for a surface mine.
Compared with the prior art, the invention has the following advantages:
the invention provides a roadside perception algorithm suitable for a surface mine, which can well fit the characteristics of road end perception in a concrete application scene of surface mine vehicle-road cooperation, realize the accurate perception of a dynamic target in the scene, provide the functions of blind supplementation, beyond visual range and redundancy verification for the vehicle end perception, and provide guarantee for the safe driving of an intelligent driving vehicle at a complex intersection.
The invention targets point cloud object detection for the roadside platform and can adapt to fusion detection with multiple laser radars. Point cloud target detection uses background filtering to remove static targets in the scene in advance, which significantly improves both the speed and the precision of detection. Fusing the heterogeneous information of the image detection results and the point cloud detection results makes full use of the advantages of each sensor, achieving beyond-visual-range sensing for the roadside sensing unit and accurate sensing in severe environments such as dust, rain and snow.
The invention also includes a target trajectory prediction algorithm combined with high-precision map information: in the concrete application scenario of open-pit mine vehicle-road cooperation, the algorithm constrains the movement direction of the target through the lane guide line information in the high-precision map and constrains the speed and acceleration of the target using the flatness and gradient information, thereby realizing accurate prediction of the target's future movement trajectory.
The multi-source sensor sensing algorithm is specially adapted to the cooperative vehicle-road sensing environment, realizes full coverage of scene-range sensing through fusion of the sensing results of multiple cameras and multiple laser radars, and improves the robustness of the sensing algorithm in severe weather. In addition, the framework provides a trajectory prediction module that combines high-precision map information to accurately predict the target's future trajectory; in the specific application scenario the perception algorithm features low power consumption, good performance and high robustness, so the method is very suitable for large-scale commercial deployment and popularization.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a flow chart of a road-end multi-source sensor fusion target sensing method for a surface mine.
FIG. 2 is a framework diagram of a road-end multi-source sensor fusion target sensing system of a surface mine.
FIG. 3 is a block diagram of a surface mine roadside scene perception algorithm.
FIG. 4 is a system architecture diagram of a point cloud object detection module.
FIG. 5 is a flow chart of point cloud target detection.
Fig. 6 is a structural diagram of a heterogeneous information fusion module.
FIG. 7 is a schematic representation of the Frenet coordinate system.
Fig. 8 is a system architecture diagram of an electronic device.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the accompanying drawings, and it should be understood that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Most three-dimensional environment perception algorithms in the prior art are developed on vehicle-end viewpoint data, and these methods cannot be transferred well to roadside perception scenes. On one hand, the sensor viewpoints at the vehicle end and the roadside differ greatly; these differences lead to data differences, so a perception algorithm suited to the vehicle end does not work well in a roadside scene. On the other hand, the vehicle-end sensor is moving and the background information in the collected data is constantly changing, which poses great challenges to the perception algorithm. The roadside mounting pole is fixed, so background information in the scene can be filtered out with methods such as background filtering based on a high-precision map. This not only reduces the amount of data to be processed but also improves the accuracy of target detection. Therefore, the perception algorithm pipelines for the roadside and the vehicle also differ. The invention provides a perception algorithm for the roadside environment that improves performance by using strategies such as background filtering and multi-source sensor fusion, and that can meet the real-time requirements of the automatic driving domain.
For target trajectory prediction, existing methods are based on Kalman filtering or deep learning. For the former, Kalman filtering depends heavily on a motion model of the target, cannot provide accurate trajectory prediction when the target's motion is complex, and has difficulty predicting the target's long-term motion trajectory. For the latter, deep learning methods generalize poorly, their performance depends heavily on data, labeling is expensive, the computation load is large, and on embedded devices they cannot meet the real-time requirements of automatic driving. The target trajectory prediction method provided by the invention combines high-precision map information of the scene, including lane guide lines, flatness and gradient, and can obtain accurate trajectory information of a target over the next several seconds. This prediction information can markedly enhance the planning performance of an automatic driving vehicle at complex intersections and improve driving efficiency and safety.
As can be seen from the description of the prior art in the background section, the prior art has the following defects: it cannot handle occluded image targets, the image field of view is limited, and point cloud information beyond the image field of view cannot be used (CN112990129A, a three-dimensional object detection method combining vision and laser radar); the data-driven method depends on a data set for tuning, needs retraining when the scene changes, lacks interpretability, consumes computing resources and can hardly detect in real time on an edge computing platform (CN113095172A, point cloud three-dimensional object detection based on deep learning); it is hard to cope with the complexity of driving states in real scenes, prior road information in the scene is not used, Kalman filtering alone can only predict a target's trajectory over a short time and cannot predict its long-term motion trend (CN113763434A, trajectory prediction).
The roadside perception algorithm provided by the invention, aimed at the specific application scenario of the surface mine environment, achieves beyond-visual-range perception and highly robust perception in severe weather, and runs efficiently in real time on an edge computing platform. Most existing technical schemes take the vehicle-end viewpoint, consume considerable power, and cannot guarantee real-time operation. Accordingly, the present invention has technical effects beyond those expected by those skilled in the art.
Example 1: as shown in fig. 1, the road-end multi-source sensor fusion target sensing method for a surface mine, used in a surface-mine vehicle-road cooperation scenario, comprises the following steps:
step S1) data acquisition: data acquisition is carried out, data are read from the sensor equipment, and format conversion is carried out on the data; in this embodiment, the execution body of the data acquisition module includes roadside devices: industrial cameras, multiple laser radars, etc.;
step S2) image multitask perception: carrying out instance segmentation and depth estimation on the collected road target image to obtain an instance segmentation result and a depth map;
Depth estimation estimates the distance of each pixel in an image from the camera using RGB images from one or several viewpoints. In the prior art, image depth estimation generally proceeds as follows: a depth-estimation convolutional neural network model is designed with guidance from single-view/multi-view geometric principles; a large number of images are then collected together with their depth maps (for example, by interpolating laser radar data or by collecting data directly with a depth camera), the network model is trained on the RGB images annotated with per-pixel depth values, and the trained network is then used to estimate the depth of a target image, giving the depth value of each pixel.
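By way of a hedged illustration only (not taken from the patent), the following Python sketch shows how sparse per-pixel depth labels could be produced by projecting laser radar points into the image, assuming a known lidar-to-camera extrinsic matrix T_cam_lidar and camera intrinsics K; all names are hypothetical:

    import numpy as np

    def sparse_depth_from_lidar(points_lidar, T_cam_lidar, K, h, w):
        """Project lidar points into the image to obtain sparse per-pixel depth labels."""
        pts = np.hstack([points_lidar, np.ones((points_lidar.shape[0], 1))])
        pts_cam = (T_cam_lidar @ pts.T).T[:, :3]
        pts_cam = pts_cam[pts_cam[:, 2] > 0.5]          # keep points in front of the camera
        uv = (K @ pts_cam.T).T                           # perspective projection with intrinsics K
        uv = uv[:, :2] / uv[:, 2:3]
        depth = np.full((h, w), np.nan, dtype=np.float32)
        u = np.round(uv[:, 0]).astype(int)
        v = np.round(uv[:, 1]).astype(int)
        valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)
        # When several points fall on the same pixel, keep the nearest one.
        for ui, vi, d in sorted(zip(u[valid], v[valid], pts_cam[valid, 2]), key=lambda t: -t[2]):
            depth[vi, ui] = d
        return depth                                     # NaN where no lidar return is available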
Step S3) point cloud target detection: performing multi-radar fusion on the point cloud data to obtain fused point cloud data with unified coordinates, then obtaining foreground point cloud data through background filtering, performing clustering processing on the foreground point cloud data through a clustering algorithm, and constructing a three-dimensional detection frame on the point cloud cluster obtained through clustering processing to obtain a three-dimensional target frame;
step S4) heterogeneous information fusion: carrying out heterogeneous information fusion on image instance segmentation, the depth map and the three-dimensional target frame, and outputting a target 3D detection result; in the embodiment, the heterogeneous information fusion is to fuse heterogeneous data of different data sources and different systems, such as image instance segmentation, depth estimation, multi-radar fusion point cloud data and the like;
step S5) multi-target tracking: and establishing a track according to the first frame result after the heterogeneous information is fused, associating the fused detection result with the track, performing optimal estimation on the target observation value to obtain a smoother and more accurate detection result, and simultaneously calculating information such as the speed, the acceleration and the like of the target.
Step S6) trajectory prediction: according to the high-precision map information of the road side area, a future motion track (the motion track of the target in the future of several seconds can be output) is output by combining the tracking result of the target, the map information comprises lane guide lines, flatness, gradient information and the like, and the motion track comprises position information, speed and acceleration.
Preferably, the time synchronization is carried out while the data are acquired, the trigger signals are sent by the unified clock source to trigger different sensors to acquire the data, time stamps of trigger moments are given to all the data, and the nanosecond time synchronization of the data of the different sensors is realized.
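As a minimal illustrative sketch (not part of the patent) of how such trigger timestamps might be used downstream, frames from different sensors can be paired by nearest timestamp within a tolerance; the field name stamp_ns and the tolerance value are assumptions:

    from bisect import bisect_left

    def pair_by_trigger_time(cam_frames, lidar_frames, tol_ns=1_000_000):
        """Pair camera and lidar frames whose trigger timestamps agree within tol_ns."""
        lidar_ts = [f["stamp_ns"] for f in lidar_frames]   # assumed sorted by timestamp
        pairs = []
        for cam in cam_frames:
            i = bisect_left(lidar_ts, cam["stamp_ns"])
            for j in (i - 1, i):                            # nearest neighbour on either side
                if 0 <= j < len(lidar_ts) and abs(lidar_ts[j] - cam["stamp_ns"]) <= tol_ns:
                    pairs.append((cam, lidar_frames[j]))
                    break
        return pairs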
Preferably, an industrial camera is used for collecting road target images at a roadside viewing angle, instance segmentation and depth estimation are carried out on the collected road target images, and a multitask deep learning network is trained on a data set of the specific application scene to realize the instance segmentation and the depth estimation.
Preferably, point cloud frames output by different laser radar sensors under a local coordinate system are transferred to a unified coordinate system, namely, point cloud data under different visual angles are spliced together to obtain a frame of fused point cloud frame with a complete visual field range;
receiving fused point cloud frame data, performing filtering operations, wherein the filtering operations comprise down-sampling (voxel filtering), invalid-point removal, outlier removal and region-of-interest filtering, and outputting the processed point cloud data;
filtering background point clouds in the point cloud frames, formalizing a detection range into different voxels for representing, collecting a plurality of point cloud frames on line, counting the point cloud density of each voxel, making a background table, and filtering background points according to a threshold value and the point cloud density value of the voxel corresponding to a current point in the background table;
clustering foreground point clouds by using the DBSCAN (density-based spatial clustering of applications with noise) algorithm, and clustering the point clouds of the same target into one class;
and constructing a three-dimensional detection frame of each point cloud cluster by using an OBB (oriented bounding box) algorithm.
Preferably, in step S4), image instance segmentation, depth estimation, and multi-radar point cloud fusion data are fused, and a target 3D detection result is output, where the 3D detection result includes semantic category, size and 3D pose information;
in step S5), a trajectory is established according to the first frame result after the heterogeneous information fusion, when the next frame fusion result arrives, the fused detection result is associated with the trajectory by using the hungarian bipartite graph matching algorithm, the target observation value is optimally estimated by using the kalman filtering technique, a smoother and more accurate detection result is obtained, and information such as the speed and the acceleration of the target is also calculated.
Preferably, in step S6), a future movement trajectory is output based on the tracking result of the target and high-precision map information; the map information includes lane guide lines, flatness and gradient information, and the movement trajectory includes position information, speed and acceleration.
Example 2: as shown in fig. 2 and 3, the present embodiment is a system corresponding to a road-end multi-source sensor fusion target sensing method for a surface mine, and the system includes six modules, which are a data acquisition module, an image multi-task sensing module, a point cloud target detection module, a heterogeneous information fusion module, a multi-target tracking module, and a trajectory prediction module, where: the data acquisition module is responsible for acquiring real-time image and point cloud data, the image data is input to the image multi-task perception module to obtain results such as instance segmentation and depth estimation, and the point cloud data is input to the point cloud target detection module to obtain a three-dimensional detection result. The results of the image multi-task perception module and the point cloud target detection module are input into the heterogeneous information fusion module, and the fusion algorithm can fuse the accurate position information of the point cloud and the rich semantic information of the image to obtain a more accurate three-dimensional detection result. And then, the result of the fusion module is input into the multi-target tracking module to obtain the tracking ID, the speed acceleration and other information of the target. And finally, the tracking information of the target is input into a track prediction module, and the module can be combined with high-precision map information to complete accurate prediction of the future motion track of the target.
The system can realize a road-end multi-source sensor fusion target sensing method of the surface mine in a mode of combining software and hardware, and the basic functional modules of the system specifically comprise:
the data acquisition module is used for acquiring data, reading the data from the sensor equipment, performing format conversion on the data and transmitting the data to the point cloud target detection module; besides the data encoding function, the data acquisition module also comprises the functions of time synchronization, space synchronization and the like among data. The time synchronization of the data is to generate a pulse signal by using hardware, all sensors are triggered by the pulse, and the own clock is corrected once when the sensors are triggered each time, so that the accumulated error of a clock source can be eliminated, which is very important for the online processing of time sequence data, and the time synchronization of the data ensures the possibility of data fusion of the multi-source sensors.
The module sends a trigger signal to trigger different sensors to acquire data through a unified clock source, and all data are given with timestamps of trigger time, so that nanosecond time synchronization of different sensor data is realized. In addition, the spatial synchronization of the data, i.e. the calibration of the different sensors, is also performed in the data acquisition module. Because the whole sensing system comprises the laser radar and the camera, the calibration of the sensor comprises the calibration of the laser radar and the calibration of the camera and the laser radar.
The image multi-task perception module is used for carrying out instance segmentation and depth estimation on the collected road target image; it takes as input road target images collected by a roadside high-speed industrial camera and outputs an image instance segmentation result and a depth estimation result. The module trains a lightweight multi-task network model on a large amount of collected data, and the network completes the image instance segmentation and depth estimation tasks at the same time. Without loss of generality, instance segmentation and depth estimation can be achieved by training an open-source multitask depth network on a data set of the specific application scene. For example, the article "Real-Time Joint Semantic Segmentation and Depth Estimation Using Asymmetric Annotations" proposes a lightweight multi-task network. The article, which is part of the prior art, is accessible and downloadable via the online link https://arxiv.org/pdf/1809.04766v2.pdf; its relevant information:
the authors: vladimir Nekrasov1, Thanuja Dharmasiri2, Andrew Spek2, Tom Drummond2, Chunhua Shen1 and Ian Reid1
1School of Computer Science, University of Adelaide, Australia 2Monash University, Australia
Taking out: 2019-ICRA
Quote: nekrasov, V., Dharmasiri, T.A., Spek, A.D., Drummond, T.A., Shen, C., & Reid, I. (2019, May). Real-time joint segmentation and estimation using systematic errors, In 2019 International Conference on Robotics and Automation (ICRA) (pp. 7101 7107). IEEE.
Improvements are made from the thesis: nekrasov 2018 article Light-Weight RefineNet for Real-Time Semantic Segmentation
The network architecture used by this module is similar to that lightweight multitask network: an encoder network extracts high-level semantic features of the image, and lightweight decoder networks then complete the different prediction tasks. During network processing, the images of all cameras at the same moment are combined into one batch and input to the network together, which markedly improves processing speed and throughput. To further accelerate processing, the module accelerates the model with NVIDIA's TensorRT framework, so that the image multitask perception module meets the real-time requirement of automatic driving on an Xavier development kit.
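The following toy PyTorch sketch only illustrates the shared-encoder / dual lightweight-decoder layout described above; it is not the network actually used, the layer sizes are arbitrary, and the segmentation head is a simplified stand-in for full instance segmentation:

    import torch
    import torch.nn as nn

    class MultiTaskHead(nn.Module):
        """Shared encoder with two lightweight decoder heads: segmentation logits and per-pixel depth."""
        def __init__(self, num_classes=4):
            super().__init__()
            self.encoder = nn.Sequential(                  # stand-in for a lightweight backbone
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            )
            self.seg_head = nn.Conv2d(64, num_classes, 1)  # segmentation logits
            self.depth_head = nn.Conv2d(64, 1, 1)          # depth regression
            self.up = nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False)

        def forward(self, x):
            feat = self.encoder(x)
            return self.up(self.seg_head(feat)), self.up(self.depth_head(feat))

    # Images from all cameras at the same trigger time are batched together, as described above.
    batch = torch.randn(4, 3, 384, 640)                    # e.g. four cameras
    seg_logits, depth = MultiTaskHead()(batch)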
As shown in fig. 4, the point cloud target detection module performs multi-radar fusion on point cloud data to obtain fused point cloud data with uniform coordinates, obtains foreground point cloud data through background filtering, performs clustering processing on the foreground point cloud data through a clustering algorithm, and constructs a three-dimensional detection frame for the point cloud cluster obtained through the clustering processing to obtain a three-dimensional target frame.
Under the condition of multi-laser radar cooperative sensing, point cloud data acquired by a plurality of sensors are firstly input into a multi-point cloud fusion module for pre-data fusion to obtain fusion point cloud data of a unified coordinate system, and then input into a point cloud target detection module.
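A minimal sketch of this pre-fusion step, assuming each laser radar's extrinsic calibration to the unified coordinate system is available as a 4x4 homogeneous matrix (names are illustrative):

    import numpy as np

    def fuse_point_clouds(clouds, extrinsics):
        """Transform each lidar's point cloud (N_i x 3) into the unified frame and concatenate them."""
        fused = []
        for pts, T in zip(clouds, extrinsics):             # T: 4x4 lidar -> unified frame
            homo = np.hstack([pts, np.ones((pts.shape[0], 1))])
            fused.append((T @ homo.T).T[:, :3])
        return np.vstack(fused)                            # one fused frame with full field of view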
Preferably, as shown in fig. 5, the point cloud target detection module includes four sub-modules, which are a preprocessing module, a background filtering module, a point cloud clustering module, and a 3D frame reconstruction module. The three-dimensional target detection information output by the point cloud target detection module is finally transmitted to the heterogeneous information fusion module and fused with the image detection result to obtain a detection result with richer information. The multi-point cloud fusion module is used for transferring point cloud frames under local coordinate systems output by different laser radar sensors to a unified coordinate system, namely splicing point cloud data under different visual angles together to obtain a fusion point cloud frame with a complete visual field range;
the preprocessing module is used for receiving either raw point cloud frame data or the fused point cloud frame data output by the multi-point cloud fusion module, performing filtering operations on it, wherein the filtering operations comprise down-sampling, invalid-point removal, outlier removal and region-of-interest filtering, and outputting the processed point cloud data;
the background filtering module is used for filtering background point clouds in the point cloud frames, formalizing a detection range into three-dimensional voxel representation, collecting a plurality of point cloud frames on line, counting the point cloud density of each voxel, making a background table, and filtering background points according to a threshold value and the point cloud density value of the voxel corresponding to the current point in the background table;
the point cloud clustering module is used for completing the clustering of foreground point clouds by using the DBSCAN (density-based spatial clustering of applications with noise) algorithm and clustering the point clouds of the same target into one class;
only foreground point cloud data are left after the preprocessed point cloud passes through the background filtering module, and the point cloud clustering module completes clustering of the foreground point cloud by utilizing a DBSCAN (density-based clustering method with noise) algorithm and clusters the point clouds of the same target into one class.
Specifically, the DBSCAN clustering algorithm generally assumes that the class can be determined by the closeness of the sample distribution, and the clustering purpose is achieved by classifying closely connected samples into one class. For point cloud data, connectivity can be constructed according to Euclidean distances between points, and then clustering is carried out by using a DBSCAN algorithm.
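A minimal sketch of such Euclidean clustering with scikit-learn's DBSCAN; the eps and min_samples values are illustrative and would need scene-specific tuning:

    import numpy as np
    from sklearn.cluster import DBSCAN

    def cluster_foreground(points, eps=0.7, min_samples=8):
        """Group foreground points (N x 3) so that points of the same physical target share a label."""
        labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
        clusters = [points[labels == k] for k in set(labels) if k != -1]   # label -1 marks noise
        return clusters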
And the 3D frame reconstruction module is used for constructing a three-dimensional detection frame of each point cloud cluster by using an OBB (oriented bounding box) algorithm. Each point cloud cluster obtained by the point cloud clustering module needs a corresponding 3D frame built by the 3D frame reconstruction module. The OBB algorithm aims to construct the smallest cuboid that encloses a point cloud cluster in three-dimensional space, mainly using principal component analysis to iteratively compute the three principal axis directions of the point cloud. Once the three principal axis directions are obtained, the pose and size information of the target can be derived from the projections of the point cloud along these three directions, completing the target detection function.
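The following simplified sketch illustrates a PCA-style oriented bounding box computation in the spirit of the description above; restricting the orientation to yaw about the vertical axis is an assumption made here for brevity, not something stated in the patent, and the box centre is approximated by the cluster mean:

    import numpy as np

    def obb_from_cluster(pts):
        """Fit an oriented 3D box to one point cloud cluster using PCA in the ground plane."""
        center = pts.mean(axis=0)
        xy = pts[:, :2] - center[:2]
        # The principal axis of the horizontal footprint gives the heading (yaw).
        _, _, vt = np.linalg.svd(xy, full_matrices=False)
        yaw = float(np.arctan2(vt[0, 1], vt[0, 0]))
        rot = np.array([[np.cos(yaw), np.sin(yaw)], [-np.sin(yaw), np.cos(yaw)]])
        aligned = xy @ rot.T                       # rotate points into the box-aligned frame
        length, width = aligned.max(axis=0) - aligned.min(axis=0)
        height = pts[:, 2].max() - pts[:, 2].min()
        return {"center": center, "size": (length, width, height), "yaw": yaw}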
As shown in fig. 6, the heterogeneous information fusion module is configured to perform heterogeneous information fusion, preferably fusing the image instance segmentation, the depth estimation (depth map) and the fused point cloud data (three-dimensional target frame), and outputting a target 3D detection result; preferably, the 3D detection result includes semantic category, size and 3D pose information. The module takes as input the results of the image multitask perception module and the point cloud target detection module and outputs the final 3D detection result of the target, including semantic category, size and 3D pose information. The fusion module realizes two different fusion strategies according to distance. At short range (within the effective sensing range of the laser radar), the fusion module preferentially trusts the point cloud detection result; however, severe weather such as dust, rain and snow introduces a lot of noise into the point cloud and causes detection errors, whereas the image perception algorithm is less affected by such noise and handles perception in dust, rain and snow better. Therefore, the algorithm projects the point cloud of a three-dimensional detection target into the image space, uses the image instance segmentation result to discard points that project outside the target mask, and recomputes the 3D bounding box from the remaining points, obtaining a more accurate three-dimensional detection result while filtering out false detections. At long range, the point cloud of a target is sparse and the point cloud detection algorithm cannot detect it well, while the visible distance of the image is far greater than that of the laser radar and targets two or three hundred meters away can still be detected well. The fusion module therefore first uses the image instance segmentation result and the depth estimation to obtain depth information for all pixels in the target mask region, generates a pseudo point cloud from it, and then computes the size and three-dimensional pose of the target. Because of errors in depth estimation, the computed target information is generally not very accurate; to recover the size and pose of the target as accurately as possible, the algorithm projects the original point cloud data into the instance segmentation mask and corrects the image depth estimation with the depth of the point cloud falling inside the target mask. The fusion module fully considers the strengths and weaknesses of each sensor, realizing beyond-visual-range sensing for the roadside sensing algorithm and adaptation to dust, rain and snow, and improving the sensing distance and the robustness to severe environments.
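The short-range branch of this fusion can be sketched as follows: project the points of one lidar detection into the image and keep only those falling inside the target's instance mask, after which the 3D box would be rebuilt from the surviving points (for example with the OBB step sketched earlier). T_cam_lidar and K are assumed calibration inputs and all names are illustrative:

    import numpy as np

    def filter_cluster_by_mask(cluster_pts, instance_mask, T_cam_lidar, K):
        """Keep lidar points of one detection whose image projection lies inside the target's
        boolean instance-segmentation mask; returns None for likely false detections."""
        homo = np.hstack([cluster_pts, np.ones((len(cluster_pts), 1))])
        cam = (T_cam_lidar @ homo.T).T[:, :3]
        uv = (K @ cam.T).T
        u = (uv[:, 0] / uv[:, 2]).round().astype(int)
        v = (uv[:, 1] / uv[:, 2]).round().astype(int)
        h, w = instance_mask.shape
        inside = (cam[:, 2] > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        inside[inside] = instance_mask[v[inside], u[inside]]
        kept = cluster_pts[inside]
        return kept if len(kept) >= 5 else None    # too few points -> treat as a false detection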
And the multi-target tracking module is used for establishing a track according to the first frame result after the heterogeneous information is fused, associating the fused detection result with the track, performing optimal estimation on the target observation value to obtain a smoother and more accurate detection result, and meanwhile calculating information such as the speed, the acceleration and the like of the target. And finally, inputting the fusion result of the heterogeneous information fusion module into the multi-target tracking module. The multi-target tracking module firstly establishes a track according to a first frame fusion result, when a next frame fusion result arrives, the Hungarian bipartite graph matching algorithm is used for correlating the fusion detection result with the track, then the Kalman filtering technology is used for carrying out optimal estimation on a target observation value to obtain a smoother and accurate detection result, and meanwhile information such as the speed and the acceleration of a target is calculated.
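Track-to-detection association with the Hungarian algorithm can be sketched with SciPy's linear_sum_assignment over a centre-distance cost; the gating distance is an assumed value:

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def associate(track_centers, det_centers, gate=4.0):
        """Match existing track centres to new fused detections; unmatched detections start new tracks."""
        if len(track_centers) == 0 or len(det_centers) == 0:
            return [], list(range(len(det_centers))), list(range(len(track_centers)))
        cost = np.linalg.norm(track_centers[:, None, :] - det_centers[None, :, :], axis=2)
        rows, cols = linear_sum_assignment(cost)
        matches = [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= gate]
        unmatched_dets = [c for c in range(len(det_centers)) if c not in {m[1] for m in matches}]
        unmatched_trks = [r for r in range(len(track_centers)) if r not in {m[0] for m in matches}]
        return matches, unmatched_dets, unmatched_trks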
Kalman filtering is an optimal estimation algorithm: taking the optimal estimate X_{k-1} at time k-1 as the basis, it predicts the state variable X̂_{k|k-1} at time k, observes the state to obtain the observation variable Z_k, and then weighs the prediction against the observation (i.e., uses the observation to correct the prediction) to obtain the optimal state estimate X_k at time k. In the multi-target tracking process, the state variables are the target's three-dimensional position and velocity [x, y, z, v_x, v_y, v_z], and the observation variables are the target's three-dimensional position [x, y, z], obtained from the target detection algorithm. In practice, the orientation and size of the target can also be added to the state and observation variables to obtain optimal estimates of the target's orientation and size. Through the multi-target tracking module, each fused target is managed by a track and the temporal association of targets across frames is established, which makes the perception results convenient for the trajectory prediction module to use.
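A constant-velocity Kalman filter over the state [x, y, z, v_x, v_y, v_z] with position-only observations, matching the description above, might look as follows; the noise magnitudes and time step are illustrative assumptions:

    import numpy as np

    class ConstantVelocityKF:
        """State [x, y, z, vx, vy, vz]; observation [x, y, z] from the fused detector."""
        def __init__(self, xyz, dt=0.1):
            self.x = np.hstack([xyz, np.zeros(3)])
            self.P = np.eye(6) * 10.0
            self.F = np.eye(6); self.F[:3, 3:] = np.eye(3) * dt   # x_k = x_{k-1} + v * dt
            self.H = np.hstack([np.eye(3), np.zeros((3, 3))])
            self.Q = np.eye(6) * 0.1                               # process noise (assumed)
            self.R = np.eye(3) * 0.5                               # measurement noise (assumed)

        def predict(self):
            self.x = self.F @ self.x
            self.P = self.F @ self.P @ self.F.T + self.Q
            return self.x                                          # X̂_{k|k-1}

        def update(self, z):
            y = z - self.H @ self.x                                # innovation
            S = self.H @ self.P @ self.H.T + self.R
            K = self.P @ self.H.T @ np.linalg.inv(S)               # Kalman gain
            self.x = self.x + K @ y
            self.P = (np.eye(6) - K @ self.H) @ self.P
            return self.x                                          # optimal estimate X_k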
The trajectory prediction module shown in fig. 7 is configured to output a future motion trajectory (which may output a motion trajectory of the target for several seconds in the future) according to the target tracking result and high-precision map information, where the high-precision map information includes lane guide lines, flatness, and gradient information, and the motion trajectory includes position information, speed, and acceleration.
The trajectory prediction module takes as input the tracking result of a target and the high-precision map information of the roadside area, including lane guide lines (generally lane center lines), flatness and gradient information, and outputs the motion trajectory of the target over the next few seconds, including position, speed and acceleration. The prediction module first constructs an initial motion model of the target from the pose, speed and acceleration obtained by multi-target tracking, and then constrains the motion direction of the target using the lane guide line. The motion-direction constraint operates in the Frenet coordinate system, which is based on the lane guide line, with the longitudinal axis (s) along the guide line and the lateral axis (l) perpendicular to it, as shown in FIG. 7. Given a lane guide line, the module projects the vehicle position onto the guide line and, based on the projected point, decomposes the vehicle's current motion state (x, y, theta, v, a) in the map coordinate system into position, speed and acceleration along the guide line direction and position, speed and acceleration relative to the guide line; the lateral speed and acceleration here are not the usual first/second derivatives of displacement with respect to time, but the first/second derivatives of the lateral displacement with respect to the longitudinal displacement, describing the change trend of the path geometry. The calculation formula is as follows:
l' = dl/ds = (dl/dt)/(ds/dt),    l'' = d(l')/ds = d²l/ds²
After this motion decomposition, the motion model of the vehicle in the map coordinate system can be converted correspondingly into the Frenet coordinate system. The constraint on the future motion direction of the vehicle is realized by limiting its motion component along the lateral axis l, so that the predicted trajectory follows the lane guide line as closely as possible, which improves the accuracy of trajectory prediction.
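For illustration only, the following sketch projects a target onto a lane guide line represented as a polyline and decomposes its planar motion into Frenet components; curvature terms are neglected, so s_dot and l' are first-order approximations, and the polyline representation and function names are assumptions rather than the exact implementation of this embodiment.

```python
# Simplified Cartesian-to-Frenet decomposition against a guide-line polyline.
# xy: (2,) position, theta: heading, v: speed, guide_xy: (N, 2) polyline, N >= 2.
import numpy as np

def cartesian_to_frenet(xy, theta, v, guide_xy):
    """Return (s, l, s_dot, l_dot, l_prime) with respect to the guide line."""
    seg = np.diff(guide_xy, axis=0)                       # polyline segments
    seg_len = np.linalg.norm(seg, axis=1)
    rel = xy - guide_xy[:-1]
    # Project the point onto each segment and keep the closest projection.
    t = np.clip(np.einsum('ij,ij->i', rel, seg) / seg_len**2, 0.0, 1.0)
    proj = guide_xy[:-1] + t[:, None] * seg
    i = int(np.argmin(np.linalg.norm(xy - proj, axis=1)))
    tangent = seg[i] / seg_len[i]
    normal = np.array([-tangent[1], tangent[0]])
    s = float(np.sum(seg_len[:i]) + t[i] * seg_len[i])    # arc length along guide line
    l = float(np.dot(xy - proj[i], normal))               # signed lateral offset
    dtheta = theta - np.arctan2(tangent[1], tangent[0])   # heading relative to guide line
    s_dot = v * np.cos(dtheta)                            # longitudinal speed
    l_dot = v * np.sin(dtheta)                            # lateral speed
    l_prime = np.tan(dtheta)                              # dl/ds, geometric trend (curvature ignored)
    return s, l, s_dot, l_dot, l_prime
```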
In addition, the trajectory prediction module can apply a linear correction to the predicted speed using the gradient information, and then constrain the motion speed and acceleration of the target using the flatness information; unlike the constraint on the motion direction, this correction is carried out in the map coordinate system.
The trajectory prediction module uses this scheme to iteratively compute the future pose, speed and acceleration of the target. This information provides the vehicle end with the future state of the scene, improving the accuracy and efficiency of vehicle-end path planning and speed planning.
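A simplified, assumed rollout of this prediction loop is sketched below: the longitudinal state advances with constant acceleration, the lateral component is damped so that the predicted heading follows the guide line, and the predicted speed receives a linear slope correction. The damping factor, slope coefficient and horizon are illustrative values only and are not taken from this embodiment.

```python
# Illustrative trajectory rollout in Frenet coordinates over a short horizon.
def predict_trajectory(s, l, s_dot, l_dot, s_ddot, grade=0.0,
                       horizon=3.0, dt=0.1, lateral_damping=0.8, slope_gain=0.5):
    """Yield (t, s, l, speed) samples over the prediction horizon."""
    t = 0.0
    while t < horizon:
        t += dt
        s_dot = max(0.0, s_dot + s_ddot * dt)                 # longitudinal speed update
        speed = s_dot * (1.0 - slope_gain * grade)            # linear slope correction (assumed form)
        s += speed * dt                                       # advance along the guide line
        l_dot *= lateral_damping                              # suppress lateral motion component
        l += l_dot * dt                                       # residual lateral drift
        yield t, s, l, speed
```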
It should be noted that although only the basic functional modules are disclosed in the text and drawings of the present system, this does not mean that the system is limited to these basic functional modules. Rather, the intention of this patent is that, on the basis of the basic functional modules, a person skilled in the art may combine the prior art and add one or more functional modules to form countless embodiments or technical solutions; that is, the present system is open rather than closed, and the protection scope of the claims should not be considered limited to the disclosed basic functional modules merely because this embodiment discloses only individual basic functional modules.
As shown in fig. 8, based on the above road-end multi-source sensor fusion target sensing method and system for a surface mine, the present invention further discloses a corresponding electronic device and storage medium:
an electronic device, comprising: the system comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory complete mutual communication through the communication bus; the memory has stored therein a computer program which, when executed by the processor, causes the processor to perform the steps of any of the methods described above.
A computer readable storage medium storing a computer program executable by an electronic device, the computer program, when run on the electronic device, causing the electronic device to perform the steps of any of the methods described above.
The electronic device includes a hardware layer, an operating system layer running on the hardware layer, and an application layer running on the operating system. The hardware layer includes hardware such as a Central Processing Unit (CPU), a Memory Management Unit (MMU) and a memory. The operating system may be any one or more computer operating systems that control the electronic device through processes, such as a Linux, Unix, Android, iOS or Windows operating system. In the embodiment of the present invention, the electronic device may be a handheld device such as a smart phone or a tablet computer, or an electronic device such as a desktop computer or a portable computer, which is not particularly limited in the embodiment of the present invention. The execution subject of the electronic device control in the embodiment of the present invention may be the electronic device itself, or a functional module in the electronic device capable of calling and executing a program.
The electronic device may obtain the firmware corresponding to the storage medium, which is provided by the vendor; the firmware corresponding to different storage media may be the same or different, and is not limited here. After acquiring the firmware corresponding to the storage medium, the electronic device may write the firmware into the storage medium, specifically by burning the firmware into the storage medium. The process of burning the firmware into the storage medium can be realized with the prior art and is not described in detail in the embodiment of the present invention.
The electronic device may further acquire a reset command corresponding to the storage medium, which is likewise provided by the vendor; the reset commands corresponding to different storage media may be the same or different, and are not limited here. At this time, the storage medium of the electronic device is one into which the corresponding firmware has been written, and the electronic device may respond to the reset command so as to reset the storage medium into which the corresponding firmware has been written. The process of resetting the storage medium according to the reset command can be implemented with the prior art and is not described in detail in the embodiment of the present invention.
From the above description of the embodiments, it is clear to those skilled in the art that the present application can be implemented by software plus necessary general hardware platform. Based on such understanding, the technical solutions of the present application may be essentially or partially implemented in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the embodiments or some parts of the embodiments of the present application.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
The application is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A road-end multi-source sensor fusion target sensing method for a surface mine is characterized by comprising the following steps:
step one), data acquisition: data acquisition is carried out, data are read from the sensor equipment, and format conversion is carried out on the data;
step two) image multitask perception: carrying out example segmentation and depth estimation on the collected road target image to obtain an example segmentation result and a depth map;
step three), point cloud target detection: performing multi-radar fusion on the point cloud data to obtain fused point cloud data with unified coordinates, then obtaining foreground point cloud data through background filtering, performing clustering processing on the foreground point cloud data through a clustering algorithm, and constructing a three-dimensional detection frame on the point cloud cluster obtained through clustering processing to obtain a three-dimensional target frame;
step four), heterogeneous information fusion: carrying out heterogeneous information fusion on image instance segmentation, the depth map and the three-dimensional target frame, and outputting a target 3D detection result;
step five), multi-target tracking: establishing a track according to a first frame result after heterogeneous information fusion, associating a fusion detection result with the track, and performing optimal estimation on a target observation value;
step six) track prediction: and outputting a future motion trail according to the high-precision map information of the road side area and the tracking result of the target.
2. The method for sensing the fusion target of the road-end multi-source sensors of the surface mine according to claim 1, wherein in the step one), time synchronization is performed while data are acquired, a trigger signal is sent by a unified clock source to trigger different sensors to acquire data, and time stamps of trigger time are given to all data, so that nanosecond time synchronization of different sensor data is realized.
3. The method for sensing the fusion target of the multi-source sensor at the road end of the surface mine according to claim 1, wherein in the step two), an industrial camera is used for collecting road target images at a road side view angle, example segmentation and depth estimation are carried out on the collected road target images, and the multitask deep learning network is used for training and realizing the example segmentation and the depth estimation on a data set of a specific application scene.
4. The method for sensing the fusion target of the road-end multi-source sensor of the surface mine according to claim 1, wherein in the step three):
the method comprises the steps that point cloud frames under local coordinate systems output by different laser radar sensors are transferred to a unified coordinate system, and a fused point cloud frame of a complete visual field range is obtained;
receiving fused point cloud frame data and performing filtering operation, wherein the filtering operation comprises down-sampling, illegal point removal, outlier removal and region-of-interest filtering operation, and outputting the processed point cloud data;
filtering background point clouds in the point cloud frames, dividing a detection range into different voxels, collecting a plurality of point cloud frames on line, counting the point cloud density of each voxel, making a background table, and setting a threshold value to filter background points according to the point cloud density value of the voxel corresponding to a current point in the background table;
clustering foreground point clouds is completed by using a DBSCAN algorithm, and the point clouds of the same target are clustered into one class;
and constructing a three-dimensional detection frame of each point cloud cluster by using an OBB direction bounding box algorithm.
5. The method for sensing the fusion target of the road-end multi-source sensor of the surface mine according to claim 1, wherein:
in the fourth step), image instance segmentation, depth estimation and multi-radar point cloud fusion data are fused, and a target 3D detection result is output, wherein the 3D detection result comprises semantic category, size and 3D pose information;
and in the step five), establishing a track according to a first frame result after heterogeneous information fusion, associating a fusion detection result with the track by using a Hungarian bipartite graph matching algorithm when a next frame fusion result arrives, and performing optimal estimation on a target observation value by using a Kalman filtering technology.
6. The method for sensing the fusion target of the road-end multi-source sensor of the surface mine according to claim 1, wherein in the sixth step), a future movement track is output according to a tracking result of the target and high-precision map information, wherein the map information comprises lane guide lines, flatness and gradient information, and the movement track comprises position information, speed and acceleration.
7. A road-end multi-source sensor fusion target sensing system for a surface mine, characterized by specifically comprising:
the data acquisition module is used for acquiring data, reading the data from the sensor equipment and converting the format of the data;
the image multi-task perception module is used for carrying out example segmentation and depth estimation on the collected road target image to obtain an example segmentation result and a depth map;
the point cloud target detection module is used for performing multi-radar fusion on point cloud data to obtain fused point cloud data with unified coordinates, obtaining foreground point cloud data through background filtering, performing clustering processing on the foreground point cloud data through a clustering algorithm, and constructing a three-dimensional detection frame for the point cloud cluster obtained through clustering processing to obtain a three-dimensional target frame;
the heterogeneous information fusion module is used for carrying out heterogeneous information fusion on the image instance segmentation, the depth map and the three-dimensional target frame and outputting a target 3D detection result;
the multi-target tracking module is used for establishing a track according to a first frame result after heterogeneous information fusion, associating a fusion detection result with the track and carrying out optimal estimation on a target observation value;
and the track prediction module is used for outputting a future motion track according to the target tracking result and high-precision map information, wherein the high-precision map information comprises lane guide lines, flatness and gradient information, and the motion track comprises position information, speed and acceleration.
8. The system for sensing the road-end multi-source sensor fusion target of the surface mine according to claim 7, wherein the point cloud target detection module comprises a preprocessing module, a background filtering module, a point cloud clustering module and a 3D frame reconstruction module, wherein:
the preprocessing module is used for receiving the original point cloud frame data or the fused point cloud frame data output by the multi-point cloud fusion module and filtering it, wherein the filtering operation comprises down-sampling, illegal point removal, outlier removal and region-of-interest filtering, and outputting the processed point cloud data;
the background filtering module is used for filtering background point clouds in the point cloud frames, formalizing a detection range into three-dimensional voxel representation, collecting a plurality of point cloud frames on line, counting the point cloud density of each voxel, making a background table, and filtering background points according to a threshold value and the point cloud density value of the voxel corresponding to the current point in the background table;
the point cloud clustering module is used for finishing clustering of foreground point clouds by using a DBSCAN algorithm and clustering the point clouds of the same target into one type;
the 3D frame reconstruction module is used for constructing a three-dimensional detection frame of each point cloud cluster by using an OBB direction bounding frame algorithm, and each point cloud cluster obtained by the point cloud clustering module needs to construct a corresponding 3D frame through the 3D frame reconstruction module;
in the multi-target tracking module, a fusion result of the heterogeneous information fusion module is finally input into the multi-target tracking module, the multi-target tracking module firstly establishes a track according to a first frame fusion result, when a next frame fusion result arrives, a Hungary bipartite graph matching algorithm is utilized to associate a fusion detection result with the track, and then a Kalman filtering technology is used for carrying out optimal estimation on a target observation value.
9. An electronic device, comprising: the system comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory complete mutual communication through the communication bus; the memory has stored therein a computer program which, when executed by the processor, causes the processor to perform the steps of the method of any one of claims 1 to 6.
10. A computer-readable storage medium, characterized in that it stores a computer program executable by an electronic device, which, when run on the electronic device, causes the electronic device to perform the steps of the method of any one of claims 1 to 6.
CN202210441815.8A 2022-04-26 2022-04-26 Road-end multi-source sensor fusion target sensing method and system for surface mine Pending CN114862901A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210441815.8A CN114862901A (en) 2022-04-26 2022-04-26 Road-end multi-source sensor fusion target sensing method and system for surface mine

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210441815.8A CN114862901A (en) 2022-04-26 2022-04-26 Road-end multi-source sensor fusion target sensing method and system for surface mine

Publications (1)

Publication Number Publication Date
CN114862901A true CN114862901A (en) 2022-08-05

Family

ID=82634250

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210441815.8A Pending CN114862901A (en) 2022-04-26 2022-04-26 Road-end multi-source sensor fusion target sensing method and system for surface mine

Country Status (1)

Country Link
CN (1) CN114862901A (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115033586A (en) * 2022-08-11 2022-09-09 上海伯镭智能科技有限公司 Method for sensing mine road boundary and updating crowdsourcing map
CN115272493A (en) * 2022-09-20 2022-11-01 之江实验室 Abnormal target detection method and device based on continuous time sequence point cloud superposition
CN115272493B (en) * 2022-09-20 2022-12-27 之江实验室 Abnormal target detection method and device based on continuous time sequence point cloud superposition
CN116091533A (en) * 2023-01-03 2023-05-09 中国人民解放军海军航空大学 Laser radar target demonstration and extraction method in Qt development environment
CN115984827A (en) * 2023-03-06 2023-04-18 安徽蔚来智驾科技有限公司 Point cloud sensing method, computer device and computer readable storage medium
CN115984827B (en) * 2023-03-06 2024-02-02 安徽蔚来智驾科技有限公司 Point cloud sensing method, computer equipment and computer readable storage medium
CN116994436A (en) * 2023-09-26 2023-11-03 青岛慧拓智能机器有限公司 Intelligent mine road collision early warning method
CN116994436B (en) * 2023-09-26 2024-02-20 青岛慧拓智能机器有限公司 Intelligent mine road collision early warning method
CN117218517A (en) * 2023-11-08 2023-12-12 诺比侃人工智能科技(成都)股份有限公司 Outdoor moving object detection system in rainy and snowy weather
CN117218517B (en) * 2023-11-08 2024-01-26 诺比侃人工智能科技(成都)股份有限公司 Outdoor moving object detection system in rainy and snowy weather
CN117251825A (en) * 2023-11-20 2023-12-19 浙江大学 Multi-sensor data fusion platform for new energy power station
CN117251825B (en) * 2023-11-20 2024-02-09 浙江大学 Multi-sensor data fusion platform for new energy power station

Similar Documents

Publication Publication Date Title
CN114862901A (en) Road-end multi-source sensor fusion target sensing method and system for surface mine
Shin et al. Roarnet: A robust 3d object detection based on region approximation refinement
Alonso et al. Accurate global localization using visual odometry and digital maps on urban environments
Jebamikyous et al. Autonomous vehicles perception (avp) using deep learning: Modeling, assessment, and challenges
US11472444B2 (en) Method and system for dynamically updating an environmental representation of an autonomous agent
Ruf et al. Real-time on-board obstacle avoidance for UAVs based on embedded stereo vision
CN115879060B (en) Multi-mode-based automatic driving perception method, device, equipment and medium
CA3126236A1 (en) Systems and methods for sensor data packet processing and spatial memoryupdating for robotic platforms
Sun et al. A review of visual SLAM based on unmanned systems
US20230222671A1 (en) System for predicting near future location of object
Liu et al. Dloam: Real-time and robust lidar slam system based on cnn in dynamic urban environments
Li et al. Sscbench: A large-scale 3d semantic scene completion benchmark for autonomous driving
CN114494618A (en) Map generation method and device, electronic equipment and storage medium
Li et al. Multi-sensor fusion for robust localization with moving object segmentation in complex dynamic 3D scenes
CN116597122A (en) Data labeling method, device, electronic equipment and storage medium
Yan et al. INT2: Interactive Trajectory Prediction at Intersections
US11544899B2 (en) System and method for generating terrain maps
CN114556419A (en) Three-dimensional point cloud segmentation method and device and movable platform
Zhang et al. Research on unmanned system environment perception system methodology
CN117593892B (en) Method and device for acquiring true value data, storage medium and electronic equipment
KR102618680B1 (en) Real-time 3D object detection and tracking system using visual and LiDAR
Liu et al. Robust real-time multiple object tracking in traffic scenes using an optical matrix range sensor
Deliparaschos et al. A preliminary investigation of an autonomous vehicle validation infrastructure for smart cities
An et al. Research on map matching of lidar/vision sensor for automatic driving aided positioning
CN117593686B (en) Model evaluation method and device based on vehicle condition true value data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination