CN115187964A - Automatic driving decision-making method based on multi-sensor data fusion and SoC chip - Google Patents

Automatic driving decision-making method based on multi-sensor data fusion and SoC chip

Info

Publication number
CN115187964A
Authority
CN
China
Prior art keywords: data, image, point cloud, layer, target detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211082826.8A
Other languages
Chinese (zh)
Inventor
王嘉诚
张少仲
张栩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongcheng Hualong Computer Technology Co Ltd
Original Assignee
Zhongcheng Hualong Computer Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongcheng Hualong Computer Technology Co Ltd filed Critical Zhongcheng Hualong Computer Technology Co Ltd
Priority to CN202211082826.8A priority Critical patent/CN115187964A/en
Publication of CN115187964A publication Critical patent/CN115187964A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01CMEASURING DISTANCES, LEVELS OR BEARINGS; SURVEYING; NAVIGATION; GYROSCOPIC INSTRUMENTS; PHOTOGRAMMETRY OR VIDEOGRAMMETRY
    • G01C21/00Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00
    • G01C21/26Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network
    • G01C21/28Navigation; Navigational instruments not provided for in groups G01C1/00 - G01C19/00 specially adapted for navigation in a road network with correlation of data from several navigational instruments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G06V20/582Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads of traffic signs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/588Recognition of the road, e.g. of lane markings; Recognition of the vehicle driving pattern in relation to the road

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Remote Sensing (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Automation & Control Theory (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention discloses an automatic driving decision-making method based on multi-sensor data fusion and an SoC chip, belonging to the technical fields of machine learning and automatic driving. An image sensor acquires image data of the road and inputs it into a trained image target detection neural network model, which performs lane image target detection and outputs target detection data of the lane image. A laser radar collects 3D point cloud data, which are input into a trained point cloud target detection neural network model for obstacle target detection; the result is fused with the obstacle information output by a binocular camera to generate obstacle position and distance data. The lane image data and the obstacle position and distance data are then fused, and the road condition information used for driving is corrected and serves as the basis for the automatic driving decision. The scheme of the invention fully meets the real-time requirements of automatic driving scenarios, and by fusing data from different sensors it greatly improves the accuracy of road condition analysis.

Description

Automatic driving decision-making method based on multi-sensor data fusion and SoC chip
Technical Field
The invention belongs to the technical field of machine learning and automatic driving, and particularly relates to an automatic driving decision method based on multi-sensor data fusion and an SoC chip.
Background
Automatic driving technology is receiving growing attention from vehicle manufacturers, who are investing increasing manpower and material resources in developing automatic driving vehicles, with some even targeting mass production within the next 5-10 years. The realization of automatic driving can be divided into three stages: cognition, judgment and control. Current automatic driving technology still has many problems in the cognition stage, such as road and pedestrian recognition, and in the judgment stage, such as condition judgment and path generation.
With the rapid development of artificial intelligence in recent years, its application in the field of automatic driving has become increasingly common. The Chinese patent with publication number CN114708566A discloses an automatic driving target detection method based on improved YOLOv4, with the following specific steps: acquire a common target detection data set and preprocess it with Mosaic augmentation; construct a new non-maximum suppression algorithm, Soft-CIOU-NMS, from NMS, Soft-NMS and the CIOU loss function; improve the YOLOv4 feature extraction network by extending the original three-scale prediction of YOLOv4 to four-scale prediction; replace the ordinary convolutions of YOLOv4 with depthwise separable convolutions to speed up detection; and improve the YOLOv4 network structure by adding a CBAM attention mechanism to enhance feature extraction. However, when images alone are used as the basis for judgment, deviations in the moving image may distort the detection and classification steps, and errors in the set thresholds, in image cropping or in feature extraction may lead to wrong judgments about the vehicle's operation and thus to erroneous commands.
With the continuous improvement and popularization of 3D equipment such as laser radars and depth cameras, automatic driving in real three-dimensional scenes has become possible, which raises the requirements of automatic driving systems for recognizing and detecting targets in complex scenes while meeting demands for safety and convenience. In automatic driving devices, data are usually acquired by image sensors, laser sensors and radar, and the data from several sensors are combined for comprehensive analysis so that the relevant operations can be carried out according to the analysis results. 2D target detection cannot meet the environment perception requirements of unmanned vehicles, whereas 3D target detection can identify object categories together with information such as length, width, height and rotation angle in three-dimensional space. Applying 3D target detection to the targets in a scene allows the autonomous vehicle to estimate their actual positions and accurately predict and plan its own behavior and path, thereby avoiding collisions and violations, greatly reducing traffic accidents and helping realize intelligent urban traffic.
To solve the problem of wrong operating decisions caused by the dynamic image deviation of a single image sensor, the Chinese patent with publication number CN114782729A provides a real-time target detection method based on laser radar and vision fusion, comprising the following steps: acquire camera image data and three-dimensional laser radar scan points of the vehicle's surroundings, convert the point cloud data into a local rectangular coordinate system, and preprocess the 3D point cloud; perform density clustering on the preprocessed 3D point cloud data and extract 3D regions of interest of targets and the corresponding point cloud features; screen out sparse clusters from the target 3D regions of interest, map them to the corresponding regions of the image, extract image features and fuse them with the point cloud features; and input the point cloud features and image features of all regions of interest into an SSD detector to locate and identify targets. The point cloud feature extraction algorithm is PointNet++, PointNet, VoxelNet or the SECOND algorithm. However, this technical scheme generally suffers from the low detection speed of point cloud feature extraction algorithms: the target detection speed of PointNet is only 5.7 Hz; the PointNet++ model was proposed for dense point cloud data sets, and its performance on sparse laser radar point clouds hardly meets requirements; the VoxelNet algorithm uses 3D convolution, which makes the computation excessive, with a processing speed of only 4.4 Hz; and although the SECOND algorithm improves on VoxelNet and raises the processing speed to 20 Hz, it still struggles to meet the real-time requirements of automatic driving scenarios.
Disclosure of Invention
The invention provides an automatic driving decision-making method based on multi-sensor data fusion and an SoC (system on chip) chip, aiming to solve the prior-art problems of inefficient road condition information processing and of misjudgments caused by relying on a single image sensor in automatic driving.
To solve the above technical problems, the automatic driving decision is made on the basis of multi-sensor data fusion, with different neural network models trained to detect image sensor data and laser radar sensor data respectively. The specific scheme is as follows:
the automatic driving decision-making method based on multi-sensor data fusion comprises the following steps:
S1: an RGB image sensor collects image data of the road on which the vehicle is driving, the image data comprising lane line data, vehicle data, pedestrian data and traffic sign data;
S2: the lane line data, vehicle data, pedestrian data and traffic sign data are input into a trained image target detection neural network model, which performs lane image feature extraction and feature fusion and outputs target detection data of the lane image, the image target detection neural network model adopting the YOLOv7 target detection algorithm;
S3: a laser radar collects 3D point cloud data, which are input into a trained point cloud target detection neural network model for distance feature extraction and feature fusion; the model outputs target position and distance data, which are fused with the target position and distance information output by a binocular camera to generate the final obstacle position and distance data, the neural network model adopting the PointPillar target detection algorithm;
S4: the lane image data generated in step S2 and the obstacle position and distance data generated in step S3 are fused, each sensor is checked for errors, and the road condition information for vehicle driving is corrected;
S5: corresponding decisions are made according to the road condition information corrected in step S4 and applied to automatic driving.
Preferably, the training process of the image target detection neural network model in step S2 specifically includes the following steps:
s2-1: establishing a data set of lanes, pedestrians and traffic signs, wherein the data set is used for training a neural network model;
s2-2: preprocessing the lane, pedestrian and traffic sign data sets to generate RGB format images with set resolution;
s2-3: sequentially enabling the format image to pass through an image feature extraction layer, an image feature fusion layer and an image target detection layer of a YOLOv7 network to obtain a neural network model;
s2-4: checking whether the training times reach a set target or not, if not, repeating the step S2-3 until the set training times are reached, and storing the neural network model as an image target detection neural network model.
Preferably, the training process of the point cloud target detection neural network model in step S3 specifically includes the following steps:
s3-1: establishing a laser radar data set, wherein the data set is used for training a point cloud target detection neural model;
s3-2: preprocessing the laser radar data set to generate format point cloud data;
s3-3: sequentially passing the format point cloud data through a feature conversion layer, a feature extraction layer and a target detection layer of a PointPillar network to obtain a neural network model;
s3-4: checking whether the training times reach a set target or not, if not, repeating the step S3-3 until the set training times are reached, and storing the neural network model as a point cloud target detection neural network model.
Preferably, in the data fusion of step S4, the processed image data and the radar data are matched in a decision-layer fusion manner, and the obstacle position and distance detection results generated from the radar data are mapped onto the coordinates of the image data to form a comprehensive feature map.
Preferably, the YOLOv7 network model comprises an image input layer, an image feature extraction layer, an image feature fusion layer and an image target detection layer; the image input layer aligns input images; the image feature extraction layer further comprises a plurality of convolution layers, a batch normalization layer and a maximum pooling layer and is used for enriching the features of the aligned images and extracting the features of lanes, vehicles and pedestrians; the image feature fusion layer is used for fusing features extracted at different stages, so that the accuracy of the features is improved; and the image target detection layer detects the road condition information characteristics of the fused characteristic graph and outputs an image detection result.
Preferably, the PointPillar network model comprises a point cloud feature conversion layer, a point cloud feature extraction layer and a point cloud target detection layer; the point cloud feature conversion layer converts the input point cloud into a sparse pseudo image; the point cloud feature extraction layer processes the pseudo image to obtain features of a high layer; the point cloud target detection layer detects the position and the distance of a target through a regression 3D frame.
Preferably, the traffic sign data are detected with an improved lightweight convolutional neural network, which uses dilated convolution to implement a sliding-window method and uses statistical information from the data set to accelerate the forward propagation of the network, thereby improving the efficiency of traffic sign detection.
Preferably, the format of the point cloud data input into the pillar feature layer is P × N × D, where P is the number of selected pillars, N is the maximum number of points stored in each pillar, and D is the dimensional attribute of each point.
Preferably, the dimensional attribute of each point is 9-dimensional data, characterized as:

D = (x, y, z, r, x_c, y_c, z_c, x_p, y_p)

wherein (x, y, z, r) is the original laser radar point cloud data, (x, y, z) are the three-dimensional coordinates, r is the reflected laser intensity, (x_c, y_c, z_c) is the offset of the point from the center of the N points in its pillar, and (x_p, y_p) is the offset of the point from the pillar's center coordinates.
An automatic driving decision SoC chip based on multi-sensor data fusion comprises a general-purpose processor and a neural network processor; the general-purpose processor controls the operation of the neural network processor through custom instructions, and the neural network processor is used to execute the above method.
Compared with the prior art, the invention has the following technical effects:
1. The invention combines the laser radar with a binocular camera to detect the position and distance of obstacles, efficiently exploiting the accurate spatial information of the point cloud data while using the binocular camera to compensate for the positioning errors of the laser radar in severe environments, thereby broadening the applicable range of obstacle detection and meeting the robustness requirements of obstacle detection in automatic driving scenarios.
2. The invention adopts the PointPillar network model to process laser radar data. It operates on pillars rather than voxels, requires no manual tuning of the binning in the vertical direction, and represents the point cloud with pillars so that 3D point cloud detection can be performed with 2D convolutions alone. This greatly reduces the amount of computation and raises the processing speed above 62 Hz, effectively meeting the real-time requirements of automatic driving scenarios.
3. For different tasks, the invention uses different neural network models, giving full play to the advantages of each model, allowing the data to be processed synchronously before the comprehensive data fusion and improving the completeness of the decision method.
4. The method combines the lane image data, the image data of other traffic participants, the traffic sign image data and the obstacle position and distance data in a comprehensive decision-layer fusion, so that drivable areas and obstacles are judged more accurately in automatic driving scenarios, target recognition is good, and the vehicle's perception accuracy of its surroundings is improved.
Drawings
FIG. 1 is a flow chart of an automated driving decision method based on multi-sensor data fusion in accordance with the present invention;
FIG. 2 is a flow chart of a YOLOv7 network model structure of the automatic driving decision method based on multi-sensor data fusion according to the present invention;
FIG. 3 is a flow chart of a PointPillar network model structure of the multi-sensor data fusion-based automatic driving decision method of the present invention.
In the figure: 1. an image input layer; 2. an image feature extraction layer; 3. an image target detection layer; 4. a point cloud feature conversion layer; 5. a point cloud feature extraction layer; 6. a point cloud target detection layer; 21. a convolution module; 22. a first pooling module; 23. a second pooling module; 24. a third pooling module; 41. point cloud data; 42. stacking column data; 43. acquiring characteristic data; 44. pseudo image data.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be described in detail and completely with reference to the accompanying drawings.
Referring to fig. 1-3, the present invention provides an automatic driving decision method based on multi-sensor data fusion, comprising the following steps:
S1: An RGB image sensor collects image data of the road on which the vehicle is driving, including lane line data, vehicle data, pedestrian data, traffic sign data and data of other traffic participants.
S2: The lane line data, vehicle data, pedestrian data and traffic sign data are input into a trained image target detection neural network model, which performs lane image feature extraction and feature fusion and outputs target detection data of the lane image; the neural network model adopts the YOLOv7 target detection algorithm.
S3: A laser radar collects 3D point cloud data, which are input into a trained point cloud target detection neural network model for distance feature extraction and feature fusion; the model outputs target position and distance data, which are fused with the target position and distance information output by a binocular camera to generate the final obstacle position and distance data; the neural network model adopts the PointPillar target detection algorithm. This efficiently exploits the accurate spatial information of the point cloud data while using the binocular camera to compensate for the positioning errors of the laser radar in severe environments, broadening the applicable range of obstacle detection and meeting the robustness requirements of obstacle detection in automatic driving scenarios.
S4: The lane image data generated in step S2 and the obstacle position and distance data generated in step S3 are fused, each sensor is checked for errors, and the road condition information for vehicle driving is corrected.
S5: Corresponding decisions are made according to the road condition information corrected in step S4 and applied to automatic driving.
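The following is a minimal Python sketch of the S1-S5 decision flow described above; each callable is a placeholder standing in for the corresponding trained model or fusion step, and all names in the sketch are illustrative assumptions rather than terms from the disclosure.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class AutoDrivePipeline:
    detect_lane_targets: Callable[[Any], Any]       # S2: trained YOLOv7 image target detector
    detect_obstacles: Callable[[Any], Any]          # S3: trained PointPillar point cloud detector
    fuse_stereo: Callable[[Any, Any], Any]          # S3: fusion with binocular camera output
    fuse_decision_layer: Callable[[Any, Any], Any]  # S4: decision-layer fusion and correction
    plan: Callable[[Any], Any]                      # S5: decision from corrected road condition info

    def step(self, rgb_frame: Any, lidar_points: Any, stereo_obstacles: Any) -> Any:
        lane_targets = self.detect_lane_targets(rgb_frame)                  # S2
        obstacles = self.fuse_stereo(self.detect_obstacles(lidar_points),   # S3
                                     stereo_obstacles)
        road_condition = self.fuse_decision_layer(lane_targets, obstacles)  # S4
        return self.plan(road_condition)                                    # S5
```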
The training process of the image target detection neural network model in step S2 specifically comprises the following steps:
S2-1: A data set of lanes, pedestrians and traffic signs is established, covering lane conditions such as normal, crowded, night, no lane lines, shadows, arrows, glare, curves and intersections under different weather and climate conditions, as well as obstacles such as pedestrians, animals and non-motor vehicles; the data set is used to train the neural network model.
In this embodiment, in order to fully train the YOLOv7 neural network model, the TuSimple data set is used together with the CULane data set to train lane and vehicle target detection, and the RESIDE data set is used to train the detection of other traffic participants in road traffic.
S2-2: The data set of lanes, pedestrians and traffic signs is preprocessed to generate RGB images of a set resolution; matching the requirements of the YOLOv7 image input layer 1, the formatted images are RGB three-channel images with a resolution of 640 × 640.
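A minimal Python sketch of this pre-processing step, assuming a simple letterbox resize to the 640 × 640 RGB three-channel format expected by the image input layer 1; the use of OpenCV and the gray padding value are assumptions for illustration.

```python
import cv2
import numpy as np

def preprocess_image(path: str, size: int = 640) -> np.ndarray:
    img = cv2.imread(path)                                   # BGR image, H x W x 3
    h, w = img.shape[:2]
    scale = size / max(h, w)                                 # keep aspect ratio
    resized = cv2.resize(img, (int(w * scale), int(h * scale)))
    canvas = np.full((size, size, 3), 114, dtype=np.uint8)   # gray letterbox padding (assumed value)
    canvas[: resized.shape[0], : resized.shape[1]] = resized
    rgb = cv2.cvtColor(canvas, cv2.COLOR_BGR2RGB)            # RGB three-channel format
    return rgb.astype(np.float32) / 255.0                    # normalized 640 x 640 x 3 input
```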
S2-3: and sequentially passing the format image through an image feature extraction layer 2, an image feature fusion layer and an image target detection layer 3 of a YOLOv7 network to obtain a neural network model.
S2-4: checking whether the training times reach a set target, if not, repeating the step S2-3 until the set training times are reached, and storing the neural network model as an image target detection neural network model.
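A minimal PyTorch sketch of the training loop implied by steps S2-3 and S2-4; the model, loss function and data loader are placeholders standing in for the YOLOv7 network and the data sets described above, and the epoch count stands for the "set training times" checked in step S2-4.

```python
import torch

def train_detector(model, loss_fn, data_loader, epochs=100, lr=1e-3):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.to(device).train()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for epoch in range(epochs):                      # S2-4: repeat until the set number of training passes
        for images, targets in data_loader:          # S2-2: batches of pre-processed 640 x 640 RGB images
            preds = model(images.to(device))         # S2-3: feature extraction, feature fusion, detection
            loss = loss_fn(preds, targets)           # detection loss (placeholder)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    torch.save(model.state_dict(), "image_target_detection_model.pt")   # store the trained model
```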
In step S3, the training process of the point cloud target detection neural network model specifically comprises the following steps:
S3-1: A laser radar data set is established and used to train the point cloud target detection neural network model; data sets such as the LiDAR-Video Driving Dataset, KITTI, PandaSet, Waymo, Lyft Level 5, DAIR-V2X and nuScenes can be used.
S3-2: and preprocessing the laser radar data set to generate format point cloud data.
The data format of the point cloud input into the Pillar feature layer is P × N × D, where P is the number of selected pillars, N is the maximum number of points stored in each pillar, and D is the dimensional attribute of each point.
The dimensional attribute of each point is 9-dimensional data, characterized as:

D = (x, y, z, r, x_c, y_c, z_c, x_p, y_p)

wherein (x, y, z, r) is the original laser radar point cloud data, (x, y, z) are the three-dimensional coordinates, r is the reflected laser intensity, (x_c, y_c, z_c) is the offset of the point from the center of the N points in its pillar, and (x_p, y_p) is the offset of the point from the pillar's center coordinates.
In this embodiment, the number of pillars P is 30000 and the maximum number of points stored in each pillar is 20. If a pillar contains more than 20 points, 20 points are randomly sampled and the rest are discarded; if it contains fewer than 20 points, it is padded with zeros. The point cloud data format input into the pillar feature layer is therefore P × N × D (30000 × 20 × 9).
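A minimal NumPy sketch of this pre-processing, assuming a BEV grid over illustrative detection ranges and a 0.16 m pillar size (both assumptions, not values from the disclosure): points are grouped into pillars, at most P = 30000 pillars and N = 20 points per pillar are kept (random sub-sampling for fuller pillars, zero padding for emptier ones), and each point is expanded to the 9-dimensional feature described above.

```python
import numpy as np

def build_pillars(points, x_range=(0.0, 81.92), y_range=(-40.96, 40.96),
                  pillar_size=0.16, max_pillars=30000, max_points=20):
    """points: (M, 4) array of (x, y, z, reflectance) lidar returns."""
    keep = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
            (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]))
    points = points[keep]
    ix = ((points[:, 0] - x_range[0]) // pillar_size).astype(np.int64)
    iy = ((points[:, 1] - y_range[0]) // pillar_size).astype(np.int64)
    keys = ix * 1_000_000 + iy                                    # one key per occupied pillar
    feats = np.zeros((max_pillars, max_points, 9), dtype=np.float32)
    coords = np.zeros((max_pillars, 2), dtype=np.int64)
    for p, key in enumerate(np.unique(keys)[:max_pillars]):
        pts = points[keys == key]
        if len(pts) > max_points:                                 # more than 20 points: random sub-sampling
            pts = pts[np.random.choice(len(pts), max_points, replace=False)]
        n = len(pts)                                              # fewer points: rows stay zero-padded
        cx = (key // 1_000_000 + 0.5) * pillar_size + x_range[0]
        cy = (key % 1_000_000 + 0.5) * pillar_size + y_range[0]
        feats[p, :n, 0:4] = pts                                   # x, y, z, r
        feats[p, :n, 4:7] = pts[:, :3] - pts[:, :3].mean(axis=0)  # offset from the pillar's point center
        feats[p, :n, 7] = pts[:, 0] - cx                          # offset from the pillar's x center
        feats[p, :n, 8] = pts[:, 1] - cy                          # offset from the pillar's y center
        coords[p] = key // 1_000_000, key % 1_000_000
    return feats, coords            # (30000, 20, 9) pillar features and (30000, 2) BEV grid indices
```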
S3-3: and sequentially passing the format point cloud data through a point cloud feature conversion layer 4, a point cloud feature extraction layer 5 and a point cloud target detection layer 6 of the PointPillar network to obtain a neural network model.
S3-4: checking whether the training times reach a set target, if not, repeating the step S3-3 until the set training times are reached, and storing the neural network model as a point cloud target detection neural network model.
The YOLOv7 network model comprises an image input layer 1, an image feature extraction layer 2, an image feature fusion layer and an image target detection layer 3; the image input layer aligns the input images; the image feature extraction layer further comprises a plurality of convolution layers, a batch normalization layer and a maximum pooling layer and is used for enriching the features of the aligned images and extracting the features of lanes, vehicles and pedestrians; the image feature fusion layer is used for fusing features extracted at different stages, so that the accuracy rate of the features is improved; and the image target detection layer detects the road condition information characteristics of the fused characteristic graph and outputs an image detection result.
The image feature extraction layer 2 comprises a convolution module 21, a first pooling module 22, a second pooling module 23 and a third pooling module 24 arranged in sequence. The convolution module 21 outputs a 4-times down-sampled feature map B; the first pooling module 22 receives and processes the feature map B and outputs an 8-times down-sampled feature map C; the second pooling module 23 receives and processes the feature map C and outputs a 16-times down-sampled feature map D; and the third pooling module 24 receives and processes the feature map D and outputs a 32-times down-sampled feature map E. The convolution module 21 consists of four CBR convolution layers and an ELAN layer arranged in sequence. The first pooling module 22, second pooling module 23 and third pooling module 24 each consist of a max pooling layer MP1 followed by an ELAN layer.
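A simplified PyTorch sketch of the image feature extraction layer 2 described above, producing the 4×/8×/16×/32× down-sampled feature maps B, C, D and E; the channel widths and the internal structure of the CBR and ELAN stand-ins are illustrative assumptions, not the exact YOLOv7 configuration.

```python
import torch
import torch.nn as nn

def cbr(c_in, c_out, k=3, s=1):
    # CBR = Conv + BatchNorm + ReLU, as named in the description above
    return nn.Sequential(nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False),
                         nn.BatchNorm2d(c_out), nn.ReLU(inplace=True))

class ELAN(nn.Module):
    # Stand-in for the ELAN aggregation block: two parallel CBR branches, concatenated and fused.
    def __init__(self, c):
        super().__init__()
        self.b1 = cbr(c, c // 2, 1)
        self.b2 = nn.Sequential(cbr(c, c // 2, 1), cbr(c // 2, c // 2))
        self.fuse = cbr(c, c, 1)
    def forward(self, x):
        return self.fuse(torch.cat([self.b1(x), self.b2(x)], dim=1))

class PoolModule(nn.Module):
    # Max pooling layer MP1 followed by an ELAN layer; halves the spatial resolution.
    def __init__(self, c_in, c_out):
        super().__init__()
        self.body = nn.Sequential(nn.MaxPool2d(2, 2), cbr(c_in, c_out, 1), ELAN(c_out))
    def forward(self, x):
        return self.body(x)

class ImageFeatureExtractor(nn.Module):
    def __init__(self):
        super().__init__()
        # Convolution module 21: four CBR layers (two strided) plus an ELAN layer -> 4x down-sampling
        self.conv_module = nn.Sequential(cbr(3, 32), cbr(32, 64, s=2), cbr(64, 64),
                                         cbr(64, 128, s=2), ELAN(128))
        self.pool1 = PoolModule(128, 256)   # -> feature map C, 8x down-sampled
        self.pool2 = PoolModule(256, 512)   # -> feature map D, 16x down-sampled
        self.pool3 = PoolModule(512, 1024)  # -> feature map E, 32x down-sampled
    def forward(self, x):
        b = self.conv_module(x)             # feature map B, 4x down-sampled
        c = self.pool1(b)
        d = self.pool2(c)
        e = self.pool3(d)
        return b, c, d, e

# Example: a 640 x 640 RGB image yields 160/80/40/20-pixel feature maps B/C/D/E.
feats = ImageFeatureExtractor()(torch.zeros(1, 3, 640, 640))
```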
The image target detection layer 3 performs pyramid pooling on the feature map E and outputs target detection results at three different sizes through three branches, each consisting of a RepVGG block layer (REP) and a convolution layer (CONV).
The PointPillar network model comprises a point cloud feature conversion layer 4, a point cloud feature extraction layer 5 and a point cloud target detection layer 6. The point cloud feature conversion layer 4 converts the input point cloud into a sparse pseudo image; the point cloud feature extraction layer 5 processes the pseudo image to obtain higher-level features; and the point cloud target detection layer 6 performs bounding-box regression with an SSD detection head to detect the position and distance of targets as 3D boxes.
The point cloud feature conversion layer 4 gathers the input P × N × D (30000 × 20 × 9) point cloud data 41 into stacked pillar data 42, then applies a simplified PointNet and a 1 × 1 convolution to each pillar to obtain learned feature data 43, and finally scatters the features back to their original positions according to the pillar indices to obtain pseudo image data 44. The size of the pseudo image data 44 is H × W × C (512 × 512 × 64), where H is the pixel height of the pseudo image, W is its pixel width and C is its number of channels.
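A minimal PyTorch sketch of the point cloud feature conversion layer 4: a simplified PointNet (per-point linear layer, batch normalization, ReLU and a maximum over the points of each pillar) followed by scattering the pillar features back onto the 512 × 512 × 64 pseudo image; the layer widths are assumptions for illustration.

```python
import torch
import torch.nn as nn

class PillarFeatureNet(nn.Module):
    def __init__(self, d_in=9, c_out=64):
        super().__init__()
        self.linear = nn.Linear(d_in, c_out, bias=False)   # equivalent to a 1 x 1 convolution per point
        self.bn = nn.BatchNorm1d(c_out)
    def forward(self, pillars):                            # pillars: (B, P, N, D)
        b, p, n, _ = pillars.shape
        x = self.linear(pillars)                           # (B, P, N, C)
        x = self.bn(x.view(-1, x.shape[-1])).view(b, p, n, -1)
        x = torch.relu(x)
        return x.max(dim=2).values                         # (B, P, C): one learned feature per pillar

def scatter_to_pseudo_image(pillar_feats, coords, h=512, w=512):
    """Scatter per-pillar features back to their grid cells, giving a (B, C, H, W) pseudo image.
    coords: (B, P, 2) integer (row, col) indices of each pillar on the BEV grid."""
    b, p, c = pillar_feats.shape
    canvas = pillar_feats.new_zeros(b, c, h * w)
    flat = coords[..., 0] * w + coords[..., 1]                                    # (B, P)
    canvas.scatter_(2, flat.unsqueeze(1).expand(-1, c, -1), pillar_feats.transpose(1, 2))
    return canvas.view(b, c, h, w)

# Example with random data shaped like the embodiment (P=30000, N=20, D=9, C=64, H=W=512).
pillars = torch.rand(1, 30000, 20, 9)
coords = torch.randint(0, 512, (1, 30000, 2))
pseudo_image = scatter_to_pseudo_image(PillarFeatureNet()(pillars), coords)       # (1, 64, 512, 512)
```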
The processing flow of the point cloud feature extraction layer 5 comprises three steps: progressively down-sampling the input pseudo image to form pyramid features; up-sampling the corresponding features to a uniform size; and concatenating the uniformly sized features. The down-sampling is performed by a sequence of blocks Block(S, L, F), where S is the stride relative to the pseudo image, L is the number of 3 × 3 2D convolution layers and F is the number of output channels. The up-sampling operation is denoted Up(S_in, S_out, F), where S_in and S_out are the input and output strides; each up-sampling step yields F output features, and the features of the branches are finally concatenated together.
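A minimal PyTorch sketch of this backbone, with Block(S, L, F) implemented as a strided stack of 3 × 3 convolutions and Up(S_in, S_out, F) as a transposed convolution; the concrete strides, layer counts and channel widths are illustrative assumptions rather than values from the disclosure.

```python
import torch
import torch.nn as nn

def block(c_in, f, l, s):
    # Block(S, L, F): first 3x3 conv applies stride S, followed by L-1 further 3x3 convs with F channels.
    layers = [nn.Conv2d(c_in, f, 3, stride=s, padding=1, bias=False),
              nn.BatchNorm2d(f), nn.ReLU(inplace=True)]
    for _ in range(l - 1):
        layers += [nn.Conv2d(f, f, 3, padding=1, bias=False),
                   nn.BatchNorm2d(f), nn.ReLU(inplace=True)]
    return nn.Sequential(*layers)

def up(c_in, f, s_in, s_out):
    # Up(S_in, S_out, F): transposed convolution bringing a stride-S_in map back to stride S_out.
    k = s_in // s_out
    return nn.Sequential(nn.ConvTranspose2d(c_in, f, k, stride=k, bias=False),
                         nn.BatchNorm2d(f), nn.ReLU(inplace=True))

class PointPillarBackbone(nn.Module):
    def __init__(self, c_in=64):
        super().__init__()
        self.b1 = block(c_in, 64, l=4, s=2)    # stride 2 relative to the pseudo image
        self.b2 = block(64, 128, l=6, s=2)     # stride 4
        self.b3 = block(128, 256, l=6, s=2)    # stride 8
        self.u1 = up(64, 128, s_in=2, s_out=2)
        self.u2 = up(128, 128, s_in=4, s_out=2)
        self.u3 = up(256, 128, s_in=8, s_out=2)
    def forward(self, x):
        x1 = self.b1(x)
        x2 = self.b2(x1)
        x3 = self.b3(x2)
        # The three up-sampled feature maps are concatenated at a common stride.
        return torch.cat([self.u1(x1), self.u2(x2), self.u3(x3)], dim=1)

features = PointPillarBackbone()(torch.zeros(1, 64, 512, 512))   # -> (1, 384, 256, 256)
```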
The traffic sign data are detected with an improved lightweight convolutional neural network, which uses dilated convolution to implement a sliding-window method and uses statistical information from the data set to accelerate the forward propagation of the network, thereby improving the efficiency of traffic sign detection.
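A minimal PyTorch sketch of the underlying idea: a lightweight convolutional network applied fully convolutionally, where a dilated 3 × 3 convolution enlarges the receptive field so that a single forward pass scores every window position instead of cropping and classifying windows one by one. The channel counts, dilation rate and number of sign classes are illustrative assumptions; the disclosure's specific "improved" architecture is not reproduced here.

```python
import torch
import torch.nn as nn

class DilatedSlidingWindowDetector(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(                        # lightweight backbone
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        # The dilated 3x3 convolution enlarges the receptive field without further
        # down-sampling, acting as a dense sliding window over the feature map.
        self.window = nn.Conv2d(32, 64, 3, padding=2, dilation=2)
        self.classifier = nn.Conv2d(64, num_classes + 1, 1)   # per-position sign scores (+ background)
    def forward(self, x):
        return self.classifier(torch.relu(self.window(self.features(x))))

# One pass over a 640 x 640 image yields a 160 x 160 grid of window scores.
scores = DilatedSlidingWindowDetector()(torch.zeros(1, 3, 640, 640))
```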
In the data fusion of step S4, the processed image data and the radar data are matched in a decision-layer fusion manner, and the obstacle position and distance detection results generated from the radar data are mapped onto the coordinates of the image data to form a comprehensive feature map. Both the obstacle information data and the image data are converted into BEV (bird's eye view) coordinates. The obstacle information data can be regarded as a multi-channel image in polar coordinates whose channels carry Doppler features, and after coordinate conversion it can be regarded as a multi-channel image in the BEV; the image data can likewise be regarded as a multi-channel image in the BEV after coordinate conversion. With the two kinds of data in the same coordinate system, they are fused at multiple scales in a Concat-based manner. Judging the road surface on which the vehicle is driving with the fused data makes the determination of drivable areas and obstacles in the automatic driving scenario more accurate, provides good target recognition and improves the accuracy of the vehicle's perception of its surroundings.
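A minimal PyTorch sketch of this Concat-based multi-scale fusion, assuming both inputs have already been converted to BEV maps in the same coordinate frame; the grid size, channel counts and the use of three scales are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BEVConcatFusion(nn.Module):
    def __init__(self, img_channels=64, obs_channels=8, out_channels=64, scales=(1, 2, 4)):
        super().__init__()
        self.scales = scales
        self.fuse = nn.ModuleList(
            nn.Conv2d(img_channels + obs_channels, out_channels, 3, padding=1) for _ in scales
        )
    def forward(self, img_bev, obs_bev):
        """img_bev, obs_bev: (B, C, H, W) BEV maps already in the same coordinate system."""
        fused = []
        for conv, s in zip(self.fuse, self.scales):
            i = nn.functional.avg_pool2d(img_bev, s) if s > 1 else img_bev
            o = nn.functional.avg_pool2d(obs_bev, s) if s > 1 else obs_bev
            fused.append(conv(torch.cat([i, o], dim=1)))   # Concat-based fusion at each scale
        return fused                                       # multi-scale comprehensive feature maps

maps = BEVConcatFusion()(torch.zeros(1, 64, 256, 256), torch.zeros(1, 8, 256, 256))
```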
An automatic driving decision SoC chip based on multi-sensor data fusion comprises a general-purpose processor and a neural network processor; the general-purpose processor controls the operation of the neural network processor through custom instructions, and the neural network processor is used to execute the above method.
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, it is possible to make various changes and modifications without departing from the inventive concept, and these changes and modifications are all within the scope of the present invention.

Claims (10)

1. The automatic driving decision method based on multi-sensor data fusion is characterized by comprising the following steps of:
S1: an RGB image sensor collects image data of the road on which the vehicle is driving, wherein the image data comprises lane line data, vehicle data, pedestrian data and traffic sign data;
s2: inputting the lane line data, the vehicle data, the pedestrian data and the traffic sign data into a trained image target detection neural network model, performing lane image feature extraction and feature fusion, and outputting target detection data of a lane image, wherein the image target detection neural network model adopts a YOLOv7 target detection algorithm;
s3: collecting 3D point cloud data by a laser radar, inputting the point cloud data into a trained point cloud target detection neural network model, performing distance feature extraction and feature fusion, outputting target position and distance data, fusing the target position and distance data with target position and distance information output by a binocular camera, and generating final obstacle position and distance data, wherein the neural network model adopts a PointPillar target detection algorithm;
s4: carrying out data fusion on the lane image data generated in the step S2 and the obstacle position distance data generated in the step S3, analyzing whether errors exist in all sensors or not, and correcting the road condition information of vehicle driving;
s5: and making corresponding decisions according to the road condition information corrected in the step S4, and applying the decisions to automatic driving.
2. The multi-sensor data fusion-based automatic driving decision method according to claim 1, wherein the training process of the image target detection neural model in the step S2 specifically comprises the following steps:
s2-1: establishing a data set of lanes, pedestrians and traffic signs, wherein the data set is used for training a neural network model;
s2-2: preprocessing the lane, pedestrian and traffic sign data sets to generate RGB format images with set resolution;
s2-3: sequentially enabling the format image to pass through an image feature extraction layer, an image feature fusion layer and an image target detection layer of a YOLOv7 network to obtain a neural network model;
s2-4: checking whether the training times reach a set target, if not, repeating the step S2-3 until the set training times are reached, and storing the neural network model as an image target detection neural network model.
3. The multi-sensor data fusion-based automatic driving decision method according to claim 1, wherein the training process of the point cloud target detection neural model in the step S3 specifically comprises the following steps:
s3-1: establishing a laser radar data set, wherein the data set is used for training a point cloud target detection neural model;
s3-2: preprocessing the laser radar data set to generate format point cloud data;
s3-3: sequentially passing the format point cloud data through a feature conversion layer, a feature extraction layer and a target detection layer of a PointPillar network to obtain a neural network model;
s3-4: checking whether the training times reach a set target or not, if not, repeating the step S3-3 until the set training times are reached, and storing the neural network model as a point cloud target detection neural network model.
4. The multi-sensor data fusion-based automatic driving decision method according to claim 1, wherein the data fusion of step S4 is performed by matching the processed image data with the radar data in a decision layer fusion manner, and mapping the obstacle position and distance detection result generated by the radar data to the coordinates of the image data to form a comprehensive feature map.
5. The multi-sensor data fusion-based automatic driving decision method according to claim 1, wherein the YOLOv7 network model comprises an image input layer, an image feature extraction layer, an image feature fusion layer and an image target detection layer; the image input layer aligns input images; the image feature extraction layer further comprises a plurality of convolution layers, a batch normalization layer and a maximum pooling layer and is used for enriching the features of the aligned images and extracting the features of lanes, vehicles and pedestrians; the image feature fusion layer is used for fusing features extracted at different stages, so that the accuracy of the features is improved; and the image target detection layer detects the road condition information characteristics of the fused characteristic graph and outputs an image detection result.
6. The multi-sensor data fusion-based automatic driving decision method according to claim 1, wherein the PointPillar network model comprises a point cloud feature conversion layer, a point cloud feature extraction layer and a point cloud target detection layer; the point cloud feature conversion layer converts the input point cloud into a sparse pseudo image; the point cloud feature extraction layer processes the pseudo image to obtain the features of a high layer; the point cloud target detection layer detects the position and the distance of a target through a regression 3D frame.
7. The multi-sensor data fusion-based automatic driving decision method according to claim 1, characterized in that the traffic sign data are detected with an improved lightweight convolutional neural network, the lightweight convolutional neural network uses dilated convolution to implement a sliding-window method, and statistical information in the data set is used to accelerate the forward propagation of the network, so as to improve the efficiency of traffic sign detection.
8. The multi-sensor data fusion-based automatic driving decision method of claim 6, wherein the point cloud data format of the input Pillar feature layer is P x N x D, where P is the selected Pillar number, N is the maximum point cloud number stored by each Pillar, and D is the dimensional attribute of the point cloud.
9. The multi-sensor data fusion-based automatic driving decision method of claim 8, wherein the dimensional attribute of each point is 9-dimensional data, characterized as:

D = (x, y, z, r, x_c, y_c, z_c, x_p, y_p)

wherein (x, y, z, r) is the original laser radar point cloud data, (x, y, z) are the three-dimensional coordinates, r is the reflected laser intensity, (x_c, y_c, z_c) is the offset of the point from the center of the N points in its pillar, and (x_p, y_p) is the offset of the point from the pillar's center coordinates.
10. An automatic driving decision SoC chip based on multi-sensor data fusion is characterized in that the SoC chip comprises a general processor and a neural network processor; the general purpose processor controls the operation of a neural network processor through custom instructions, the neural network processor being configured to perform the method of any one of claims 1-9.
CN202211082826.8A 2022-09-06 2022-09-06 Automatic driving decision-making method based on multi-sensor data fusion and SoC chip Pending CN115187964A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211082826.8A CN115187964A (en) 2022-09-06 2022-09-06 Automatic driving decision-making method based on multi-sensor data fusion and SoC chip

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211082826.8A CN115187964A (en) 2022-09-06 2022-09-06 Automatic driving decision-making method based on multi-sensor data fusion and SoC chip

Publications (1)

Publication Number Publication Date
CN115187964A true CN115187964A (en) 2022-10-14

Family

ID=83523212

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211082826.8A Pending CN115187964A (en) 2022-09-06 2022-09-06 Automatic driving decision-making method based on multi-sensor data fusion and SoC chip

Country Status (1)

Country Link
CN (1) CN115187964A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107886477A (en) * 2017-09-20 2018-04-06 武汉环宇智行科技有限公司 Unmanned neutral body vision merges antidote with low line beam laser radar
US20210094580A1 (en) * 2019-09-30 2021-04-01 Toyota Jidosha Kabushiki Kaisha Driving control apparatus for automated driving vehicle, stop target, and driving control system
CN113420637A (en) * 2021-06-18 2021-09-21 北京轻舟智航科技有限公司 Laser radar detection method under multi-scale aerial view angle in automatic driving
CN114397877A (en) * 2021-06-25 2022-04-26 南京交通职业技术学院 Intelligent automobile automatic driving system
CN114120115A (en) * 2021-11-19 2022-03-01 东南大学 Point cloud target detection method for fusing point features and grid features
CN114359181A (en) * 2021-12-17 2022-04-15 上海应用技术大学 Intelligent traffic target fusion detection method and system based on image and point cloud

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
伍晓晖 et al.: "交通标志识别方法综述" [A survey of traffic sign recognition methods], 《计算机工程与应用》 [Computer Engineering and Applications] *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115984802A (en) * 2023-03-08 2023-04-18 安徽蔚来智驾科技有限公司 Target detection method, computer-readable storage medium and driving equipment
CN116229452A (en) * 2023-03-13 2023-06-06 无锡物联网创新中心有限公司 Point cloud three-dimensional target detection method based on improved multi-scale feature fusion
CN116229452B (en) * 2023-03-13 2023-11-17 无锡物联网创新中心有限公司 Point cloud three-dimensional target detection method based on improved multi-scale feature fusion
CN116453087A (en) * 2023-03-30 2023-07-18 无锡物联网创新中心有限公司 Automatic driving obstacle detection method of data closed loop
CN116453087B (en) * 2023-03-30 2023-10-20 无锡物联网创新中心有限公司 Automatic driving obstacle detection method of data closed loop
CN117111055A (en) * 2023-06-19 2023-11-24 山东高速集团有限公司 Vehicle state sensing method based on thunder fusion
CN117197019A (en) * 2023-11-07 2023-12-08 山东商业职业技术学院 Vehicle three-dimensional point cloud image fusion method and system
CN117944059A (en) * 2024-03-27 2024-04-30 南京师范大学 Track planning method based on vision and radar feature fusion
CN117944059B (en) * 2024-03-27 2024-05-31 南京师范大学 Track planning method based on vision and radar feature fusion

Similar Documents

Publication Publication Date Title
CN111583337B (en) Omnibearing obstacle detection method based on multi-sensor fusion
CN109948661B (en) 3D vehicle detection method based on multi-sensor fusion
CN115187964A (en) Automatic driving decision-making method based on multi-sensor data fusion and SoC chip
CN112149550B (en) Automatic driving vehicle 3D target detection method based on multi-sensor fusion
CN111832655B (en) Multi-scale three-dimensional target detection method based on characteristic pyramid network
CN110738121A (en) front vehicle detection method and detection system
CN111563415A (en) Binocular vision-based three-dimensional target detection system and method
CN113936139A (en) Scene aerial view reconstruction method and system combining visual depth information and semantic segmentation
CN115049700A (en) Target detection method and device
CN113095152B (en) Regression-based lane line detection method and system
CN114821507A (en) Multi-sensor fusion vehicle-road cooperative sensing method for automatic driving
CN117274749B (en) Fused 3D target detection method based on 4D millimeter wave radar and image
CN113378647B (en) Real-time track obstacle detection method based on three-dimensional point cloud
Kanchana et al. Computer vision for autonomous driving
Song et al. Automatic detection and classification of road, car, and pedestrian using binocular cameras in traffic scenes with a common framework
CN114155414A (en) Novel unmanned-driving-oriented feature layer data fusion method and system and target detection method
CN112950786A (en) Vehicle three-dimensional reconstruction method based on neural network
CN114820931B (en) Virtual reality-based CIM (common information model) visual real-time imaging method for smart city
CN116403186A (en) Automatic driving three-dimensional target detection method based on FPN Swin Transformer and Pointernet++
US20220371606A1 (en) Streaming object detection and segmentation with polar pillars
CN113611008B (en) Vehicle driving scene acquisition method, device, equipment and medium
CN113762195A (en) Point cloud semantic segmentation and understanding method based on road side RSU
CN112766100A (en) 3D target detection method based on key points
Yuan et al. Real-time long-range road estimation in unknown environments
Zhang et al. End-to-end BEV perception via homography matrix

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20221014)