CN114140672A - Target detection network system and method applied to multi-sensor data fusion in rainy and snowy weather scene - Google Patents

Target detection network system and method applied to multi-sensor data fusion in rainy and snowy weather scene Download PDF

Info

Publication number
CN114140672A
CN114140672A
Authority
CN
China
Prior art keywords
point cloud
image
feature
module
fusion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111401618.5A
Other languages
Chinese (zh)
Inventor
王海
张�成
蔡英凤
陈龙
李祎承
刘擎超
孙晓强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu University
Original Assignee
Jiangsu University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu University filed Critical Jiangsu University
Priority to CN202111401618.5A priority Critical patent/CN114140672A/en
Publication of CN114140672A publication Critical patent/CN114140672A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a target detection network system and method applied to multi-sensor data fusion in rainy and snowy weather scenes. The method comprises: selecting the nuScenes road target detection dataset as the training and validation sets for training the target detection method; constructing a one-stage network for point cloud and image data feature extraction, feature fusion, foreground point segmentation and candidate box generation; constructing a two-stage network for point cloud and image data feature extraction, feature fusion, confidence prediction and bounding box regression; setting a multi-objective loss function and performing end-to-end training of the target detection network on the training set; and deploying the trained optimal target detection network on a real vehicle to realize highly robust road target detection. The invention performs bidirectional multi-scale fusion of point cloud and image information, effectively improves the robustness of the target detection algorithm under severe conditions such as rain and snow, and simultaneously meets the detection accuracy and detection speed requirements of road target detection tasks.

Description

Target detection network system and method applied to multi-sensor data fusion in rainy and snowy weather scene
Technical Field
The invention belongs to the field of environment perception for unmanned vehicles, and particularly relates to a highly robust multi-sensor data fusion target detection network system and method for unmanned driving in rainy and snowy weather scenes.
Background
An unmanned driving system mainly comprises three parts: environment perception, path planning, and decision and control. Three-dimensional target detection is an important component of the environment perception task of an unmanned vehicle.
Conventional target detection methods are mainly classified into single-sensor and multi-sensor methods. Because the detection results of a single sensor carry little redundancy, multi-sensor data fusion is better suited to complex tasks such as road target detection. Common multi-sensor fusion schemes are data-level fusion, decision-level (target-level) fusion and feature-level fusion. Data-level fusion obtains the richest cross-sensor information, but data matching is difficult, the structure is complex, and real-time detection is hard to achieve. Decision-level fusion is structurally simple and easy to implement, but data are only associated at the detection stage, so the advantages of fusion are partially lost. Feature-level fusion, lying between the two, can effectively exploit the rich information provided by multiple sensors while keeping data processing relatively simple, making real-time detection easier to achieve. Therefore, feature-level data fusion is currently the most promising approach and underlies the most widely used sensor-fusion target detection algorithms.
Although various feature-level fusion methods have been proposed, their target detection performance remains unsatisfactory. Extensive research and analysis shows that current multi-sensor data fusion target detection methods have the following shortcomings:
(1) they are only suitable for clear weather with little interference to the sensors; their robustness is poor and they struggle with the complex conditions of rainy and snowy weather scenes;
(2) one type of sensor data is treated as primary and the other merely as supplementary information, making it difficult to fully exploit the advantages of each sensor;
(3) the sensor feature extraction method is single, making it difficult to satisfy the target detection algorithm's requirements on feature information loss and feature extraction speed at the same time;
(4) the multi-sensor data fusion method is single, making it difficult to satisfy the road target detection task's requirements on detection accuracy and detection efficiency at the same time.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing a highly robust multi-sensor data fusion target detection method for unmanned driving in rainy and snowy weather scenes. It can cope with complex conditions such as rain and snow, fully utilize the feature information of each sensor, and accomplish the three-dimensional road target detection task more efficiently and more accurately.
To achieve this purpose, the invention adopts the following technical scheme:
a target detection network system applied to multi-sensor data fusion in a rain and snow weather scene comprises: the first network system: namely, a network is generated by extracting the characteristics of point cloud and image data, fusing the characteristics, segmenting foreground points and generating candidate frames at a first stage; the second network system: the two-stage point cloud and image data feature extraction, feature fusion, confidence prediction and bounding box regression network are obtained;
the first network system includes: the system comprises a two-dimensional convolution module, an SA point cloud feature extraction module, a one-stage feature layer data fusion module based on point cloud and a multi-scale multi-resolution point cloud feature fusion module; the method is used for realizing feature fusion mainly based on point cloud;
the second network system includes: the system comprises a voxelization-based SA point cloud feature extraction module, a three-dimensional sparse convolution module, a two-dimensional convolution module and an image-based two-stage feature layer data fusion module; for implementing image-dominated feature fusion.
Further, the loss function of the network system is the sum of the loss functions of the first and second network systems; the loss of the second network system is set as the sum of the confidence prediction and bounding box regression losses, with the focal loss chosen as the confidence prediction loss to balance the imbalance between positive and negative samples in the training data. The specific setting is as follows:
L = L_rpn + L_rcnn
L_rcnn = L_cls + L_reg
L_cls = -α(1 - c_t)^γ · log(c_t)
where L represents the final multi-objective loss function; L_rpn and L_rcnn represent the loss functions of the one-stage and two-stage networks; L_cls and L_reg represent the confidence prediction and bounding box regression losses in the two-stage network; α is the weight assigned to positive and negative samples; γ is a suppression factor used to reduce the influence of the large number of easy samples on the training result; and c_t denotes the predicted confidence.
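For illustration only, the sketch below shows how such a multi-objective loss could be assembled in PyTorch. The patent does not publish an implementation; the function names, the uniform α weighting, and the use of a smooth-L1 placeholder for the regression term are assumptions.

```python
import torch
import torch.nn.functional as F

def focal_confidence_loss(pred_conf, target, alpha=0.25, gamma=2.0):
    """Focal loss L_cls = -alpha * (1 - c_t)^gamma * log(c_t).
    pred_conf: predicted confidences in (0, 1); target: 1 for positives, 0 for negatives."""
    c_t = torch.where(target == 1, pred_conf, 1.0 - pred_conf)  # confidence of the true class
    c_t = c_t.clamp(min=1e-6)                                   # numerical safety before log
    return (-alpha * (1.0 - c_t).pow(gamma) * c_t.log()).mean()

def total_loss(l_rpn, pred_conf, conf_target, box_pred, box_target):
    """L = L_rpn + L_rcnn with L_rcnn = L_cls + L_reg (smooth-L1 assumed for L_reg)."""
    l_cls = focal_confidence_loss(pred_conf, conf_target)
    l_reg = F.smooth_l1_loss(box_pred, box_target)
    return l_rpn + l_cls + l_reg
```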
Further, the two-dimensional convolution module of the first network system is configured to stitch the images collected by 6 cameras evenly distributed on the roof of the unmanned vehicle into a four-dimensional tensor and feed it into the two-dimensional convolution module for feature extraction; three two-dimensional convolution modules of different sizes sequentially generate three image feature vectors of gradually increasing scale; after each feature extraction, the generated image feature vector is input into the point-cloud-based one-stage feature-layer data fusion module, where it is superimposed and fused with the point cloud feature vector.
Further, the SA point cloud feature extraction module of the first network system is used for point cloud processing: the SA module extracts features from the point cloud acquired by the solid-state lidar mounted on the roof of the unmanned vehicle; after each point cloud feature extraction, the extracted point cloud features are superimposed and fused with the generated image feature vector before the next point cloud feature extraction; this operation is repeated twice to obtain three fused features of different scales, which are then processed by the multi-scale multi-resolution point cloud feature fusion module to obtain the final multi-scale fused feature.
Further, the multi-scale multi-resolution point cloud feature fusion module obtains the final multi-scale fused feature as follows: down-sampling radii of different sizes are selected in turn, the point cloud features are extracted at each radius, and the features are concatenated and fused to obtain multi-resolution features; then a corresponding number of points is sampled at each scale and concatenated and fused with the previously obtained multi-resolution features to obtain the final multi-scale features.
Further, the point-cloud-based one-stage feature-layer data fusion module of the first network system comprises a semantic segmentation module, an attention mechanism module, a feature matrix generation module and a feature mapping module; because the semantic and texture information of the image is rich, semantic segmentation is performed on the acquired image features so that pixels can be segmented one by one into foreground and background points;
because exposure distorts the semantic information of the image to some extent, the attention mechanism module is applied to adaptively weight the acquired image information, i.e. the semantic information and the image information are superimposed and the attention mechanism module then outputs the weight of the semantic information; a feature mapping matrix from the image to the point cloud is then obtained through the extrinsic and intrinsic calibration of the camera, and finally the semantic information is mapped point by point onto the point cloud information to obtain the fused features.
Further, the second network system comprises a point cloud processing channel and an image processing channel;
in the point cloud processing channel, the voxelization-based SA point cloud feature extraction module first performs preliminary processing of the three-dimensional point cloud, and voxelized point cloud features are then obtained through the three-dimensional sparse convolution module: a center point is selected in each voxel grid cell to represent its spatial position; since the point cloud within the voxel space is sparse, a suitable radius is chosen and sparse convolution is applied only to the relatively dense parts of the voxel space to obtain the point cloud features; this operation is repeated twice to obtain voxel point cloud features of different scales;
in the image processing channel, image features of different scales are obtained sequentially by three two-dimensional convolution modules;
after each feature extraction in the point cloud and image channels, the generated voxel point cloud feature vectors and the image features of the corresponding scale are input into the image-based two-stage feature-layer data fusion module, where the voxel point cloud features are superimposed and fused with the image features obtained by the two-dimensional convolution module before the next feature extraction; this operation is repeated twice to obtain fused features of different scales, and the final multi-scale fused feature is obtained after resizing the three fused features obtained in sequence.
Further, the image-based two-stage feature-layer data fusion module comprises a K-nearest-neighbor interpolation module, a foreground point segmentation module, an attention mechanism module and a feature mapping module;
to address the sparsity of the point cloud, a dense point cloud is first generated by the K-nearest-neighbor interpolation module; the dense point cloud information is then superimposed on the image information, and the attention mechanism module outputs adaptive weights for the point cloud features;
the point cloud features are then mapped point by point onto the image features through the point-cloud-to-image feature mapping matrix to obtain the fused features.
Further, the dataset used by the network system is the nuScenes road target detection dataset, which serves as the training and validation sets for target detection training; the image data in the nuScenes dataset require exposure adjustment, and the point cloud data require thinning;
the exposure adjustment uses the following correction formula:
V_RGB^[k'] = α^[k] · V_RGB^[k] + β^[k]
where V_RGB^[k'] represents the RGB values of the image after exposure correction, V_RGB^[k] the RGB values of the original training image, α^[k] the exposure correction factor of the k-th image, and β^[k] the exposure correction offset of the k-th image; the thinning process is as follows: to simulate the sparser point clouds caused by laser beams being blocked by opaque droplets and particles in rain and fog, a normal distribution is used to model the random loss of points, with more points dropped at long range and fewer at short range.
A highly robust multi-sensor data fusion target detection method for unmanned driving in rainy and snowy weather scenes comprises the following steps:
step 1, selecting a nuScenes road target detection data set as a training set and a verification set for training a target detection method;
step 2, constructing the one-stage network for point cloud and image data feature extraction, feature fusion, foreground point segmentation and candidate box generation;
step 3, constructing the two-stage network for point cloud and image data feature extraction, feature fusion, confidence prediction and bounding box regression;
step 4, setting a multi-target loss function, and performing end-to-end target detection network training by using a training set;
and step 5, deploying the trained optimal target detection network on a real vehicle to realize highly robust road target detection.
Specifically, the exposure of the image data in the nuScenes road target detection dataset needs to be adjusted, and the point cloud data need to be sparsified.
The one-stage network for point cloud and image data feature extraction, feature fusion, foreground point segmentation and candidate box generation comprises a two-dimensional convolution module, an SA point cloud feature extraction module, a point-cloud-dominant data fusion module and a multi-scale multi-resolution point cloud feature superposition module.
The two-stage network for point cloud and image data feature extraction, feature fusion, confidence prediction and bounding box regression comprises a three-dimensional sparse convolution module based on the voxelization method, a two-dimensional convolution module, an image-dominant data fusion module and an SA point cloud feature extraction module based on the voxelization method.
The multi-objective loss function is set as the sum of the loss functions of the two stages: the one-stage loss is set as the candidate box generation network loss, and the two-stage loss is set as the confidence prediction and bounding box regression losses, with the focal loss chosen as the confidence prediction loss to balance the imbalance between positive and negative samples in the training data. The specific setting is as follows:
L = L_rpn + L_rcnn
L_rcnn = L_cls + L_reg
L_cls = -α(1 - c_t)^γ · log(c_t)
where L represents the final multi-objective loss function; L_rpn and L_rcnn represent the loss functions of the one-stage and two-stage networks; L_cls and L_reg represent the confidence prediction and bounding box regression losses in the two-stage network; α is the weight assigned to positive and negative samples; γ is a suppression factor used to reduce the influence of the large number of easy samples on the training result; and c_t denotes the predicted confidence.
The real-vehicle deployment of the algorithm is mainly realized by deploying the optimal model through the Robot Operating System (ROS).
Furthermore, the point-cloud-dominant data fusion module mainly comprises a semantic segmentation module, an attention mechanism, a feature matrix generation module and a feature mapping module.
The multi-scale multi-resolution point cloud feature superposition module comprises single-scale multi-radius neighborhood feature extraction and a multi-layer feature extraction and superposition method.
The image-dominant data fusion module mainly comprises a point cloud K-nearest-neighbor interpolation module, a foreground point segmentation module, an attention mechanism, a feature matrix generation module and a feature mapping module.
The SA point cloud feature extraction module based on the voxelization method mainly comprises a point cloud voxelization and key point selection module, a feature projection module and a feature splicing module.
The invention has the beneficial effects that:
1. The invention performs bidirectional multi-scale fusion of point cloud and image information, effectively improves the robustness of the target detection algorithm under severe conditions such as rain and snow, and reduces the influence of environmental noise, including exposure, on the sensor information.
2. The invention applies point-cloud-dominant, image-assisted data fusion in the one-stage algorithm to complete the foreground point segmentation and candidate box generation tasks, and image-dominant, point-cloud-assisted data fusion in the two-stage algorithm to complete the confidence prediction and bounding box regression tasks, effectively exploiting the advantages of different sensor data.
3. In its feature extraction algorithms, the invention applies different methods to extract features from point cloud and image information according to the application scenario and the requirements of the specific task, simultaneously satisfying the target detection algorithm's requirements on feature information loss and feature extraction speed.
4. In the data fusion process, the invention applies two data fusion methods, one image-dominant and one point-cloud-dominant, simultaneously satisfying the road target detection task's requirements on detection accuracy and detection speed.
Drawings
FIG. 1 is a flow chart of a method for detecting an unmanned high-robustness multi-sensor data fusion target in rainy and snowy weather;
FIG. 2 is a flow chart of a one-stage algorithm of a high robustness multi-sensor data fusion target detection method applied to unmanned driving in rainy and snowy weather scenes according to the present invention;
FIG. 3 is a flow chart of a two-stage algorithm of a high robustness multi-sensor data fusion target detection method applied to unmanned driving in rainy and snowy weather scenes in accordance with the present invention;
FIG. 4 is a flow chart of a one-stage feature layer data fusion process based on point cloud according to the present invention;
FIG. 5 is a schematic diagram of multi-scale and multi-resolution feature fusion based on an original sparse point cloud according to the present invention;
FIG. 6 is a flow chart of two-stage feature layer data fusion based on images according to the present invention;
FIG. 7 is a schematic diagram of multi-scale feature fusion based on voxel rasterized point cloud.
Detailed Description
In order to make the purpose and technical solution of the present invention more clearly understood, the following detailed description of the embodiments of the present invention is provided with reference to the accompanying drawings.
The invention provides a highly robust multi-sensor data fusion target detection method for unmanned driving in rainy and snowy weather scenes. Fig. 1 shows the flow chart of the method, which mainly comprises the following steps:
s1, selecting a nuScenes road target detection data set as a training set and a verification set for training a target detection method;
s2, constructing a first-stage point cloud and image data feature extraction, feature fusion, foreground point segmentation and candidate frame generation network;
s3, constructing two-stage point cloud and image data feature extraction, feature fusion, confidence prediction and bounding box regression network;
s4, setting a multi-target loss function, and performing end-to-end target detection network training by using a training set;
and S5, deploying the trained optimal target detection network on a real vehicle to realize highly robust road target detection.
Step 1, selecting a nuScenes road target detection data set as a training set and a verification set for training a target detection method.
To ensure that the trained target detection network is robust in rainy and snowy weather scenes, the images and point clouds in the dataset are corrected to emulate such scenes.
Specifically, exposure correction is performed on the image data: the RGB values of the image pixels are randomly adjusted to increase or decrease the exposure, simulating the images a camera would acquire in a rainy or snowy weather scene. Since, within a single weather scene, the RGB values of the real-time images collected by the camera change as a whole, different scenes can be simulated by applying a linear transformation to all image data. The exposure correction formula is as follows:
V_RGB^[k'] = α^[k] · V_RGB^[k] + β^[k]
where V_RGB^[k'] represents the RGB values of the image after exposure correction, V_RGB^[k] the RGB values of the original training image, α^[k] the exposure correction factor of the k-th image, and β^[k] the exposure correction offset of the k-th image.
The point cloud data are sparsely sampled: the original point cloud is randomly thinned, with the random loss of points modeled by a normal distribution. Going from near to far, more points are dropped at long range and fewer at short range, simulating the invalid returns (e.g., from standing water and accumulated snow) obtained in rainy and snowy weather scenes. Since the point cloud distribution is complex across different scenes, randomly sampling the dropped points can, given sufficiently large training data, simulate the point cloud variations of a wide range of complex conditions.
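A minimal sketch of this data preparation is shown below (NumPy; the correction-factor ranges, maximum range, keep probabilities and noise level are assumed values, not ones specified by the patent):

```python
import numpy as np

def exposure_correct(image, alpha, beta):
    """Linear exposure correction V' = alpha * V + beta, clipped to the valid RGB range."""
    return np.clip(alpha * image.astype(np.float32) + beta, 0, 255).astype(np.uint8)

def thin_point_cloud(points, max_range=70.0, near_keep=0.95, far_keep=0.55, sigma=0.05):
    """Randomly drop points, dropping more at long range, to mimic rain/fog attenuation.
    points: (N, 4) array of x, y, z, intensity."""
    dist = np.linalg.norm(points[:, :3], axis=1)
    # keep probability falls linearly with range and is perturbed by Gaussian noise
    keep_prob = near_keep - (near_keep - far_keep) * np.clip(dist / max_range, 0.0, 1.0)
    keep_prob = np.clip(keep_prob + np.random.normal(0.0, sigma, size=len(points)), 0.0, 1.0)
    return points[np.random.rand(len(points)) < keep_prob]

# example usage with randomly drawn per-image correction parameters (assumed ranges)
# aug = exposure_correct(img, alpha=np.random.uniform(0.6, 1.4), beta=np.random.uniform(-20, 20))
```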
In step 2, the one-stage network for point cloud and image data feature extraction, feature fusion, foreground point segmentation and candidate box generation is constructed. Specifically, as shown in fig. 2, in the image processing channel the images collected by 6 cameras evenly distributed on the roof of the unmanned vehicle are stitched into a four-dimensional tensor and fed into the two-dimensional convolution module for feature extraction. The image information passes through three convolution modules with 3×3 two-dimensional convolution kernels, sequentially generating three image feature vectors of gradually increasing scale. Meanwhile, after each feature extraction, the generated image feature vector is input into the point-cloud-based one-stage feature-layer data fusion module, where it is superimposed and fused with the point cloud feature vector.
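A rough sketch of this image branch is given below (PyTorch; the channel counts, strides, and the choice to stack the six camera views along the batch dimension are assumptions for illustration):

```python
import torch.nn as nn

class ImageBranch(nn.Module):
    """Three 3x3 convolution stages, each producing one image feature map for fusion."""
    def __init__(self, in_ch=3, chs=(32, 64, 128)):
        super().__init__()
        stages, prev = [], in_ch
        for ch in chs:
            stages.append(nn.Sequential(
                nn.Conv2d(prev, ch, kernel_size=3, stride=2, padding=1),
                nn.BatchNorm2d(ch),
                nn.ReLU(inplace=True)))
            prev = ch
        self.stages = nn.ModuleList(stages)

    def forward(self, images):
        # images: (6, 3, H, W) -- six surround-view cameras stacked into one 4D tensor
        feats, x = [], images
        for stage in self.stages:
            x = stage(x)
            feats.append(x)   # one feature map per scale, handed to the fusion module
        return feats
```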
Similarly, in the point cloud processing channel, an SA point cloud module extracts features from the point cloud acquired by the solid-state lidar mounted on the roof of the unmanned vehicle. After each feature extraction, the image feature vector and the extracted point cloud features are superimposed and fused before the next feature extraction. This operation is repeated twice to obtain fused features of different scales, and the fused features obtained in sequence are processed by the multi-scale multi-resolution feature fusion module to obtain the final multi-scale fused feature.
Further, as shown in fig. 4, the point-cloud-based one-stage feature-layer data fusion module includes a semantic segmentation module, an attention mechanism, a feature matrix generation module and a feature mapping module. Because the semantic and texture information of the image is rich, a simple semantic segmentation of the obtained image features suffices to segment the pixels one by one into foreground and background points.
Because exposure distorts the semantic information of the image to some extent, an attention mechanism is needed to adaptively weight the acquired image information and thereby reduce the interference of exposure. Therefore, the semantic information and the image information are superimposed, and the attention mechanism module then outputs the weight of the semantic information. Finally, the semantic information is mapped point by point onto the point cloud information to obtain the fused features.
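A highly simplified sketch of this point-cloud-dominant fusion step follows (the projection matrix handling, the attention gate, and all tensor shapes are illustrative assumptions; points falling outside or behind the image are not handled here):

```python
import torch
import torch.nn as nn

class PointDominantFusion(nn.Module):
    """Sample per-pixel semantic features at each point's projected pixel, gate them
    with a learned attention weight, and concatenate them onto the point features."""
    def __init__(self, img_ch, pt_ch):
        super().__init__()
        self.attn = nn.Sequential(nn.Linear(img_ch + pt_ch, img_ch), nn.Sigmoid())

    def forward(self, pt_feats, img_feats, proj_mat, points):
        # points: (N, 3); proj_mat: (3, 4) camera projection (intrinsics x extrinsics)
        hom = torch.cat([points, torch.ones_like(points[:, :1])], dim=1)   # (N, 4)
        uvw = hom @ proj_mat.T                                             # (N, 3)
        uv = (uvw[:, :2] / uvw[:, 2:3].clamp(min=1e-6)).round().long()     # pixel coordinates
        uv[:, 0].clamp_(0, img_feats.shape[-1] - 1)
        uv[:, 1].clamp_(0, img_feats.shape[-2] - 1)
        sem = img_feats[:, uv[:, 1], uv[:, 0]].T            # (N, img_ch) sampled semantics
        w = self.attn(torch.cat([sem, pt_feats], dim=1))    # adaptive per-point weight
        return torch.cat([pt_feats, w * sem], dim=1)        # point-dominant fused feature
```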
Further, the multi-scale multi-resolution feature fusion module is shown in fig. 5. To obtain the final multi-scale features, three down-sampling radii of different sizes are selected in turn, the point cloud features are extracted at each radius, and the features are concatenated and fused into multi-resolution features. Then, points are down-sampled at each scale in proportion to the sampling radius, and the results are concatenated and fused with the previously obtained multi-resolution features to obtain the final multi-scale features.
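The grouping step of this module can be sketched as a multi-radius ball query followed by feature concatenation (the radii, pooling choice and shapes are assumptions; the looped query is written for clarity, not speed):

```python
import torch

def ball_query_pool(points, feats, centers, radius):
    """Max-pool the features of all points lying within `radius` of each center."""
    d = torch.cdist(centers, points)                           # (M, N) pairwise distances
    pooled = []
    for i in range(centers.shape[0]):
        idx = (d[i] < radius).nonzero(as_tuple=True)[0]
        pooled.append(feats[idx].max(dim=0).values if len(idx) > 0
                      else torch.zeros(feats.shape[1]))
    return torch.stack(pooled)                                 # (M, C)

def multi_radius_features(points, feats, centers, radii=(0.5, 1.0, 2.0)):
    """Concatenate neighbourhood features pooled at several radii (multi-resolution)."""
    return torch.cat([ball_query_pool(points, feats, centers, r) for r in radii], dim=1)
```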
Taking the point cloud as the dominant fusion feature effectively exploits both the accurate spatial position of the three-dimensional point cloud in the world coordinate system and the rich semantic information of the image; connected to a multi-layer perceptron, it completes the foreground point segmentation and candidate box generation tasks well.
In step 3, the two-stage network for point cloud and image data feature extraction, feature fusion, confidence prediction and bounding box regression is constructed. Considering that accurate bounding box regression and confidence prediction rely mainly on the semantic information of image edges, a voxelization-based data processing method allows the point cloud features to meet the requirements of the two-stage tasks. Moreover, since the point cloud also contains some invalid points in rainy and snowy weather, sparse sampling based on the voxelization method is better suited to this scene.
Specifically, as shown in fig. 3, in the point cloud processing channel the three-dimensional point cloud is first preprocessed by voxelization, and voxelized point cloud features are then obtained through the three-dimensional sparse convolution module: a center point is selected in each voxel grid cell to represent its spatial position; since the point cloud within the voxel space is sparse, a suitable radius is chosen and sparse convolution is applied only to the relatively dense parts of the voxel space to obtain the point cloud features. This operation is repeated twice to obtain voxel point cloud features of different scales.
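A naive voxelization sketch in the spirit of this step is shown below (the voxel size, the density threshold used to pick out relatively dense voxels, and the centroid features are assumptions; a real pipeline would feed such voxels into a dedicated sparse-convolution library):

```python
import torch

def voxelize(points, voxel_size=0.2, min_points=3):
    """Assign points to voxels, keep only sufficiently dense voxels, and represent
    each kept voxel by its centroid (a stand-in for richer per-voxel features)."""
    coords = torch.floor(points[:, :3] / voxel_size).long()        # integer voxel indices
    uniq, inverse, counts = torch.unique(coords, dim=0,
                                         return_inverse=True, return_counts=True)
    centroids = torch.zeros(uniq.shape[0], 3)
    centroids.index_add_(0, inverse, points[:, :3])                 # sum points per voxel
    centroids /= counts.unsqueeze(1).float()
    dense = counts >= min_points    # only relatively dense voxels go to sparse convolution
    return uniq[dense], centroids[dense]
```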
Meanwhile, after each feature extraction, the generated voxel point cloud feature vectors of different scales are input into the image-based two-stage feature-layer data fusion module, where they are superimposed and fused with the image features acquired by the two-dimensional convolution module.
The image processing channel obtains image features of different scales through the three two-dimensional convolution modules in sequence. After each feature extraction, the generated voxel point cloud features and the image features of the corresponding scale are superimposed and fused before the next feature extraction. This operation is repeated twice to obtain fused features of different scales, and the final multi-scale fused feature is obtained after resizing the three fused features obtained in sequence.
Further, as shown in fig. 6, the image-based two-stage feature-layer data fusion module includes a point cloud K-nearest-neighbor interpolation module, a foreground point segmentation module, an attention mechanism, a feature matrix generation module and a feature mapping module.
Because the point cloud is affected by the rain and snow environment and contains some invalid points, an attention mechanism is needed to adaptively weight the point cloud information and thereby suppress the interference of the invalid points. Therefore, after the point cloud information and the image information are superimposed, the attention mechanism module outputs the weights of the point cloud features. In addition, given the sparsity of the point cloud, a dense point cloud must first be generated by the K-nearest-neighbor interpolation module.
Finally, the point cloud features are mapped point by point onto the image features through the point-cloud-to-image feature mapping matrix (the inverse of the projection matrix previously obtained by the feature matrix generation module), yielding the fused features.
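An illustrative sketch of the K-nearest-neighbour interpolation used to densify the point features is given below (the value of k and the inverse-distance weighting are assumptions):

```python
import torch

def knn_interpolate(sparse_xyz, sparse_feats, query_xyz, k=3):
    """Interpolate a feature for each query position from its k nearest sparse points,
    weighted by inverse distance (produces dense point features for the image fusion)."""
    d = torch.cdist(query_xyz, sparse_xyz)                    # (Q, N) distances
    dist, idx = d.topk(k, dim=1, largest=False)               # k nearest neighbours
    w = 1.0 / dist.clamp(min=1e-6)
    w = w / w.sum(dim=1, keepdim=True)                        # normalised inverse-distance weights
    neigh = sparse_feats[idx]                                 # (Q, k, C)
    return (w.unsqueeze(-1) * neigh).sum(dim=1)               # (Q, C) interpolated features
```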
Further, the voxel-based point cloud feature extraction module in fig. 3 implements a multi-scale feature fusion method on the voxel-rasterized point cloud, as shown in fig. 7: sampling is completed by selecting a voxel feature center point and taking it as the center of a sphere whose radius is the corresponding length; two-dimensional features are then acquired by projecting onto the bird's-eye view, and the results are finally spliced to obtain the final multi-scale voxel point cloud features.
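As a rough illustration, scattering voxel features onto a bird's-eye-view grid before splicing the scales could look as follows (the grid size, coordinate convention and the per-cell maximum are assumptions; the explicit loop stands in for an optimized scatter):

```python
import torch

def voxels_to_bev(voxel_coords, voxel_feats, grid_size=(200, 200)):
    """Scatter voxel features onto a 2D bird's-eye-view grid, taking the per-cell maximum."""
    H, W = grid_size
    bev = torch.zeros(voxel_feats.shape[1], H, W)
    xs = voxel_coords[:, 0].clamp(0, W - 1)
    ys = voxel_coords[:, 1].clamp(0, H - 1)
    for i in range(voxel_coords.shape[0]):
        bev[:, ys[i], xs[i]] = torch.maximum(bev[:, ys[i], xs[i]], voxel_feats[i])
    return bev   # BEV maps from several scales can then be resized and concatenated
```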
This fusion method effectively utilizes the rich semantic information and accurate edge information of the image, as well as the point cloud features' enhancement of the edge information, and is then connected to the corresponding multi-layer perceptron to complete the final confidence prediction and bounding box regression tasks.
Step 4, setting a multi-target loss function, and performing end-to-end target detection network training by using a training set;
the multi-objective loss function is set as the sum of two-stage loss functions, the two-stage loss function is set as confidence prediction and bounding box regression loss, and focal loss is selected as confidence prediction loss to balance the condition of imbalance of positive and negative samples in training data. The specific setting is as follows:
L=Lrpn+Lrcnn
Lrcnn=Lcls+Lreg
Lcls=-α(1-ct)γlogct
wherein L represents the final multi-objective loss function; lrpn and Lrcnn represent loss functions of the one-stage network and the two-stage network; l isclsAnd LregRespectively representing confidence prediction and bounding box regression loss in the two-stage network; alpha represents the weight occupied by the positive and negative samples; gamma is an inhibition factor and is used for reducing the influence of a large number of simple samples on the training result; c. CtThe confidence measure is indicated.
In step 5, the trained optimal target detection network is deployed on a real vehicle through the Robot Operating System (ROS) to realize highly robust road target detection.
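A skeletal ROS 1 node for such a deployment might look as follows; the topic names, the approximate time synchronization of camera and lidar messages, and the two placeholder helpers are assumptions rather than anything specified by the patent:

```python
import rospy
import message_filters
from sensor_msgs.msg import Image, PointCloud2
from visualization_msgs.msg import MarkerArray

def run_detector(image_msg, cloud_msg):
    return []                  # placeholder for the trained fusion network's inference (hypothetical)

def to_markers(detections):
    return MarkerArray()       # placeholder converting detected boxes into RViz markers (hypothetical)

def callback(image_msg, cloud_msg):
    marker_pub.publish(to_markers(run_detector(image_msg, cloud_msg)))

if __name__ == "__main__":
    rospy.init_node("fusion_detector")
    marker_pub = rospy.Publisher("/detections", MarkerArray, queue_size=1)
    img_sub = message_filters.Subscriber("/camera/image_raw", Image)
    pc_sub = message_filters.Subscriber("/lidar/points", PointCloud2)
    sync = message_filters.ApproximateTimeSynchronizer([img_sub, pc_sub], queue_size=10, slop=0.1)
    sync.registerCallback(callback)
    rospy.spin()
```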
The above-listed series of detailed descriptions are merely specific illustrations of possible embodiments of the present invention, and they are not intended to limit the scope of the present invention, and all equivalent means or modifications that do not depart from the technical spirit of the present invention are intended to be included within the scope of the present invention.

Claims (10)

1. A target detection network system applied to multi-sensor data fusion in rainy and snowy weather scenes, characterized by comprising: a first network system, namely a one-stage network for point cloud and image data feature extraction, feature fusion, foreground point segmentation and candidate box generation; and a second network system, namely a two-stage network for point cloud and image data feature extraction, feature fusion, confidence prediction and bounding box regression;
the first network system includes a two-dimensional convolution module, an SA point cloud feature extraction module, a point-cloud-based one-stage feature-layer data fusion module and a multi-scale multi-resolution point cloud feature fusion module, and realizes feature fusion in which the point cloud is dominant;
the second network system includes a voxelization-based SA point cloud feature extraction module, a three-dimensional sparse convolution module, a two-dimensional convolution module and an image-based two-stage feature-layer data fusion module, and realizes feature fusion in which the image is dominant.
2. The target detection network system applied to multi-sensor data fusion in rainy and snowy weather scenes as claimed in claim 1, wherein the loss function of the network system is the sum of the loss functions of the first and second network systems; the loss of the second network system is set as the confidence prediction and bounding box regression losses, with the focal loss chosen as the confidence prediction loss to balance the imbalance between positive and negative samples in the training data; the specific setting is as follows:
L = L_rpn + L_rcnn
L_rcnn = L_cls + L_reg
L_cls = -α(1 - c_t)^γ · log(c_t)
where L represents the final multi-objective loss function; L_rpn and L_rcnn represent the loss functions of the one-stage and two-stage networks; L_cls and L_reg represent the confidence prediction and bounding box regression losses in the two-stage network; α is the weight assigned to positive and negative samples; γ is a suppression factor used to reduce the influence of the large number of easy samples on the training result; and c_t denotes the predicted confidence.
3. The system of claim 1, wherein the two-dimensional convolution module of the first network system is used for image processing: the images collected by 6 cameras evenly distributed on the roof of the unmanned vehicle are stitched into a four-dimensional tensor and fed into the two-dimensional convolution module for feature extraction; three two-dimensional convolution modules of different sizes sequentially generate three image feature vectors of gradually increasing scale; after each feature extraction, the generated image feature vector is input into the point-cloud-based one-stage feature-layer data fusion module, where it is superimposed and fused with the point cloud feature vector.
4. The system of claim 3, wherein the SA point cloud feature extraction module of the first network system is used for point cloud processing: the SA module extracts features from the point cloud acquired by the solid-state lidar mounted on the roof of the unmanned vehicle; after each point cloud feature extraction, the extracted point cloud features are superimposed and fused with the image feature vector generated in claim 3 before the next point cloud feature extraction; this operation is repeated twice to obtain three fused features of different scales, which are then processed by the multi-scale multi-resolution point cloud feature fusion module to obtain the final multi-scale fused features.
5. The system of claim 4, wherein the multi-scale multi-resolution point cloud feature fusion module obtains the final multi-scale fused features as follows: down-sampling radii of different sizes are selected in turn, the point cloud features are extracted at each radius, and the features are concatenated and fused to obtain multi-resolution features; then a corresponding number of points is sampled at each scale and concatenated and fused with the previously obtained multi-resolution features to obtain the final multi-scale features.
6. The system of claim 1, wherein the point-cloud-based one-stage feature-layer data fusion module of the first network system comprises a semantic segmentation module, an attention mechanism module, a feature matrix generation module and a feature mapping module; because the semantic and texture information of the image is rich, semantic segmentation is performed on the acquired image features so that pixels can be segmented one by one into foreground and background points;
because exposure distorts the semantic information of the image to some extent, the attention mechanism module is used to adaptively weight the acquired image information, i.e. the semantic information and the image information are superimposed and the attention mechanism module then outputs the weight of the semantic information; a feature mapping matrix from the image to the point cloud is then obtained through the extrinsic and intrinsic calibration of the camera, and finally the semantic information is mapped point by point onto the point cloud information to obtain the fused features.
7. The system of claim 1, wherein the second network system comprises a point cloud processing channel and an image processing channel;
in the point cloud processing channel, the voxelization-based SA point cloud feature extraction module first performs preliminary processing of the three-dimensional point cloud, and voxelized point cloud features are then obtained through the three-dimensional sparse convolution module: a center point is selected in each voxel grid cell to represent its spatial position; since the point cloud within the voxel space is sparse, a suitable radius is chosen and sparse convolution is applied only to the relatively dense parts of the voxel space to obtain the point cloud features; this operation is repeated twice to obtain voxel point cloud features of different scales;
in the image processing channel, image features of different scales are obtained sequentially by three two-dimensional convolution modules;
after each feature extraction in the point cloud and image channels, the generated voxel point cloud feature vectors and the image features of the corresponding scale are input into the image-based two-stage feature-layer data fusion module, where the voxel point cloud features are superimposed and fused with the image features obtained by the two-dimensional convolution module before the next feature extraction; this operation is repeated twice to obtain fused features of different scales, and the final multi-scale fused feature is obtained after resizing the three fused features obtained in sequence.
8. The system of claim 7, wherein the image-based two-stage feature layer data fusion module comprises a K-nearest neighbor interpolation module, an attention mechanism module, and a feature mapping module;
to address the sparsity of the point cloud, a dense point cloud is first generated by the K-nearest-neighbor interpolation module; the dense point cloud information is then superimposed on the image information, and the attention mechanism module outputs adaptive weights for the point cloud features;
the point cloud features are then mapped point by point onto the image features through the point-cloud-to-image feature mapping matrix to obtain the fused features.
9. The target detection network system applied to multi-sensor data fusion in rainy and snowy weather scenes as claimed in any one of claims 1-8, wherein the dataset of the network system is the nuScenes road target detection dataset, used as the training and validation sets for target detection training; the image data in the nuScenes road target detection dataset require exposure adjustment, and the point cloud data require thinning;
the correction formula of the exposure adjustment is as follows:
V_RGB^[k'] = α^[k] · V_RGB^[k] + β^[k]
where V_RGB^[k'] represents the RGB values of the image after exposure correction, V_RGB^[k] the RGB values of the original training image, α^[k] the exposure correction factor of the k-th image, and β^[k] the exposure correction offset of the k-th image;
the thinning process is as follows: a normal distribution is used to simulate the random loss of points, with more points dropped at long range and fewer at short range, simulating the invalid point clouds acquired in rainy and snowy weather scenes.
10. A target detection method for multi-sensor data fusion in a rain and snow weather scene is characterized by comprising the following steps:
step 1, selecting a nuScenes road target detection data set as a training set and a verification set for training a target detection method;
performing exposure correction on the image data in the dataset: randomly adjusting the RGB values of the image pixels to increase or decrease the exposure of the image, wherein the exposure correction formula is as follows:
V_RGB^[k'] = α^[k] · V_RGB^[k] + β^[k]
where V_RGB^[k'] represents the RGB values of the image after exposure correction, V_RGB^[k] the RGB values of the original training image, α^[k] the exposure correction factor of the k-th image, and β^[k] the exposure correction offset of the k-th image;
sparsely sampling the point cloud data in the dataset by randomly thinning the original point cloud, specifically: a normal distribution is used to simulate the random loss of points, with more points dropped at long range and fewer at short range, simulating the invalid point clouds acquired in rainy and snowy weather scenes;
step 2, constructing the one-stage network for point cloud and image data feature extraction, feature fusion, foreground point segmentation and candidate box generation, realizing point-cloud-dominant feature fusion, specifically:
in the image processing channel, the images collected by 6 cameras evenly distributed on the roof of the unmanned vehicle are stitched into a four-dimensional tensor and fed into the two-dimensional convolution module for feature extraction; the image information passes through three two-dimensional convolution modules of different sizes, sequentially generating three image feature vectors of gradually increasing scale; meanwhile, after each feature extraction, the generated image feature vector is input into the point-cloud-based one-stage feature-layer data fusion module, where it is superimposed and fused with the point cloud feature vector;
in the point cloud processing channel, the SA point cloud feature extraction module extracts features from the point cloud acquired by the solid-state lidar mounted on the roof of the unmanned vehicle; after each feature extraction, the image feature vector and the extracted point cloud features are superimposed and fused before the next feature extraction; this operation is repeated twice to obtain three fused features of different scales, which are processed by the multi-scale multi-resolution point cloud feature fusion module to obtain the final multi-scale fused features;
the point-cloud-based one-stage feature-layer data fusion module comprises a semantic segmentation module, an attention mechanism, a feature matrix generation module and a feature mapping module; because the semantic and texture information of the image is rich, semantic segmentation is performed on the obtained image features so that pixels can be segmented one by one into foreground and background points;
because exposure distorts the semantic information of the image to some extent, the attention mechanism is used to adaptively weight the image information, i.e. the semantic information and the image information are superimposed and the attention mechanism module then outputs the adaptive weight of the semantic information; a feature mapping matrix from the image to the point cloud is then obtained through the extrinsic and intrinsic calibration of the camera, and finally the semantic information is mapped point by point onto the point cloud information to obtain the fused features;
wherein, in the multi-scale multi-resolution point cloud feature fusion module: to obtain the final multi-scale features, three down-sampling radii of different sizes are selected in turn, the point cloud features are extracted at each radius, and the features are concatenated and fused to obtain multi-resolution features; then a corresponding number of points is sampled at each scale and concatenated and fused with the previously obtained multi-resolution features to obtain the final multi-scale features;
step 3, constructing the two-stage network for point cloud and image data feature extraction, feature fusion, confidence prediction and bounding box regression, realizing image-dominant feature fusion, specifically:
in the point cloud processing channel, the three-dimensional point cloud is first preprocessed by the voxelization method, and voxelized point cloud features are then obtained through the three-dimensional sparse convolution module: a center point is selected in each voxel grid cell to represent its spatial position; since the point cloud within the voxel space is sparse, a suitable radius is chosen and sparse convolution is applied only to the relatively dense parts of the voxel space to obtain the point cloud features; this operation is repeated twice to obtain voxel point cloud features of different scales;
after each feature extraction, the generated voxel point cloud feature vectors of different scales are input into the image-based two-stage feature-layer data fusion module, where they are superimposed and fused with the image features acquired by the two-dimensional convolution module;
in the image processing channel, image features of different scales are obtained through three two-dimensional convolution modules in sequence; after each feature extraction, the generated voxel point cloud features and the image features of the corresponding scale are superimposed and fused before the next feature extraction; this operation is repeated twice to obtain three fused features of different scales, and the final multi-scale fused features are obtained after resizing the three fused features obtained in sequence;
the image-based two-stage feature-layer data fusion module comprises a point cloud K-nearest-neighbor interpolation module, an attention mechanism, a feature matrix generation module and a feature mapping module;
because the point cloud is affected by the rain and snow environment and contains invalid points, the attention mechanism module is used to adaptively weight the point cloud information, i.e. the point cloud information and the image information are superimposed and the attention mechanism module then outputs the adaptive weights of the point cloud features; in addition, given the sparsity of the point cloud, a dense point cloud must first be generated by the K-nearest-neighbor interpolation module;
the point cloud features are then mapped point by point onto the image features through the point-cloud-to-image feature mapping matrix to obtain the fused features;
step 4, setting a multi-target loss function, and performing end-to-end target detection network training by using a training set;
the multi-objective loss function is set as the sum of the loss functions of the two stages: the one-stage loss is set as the candidate box generation network loss, and the two-stage loss is set as the confidence prediction and bounding box regression losses, with the focal loss chosen as the confidence prediction loss to balance the imbalance between positive and negative samples in the training data; specifically:
L = L_rpn + L_rcnn
L_rcnn = L_cls + L_reg
L_cls = -α(1 - c_t)^γ · log(c_t)
where L represents the final multi-objective loss function; L_rpn and L_rcnn represent the loss functions of the one-stage and two-stage networks; L_cls and L_reg represent the confidence prediction and bounding box regression losses in the two-stage network; α is the weight assigned to positive and negative samples; γ is a suppression factor used to reduce the influence of the large number of easy samples on the training result; and c_t denotes the predicted confidence;
and step 5, deploying the trained optimal target detection network on a real vehicle through the Robot Operating System (ROS) to realize highly robust road target detection.
CN202111401618.5A 2021-11-19 2021-11-19 Target detection network system and method applied to multi-sensor data fusion in rainy and snowy weather scene Pending CN114140672A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111401618.5A CN114140672A (en) 2021-11-19 2021-11-19 Target detection network system and method applied to multi-sensor data fusion in rainy and snowy weather scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111401618.5A CN114140672A (en) 2021-11-19 2021-11-19 Target detection network system and method applied to multi-sensor data fusion in rainy and snowy weather scene

Publications (1)

Publication Number Publication Date
CN114140672A true CN114140672A (en) 2022-03-04

Family

ID=80391208

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111401618.5A Pending CN114140672A (en) 2021-11-19 2021-11-19 Target detection network system and method applied to multi-sensor data fusion in rainy and snowy weather scene

Country Status (1)

Country Link
CN (1) CN114140672A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115019270A (en) * 2022-05-31 2022-09-06 电子科技大学 Automatic driving night target detection method based on sparse point cloud prior information
CN115019270B (en) * 2022-05-31 2024-04-19 电子科技大学 Automatic driving night target detection method based on sparse point cloud priori information
CN114758337A (en) * 2022-06-16 2022-07-15 山东海量信息技术研究院 Semantic instance reconstruction method, device, equipment and medium
CN114758337B (en) * 2022-06-16 2022-10-28 山东海量信息技术研究院 Semantic instance reconstruction method, device, equipment and medium
WO2023241097A1 (en) * 2022-06-16 2023-12-21 山东海量信息技术研究院 Semantic instance reconstruction method and apparatus, device, and medium
CN114972763A (en) * 2022-07-28 2022-08-30 香港中文大学(深圳)未来智联网络研究院 Laser radar point cloud segmentation method, device, equipment and storage medium
CN116524199A (en) * 2023-04-23 2023-08-01 江苏大学 Image rain removing method and device based on PReNet progressive network
CN116524199B (en) * 2023-04-23 2024-03-08 江苏大学 Image rain removing method and device based on PReNet progressive network

Similar Documents

Publication Publication Date Title
WO2022111219A1 (en) Domain adaptation device operation and maintenance system and method
CN114140672A (en) Target detection network system and method applied to multi-sensor data fusion in rainy and snowy weather scene
CN113052210B (en) Rapid low-light target detection method based on convolutional neural network
CN110909666B (en) Night vehicle detection method based on improved YOLOv3 convolutional neural network
CN110689008A (en) Monocular image-oriented three-dimensional object detection method based on three-dimensional reconstruction
CN107133943A (en) A kind of visible detection method of stockbridge damper defects detection
CN110070025B (en) Monocular image-based three-dimensional target detection system and method
CN111899172A (en) Vehicle target detection method oriented to remote sensing application scene
CN113673444B (en) Intersection multi-view target detection method and system based on angular point pooling
CN112949493B (en) Lane line detection method and system combining semantic segmentation and attention mechanism
CN113160062B (en) Infrared image target detection method, device, equipment and storage medium
CN113129449B (en) Vehicle pavement feature recognition and three-dimensional reconstruction method based on binocular vision
CN110969171A (en) Image classification model, method and application based on improved convolutional neural network
Zheng et al. Steps: Joint self-supervised nighttime image enhancement and depth estimation
CN114782298A (en) Infrared and visible light image fusion method with regional attention
CN114972748A (en) Infrared semantic segmentation method capable of explaining edge attention and gray level quantization network
CN111626241A (en) Face detection method and device
CN113627481A (en) Multi-model combined unmanned aerial vehicle garbage classification method for smart gardens
CN113326734A (en) Rotary target detection method based on YOLOv5
CN117576441A (en) Lightweight algorithm for steel surface defect detection by fusing classification and detection
CN117115616A (en) Real-time low-illumination image target detection method based on convolutional neural network
CN116994068A (en) Target detection method and device based on knowledge distillation
Li et al. Monocular 3-D Object Detection Based on Depth-Guided Local Convolution for Smart Payment in D2D Systems
CN111062384A (en) Vehicle window accurate positioning method based on deep learning
CN115359067A (en) Continuous convolution network-based point-by-point fusion point cloud semantic segmentation method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination