CN112801928B - Attention mechanism-based millimeter wave radar and visual sensor fusion method - Google Patents

Attention mechanism-based millimeter wave radar and visual sensor fusion method

Info

Publication number
CN112801928B
CN112801928B (application CN202110282139.XA; also published as CN112801928A)
Authority
CN
China
Prior art keywords
millimeter wave radar
visual image
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110282139.XA
Other languages
Chinese (zh)
Other versions
CN112801928A (en)
Inventor
杨猛
沈韬
曾凯
么长慧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming University of Science and Technology filed Critical Kunming University of Science and Technology
Priority to CN202110282139.XA priority Critical patent/CN112801928B/en
Publication of CN112801928A publication Critical patent/CN112801928A/en
Application granted granted Critical
Publication of CN112801928B publication Critical patent/CN112801928B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/06 Topological mapping of higher dimensional structures onto lower dimensional surfaces
    • G06T3/067 Reshaping or unfolding 3D tree structures onto 2D planes
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image
    • G06T2207/10012 Stereo images
    • G06T2207/10032 Satellite or aerial image; Remote sensing
    • G06T2207/10044 Radar image

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Radar Systems Or Details Thereof (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a method for fusing a millimeter wave radar and a visual sensor based on an attention mechanism, belonging to the technical field of artificial intelligence. First, at the data layer, the spatial information of the radar is used to determine key detection areas, and the features of those areas are highlighted to form spatial soft attention. Second, at the feature layer, a channel attention weight learning method is applied along the channel dimension to reasonably distribute the weights of the millimeter wave radar and the vision branch, solving the weight-distribution problem of millimeter wave radar and vision fusion. Compared with the prior art, the method learns in the spatial and channel dimensions respectively, through spatial soft attention and channel attention weights, addressing both the poor detection of pedestrians and small objects in traditional data-layer fusion and the weight-distribution problem that arises in feature-layer fusion.

Description

Attention mechanism-based millimeter wave radar and visual sensor fusion method
Technical Field
The invention discloses a method for fusing a millimeter wave radar and a visual sensor based on an attention mechanism, and belongs to the technical field of artificial intelligence.
Background
At present, target detection methods that fuse millimeter wave radar and vision are widely used in fields such as automatic driving. The millimeter wave radar is commonly used either to generate visual regions of interest, or to produce detections through a clustering algorithm that are then fused with the visual detection results at the decision layer. Both strategies tend to miss small targets, incur high computational cost, and make it difficult to establish a probability model.
Disclosure of Invention
The invention aims to provide a millimeter wave radar and visual sensor fusion method based on an attention mechanism. The 3D point cloud received by the millimeter wave radar is converted into a 2D plane image consistent with the visual image. At the data layer, the spatial information of the radar is used to determine key detection areas, and the features of those areas are highlighted to form spatial soft attention; this improves the overall detection effect and addresses the poor detection of pedestrians and small objects in traditional data-layer fusion. At the feature layer, a channel attention learning method is applied along the channel dimension to distribute the weights of the millimeter wave radar and the vision branch, solving the weight-distribution problem of millimeter wave radar and vision fusion.
In order to achieve the purpose, the technical scheme provided by the invention is as follows:
Step S1: scan with the millimeter wave radar to obtain 3D point cloud data, and acquire visual image information with the visual sensor.
Step S2: convert the millimeter wave radar 3D point cloud data onto a 2D vertical plane consistent with the visual image.
Step S3: generate from the millimeter wave radar image a two-dimensional matrix of the same size as the visual image, recorded as the radar two-dimensional matrix (a rasterization sketch follows this list).
Step S4: highlight the key detection areas and the key-area features in the visual image with spatial soft attention.
Step S5: extract the millimeter wave radar features and the visual image features, and cascade them.
Step S6: send the cascaded features into an SE channel attention module for weight learning.
Step S7: perform classification and identification with RetinaNet.
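To make step S3 concrete, the sketch below builds a radar two-dimensional matrix aligned with the visual image from the pixel coordinates produced by the S2 projection (detailed in the next paragraphs). The 800 × 1200 size is taken from the embodiment; storing one value per nearest pixel is an assumption made here for illustration, since the patent does not specify how the returns are written into the matrix.

```python
import numpy as np

def rasterize_radar(pixel_coords, values, height=800, width=1200):
    """Step S3 (sketch): build a radar two-dimensional matrix of the same size as the visual image.

    pixel_coords : M x 2 array of (u, v) image coordinates from the S2 projection
    values       : M values to store (e.g., radar cross-section, or simply 1.0 per return)
    """
    radar_matrix = np.zeros((height, width), dtype=np.float32)
    u = np.round(pixel_coords[:, 0]).astype(int)
    v = np.round(pixel_coords[:, 1]).astype(int)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)   # drop returns outside the visual extent
    radar_matrix[v[inside], u[inside]] = values[inside]
    return radar_matrix

coords = np.array([[640.0, 400.0], [100.5, 30.2], [5000.0, 10.0]])   # last point falls outside the image
N = rasterize_radar(coords, np.ones(len(coords)))
print(N.shape, N.sum())   # (800, 1200) 2.0
```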
Preferably, the specific steps of S2 for converting the millimeter wave radar 3D point cloud onto the 2D vertical plane consistent with the visual image are as follows (a numerical sketch of the projection is given after the list):
S2.1: convert the coordinates in the millimeter wave radar coordinate system into a world coordinate system centered on the camera.
S2.2: convert the coordinates of the world coordinate system into the camera coordinate system.
S2.3: convert the coordinates of the camera coordinate system into the image coordinate system.
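The patent does not tie S2.1-S2.3 to a particular calibration, so the following is only a minimal sketch of the projection chain, assuming a combined radar-to-camera extrinsic transform and a pinhole intrinsic matrix; the matrices T and K below are hypothetical stand-ins for real calibration values (for example, those shipped with nuScenes).

```python
import numpy as np

def project_radar_to_image(points_radar, T_radar_to_cam, K):
    """Project N x 3 radar points (radar frame) onto the image plane (S2.1-S2.3).

    T_radar_to_cam : 4x4 homogeneous transform taking radar coordinates through the
                     camera-centered world frame into the camera frame (S2.1 + S2.2).
    K              : 3x3 camera intrinsic matrix for the camera-to-image step (S2.3).
    Returns pixel coordinates for the points that lie in front of the camera.
    """
    n = points_radar.shape[0]
    homogeneous = np.hstack([points_radar, np.ones((n, 1))])     # N x 4
    cam = (T_radar_to_cam @ homogeneous.T).T[:, :3]              # camera-frame coordinates
    cam = cam[cam[:, 2] > 0]                                      # keep points in front of the lens
    pixels = (K @ cam.T).T                                        # pinhole projection
    return pixels[:, :2] / pixels[:, 2:3]                         # normalize by depth

# Hypothetical calibration values, for illustration only.
K = np.array([[1266.4, 0.0, 816.3],
              [0.0, 1266.4, 491.5],
              [0.0, 0.0, 1.0]])
T = np.array([[0.0, -1.0, 0.0, 0.0],    # radar x (forward) -> camera z, radar y (left) -> -camera x
              [0.0, 0.0, -1.0, 0.0],
              [1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
radar_points = np.array([[10.0, 0.5, 0.0], [25.0, -2.0, 0.2]])
print(project_radar_to_image(radar_points, T, K))
```

The pixel coordinates obtained this way are what step S3 rasterizes into the radar two-dimensional matrix (see the sketch after the step list above).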
Preferably, in S4 of the present invention, spatial soft attention is used to highlight the key detection areas and their features; the specific steps are as follows (a small numerical example follows the list):
S4.1: take the radar two-dimensional matrix N and the visual image matrix C;
S4.2: determine the key detection areas of the visual image with the millimeter wave radar: perform element-wise (dot) multiplication of the radar two-dimensional matrix N and the visual image matrix C to obtain the matrix H, namely the key detection areas;
S4.3: highlight the features of the key visual detection areas to form spatial soft attention, avoiding the missed detection of small objects such as pedestrians caused by the low resolution of the millimeter wave radar: perform element-wise addition of the matrix H and the image matrix C to obtain the key-detection-area features M, namely M = H + C.
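A minimal numerical sketch of S4.2 and S4.3, assuming for brevity that N is a binary radar occupancy matrix and C is a single-channel visual matrix of the same size (the actual channel counts are given in the embodiment):

```python
import numpy as np

# Radar two-dimensional matrix N: 1 where the radar reports a return, 0 elsewhere (illustrative values).
N = np.array([[0, 1, 0],
              [0, 1, 1],
              [0, 0, 0]])

# Visual image matrix C of the same size (illustrative values).
C = np.array([[5, 7, 2],
              [1, 9, 4],
              [3, 6, 8]])

H = N * C        # S4.2: element-wise product -> key detection area
M = H + C        # S4.3: element-wise addition -> highlighted features, M = H + C
print(H)         # non-zero only where the radar indicated a key area
print(M)         # visual values, amplified inside the key area
```

For a binary N this gives M = (N + 1) * C element-wise, so the key areas are amplified while the rest of the image passes through unchanged rather than being masked out.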
The method of the invention uses the radar to generate spatial soft attention, applying a spatial attention mechanism across multiple modalities rather than within a single image.
According to the method, the weights of the millimeter wave radar and the vision branch are reasonably distributed on the fused feature channels through channel attention weight learning, which resolves the difficulty of distributing the radar and vision weights; channel attention, normally used inside an image network, is here applied to the weight assignment of multi-modal fusion (a sketch of such a block is given below).
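For concreteness, here is a sketch of a standard squeeze-and-excitation (SE) channel attention block applied to the concatenated radar and visual feature maps, as steps S5 and S6 describe. It is written against PyTorch as an assumption (the patent names no framework), and the channel counts and reduction ratio are illustrative, not values fixed by the patent.

```python
import torch
import torch.nn as nn

class SEFusion(nn.Module):
    """Cascade radar and visual features (S5) and reweight the channels with SE attention (S6)."""

    def __init__(self, radar_channels, visual_channels, reduction=16):
        super().__init__()
        channels = radar_channels + visual_channels
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                        # squeeze: one value per channel
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                                   # excitation: per-channel weights in (0, 1)
        )

    def forward(self, radar_feat, visual_feat):
        fused = torch.cat([radar_feat, visual_feat], dim=1)   # S5: cascade along the channel axis
        return fused * self.se(fused)                         # S6: learned radar/vision channel weighting

# Hypothetical feature-map shapes, for illustration only.
fusion = SEFusion(radar_channels=64, visual_channels=64)
out = fusion(torch.rand(1, 64, 45, 80), torch.rand(1, 64, 45, 80))
print(out.shape)   # torch.Size([1, 128, 45, 80])
```

The sigmoid output serves as the learned weight of each radar or visual channel, so channels from the less informative modality in a given scene can be suppressed without hand-tuned fusion weights.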
At the data layer and the feature layer respectively, the method disclosed by the invention fuses the spatial information and the feature information of the millimeter wave radar and the vision branch from the spatial and channel perspectives, improving both detection precision and recall.
The invention has the beneficial effects that:
Compared with the prior art, the method highlights the key visual detection areas with spatial soft attention, which improves the overall detection precision and recall and alleviates the missed detection of pedestrians and small objects in traditional data-level fusion, and it solves the distribution of millimeter wave radar and vision weights through channel attention weight learning.
Drawings
FIG. 1 is a general flow chart of the present invention;
FIG. 2 is a detailed flow chart of spatial soft attention;
FIG. 3 is a graph showing the effect of the test according to the embodiment.
Detailed Description
The present invention is further described in detail below with reference to specific examples, but the scope of the present invention is not limited to these examples.
A millimeter wave radar and vision sensor fusion method based on an attention mechanism comprises the following specific steps:
Step S1: download the nuScenes data set, and read the forward millimeter wave radar data and the forward visual image key-frame information from the data set.
Step S2: convert the millimeter wave radar 3D point cloud data onto a 2D vertical plane consistent with the visual image.
S2.1: convert the coordinates in the millimeter wave radar coordinate system into a world coordinate system centered on the camera;
S2.2: convert the coordinates of the world coordinate system into the camera coordinate system;
S2.3: convert the coordinates of the camera coordinate system into the image coordinate system.
Step S3: generate from the millimeter wave radar image a two-dimensional matrix of the same size as the visual image;
S3.1: because the millimeter wave radar detection area is large, remove the part of the radar image that exceeds the extent of the visual image, and fix both the radar image and the visual image at 800 × 1200.
S3.2: scale the radar image and the visual image to 360 × 640 and send them into a modified VGG16;
Step S4: highlight the key detection areas and the key-area features in the visual image with spatial soft attention.
S4.1: extract multi-scale information from the millimeter wave radar image with 3 × 3 and 5 × 5 convolution kernels, generating a single-channel matrix of size 360 × 640, denoted N.
S4.2: extract the visual image information with two 3 × 3 convolution kernels, generating a 3-channel matrix of size 360 × 640, denoted C.
S4.3: determine the key detection areas of the visual image with the millimeter wave radar: perform element-wise (dot) multiplication of the radar two-dimensional matrix N with each channel of the visual image matrix C to obtain the matrix H, namely the key detection areas;
S4.4: highlight the features of the key visual detection areas to form spatial soft attention and avoid missing small objects such as pedestrians owing to the low resolution of the millimeter wave radar: perform element-wise addition of the matrix H and the image matrix C to obtain the key-detection-area features M, namely M = H + C. One possible realization of S4.1 to S4.4 is sketched below.
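A sketch of how S4.1 to S4.4 of the embodiment could be realized as a module. The patent fixes the kernel sizes (3 × 3 and 5 × 5 for the radar branch, two 3 × 3 for the visual branch) and the 360 × 640 single-channel and 3-channel outputs; the way the two radar scales are merged, the padding, and the activation are assumptions made here for illustration.

```python
import torch
import torch.nn as nn

class SpatialSoftAttention(nn.Module):
    """Embodiment S4.1-S4.4: radar-guided spatial soft attention at 360 x 640."""

    def __init__(self):
        super().__init__()
        # S4.1: multi-scale radar branch, producing the single-channel matrix N.
        self.radar_3x3 = nn.Conv2d(1, 1, kernel_size=3, padding=1)
        self.radar_5x5 = nn.Conv2d(1, 1, kernel_size=5, padding=2)
        # S4.2: two 3 x 3 convolutions on the visual image, producing the 3-channel matrix C.
        self.visual = nn.Sequential(
            nn.Conv2d(3, 3, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(3, 3, kernel_size=3, padding=1),
        )

    def forward(self, radar_img, visual_img):
        n = self.radar_3x3(radar_img) + self.radar_5x5(radar_img)  # merge the two scales (assumed additive)
        c = self.visual(visual_img)
        h = n * c            # S4.3: key detection area H = N (element-wise) C, broadcast over channels
        return h + c         # S4.4: highlighted features M = H + C

module = SpatialSoftAttention()
m = module(torch.rand(1, 1, 360, 640), torch.rand(1, 3, 360, 640))
print(m.shape)   # torch.Size([1, 3, 360, 640])
```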
Step S5: extract the millimeter wave radar features and the visual image features, and cascade them.
Step S6: send the cascaded features into an SE channel attention module for weight learning.
Step S7: perform classification and identification with RetinaNet.
Effects of the embodiment: trained and tested on the nuScenes dataset, the invention significantly improves the mean average precision (mAP) and the mean average recall (mAR) of detection. mAP improves by 2.6% and mAR by 13.2% in sunny scenes; mAP by 2.1% and mAR by 3.7% in rainy scenes; and mAP by 0.6% and mAR by 3.7% in night scenes. The test effect is shown in FIG. 3.
While the present invention has been described in detail with reference to the embodiments shown in the drawings, the present invention is not limited to these embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the spirit of the present invention.

Claims (1)

1. A method for fusing a millimeter wave radar and a vision sensor based on an attention mechanism is characterized by comprising the following steps:
step S1: scanning by a millimeter wave radar to obtain 3D point cloud data, and acquiring visual image information by a visual sensor;
step S2: converting the millimeter wave radar 3D point cloud data into a 2D vertical plane consistent with the visual image;
step S3: generating, from the millimeter wave radar image, a two-dimensional matrix of the same size as the visual image, and recording it as the radar two-dimensional matrix;
step S4: highlighting the key detection areas and the key-area features in the visual image with spatial soft attention;
step S5: extracting millimeter wave radar features and visual image features, and cascading the millimeter wave radar features and the visual image features;
step S6: sending the cascaded features into an SE channel attention module for weight learning;
step S7: classifying and identifying by RetinaNet;
the S2 is used for converting the millimeter wave radar 3D point cloud into a 2D vertical plane consistent with the visual image, and the specific steps are as follows:
s2.1: converting the coordinates under the millimeter wave radar coordinate system into a world coordinate system taking the camera as the center;
s2.2: converting the coordinates of the world coordinate system to a camera coordinate system;
s2.3: converting the coordinates of the camera coordinate system to an image coordinate system;
in S4, spatial soft attention is used to highlight the key detection areas and their features through the following specific steps:
S4.1: taking the radar two-dimensional matrix N and the visual image matrix C;
S4.2: determining the key detection areas of the visual image with the millimeter wave radar: performing element-wise (dot) multiplication of the radar two-dimensional matrix N and the visual image matrix C to obtain the matrix H, namely the key detection areas;
S4.3: highlighting the features of the key visual detection areas to form spatial soft attention: performing element-wise addition of the matrix H and the image matrix C to obtain the key-detection-area features M, namely M = H + C.
CN202110282139.XA 2021-03-16 2021-03-16 Attention mechanism-based millimeter wave radar and visual sensor fusion method Active CN112801928B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110282139.XA CN112801928B (en) 2021-03-16 2021-03-16 Attention mechanism-based millimeter wave radar and visual sensor fusion method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110282139.XA CN112801928B (en) 2021-03-16 2021-03-16 Attention mechanism-based millimeter wave radar and visual sensor fusion method

Publications (2)

Publication Number Publication Date
CN112801928A CN112801928A (en) 2021-05-14
CN112801928B true CN112801928B (en) 2022-11-29

Family

ID=75816995

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110282139.XA Active CN112801928B (en) 2021-03-16 2021-03-16 Attention mechanism-based millimeter wave radar and visual sensor fusion method

Country Status (1)

Country Link
CN (1) CN112801928B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114708585B (en) * 2022-04-15 2023-10-10 电子科技大学 Attention mechanism-based millimeter wave radar and vision fusion three-dimensional target detection method
CN115273460A (en) * 2022-06-28 2022-11-01 重庆长安汽车股份有限公司 Multi-mode perception fusion vehicle lane change prediction method, computer equipment and storage medium

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109443369A (en) * 2018-08-20 2019-03-08 北京主线科技有限公司 The method for constructing sound state grating map using laser radar and visual sensor
CN110135485A (en) * 2019-05-05 2019-08-16 浙江大学 The object identification and localization method and system that monocular camera is merged with millimetre-wave radar
US11361470B2 (en) * 2019-05-09 2022-06-14 Sri International Semantically-aware image-based visual localization
CN110390695B (en) * 2019-06-28 2023-05-23 东南大学 Laser radar and camera fusion calibration system and calibration method based on ROS
CN110363158B (en) * 2019-07-17 2021-05-25 浙江大学 Millimeter wave radar and visual cooperative target detection and identification method based on neural network
CN110456811A (en) * 2019-08-22 2019-11-15 台州学院 Unmanned plane selectivity obstacle avoidance system and method based on binocular vision and three axis holders
CN111060904B (en) * 2019-12-25 2022-03-15 中国汽车技术研究中心有限公司 Blind area monitoring method based on millimeter wave and vision fusion perception
CN111950467B (en) * 2020-08-14 2021-06-25 清华大学 Fusion network lane line detection method based on attention mechanism and terminal equipment
CN112200750B (en) * 2020-10-21 2022-08-05 华中科技大学 Ultrasonic image denoising model establishing method and ultrasonic image denoising method
CN112215306B (en) * 2020-11-18 2023-03-31 同济大学 Target detection method based on fusion of monocular vision and millimeter wave radar
CN112419155B (en) * 2020-11-26 2022-04-15 武汉大学 Super-resolution reconstruction method for fully-polarized synthetic aperture radar image

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104574376A (en) * 2014-12-24 2015-04-29 重庆大学 Anti-collision method based on joint verification of binocular vision and laser radar in congested traffic
CN106908783A (en) * 2017-02-23 2017-06-30 苏州大学 Obstacle detection method based on multi-sensor information fusion
CN111242207A (en) * 2020-01-08 2020-06-05 天津大学 Three-dimensional model classification and retrieval method based on visual saliency information sharing
CN111797717A (en) * 2020-06-17 2020-10-20 电子科技大学 High-speed high-precision SAR image ship detection method
CN111965636A (en) * 2020-07-20 2020-11-20 重庆大学 Night target detection method based on millimeter wave radar and vision fusion

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
VisDrone-DET2019: The Vision Meets Drone Object Detection in Image Challenge Results; Dawei Du et al.; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV); 2019-12-31; 1-14 *
Research on SAR image target detection technology based on convolutional neural networks; 陈诗琪; China Excellent Doctoral and Master's Dissertations Full-text Database (Master's), Information Science and Technology; 2021-01-15 (No. 01); I136-1044 *
Research on ship target detection technology for spaceborne SAR images at the sea-land boundary; 李滕; China Excellent Master's Theses Full-text Database, Engineering Science and Technology II; 2020-02-15 (No. 02); C036-224 *

Also Published As

Publication number Publication date
CN112801928A (en) 2021-05-14

Similar Documents

Publication Publication Date Title
Sharma et al. YOLOrs: Object detection in multimodal remote sensing imagery
CN110675418B (en) Target track optimization method based on DS evidence theory
CN113052835B (en) Medicine box detection method and system based on three-dimensional point cloud and image data fusion
CN114724120B (en) Vehicle target detection method and system based on radar vision semantic segmentation adaptive fusion
CN114092780B (en) Three-dimensional target detection method based on fusion of point cloud and image data
CN111797716A (en) Single target tracking method based on Siamese network
Dai et al. Multi-task faster R-CNN for nighttime pedestrian detection and distance estimation
CN111723693B (en) Crowd counting method based on small sample learning
CN112801928B (en) Attention mechanism-based millimeter wave radar and visual sensor fusion method
Wang et al. An advanced YOLOv3 method for small-scale road object detection
CN112215296B (en) Infrared image recognition method based on transfer learning and storage medium
Li et al. Bifnet: Bidirectional fusion network for road segmentation
CN113408584B (en) RGB-D multi-modal feature fusion 3D target detection method
EP4174792A1 (en) Method for scene understanding and semantic analysis of objects
Wang et al. Radar ghost target detection via multimodal transformers
CN111914615A (en) Fire-fighting area passability analysis system based on stereoscopic vision
CN113762009A (en) Crowd counting method based on multi-scale feature fusion and double-attention machine mechanism
TW202225730A (en) High-efficiency LiDAR object detection method based on deep learning through direct processing of 3D point data to obtain a concise and fast 3D feature to solve the shortcomings of complexity and time-consuming of the current voxel network model
CN117115555A (en) Semi-supervised three-dimensional target detection method based on noise data
Feng Mask RCNN-based single shot multibox detector for gesture recognition in physical education
Gu et al. Radar-enhanced image fusion-based object detection for autonomous driving
Hu et al. DMFFNet: Dual-mode multi-scale feature fusion-based pedestrian detection method
He et al. Automatic detection and mapping of solar photovoltaic arrays with deep convolutional neural networks in high resolution satellite images
Ma et al. LGNet: Local and global point dependency network for 3D object detection
CN110738123A (en) Method and device for identifying densely displayed commodities

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant