CN112801928B - Attention mechanism-based millimeter wave radar and visual sensor fusion method - Google Patents

Attention mechanism-based millimeter wave radar and visual sensor fusion method

Info

Publication number
CN112801928B
CN112801928B (application CN202110282139.XA; also published as CN112801928A)
Authority
CN
China
Prior art keywords
millimeter wave radar
visual image
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110282139.XA
Other languages
Chinese (zh)
Other versions
CN112801928A (en)
Inventor
杨猛
沈韬
曾凯
么长慧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kunming University of Science and Technology
Original Assignee
Kunming University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kunming University of Science and Technology filed Critical Kunming University of Science and Technology
Priority to CN202110282139.XA priority Critical patent/CN112801928B/en
Publication of CN112801928A publication Critical patent/CN112801928A/en
Application granted granted Critical
Publication of CN112801928B publication Critical patent/CN112801928B/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 Image enhancement or restoration
    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/06 Topological mapping of higher dimensional structures onto lower dimensional surfaces
    • G06T3/067 Reshaping or unfolding 3D tree structures onto 2D planes
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image
    • G06T2207/10012 Stereo images
    • G06T2207/10032 Satellite or aerial image; Remote sensing
    • G06T2207/10044 Radar image

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Radar Systems Or Details Thereof (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a method for fusing a millimeter wave radar and a visual sensor based on an attention mechanism, belonging to the technical field of artificial intelligence. First, at the data layer, the spatial information of the radar is used to determine key detection areas, and the features of those areas are highlighted to form spatial soft attention. Second, at the feature layer, a channel attention weight learning method is applied along the channel dimension to reasonably distribute the weights of the millimeter wave radar and the vision branch, solving the weight-distribution problem of millimeter wave radar and vision fusion. Compared with the prior art, the method learns in the spatial and channel dimensions respectively, through spatial soft attention and channel attention weights, addressing both the poor detection of pedestrians and small objects in traditional data-layer fusion and the weight-distribution problem that arises in feature-layer fusion.

Description

Attention mechanism-based millimeter wave radar and visual sensor fusion method
Technical Field
The invention discloses a method for fusing a millimeter wave radar and a visual sensor based on an attention mechanism, and belongs to the technical field of artificial intelligence.
Background
At present, target detection methods that fuse millimeter wave radar and vision are widely used in fields such as automatic driving. The millimeter wave radar is commonly used either to generate visual regions of interest, or to produce detections through a clustering algorithm that are then fused with the visual detection results at the decision layer. Both strategies tend to miss small targets, incur high computational cost, and make it difficult to establish a probability model.
Disclosure of Invention
The invention aims to provide a millimeter wave radar and visual sensor fusion method based on an attention mechanism. The 3D point cloud received by the millimeter wave radar is converted into a 2D plane image consistent with the visual image. At the data layer, the spatial information of the radar is used to determine key detection areas, and the features of those areas are highlighted to form spatial soft attention; this improves the overall detection effect and addresses the poor detection of pedestrians and small objects in traditional data-layer fusion. At the feature layer, a channel attention learning method is applied along the channel dimension to distribute the weights of the millimeter wave radar and the vision branch, solving the weight-distribution problem of millimeter wave radar and vision fusion.
In order to achieve the purpose, the technical scheme provided by the invention is as follows:
Step S1: scan with the millimeter wave radar to obtain 3D point cloud data, and acquire visual image information with the visual sensor.
Step S2: convert the millimeter wave radar 3D point cloud data onto a 2D vertical plane consistent with the visual image.
Step S3: generate from the millimeter wave radar image a two-dimensional matrix of the same size as the visual image, recorded as the radar two-dimensional matrix (a rasterization sketch follows this list).
Step S4: highlight the key detection areas and the key-area features in the visual image with spatial soft attention.
Step S5: extract the millimeter wave radar features and the visual image features, and cascade them.
Step S6: send the cascaded features into an SE channel attention module for weight learning.
Step S7: perform classification and identification with RetinaNet.
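To make step S3 concrete, the sketch below builds a radar two-dimensional matrix aligned with the visual image from the pixel coordinates produced by the S2 projection (detailed in the next paragraphs). The 800 × 1200 size is taken from the embodiment; storing one value per nearest pixel is an assumption made here for illustration, since the patent does not specify how the returns are written into the matrix.

```python
import numpy as np

def rasterize_radar(pixel_coords, values, height=800, width=1200):
    """Step S3 (sketch): build a radar two-dimensional matrix of the same size as the visual image.

    pixel_coords : M x 2 array of (u, v) image coordinates from the S2 projection
    values       : M values to store (e.g., radar cross-section, or simply 1.0 per return)
    """
    radar_matrix = np.zeros((height, width), dtype=np.float32)
    u = np.round(pixel_coords[:, 0]).astype(int)
    v = np.round(pixel_coords[:, 1]).astype(int)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)   # drop returns outside the visual extent
    radar_matrix[v[inside], u[inside]] = values[inside]
    return radar_matrix

coords = np.array([[640.0, 400.0], [100.5, 30.2], [5000.0, 10.0]])   # last point falls outside the image
N = rasterize_radar(coords, np.ones(len(coords)))
print(N.shape, N.sum())   # (800, 1200) 2.0
```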
Preferably, the specific steps of S2 for converting the millimeter wave radar 3D point cloud onto the 2D vertical plane consistent with the visual image are as follows (a numerical sketch of the projection is given after the list):
S2.1: convert the coordinates in the millimeter wave radar coordinate system into a world coordinate system centered on the camera.
S2.2: convert the coordinates of the world coordinate system into the camera coordinate system.
S2.3: convert the coordinates of the camera coordinate system into the image coordinate system.
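The patent does not tie S2.1-S2.3 to a particular calibration, so the following is only a minimal sketch of the projection chain, assuming a combined radar-to-camera extrinsic transform and a pinhole intrinsic matrix; the matrices T and K below are hypothetical stand-ins for real calibration values (for example, those shipped with nuScenes).

```python
import numpy as np

def project_radar_to_image(points_radar, T_radar_to_cam, K):
    """Project N x 3 radar points (radar frame) onto the image plane (S2.1-S2.3).

    T_radar_to_cam : 4x4 homogeneous transform taking radar coordinates through the
                     camera-centered world frame into the camera frame (S2.1 + S2.2).
    K              : 3x3 camera intrinsic matrix for the camera-to-image step (S2.3).
    Returns pixel coordinates for the points that lie in front of the camera.
    """
    n = points_radar.shape[0]
    homogeneous = np.hstack([points_radar, np.ones((n, 1))])     # N x 4
    cam = (T_radar_to_cam @ homogeneous.T).T[:, :3]              # camera-frame coordinates
    cam = cam[cam[:, 2] > 0]                                      # keep points in front of the lens
    pixels = (K @ cam.T).T                                        # pinhole projection
    return pixels[:, :2] / pixels[:, 2:3]                         # normalize by depth

# Hypothetical calibration values, for illustration only.
K = np.array([[1266.4, 0.0, 816.3],
              [0.0, 1266.4, 491.5],
              [0.0, 0.0, 1.0]])
T = np.array([[0.0, -1.0, 0.0, 0.0],    # radar x (forward) -> camera z, radar y (left) -> -camera x
              [0.0, 0.0, -1.0, 0.0],
              [1.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
radar_points = np.array([[10.0, 0.5, 0.0], [25.0, -2.0, 0.2]])
print(project_radar_to_image(radar_points, T, K))
```

The pixel coordinates obtained this way are what step S3 rasterizes into the radar two-dimensional matrix (see the sketch after the step list above).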
Preferably, in S4 of the present invention, spatial soft attention is used to highlight the key detection areas and their features; the specific steps are as follows (a small numerical example follows the list):
S4.1: take the radar two-dimensional matrix N and the visual image matrix C;
S4.2: determine the key detection areas of the visual image with the millimeter wave radar: perform element-wise (dot) multiplication of the radar two-dimensional matrix N and the visual image matrix C to obtain the matrix H, namely the key detection areas;
S4.3: highlight the features of the key visual detection areas to form spatial soft attention, avoiding the missed detection of small objects such as pedestrians caused by the low resolution of the millimeter wave radar: perform element-wise addition of the matrix H and the image matrix C to obtain the key-detection-area features M, namely M = H + C.
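A minimal numerical sketch of S4.2 and S4.3, assuming for brevity that N is a binary radar occupancy matrix and C is a single-channel visual matrix of the same size (the actual channel counts are given in the embodiment):

```python
import numpy as np

# Radar two-dimensional matrix N: 1 where the radar reports a return, 0 elsewhere (illustrative values).
N = np.array([[0, 1, 0],
              [0, 1, 1],
              [0, 0, 0]])

# Visual image matrix C of the same size (illustrative values).
C = np.array([[5, 7, 2],
              [1, 9, 4],
              [3, 6, 8]])

H = N * C        # S4.2: element-wise product -> key detection area
M = H + C        # S4.3: element-wise addition -> highlighted features, M = H + C
print(H)         # non-zero only where the radar indicated a key area
print(M)         # visual values, amplified inside the key area
```

For a binary N this gives M = (N + 1) * C element-wise, so the key areas are amplified while the rest of the image passes through unchanged rather than being masked out.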
The method of the invention uses the radar to generate spatial soft attention, applying a spatial attention mechanism across multiple modalities rather than within a single image.
According to the method, the weights of the millimeter wave radar and the vision branch are reasonably distributed on the fused feature channels through channel attention weight learning, which resolves the difficulty of distributing the radar and vision weights; channel attention, normally used inside an image network, is here applied to the weight assignment of multi-modal fusion (a sketch of such a block is given below).
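For concreteness, here is a sketch of a standard squeeze-and-excitation (SE) channel attention block applied to the concatenated radar and visual feature maps, as steps S5 and S6 describe. It is written against PyTorch as an assumption (the patent names no framework), and the channel counts and reduction ratio are illustrative, not values fixed by the patent.

```python
import torch
import torch.nn as nn

class SEFusion(nn.Module):
    """Cascade radar and visual features (S5) and reweight the channels with SE attention (S6)."""

    def __init__(self, radar_channels, visual_channels, reduction=16):
        super().__init__()
        channels = radar_channels + visual_channels
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                        # squeeze: one value per channel
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                                   # excitation: per-channel weights in (0, 1)
        )

    def forward(self, radar_feat, visual_feat):
        fused = torch.cat([radar_feat, visual_feat], dim=1)   # S5: cascade along the channel axis
        return fused * self.se(fused)                         # S6: learned radar/vision channel weighting

# Hypothetical feature-map shapes, for illustration only.
fusion = SEFusion(radar_channels=64, visual_channels=64)
out = fusion(torch.rand(1, 64, 45, 80), torch.rand(1, 64, 45, 80))
print(out.shape)   # torch.Size([1, 128, 45, 80])
```

The sigmoid output serves as the learned weight of each radar or visual channel, so channels from the less informative modality in a given scene can be suppressed without hand-tuned fusion weights.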
At the data layer and the feature layer respectively, the method disclosed by the invention fuses the spatial information and the feature information of the millimeter wave radar and the vision branch from the spatial and channel perspectives, improving both detection precision and recall.
The invention has the beneficial effects that:
Compared with the prior art, the method highlights the key visual detection areas with spatial soft attention, which improves the overall detection precision and recall and alleviates the missed detection of pedestrians and small objects in traditional data-level fusion, and it solves the distribution of millimeter wave radar and vision weights through channel attention weight learning.
Drawings
FIG. 1 is a general flow chart of the present invention;
FIG. 2 is a detailed flow chart of spatial soft attention;
FIG. 3 is a graph showing the effect of the test according to the embodiment.
Detailed Description
The present invention is further described in detail below with reference to specific examples, but the scope of the present invention is not limited to these examples.
A millimeter wave radar and vision sensor fusion method based on an attention mechanism comprises the following specific steps:
Step S1: download the nuScenes data set, and read the forward millimeter wave radar data and the forward visual image key-frame information from the data set.
Step S2: convert the millimeter wave radar 3D point cloud data onto a 2D vertical plane consistent with the visual image.
S2.1: convert the coordinates in the millimeter wave radar coordinate system into a world coordinate system centered on the camera;
S2.2: convert the coordinates of the world coordinate system into the camera coordinate system;
S2.3: convert the coordinates of the camera coordinate system into the image coordinate system.
Step S3: generate from the millimeter wave radar image a two-dimensional matrix of the same size as the visual image;
S3.1: because the millimeter wave radar detection area is large, remove the part of the radar image that exceeds the extent of the visual image, and fix both the radar image and the visual image at 800 × 1200.
S3.2: scale the radar image and the visual image to 360 × 640 and send them into a modified VGG16;
Step S4: highlight the key detection areas and the key-area features in the visual image with spatial soft attention.
S4.1: extract multi-scale information from the millimeter wave radar image with 3 × 3 and 5 × 5 convolution kernels, generating a single-channel matrix of size 360 × 640, denoted N.
S4.2: extract the visual image information with two 3 × 3 convolution kernels, generating a 3-channel matrix of size 360 × 640, denoted C.
S4.3: determine the key detection areas of the visual image with the millimeter wave radar: perform element-wise (dot) multiplication of the radar two-dimensional matrix N with each channel of the visual image matrix C to obtain the matrix H, namely the key detection areas;
S4.4: highlight the features of the key visual detection areas to form spatial soft attention and avoid missing small objects such as pedestrians owing to the low resolution of the millimeter wave radar: perform element-wise addition of the matrix H and the image matrix C to obtain the key-detection-area features M, namely M = H + C. One possible realization of S4.1 to S4.4 is sketched below.
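A sketch of how S4.1 to S4.4 of the embodiment could be realized as a module. The patent fixes the kernel sizes (3 × 3 and 5 × 5 for the radar branch, two 3 × 3 for the visual branch) and the 360 × 640 single-channel and 3-channel outputs; the way the two radar scales are merged, the padding, and the activation are assumptions made here for illustration.

```python
import torch
import torch.nn as nn

class SpatialSoftAttention(nn.Module):
    """Embodiment S4.1-S4.4: radar-guided spatial soft attention at 360 x 640."""

    def __init__(self):
        super().__init__()
        # S4.1: multi-scale radar branch, producing the single-channel matrix N.
        self.radar_3x3 = nn.Conv2d(1, 1, kernel_size=3, padding=1)
        self.radar_5x5 = nn.Conv2d(1, 1, kernel_size=5, padding=2)
        # S4.2: two 3 x 3 convolutions on the visual image, producing the 3-channel matrix C.
        self.visual = nn.Sequential(
            nn.Conv2d(3, 3, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(3, 3, kernel_size=3, padding=1),
        )

    def forward(self, radar_img, visual_img):
        n = self.radar_3x3(radar_img) + self.radar_5x5(radar_img)  # merge the two scales (assumed additive)
        c = self.visual(visual_img)
        h = n * c            # S4.3: key detection area H = N (element-wise) C, broadcast over channels
        return h + c         # S4.4: highlighted features M = H + C

module = SpatialSoftAttention()
m = module(torch.rand(1, 1, 360, 640), torch.rand(1, 3, 360, 640))
print(m.shape)   # torch.Size([1, 3, 360, 640])
```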
Step S5: extract the millimeter wave radar features and the visual image features, and cascade them.
Step S6: send the cascaded features into an SE channel attention module for weight learning.
Step S7: perform classification and identification with RetinaNet.
Effects of the embodiment: trained and tested on the nuScenes dataset, the invention significantly improves the mean average precision (mAP) and the mean average recall (mAR) of detection. mAP improves by 2.6% and mAR by 13.2% in sunny scenes; mAP by 2.1% and mAR by 3.7% in rainy scenes; and mAP by 0.6% and mAR by 3.7% in night scenes. The test effect is shown in FIG. 3.
While the present invention has been described in detail with reference to the embodiments shown in the drawings, the present invention is not limited to these embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the spirit of the present invention.

Claims (1)

1. A method for fusing a millimeter wave radar and a vision sensor based on an attention mechanism is characterized by comprising the following steps:
step S1: scanning by a millimeter wave radar to obtain 3D point cloud data, and acquiring visual image information by a visual sensor;
step S2: converting the millimeter wave radar 3D point cloud data into a 2D vertical plane consistent with the visual image;
step S3: generating, from the millimeter wave radar image, a two-dimensional matrix of the same size as the visual image, and recording it as the radar two-dimensional matrix;
step S4: highlighting the key detection areas and the key-area features in the visual image with spatial soft attention;
step S5: extracting millimeter wave radar features and visual image features, and cascading the millimeter wave radar features and the visual image features;
step S6: sending the cascaded features into an SE channel attention module for weight learning;
step S7: classifying and identifying by RetinaNet;
the S2 is used for converting the millimeter wave radar 3D point cloud into a 2D vertical plane consistent with the visual image, and the specific steps are as follows:
s2.1: converting the coordinates under the millimeter wave radar coordinate system into a world coordinate system taking the camera as the center;
s2.2: converting the coordinates of the world coordinate system to a camera coordinate system;
s2.3: converting the coordinates of the camera coordinate system to an image coordinate system;
in S4, spatial soft attention is used to highlight the key detection areas and their features through the following specific steps:
S4.1: taking the radar two-dimensional matrix N and the visual image matrix C;
S4.2: determining the key detection areas of the visual image with the millimeter wave radar: performing element-wise (dot) multiplication of the radar two-dimensional matrix N and the visual image matrix C to obtain the matrix H, namely the key detection areas;
S4.3: highlighting the features of the key visual detection areas to form spatial soft attention: performing element-wise addition of the matrix H and the image matrix C to obtain the key-detection-area features M, namely M = H + C.
CN202110282139.XA 2021-03-16 2021-03-16 Attention mechanism-based millimeter wave radar and visual sensor fusion method Active CN112801928B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110282139.XA CN112801928B (en) 2021-03-16 2021-03-16 Attention mechanism-based millimeter wave radar and visual sensor fusion method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110282139.XA CN112801928B (en) 2021-03-16 2021-03-16 Attention mechanism-based millimeter wave radar and visual sensor fusion method

Publications (2)

Publication Number Publication Date
CN112801928A CN112801928A (en) 2021-05-14
CN112801928B true CN112801928B (en) 2022-11-29

Family

ID=75816995

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110282139.XA Active CN112801928B (en) 2021-03-16 2021-03-16 Attention mechanism-based millimeter wave radar and visual sensor fusion method

Country Status (1)

Country Link
CN (1) CN112801928B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114708585B (en) * 2022-04-15 2023-10-10 电子科技大学 Attention mechanism-based millimeter wave radar and vision fusion three-dimensional target detection method
CN115273460A (en) * 2022-06-28 2022-11-01 重庆长安汽车股份有限公司 Multi-mode perception fusion vehicle lane change prediction method, computer equipment and storage medium

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109443369A (en) * 2018-08-20 2019-03-08 北京主线科技有限公司 The method for constructing sound state grating map using laser radar and visual sensor
CN110135485A (en) * 2019-05-05 2019-08-16 浙江大学 The object identification and localization method and system that monocular camera is merged with millimetre-wave radar
US11361470B2 (en) * 2019-05-09 2022-06-14 Sri International Semantically-aware image-based visual localization
CN110390695B (en) * 2019-06-28 2023-05-23 东南大学 Laser radar and camera fusion calibration system and calibration method based on ROS
CN110363158B (en) * 2019-07-17 2021-05-25 浙江大学 Millimeter wave radar and visual cooperative target detection and identification method based on neural network
CN110456811A (en) * 2019-08-22 2019-11-15 台州学院 Unmanned plane selectivity obstacle avoidance system and method based on binocular vision and three axis holders
CN111060904B (en) * 2019-12-25 2022-03-15 中国汽车技术研究中心有限公司 Blind area monitoring method based on millimeter wave and vision fusion perception
CN111950467B (en) * 2020-08-14 2021-06-25 清华大学 Fusion network lane line detection method based on attention mechanism and terminal equipment
CN112200750B (en) * 2020-10-21 2022-08-05 华中科技大学 Ultrasonic image denoising model establishing method and ultrasonic image denoising method
CN112215306B (en) * 2020-11-18 2023-03-31 同济大学 Target detection method based on fusion of monocular vision and millimeter wave radar
CN112419155B (en) * 2020-11-26 2022-04-15 武汉大学 Super-resolution reconstruction method for fully-polarized synthetic aperture radar image

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104574376A (en) * 2014-12-24 2015-04-29 重庆大学 Anti-collision method based on joint verification of binocular vision and laser radar in congested traffic
CN106908783A (en) * 2017-02-23 2017-06-30 苏州大学 Obstacle detection method based on multi-sensor information fusion
CN111242207A (en) * 2020-01-08 2020-06-05 天津大学 Three-dimensional model classification and retrieval method based on visual saliency information sharing
CN111797717A (en) * 2020-06-17 2020-10-20 电子科技大学 High-speed high-precision SAR image ship detection method
CN111965636A (en) * 2020-07-20 2020-11-20 重庆大学 Night target detection method based on millimeter wave radar and vision fusion

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
VisDrone-DET2019: The Vision Meets Drone Object Detection in Image Challenge Results; Dawei Du et al.; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV); 2019-12-31; 1-14 *
Research on SAR image target detection technology based on convolutional neural networks; 陈诗琪; China Excellent Doctoral and Master's Dissertations Full-text Database (Master's), Information Science and Technology; 2021-01-15 (No. 01); I136-1044 *
Research on ship target detection technology for spaceborne SAR images at the sea-land boundary; 李滕; China Excellent Master's Theses Full-text Database, Engineering Science and Technology II; 2020-02-15 (No. 02); C036-224 *

Also Published As

Publication number Publication date
CN112801928A (en) 2021-05-14

Similar Documents

Publication Publication Date Title
Sharma et al. YOLOrs: Object detection in multimodal remote sensing imagery
CN110675418B (en) Target track optimization method based on DS evidence theory
CN113052835B (en) Medicine box detection method and system based on three-dimensional point cloud and image data fusion
CN114724120B (en) Vehicle target detection method and system based on radar vision semantic segmentation adaptive fusion
CN114092780B (en) Three-dimensional target detection method based on fusion of point cloud and image data
CN111797716A (en) Single target tracking method based on Siamese network
Dai et al. Multi-task faster R-CNN for nighttime pedestrian detection and distance estimation
CN111723693B (en) Crowd counting method based on small sample learning
CN112801928B (en) Attention mechanism-based millimeter wave radar and visual sensor fusion method
Wang et al. An advanced YOLOv3 method for small-scale road object detection
CN112215296B (en) Infrared image recognition method based on transfer learning and storage medium
Li et al. Bifnet: Bidirectional fusion network for road segmentation
CN113408584B (en) RGB-D multi-modal feature fusion 3D target detection method
EP4174792A1 (en) Method for scene understanding and semantic analysis of objects
Wang et al. Radar ghost target detection via multimodal transformers
CN111914615A (en) Fire-fighting area passability analysis system based on stereoscopic vision
CN113762009A (en) Crowd counting method based on multi-scale feature fusion and double-attention machine mechanism
TW202225730A (en) High-efficiency LiDAR object detection method based on deep learning through direct processing of 3D point data to obtain a concise and fast 3D feature to solve the shortcomings of complexity and time-consuming of the current voxel network model
CN117115555A (en) Semi-supervised three-dimensional target detection method based on noise data
Feng Mask RCNN-based single shot multibox detector for gesture recognition in physical education
Gu et al. Radar-enhanced image fusion-based object detection for autonomous driving
Hu et al. DMFFNet: Dual-mode multi-scale feature fusion-based pedestrian detection method
He et al. Automatic detection and mapping of solar photovoltaic arrays with deep convolutional neural networks in high resolution satellite images
Ma et al. LGNet: Local and global point dependency network for 3D object detection
CN110738123A (en) Method and device for identifying densely displayed commodities

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant