CN116129234A - Attention-based 4D millimeter wave radar and vision fusion method - Google Patents

Info

Publication number
CN116129234A
Authority
CN
China
Prior art keywords
features
image
feature
bev
radar
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310237553.8A
Other languages
Chinese (zh)
Inventor
彭树生
刁天涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Science and Technology
Original Assignee
Nanjing University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Science and Technology
Priority to CN202310237553.8A
Publication of CN116129234A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

The invention discloses an attention-based 4D millimeter wave radar and vision fusion method, which comprises the following steps: extracting radar backbone features from the 4D millimeter wave radar point cloud data in voxel format to obtain the radar features in BEV space; extracting backbone features from the image data to obtain the 2D features of the image; projecting the 2D image features through a view projection module, densely predicting depth by classification, and obtaining the image features in BEV space from the predicted image depth and the camera extrinsic parameters; and finally, fusing the millimeter wave radar and visual features at the feature level through a designed fusion module, using an attention mechanism to reasonably allocate the weights of the radar and visual features. The invention solves the problems that 4D millimeter wave radar and vision depend on each other and that their weights are difficult to allocate.

Description

Attention-based 4D millimeter wave radar and vision fusion method
Technical Field
The invention relates to the technical field of radar and vision fusion, and in particular to an attention-based 4D millimeter wave radar and vision fusion method.
Background
Millimeter wave radar and computer vision technology are widely used in fields such as autonomous driving, security, and intelligent transportation. Millimeter wave radar has strong penetrating power and is largely unaffected by illumination or by weather such as rain and snow, but it cannot provide high-precision target recognition and tracking information. In contrast, computer vision provides richer target information but is more strongly affected by illumination, weather, and similar factors.
In general, radar and vision fusion strategies fall into three categories: decision-level fusion (commonly called post-fusion), feature-level fusion (mid-level fusion), and data-level fusion (pre-fusion). Decision-level fusion combines the final output of the radar-based model, such as 3D bounding boxes, with the output of the visual detector, such as 2D bounding boxes, through a filtering algorithm. Feature-level fusion projects the final result of one modality onto the deep-learning feature maps of the other modality and then performs information fusion with a subsequent fusion network. Data-level fusion directly fuses the raw data of the two modalities and then outputs the final result with a single neural network.
Each fusion strategy has advantages and disadvantages, but post-fusion is the one commonly used in industry because it is flexible and more robust: the outputs of the different modalities are integrated through manually designed algorithms and rules, and different modalities can be given different priorities under different conditions, so the failure of a single sensor is handled more gracefully. However, post-fusion also has many drawbacks: the information is not fully exploited, the system pipeline becomes more complex (and the longer the pipeline, the more easily problems arise), and the maintenance cost grows as rules accumulate. Academia favors pre-fusion schemes, which better exploit the end-to-end nature of neural networks. However, current pre-fusion schemes can hardly be deployed directly, because their robustness is not yet considered sufficient for practical requirements; in particular, when the radar signal is faulty, they can barely cope.
In a practical environment, the following problems are encountered:
(1) Inaccurate radar and camera extrinsic parameters: calibration errors, or bumping and shaking while the vehicle is driving, make the extrinsic parameters inaccurate, so directly projecting the point cloud onto the image produces deviations.
(2) Camera noise: lens smudges, frozen frames, or even damage to a camera mean that a point projected onto the image cannot find the corresponding feature or picks up the wrong feature.
(3) Radar noise: besides the occlusion caused by dirt, the characteristics of the radar itself cause return points to be missing for some low-reflectivity objects.
Some methods, such as deep fusion, already provide some tolerance to problems (1) and (2), but they are ineffective against the missing point clouds caused by the radar noise of problem (3): because these methods all query image features through point cloud coordinates, they fail as soon as the point cloud is missing.
Disclosure of Invention
In order to solve the above problems in the prior art, the invention aims to provide an attention-based 4D millimeter wave radar and vision fusion perception method.
The invention is implemented in the following steps. An attention-based 4D millimeter wave radar and vision fusion method comprises the following steps:
Step 1: radar feature extraction: extract radar backbone features from the 4D millimeter wave radar data in voxel format to obtain the BEV features of the radar data;
Step 2: image feature extraction: extract backbone features from the image data to obtain the 2D features of the image, and project each image feature pixel back into 3D space to form the BEV features of the image;
Step 3: feature fusion: introduce an attention mechanism and perform attention-encoded fusion of the radar and visual features obtained in step 1 and step 2 in the BEV space to obtain comprehensive target information;
Step 4: target detection: perform target detection using the fused BEV feature information.
Preferably, in step 1, radar backbone features are extracted from the 4D millimeter wave radar data in voxel format, and the step of obtaining the radar feature data comprises:
selecting voxels as the input of the point cloud BEV feature extraction network, where the BEV feature extraction network takes VoxelNet as the backbone and adds a feature pyramid network. The backbone divides the three-dimensional point cloud into a certain number of voxels and, after random sampling and normalization of the points, performs local feature extraction on each non-empty voxel with several voxel feature encoding layers to obtain voxel-level features; a 3D convolution module then further abstracts the features (enlarging the receptive field and learning a geometric spatial representation) to obtain the BEV features of the point cloud. The feature pyramid network further refines the BEV features: a bottom-up path extracts features from the BEV features and a top-down path combines and refines them, producing feature maps at different resolutions that all contain the semantic information of the original deepest feature map.
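For illustration only, the following is a minimal sketch of one voxel feature encoding layer of the kind referred to above, assuming the point cloud has already been grouped into voxels padded to a fixed number of points per voxel; the tensor shapes, layer sizes, and names are assumptions, not part of the claimed method.

```python
# Minimal sketch of one voxel feature encoding (VFE) layer, assuming voxels are
# already grouped into a dense tensor `points` of shape
# (num_voxels, max_points_per_voxel, point_dim) with a boolean `mask` marking
# the valid (non-padded) points in each voxel.
import torch
import torch.nn as nn


class VFELayer(nn.Module):
    """Point-wise MLP followed by per-voxel max pooling; the pooled voxel
    feature is concatenated back onto every point feature."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim // 2)
        self.norm = nn.LayerNorm(out_dim // 2)

    def forward(self, points: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # points: (V, T, C_in), mask: (V, T) with True for real points
        x = torch.relu(self.norm(self.linear(points)))       # (V, T, C/2)
        x = x * mask.unsqueeze(-1)                            # zero out padded points
        voxel_feat = x.max(dim=1, keepdim=True).values        # (V, 1, C/2) per-voxel summary
        x = torch.cat([x, voxel_feat.expand_as(x)], dim=-1)   # (V, T, C)
        return x


if __name__ == "__main__":
    pts = torch.randn(128, 32, 7)        # 128 voxels, up to 32 points, 7 features per point
    msk = torch.rand(128, 32) > 0.3
    feat = VFELayer(7, 64)(pts, msk)
    print(feat.shape)                    # torch.Size([128, 32, 64])
```

Stacking several such layers and max-pooling over the point dimension yields one feature vector per voxel, which the 3D convolution module can then abstract further.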
Preferably, in step 2, the specific step of extracting the image backbone features of the image data to obtain the 2D features of the image comprises:
taking a Swin Transformer as the backbone network and adding a feature pyramid network layer to obtain the 2D features of the image, then explicitly estimating the depth information of the image through a view projection module to complete the construction of the BEV view of the image. The backbone model adopts a hierarchical structure with four stages in total; each stage reduces the resolution of the input feature map and gradually enlarges the receptive field. At the input, patch embedding divides the image into several small patches and embeds them; each stage then consists of two parts, a patch merging layer and a Swin Transformer block.
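As an illustrative sketch only, the following shows the hierarchical downsampling bookkeeping described above (patch embedding at the input, then a patch merging step that halves the spatial resolution and doubles the channel width); the window attention blocks are omitted, and all sizes are assumed values rather than the patent's.

```python
# Minimal sketch of patch embedding and patch merging in a Swin-style backbone.
import torch
import torch.nn as nn


class PatchEmbed(nn.Module):
    def __init__(self, in_ch=3, embed_dim=96, patch=4):
        super().__init__()
        # Non-overlapping patches: a strided conv is equivalent to splitting the
        # image into patch x patch blocks and linearly projecting them.
        self.proj = nn.Conv2d(in_ch, embed_dim, kernel_size=patch, stride=patch)

    def forward(self, x):                        # (B, 3, H, W)
        return self.proj(x)                      # (B, C, H/4, W/4)


class PatchMerging(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.reduction = nn.Linear(4 * dim, 2 * dim, bias=False)

    def forward(self, x):                        # (B, C, H, W) with even H, W
        # Gather each 2x2 neighbourhood onto the channel axis, then project.
        x0, x1 = x[:, :, 0::2, 0::2], x[:, :, 1::2, 0::2]
        x2, x3 = x[:, :, 0::2, 1::2], x[:, :, 1::2, 1::2]
        x = torch.cat([x0, x1, x2, x3], dim=1)   # (B, 4C, H/2, W/2)
        x = x.permute(0, 2, 3, 1)                # channels-last for the linear layer
        x = self.reduction(x)                    # (B, H/2, W/2, 2C)
        return x.permute(0, 3, 1, 2)             # (B, 2C, H/2, W/2)


if __name__ == "__main__":
    img = torch.randn(1, 3, 224, 224)
    f = PatchEmbed()(img)                        # (1, 96, 56, 56)
    f = PatchMerging(96)(f)                      # (1, 192, 28, 28)
    print(f.shape)
```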
Preferably, each image feature pixel is projected back into 3D space, and the specific process of constructing the BEV features of the image is:
a discrete set of depth values is generated for each pixel of the image. The depth values are generated as follows: within the viewing frustum from 1 m to 60 m in front of the camera, a candidate depth value is placed every 1 m, and N points are sampled along this ray. The depth information of each feature point is predicted and represented, after a softmax, as a D-dimensional vector, where D is the number of candidate depths spaced 1 m apart within the 1 m to 60 m range. The depth information obtained for each pixel is used to weight the image features at the same position, generating a pseudo point cloud shaped like a truncated pyramid. The camera extrinsic and intrinsic parameters are used to transform the previously obtained frustum features into coordinates in 3D space, after which the features are flattened. The specific process is: the range of the BEV view defines the size of each grid cell, and the features projected into a given cell are aggregated into that cell. Since multiple features may fall into the same cell in the top-down view, the image point cloud is quantized along the x and y dimensions with a fixed step size, the features in each BEV cell are aggregated by a BEV pooling operation, and the features are unfolded along the z-axis.
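A minimal sketch of the lifting step described above follows, under the assumption that the per-pixel depth distribution is predicted by a small convolutional head and combined with the image feature by an outer product; the layer names, channel counts, and the bin count used in the example are illustrative.

```python
# Minimal sketch of "lifting" image features over D candidate depths: a head
# predicts a softmax depth distribution, and the outer product of that
# distribution with the image feature produces the frustum-shaped pseudo
# point cloud of features.
import torch
import torch.nn as nn


class DepthLift(nn.Module):
    def __init__(self, in_ch: int, feat_ch: int, num_depth_bins: int):
        super().__init__()
        self.depth_head = nn.Conv2d(in_ch, num_depth_bins, kernel_size=1)
        self.feat_head = nn.Conv2d(in_ch, feat_ch, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C_in, H, W) image features from the backbone + FPN
        depth = self.depth_head(x).softmax(dim=1)          # (B, D, H, W) depth probabilities
        feat = self.feat_head(x)                           # (B, C, H, W)
        # Outer product: each feature vector is spread over the D depth bins,
        # weighted by the predicted probability of that depth.
        frustum = depth.unsqueeze(1) * feat.unsqueeze(2)   # (B, C, D, H, W)
        return frustum


if __name__ == "__main__":
    lift = DepthLift(in_ch=256, feat_ch=80, num_depth_bins=61)  # 61 bins as in the embodiment
    out = lift(torch.randn(2, 256, 32, 88))
    print(out.shape)    # torch.Size([2, 80, 61, 32, 88])
```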
Preferably, in step 3, the radar and image features are fused, and the fusion step comprises:
spatially, first concatenating the radar and image features along the channel dimension, applying global max pooling and global average pooling respectively, passing each result through a 3×3 convolution kernel, and then applying a sigmoid activation to generate the final spatial attention map, which weights every pixel of the radar and image BEV features;
along the channel dimension, concatenating the radar features and the spatially attended image features, obtaining the channel weights through average pooling, a 3×3 convolution kernel and a sigmoid operation, and multiplying the concatenated features by these weights to obtain the final fused features.
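The following is a minimal sketch of such a spatial-then-channel attention fusion over the two BEV maps, assuming CBAM-style pooling statistics; the kernel sizes and channel counts are illustrative choices (for instance, a 1×1 convolution is used in the channel branch here), not the patent's exact configuration.

```python
# Minimal sketch of attention-based fusion of radar and camera BEV features:
# a spatial attention map re-weights every BEV cell, then channel attention
# re-weights the concatenated channels.
import torch
import torch.nn as nn


class BEVAttentionFusion(nn.Module):
    def __init__(self, radar_ch: int, img_ch: int):
        super().__init__()
        fused_ch = radar_ch + img_ch
        # Spatial attention: 2 pooled maps -> 1 attention value per BEV cell.
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=3, padding=1)
        # Channel attention: squeeze to one weight per channel.
        self.channel_conv = nn.Conv2d(fused_ch, fused_ch, kernel_size=1)

    def forward(self, radar_bev: torch.Tensor, img_bev: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([radar_bev, img_bev], dim=1)           # (B, Cr+Ci, H, W)

        # ---- spatial attention over the BEV grid ----
        max_map = fused.max(dim=1, keepdim=True).values          # (B, 1, H, W)
        avg_map = fused.mean(dim=1, keepdim=True)                # (B, 1, H, W)
        spatial = torch.sigmoid(self.spatial_conv(torch.cat([max_map, avg_map], dim=1)))
        radar_bev = radar_bev * spatial                          # weight every BEV pixel
        img_bev = img_bev * spatial

        # ---- channel attention over the re-weighted, concatenated features ----
        fused = torch.cat([radar_bev, img_bev], dim=1)
        squeeze = fused.mean(dim=(2, 3), keepdim=True)           # global average pooling
        channel = torch.sigmoid(self.channel_conv(squeeze))      # (B, Cr+Ci, 1, 1)
        return fused * channel


if __name__ == "__main__":
    fuse = BEVAttentionFusion(radar_ch=128, img_ch=80)
    out = fuse(torch.randn(2, 128, 180, 180), torch.randn(2, 80, 180, 180))
    print(out.shape)    # torch.Size([2, 208, 180, 180])
```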
Compared with the prior art, the invention has the notable advantage that it unifies the multi-modal features in a shared bird's-eye-view (BEV) representation space, which preserves geometric and semantic information well and solves the problems of mutual dependence and difficult weight allocation when fusing 4D millimeter wave radar and vision.
Drawings
FIG. 1 is a flow chart of the present invention.
Fig. 2 is a schematic diagram of a picture feature encoding module according to the present invention.
Fig. 3 is a schematic diagram of a feature fusion module of the present invention.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings. The following examples will help those skilled in the art to further understand the invention, but are not intended to limit it in any way. It should be noted that variations and modifications can be made by those skilled in the art without departing from the inventive concept; these all fall within the scope of the invention.
As shown in Fig. 1, an attention-based 4D millimeter wave radar and vision fusion method specifically comprises the following steps:
step 1: and (3) radar feature extraction: extracting radar trunks of the 4D millimeter wave radar data by adopting a voxel format to obtain BEV characteristics of the radar data;
when the main features of the radar are extracted, the point cloud of the millimeter wave radar is sparse, and the feature extraction method of dense point cloud is difficult to directly use. In addition, most point-level feature extraction methods only can be used for fusing the features of local information, and the relevance between the local information and the whole information is not strong. Therefore, in the field of autopilot, point-level features are not directly used for 3D object detection tasks. For the 4D millimeter wave radar, the point cloud is also a three-dimensional point cloud, so the feature extraction mode of the laser radar point cloud is also applicable to the 4D millimeter wave radar, and voxel-based feature expression is selected: voxel. The network design is mainly based on VoxelNet and is added with an FPN (feature pyramid network). The method comprises the steps that a backbone network divides a three-dimensional point cloud into a certain number of voxels, after random sampling and normalization of points, local feature extraction is carried out on each non-empty voxel by using a plurality of voxel feature coding layers to obtain voxel-level features, and then the features are further abstracted (the receptive field is increased and geometric space representation is learned) through a 3D convolution module to obtain BEV features of the point cloud; further refinement of BEV features via the feature pyramid network results in feature maps of different resolutions that all contain semantic information of the original deepest feature map by extracting features from BEV features using a bottom-up path and combining and refining the features using a top-down path.
Through practical tests, the radar point characteristics can be effectively extracted through the method.
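For illustration, a minimal sketch of the top-down feature pyramid refinement described above is given below; the number of pyramid levels and the channel counts are assumptions, and the merge uses simple nearest-neighbour upsampling plus addition.

```python
# Minimal sketch of an FPN over multi-resolution BEV maps: lateral 1x1 convs,
# a top-down path that upsamples the deeper map and adds it in, and a 3x3
# smoothing conv per output level.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BEVFPN(nn.Module):
    def __init__(self, in_channels=(64, 128, 256), out_ch=128):
        super().__init__()
        self.lateral = nn.ModuleList([nn.Conv2d(c, out_ch, 1) for c in in_channels])
        self.smooth = nn.ModuleList([nn.Conv2d(out_ch, out_ch, 3, padding=1) for _ in in_channels])

    def forward(self, feats):
        # feats: list of BEV maps from shallow (high-res) to deep (low-res)
        laterals = [l(f) for l, f in zip(self.lateral, feats)]
        # Top-down: start from the deepest map and merge it into shallower ones,
        # so every level inherits the semantics of the deepest feature map.
        for i in range(len(laterals) - 1, 0, -1):
            laterals[i - 1] = laterals[i - 1] + F.interpolate(
                laterals[i], size=laterals[i - 1].shape[-2:], mode="nearest")
        return [s(x) for s, x in zip(self.smooth, laterals)]


if __name__ == "__main__":
    maps = [torch.randn(1, 64, 200, 200), torch.randn(1, 128, 100, 100), torch.randn(1, 256, 50, 50)]
    outs = BEVFPN()(maps)
    print([o.shape for o in outs])
```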
Step 2: extracting image features: extracting main features of the image data to obtain 2D features of the image, and projecting each picture feature pixel back into a 3D space through projection to form BEV features of the image;
when the main features of the image are extracted, the calculated amount of the high-resolution and pixel-rich image, whether the image is a transducer or a CNN (convolutional neural network), is large, and the later-stage calculation force requirement is high. Thus, a Swin transducer with hierarchical design is used, including sliding window operations. The network design is mainly based on a Swin Transformer and an FPN layer is added, and the 2d characteristic of the image can be effectively extracted. In order to obtain the characteristics of the image in the BEV space, a view projection module is designed, and as shown in fig. 2, the module explicitly estimates the depth information of the image, and the construction of the BEV view angle of the image is completed. For the obtained 2D image features, each feature pixel is dispersed to D discrete points along the camera ray, the depth probability distribution of each pixel in the image is predicted, relevant features are scaled according to the corresponding depth probability, an image feature point cloud is obtained, then the image feature point cloud is quantized along the x and y dimensions by using a fixed step size, features are aggregated in each BEV grid by using BEV pooling operation, and then the features are unfolded along the z axis.
The whole model of the backbone network adopts a layered structure, and has 4 stages in total, the resolution of the input feature map can be reduced in each stage, the receptive field is gradually enlarged, and the effect of reducing the calculated amount is achieved. At the beginning of the input, patch editing is performed, the image is divided into several small blocks and embedded in editing. Each stage contains two parts, namely a Patch merge (except that the first block is a linear layer) and a Swin transducer module; for the view projection module, a discrete set of depth values is first generated for each pixel of the image, so that the network can select the appropriate depth itself during model training. The method for generating depth values for pixels is that in a view cone 1 m to 60 m from a camera, there is an optional depth value every 1 m (thus 61 optional discrete depth values for each pixel), so N points can be sampled on this straight line, then the network needs to predict the depth information (distribution over depth) of this feature point, and the depth information is represented by a D-dimensional vector through softmax, D represents a distance in the range of 1 m to 60 m, that is, d=61, so that each position on D represents a probability value of the pixel in this depth range. By defining the range of BEV viewing angles, the size of each grid is defined, and the features projected to the corresponding grid are summarized into one grid. There may be multiple features in the same grid in the top view, the image point cloud is quantized along the x, y dimensions using a fixed step size, the features are aggregated in each BEV grid using a BEV pooling operation, and the features are expanded along the z-axis.
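A minimal sketch of the BEV pooling step described above follows, under the assumption that the frustum feature points have already been transformed into ego-frame coordinates with the camera intrinsics and extrinsics; the grid extents and step size are illustrative, and sum aggregation is used as the scatter operation.

```python
# Minimal sketch of BEV pooling: frustum feature points with ego-frame (x, y)
# coordinates are quantized into a fixed grid, and all features falling into
# the same BEV cell are accumulated with a scatter-add.
import torch


def bev_pool(points_xy: torch.Tensor, feats: torch.Tensor,
             x_range=(-51.2, 51.2), y_range=(-51.2, 51.2), step=0.8) -> torch.Tensor:
    """points_xy: (N, 2) ego-frame coordinates, feats: (N, C)."""
    nx = int((x_range[1] - x_range[0]) / step)
    ny = int((y_range[1] - y_range[0]) / step)

    ix = torch.floor((points_xy[:, 0] - x_range[0]) / step).long()
    iy = torch.floor((points_xy[:, 1] - y_range[0]) / step).long()
    keep = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)   # drop points outside the BEV range
    ix, iy, feats = ix[keep], iy[keep], feats[keep]

    grid = feats.new_zeros(nx * ny, feats.shape[1])
    grid.index_add_(0, ix * ny + iy, feats)                # aggregate per cell
    return grid.view(nx, ny, -1).permute(2, 0, 1)          # (C, nx, ny) BEV feature map


if __name__ == "__main__":
    pts = torch.rand(10000, 2) * 100 - 50
    f = torch.randn(10000, 80)
    print(bev_pool(pts, f).shape)    # torch.Size([80, 128, 128])
```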
Step 3: feature fusion: through introducing an attention mechanism, performing attention coding fusion on the radar and visual data obtained in the step 1 and the step 2 in the BEV space to obtain comprehensive target information;
when feature fusion is performed, as shown in fig. 3, in order to more effectively extract important features from the point cloud features and the image features, the thought of spatial attention is applied, the feature mapping and the correlation degree of the target detection task can be selectively combined to generate an attention mapping, the relative importance of the image and the point cloud data is reflected, and more important point cloud features and image features can be extracted according to the attention mapping. To effectively fuse the BEV features of the camera and radar, a channel attention extraction approach is used. For two characteristics of different channel numbers, the two characteristics are overlapped and connected, and are fused with the learnable static weights, and important channels are selected in a channel attention extraction mode, so that more important fusion characteristics can be obtained.
Step 4: and (3) target detection: and performing target detection by using the fused BEV characteristic information.
And finally, inputting the fusion characteristic into a detection head based on a transducer to obtain a final target detection result.
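Purely as an illustration of one possible Transformer-based detection head on the fused BEV features, the sketch below uses a small set of learnable object queries that cross-attend to the flattened BEV map and are decoded into class scores and box parameters; the query count, decoder depth, and box parameterization are assumptions and do not come from the patent.

```python
# Minimal sketch of a query-based detection head over a fused BEV feature map.
import torch
import torch.nn as nn


class BEVTransformerHead(nn.Module):
    def __init__(self, bev_ch=208, d_model=256, num_queries=200, num_classes=10):
        super().__init__()
        self.input_proj = nn.Conv2d(bev_ch, d_model, 1)
        self.queries = nn.Parameter(torch.randn(num_queries, d_model))
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=3)
        self.cls_head = nn.Linear(d_model, num_classes)
        self.box_head = nn.Linear(d_model, 9)   # e.g. (x, y, z, w, l, h, sin, cos, velocity)

    def forward(self, bev: torch.Tensor):
        # bev: (B, C, H, W) fused BEV features
        mem = self.input_proj(bev).flatten(2).transpose(1, 2)     # (B, H*W, d_model)
        q = self.queries.unsqueeze(0).expand(bev.shape[0], -1, -1)
        hs = self.decoder(q, mem)                                 # (B, num_queries, d_model)
        return self.cls_head(hs), self.box_head(hs)


if __name__ == "__main__":
    head = BEVTransformerHead()
    cls, box = head(torch.randn(1, 208, 100, 100))
    print(cls.shape, box.shape)   # (1, 200, 10) (1, 200, 9)
```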
The invention is an attention-based BEV feature-level fusion method: modality-specific backbone networks extract the features of the image and of the 4D millimeter wave radar respectively, the features are converted into a unified BEV representation and fused through an attention mechanism, and the detection head finally outputs the target detection result.
In the invention, the radar point cloud processing and the image processing are carried out independently; each is encoded by a neural network, projected into the unified BEV space, and then fused in that BEV space. In this case there is no primary or secondary dependence between radar and vision, so a flexibility close to that of post-fusion is achieved: a single modality can perform the task on its own, adding further modalities greatly improves performance, and when one modality is missing or noisy the whole system does not produce a destructive result. By introducing an attention mechanism, the method also realizes adaptive fusion of radar and visual data and improves the accuracy of perception tasks (such as target tracking and recognition).
The invention is described in further detail below in connection with specific embodiments.
Example 1
Radar feature extraction: extract radar backbone features from the 4D millimeter wave radar data in voxel format to obtain the BEV features of the radar data.
Image feature extraction: extract the image backbone features from the image data to obtain the 2D features of the image; by predicting the depth probability distribution of each pixel in the image, spread each feature pixel over D discrete points along the camera ray and scale the relevant features by the corresponding depth probabilities to obtain an image feature point cloud; then quantize the image point cloud along the x and y dimensions, aggregate the features in each BEV grid cell with a BEV pooling operation, and unfold the features along the z-axis to obtain the image features in BEV space.
Feature fusion: weight every pixel of the radar and image BEV features spatially, balancing the importance of the radar and image features; then, along the channel dimension, concatenate the extracted radar features and image features, apply channel attention learning to the concatenated channels, and reasonably allocate the weights of the two sensors' features, further improving detection accuracy.
Target detection: feed the fused features into a 3D target detection head to obtain the 3D target detection result.
Parts or structures of the invention that are not specifically described may be existing technologies or existing products and are not described here. The foregoing description is only illustrative of the invention and is not intended to limit its scope; all equivalent structures or equivalent processes, and all direct or indirect applications in other related fields, are included in the scope of the invention.

Claims (5)

1. An attention-based 4D millimeter wave radar and vision fusion method, characterized by comprising the following steps:
step 1: radar feature extraction: extracting radar backbone features from the 4D millimeter wave radar data in voxel format to obtain the BEV features of the radar data;
step 2: image feature extraction: extracting backbone features from the image data to obtain the 2D features of the image, and projecting each image feature pixel back into 3D space to form the BEV features of the image;
step 3: feature fusion: introducing an attention mechanism and performing attention-encoded fusion of the radar and visual features obtained in step 1 and step 2 in the BEV space to obtain comprehensive target information;
step 4: target detection: performing target detection using the fused BEV feature information.
2. The attention-based 4D millimeter wave radar and vision fusion method of claim 1, wherein in step 1 the radar backbone features are extracted from the 4D millimeter wave radar data in voxel format, and the step of obtaining the radar feature data comprises:
selecting voxels as the input of the point cloud BEV feature extraction network, where the BEV feature extraction network takes VoxelNet as the backbone and adds a feature pyramid network; the backbone divides the three-dimensional point cloud into a certain number of voxels and, after random sampling and normalization of the points, performs local feature extraction on each non-empty voxel with several voxel feature encoding layers to obtain voxel-level features; a 3D convolution module then further abstracts the features (enlarging the receptive field and learning a geometric spatial representation) to obtain the BEV features of the point cloud; the feature pyramid network further refines the BEV features, with a bottom-up path extracting features from the BEV features and a top-down path combining and refining them, producing feature maps at different resolutions that all contain the semantic information of the original deepest feature map.
3. The attention-based 4D millimeter wave radar and vision fusion method of claim 1, wherein the specific step of extracting the image backbone features of the image data in step 2 to obtain the 2D features of the image comprises:
taking a Swin Transformer as the backbone network and adding a feature pyramid network layer to obtain the 2D features of the image, then explicitly estimating the depth information of the image through a view projection module to complete the construction of the BEV view of the image, wherein the backbone model adopts a hierarchical structure with four stages in total, each stage reduces the resolution of the input feature map and gradually enlarges the receptive field, and at the input a patch embedding divides the image into several small patches and embeds them; each stage comprises two parts, namely a patch merging layer and a Swin Transformer block.
4. The attention-based 4D millimeter wave radar and vision fusion method of claim 1, wherein each image feature pixel is projected back into 3D space, and the specific process of forming the BEV features of the image is:
a discrete set of depth values is generated for each pixel of the image, the depth values being generated as follows: within the viewing frustum from 1 m to 60 m in front of the camera, a candidate depth value is placed every 1 m, and N points are sampled along this ray; the depth information of each feature point is predicted and represented, after a softmax, as a D-dimensional vector, where D is the number of candidate depths spaced 1 m apart within the 1 m to 60 m range; the depth information obtained for each pixel is used to weight the image features at the same position, generating a pseudo point cloud shaped like a truncated pyramid; the camera extrinsic and intrinsic parameters are used to transform the previously obtained frustum features into coordinates in 3D space, after which the features are flattened, the specific process being: the range of the BEV view defines the size of each grid cell, and the features projected into a given cell are aggregated into that cell; since multiple features may fall into the same cell in the top-down view, the image point cloud is quantized along the x and y dimensions with a fixed step size, the features in each BEV cell are aggregated by a BEV pooling operation, and the features are unfolded along the z-axis.
5. The attention-based 4D millimeter wave radar and vision fusion method of claim 1, wherein the radar and image features are fused in step 3, and the fusion step comprises:
spatially, first concatenating the radar and image features along the channel dimension, applying global max pooling and global average pooling respectively, passing each result through a 3×3 convolution kernel, and then applying a sigmoid activation to generate the final spatial attention map, which weights every pixel of the radar and image BEV features;
along the channel dimension, concatenating the radar features and the spatially attended image features, obtaining the channel weights through average pooling, a 3×3 convolution kernel and a sigmoid operation, and multiplying the concatenated features by these weights to obtain the final fused features.
CN202310237553.8A 2023-03-14 2023-03-14 Attention-based 4D millimeter wave radar and vision fusion method Pending CN116129234A (en)

Priority Applications (1)

Application number: CN202310237553.8A; Priority date: 2023-03-14; Filing date: 2023-03-14; Title: Attention-based 4D millimeter wave radar and vision fusion method


Publications (1)

Publication number: CN116129234A; Publication date: 2023-05-16

Family

ID=86304759

Family Applications (1)

Application number: CN202310237553.8A; Title: Attention-based 4D millimeter wave radar and vision fusion method; Priority date: 2023-03-14; Filing date: 2023-03-14; Status: Pending

Country Status (1)

Country Link
CN (1) CN116129234A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117274749A (en) * 2023-11-22 2023-12-22 电子科技大学 Fused 3D target detection method based on 4D millimeter wave radar and image
CN117274749B (en) * 2023-11-22 2024-01-23 电子科技大学 Fused 3D target detection method based on 4D millimeter wave radar and image

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination