CN116682105A - Millimeter wave radar and visual feature attention fusion target detection method - Google Patents
Millimeter wave radar and visual feature attention fusion target detection method
- Publication number
- CN116682105A (application number CN202310590332.9A)
- Authority
- CN
- China
- Prior art keywords
- radar
- image
- target
- millimeter wave
- point cloud
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/64—Three-dimensional objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention provides a target detection method based on attention fusion of millimeter wave radar and visual features, which comprises the following steps: acquiring millimeter wave radar point cloud data and visual image information; preprocessing the point cloud data and fusing it with the visual image at the data layer; performing preliminary detection on the image and extracting image features; performing target association between the radar point cloud and the image information and extracting radar features; inputting the image features and the radar features into a feature attention fusion network for fusion; and decoding and outputting the target detection result with a 3D box decoder. The method adaptively adjusts the size of the spatial information projected from the point cloud onto the image by using the millimeter wave radar scattering cross section intensity, which solves the problem that existing radar point clouds are projected onto the image at a fixed size; the feature attention fusion network for radar and image solves the problem of uneven weight distribution between the millimeter wave radar and the visual image during feature fusion, and improves the accuracy and robustness of target detection.
Description
Technical Field
The invention relates to the technical field of target detection by sensor feature fusion, in particular to a target detection method by millimeter wave radar and visual feature attention fusion.
Background
Currently, the sensors used for target detection mainly include vision cameras, infrared sensors, laser radar (lidar), and millimeter wave radar. The camera is widely used because of its large detection range, its ability to acquire rich raw information about the target, and its strong classification capability, but it cannot acquire depth information of the target and is easily affected by weather and lighting conditions. In recent years, although cameras have greatly improved two-dimensional target detection and achieve high detection accuracy, there is still a large gap in three-dimensional target detection from visual image input alone. The millimeter wave radar can work around the clock in all weather, can acquire the relative distance, velocity, and scattering cross section intensity of a target from the echo signals, and has strong anti-interference capability; however, the millimeter wave radar point cloud is very sparse and easily affected by external clutter and noise. Therefore, the shortcomings of a single sensor are generally addressed by fusing multiple sensors. Research on fusing lidar and cameras for target detection is comparatively abundant, but due to the narrow detection range and high price of lidar, more and more research has turned to the fusion of millimeter wave radar and cameras. By combining the advantages of the millimeter wave radar and the vision camera, their fusion can make full use of the complementary information provided by each sensor, so that the fused information is richer and more comprehensive, while avoiding the huge computational load and high cost of lidar.
Generally, the fusion methods of millimeter wave radar and vision cameras fall into three types: data-level fusion, decision-level fusion, and feature-level fusion, among which feature-level fusion offers the largest research space. However, because millimeter wave radar point cloud data are sparse, their representation capability on the image is very weak, and feature weights are distributed unevenly when radar features and image features are fused, so radar features are mostly used merely to assist image features in target detection. Therefore, how to make full use of the point cloud data to enhance spatial information, improve representation capability, and achieve a reasonable distribution of radar feature and image feature weights when processing millimeter wave radar point cloud data has become a key problem.
Disclosure of Invention
In view of the shortcomings of the prior art, the invention provides a millimeter wave radar and visual feature attention fusion target detection method, which solves the problems that millimeter wave radar point cloud data are sparse, their representation capability is very weak, and feature weights are distributed unevenly when radar features and image features are fused.
To achieve the above and other related objects, the present invention provides a method for detecting an object by fusion of millimeter wave radar and visual feature attention, the method comprising the steps of:
step one: acquiring millimeter wave radar point cloud data and a frame image from a visual camera;
step two: preprocessing the radar point cloud, projecting the radar points onto an image plane to obtain a radar image, controlling the size of the spatial information projected onto the image by each radar point according to the radar scattering cross section intensity, and fusing the result with the visual image information at the data layer;
step three: performing preliminary detection on the image, and extracting image features;
step four: performing target association on the radar point cloud and the image, and extracting radar features;
step five: inputting the image features and the radar features into a feature attention fusion module, and reasonably distributing weights of the two features;
step six: combining the results of the primary regression by the network detection head in step three and the secondary regression in step five, decoding the target detection result with a 3D box decoder and outputting it.
Preferably, in the second step of the present invention, the radar point cloud is preprocessed and the radar points are projected onto an image plane to obtain a radar image; the specific steps are as follows:
scanning the radar detection points multiple times and accumulating their data, so that the density of the point cloud is increased;
loading the point cloud data, filtering points by distance with a suitable distance threshold, adding z bias to the radar point cloud, and removing radar points with abnormal speed;
expressing each radar point as a 3D point detected by the radar in the ego-centered coordinate system, parameterized as P_radar = (x, y, z, v_x, v_y, σ), where (x, y, z) represents the position of the target, v_x and v_y are the radial velocities of the target in the x and y directions, and σ represents the scattering cross section intensity of the target;
based on the nuScenes dataset, projecting the millimeter wave radar points onto the image plane using the camera intrinsic and extrinsic parameters given in the dataset: the radar points are converted from the ego coordinate system to the image coordinate system by coordinate transformation, and a radar image of the same size as the visual image is generated;
for the radar point cloud data projected onto the image plane, the point cloud is preliminarily expanded in pixels over a height range of 0.5-2.0 m in the longitudinal direction and a width range of 0.2-1.5 m in the transverse direction, so that a preliminary association is established between the radar information and the camera pixels;
the transverse width and longitudinal height of the radar point pixels on the radar image are adaptively adjusted according to the scattering cross section intensity σ of each radar point, yielding two-dimensional point cloud pixel regions of variable size.
Preferably, in the fourth step of the present invention, the specific steps of associating the radar point cloud with the targets in the 2D image and extracting the radar features are as follows:
generating a 3D visual cone (view frustum) using the depth, observation angle, and 3D size of the 3D bounding box obtained by the preliminary regression in step three, together with the camera calibration matrix;
expanding the radar point cloud into 3D pillars of fixed size to increase the association rate of the point cloud, projecting the pillars to the pixel coordinate system to be matched with the 2D bounding boxes, and projecting the pillars to the camera coordinate system to be matched against the depth values of the constructed 3D visual cone;
extracting the radar features: for each radar detection associated with an object, 3 radar heat map channels are generated, centered on and contained within the object's 2D bounding box, where the width and height of the heat map are proportional to the size of the 2D bounding box and the heat map values are determined by the normalized object depth and the x and y components of the radial velocity v_x, v_y in the ego-centered coordinate system.
Preferably, in the fifth step of the present invention, the image features and the radar features are input into the feature attention fusion module for weight learning; the specific steps of reasonably distributing the weights of the two kinds of features are as follows:
the feature attention fusion network mainly consists of a channel attention module (CAM) and convolution layers of different sizes: the first is a convolution layer Conv1×1 with kernel size 1×1, stride (1, 1), and padding (0, 0); the second is a convolution layer Conv3×3 with kernel size 3×3, stride (1, 1), and padding (1, 1); the third is a convolution layer Conv7×7 with kernel size 7×7, stride (1, 1), and padding (3, 3);
the radar features are passed through one Conv1×1 branch and two Conv3×3 branches for weight extraction, the attention weight matrices obtained from the three branches are added element-wise, and the feature weights are further extracted by the 7×7 convolution layer to generate the spatial attention information of the radar features; the image features are processed sequentially by Conv1×1, Conv3×3, and Conv1×1 to extract image feature weights, which are then multiplied element-wise with the result of the image features processed by the channel attention module to generate the channel attention information of the image features;
the spatial attention information of the radar features and the channel attention information of the image features are concatenated and fused to generate an image-radar feature tensor, and a detection head performs a secondary regression on it to obtain the depth, velocity, rotation angle, and attribute information of the target;
The millimeter wave radar and visual feature attention fusion target detection method described above has the following characteristics and beneficial effects:
(1) A method is provided for adaptively adjusting the size of the spatial information projected from the radar point cloud onto the image by using the millimeter wave radar scattering cross section intensity: the expansion height and width of each radar point projected onto the image are adjusted adaptively according to its scattering cross section intensity, which enriches the spatial information of the radar points;
(2) Compared with the prior art, an attention learning mechanism is added in the feature fusion module. Because the millimeter wave radar point cloud is sparse, the radar features are processed to generate spatial attention information and the image features are processed to generate channel attention information; the two streams of attention information are then fused, so that the weights of the millimeter wave radar features and the visual image features are redistributed reasonably, the radar point cloud information is enhanced, and the accuracy and robustness of three-dimensional target detection are improved.
Drawings
FIG. 1 is a schematic flow chart of the algorithm of the present invention.
Fig. 2 is a schematic diagram of a millimeter wave radar and visual feature attention fused object detection network structure according to the present invention.
Fig. 3 is a schematic diagram of a feature attention fusion network according to the present invention.
Detailed Description
In order to make the objects and technical solutions of the present invention more clear and understandable, the technical solutions of the present invention will be further described in detail below with reference to the accompanying drawings and specific examples. The following specific examples are given by way of illustration only and are not intended to limit the scope of the invention.
As shown in fig. 2, the millimeter wave radar and visual feature attention fusion target detection network mainly comprises five modules: data preprocessing, data layer fusion, feature extraction, feature attention fusion, and target detection.
Data preprocessing module: preprocesses the millimeter wave radar point cloud data and generates a radar image.
The detailed flow is as follows:
1. Download the nuScenes dataset, and read the front millimeter wave radar point cloud data and the visual camera frame image information from it. The millimeter wave radar point cloud data mainly consist of information read from the radar echo signals, including the radar-to-target distance, position, target velocity, and scattering cross section intensity;
2. The radar point cloud is preprocessed; the specific implementation is as follows:
(1) Scanning radar detection points for multiple times and accumulating data of the radar detection points, so that the density of point cloud data is increased;
(2) Load the point cloud data, calculate the distance from the radar to each target (i.e., the depth information), filter the points by distance with a suitable threshold, and add a z bias to the radar point cloud; then sort the filtered point cloud by distance in ascending order. The distance filtering condition is:
1 m ≤ d ≤ 100 m
where d represents the distance of the radar to the target point.
(3) Represent each radar point as a 3D point detected by the radar in the ego-centered coordinate system and parameterize it as P_radar = (x, y, z, v_x, v_y, σ), where (x, y, z) represents the position of the target, v_x and v_y represent the radial velocities of the target in the x and y directions, and σ represents the scattering cross section intensity of the target;
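As an illustrative aid to steps (1)-(3) above, the following minimal Python/NumPy sketch accumulates sweeps, filters by the stated 1 m ≤ d ≤ 100 m range, adds a z bias, removes abnormal-velocity points and sorts by distance; the array layout, the z bias value, and the velocity threshold are assumptions, not values from the patent.

```python
import numpy as np

def preprocess_radar_points(sweeps, d_min=1.0, d_max=100.0, z_bias=0.5, v_max=35.0):
    """Accumulate several radar sweeps given as (N, 6) arrays of [x, y, z, vx, vy, sigma].

    d_min/d_max implement the 1 m <= d <= 100 m distance filter from the text;
    z_bias and v_max are illustrative assumptions.
    """
    points = np.concatenate(sweeps, axis=0)            # accumulate sweeps to densify the cloud

    d = np.linalg.norm(points[:, :2], axis=1)          # distance from the radar to each detection
    keep = (d >= d_min) & (d <= d_max)                 # filter points by distance threshold

    speed = np.linalg.norm(points[:, 3:5], axis=1)     # remove detections with abnormal velocity
    keep &= speed <= v_max

    points = points[keep]
    points[:, 2] += z_bias                             # add the z bias to the radar points

    order = np.argsort(np.linalg.norm(points[:, :2], axis=1))
    return points[order]                               # sorted by distance, ascending
```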
3. based on Nuscenes data set, millimeter wave radar point is projected to image plane through given parameters inside and outside the camera in the data set. And converting the millimeter wave radar point from a self-coordinate system to an image coordinate system by using a coordinate conversion formula, and generating a radar image with the same size as the visual image. The coordinate conversion specific operation is that firstly, radar points are converted from a self coordinate system to a global coordinate system, then the global coordinate system is converted to a camera coordinate system, and finally, the camera coordinate system is converted to an image coordinate system.
4. Due to the sparsity of the millimeter wave radar point cloud, the radar point cloud data projected onto the image plane are preliminarily expanded in pixels over a height range of 0.5 to 2.0 meters in the longitudinal direction and a width range of 0.2 to 1.5 meters in the lateral direction. This step establishes a preliminary association between the radar information and the camera pixels and enlarges the spatial footprint of the radar point cloud; the radar point pixels on the radar image are then adaptively adjusted in transverse width w_p and longitudinal height h_p according to the scattering cross section intensity σ of each radar point, where σ(·) denotes a function of the scattering cross section intensity, w'_p and h'_p denote the adjusted point cloud width and height values, and S_p denotes the point cloud pixel area.
Through the above operation, two-dimensional point cloud pixel regions of variable size are obtained, which solves the problem that in the prior art the radar point cloud is projected onto the image at a fixed size.
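The exact expression mapping the scattering cross section intensity σ to the expanded width and height is not reproduced here; the sketch below assumes a simple linear interpolation inside the stated 0.2-1.5 m width and 0.5-2.0 m height ranges, with illustrative σ bounds, as a stand-in for that expression.

```python
import numpy as np

def adaptive_extent(sigma, sigma_min=-10.0, sigma_max=25.0,
                    w_range=(0.2, 1.5), h_range=(0.5, 2.0)):
    """Map RCS intensity sigma to an expansion (width, height) in metres.

    Only the width/height ranges come from the description; the linear mapping
    and the sigma_min/sigma_max bounds are assumptions.
    """
    t = np.clip((sigma - sigma_min) / (sigma_max - sigma_min), 0.0, 1.0)
    w = w_range[0] + t * (w_range[1] - w_range[0])      # lateral width, 0.2 - 1.5 m
    h = h_range[0] + t * (h_range[1] - h_range[0])      # longitudinal height, 0.5 - 2.0 m
    return w, h

def extent_to_pixels(w, h, depth, fx, fy):
    """Convert the metric extent to a pixel extent at a given depth (pinhole model)."""
    return fx * w / depth, fy * h / depth
```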
Data layer fusion module: the depth information obtained from the millimeter wave radar echo and the velocity components v_x, v_y along the X and Y axes are projected as pixel values into the visual image channels; at positions without radar echo, all pixel values of the corresponding radar channels are set to zero; finally, the visual image with the added radar information is converted into a three-channel image, realizing the fusion of the millimeter wave radar point cloud data and the visual image at the data layer.
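A minimal sketch of this data layer fusion step: the projected radar values (depth, v_x, v_y) are scattered into image-sized channels, each point covering the pixel region given by its adaptive extent, and pixels without radar echo stay zero. The upward expansion from the projected point and the function names are assumptions.

```python
import numpy as np

def build_radar_channels(uv, depth, vx, vy, extents_px, img_h, img_w):
    """Create three radar channels (depth, vx, vy) aligned with the visual image.

    uv: (M, 2) pixel coordinates of projected radar points;
    extents_px: (M, 2) per-point pixel (width, height) from the RCS-adaptive expansion.
    """
    radar = np.zeros((3, img_h, img_w), dtype=np.float32)
    for (u, v), d, sx, sy, (w, h) in zip(uv, depth, vx, vy, extents_px):
        u0, u1 = int(max(0.0, u - w / 2)), int(min(img_w, u + w / 2))
        v0, v1 = int(max(0.0, v - h)), int(min(img_h, v))   # expand upward from the point (assumption)
        radar[0, v0:v1, u0:u1] = d                          # depth channel
        radar[1, v0:v1, u0:u1] = sx                         # vx channel
        radar[2, v0:v1, u0:u1] = sy                         # vy channel
    return radar                                            # zeros wherever there is no radar echo
```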
Feature extraction module: comprises the extraction of visual image features and the extraction of millimeter wave radar features.
1. The extraction of visual image features comprises the following detailed procedures:
(1) The fusion result of the millimeter wave radar point cloud data and the visual image at the data layer is taken as the input of a CenterNet center point detection network with an improved DLA-34 as the backbone. The data layer fusion result is three-channel visual image information I_{i+r} ∈ R^{W×H×C}, where W and H represent the width and height of the image, respectively. After downsampling, a predicted keypoint heat map Ŷ ∈ [0, 1]^{(W/R)×(H/R)×C} is generated, where R represents the downsampling rate and C represents the number of object classes. For the generated heat map, a focal loss function is established:
L_k = -(1/N) Σ_{x,y,c} { (1 − Ŷ_xyc)^α · log(Ŷ_xyc),                if Y_xyc = 1
                         (1 − Y_xyc)^β · (Ŷ_xyc)^α · log(1 − Ŷ_xyc),  otherwise }
where N represents the number of keypoints in the image; α and β are hyperparameters, typically α = 2 and β = 4; and Y_xyc is the ground-truth heat map value of the target generated by a Gaussian kernel.
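For illustration, a minimal PyTorch sketch of this CenterNet-style keypoint focal loss, assuming the standard formulation with the α and β exponents above; the function name is illustrative.

```python
import torch

def keypoint_focal_loss(pred, gt, alpha=2, beta=4, eps=1e-6):
    """Focal loss over predicted and ground-truth keypoint heat maps.

    pred, gt: tensors of shape (B, C, H/R, W/R); gt is the Gaussian-splatted
    ground-truth heat map and pred has already passed through a sigmoid.
    """
    pred = pred.clamp(eps, 1 - eps)
    pos = gt.eq(1).float()                                    # keypoint (positive) locations
    neg = 1.0 - pos

    pos_loss = pos * (1 - pred) ** alpha * torch.log(pred)
    neg_loss = neg * (1 - gt) ** beta * pred ** alpha * torch.log(1 - pred)

    num_pos = pos.sum().clamp(min=1.0)                        # N, the number of keypoints
    return -(pos_loss.sum() + neg_loss.sum()) / num_pos
```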
(2) Information such as center point, 2D size (W and H), center offset, depth, rotation angle, and 3D size of the object on the image is predicted by a main regression head consisting of 256-channel Conv3 x 3 convolution layers and Conv1 x 1 convolution layers that generate the desired output, which provides a coarse 3DBox and precisely located 2DBox for each detected object.
2. The extraction of millimeter wave radar features comprises the following detailed procedures:
(1) Based on the depth information obtained from the millimeter wave radar echo and the velocity components v_x, v_y along the X and Y axes mentioned in the data preprocessing module, the values are converted into pixel values, and a three-channel raw radar point cloud image of the same size as the visual image is generated. According to the rigid transformation formula, the radar points are converted into the camera coordinate system:
X_image = R · X_radar + T
where X_image and X_radar respectively denote the coordinates of the radar point cloud in the camera and millimeter wave radar coordinate systems, and R and T denote the rotation matrix and the translation vector, respectively.
(2) For the generated radar point cloud image, in order to supplement the missing height information of the radar, the radar points are expanded into pillars of size (1.5, 0.2, 0.2) along the [x, y, z] directions, which enhances the spatial information of the point cloud; the pillars are projected to the pixel coordinate system to be associated with the 2D bounding boxes.
(3) The 2D bounding box, the estimated depth information, and the camera calibration matrix output by the visual image feature extraction are combined to create a three-dimensional ROI (region of interest) visual cone (frustum) for each object; the pillars are simultaneously projected into the camera coordinate system to be matched against the depth values of the constructed 3D cone, points outside the cone are ignored, and the size of the cone region is controlled by a parameter δ.
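A simplified sketch of this association step: each expanded pillar is projected into the image to test overlap with the 2D box, and its depth is checked against the frustum range controlled by δ. The center-in-box test and the way δ bounds the depth range are simplifying assumptions.

```python
import numpy as np

def associate_radar_to_object(radar_cam, box2d, obj_depth, K, delta=0.25, pillar_len=1.5):
    """Return the radar points (camera frame) associated with one detected object.

    radar_cam: (M, 3) pillar centers in the camera frame; box2d: (u_min, v_min, u_max, v_max);
    obj_depth: depth from the primary regression head; delta controls the frustum depth range.
    """
    u_min, v_min, u_max, v_max = box2d
    uvw = K @ radar_cam.T
    uv = (uvw[:2] / uvw[2:3]).T                               # project pillar centers to pixels

    in_box = (uv[:, 0] >= u_min) & (uv[:, 0] <= u_max) & \
             (uv[:, 1] >= v_min) & (uv[:, 1] <= v_max)        # overlap with the 2D bounding box

    depth = radar_cam[:, 2]
    half = pillar_len / 2.0                                   # pillar extent along the depth axis
    in_frustum = (depth + half >= obj_depth * (1 - delta)) & \
                 (depth - half <= obj_depth * (1 + delta))    # inside the delta-controlled frustum

    return radar_cam[in_box & in_frustum]                     # points outside the frustum are ignored
```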
(4) After associating the radar points with the corresponding objects, complementary features are created for the image using the depth and velocity information of the radar point cloud, and 3 radar heat map channels (d, v_x, v_y) are generated, centered on and contained within the 2D box of the target. The width and height of the radar heat map are proportional to the target's 2D box, and the heat map values are determined by the depth value d of the target and the x and y components of its radial velocity (v_x and v_y):
F^j_{x,y,i} = (1/M_i) · f_i · 1{ |x − c_{x,j}| ≤ α·w_j and |y − c_{y,j}| ≤ α·h_j },  i = 1, 2, 3
where M_i is the normalization factor; i = 1, 2, 3 indexes the feature channels; f_i is the feature value of the corresponding channel among (d, v_x, v_y); c_{x,j} and c_{y,j} are the x- and y-coordinates of the center point of the j-th target on the image; w_j and h_j are the width and height of the j-th target's 2D box; and α is a hyperparameter controlling the width and height of the 2D box. If the heat map regions of two objects overlap, the region with the smaller heat map value is typically selected.
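An illustrative sketch of generating the three radar heat map channels (d, v_x, v_y) for one associated object according to the expression above; the normalization factors M_i and the hyperparameter α are passed in as assumed inputs.

```python
import numpy as np

def radar_heatmap_channels(feat_shape, center, box_wh, values, norms, alpha=0.5):
    """Fill 3 heat map channels (d, vx, vy) inside the scaled 2D box of one object.

    feat_shape: (H, W); center: (cx, cy) of the 2D box; box_wh: (w, h);
    values: (d, vx, vy) of the associated radar point; norms: factors M_i.
    """
    H, W = feat_shape
    heat = np.zeros((3, H, W), dtype=np.float32)
    cx, cy = center
    w, h = box_wh

    x0, x1 = int(max(0.0, cx - alpha * w)), int(min(W, cx + alpha * w))
    y0, y1 = int(max(0.0, cy - alpha * h)), int(min(H, cy + alpha * h))

    for i, (f_i, m_i) in enumerate(zip(values, norms)):
        heat[i, y0:y1, x0:x1] = f_i / m_i          # constant value f_i / M_i inside the region
    return heat
```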
Feature attention fusion module: the visual image features and the millimeter wave radar features are input together into the feature attention fusion module. A structural schematic of the feature attention fusion network is shown in fig. 3; it comprises a channel attention module (CAM) and three different attention weight generating units: the first is a convolution layer Conv1×1 with kernel size 1×1, stride (1, 1), and padding (0, 0); the second is a convolution layer Conv3×3 with kernel size 3×3, stride (1, 1), and padding (1, 1); the third is a convolution layer Conv7×7 with kernel size 7×7, stride (1, 1), and padding (3, 3).
The specific implementation steps of the feature attention fusion network are as follows:
(1) The radar features obtained in the fourth step are passed through one Conv1×1 branch and two Conv3×3 branches for weight extraction; the attention weight matrices obtained from the three branches are added element-wise, and the feature weights are further extracted by a 7×7 convolution layer to generate the spatial attention information of the radar features;
(2) The generated image features are processed sequentially by Conv1×1, Conv3×3, and Conv1×1 to extract the image feature weights, which are then multiplied element-wise with the result of the original image features processed by the channel attention module to generate the channel attention information of the image features;
(3) The spatial attention information of the radar features from (1) and the channel attention information of the image features from (2) are concatenated and fused to generate an image-radar feature tensor;
Specifically, the channel attention mechanism applied to the image features by the CAM is established as follows:
the input visual image features I_img ∈ R^{W×H×3} are processed by max pooling and average pooling respectively, yielding two feature maps I'_img ∈ R^{1×1×3}; the two feature maps are each passed through a two-layer perceptron network, the outputs are added element-wise, and an activation function is applied to generate the image channel attention; finally, the result is multiplied element-wise with the original visual image features. The process is expressed as:
M_c(I_img) = Sigmoid(MLP(AvgPool(I_img)) + MLP(MaxPool(I_img)))
where Sigmoid(·) is the activation function; MLP is the perceptron network, applied as a matrix operation; AvgPool(·) and MaxPool(·) denote the average pooling and max pooling computations, respectively; and M_c(I_img) is the output of the image channel attention module.
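A minimal PyTorch sketch of the feature attention fusion network described above: the CAM over the image features, the three-branch convolutional weight extractor plus a 7×7 convolution for the radar spatial attention, and the final concatenation. The channel counts, the MLP reduction ratio, and the class names are assumptions; normalization and activation layers between the convolutions are omitted.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """CAM: average/max pooling -> shared two-layer perceptron -> sigmoid gate."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        hidden = max(channels // reduction, 1)
        self.mlp = nn.Sequential(
            nn.Linear(channels, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, channels))

    def forward(self, x):                                  # x: (B, C, H, W)
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))                 # average pooling branch
        mx = self.mlp(x.amax(dim=(2, 3)))                  # max pooling branch
        gate = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        return x * gate                                    # element-wise re-weighting

class FeatureAttentionFusion(nn.Module):
    """Radar spatial attention + image channel attention, fused by concatenation."""
    def __init__(self, img_ch, radar_ch):
        super().__init__()
        # radar branch: one Conv1x1 and two Conv3x3 weight extractors, added, then Conv7x7
        self.r1 = nn.Conv2d(radar_ch, radar_ch, 1, stride=1, padding=0)
        self.r3a = nn.Conv2d(radar_ch, radar_ch, 3, stride=1, padding=1)
        self.r3b = nn.Conv2d(radar_ch, radar_ch, 3, stride=1, padding=1)
        self.r7 = nn.Conv2d(radar_ch, radar_ch, 7, stride=1, padding=3)
        # image branch: Conv1x1 -> Conv3x3 -> Conv1x1 weight extractor plus the CAM
        self.img_weights = nn.Sequential(
            nn.Conv2d(img_ch, img_ch, 1, stride=1, padding=0),
            nn.Conv2d(img_ch, img_ch, 3, stride=1, padding=1),
            nn.Conv2d(img_ch, img_ch, 1, stride=1, padding=0))
        self.cam = ChannelAttention(img_ch)

    def forward(self, img_feat, radar_feat):
        radar_att = self.r7(self.r1(radar_feat) + self.r3a(radar_feat) + self.r3b(radar_feat))
        img_att = self.img_weights(img_feat) * self.cam(img_feat)  # element-wise product
        return torch.cat([img_att, radar_att], dim=1)              # image-radar feature tensor
```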
Target detection module: the output of the feature attention fusion is taken as the input of the secondary regression of the CenterNet center point detection head, which recomputes the depth, velocity, rotation angle, and attribute information of the target; the secondary regression head consists of three Conv3×3 convolution layers followed by a Conv1×1 convolution layer that produces the output. The information extracted from the image features is combined with the depth, velocity, rotation angle, and attribute information obtained by the secondary regression heads, and the 3D target detection result is recovered through a 3D bounding box decoder; where depth and pose are regressed by both heads, only the more accurate secondary regression result is used.
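A sketch of one secondary regression head as described above (three Conv3×3 layers followed by a Conv1×1 output layer); the intermediate channel width of 256 mirrors the primary head and is an assumption here.

```python
import torch.nn as nn

def secondary_regression_head(in_ch, out_ch, mid_ch=256):
    """One secondary regression head: three Conv3x3 + ReLU layers, then a Conv1x1 output layer.

    Heads of this form regress the depth, velocity, rotation angle and attributes
    from the fused image-radar feature tensor; mid_ch is an assumption.
    """
    return nn.Sequential(
        nn.Conv2d(in_ch, mid_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(mid_ch, mid_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(mid_ch, mid_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(mid_ch, out_ch, 1))
```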
It should be noted that the foregoing is merely illustrative of the present invention, and not intended to limit the invention, and various changes may be made within the knowledge of one skilled in the art without departing from the spirit of the invention.
Claims (8)
1. The method for detecting the target by fusing millimeter wave radar and visual characteristic attention is characterized by comprising the following steps of:
step one: acquiring millimeter wave radar point cloud data and a frame image of a visual camera;
step two: preprocessing the radar point cloud data, and fusing the preprocessed radar point cloud data and the frame image in a data layer;
step three: performing preliminary detection on the frame image, and extracting image features;
step four: performing target association on the radar point cloud data and the frame image, and extracting radar features;
step five: inputting the image features and the radar features into a feature attention fusion module, and reasonably distributing weights of the two features;
step six: and decoding the detection result of the target by using a 3Dbox decoder and outputting the detection result.
2. The method for detecting the target by fusion of millimeter wave radar and visual feature attention according to claim 1, wherein in the second step, the specific step of preprocessing the radar point cloud data is as follows:
scanning radar detection points for multiple times and accumulating data of the radar detection points, so that the density of point clouds is increased;
loading point cloud data, filtering points according to distance by setting a threshold value of a proper distance, adding z bias to the radar point cloud, and removing the radar point cloud with abnormal speed;
the radar point is expressed as a 3D point detected by the radar in the ego-centered coordinate system and is parameterized as P_radar = (x, y, z, v_x, v_y, σ), where (x, y, z) represents the position of the target, v_x and v_y are the radial velocities of the target in the x and y directions, and σ represents the scattering cross section intensity of the target;
the millimeter wave radar points are projected onto the image plane using the camera intrinsic and extrinsic parameters given in the dataset.
3. The method for detecting the target by fusing millimeter wave radar and visual feature attention according to claim 2, wherein the specific step of projecting the millimeter wave radar point cloud to the image plane is as follows:
converting millimeter wave radar points from a self-coordinate system to an image coordinate system through coordinate conversion, and generating a radar image with the same size as the visual image;
the point cloud data are preliminarily expanded in pixels over a height range of 0.5-2.0 m in the longitudinal direction and a width range of 0.2-1.5 m in the transverse direction, so that a preliminary association is established between the radar information and the camera pixels;
the transverse width and longitudinal height of the radar point pixels on the radar image are adaptively adjusted according to the scattering cross section intensity σ of each radar point, so that two-dimensional point cloud pixel regions of variable size are obtained.
4. The method for detecting the target by fusing millimeter wave radar and visual feature attention according to claim 1, wherein in the second step, the specific step of fusing the preprocessed radar point cloud data and the frame image in a data layer is as follows:
firstly, the depth information obtained from the millimeter wave radar echo and the velocity components v_x, v_y along the X and Y axes are projected as pixel values into the visual image channels, and all corresponding radar channels are set to zero at pixel positions without radar echo; finally, the visual image with the added radar information is converted into a three-channel image, realizing the fusion of the millimeter wave radar point cloud data and the frame image at the data layer.
5. The method for detecting the target by fusion of millimeter wave radar and visual feature attention according to claim 4, wherein in the third step, preliminary detection is performed on the frame image, and the specific step of extracting the image features is as follows:
and taking the fusion result of the millimeter wave radar point cloud data and the frame image at the data layer as the input of a CenterNet center point detection network with DLA-34 as the backbone network, performing preliminary detection on the image, and obtaining image characteristic information of the target by detection head regression, wherein the image characteristic information comprises a rough 3D bounding box, depth, observation angle, 2D size and speed.
6. The method for detecting the target by fusion of millimeter wave radar and visual feature attention according to claim 5, wherein in the fourth step, the radar point cloud data and the frame image are subject to target association, and the specific steps of extracting radar features are as follows:
generating a 3D visual cone by using the image characteristic information of the target obtained by regression and a camera calibration matrix, and associating radar detection with the target in the visual cone area;
expanding the radar point cloud data into a 3D column with a fixed size to increase the association rate of the point cloud, projecting the column to a pixel coordinate system to be associated with a 2D bounding box, simultaneously projecting the column to a camera coordinate system to be matched with a depth value of a constructed 3D viewing cone, and neglecting points outside the viewing cone;
using the depth and velocity information of the radar point cloud data to create complementary features for the image and generate three-channel (d, v_x, v_y) radar features.
7. The method for detecting a target by fusion of millimeter wave radar and visual feature attention according to claim 6, wherein in the fifth step, the specific step of inputting the image feature and the radar feature into a feature attention fusion module is as follows:
respectively extracting weights of the radar features obtained in the fourth step by using one Conv1×1 and two Conv3×3, performing element addition operation on attention weight matrixes obtained by the three branches, and further extracting feature weights by a convolution layer of 7×7 to generate spatial attention information of the radar features;
the generated image features are sequentially processed by Conv1×1, conv3×3 and Conv1×1 to extract image feature weights, and then the image feature weights are multiplied by the results of the original image features processed by the channel attention module according to elements to generate channel attention information of the image features;
performing splicing and fusion operation by using the space attention information of the radar features and the channel attention information of the image features to generate an image radar feature tensor;
and performing secondary regression on the generated image radar characteristic tensor by using a detection head to obtain the depth, speed, rotation angle and attribute information of the target.
8. The method for detecting a target by fusion of millimeter wave radar and visual feature attention according to claim 7, wherein in the sixth step, the specific steps of decoding the detection result of the target by using a 3Dbox decoder and outputting the decoded detection result are as follows:
and combining the image characteristic information with the information obtained by regression after passing through the characteristic attention fusion module, recovering the detection result of the 3D target through a 3D boundary frame decoder, and outputting the detection result to a visual image for display.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310590332.9A CN116682105A (en) | 2023-05-24 | 2023-05-24 | Millimeter wave radar and visual feature attention fusion target detection method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310590332.9A CN116682105A (en) | 2023-05-24 | 2023-05-24 | Millimeter wave radar and visual feature attention fusion target detection method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116682105A true CN116682105A (en) | 2023-09-01 |
Family
ID=87777992
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310590332.9A Pending CN116682105A (en) | 2023-05-24 | 2023-05-24 | Millimeter wave radar and visual feature attention fusion target detection method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116682105A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117079416A (en) * | 2023-10-16 | 2023-11-17 | 德心智能科技(常州)有限公司 | Multi-person 5D radar falling detection method and system based on artificial intelligence algorithm |
CN117079416B (en) * | 2023-10-16 | 2023-12-26 | 德心智能科技(常州)有限公司 | Multi-person 5D radar falling detection method and system based on artificial intelligence algorithm |
CN117237613A (en) * | 2023-11-03 | 2023-12-15 | 华诺星空技术股份有限公司 | Foreign matter intrusion detection method, device and storage medium based on convolutional neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||