CN116469095A - Space-time scene-adaptive three-dimensional target detection method based on radar-vision perception fusion

Space-time scene-adaptive three-dimensional target detection method based on radar-vision perception fusion

Info

Publication number
CN116469095A
CN116469095A (application number CN202310437055.8A)
Authority
CN
China
Prior art keywords
point
fusion
point cloud
scene
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310437055.8A
Other languages
Chinese (zh)
Inventor
张晖 (Zhang Hui)
孙恩东 (Sun Endong)
赵海涛 (Zhao Haitao)
朱洪波 (Zhu Hongbo)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202310437055.8A
Publication of CN116469095A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/60 Type of objects
    • G06V 20/64 Three-dimensional objects
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/7715 Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a scene-adaptive three-dimensional target detection method based on radar-vision perception fusion. First, original point cloud features and image features are extracted from the input point cloud data and image data, respectively. The original point cloud is then projected onto the image feature map by a matrix transformation, and point-by-point feature fusion is performed at the point cloud projection points to obtain the point-by-point fusion features of the point cloud. Next, a scene-driven three-dimensional target detection network is constructed: a time-of-day sensing factor and a scene scale factor are derived from the weather and time conditions and from the target scale in the current scene, and global feature fusion of the point-by-point fusion features with the original point cloud features is performed adaptively according to these two factors; the global fusion features are input into a three-dimensional region proposal network to generate regions of interest, and the candidate region features are extracted and sent into the three-dimensional target detection network to complete three-dimensional target detection. Finally, the trained three-dimensional target detection network is used to detect targets in real time.

Description

Space-time scene-adaptive three-dimensional target detection method based on radar-vision perception fusion
Technical Field
The invention relates to a scene-adaptive three-dimensional target detection method based on radar-vision perception fusion and belongs to the field of deep learning.
Background
In recent years, the rapid development of deep learning and the availability of high-performance graphics cards have greatly advanced three-dimensional target detection methods. Three-dimensional target detection based on multi-source data can automatically fuse target features so that information from different modalities complements each other, which considerably improves detection accuracy and scene applicability; deep-learning-based three-dimensional target detection has therefore become a popular research direction.
The central problem for three-dimensional target detection based on multi-source data is how to use the multi-source data accurately and efficiently so that the modalities compensate for each other's inherent shortcomings and the overall accuracy improves. Most existing fusion methods directly project the point cloud onto a two-dimensional plane, extract features from this point cloud plane and from the image, and simply concatenate them; this suffers from inconsistent data distributions across modalities, information redundancy in direct matching, and many interference factors. In addition, scenes with heavy occlusion or low brightness contain more feature noise, and harmful information can be introduced during feature fusion, which reduces the accuracy of the detection results.
Disclosure of Invention
In order to overcome the defects of existing methods, the invention provides a scene-adaptive three-dimensional target detection method based on radar-vision perception fusion. First, original point cloud features and image features are extracted from the input point cloud data and image data, respectively. The original point cloud is then projected onto the image feature map by a matrix transformation, and point-by-point feature fusion is performed at the point cloud projection points to obtain the point-by-point fusion features of the point cloud. Next, a scene-driven three-dimensional target detection network is constructed: a time-of-day sensing factor and a scene scale factor are derived from the weather and time conditions and from the target scale in the current scene, and global feature fusion of the point-by-point fusion features with the original point cloud features is performed adaptively according to these two factors; the global fusion features are input into a three-dimensional region proposal network to generate regions of interest, and the candidate region features are extracted and sent into the three-dimensional target detection network to complete three-dimensional target detection. Finally, the trained three-dimensional target detection network is used to detect targets in real time.
In order to solve the technical problems, the invention adopts the following technical scheme:
a scene-adaptive three-dimensional target detection method based on radar-vision perception fusion comprises the following steps:
step 1, respectively extracting original point cloud characteristics and image characteristics from input point cloud data and image data;
step 2, projecting the input point cloud data onto the image features, and carrying out point-by-point feature fusion on the point cloud projection points to obtain point-by-point fusion features of the point cloud data;
step 3, constructing a scene-driven three-dimensional target detection network, and detecting targets in real time with the trained three-dimensional target detection network.
Further, the specific process of extracting the original point cloud features from the input point cloud data in the step 1 is as follows:
for the point cloud data, first one point is randomly selected as the initial point; the remaining points are then sampled in sequence by farthest point sampling; after sampling, each sampled point is taken as a sphere center and a radius is specified to form a sphere, and all the points contained in each sphere are called a cluster; finally, features are extracted for each point in every cluster, max pooling is applied over the channel dimension to obtain the cluster feature, and all cluster features are combined to obtain the original point cloud features.
Further, the specific process of extracting the image features from the input image data in the step 1 is as follows:
for the image data, the image is first passed through four cascaded convolution blocks, each followed by a batch normalization layer and a ReLU activation function; the output of the last convolution block is then fed in parallel through four transposed convolutions with different strides to obtain four feature maps of the same resolution, and these four feature maps are concatenated to obtain the image features.
Further, the specific process of the step 2 is as follows:
step 2.1, projecting the input point cloud data onto the image features;
step 2.2, dividing point-by-point fusion areas of point cloud projection points:
wherein h_i and w_i are, respectively, the height and width of the point-by-point fusion region divided with P'_i, the projection of the i-th point cloud point P_i onto the image features, as its center point; ⌈·⌉ is the round-up (ceiling) function; φ_i is the weight coefficient of P'_i and k_{P'_i} is the discrete coefficient of P'_i; k_min and k_max denote the minimum and maximum values of the discrete coefficient set; H and W denote the height and width of the image data, respectively; i, j ∈ [1, 2, ..., L], where L is the number of point cloud projection points; d_ij is the distance between P'_i and P'_j, the projection of the j-th point cloud point P_j onto the image features;
step 2.3, uniformly dividing the point-by-point fusion area into I rows and J columns, and calculating the coordinates of the central point of each grid;
step 2.4, obtaining a bilinear interpolation result and a nearest neighbor interpolation result of the center point of each grid;
step 2.5, carrying out weighted fusion on the bilinear interpolation result and the nearest neighbor interpolation result of the central point of each grid to obtain an interpolation feature vector of the corresponding grid;
step 2.6, calculating a center distance measurement coefficient corresponding to each grid, and normalizing;
step 2.7, carrying out weighted summation on interpolation feature vectors of each grid according to the normalized center distance measurement coefficient to obtain point fusion features of corresponding point cloud projection points;
step 2.8, splicing the point fusion features of all point cloud projection points to obtain the point-by-point fusion features of the point cloud data.
Further, in the step 2.1, the input point cloud data is projected onto the image feature according to the following formula:
P'_i = K_r [R_c | T_c] P_i
wherein P'_i is the projection point of the i-th point cloud point P_i on the image features; K_r is the camera intrinsic matrix; R_c is the rotation matrix and T_c is the translation matrix.
Further, the calculation formula of the coordinates of the lattice center point in the step 2.3 is as follows:
wherein (u_ab, v_ab) are the coordinates of the center point of the grid cell in row a and column b of the point-by-point fusion region; a = 1, 2, ..., I and b = 1, 2, ..., J; h is the height of the point-by-point fusion region and w is its width; and (u, v) are the coordinates of the point cloud projection point within the point-by-point fusion region.
Further, in step 2.4, the bilinear interpolation result f_ab_1 and the nearest-neighbor interpolation result f_ab_2 of the center point of the grid cell in row a and column b of the point-by-point fusion region are computed as follows:
f_ab_1 = f_1(u - u_1)(v - v_1) + f_2(u - u_1)(v_2 - v) + f_3(u_2 - u)(v - v_1) + f_4(u_2 - u)(v_2 - v)
wherein (u_1, v_1), (u_1, v_2), (u_2, v_1), (u_2, v_2) are the coordinates of the four point cloud projection points nearest to the grid center point; {d_1, d_2, d_3, d_4} are the distances from these four projection points to the grid center point; f_1, f_2, f_3, f_4 are the feature vectors of the four nearest projection points; and f_ab_2 is the feature vector of the projection point nearest to the grid center point.
Further, in step 2.5, the interpolation feature vector f_ab of the grid cell in row a and column b of the point-by-point fusion region is obtained by weighted fusion of f_ab_1 and f_ab_2 using per-dimension aggregation coefficients;
wherein f_ab_1 and f_ab_2 are, respectively, the bilinear interpolation result and the nearest-neighbor interpolation result of the center point of the grid cell in row a and column b of the point-by-point fusion region; ξ_ψ is the aggregation coefficient of the ψ-th feature dimension; N_d is the length of the feature vector; f_ω_ψ denotes the value of the ω-th feature vector in the ψ-th dimension; μ_ψ is the mean of the values in the ψ-th dimension; and σ_ψ is the standard deviation of the values in the ψ-th dimension.
Further, in step 2.6, the center distance metric coefficient η_ab of the grid cell in row a and column b of the point-by-point fusion region is computed from min(u, u_ab), max(u, u_ab), min(v, v_ab) and max(v, v_ab), and η'_ab denotes its normalized value;
wherein (u_ab, v_ab) are the coordinates of the center point of the grid cell in row a and column b of the point-by-point fusion region; min(u, u_ab) and max(u, u_ab) denote the smaller and the larger of u and u_ab; min(v, v_ab) and max(v, v_ab) denote the smaller and the larger of v and v_ab; and (u, v) are the coordinates of the point cloud projection point within the point-by-point fusion region.
Further, in step 2.7, the point fusion feature f_LI of the point cloud projection point is obtained as the weighted sum of the interpolation feature vectors of all grid cells, f_LI = Σ_a Σ_b η'_ab · f_ab;
wherein η'_ab is the normalized center distance metric coefficient of the grid cell in row a and column b of the point-by-point fusion region, and f_ab is the interpolation feature vector of the grid cell in row a and column b of the point-by-point fusion region.
Further, in the step 2, before projecting the input point cloud data onto the image feature, the method further includes performing point cloud filtering on the input point cloud data.
Further, the specific process of the step 3 is as follows:
step 3.1, inputting the point-by-point fusion feature F_LI and the original point cloud feature F_L into a global feature fusion network and performing global feature fusion:
first, F_LI and F_L are mapped into the same channel dimension by two fully connected layers;
then, the mapped point-by-point fusion feature is adaptively adjusted by the time-of-day sensing factor α, the two mapped features are added element-wise to form a compact feature, and the compact feature is fed into a fully connected layer to obtain the point-by-point registration feature F'_LI; wherein FC denotes a fully connected layer, ⊙ denotes element-wise multiplication, ⊕ denotes element-wise addition, and α is the time-of-day sensing factor;
next, F'_LI is compressed into a weight map W_map = sigmoid(FC(tanh(F'_LI))), and W_map is multiplied with F'_LI to obtain the adjusted point-by-point registration feature F''_LI = W_map ⊙ F'_LI; wherein tanh and sigmoid are the tanh and sigmoid activation functions and W_map is the point-by-point registration feature weight map;
finally, scale judgment is performed with the scene scale factor β, and feature concatenation yields the global fusion feature;
step 3.2, inputting the global fusion feature into the three-dimensional region proposal network to generate regions of interest, extracting the region features of the regions of interest, and sending them into the three-dimensional target detection network to complete detection;
further, in the step 3.1, the method for calculating the time of day sensing factor α is as follows:
(1) Converting input image data from an RGB color space to a YCbCr color space;
(2) The time-of-day sensing factor α is computed by comparing, value by value, the Y, Cb and Cr component histograms of the input image data with those of the prior image, normalized by the total number of Y, Cb and Cr component values;
wherein Y + Cb + Cr denotes the total number of Y, Cb and Cr component values in the input image data and the prior image, used for normalization; Y_cur_χ, Cb_cur_χ, Cr_cur_χ denote the numbers of pixels whose Y, Cb and Cr components take the value χ in the input image data; and Y_bas_χ, Cb_bas_χ, Cr_bas_χ denote the numbers of pixels whose Y, Cb and Cr components take the value χ in the prior image.
Further, in the step 3.1, the calculation process of the scene scale factor β is as follows:
firstly, carrying out semantic segmentation on input image data by using a U-Net++ network to obtain the category of each pixel point in the input image data;
then, the point cloud data are projected onto a segmentation result diagram output by a U-Net++ network, and the category of the projection point is judged by adopting a K nearest neighbor method;
finally, calculating the scene scale factor β from the number of projection points of each category;
wherein β_max is the number of projection points of the category with the most projection points, and β_min is the number of projection points of the category with the fewest projection points.
Compared with the prior art, the technical scheme provided by the invention has the following technical effects:
The invention achieves finer-grained fusion of the point cloud and the image data, obtains more detailed information about the targets, and improves the accuracy of three-dimensional target detection. At the same time, the time-of-day sensing factor and the scene scale factor are proposed for different scenes, so the algorithm is more robust in complex scenes and has good application prospects in intelligent driving scenarios.
Drawings
FIG. 1 is a flow chart of the scene-adaptive three-dimensional target detection method based on radar-vision perception fusion according to the present invention;
FIG. 2 is a schematic diagram of a fused region segmentation;
fig. 3 is a global feature fusion network.
Detailed Description
The invention will be described in further detail with reference to the accompanying drawings and examples.
As shown in FIG. 1, the scene-adaptive three-dimensional target detection method based on radar-vision perception fusion comprises the following steps:
and step 1, respectively extracting original point cloud characteristics and image characteristics from the input point cloud data and image data.
For the point cloud data, a point is first selected at random as the initial point; points are then sampled from the input point cloud in sequence by farthest point sampling. After sampling, each sampled point is taken as a sphere center and a specified radius forms a sphere; all points contained in a sphere are called a cluster. Finally, features are extracted for every point in each cluster, max pooling is applied over the channel dimension to obtain the cluster feature, and all cluster features are combined to obtain the original point cloud features.
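As an illustration of this grouping step, the following sketch (not the patented implementation; the cluster count, radius and feature dimension are assumed values, and the learned per-point MLP is replaced by a fixed random projection) shows farthest point sampling, ball-query clustering and channel-wise max pooling in Python:

```python
# Illustrative sketch only (not the patented implementation): farthest point
# sampling, ball-query clustering and channel-wise max pooling. The learned
# per-point MLP is replaced by a fixed random projection; all sizes are assumed.
import numpy as np

def farthest_point_sampling(points, n_samples):
    """points: (N, 3). Returns indices of n_samples points chosen by FPS."""
    n = points.shape[0]
    chosen = [np.random.randint(n)]          # random initial point
    dists = np.full(n, np.inf)
    for _ in range(n_samples - 1):
        dists = np.minimum(dists, np.linalg.norm(points - points[chosen[-1]], axis=1))
        chosen.append(int(np.argmax(dists)))  # farthest point from the chosen set
    return np.array(chosen)

def extract_point_cloud_features(points, n_centers=128, radius=1.0, feat_dim=64):
    """Ball-query clusters around FPS centers, max-pooled over the channel dimension."""
    rng = np.random.default_rng(0)
    proj = rng.standard_normal((3, feat_dim))                 # stand-in for the point-wise MLP
    centers = points[farthest_point_sampling(points, n_centers)]
    cluster_feats = []
    for c in centers:
        mask = np.linalg.norm(points - c, axis=1) <= radius   # ball query
        cluster = points[mask]
        point_feats = np.maximum(cluster @ proj, 0.0)         # per-point embedding + ReLU
        cluster_feats.append(point_feats.max(axis=0))         # channel-wise max pooling
    return np.stack(cluster_feats)                            # (n_centers, feat_dim)

# Example: 2048 random points -> (128, 64) original point cloud feature matrix
point_features = extract_point_cloud_features(np.random.rand(2048, 3) * 10.0)
```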
For the image data, feature extraction uses four cascaded convolution blocks, each consisting of two 3×3 convolution layers followed by a batch normalization layer and a ReLU activation function. Four parallel transposed convolutions with different strides then restore the feature maps of different scales to the same resolution, and these maps are concatenated to obtain image features containing richer semantic information.
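A minimal PyTorch-style sketch of this image branch is given below; the channel widths, strides and the choice of applying one transposed convolution per block output are assumptions, not values taken from the patent:

```python
# Illustrative PyTorch sketch of the image branch: four cascaded convolution
# blocks (each with BatchNorm + ReLU), then four transposed convolutions with
# different strides whose outputs are concatenated. Channel widths and strides
# are assumptions, not values from the patent.
import torch
import torch.nn as nn

class ImageBranch(nn.Module):
    def __init__(self, in_ch=3, ch=(16, 32, 64, 128)):
        super().__init__()
        blocks, prev = [], in_ch
        for c in ch:
            blocks.append(nn.Sequential(
                nn.Conv2d(prev, c, 3, stride=2, padding=1),   # downsample once per block
                nn.Conv2d(c, c, 3, padding=1),
                nn.BatchNorm2d(c),
                nn.ReLU(inplace=True)))
            prev = c
        self.blocks = nn.ModuleList(blocks)
        # one transposed convolution per scale, restoring all maps to 1/2 resolution
        self.up = nn.ModuleList([
            nn.ConvTranspose2d(ch[i], 32, kernel_size=2 ** i, stride=2 ** i)
            for i in range(len(ch))])

    def forward(self, x):
        feats = []
        for blk in self.blocks:
            x = blk(x)
            feats.append(x)
        upsampled = [up(f) for up, f in zip(self.up, feats)]
        return torch.cat(upsampled, dim=1)                    # concatenated image features

image_features = ImageBranch()(torch.randn(1, 3, 384, 1280))  # -> (1, 128, 192, 640)
```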
Step 2, projecting the input point cloud data onto the image features through a matrix transformation, and performing point-by-point feature fusion at the point cloud projection points to obtain the point-by-point fusion features of the point cloud.
Step 2.1, performing point cloud filtering on the original point cloud data.
Step 2.2, projecting the filtered point cloud onto the image features according to the following formula:
P'_i = K_r [R_c | T_c] P_i
wherein P'_i is the projection point of the i-th point cloud point P_i on the image features; K_r is the camera intrinsic matrix; R_c is the rotation matrix and T_c is the translation matrix.
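The projection can be sketched as follows; the intrinsic matrix K_r and the identity extrinsics used in the example are placeholder values, not calibration data from the patent:

```python
# Illustrative sketch of the projection P'_i = K_r [R_c | T_c] P_i. The intrinsic
# matrix and identity extrinsics below are placeholder values, not calibration data.
import numpy as np

def project_points(points_lidar, K_r, R_c, T_c):
    """points_lidar: (N, 3). Returns (M, 2) pixel coordinates and the kept indices."""
    P = np.hstack([points_lidar, np.ones((points_lidar.shape[0], 1))])  # homogeneous (N, 4)
    extrinsic = np.hstack([R_c, T_c.reshape(3, 1)])                     # [R_c | T_c], (3, 4)
    cam = (extrinsic @ P.T).T                                           # camera coordinates
    keep = cam[:, 2] > 0                                                # points in front of the camera
    uvw = (K_r @ cam[keep].T).T
    return uvw[:, :2] / uvw[:, 2:3], np.where(keep)[0]

K_r = np.array([[700.0, 0.0, 640.0],
                [0.0, 700.0, 192.0],
                [0.0, 0.0, 1.0]])                                       # assumed intrinsics
uv, kept = project_points(np.random.rand(500, 3) * [20.0, 5.0, 30.0], K_r, np.eye(3), np.zeros(3))
```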
Step 2.3, calculating the point-by-point fusion region of each projection point from its discrete coefficient, as follows:
For the i-th point cloud point P_i, a discrete coefficient k_{P'_i} is defined for its projection point P'_i on the image features from the distances between P'_i and the other projection points;
wherein d_ij is the distance between P'_i and P'_j, the projection of the j-th point cloud point P_j onto the image features; L is the number of point cloud projection points on the image; H and W denote the height and width of the input image data, respectively; and (x_i, y_i), (x_j, y_j) are the coordinates of P'_i and P'_j. Repeating this calculation for all projection points gives the discrete coefficient set {k_{P'_1}, k_{P'_2}, ..., k_{P'_L}}.
Further, the discrete coefficient of each point cloud projection point is normalized and encoded to obtain the weight coefficient of that projection point, and the weight coefficient is used to determine the size of its point-by-point fusion region;
wherein φ_i is the weight coefficient of the projection point P'_i; k_min and k_max denote the minimum and maximum values of the discrete coefficient set, respectively; h_i and w_i are, respectively, the height and width of the point-by-point fusion region divided with P'_i as its center point; and ⌈·⌉ is the round-up (ceiling) function.
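Since the exact expressions for the discrete coefficient and for h_i and w_i are not reproduced in the text above, the following sketch only illustrates the idea under stated assumptions (mean pairwise distance normalized by the image diagonal as the discrete coefficient, and a region size growing linearly with the normalized weight coefficient):

```python
# Illustrative sketch only: the patent's exact formulas for the discrete
# coefficient k and the region size (h_i, w_i) are not reproduced in the text.
# Assumptions: k_i is the mean distance from projection i to all other
# projections, normalized by the image diagonal; the region size grows linearly
# with the min-max normalized weight coefficient phi_i and is rounded up.
import numpy as np

def fusion_region_sizes(uv, H, W, max_frac=0.05):
    """uv: (L, 2) projection coordinates. Returns per-point region heights and widths."""
    d = np.linalg.norm(uv[:, None, :] - uv[None, :, :], axis=-1)   # pairwise distances d_ij
    k = d.sum(axis=1) / (len(uv) - 1) / np.hypot(H, W)             # assumed discrete coefficient
    phi = (k - k.min()) / (k.max() - k.min() + 1e-12)              # weight coefficient phi_i
    h = np.ceil(phi * max_frac * H).astype(int) + 1                # region height (rounded up)
    w = np.ceil(phi * max_frac * W).astype(int) + 1                # region width (rounded up)
    return h, w

h_i, w_i = fusion_region_sizes(np.random.rand(200, 2) * [1280.0, 384.0], H=384, W=1280)
```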
Step 2.4, after the point-by-point fusion region of a projection point is obtained, the region is divided uniformly into I rows and J columns, giving I × J grid cells of equal size, and the center point of each grid cell is calculated;
wherein (u_ab, v_ab) are the coordinates of the center point of the grid cell in row a and column b of the point-by-point fusion region; a = 1, 2, ..., I and b = 1, 2, ..., J; h is the height of the point-by-point fusion region and w is its width; and (u, v) are the coordinates of the point cloud projection point within the point-by-point fusion region.
Step 2.5, for each grid center point (u_ab, v_ab), calculating the bilinear interpolation result f_ab_1 and the nearest-neighbor interpolation result f_ab_2 as follows:
f_ab_1 = f_1(u - u_1)(v - v_1) + f_2(u - u_1)(v_2 - v) + f_3(u_2 - u)(v - v_1) + f_4(u_2 - u)(v_2 - v)
wherein (u_1, v_1), (u_1, v_2), (u_2, v_1), (u_2, v_2) are the coordinates of the four point cloud projection points nearest to the grid center point; {d_1, d_2, d_3, d_4} are the distances from these four projection points to the grid center point; f_1, f_2, f_3, f_4 are the feature vectors of the four nearest projection points; and f_ab_2 is the feature vector of the projection point nearest to the grid center point.
Step 2.6, the bilinear interpolation result and the nearest-neighbor interpolation result of each grid center point are weighted and fused to obtain the interpolation feature vector of the corresponding grid cell, as follows:
An aggregation coefficient ξ_ψ is defined for each feature dimension ψ;
wherein N_d is the length of the feature vector; f_ω_ψ, ω ∈ {1, 2, 3, 4}, ψ ∈ {1, 2, ..., N_d}, is the value of the ω-th feature vector f_ω in the ψ-th dimension; μ_ψ is the mean of the values in the ψ-th dimension; σ_ψ is the standard deviation of the values in the ψ-th dimension; and ξ_ψ is the aggregation coefficient corresponding to the ψ-th dimension.
The bilinear interpolation result and the nearest-neighbor interpolation result are then weighted and fused with the aggregation coefficients to obtain the interpolation feature vector f_ab of each grid cell.
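A sketch of steps 2.5 and 2.6 for a single grid cell is shown below; because the aggregation formula itself is not reproduced above, the per-dimension coefficient of variation passed through a sigmoid is used here as an assumed stand-in for ξ_ψ:

```python
# Illustrative sketch of one grid cell's interpolation feature (steps 2.5-2.6).
# The bilinear expression follows the text; the aggregation coefficient is an
# assumption (per-dimension coefficient of variation squashed by a sigmoid).
import numpy as np

def grid_cell_feature(center, neighbor_uv, neighbor_feats):
    """center: (2,); neighbor_uv: (4, 2); neighbor_feats: (4, N_d) -> f_ab: (N_d,)."""
    u, v = center
    (u1, v1), (u2, v2) = neighbor_uv.min(axis=0), neighbor_uv.max(axis=0)
    f1, f2, f3, f4 = neighbor_feats
    f_bi = (f1 * (u - u1) * (v - v1) + f2 * (u - u1) * (v2 - v)
            + f3 * (u2 - u) * (v - v1) + f4 * (u2 - u) * (v2 - v))   # bilinear result f_ab_1
    f_nn = neighbor_feats[np.argmin(np.linalg.norm(neighbor_uv - center, axis=1))]  # f_ab_2
    mu, sigma = neighbor_feats.mean(axis=0), neighbor_feats.std(axis=0)
    xi = sigma / (np.abs(mu) + 1e-6)                 # assumed aggregation coefficient xi_psi
    w = 1.0 / (1.0 + np.exp(-xi))                    # per-dimension fusion weight in (0, 1)
    return w * f_bi + (1.0 - w) * f_nn               # interpolation feature vector f_ab

f_ab = grid_cell_feature(np.array([10.3, 20.7]),
                         np.array([[10.0, 20.0], [10.0, 21.0], [11.0, 20.0], [11.0, 21.0]]),
                         np.random.rand(4, 64))
```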
step 2.7, according to the center point position (u ab ,v ab ) Calculating a center distance measurement coefficient eta corresponding to each grid ab And normalization is achieved, specifically as follows:
center distance measurement coefficient eta corresponding to a row and a column of grids in point-by-point fusion area ab Normalized result eta ab The' calculation formula is as follows:
wherein min (u, u) ab ) Represents u ab And the smaller of u, max (u, u ab ) Represents u ab And the larger of u; min (v, v) ab ) Representing v and v ab In (c) is smaller, max (v, v ab ) Representing v and v ab A larger value of (2); η (eta) ab ' is the normalized center distance metric coefficient; and (u, v) is the coordinates of the point cloud projection points in the point-by-point fusion area.
Step 2.8, the interpolation feature vectors f_ab of all grid cells are weighted by the normalized center distance metric coefficients and summed to obtain the point fusion feature of the corresponding projection point, f_LI = Σ_a Σ_b η'_ab · f_ab;
wherein f_LI is the point fusion feature of the point cloud projection point.
Step 2.9, the point fusion features corresponding to all point cloud projection points are combined to obtain the point-by-point fusion feature F_LI.
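Steps 2.7 to 2.9 can be sketched as follows; the exact form of η_ab is not reproduced above, so the product of the coordinate ratios min/max, normalized to sum to one, is used as an assumption:

```python
# Illustrative sketch of steps 2.7-2.9: weight each grid cell's interpolation
# feature by a normalized center-distance coefficient and sum. The exact form of
# eta_ab is not reproduced in the text; here it is assumed to be
# (min(u, u_ab)/max(u, u_ab)) * (min(v, v_ab)/max(v, v_ab)), normalized to sum to one.
import numpy as np

def point_fusion_feature(proj_uv, grid_centers, grid_feats):
    """proj_uv: (2,); grid_centers: (I*J, 2); grid_feats: (I*J, N_d) -> f_LI: (N_d,)."""
    u, v = proj_uv
    eta_u = np.minimum(u, grid_centers[:, 0]) / np.maximum(u, grid_centers[:, 0])
    eta_v = np.minimum(v, grid_centers[:, 1]) / np.maximum(v, grid_centers[:, 1])
    eta = eta_u * eta_v                              # center distance metric coefficient eta_ab
    eta = eta / eta.sum()                            # normalized coefficient eta'_ab
    return (eta[:, None] * grid_feats).sum(axis=0)   # weighted sum over all grid cells

# The point-by-point fusion feature F_LI is the stack of f_LI over all projection points.
f_LI = point_fusion_feature(np.array([100.0, 200.0]),
                            np.random.rand(16, 2) * [1280.0, 384.0],
                            np.random.rand(16, 64))
```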
Step 3, a scene-driven three-dimensional target detection network is constructed. A time-of-day sensing factor and a scene scale factor are derived from the weather and time conditions and from the target scale in the current scene; according to these two factors, the point-by-point fusion features and the original point cloud features are fused adaptively into global fusion features. The global fusion features are input into a three-dimensional region proposal network to generate regions of interest; the candidate region features are extracted and sent into the three-dimensional target detection network to complete three-dimensional target detection.
Compared with the traditional RGB color space, the YCbCr color space separates luminance from chrominance and is better suited for judging the image quality of the current scene, so the image data are converted from the RGB color space to the YCbCr color space using the standard conversion (e.g. Y = 0.299·R + 0.587·G + 0.114·B, Cb = 0.564·(B - Y) + 128, Cr = 0.713·(R - Y) + 128);
wherein R, G, B are the red, green and blue components in the RGB color space, and Y, Cb, Cr are the luminance, blue-difference and red-difference components in the YCbCr color space.
In general, the image output by the camera has relatively high quality in the daytime, on sunny days and under suitable illumination. Time and weather are therefore used as prior knowledge: an image captured in the daytime on a sunny day under suitable illumination is selected as the prior image. If the image of the current scene (i.e., the input image data) differs strongly from the prior image, the image quality of the current scene is low, and the point-by-point fusion features need to be adaptively adjusted by the original point cloud features. A time-of-day sensing factor α is therefore defined to judge the image quality by comparing, value by value, the Y, Cb and Cr component histograms of the current image with those of the prior image, normalized by the total number of component values;
wherein Y + Cb + Cr denotes the total number of Y, Cb and Cr component values in the current scene image and the prior image, used for normalization; Y_cur_i, Cb_cur_i, Cr_cur_i denote the numbers of pixels whose Y, Cb and Cr components take the value i in the current image; and Y_bas_i, Cb_bas_i, Cr_bas_i denote the numbers of pixels whose Y, Cb and Cr components take the value i in the prior image. A larger α indicates a larger change of the current scene image relative to the prior image and therefore poorer image quality; a smaller α indicates better image quality.
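A hedged sketch of α is given below; since the exact formula is not reproduced above, it assumes α is the normalized sum of absolute histogram differences over the Y, Cb and Cr channels (OpenCV is assumed to be available for the color conversion):

```python
# Illustrative sketch of the time-of-day sensing factor alpha. The exact formula
# is not reproduced in the text; alpha is assumed here to be the normalized sum of
# absolute histogram differences between the current and the prior image over the
# Y, Cr and Cb channels. OpenCV (cv2) is assumed to be available.
import numpy as np
import cv2

def time_of_day_factor(current_bgr, prior_bgr):
    """Both inputs are uint8 BGR images; returns alpha in [0, 1] (larger = poorer quality)."""
    cur = cv2.cvtColor(current_bgr, cv2.COLOR_BGR2YCrCb)
    bas = cv2.cvtColor(prior_bgr, cv2.COLOR_BGR2YCrCb)
    num, den = 0.0, 0.0
    for ch in range(3):                                       # Y, Cr, Cb channels
        h_cur = np.bincount(cur[..., ch].ravel(), minlength=256)
        h_bas = np.bincount(bas[..., ch].ravel(), minlength=256)
        num += np.abs(h_cur - h_bas).sum()
        den += h_cur.sum() + h_bas.sum()                      # total number of component values
    return float(num / den)

alpha = time_of_day_factor(np.random.randint(0, 256, (384, 1280, 3), dtype=np.uint8),
                           np.random.randint(0, 256, (384, 1280, 3), dtype=np.uint8))
```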
However, even in good weather, when the targets in the scene are small or far away, the target point cloud collected even by a high-precision LiDAR can be too sparse. In this case the information about small-scale targets is scarce or missing in the original point cloud features, whereas the point-by-point fusion features can still provide effective semantic information about them; the original point cloud features may then introduce interference during global feature fusion. A scene scale factor β is therefore defined to judge the scale of the targets in the current scene, as follows:
First, semantic segmentation is performed on the current scene image with a U-Net++ network to obtain the category of every pixel. The point cloud is then projected onto the segmentation map output by the U-Net++ network, and the category of each projection point is determined with a K-nearest-neighbor rule. Finally, the scene scale factor β is computed from the numbers of projection points in each category;
wherein β_max is the number of projection points of the category with the most projection points, and β_min is the number of projection points of the category with the fewest projection points. A smaller β indicates a larger scale gap between different targets, less effective information in the original point cloud features, and a smaller proportion for the original point cloud features in the global feature fusion; a larger β indicates a smaller scale gap, more effective information in the original point cloud features, and a larger proportion for them in the global feature fusion.
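A hedged sketch of β follows; the exact formula is not reproduced above, so β = β_min / β_max is assumed, which is consistent with the stated behaviour (smaller β for larger scale gaps). The nearest-pixel label lookup below is a simplification of the K-nearest-neighbor vote described in the text:

```python
# Illustrative sketch of the scene scale factor beta. The exact formula is not
# reproduced in the text; consistent with the description (smaller beta for a
# larger scale gap), beta = beta_min / beta_max over the per-category counts of
# point cloud projection points is assumed. The per-pixel labels come from an
# external segmentation model (e.g. a U-Net++), and the nearest-pixel lookup
# below is a simplification of the K-nearest-neighbor vote described above.
import numpy as np

def scene_scale_factor(proj_uv, seg_map):
    """proj_uv: (L, 2) pixel coordinates (u, v); seg_map: (H, W) per-pixel category labels."""
    H, W = seg_map.shape
    u = np.clip(np.round(proj_uv[:, 0]).astype(int), 0, W - 1)
    v = np.clip(np.round(proj_uv[:, 1]).astype(int), 0, H - 1)
    counts = np.bincount(seg_map[v, u])                 # projection points per category
    counts = counts[counts > 0]                         # keep categories that received points
    return float(counts.min()) / float(counts.max())    # assumed form of beta

beta = scene_scale_factor(np.random.rand(500, 2) * [1280.0, 384.0],
                          np.random.randint(0, 4, (384, 1280)))
```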
By defining these two factors, the effectiveness of the point cloud and the image data in the current space-time scene can be judged, and point cloud-image global feature fusion is performed according to the time-of-day sensing factor and the scene scale factor. The weight map and the adjusted point-by-point registration feature are computed as:
W_map = sigmoid(FC(tanh(F'_LI)))
F''_LI = W_map ⊙ F'_LI
wherein FC is a fully connected layer; ⊙ is the element-wise multiplication operation; ⊕ is the element-wise addition operation; α is the time-of-day sensing factor; F'_LI is the point-by-point registration feature obtained after adjustment by α, element-wise addition and a fully connected layer; tanh is the tanh activation function and sigmoid is the sigmoid activation function; W_map is the point-by-point registration feature weight map; F''_LI is the adjusted point-by-point registration feature; and the feature obtained after the final concatenation is the global fusion feature.
In the global feature fusion network, the two features are first mapped into the same channel dimension by two fully connected layers; the point-by-point registration feature is adaptively adjusted by the time-of-day sensing factor α and added element-wise to the mapped point cloud feature to form a compact feature, which is fed into a fully connected layer; the result is compressed into a weight map and multiplied with the point-by-point registration feature to obtain the adjusted point-by-point registration feature; finally, feature importance is judged by weighting with the scene scale factor β, and feature concatenation completes the global feature fusion.
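The global feature fusion network of FIG. 3 can be sketched as follows; exactly how α and β enter the fusion is not fully reproduced above, so this sketch assumes the mapped point-wise feature is scaled by (1 - α) before the element-wise addition and the mapped point cloud feature is scaled by β before concatenation:

```python
# Illustrative PyTorch sketch of the global feature fusion network (FIG. 3).
# How alpha and beta enter the fusion is not fully reproduced in the text, so this
# sketch assumes the mapped point-wise fusion feature is scaled by (1 - alpha)
# before the element-wise addition, and the mapped point cloud feature is scaled
# by beta before the final concatenation. Channel sizes are assumed values.
import torch
import torch.nn as nn

class GlobalFeatureFusion(nn.Module):
    def __init__(self, dim_li, dim_l, dim=256):
        super().__init__()
        self.fc_li = nn.Linear(dim_li, dim)   # map F_LI into the shared channel dimension
        self.fc_l = nn.Linear(dim_l, dim)     # map F_L into the shared channel dimension
        self.fc_compact = nn.Linear(dim, dim)
        self.fc_weight = nn.Linear(dim, dim)

    def forward(self, f_li, f_l, alpha, beta):
        li, l = self.fc_li(f_li), self.fc_l(f_l)
        f_prime = self.fc_compact((1.0 - alpha) * li + l)            # compact feature -> F'_LI
        w_map = torch.sigmoid(self.fc_weight(torch.tanh(f_prime)))   # W_map
        f_second = w_map * f_prime                                   # F''_LI = W_map (.) F'_LI
        return torch.cat([f_second, beta * l], dim=-1)               # global fusion feature

fusion = GlobalFeatureFusion(dim_li=64, dim_l=64)
global_feature = fusion(torch.randn(128, 64), torch.randn(128, 64), alpha=0.2, beta=0.7)  # (128, 512)
```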
Finally, the global fusion features are input into the three-dimensional region proposal network to generate regions of interest; the candidate region features are extracted and sent into the three-dimensional target detection network to complete three-dimensional target detection.
The above embodiments are only for illustrating the technical idea of the present invention, and the protection scope of the present invention is not limited thereto, and any modification made on the basis of the technical scheme according to the technical idea of the present invention falls within the protection scope of the present invention.

Claims (10)

1. A scene-adaptive three-dimensional target detection method based on radar-vision perception fusion, characterized by comprising the following steps:
step 1, respectively extracting original point cloud characteristics and image characteristics from input point cloud data and image data;
step 2, projecting the input point cloud data onto the image features, and carrying out point-by-point feature fusion on the point cloud projection points to obtain point-by-point fusion features of the point cloud data;
step 3, constructing a scene-driven three-dimensional target detection network, and detecting targets in real time with the trained three-dimensional target detection network.
2. The scene-adaptive three-dimensional target detection method based on radar-vision perception fusion according to claim 1, wherein the specific process of extracting the original point cloud features from the input point cloud data in step 1 is as follows:
for the point cloud data, first one point is randomly selected as the initial point; the remaining points are then sampled in sequence by farthest point sampling; after sampling, each sampled point is taken as a sphere center and a radius is specified to form a sphere, and all the points contained in each sphere are called a cluster; finally, features are extracted for each point in every cluster, max pooling is applied over the channel dimension to obtain the cluster feature, and all cluster features are combined to obtain the original point cloud features.
3. The scene-adaptive three-dimensional target detection method based on radar-vision perception fusion according to claim 1, wherein the specific process of extracting the image features from the input image data in step 1 is as follows:
for the image data, the image is first passed through four cascaded convolution blocks, each followed by a batch normalization layer and a ReLU activation function; the output of the last convolution block is then fed in parallel through four transposed convolutions with different strides to obtain four feature maps of the same resolution, and these four feature maps are concatenated to obtain the image features.
4. The scene-adaptive three-dimensional target detection method based on radar-vision perception fusion according to claim 1, wherein the specific process of step 2 is as follows:
step 2.1, projecting the input point cloud data onto image features;
step 2.2, dividing point-by-point fusion areas of point cloud projection points:
wherein h_i and w_i are, respectively, the height and width of the point-by-point fusion region divided with P'_i, the projection of the i-th point cloud point P_i onto the image features, as its center point; ⌈·⌉ is the round-up (ceiling) function; φ_i is the weight coefficient of P'_i and k_{P'_i} is the discrete coefficient of P'_i; k_min and k_max denote the minimum and maximum values of the discrete coefficient set; H and W denote the height and width of the image data, respectively; i, j ∈ [1, 2, ..., L], where L is the number of point cloud projection points; d_ij is the distance between P'_i and P'_j, the projection of the j-th point cloud point P_j onto the image features;
step 2.3, uniformly dividing the point-by-point fusion area into I rows and J columns, and calculating the coordinates of the central point of each grid;
step 2.4, obtaining a bilinear interpolation result and a nearest neighbor interpolation result of the center point of each grid;
step 2.5, carrying out weighted fusion on the bilinear interpolation result and the nearest neighbor interpolation result of the central point of each grid to obtain an interpolation feature vector of the corresponding grid;
step 2.6, calculating a center distance measurement coefficient corresponding to each grid, and normalizing;
step 2.7, carrying out weighted summation on interpolation feature vectors of each grid according to the normalized center distance measurement coefficient to obtain point fusion features of corresponding point cloud projection points;
step 2.8, splicing the point fusion features of all point cloud projection points to obtain the point-by-point fusion features of the point cloud data.
5. The scene-adaptive three-dimensional target detection method based on radar-vision perception fusion according to claim 4, wherein in step 2.4 the bilinear interpolation result f_ab_1 and the nearest-neighbor interpolation result f_ab_2 of the center point of the grid cell in row a and column b of the point-by-point fusion region are computed as follows:
f_ab_1 = f_1(u - u_1)(v - v_1) + f_2(u - u_1)(v_2 - v) + f_3(u_2 - u)(v - v_1) + f_4(u_2 - u)(v_2 - v)
wherein (u_1, v_1), (u_1, v_2), (u_2, v_1), (u_2, v_2) are the coordinates of the four point cloud projection points nearest to the grid center point; {d_1, d_2, d_3, d_4} are the distances from these four projection points to the grid center point; f_1, f_2, f_3, f_4 are the feature vectors of the four nearest projection points; and f_ab_2 is the feature vector of the projection point nearest to the grid center point.
6. The scene-adaptive three-dimensional target detection method based on radar-vision perception fusion according to claim 4, wherein in step 2.5 the interpolation feature vector f_ab of the grid cell in row a and column b of the point-by-point fusion region is obtained by weighted fusion of f_ab_1 and f_ab_2 using per-dimension aggregation coefficients;
wherein f_ab_1 and f_ab_2 are, respectively, the bilinear interpolation result and the nearest-neighbor interpolation result of the center point of the grid cell in row a and column b of the point-by-point fusion region; ξ_ψ is the aggregation coefficient of the ψ-th feature dimension; N_d is the length of the feature vector; f_ω_ψ, ω ∈ {1, 2, 3, 4}, ψ ∈ {1, 2, ..., N_d}, denotes the value of the ω-th feature vector in the ψ-th dimension; μ_ψ is the mean of the values in the ψ-th dimension; and σ_ψ is the standard deviation of the values in the ψ-th dimension.
7. The scene-adaptive three-dimensional target detection method based on radar-vision perception fusion according to claim 4, wherein in step 2.7 the point fusion feature f_LI of the point cloud projection point is obtained as the weighted sum of the interpolation feature vectors of all grid cells of the point-by-point fusion region, f_LI = Σ_a Σ_b η'_ab · f_ab;
wherein f_ab is the interpolation feature vector of the grid cell in row a and column b of the point-by-point fusion region; η'_ab is the normalized center distance metric coefficient of that grid cell, computed from min(u, u_ab), max(u, u_ab), min(v, v_ab) and max(v, v_ab); (u_ab, v_ab) are the coordinates of the center point of the grid cell in row a and column b; min(u, u_ab) and max(u, u_ab) denote the smaller and the larger of u and u_ab; min(v, v_ab) and max(v, v_ab) denote the smaller and the larger of v and v_ab; and (u, v) are the coordinates of the point cloud projection point within the point-by-point fusion region.
8. The scene-adaptive three-dimensional target detection method based on radar-vision perception fusion according to claim 1, wherein the specific process of step 3 is as follows:
step 3.1, inputting the point-by-point fusion feature F_LI and the original point cloud feature F_L into a global feature fusion network and performing global feature fusion:
first, F_LI and F_L are mapped into the same channel dimension by two fully connected layers;
then, the mapped point-by-point fusion feature is adaptively adjusted by the time-of-day sensing factor α, the two mapped features are added element-wise to form a compact feature, and the compact feature is fed into a fully connected layer to obtain the point-by-point registration feature F'_LI; wherein FC denotes a fully connected layer, ⊙ denotes element-wise multiplication, ⊕ denotes element-wise addition, and α is the time-of-day sensing factor;
next, F'_LI is compressed into a weight map W_map = sigmoid(FC(tanh(F'_LI))), and W_map is multiplied with F'_LI to obtain the adjusted point-by-point registration feature F''_LI = W_map ⊙ F'_LI; wherein tanh and sigmoid are the tanh and sigmoid activation functions and W_map is the point-by-point registration feature weight map;
finally, scale judgment is performed with the scene scale factor β, and feature concatenation yields the global fusion feature;
step 3.2, inputting the global fusion feature into the three-dimensional region proposal network to generate regions of interest, extracting the region features of the regions of interest, and sending them into the three-dimensional target detection network to complete detection.
9. The scene-adaptive three-dimensional target detection method based on radar-vision perception fusion according to claim 8, wherein in step 3.1 the time-of-day sensing factor α is calculated as follows:
(1) the input image data are converted from the RGB color space to the YCbCr color space;
(2) the time-of-day sensing factor α is computed by comparing, value by value, the Y, Cb and Cr component histograms of the input image data with those of the prior image, normalized by the total number of Y, Cb and Cr component values;
wherein Y + Cb + Cr denotes the total number of Y, Cb and Cr component values in the input image data and the prior image, used for normalization; Y_cur_χ, Cb_cur_χ, Cr_cur_χ denote the numbers of pixels whose Y, Cb and Cr components take the value χ in the input image data; and Y_bas_χ, Cb_bas_χ, Cr_bas_χ denote the numbers of pixels whose Y, Cb and Cr components take the value χ in the prior image.
10. The scene-adaptive three-dimensional target detection method based on radar-vision perception fusion according to claim 8, wherein in step 3.1 the scene scale factor β is calculated as follows:
firstly, carrying out semantic segmentation on input image data by using a U-Net++ network to obtain the category of each pixel point in the input image data;
then, the point cloud data are projected onto a segmentation result diagram output by a U-Net++ network, and the category of the projection point is judged by adopting a K nearest neighbor method;
finally, calculating the scene scale factor β from the number of projection points of each category;
wherein β_max is the number of projection points of the category with the most projection points, and β_min is the number of projection points of the category with the fewest projection points.
CN202310437055.8A (priority date 2023-04-21; filing date 2023-04-21) Space-time scene-adaptive three-dimensional target detection method based on radar-vision perception fusion; status: Pending; publication: CN116469095A (en)

Priority Applications (1)

Application number: CN202310437055.8A (CN); priority date: 2023-04-21; filing date: 2023-04-21; title: Space-time scene-adaptive three-dimensional target detection method based on radar-vision perception fusion

Applications Claiming Priority (1)

Application number: CN202310437055.8A (CN); priority date: 2023-04-21; filing date: 2023-04-21; title: Space-time scene-adaptive three-dimensional target detection method based on radar-vision perception fusion

Publications (1)

Publication number: CN116469095A; publication date: 2023-07-21

Family

Family ID: 87173131

Family Applications (1)

Application number: CN202310437055.8A (pending); publication: CN116469095A (en); title: Space-time scene-adaptive three-dimensional target detection method based on radar-vision perception fusion

Country Status (1)

Country: CN; publication: CN116469095A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117388831A (en) * 2023-12-13 2024-01-12 中科视语(北京)科技有限公司 Camera and laser radar combined calibration method and device, electronic equipment and medium
CN117388831B (en) * 2023-12-13 2024-03-15 中科视语(北京)科技有限公司 Camera and laser radar combined calibration method and device, electronic equipment and medium


Legal Events

Code PB01: Publication
Code SE01: Entry into force of request for substantive examination