CN116469095A - Space-time scene-adaptive three-dimensional target detection method based on radar-vision perception fusion

Space-time scene-adaptive three-dimensional target detection method based on radar-vision perception fusion

Info

Publication number
CN116469095A
CN116469095A (application number CN202310437055.8A)
Authority
CN
China
Prior art keywords
point
fusion
point cloud
scene
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310437055.8A
Other languages
Chinese (zh)
Inventor
张晖 (Zhang Hui)
孙恩东 (Sun Endong)
赵海涛 (Zhao Haitao)
朱洪波 (Zhu Hongbo)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202310437055.8A
Publication of CN116469095A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/60 Type of objects
    • G06V 20/64 Three-dimensional objects
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/7715 Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a scene-adaptive three-dimensional target detection method based on radar-vision perception fusion. First, original point cloud features and image features are extracted from the input point cloud data and image data, respectively. The original point cloud is then projected onto the image feature map by a matrix transformation, and point-by-point feature fusion is performed at the point cloud projection points to obtain the point-by-point fusion features of the point cloud. Next, a scene-driven three-dimensional target detection network is constructed: a time-of-day sensing factor and a scene scale factor are derived from the weather and time conditions and from the target scale in the current scene, and global feature fusion of the point-by-point fusion features with the original point cloud features is performed adaptively according to these two factors; the global fusion features are input into a three-dimensional region proposal network to generate regions of interest, and the candidate region features are extracted and sent into the three-dimensional target detection network to complete three-dimensional target detection. Finally, the trained three-dimensional target detection network is used to detect targets in real time.

Description

Space-time scene-adaptive three-dimensional target detection method based on radar-vision perception fusion
Technical Field
The invention relates to a scene-adaptive three-dimensional target detection method based on radar-vision perception fusion and belongs to the field of deep learning.
Background
In recent years, the rapid development of deep learning and the availability of high-performance graphics cards have greatly advanced three-dimensional target detection methods. Three-dimensional target detection based on multi-source data can automatically fuse target features so that information from different modalities complements each other, which considerably improves detection accuracy and scene applicability; deep-learning-based three-dimensional target detection has therefore become a popular research direction.
The central problem for three-dimensional target detection based on multi-source data is how to use the multi-source data accurately and efficiently so that the modalities compensate for each other's inherent shortcomings and the overall accuracy improves. Most existing fusion methods directly project the point cloud onto a two-dimensional plane, extract features from this point cloud plane and from the image, and simply concatenate them; this suffers from inconsistent data distributions across modalities, information redundancy in direct matching, and many interference factors. In addition, scenes with heavy occlusion or low brightness contain more feature noise, and harmful information can be introduced during feature fusion, which reduces the accuracy of the detection results.
Disclosure of Invention
In order to overcome the defects of existing methods, the invention provides a scene-adaptive three-dimensional target detection method based on radar-vision perception fusion. First, original point cloud features and image features are extracted from the input point cloud data and image data, respectively. The original point cloud is then projected onto the image feature map by a matrix transformation, and point-by-point feature fusion is performed at the point cloud projection points to obtain the point-by-point fusion features of the point cloud. Next, a scene-driven three-dimensional target detection network is constructed: a time-of-day sensing factor and a scene scale factor are derived from the weather and time conditions and from the target scale in the current scene, and global feature fusion of the point-by-point fusion features with the original point cloud features is performed adaptively according to these two factors; the global fusion features are input into a three-dimensional region proposal network to generate regions of interest, and the candidate region features are extracted and sent into the three-dimensional target detection network to complete three-dimensional target detection. Finally, the trained three-dimensional target detection network is used to detect targets in real time.
In order to solve the technical problems, the invention adopts the following technical scheme:
a scene-adaptive three-dimensional target detection method based on radar-vision perception fusion comprises the following steps:
step 1, respectively extracting original point cloud characteristics and image characteristics from input point cloud data and image data;
step 2, projecting the input point cloud data onto the image features, and carrying out point-by-point feature fusion on the point cloud projection points to obtain point-by-point fusion features of the point cloud data;
step 3, constructing a scene-driven three-dimensional target detection network, and detecting targets in real time with the trained three-dimensional target detection network.
Further, the specific process of extracting the original point cloud features from the input point cloud data in the step 1 is as follows:
for the point cloud data, first one point is randomly selected as the initial point; the remaining points are then sampled in sequence by farthest point sampling; after sampling, each sampled point is taken as a sphere center and a radius is specified to form a sphere, and all the points contained in each sphere are called a cluster; finally, features are extracted for each point in every cluster, max pooling is applied over the channel dimension to obtain the cluster feature, and all cluster features are combined to obtain the original point cloud features.
Further, the specific process of extracting the image features from the input image data in the step 1 is as follows:
for the image data, the image is first passed through four cascaded convolution blocks, each followed by a batch normalization layer and a ReLU activation function; the output of the last convolution block is then fed in parallel through four transposed convolutions with different strides to obtain four feature maps of the same resolution, and these four feature maps are concatenated to obtain the image features.
Further, the specific process of the step 2 is as follows:
step 2.1, projecting the input point cloud data onto the image features;
step 2.2, dividing point-by-point fusion areas of point cloud projection points:
wherein h_i and w_i are, respectively, the height and width of the point-by-point fusion region divided with P'_i, the projection of the i-th point cloud point P_i onto the image features, as its center point; ⌈·⌉ is the round-up (ceiling) function; φ_i is the weight coefficient of P'_i and k_{P'_i} is the discrete coefficient of P'_i; k_min and k_max denote the minimum and maximum values of the discrete coefficient set; H and W denote the height and width of the image data, respectively; i, j ∈ [1, 2, ..., L], where L is the number of point cloud projection points; d_ij is the distance between P'_i and P'_j, the projection of the j-th point cloud point P_j onto the image features;
step 2.3, uniformly dividing the point-by-point fusion area into I rows and J columns, and calculating the coordinates of the central point of each grid;
step 2.4, obtaining a bilinear interpolation result and a nearest neighbor interpolation result of the center point of each grid;
step 2.5, carrying out weighted fusion on the bilinear interpolation result and the nearest neighbor interpolation result of the central point of each grid to obtain an interpolation feature vector of the corresponding grid;
step 2.6, calculating a center distance measurement coefficient corresponding to each grid, and normalizing;
step 2.7, carrying out weighted summation on interpolation feature vectors of each grid according to the normalized center distance measurement coefficient to obtain point fusion features of corresponding point cloud projection points;
step 2.8, splicing the point fusion features of all point cloud projection points to obtain the point-by-point fusion features of the point cloud data.
Further, in the step 2.1, the input point cloud data is projected onto the image feature according to the following formula:
P'_i = K_r [R_c | T_c] P_i
wherein P'_i is the projection point of the i-th point cloud point P_i on the image features; K_r is the camera intrinsic matrix; R_c is the rotation matrix and T_c is the translation matrix.
Further, the calculation formula of the coordinates of the lattice center point in the step 2.3 is as follows:
wherein (u_ab, v_ab) are the coordinates of the center point of the grid cell in row a and column b of the point-by-point fusion region; a = 1, 2, ..., I and b = 1, 2, ..., J; h is the height of the point-by-point fusion region and w is its width; and (u, v) are the coordinates of the point cloud projection point within the point-by-point fusion region.
Further, in step 2.4, the bilinear interpolation result f_ab_1 and the nearest-neighbor interpolation result f_ab_2 of the center point of the grid cell in row a and column b of the point-by-point fusion region are computed as follows:
f_ab_1 = f_1(u - u_1)(v - v_1) + f_2(u - u_1)(v_2 - v) + f_3(u_2 - u)(v - v_1) + f_4(u_2 - u)(v_2 - v)
wherein (u_1, v_1), (u_1, v_2), (u_2, v_1), (u_2, v_2) are the coordinates of the four point cloud projection points nearest to the grid center point; {d_1, d_2, d_3, d_4} are the distances from these four projection points to the grid center point; f_1, f_2, f_3, f_4 are the feature vectors of the four nearest projection points; and f_ab_2 is the feature vector of the projection point nearest to the grid center point.
Further, in step 2.5, the interpolation feature vector f_ab of the grid cell in row a and column b of the point-by-point fusion region is obtained by weighted fusion of f_ab_1 and f_ab_2 using per-dimension aggregation coefficients;
wherein f_ab_1 and f_ab_2 are, respectively, the bilinear interpolation result and the nearest-neighbor interpolation result of the center point of the grid cell in row a and column b of the point-by-point fusion region; ξ_ψ is the aggregation coefficient of the ψ-th feature dimension; N_d is the length of the feature vector; f_ω_ψ denotes the value of the ω-th feature vector in the ψ-th dimension; μ_ψ is the mean of the values in the ψ-th dimension; and σ_ψ is the standard deviation of the values in the ψ-th dimension.
Further, in step 2.6, the center distance metric coefficient η_ab of the grid cell in row a and column b of the point-by-point fusion region is computed from min(u, u_ab), max(u, u_ab), min(v, v_ab) and max(v, v_ab), and η'_ab denotes its normalized value;
wherein (u_ab, v_ab) are the coordinates of the center point of the grid cell in row a and column b of the point-by-point fusion region; min(u, u_ab) and max(u, u_ab) denote the smaller and the larger of u and u_ab; min(v, v_ab) and max(v, v_ab) denote the smaller and the larger of v and v_ab; and (u, v) are the coordinates of the point cloud projection point within the point-by-point fusion region.
Further, in step 2.7, the point fusion feature f_LI of the point cloud projection point is obtained as the weighted sum of the interpolation feature vectors of all grid cells, f_LI = Σ_a Σ_b η'_ab · f_ab;
wherein η'_ab is the normalized center distance metric coefficient of the grid cell in row a and column b of the point-by-point fusion region, and f_ab is the interpolation feature vector of the grid cell in row a and column b of the point-by-point fusion region.
Further, in the step 2, before projecting the input point cloud data onto the image feature, the method further includes performing point cloud filtering on the input point cloud data.
Further, the specific process of the step 3 is as follows:
step 3.1, inputting the point-by-point fusion feature F_LI and the original point cloud feature F_L into a global feature fusion network and performing global feature fusion:
first, F_LI and F_L are mapped into the same channel dimension by two fully connected layers;
then, the mapped point-by-point fusion feature is adaptively adjusted by the time-of-day sensing factor α, the two mapped features are added element-wise to form a compact feature, and the compact feature is fed into a fully connected layer to obtain the point-by-point registration feature F'_LI; wherein FC denotes a fully connected layer, ⊙ denotes element-wise multiplication, ⊕ denotes element-wise addition, and α is the time-of-day sensing factor;
next, F'_LI is compressed into a weight map W_map = sigmoid(FC(tanh(F'_LI))), and W_map is multiplied with F'_LI to obtain the adjusted point-by-point registration feature F''_LI = W_map ⊙ F'_LI; wherein tanh and sigmoid are the tanh and sigmoid activation functions and W_map is the point-by-point registration feature weight map;
finally, scale judgment is performed with the scene scale factor β, and feature concatenation yields the global fusion feature;
step 3.2, inputting the global fusion feature into the three-dimensional region proposal network to generate regions of interest, extracting the region features of the regions of interest, and sending them into the three-dimensional target detection network to complete detection;
further, in the step 3.1, the method for calculating the time of day sensing factor α is as follows:
(1) Converting input image data from an RGB color space to a YCbCr color space;
(2) The time-of-day sensing factor α is computed by comparing, value by value, the Y, Cb and Cr component histograms of the input image data with those of the prior image, normalized by the total number of Y, Cb and Cr component values;
wherein Y + Cb + Cr denotes the total number of Y, Cb and Cr component values in the input image data and the prior image, used for normalization; Y_cur_χ, Cb_cur_χ, Cr_cur_χ denote the numbers of pixels whose Y, Cb and Cr components take the value χ in the input image data; and Y_bas_χ, Cb_bas_χ, Cr_bas_χ denote the numbers of pixels whose Y, Cb and Cr components take the value χ in the prior image.
Further, in the step 3.1, the calculation process of the scene scale factor β is as follows:
firstly, carrying out semantic segmentation on input image data by using a U-Net++ network to obtain the category of each pixel point in the input image data;
then, the point cloud data are projected onto a segmentation result diagram output by a U-Net++ network, and the category of the projection point is judged by adopting a K nearest neighbor method;
finally, calculating the scene scale factor β from the number of projection points of each category;
wherein β_max is the number of projection points of the category with the most projection points, and β_min is the number of projection points of the category with the fewest projection points.
Compared with the prior art, the technical scheme provided by the invention has the following technical effects:
The invention achieves finer-grained fusion of the point cloud and the image data, obtains more detailed information about the targets, and improves the accuracy of three-dimensional target detection. At the same time, the time-of-day sensing factor and the scene scale factor are proposed for different scenes, so the algorithm is more robust in complex scenes and has good application prospects in intelligent driving scenarios.
Drawings
FIG. 1 is a flow chart of the scene-adaptive three-dimensional target detection method based on radar-vision perception fusion according to the present invention;
FIG. 2 is a schematic diagram of a fused region segmentation;
fig. 3 is a global feature fusion network.
Detailed Description
The invention will be described in further detail with reference to the accompanying drawings and examples.
As shown in FIG. 1, the scene-adaptive three-dimensional target detection method based on radar-vision perception fusion comprises the following steps:
and step 1, respectively extracting original point cloud characteristics and image characteristics from the input point cloud data and image data.
For the point cloud data, a point is first selected at random as the initial point; points are then sampled from the input point cloud in sequence by farthest point sampling. After sampling, each sampled point is taken as a sphere center and a specified radius forms a sphere; all points contained in a sphere are called a cluster. Finally, features are extracted for every point in each cluster, max pooling is applied over the channel dimension to obtain the cluster feature, and all cluster features are combined to obtain the original point cloud features.
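As an illustration of this grouping step, the following sketch (not the patented implementation; the cluster count, radius and feature dimension are assumed values, and the learned per-point MLP is replaced by a fixed random projection) shows farthest point sampling, ball-query clustering and channel-wise max pooling in Python:

```python
# Illustrative sketch only (not the patented implementation): farthest point
# sampling, ball-query clustering and channel-wise max pooling. The learned
# per-point MLP is replaced by a fixed random projection; all sizes are assumed.
import numpy as np

def farthest_point_sampling(points, n_samples):
    """points: (N, 3). Returns indices of n_samples points chosen by FPS."""
    n = points.shape[0]
    chosen = [np.random.randint(n)]          # random initial point
    dists = np.full(n, np.inf)
    for _ in range(n_samples - 1):
        dists = np.minimum(dists, np.linalg.norm(points - points[chosen[-1]], axis=1))
        chosen.append(int(np.argmax(dists)))  # farthest point from the chosen set
    return np.array(chosen)

def extract_point_cloud_features(points, n_centers=128, radius=1.0, feat_dim=64):
    """Ball-query clusters around FPS centers, max-pooled over the channel dimension."""
    rng = np.random.default_rng(0)
    proj = rng.standard_normal((3, feat_dim))                 # stand-in for the point-wise MLP
    centers = points[farthest_point_sampling(points, n_centers)]
    cluster_feats = []
    for c in centers:
        mask = np.linalg.norm(points - c, axis=1) <= radius   # ball query
        cluster = points[mask]
        point_feats = np.maximum(cluster @ proj, 0.0)         # per-point embedding + ReLU
        cluster_feats.append(point_feats.max(axis=0))         # channel-wise max pooling
    return np.stack(cluster_feats)                            # (n_centers, feat_dim)

# Example: 2048 random points -> (128, 64) original point cloud feature matrix
point_features = extract_point_cloud_features(np.random.rand(2048, 3) * 10.0)
```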
For the image data, feature extraction uses four cascaded convolution blocks, each consisting of two 3×3 convolution layers followed by a batch normalization layer and a ReLU activation function. Four parallel transposed convolutions with different strides then restore the feature maps of different scales to the same resolution, and these maps are concatenated to obtain image features containing richer semantic information.
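A minimal PyTorch-style sketch of this image branch is given below; the channel widths, strides and the choice of applying one transposed convolution per block output are assumptions, not values taken from the patent:

```python
# Illustrative PyTorch sketch of the image branch: four cascaded convolution
# blocks (each with BatchNorm + ReLU), then four transposed convolutions with
# different strides whose outputs are concatenated. Channel widths and strides
# are assumptions, not values from the patent.
import torch
import torch.nn as nn

class ImageBranch(nn.Module):
    def __init__(self, in_ch=3, ch=(16, 32, 64, 128)):
        super().__init__()
        blocks, prev = [], in_ch
        for c in ch:
            blocks.append(nn.Sequential(
                nn.Conv2d(prev, c, 3, stride=2, padding=1),   # downsample once per block
                nn.Conv2d(c, c, 3, padding=1),
                nn.BatchNorm2d(c),
                nn.ReLU(inplace=True)))
            prev = c
        self.blocks = nn.ModuleList(blocks)
        # one transposed convolution per scale, restoring all maps to 1/2 resolution
        self.up = nn.ModuleList([
            nn.ConvTranspose2d(ch[i], 32, kernel_size=2 ** i, stride=2 ** i)
            for i in range(len(ch))])

    def forward(self, x):
        feats = []
        for blk in self.blocks:
            x = blk(x)
            feats.append(x)
        upsampled = [up(f) for up, f in zip(self.up, feats)]
        return torch.cat(upsampled, dim=1)                    # concatenated image features

image_features = ImageBranch()(torch.randn(1, 3, 384, 1280))  # -> (1, 128, 192, 640)
```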
Step 2, projecting the input point cloud data onto the image features through a matrix transformation, and performing point-by-point feature fusion at the point cloud projection points to obtain the point-by-point fusion features of the point cloud.
Step 2.1, performing point cloud filtering on the original point cloud data.
Step 2.2, projecting the filtered point cloud onto the image features according to the following formula:
P'_i = K_r [R_c | T_c] P_i
wherein P'_i is the projection point of the i-th point cloud point P_i on the image features; K_r is the camera intrinsic matrix; R_c is the rotation matrix and T_c is the translation matrix.
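The projection can be sketched as follows; the intrinsic matrix K_r and the identity extrinsics used in the example are placeholder values, not calibration data from the patent:

```python
# Illustrative sketch of the projection P'_i = K_r [R_c | T_c] P_i. The intrinsic
# matrix and identity extrinsics below are placeholder values, not calibration data.
import numpy as np

def project_points(points_lidar, K_r, R_c, T_c):
    """points_lidar: (N, 3). Returns (M, 2) pixel coordinates and the kept indices."""
    P = np.hstack([points_lidar, np.ones((points_lidar.shape[0], 1))])  # homogeneous (N, 4)
    extrinsic = np.hstack([R_c, T_c.reshape(3, 1)])                     # [R_c | T_c], (3, 4)
    cam = (extrinsic @ P.T).T                                           # camera coordinates
    keep = cam[:, 2] > 0                                                # points in front of the camera
    uvw = (K_r @ cam[keep].T).T
    return uvw[:, :2] / uvw[:, 2:3], np.where(keep)[0]

K_r = np.array([[700.0, 0.0, 640.0],
                [0.0, 700.0, 192.0],
                [0.0, 0.0, 1.0]])                                       # assumed intrinsics
uv, kept = project_points(np.random.rand(500, 3) * [20.0, 5.0, 30.0], K_r, np.eye(3), np.zeros(3))
```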
Step 2.3, calculating the point-by-point fusion region of each projection point from its discrete coefficient, as follows:
For the i-th point cloud point P_i, a discrete coefficient k_{P'_i} is defined for its projection point P'_i on the image features from the distances between P'_i and the other projection points;
wherein d_ij is the distance between P'_i and P'_j, the projection of the j-th point cloud point P_j onto the image features; L is the number of point cloud projection points on the image; H and W denote the height and width of the input image data, respectively; and (x_i, y_i), (x_j, y_j) are the coordinates of P'_i and P'_j. Repeating this calculation for all projection points gives the discrete coefficient set {k_{P'_1}, k_{P'_2}, ..., k_{P'_L}}.
Further, the discrete coefficient of each point cloud projection point is normalized and encoded to obtain the weight coefficient of that projection point, and the weight coefficient is used to determine the size of its point-by-point fusion region;
wherein φ_i is the weight coefficient of the projection point P'_i; k_min and k_max denote the minimum and maximum values of the discrete coefficient set, respectively; h_i and w_i are, respectively, the height and width of the point-by-point fusion region divided with P'_i as its center point; and ⌈·⌉ is the round-up (ceiling) function.
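Since the exact expressions for the discrete coefficient and for h_i and w_i are not reproduced in the text above, the following sketch only illustrates the idea under stated assumptions (mean pairwise distance normalized by the image diagonal as the discrete coefficient, and a region size growing linearly with the normalized weight coefficient):

```python
# Illustrative sketch only: the patent's exact formulas for the discrete
# coefficient k and the region size (h_i, w_i) are not reproduced in the text.
# Assumptions: k_i is the mean distance from projection i to all other
# projections, normalized by the image diagonal; the region size grows linearly
# with the min-max normalized weight coefficient phi_i and is rounded up.
import numpy as np

def fusion_region_sizes(uv, H, W, max_frac=0.05):
    """uv: (L, 2) projection coordinates. Returns per-point region heights and widths."""
    d = np.linalg.norm(uv[:, None, :] - uv[None, :, :], axis=-1)   # pairwise distances d_ij
    k = d.sum(axis=1) / (len(uv) - 1) / np.hypot(H, W)             # assumed discrete coefficient
    phi = (k - k.min()) / (k.max() - k.min() + 1e-12)              # weight coefficient phi_i
    h = np.ceil(phi * max_frac * H).astype(int) + 1                # region height (rounded up)
    w = np.ceil(phi * max_frac * W).astype(int) + 1                # region width (rounded up)
    return h, w

h_i, w_i = fusion_region_sizes(np.random.rand(200, 2) * [1280.0, 384.0], H=384, W=1280)
```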
Step 2.4, after the point-by-point fusion region of a projection point is obtained, the region is divided uniformly into I rows and J columns, giving I × J grid cells of equal size, and the center point of each grid cell is calculated;
wherein (u_ab, v_ab) are the coordinates of the center point of the grid cell in row a and column b of the point-by-point fusion region; a = 1, 2, ..., I and b = 1, 2, ..., J; h is the height of the point-by-point fusion region and w is its width; and (u, v) are the coordinates of the point cloud projection point within the point-by-point fusion region.
Step 2.5, for each grid center point (u_ab, v_ab), calculating the bilinear interpolation result f_ab_1 and the nearest-neighbor interpolation result f_ab_2 as follows:
f_ab_1 = f_1(u - u_1)(v - v_1) + f_2(u - u_1)(v_2 - v) + f_3(u_2 - u)(v - v_1) + f_4(u_2 - u)(v_2 - v)
wherein (u_1, v_1), (u_1, v_2), (u_2, v_1), (u_2, v_2) are the coordinates of the four point cloud projection points nearest to the grid center point; {d_1, d_2, d_3, d_4} are the distances from these four projection points to the grid center point; f_1, f_2, f_3, f_4 are the feature vectors of the four nearest projection points; and f_ab_2 is the feature vector of the projection point nearest to the grid center point.
Step 2.6, the bilinear interpolation result and the nearest-neighbor interpolation result of each grid center point are weighted and fused to obtain the interpolation feature vector of the corresponding grid cell, as follows:
An aggregation coefficient ξ_ψ is defined for each feature dimension ψ;
wherein N_d is the length of the feature vector; f_ω_ψ, ω ∈ {1, 2, 3, 4}, ψ ∈ {1, 2, ..., N_d}, is the value of the ω-th feature vector f_ω in the ψ-th dimension; μ_ψ is the mean of the values in the ψ-th dimension; σ_ψ is the standard deviation of the values in the ψ-th dimension; and ξ_ψ is the aggregation coefficient corresponding to the ψ-th dimension.
The bilinear interpolation result and the nearest-neighbor interpolation result are then weighted and fused with the aggregation coefficients to obtain the interpolation feature vector f_ab of each grid cell.
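A sketch of steps 2.5 and 2.6 for a single grid cell is shown below; because the aggregation formula itself is not reproduced above, the per-dimension coefficient of variation passed through a sigmoid is used here as an assumed stand-in for ξ_ψ:

```python
# Illustrative sketch of one grid cell's interpolation feature (steps 2.5-2.6).
# The bilinear expression follows the text; the aggregation coefficient is an
# assumption (per-dimension coefficient of variation squashed by a sigmoid).
import numpy as np

def grid_cell_feature(center, neighbor_uv, neighbor_feats):
    """center: (2,); neighbor_uv: (4, 2); neighbor_feats: (4, N_d) -> f_ab: (N_d,)."""
    u, v = center
    (u1, v1), (u2, v2) = neighbor_uv.min(axis=0), neighbor_uv.max(axis=0)
    f1, f2, f3, f4 = neighbor_feats
    f_bi = (f1 * (u - u1) * (v - v1) + f2 * (u - u1) * (v2 - v)
            + f3 * (u2 - u) * (v - v1) + f4 * (u2 - u) * (v2 - v))   # bilinear result f_ab_1
    f_nn = neighbor_feats[np.argmin(np.linalg.norm(neighbor_uv - center, axis=1))]  # f_ab_2
    mu, sigma = neighbor_feats.mean(axis=0), neighbor_feats.std(axis=0)
    xi = sigma / (np.abs(mu) + 1e-6)                 # assumed aggregation coefficient xi_psi
    w = 1.0 / (1.0 + np.exp(-xi))                    # per-dimension fusion weight in (0, 1)
    return w * f_bi + (1.0 - w) * f_nn               # interpolation feature vector f_ab

f_ab = grid_cell_feature(np.array([10.3, 20.7]),
                         np.array([[10.0, 20.0], [10.0, 21.0], [11.0, 20.0], [11.0, 21.0]]),
                         np.random.rand(4, 64))
```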
step 2.7, according to the center point position (u ab ,v ab ) Calculating a center distance measurement coefficient eta corresponding to each grid ab And normalization is achieved, specifically as follows:
center distance measurement coefficient eta corresponding to a row and a column of grids in point-by-point fusion area ab Normalized result eta ab The' calculation formula is as follows:
wherein min (u, u) ab ) Represents u ab And the smaller of u, max (u, u ab ) Represents u ab And the larger of u; min (v, v) ab ) Representing v and v ab In (c) is smaller, max (v, v ab ) Representing v and v ab A larger value of (2); η (eta) ab ' is the normalized center distance metric coefficient; and (u, v) is the coordinates of the point cloud projection points in the point-by-point fusion area.
Step 2.8, the interpolation feature vectors f_ab of all grid cells are weighted by the normalized center distance metric coefficients and summed to obtain the point fusion feature of the corresponding projection point, f_LI = Σ_a Σ_b η'_ab · f_ab;
wherein f_LI is the point fusion feature of the point cloud projection point.
Step 2.9, the point fusion features corresponding to all point cloud projection points are combined to obtain the point-by-point fusion feature F_LI.
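Steps 2.7 to 2.9 can be sketched as follows; the exact form of η_ab is not reproduced above, so the product of the coordinate ratios min/max, normalized to sum to one, is used as an assumption:

```python
# Illustrative sketch of steps 2.7-2.9: weight each grid cell's interpolation
# feature by a normalized center-distance coefficient and sum. The exact form of
# eta_ab is not reproduced in the text; here it is assumed to be
# (min(u, u_ab)/max(u, u_ab)) * (min(v, v_ab)/max(v, v_ab)), normalized to sum to one.
import numpy as np

def point_fusion_feature(proj_uv, grid_centers, grid_feats):
    """proj_uv: (2,); grid_centers: (I*J, 2); grid_feats: (I*J, N_d) -> f_LI: (N_d,)."""
    u, v = proj_uv
    eta_u = np.minimum(u, grid_centers[:, 0]) / np.maximum(u, grid_centers[:, 0])
    eta_v = np.minimum(v, grid_centers[:, 1]) / np.maximum(v, grid_centers[:, 1])
    eta = eta_u * eta_v                              # center distance metric coefficient eta_ab
    eta = eta / eta.sum()                            # normalized coefficient eta'_ab
    return (eta[:, None] * grid_feats).sum(axis=0)   # weighted sum over all grid cells

# The point-by-point fusion feature F_LI is the stack of f_LI over all projection points.
f_LI = point_fusion_feature(np.array([100.0, 200.0]),
                            np.random.rand(16, 2) * [1280.0, 384.0],
                            np.random.rand(16, 64))
```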
Step 3, a scene-driven three-dimensional target detection network is constructed. A time-of-day sensing factor and a scene scale factor are derived from the weather and time conditions and from the target scale in the current scene; according to these two factors, the point-by-point fusion features and the original point cloud features are fused adaptively into global fusion features. The global fusion features are input into a three-dimensional region proposal network to generate regions of interest; the candidate region features are extracted and sent into the three-dimensional target detection network to complete three-dimensional target detection.
Compared with the traditional RGB color space, the YCbCr color space separates luminance from chrominance and is better suited for judging the image quality of the current scene, so the image data are converted from the RGB color space to the YCbCr color space using the standard conversion (e.g. Y = 0.299·R + 0.587·G + 0.114·B, Cb = 0.564·(B - Y) + 128, Cr = 0.713·(R - Y) + 128);
wherein R, G, B are the red, green and blue components in the RGB color space, and Y, Cb, Cr are the luminance, blue-difference and red-difference components in the YCbCr color space.
In general, the image output by the camera has relatively high quality in the daytime, on sunny days and under suitable illumination. Time and weather are therefore used as prior knowledge: an image captured in the daytime on a sunny day under suitable illumination is selected as the prior image. If the image of the current scene (i.e., the input image data) differs strongly from the prior image, the image quality of the current scene is low, and the point-by-point fusion features need to be adaptively adjusted by the original point cloud features. A time-of-day sensing factor α is therefore defined to judge the image quality by comparing, value by value, the Y, Cb and Cr component histograms of the current image with those of the prior image, normalized by the total number of component values;
wherein Y + Cb + Cr denotes the total number of Y, Cb and Cr component values in the current scene image and the prior image, used for normalization; Y_cur_i, Cb_cur_i, Cr_cur_i denote the numbers of pixels whose Y, Cb and Cr components take the value i in the current image; and Y_bas_i, Cb_bas_i, Cr_bas_i denote the numbers of pixels whose Y, Cb and Cr components take the value i in the prior image. A larger α indicates a larger change of the current scene image relative to the prior image and therefore poorer image quality; a smaller α indicates better image quality.
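A hedged sketch of α is given below; since the exact formula is not reproduced above, it assumes α is the normalized sum of absolute histogram differences over the Y, Cb and Cr channels (OpenCV is assumed to be available for the color conversion):

```python
# Illustrative sketch of the time-of-day sensing factor alpha. The exact formula
# is not reproduced in the text; alpha is assumed here to be the normalized sum of
# absolute histogram differences between the current and the prior image over the
# Y, Cr and Cb channels. OpenCV (cv2) is assumed to be available.
import numpy as np
import cv2

def time_of_day_factor(current_bgr, prior_bgr):
    """Both inputs are uint8 BGR images; returns alpha in [0, 1] (larger = poorer quality)."""
    cur = cv2.cvtColor(current_bgr, cv2.COLOR_BGR2YCrCb)
    bas = cv2.cvtColor(prior_bgr, cv2.COLOR_BGR2YCrCb)
    num, den = 0.0, 0.0
    for ch in range(3):                                       # Y, Cr, Cb channels
        h_cur = np.bincount(cur[..., ch].ravel(), minlength=256)
        h_bas = np.bincount(bas[..., ch].ravel(), minlength=256)
        num += np.abs(h_cur - h_bas).sum()
        den += h_cur.sum() + h_bas.sum()                      # total number of component values
    return float(num / den)

alpha = time_of_day_factor(np.random.randint(0, 256, (384, 1280, 3), dtype=np.uint8),
                           np.random.randint(0, 256, (384, 1280, 3), dtype=np.uint8))
```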
However, even in good weather, when the targets in the scene are small or far away, the target point cloud collected even by a high-precision LiDAR can be too sparse. In this case the information about small-scale targets is scarce or missing in the original point cloud features, whereas the point-by-point fusion features can still provide effective semantic information about them; the original point cloud features may then introduce interference during global feature fusion. A scene scale factor β is therefore defined to judge the scale of the targets in the current scene, as follows:
First, semantic segmentation is performed on the current scene image with a U-Net++ network to obtain the category of every pixel. The point cloud is then projected onto the segmentation map output by the U-Net++ network, and the category of each projection point is determined with a K-nearest-neighbor rule. Finally, the scene scale factor β is computed from the numbers of projection points in each category;
wherein β_max is the number of projection points of the category with the most projection points, and β_min is the number of projection points of the category with the fewest projection points. A smaller β indicates a larger scale gap between different targets, less effective information in the original point cloud features, and a smaller proportion for the original point cloud features in the global feature fusion; a larger β indicates a smaller scale gap, more effective information in the original point cloud features, and a larger proportion for them in the global feature fusion.
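A hedged sketch of β follows; the exact formula is not reproduced above, so β = β_min / β_max is assumed, which is consistent with the stated behaviour (smaller β for larger scale gaps). The nearest-pixel label lookup below is a simplification of the K-nearest-neighbor vote described in the text:

```python
# Illustrative sketch of the scene scale factor beta. The exact formula is not
# reproduced in the text; consistent with the description (smaller beta for a
# larger scale gap), beta = beta_min / beta_max over the per-category counts of
# point cloud projection points is assumed. The per-pixel labels come from an
# external segmentation model (e.g. a U-Net++), and the nearest-pixel lookup
# below is a simplification of the K-nearest-neighbor vote described above.
import numpy as np

def scene_scale_factor(proj_uv, seg_map):
    """proj_uv: (L, 2) pixel coordinates (u, v); seg_map: (H, W) per-pixel category labels."""
    H, W = seg_map.shape
    u = np.clip(np.round(proj_uv[:, 0]).astype(int), 0, W - 1)
    v = np.clip(np.round(proj_uv[:, 1]).astype(int), 0, H - 1)
    counts = np.bincount(seg_map[v, u])                 # projection points per category
    counts = counts[counts > 0]                         # keep categories that received points
    return float(counts.min()) / float(counts.max())    # assumed form of beta

beta = scene_scale_factor(np.random.rand(500, 2) * [1280.0, 384.0],
                          np.random.randint(0, 4, (384, 1280)))
```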
By defining these two factors, the effectiveness of the point cloud and the image data in the current space-time scene can be judged, and point cloud-image global feature fusion is performed according to the time-of-day sensing factor and the scene scale factor. The weight map and the adjusted point-by-point registration feature are computed as:
W_map = sigmoid(FC(tanh(F'_LI)))
F''_LI = W_map ⊙ F'_LI
wherein FC is a fully connected layer; ⊙ is the element-wise multiplication operation; ⊕ is the element-wise addition operation; α is the time-of-day sensing factor; F'_LI is the point-by-point registration feature obtained after adjustment by α, element-wise addition and a fully connected layer; tanh is the tanh activation function and sigmoid is the sigmoid activation function; W_map is the point-by-point registration feature weight map; F''_LI is the adjusted point-by-point registration feature; and the feature obtained after the final concatenation is the global fusion feature.
In the global feature fusion network, the two features are first mapped into the same channel dimension by two fully connected layers; the point-by-point registration feature is adaptively adjusted by the time-of-day sensing factor α and added element-wise to the mapped point cloud feature to form a compact feature, which is fed into a fully connected layer; the result is compressed into a weight map and multiplied with the point-by-point registration feature to obtain the adjusted point-by-point registration feature; finally, feature importance is judged by weighting with the scene scale factor β, and feature concatenation completes the global feature fusion.
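The global feature fusion network of FIG. 3 can be sketched as follows; exactly how α and β enter the fusion is not fully reproduced above, so this sketch assumes the mapped point-wise feature is scaled by (1 - α) before the element-wise addition and the mapped point cloud feature is scaled by β before concatenation:

```python
# Illustrative PyTorch sketch of the global feature fusion network (FIG. 3).
# How alpha and beta enter the fusion is not fully reproduced in the text, so this
# sketch assumes the mapped point-wise fusion feature is scaled by (1 - alpha)
# before the element-wise addition, and the mapped point cloud feature is scaled
# by beta before the final concatenation. Channel sizes are assumed values.
import torch
import torch.nn as nn

class GlobalFeatureFusion(nn.Module):
    def __init__(self, dim_li, dim_l, dim=256):
        super().__init__()
        self.fc_li = nn.Linear(dim_li, dim)   # map F_LI into the shared channel dimension
        self.fc_l = nn.Linear(dim_l, dim)     # map F_L into the shared channel dimension
        self.fc_compact = nn.Linear(dim, dim)
        self.fc_weight = nn.Linear(dim, dim)

    def forward(self, f_li, f_l, alpha, beta):
        li, l = self.fc_li(f_li), self.fc_l(f_l)
        f_prime = self.fc_compact((1.0 - alpha) * li + l)            # compact feature -> F'_LI
        w_map = torch.sigmoid(self.fc_weight(torch.tanh(f_prime)))   # W_map
        f_second = w_map * f_prime                                   # F''_LI = W_map (.) F'_LI
        return torch.cat([f_second, beta * l], dim=-1)               # global fusion feature

fusion = GlobalFeatureFusion(dim_li=64, dim_l=64)
global_feature = fusion(torch.randn(128, 64), torch.randn(128, 64), alpha=0.2, beta=0.7)  # (128, 512)
```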
Finally, the global fusion features are input into the three-dimensional region proposal network to generate regions of interest; the candidate region features are extracted and sent into the three-dimensional target detection network to complete three-dimensional target detection.
The above embodiments are only for illustrating the technical idea of the present invention, and the protection scope of the present invention is not limited thereto, and any modification made on the basis of the technical scheme according to the technical idea of the present invention falls within the protection scope of the present invention.

Claims (10)

1. A scene-adaptive three-dimensional target detection method based on radar-vision perception fusion, characterized by comprising the following steps:
step 1, respectively extracting original point cloud characteristics and image characteristics from input point cloud data and image data;
step 2, projecting the input point cloud data onto the image features, and carrying out point-by-point feature fusion on the point cloud projection points to obtain point-by-point fusion features of the point cloud data;
step 3, constructing a scene-driven three-dimensional target detection network, and detecting targets in real time with the trained three-dimensional target detection network.
2. The scene-adaptive three-dimensional target detection method based on radar-vision perception fusion according to claim 1, wherein the specific process of extracting the original point cloud features from the input point cloud data in step 1 is as follows:
for the point cloud data, first one point is randomly selected as the initial point; the remaining points are then sampled in sequence by farthest point sampling; after sampling, each sampled point is taken as a sphere center and a radius is specified to form a sphere, and all the points contained in each sphere are called a cluster; finally, features are extracted for each point in every cluster, max pooling is applied over the channel dimension to obtain the cluster feature, and all cluster features are combined to obtain the original point cloud features.
3. The scene-adaptive three-dimensional target detection method based on radar-vision perception fusion according to claim 1, wherein the specific process of extracting the image features from the input image data in step 1 is as follows:
for the image data, the image is first passed through four cascaded convolution blocks, each followed by a batch normalization layer and a ReLU activation function; the output of the last convolution block is then fed in parallel through four transposed convolutions with different strides to obtain four feature maps of the same resolution, and these four feature maps are concatenated to obtain the image features.
4. The scene-adaptive three-dimensional target detection method based on radar-vision perception fusion according to claim 1, wherein the specific process of step 2 is as follows:
step 2.1, projecting the input point cloud data onto image features;
step 2.2, dividing point-by-point fusion areas of point cloud projection points:
wherein h_i and w_i are, respectively, the height and width of the point-by-point fusion region divided with P'_i, the projection of the i-th point cloud point P_i onto the image features, as its center point; ⌈·⌉ is the round-up (ceiling) function; φ_i is the weight coefficient of P'_i and k_{P'_i} is the discrete coefficient of P'_i; k_min and k_max denote the minimum and maximum values of the discrete coefficient set; H and W denote the height and width of the image data, respectively; i, j ∈ [1, 2, ..., L], where L is the number of point cloud projection points; d_ij is the distance between P'_i and P'_j, the projection of the j-th point cloud point P_j onto the image features;
step 2.3, uniformly dividing the point-by-point fusion area into I rows and J columns, and calculating the coordinates of the central point of each grid;
step 2.4, obtaining a bilinear interpolation result and a nearest neighbor interpolation result of the center point of each grid;
step 2.5, carrying out weighted fusion on the bilinear interpolation result and the nearest neighbor interpolation result of the central point of each grid to obtain an interpolation feature vector of the corresponding grid;
step 2.6, calculating a center distance measurement coefficient corresponding to each grid, and normalizing;
step 2.7, carrying out weighted summation on interpolation feature vectors of each grid according to the normalized center distance measurement coefficient to obtain point fusion features of corresponding point cloud projection points;
step 2.8, splicing the point fusion features of all point cloud projection points to obtain the point-by-point fusion features of the point cloud data.
5. The scene-adaptive three-dimensional target detection method based on radar-vision perception fusion according to claim 4, wherein in step 2.4 the bilinear interpolation result f_ab_1 and the nearest-neighbor interpolation result f_ab_2 of the center point of the grid cell in row a and column b of the point-by-point fusion region are computed as follows:
f_ab_1 = f_1(u - u_1)(v - v_1) + f_2(u - u_1)(v_2 - v) + f_3(u_2 - u)(v - v_1) + f_4(u_2 - u)(v_2 - v)
wherein (u_1, v_1), (u_1, v_2), (u_2, v_1), (u_2, v_2) are the coordinates of the four point cloud projection points nearest to the grid center point; {d_1, d_2, d_3, d_4} are the distances from these four projection points to the grid center point; f_1, f_2, f_3, f_4 are the feature vectors of the four nearest projection points; and f_ab_2 is the feature vector of the projection point nearest to the grid center point.
6. The scene-adaptive three-dimensional target detection method based on radar-vision perception fusion according to claim 4, wherein in step 2.5 the interpolation feature vector f_ab of the grid cell in row a and column b of the point-by-point fusion region is obtained by weighted fusion of f_ab_1 and f_ab_2 using per-dimension aggregation coefficients;
wherein f_ab_1 and f_ab_2 are, respectively, the bilinear interpolation result and the nearest-neighbor interpolation result of the center point of the grid cell in row a and column b of the point-by-point fusion region; ξ_ψ is the aggregation coefficient of the ψ-th feature dimension; N_d is the length of the feature vector; f_ω_ψ, ω ∈ {1, 2, 3, 4}, ψ ∈ {1, 2, ..., N_d}, denotes the value of the ω-th feature vector in the ψ-th dimension; μ_ψ is the mean of the values in the ψ-th dimension; and σ_ψ is the standard deviation of the values in the ψ-th dimension.
7. The scene-adaptive three-dimensional target detection method based on radar-vision perception fusion according to claim 4, wherein in step 2.7 the point fusion feature f_LI of the point cloud projection point is obtained as the weighted sum of the interpolation feature vectors of all grid cells of the point-by-point fusion region, f_LI = Σ_a Σ_b η'_ab · f_ab;
wherein f_ab is the interpolation feature vector of the grid cell in row a and column b of the point-by-point fusion region; η'_ab is the normalized center distance metric coefficient of that grid cell, computed from min(u, u_ab), max(u, u_ab), min(v, v_ab) and max(v, v_ab); (u_ab, v_ab) are the coordinates of the center point of the grid cell in row a and column b; min(u, u_ab) and max(u, u_ab) denote the smaller and the larger of u and u_ab; min(v, v_ab) and max(v, v_ab) denote the smaller and the larger of v and v_ab; and (u, v) are the coordinates of the point cloud projection point within the point-by-point fusion region.
8. The scene-adaptive three-dimensional target detection method based on radar-vision perception fusion according to claim 1, wherein the specific process of step 3 is as follows:
step 3.1, inputting the point-by-point fusion feature F_LI and the original point cloud feature F_L into a global feature fusion network and performing global feature fusion:
first, F_LI and F_L are mapped into the same channel dimension by two fully connected layers;
then, the mapped point-by-point fusion feature is adaptively adjusted by the time-of-day sensing factor α, the two mapped features are added element-wise to form a compact feature, and the compact feature is fed into a fully connected layer to obtain the point-by-point registration feature F'_LI; wherein FC denotes a fully connected layer, ⊙ denotes element-wise multiplication, ⊕ denotes element-wise addition, and α is the time-of-day sensing factor;
next, F'_LI is compressed into a weight map W_map = sigmoid(FC(tanh(F'_LI))), and W_map is multiplied with F'_LI to obtain the adjusted point-by-point registration feature F''_LI = W_map ⊙ F'_LI; wherein tanh and sigmoid are the tanh and sigmoid activation functions and W_map is the point-by-point registration feature weight map;
finally, scale judgment is performed with the scene scale factor β, and feature concatenation yields the global fusion feature;
step 3.2, inputting the global fusion feature into the three-dimensional region proposal network to generate regions of interest, extracting the region features of the regions of interest, and sending them into the three-dimensional target detection network to complete detection.
9. The scene-adaptive three-dimensional target detection method based on radar-vision perception fusion according to claim 8, wherein in step 3.1 the time-of-day sensing factor α is calculated as follows:
(1) the input image data are converted from the RGB color space to the YCbCr color space;
(2) the time-of-day sensing factor α is computed by comparing, value by value, the Y, Cb and Cr component histograms of the input image data with those of the prior image, normalized by the total number of Y, Cb and Cr component values;
wherein Y + Cb + Cr denotes the total number of Y, Cb and Cr component values in the input image data and the prior image, used for normalization; Y_cur_χ, Cb_cur_χ, Cr_cur_χ denote the numbers of pixels whose Y, Cb and Cr components take the value χ in the input image data; and Y_bas_χ, Cb_bas_χ, Cr_bas_χ denote the numbers of pixels whose Y, Cb and Cr components take the value χ in the prior image.
10. The scene-adaptive three-dimensional target detection method based on radar-vision perception fusion according to claim 8, wherein in step 3.1 the scene scale factor β is calculated as follows:
firstly, carrying out semantic segmentation on input image data by using a U-Net++ network to obtain the category of each pixel point in the input image data;
then, the point cloud data are projected onto a segmentation result diagram output by a U-Net++ network, and the category of the projection point is judged by adopting a K nearest neighbor method;
finally, calculating the scene scale factor β from the number of projection points of each category;
wherein β_max is the number of projection points of the category with the most projection points, and β_min is the number of projection points of the category with the fewest projection points.
CN202310437055.8A (priority date 2023-04-21; filing date 2023-04-21) Space-time scene-adaptive three-dimensional target detection method based on radar-vision perception fusion; status: Pending; publication: CN116469095A (en)

Priority Applications (1)

Application number: CN202310437055.8A (CN); priority date: 2023-04-21; filing date: 2023-04-21; title: Space-time scene-adaptive three-dimensional target detection method based on radar-vision perception fusion

Applications Claiming Priority (1)

Application number: CN202310437055.8A (CN); priority date: 2023-04-21; filing date: 2023-04-21; title: Space-time scene-adaptive three-dimensional target detection method based on radar-vision perception fusion

Publications (1)

Publication number: CN116469095A; publication date: 2023-07-21

Family

Family ID: 87173131

Family Applications (1)

Application number: CN202310437055.8A (pending); publication: CN116469095A (en); title: Space-time scene-adaptive three-dimensional target detection method based on radar-vision perception fusion

Country Status (1)

Country: CN; publication: CN116469095A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117388831A (en) * 2023-12-13 2024-01-12 中科视语(北京)科技有限公司 Camera and laser radar combined calibration method and device, electronic equipment and medium
CN117388831B (en) * 2023-12-13 2024-03-15 中科视语(北京)科技有限公司 Camera and laser radar combined calibration method and device, electronic equipment and medium


Legal Events

Code PB01: Publication
Code SE01: Entry into force of request for substantive examination