CN111462237B - Target distance detection method for constructing four-channel virtual image by using multi-source information - Google Patents

Target distance detection method for constructing four-channel virtual image by using multi-source information

Info

Publication number
CN111462237B
CN111462237B (application CN202010258411.6A)
Authority
CN
China
Prior art keywords
target
radar
image
information
distance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010258411.6A
Other languages
Chinese (zh)
Other versions
CN111462237A (en)
Inventor
杨殿阁
周韬华
江昆
于春磊
杨蒙蒙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN202010258411.6A priority Critical patent/CN111462237B/en
Publication of CN111462237A publication Critical patent/CN111462237A/en
Application granted granted Critical
Publication of CN111462237B publication Critical patent/CN111462237B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/80Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10032Satellite or aerial image; Remote sensing
    • G06T2207/10044Radar image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Radar Systems Or Details Thereof (AREA)

Abstract

The invention relates to a target distance detection method that constructs a four-channel virtual image from multi-source information, comprising the following steps: acquiring raw point cloud data with a millimeter wave radar and processing the information to determine which raw radar points belong to the same target, thereby obtaining the target size and the target reflection center position; according to the reflection center position of a target in the radar plane and the target center pixel position in the image acquired by a monocular camera, finding the spatial conversion relation between the two sensors by a joint calibration method and, combined with time synchronization, associating the asynchronous heterogeneous multi-source information; constructing a virtual four-channel picture containing distance information according to the association between the millimeter wave radar and the image data; and building a convolutional neural network on the virtual four-channel picture to realize target detection. The invention improves the distance prediction capability of target detection, allows a lightweight network structure that saves computing resources, and improves the spatial-information prediction accuracy and speed of existing visual 3D target detection algorithms.

Description

Target distance detection method for constructing four-channel virtual image by using multi-source information
Technical Field
The invention relates to the field of environment perception of intelligent automobiles, in particular to a target distance detection method for constructing a four-channel virtual image by utilizing multi-source information.
Background
In an intelligent automobile system, accurate, reliable and robust environmental perception is critical. Image information contains rich semantic features, and with the development of artificial intelligence and deep learning, vision-based target detection algorithms implemented with convolutional neural networks have become increasingly mature and are a popular research topic. However, monocular vision cannot directly acquire target distance information, and convolutional neural networks are better suited to classification tasks, so vision-based target distance detection still needs improvement: existing visual 3D target detection algorithms generally cannot meet the distance-detection accuracy required by driving tasks, and their distance estimates are too inaccurate for target tracking or driving-decision tasks; network performance can be improved by increasing network depth, as in ResNet, or network width, as in the Inception method, but this makes the network structure complex, consumes large computing resources, and makes the network difficult to deploy on a vehicle.
The millimeter wave radar can accurately measure the distance and speed of a target and is capable of all-weather operation; it can therefore provide a new data source for target detection, compensate for the deficiencies of vision, and realize detection of target distance information. Accordingly, research on improving environmental perception through multi-source fusion has gradually attracted attention. Existing multi-source information fusion detection methods mainly fall into two types: pre-fusion algorithms use the target positions detected by the millimeter wave radar to provide region-of-interest (ROI) locations for vision, and then perform visual target recognition and classification; post-fusion processes the radar and image information separately and then associates and fuses the target-level results. The former is disturbed by clutter detected by the radar, and a radar missed detection lowers the target detection rate; the latter is time-consuming and cannot make full use of the perception information provided by the millimeter wave radar to achieve a more effective fusion.
Disclosure of Invention
In view of the above problems, an object of the present invention is to provide a target distance detection method for constructing a four-channel virtual image from multi-source information, which can improve the distance prediction capability of target detection, realize a lightweight network structure, save computing resources, and improve the spatial-information prediction accuracy and speed of existing visual 3D target detection algorithms.
In order to achieve the above purpose, the invention adopts the following technical scheme. A target distance detection method for constructing a four-channel virtual image by using multi-source information comprises the following steps: 1) acquiring raw point cloud data with a millimeter wave radar and processing the information to determine the radar raw point information belonging to the same target, thereby obtaining the target size and the target reflection center position; 2) according to the reflection center position of a target in the radar plane and the target center pixel position in the image acquired by a monocular camera, finding the spatial conversion relation between the two sensors by a joint calibration method and, combined with time synchronization, realizing the association of asynchronous heterogeneous multi-source information; 3) constructing a virtual four-channel picture containing distance information according to the association relation between the millimeter wave radar and the image data; 4) building a convolutional neural network on the virtual four-channel picture to realize target detection: an end-to-end target detection convolutional neural network is constructed to realize distance fusion, so that training on the four-channel virtual pictures generated from the radar raw point cloud information and the RGB images can be carried out, and the category, image bounding box and distance information of the target can be predicted.
Further, in step 1), in a traffic scene the targets are all considered rigid bodies, and the raw point information belonging to the same target is determined by a clustering algorithm according to the similarity of the positions and Doppler velocities provided by the raw points; meanwhile, outliers are eliminated with the RANSAC algorithm, and the target size (w, h) and the target reflection center position (x̄_r, ȳ_r) are then obtained.
Further, the step 2) specifically comprises the following steps: 2.1) determining the conversion relation between the target reflection center position (x̄_r, ȳ_r) of a fixed calibration object in the radar coordinate system and the target center pixel position (u_0, v_0) in the image; 2.2) time-synchronizing the asynchronous information: taking the acquisition time of the camera as the reference, each time the camera's perception data is updated the timestamp t_cam is recorded by an extrapolation method, the radar data closest to that moment with timestamp t_radar is found, and the time difference Δt = t_cam − t_radar is recorded; considering that the target speed is unchanged over this short interval, the position information is updated by Δx(Δt), yielding time-synchronized multi-sensor perception data.
Further, in step 2.1), joint calibration is used to directly find the relation between the radar measurement data (x_r, y_r, z_r) and the image measurement data (u, v) of the same target, the spatial conversion relation between the two being:

ω·[u, v, 1]^T = P·[x_r, y_r, z_r, 1]^T,  P = A[R|t]

where ω is a proportional constant; P is a 3 × 4 projection matrix; A is the intrinsic parameter matrix of the camera; R is the rotation in the extrinsic calibration; and t is the translation in the extrinsic calibration.
Further, the joint calibration process comprises: detecting a target with the millimeter wave radar and recording the data while, at the same moment, photographing the target to record its position in the image; then obtaining the target reflection center position through the clustering algorithm of step 1) and finding the corresponding position in the image; this process is repeated to obtain multiple sets of data.
Further, in step 3), a single-channel virtual picture with the same resolution as the RGB image is generated from the target feature information reflected by the radar according to a correspondence rule, and is then associated with the RGB color image of the camera to form a 4-channel virtual picture, which is used as the input data for training the target distance detection network.
Further, the correspondence rule is: (1) determining the center of the region of interest: the target reflection center position (x̄_r, ȳ_r) detected by the radar is projected onto the image according to the calibration parameters found in step 2); the spatial conversion relation determines the center pixel position (u_0, v_0) of the region of interest of the radar-detected target on the image; (2) determining the size and pixel values of the target area in the radar single-channel virtual picture: a two-dimensional Gaussian model is adopted to determine the pixel values filled into the target area and the area size of the single-channel virtual picture, where the mean of the Gaussian distribution is the pixel position (u_0, v_0) corresponding to the reflection center determined in (1); according to the 3σ principle, pixels outside 3σ are considered zero, so the variances (σ_1², σ_2²) of the Gaussian distribution reflect the size of the target area on the image in the length and width dimensions, and their dependence on the relative distance r between the target and the sensor and on the target size (w, h) estimated by the radar in step 1) is expressed by a function g: (σ_1², σ_2²) = g(w, h, r); meanwhile, in a traffic scene, the filled pixel values reflect the degree of attention paid to the detection accuracy of close-range moving targets; furthermore, because the radar provides a confidence σ that a reflection point is a target, these factors are expressed by a scale factor k, k = f(r, v_rel, σ), which affects the pixel values of the target fill area on the virtual image.
Further, in step 4), in order to implement deep fusion, the modifications to the neural network structure include the following aspects: (1) modifying the training-data reading function of the selected algorithm so that it accepts 4-channel images; (2) modifying the convolution kernels of the convolutional layers to extract higher-dimensional features; (3) modifying the way annotation information is read: adding a ground-truth relative-distance label in addition to the ground-truth labels provided for image target detection training; (4) adding a distance prediction function: adding a loss function for the distance prediction.
Further, the method also comprises a step of evaluating and optimizing the convolutional neural network: when the loss function of the convolutional neural network converges on the training set and training is mature, a validation set with the same distribution as the training data is constructed to verify and evaluate the effect of the convolutional neural network.
Further, the logic for quantitatively evaluating the network effect is as follows: a prediction box is judged a positive sample when the overlap ratio (IoU) between the prediction and the ground truth is greater than a threshold iou; the model is considered to have predicted a target when the prediction box score is greater than a threshold score; different distance intervals are evaluated with several indices, including Precision = TP/(TP + FP), Recall = TP/(TP + FN), the absolute error of the predicted distance, and the error relative to the ground truth; whether the training parameters need to be adjusted to improve the model is judged from these results. TP: the number of samples actually positive and predicted positive by the model; FP: the number of samples actually negative but predicted positive by the model; FN: the number of samples actually positive but predicted negative by the model.
Due to the adoption of the above technical scheme, the invention has the following advantages: 1. The method makes full use of the information carried by the raw point cloud provided by the radar to construct a virtual picture expressing the target information, without losing the original information. 2. The invention realizes spatial synchronization of multi-sensor information with a low-cost joint calibration method for millimeter wave radar and images, which is simple to operate and highly accurate. 3. By fusing radar and image information through the virtual four-channel picture structure, the spatial information of the target is acquired directly. 4. The invention adopts an end-to-end neural network to output the target detection information, which can be used directly for vehicle driving decisions, reduces intermediate processing steps, and improves the accuracy, comprehensiveness and robustness of target recognition.
Drawings
FIG. 1 is a schematic overall flow diagram of the present invention;
FIG. 2 is a schematic diagram of spatial joint calibration of millimeter wave radar and camera data for use in the present invention;
FIG. 3 is a diagram of a process for constructing a four-channel virtual picture according to the present invention;
fig. 4 is an example of the prediction result of the fusion model proposed by the present invention.
Detailed Description
The invention is described in detail below with reference to the figures and examples.
As shown in fig. 1, the invention provides a target distance detection method for constructing a four-channel virtual image from multi-source information, which fuses information from a vehicle-mounted millimeter wave radar and a monocular camera and realizes traffic target detection with distance information through an end-to-end convolutional neural network. The invention specifically comprises the following steps:
1) acquiring original point cloud data by using a millimeter wave radar to perform information processing, determining radar original point information belonging to the same target, and obtaining the target size and the target reflection center position;
the radar raw point information comprises the radial distance r, the angle θ, the Doppler relative velocity v_rel and the reflection intensity γ;
In a traffic scene the targets are all considered rigid bodies, so the raw point information belonging to the same target is determined by a clustering algorithm (for example K-means clustering) according to the similarity of the positions and Doppler velocities provided by the raw points. Meanwhile, in order to reduce as far as possible the influence of nearby vehicles or other stationary targets on the clustering result, outliers are eliminated with the RANSAC algorithm, and the target size (w, h) and the target reflection center position (x̄_r, ȳ_r) are then obtained.
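As an illustrative sketch only (the patent does not prescribe a particular implementation), the grouping of raw radar points into targets described above could look like the following Python fragment. DBSCAN stands in for the unspecified clustering step, a simple consensus loop stands in for RANSAC, and the field layout of the point array is an assumption.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def extract_targets(points, eps=1.5, min_samples=2, n_iters=20, inlier_tol=1.0):
    """Group raw radar points into targets and estimate size / reflection centre.

    points : (N, 4) array of [x_r, y_r, v_rel, gamma] (position, Doppler, intensity).
    Illustrative only: DBSCAN stands in for the clustering step and a
    median-distance consensus loop stands in for RANSAC outlier removal.
    """
    feats = points[:, :3]                       # cluster on position + Doppler velocity
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(feats)

    targets = []
    for lbl in set(labels) - {-1}:              # -1 marks DBSCAN noise points
        cluster = points[labels == lbl]

        # RANSAC-like consensus: keep the largest set of points that stay
        # within inlier_tol of a randomly chosen seed point.
        best_inliers, best_count = cluster, 0
        if len(cluster) > 2:
            for _ in range(n_iters):
                seed = cluster[np.random.randint(len(cluster)), :2]
                d = np.linalg.norm(cluster[:, :2] - seed, axis=1)
                inliers = cluster[d < inlier_tol]
                if len(inliers) > best_count:
                    best_count, best_inliers = len(inliers), inliers

        xy, gamma = best_inliers[:, :2], best_inliers[:, 3]
        w, h = np.ptp(xy[:, 0]), np.ptp(xy[:, 1])              # cluster extent as (w, h)
        centre = np.average(xy, axis=0, weights=gamma + 1e-6)  # intensity-weighted centre
        targets.append({"size": (w, h), "centre": tuple(centre)})
    return targets
```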
2) According to the reflection center position (x̄_r, ȳ_r) of the target in the radar plane and the target center pixel position (u, v) in the image acquired by the monocular camera, the spatial conversion relation between the two sensors is found by a joint calibration method and, combined with time synchronization, the association of asynchronous heterogeneous multi-source information is realized;
the premise of realizing multi-source information fusion is to realize space-time synchronization of perception information under different acquisition frequencies and observation coordinate systems.
2.1) Determining the conversion relation between the target reflection center position (x̄_r, ȳ_r) of a fixed calibration object in the radar coordinate system and the target center pixel position (u, v) in the image.
Joint calibration involves the interconversion of the following coordinate systems: the millimeter wave radar coordinate system (x_r, y_r, z_r), the camera coordinate system (x_c, y_c, z_c), the imaging plane coordinate system (x, y) and the image pixel coordinate system (u, v), where z_r is a fixed value set in advance. Sequential calibration would first calibrate the intrinsic and extrinsic parameters of the camera and then calibrate the extrinsic parameters of the millimeter wave radar, so as to determine the parameters of each conversion matrix; this calibration process is cumbersome, demands high calibration precision, and is easily affected by accumulated errors.
In this embodiment, a joint calibration method is used to directly find the relation between the observation data (x_r, y_r, z_r) of the radar and the measurement data (u, v) of the image for the same target, the spatial conversion relation between the two being:

ω·[u, v, 1]^T = P·[x_r, y_r, z_r, 1]^T,  P = A[R|t]
where ω is a proportional constant; P is a 3 × 4 projection matrix containing both the intrinsic and extrinsic calibration; A is the intrinsic parameter matrix of the camera; R is the rotation in the extrinsic calibration; and t is the translation in the extrinsic calibration.
The purpose of the joint calibration is to solve for ω and the 3 × 4 projection matrix P from the collected data by SVD decomposition. The experimental procedure of the joint calibration is shown in FIG. 2. The millimeter wave radar detects the target and records data while, at the same moment, the target position in the image is recorded by photographing. The target reflection center position (x̄_r, ȳ_r) is then obtained through the clustering algorithm of step 1), and the corresponding target center position (u_0, v_0) is found in the image. Repeating this process yields multiple sets of data.
Meanwhile, since the detection range of the millimeter wave radar is a sector plane at a fixed height and the ground is taken as the zero point of the z axis, the height of the reflection center of the calibration object (a corner reflector or rod-shaped reflector) is a fixed value z̄_r. Multiple sets of data (x̄_r, ȳ_r) sharing the same z̄_r value can therefore be obtained, and this geometric constraint further improves the calibration precision. The method has the advantages of low cost and simple operation, and yields calibration parameters of relatively high precision.
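The joint calibration described above amounts to a direct linear transform solved by SVD. The following Python sketch shows one way this could be done, assuming several correspondences between radar reflection centers (x_r, y_r, z_r) and image pixels (u, v) have already been collected; the normalization choice and function names are illustrative, not taken from the patent.

```python
import numpy as np

def solve_projection(radar_pts, pixel_pts):
    """Direct linear transform: estimate the 3x4 projection matrix P (up to scale)
    from >= 6 correspondences radar_pts[i] = (x_r, y_r, z_r) <-> pixel_pts[i] = (u, v).
    Sketch only; a real calibration would normalise the data and refine P."""
    rows = []
    for (x, y, z), (u, v) in zip(radar_pts, pixel_pts):
        X = [x, y, z, 1.0]
        rows.append(np.r_[X, [0.0] * 4, [-u * c for c in X]])
        rows.append(np.r_[[0.0] * 4, X, [-v * c for c in X]])
    A = np.asarray(rows)
    _, _, vt = np.linalg.svd(A)
    P = vt[-1].reshape(3, 4)          # null-space vector of A, reshaped to 3x4
    return P / np.linalg.norm(P)      # overall scale (omega) is irrelevant for projection

def project(P, radar_pt):
    """Project a radar reflection centre into pixel coordinates (u0, v0)."""
    x = P @ np.append(radar_pt, 1.0)
    return x[:2] / x[2]
```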
2.2) Because the data acquisition frequencies of the millimeter wave radar and the camera are not equal, the asynchronous information must be time-synchronized. The acquisition frequency of the millimeter wave radar is a fixed value, whereas the camera's acquisition frequency is uncertain because of dropped frames, so the acquisition time of the camera is taken as the reference. By extrapolation, each time the camera's perception data is updated the timestamp t_cam is recorded, the radar data closest to that moment with timestamp t_radar is found, and the time difference Δt = t_cam − t_radar is recorded. This time difference is usually less than 5 ms, so the target speed can be considered unchanged over the interval and the position information is updated by Δx(Δt), yielding time-synchronized multi-sensor perception data.
The correlation of asynchronous heterogeneous multi-source information is realized based on the space-time synchronization method.
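A minimal sketch of the nearest-timestamp matching and constant-velocity extrapolation described in step 2.2) might look as follows; the frame format and field names are assumptions made for illustration.

```python
import numpy as np

def synchronise(cam_stamp, radar_frames):
    """Pick the radar frame closest in time to a camera frame and extrapolate
    each target position to the camera timestamp, assuming constant velocity
    over the (typically < 5 ms) time difference.

    radar_frames: list of dicts
        {"stamp": t, "targets": [{"pos": (x, y), "vel": (vx, vy)}, ...]}
    """
    nearest = min(radar_frames, key=lambda f: abs(f["stamp"] - cam_stamp))
    dt = cam_stamp - nearest["stamp"]          # delta_t = t_cam - t_radar
    synced = []
    for tgt in nearest["targets"]:
        pos = np.asarray(tgt["pos"]) + dt * np.asarray(tgt["vel"])   # delta_x(delta_t)
        synced.append({**tgt, "pos": tuple(pos)})
    return synced
```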
3) Constructing a virtual four-channel picture containing distance information according to the association relation between the millimeter wave radar and the image data;
A single-channel virtual picture with the same resolution as the RGB image is generated from the target feature information reflected by the radar according to a correspondence rule, and is then associated with the RGB color image of the camera to form a 4-channel virtual picture, which serves as the input data for training the target distance detection network. (That is, the first 3 channels of the 4-channel virtual picture are the RGB color image, and the 4th channel is the single-channel virtual picture made from the radar data that was space-time synchronized with the image in step 2.) This new structure is convenient for feature extraction and feature learning by the convolutional neural network and, going beyond ROI-extraction algorithms that only use the radar to propose regions of interest, makes full use of the various kinds of data provided by the radar, including the target size (w, h) and target center position (x̄_r, ȳ_r) inferred in step 1), the target relative velocity v_rel, the reflection intensity γ and the confidence σ.
The correspondence rule for making the virtual picture is:
(1) Determining the center of the region of interest
The target reflection center position (x̄_r, ȳ_r) detected by the radar is projected onto the image according to the calibration parameters found in step 2); the spatial conversion relation determines the center pixel position (u_0, v_0) of the region of interest (ROI) of the radar-detected target on the image.
(2) Determining the size and pixel values of the target area in the radar single-channel virtual picture
A two-dimensional Gaussian model is adopted to determine the pixel values filled into the target area and the area size in the single-channel virtual picture. The size of the target as projected on the image is related to the relative distance between the target and the ego vehicle and to the size of the target itself. The mean of the Gaussian distribution is the pixel position (u_0, v_0) corresponding to the reflection center determined in (1). The variance is a key parameter of the Gaussian distribution and reflects its shape; according to the 3σ principle, pixels outside 3σ are considered zero. The variances (σ_1², σ_2²) of the Gaussian distribution therefore reflect the size of the target area on the image in the length and width dimensions, and their dependence on the relative distance r between the target and the sensor and on the target size (w, h) estimated by the radar clustering of step 1) is expressed by a function g: (σ_1², σ_2²) = g(w, h, r).
Meanwhile, in a traffic scene, driving safety makes the detection accuracy of close-range moving targets the primary concern, so the filled pixel values are chosen to reflect this degree of attention. Furthermore, since the radar provides a confidence σ that a reflection point is a target, this is also taken into account in the pixel fill value; these factors are expressed by a scale factor k, k = f(r, v_rel, σ), which affects the pixel values of the target fill area on the virtual picture.
Combining the above principles and taking the correlation coefficient ρ = 0 by default, the pixel value G(u, v) filled in at an arbitrary pixel position (u, v) of the virtual picture follows the two-dimensional Gaussian form

G(u, v) = k·exp(−((u − μ_1)²/(2σ_1²) + (v − μ_2)²/(2σ_2²)))

with [μ_1, μ_2] = [u_0, v_0], [σ_1², σ_2²] = g(w, h, r), k = f(r, v_rel, σ),
where (μ_1, μ_2) is the mean of the two-dimensional Gaussian distribution model, whose physical meaning is the target center pixel position (u_0, v_0) obtained by projecting the radar target reflection center position (x̄_r, ȳ_r) onto the image; (σ_1², σ_2²) are the variances of the two-dimensional Gaussian distribution model, whose physical meaning is the dimensional relationship of the target on the image which, according to the analysis above, is related to the actual target size (w, h) in the length and width dimensions and to the relative distance r between the target and the sensor, a relationship represented by the function g: (σ_1², σ_2²) = g(w, h, r); k is the scale factor of the model, which determines the magnitude of the filled pixel values and, according to the correspondence rule, is related to the target confidence σ provided by the radar, the relative distance r between the target and the sensor, and the relative velocity v_rel, a relationship represented by the function f: k = f(r, v_rel, σ).
Because the convolutional neural network extracts features from image-type input data, the millimeter wave radar information is used to make a single-channel picture reflecting the target position and size information, which is stacked with the RGB three-channel picture to form an RGB-D-style 4-channel picture; this 4-channel picture is fed to the network to learn to predict the targets and their distance information. The final effect is shown in fig. 3; the four-channel picture data is then used as the input data for model training.
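The construction of the radar single-channel virtual picture and the RGB-D stack could be sketched as below. The concrete forms chosen here for g(w, h, r) and f(r, v_rel, σ) (a pinhole-style size scaling and a distance/velocity/confidence weighting) are placeholders, since the patent only requires them to depend on the stated arguments; the focal-length constant and the target dictionary layout are likewise assumptions.

```python
import numpy as np

def radar_channel(shape, targets, focal=1000.0):
    """Render the radar single-channel virtual picture (same resolution as the RGB image).

    targets: list of dicts {"uv": (u0, v0), "size": (w, h), "r": range, "v_rel": ..., "conf": ...}
    The Gaussian variances and the scale factor k below are illustrative choices of
    g(w, h, r) and f(r, v_rel, sigma), not the forms used in the patent.
    """
    H, W = shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))          # u: column index, v: row index
    chan = np.zeros(shape, dtype=np.float32)
    for t in targets:
        u0, v0 = t["uv"]
        w, h, r = t["size"][0], t["size"][1], t["r"]
        s1 = max(focal * w / (2.0 * r), 1.0)                 # sigma_1: image-plane width scale
        s2 = max(focal * h / (2.0 * r), 1.0)                 # sigma_2: image-plane height scale
        k = 255.0 * t["conf"] / (1.0 + 0.05 * r) * (1.0 + min(abs(t["v_rel"]), 10.0) / 10.0)
        g = k * np.exp(-((u - u0) ** 2 / (2 * s1 ** 2) + (v - v0) ** 2 / (2 * s2 ** 2)))
        chan = np.maximum(chan, g)                           # keep the strongest response per pixel
    return np.clip(chan, 0, 255).astype(np.uint8)

def make_rgbd(rgb, targets):
    """Stack the RGB image (H, W, 3) with the radar channel into a 4-channel input."""
    d = radar_channel(rgb.shape[:2], targets)
    return np.dstack([rgb, d])
```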
Meanwhile, since the deep fusion is expected to predict the position of the target on the image, the target category and the spatial distance, the ground-truth annotation text provides the category information of the target, its relative position information in the image (using relative positions helps stabilize model training) and the distance information detected by the radar.
4) Building a convolutional neural network according to the virtual four-channel picture to realize target detection;
and (3) constructing an end-to-end target detection convolution neural network to realize deep fusion, so that the training of a four-channel virtual picture generated by utilizing radar original point cloud information and an RGB image can be realized, and the category, the image boundary frame and the distance information of the target can be predicted.
To achieve deep fusion, the neural network architecture modification includes the following aspects:
(1) modifying the training-data reading function of the selected algorithm so that it accepts 4-channel images;
(2) modifying the convolutional-layer convolution kernels to extract higher-dimensional features: because the channel count of a convolution kernel (convolution filter) must equal the channel count of its input, changing the input to four-channel training pictures requires increasing the channel count of the corresponding convolution kernels to 4 (see the sketch following this list);
(3) modifying the way annotation information is read: adding a ground-truth relative-distance label in addition to the ground-truth labels provided for image target detection training;
(4) adding a distance prediction function: adding a loss function for the distance prediction, preferably a squared loss of the form λ(d − d̂)², where λ is a scale parameter, d is the distance value predicted by the model, and d̂ is the ground-truth distance value (also sketched following this list).
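For modifications (2) and (4) above, the following PyTorch sketches illustrate the two ideas on a generic detector: widening the first convolution layer from 3 to 4 input channels, and adding a squared-error loss on the predicted distance. The patent's embodiment modifies the Darknet C code instead; the initialization of the new radar channel and the masking/reduction of the loss are illustrative choices.

```python
import torch
import torch.nn as nn

def inflate_first_conv(conv3: nn.Conv2d) -> nn.Conv2d:
    """Turn a pretrained 3-channel first conv layer into a 4-channel one.
    The RGB weights are kept and the new radar channel is initialised with
    their mean, so pretraining is not thrown away."""
    conv4 = nn.Conv2d(4, conv3.out_channels, conv3.kernel_size,
                      stride=conv3.stride, padding=conv3.padding,
                      bias=conv3.bias is not None)
    with torch.no_grad():
        conv4.weight[:, :3] = conv3.weight                            # copy RGB filters
        conv4.weight[:, 3:] = conv3.weight.mean(dim=1, keepdim=True)  # init radar channel
        if conv3.bias is not None:
            conv4.bias.copy_(conv3.bias)
    return conv4
```

```python
import torch

def distance_loss(d_pred: torch.Tensor, d_true: torch.Tensor,
                  obj_mask: torch.Tensor, scale: float = 1.0) -> torch.Tensor:
    """Squared-error loss on the predicted relative distance, evaluated only on
    cells/anchors responsible for a ground-truth object (obj_mask == 1).
    'scale' plays the role of the scale parameter in the text."""
    err = (d_pred - d_true) ** 2 * obj_mask
    return scale * err.sum() / obj_mask.sum().clamp(min=1)
```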
In a preferred embodiment, the YOLOv2 target detection algorithm is adapted to implement the above 4 aspects. The Darknet53 network corresponding to the YOLOv2 algorithm is a typical end-to-end convolutional neural network for visual target detection: it takes images directly as input, outputs target detection results, runs fast and is reasonably accurate. In addition, the Darknet deep learning framework that implements the YOLO algorithm is written in C; its low-level code is easy to modify and extend, and it executes quickly on the GPU. In this embodiment it is modified as described above, the number of pictures per training batch is set to 16, the initial learning rate to 0.001, the weight-decay regularization term to 0.0005, all training images are resized to a uniform 608 × 608 resolution, and the number of training iterations is set to 500,000, with training stopped early if the model converges beforehand. During training, the convergence of the loss function is monitored, the current model is verified every 10,000 training iterations, and whether training is complete is judged accordingly.
5) Convolutional neural network evaluation and optimization
When the loss function of the convolutional neural network converges on the training set and training is mature, a validation set with the same distribution as the training data is constructed to verify and evaluate the effect of the convolutional neural network; this guides the adjustment of the model parameters.
The logic for quantitatively evaluating the network effect is as follows: a prediction box is judged a positive sample when the overlap ratio (IoU) between the prediction and the ground truth is greater than the threshold iou; the model is considered to have predicted a target when the prediction box score is greater than the threshold score; different distance intervals are evaluated with several indices, including Precision = TP/(TP + FP), Recall = TP/(TP + FN), the absolute error of the predicted distance and the error relative to the ground truth; whether the training parameters need to be adjusted to improve the model is then judged from these results. Here:
TP (true positive): the number of samples actually positive and predicted positive by the model;
FP (false positive): the number of samples actually negative but predicted positive by the model;
FN (false negative): the number of samples actually positive but predicted negative by the model.
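A sketch of this quantitative evaluation logic is given below; the greedy one-to-one matching strategy and the distance-error reduction (mean absolute error over matched pairs) are illustrative choices, not mandated by the patent.

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def evaluate(preds, gts, iou_thr=0.5, score_thr=0.5):
    """Greedy one-to-one matching of predictions to ground truth.
    preds: [(box, score, dist)], gts: [(box, dist)].  Returns precision, recall
    and the mean absolute distance error over matched pairs."""
    preds = sorted([p for p in preds if p[1] >= score_thr], key=lambda p: -p[1])
    used, tp, dist_err = set(), 0, []
    for box, _, d in preds:
        best, best_iou = None, iou_thr
        for i, (gbox, gd) in enumerate(gts):
            if i not in used and iou(box, gbox) >= best_iou:
                best, best_iou = i, iou(box, gbox)
        if best is not None:
            used.add(best)
            tp += 1
            dist_err.append(abs(d - gts[best][1]))
    fp, fn = len(preds) - tp, len(gts) - tp
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    return precision, recall, float(np.mean(dist_err)) if dist_err else float("nan")
```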
the method for improving the model according to the prediction result comprises the following steps:
when the Precision of the model on both the training set and the validation set is below a preset threshold, the model is considered under-fitted and training is continued;
when the Precision of the model on the training set is above the preset threshold but the Precision on the validation set is below it, the model is considered over-fitted; the number of training epochs is reduced and the amount of training data is increased;
meanwhile, according to the evaluation results on the different distance intervals, additional data is collected for the intervals with poor evaluation results.
Adjusting training parameters: the learning rate and the training batch size are adjusted continuously to obtain the best evaluation result.
During training of the fusion network, the network effect is verified every 10,000 training iterations until training is mature. The network is further optimized by adding noise: considering the false-alarm and missed-detection phenomena of the millimeter wave radar, random noise is added to the input data. The projection relation is also refined: the model effect depends on the projection relation of the target boxes, and improving it continuously improves the prediction effect of the model.
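The noise-injection idea above could be sketched as follows, reusing the illustrative target dictionary from the virtual-picture sketch; the drop and clutter probabilities and the clutter parameter ranges are assumptions.

```python
import numpy as np

def perturb_radar_targets(targets, drop_prob=0.1, clutter_prob=0.1, img_size=(608, 608)):
    """Simulate radar missed detections (random drops) and false alarms
    (random clutter targets) before rendering the virtual channel."""
    rng = np.random.default_rng()
    kept = [t for t in targets if rng.random() > drop_prob]        # missed detections
    if rng.random() < clutter_prob:                                # occasional false alarm
        kept.append({"uv": (rng.uniform(0, img_size[1]), rng.uniform(0, img_size[0])),
                     "size": (rng.uniform(0.5, 2.0), rng.uniform(0.5, 2.0)),
                     "r": rng.uniform(5.0, 80.0),
                     "v_rel": rng.uniform(-10.0, 10.0),
                     "conf": rng.uniform(0.1, 0.5)})
    return kept
```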
In addition, the target information and speed information provided by the radar can be re-matched through data re-association to obtain the final fused-perception target detection result. The prediction effect of the final model is shown in fig. 4: it detects the position and category of the target on the image and also predicts the relative distance between the target and the sensor.
In conclusion, training and verification yield a mature model that detects target distance from the virtual four-channel picture. It predicts the category and distance information of targets ahead well and makes full use of the multi-source information.
In this example, a multi-sensor data acquisition system is mounted on the experimental vehicle to synchronize the multi-source information and generate the virtual four-channel pictures. Through deep fusion, the resulting target detections can be fed to simple algorithms for forward-collision warning and pedestrian avoidance, assisting driving decisions.
Unlike traditional fusion algorithms, the method makes full use of the raw point cloud information of the millimeter wave radar: targets are extracted by clustering and the RANSAC algorithm to obtain the reflection center position and size of each target; space-time synchronization of the multi-source information is achieved with a low-cost joint calibration method for millimeter wave radar and images; and a new data structure fusing the raw millimeter wave radar point information with the visual information is proposed, in which a Gaussian distribution combined with the radar point cloud information generates a single-channel virtual picture reflecting target position and distance information, this picture is associated with the visual information to construct an RGB-D four-channel virtual picture, the four-channel picture is input into a convolutional neural network for deep learning, and the convolutional neural network is adjusted to realize target detection with spatial information. Because the network input contains richer spatial information about the targets, the distance prediction capability of target detection is further improved. Meanwhile, deep fusion with an end-to-end neural network allows a lighter network structure, saves computing resources, and improves the spatial-information prediction accuracy and speed of existing visual 3D target detection algorithms.
The above embodiments are only for illustrating the present invention; the steps may be changed, and modifications and equivalent changes to individual steps made on the basis of the technical solution and principle of the present invention shall not be excluded from its protection scope.

Claims (8)

1. A target distance detection method for constructing a four-channel virtual image by using multi-source information is characterized by comprising the following steps:
1) acquiring original point cloud data by using a millimeter wave radar to perform information processing, determining radar original point information belonging to the same target, and obtaining the target size and the target reflection center position;
2) according to the reflection center position of a target in the radar plane and the target center pixel position in the image acquired by a monocular camera, finding the spatial conversion relation between the two sensors by a joint calibration method and, combined with time synchronization, realizing the association of asynchronous heterogeneous multi-source information;
3) constructing a virtual four-channel picture containing distance information according to the association relation between the millimeter wave radar and the image data;
4) building a convolutional neural network according to the virtual four-channel picture to realize target detection: constructing an end-to-end target detection convolutional neural network to realize distance fusion, so that training on the four-channel virtual pictures generated from the radar raw point cloud information and the RGB images can be carried out, and the category, image bounding box and distance information of the target are predicted;
in the step 3), a single-channel virtual picture with the same resolution as the RGB image is generated according to target characteristic information which can be reflected by the radar through a corresponding rule, and then is associated with the RGB color image of the camera to form a 4-channel virtual picture which is used as input data of target distance detection network training;
the corresponding rule is:
(1) determining the center of the region of interest: the target reflection center position (x̄_r, ȳ_r) detected by the radar is projected onto the image according to the calibration parameters found in step 2); the spatial conversion relation determines the center pixel position (u_0, v_0) of the region of interest of the radar-detected target on the image, where (x̄_r, ȳ_r) is the reflection center position of the target;
(2) determining the size and pixel values of the target area in the radar single-channel virtual picture: a two-dimensional Gaussian model is adopted to determine the pixel values filled into the target area and the area size of the single-channel virtual picture, where the mean of the Gaussian distribution is the pixel position (u_0, v_0) corresponding to the reflection center determined in (1); according to the 3σ principle, pixels outside 3σ are considered zero, so the variances (σ_1², σ_2²) of the Gaussian distribution reflect the size of the target area on the image in the length and width dimensions, and their dependence on the relative distance r between the target and the sensor and on the target size (w, h) estimated by the radar in step 1) is expressed by a function g: (σ_1², σ_2²) = g(w, h, r);
meanwhile, in a traffic scene, the filled pixel values reflect the degree of attention paid to the detection accuracy of close-range moving targets; furthermore, because the radar provides a confidence σ that a reflection point is a target, these factors are expressed by a scale factor k, k = f(r, v_rel, σ), which affects the pixel values of the target fill area on the virtual picture;
according to the above principle and taking ρ = 0 by default, the pixel value G(u, v) filled in at an arbitrary pixel position (u, v) of the virtual picture follows the two-dimensional Gaussian form
G(u, v) = k·exp(−((u − μ_1)²/(2σ_1²) + (v − μ_2)²/(2σ_2²))),
with [μ_1, μ_2] = [u_0, v_0], [σ_1², σ_2²] = g(w, h, r), k = f(r, v_rel, σ),
where (μ_1, μ_2) is the mean of the two-dimensional Gaussian distribution model, whose physical meaning is the target center pixel position (u_0, v_0) obtained by projecting the radar target reflection center position (x̄_r, ȳ_r) onto the image; (σ_1², σ_2²) are the variances of the two-dimensional Gaussian distribution model, whose physical meaning is the dimensional relationship of the target on the image, related to the actual target size (w, h) in the length and width dimensions and to the relative distance r between the target and the sensor, a relationship represented by the function g: (σ_1², σ_2²) = g(w, h, r); k is the scale factor of the model, which determines the magnitude of the filled pixel values and, according to the correspondence rule, is related to the target confidence σ provided by the radar, the relative distance r between the target and the sensor and the relative velocity v_rel, a relationship represented by the function f: k = f(r, v_rel, σ), where v_rel is the target relative velocity.
2. The target distance detection method according to claim 1, characterized in that: in step 1), in a traffic scene the targets are all considered rigid bodies, and the raw point information belonging to the same target is determined by a clustering algorithm according to the similarity of the positions and Doppler velocities provided by the raw points; meanwhile, outliers are eliminated with the RANSAC algorithm, and the target size (w, h) and the target reflection center position (x̄_r, ȳ_r) are then obtained.
3. The target distance detection method according to claim 1, characterized in that the step 2) specifically comprises the following steps:
2.1) determining the conversion relation between the target reflection center position (x̄_r, ȳ_r) of a fixed object in the radar coordinate system and the target center pixel position (u_0, v_0) in the image;
2.2) time-synchronizing the asynchronous information: taking the acquisition time of the camera as the reference, each time the camera's perception data is updated the timestamp t_cam is recorded by an extrapolation method, the radar data closest to that moment with timestamp t_radar is found, and the time difference Δt = t_cam − t_radar is recorded; considering that the target speed is unchanged over this short interval, the position information is updated by Δx(Δt), yielding time-synchronized multi-sensor perception data.
4. The target distance detection method according to claim 3, characterized in that: in step 2.1), joint calibration is used to directly find the relation between the radar measurement data (x_r, y_r, z_r) and the image measurement data (u, v) of the same target, the spatial conversion relation between the two being:
ω·[u, v, 1]^T = P·[x_r, y_r, z_r, 1]^T
P = A[R|t]
where ω is a proportional constant; P is a 3 × 4 projection matrix; A is the intrinsic parameter matrix of the camera; R is the rotation in the extrinsic calibration; and t is the translation in the extrinsic calibration.
5. The target distance detection method according to claim 4, characterized in that the joint calibration process comprises: the millimeter wave radar detects the target and records data while, at the same moment, the target position in the image is recorded by photographing; the target reflection center position is then obtained through the clustering algorithm of step 1) and the corresponding position is found in the image; this process is repeated to obtain multiple sets of data.
6. The target distance detection method according to claim 1, characterized in that: in step 4), in order to implement deep fusion, the modifications to the neural network structure include the following aspects:
(1) modifying the training-data reading function of the selected algorithm so that it accepts 4-channel images;
(2) modifying the convolution kernels of the convolutional layers to extract higher-dimensional features;
(3) modifying the way annotation information is read: adding a ground-truth relative-distance label in addition to the ground-truth labels provided for image target detection training;
(4) adding a distance prediction function: adding a loss function for the distance prediction.
7. The target distance detection method according to claim 1, characterized by further comprising a step of evaluating and optimizing the convolutional neural network: when the loss function of the convolutional neural network converges on the training set and training is mature, a validation set with the same distribution as the training data is constructed to verify and evaluate the effect of the convolutional neural network.
8. The target distance detection method according to claim 7, characterized in that the logic for quantitatively evaluating the network effect is as follows: a prediction box is judged a positive sample when the overlap ratio (IoU) between the prediction and the ground truth is greater than the threshold iou; the model is considered to have predicted a target when the prediction box score is greater than the threshold score; different distance intervals are evaluated with several indices, including Precision = TP/(TP + FP), Recall = TP/(TP + FN), the absolute error of the predicted distance and the error relative to the ground truth; whether the training parameters need to be adjusted to improve the model is judged from these results; TP: the number of samples actually positive and predicted positive by the model; FP: the number of samples actually negative but predicted positive by the model; FN: the number of samples actually positive but predicted negative by the model.
CN202010258411.6A 2020-04-03 2020-04-03 Target distance detection method for constructing four-channel virtual image by using multi-source information Active CN111462237B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010258411.6A CN111462237B (en) 2020-04-03 2020-04-03 Target distance detection method for constructing four-channel virtual image by using multi-source information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010258411.6A CN111462237B (en) 2020-04-03 2020-04-03 Target distance detection method for constructing four-channel virtual image by using multi-source information

Publications (2)

Publication Number Publication Date
CN111462237A CN111462237A (en) 2020-07-28
CN111462237B true CN111462237B (en) 2022-09-20

Family

ID=71685888

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010258411.6A Active CN111462237B (en) 2020-04-03 2020-04-03 Target distance detection method for constructing four-channel virtual image by using multi-source information

Country Status (1)

Country Link
CN (1) CN111462237B (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111921199B (en) * 2020-08-25 2023-09-26 腾讯科技(深圳)有限公司 Method, device, terminal and storage medium for detecting state of virtual object
CN112184589B (en) * 2020-09-30 2021-10-08 清华大学 Point cloud intensity completion method and system based on semantic segmentation
CN112198503A (en) * 2020-10-16 2021-01-08 无锡威孚高科技集团股份有限公司 Target track prediction optimization method and device and radar system
CN112505684B (en) * 2020-11-17 2023-12-01 东南大学 Multi-target tracking method for radar vision fusion under side view angle of severe environment road
CN112528763B (en) * 2020-11-24 2024-06-21 浙江华锐捷技术有限公司 Target detection method, electronic equipment and computer storage medium
CN112766302B (en) * 2020-12-17 2024-03-29 浙江大华技术股份有限公司 Image fusion method and device, storage medium and electronic device
CN115131423A (en) * 2021-03-17 2022-09-30 航天科工深圳(集团)有限公司 Distance measuring method and device integrating millimeter wave radar and vision
CN113095154A (en) * 2021-03-19 2021-07-09 西安交通大学 Three-dimensional target detection system and method based on millimeter wave radar and monocular camera
CN115131594A (en) * 2021-03-26 2022-09-30 航天科工深圳(集团)有限公司 Millimeter wave radar data point classification method and device based on ensemble learning
CN113222111A (en) * 2021-04-01 2021-08-06 上海智能网联汽车技术中心有限公司 Automatic driving 4D perception method, system and medium suitable for all-weather environment
CN113221957B (en) * 2021-04-17 2024-04-16 南京航空航天大学 Method for enhancing radar information fusion characteristics based on center
CN113808219B (en) * 2021-09-17 2024-05-14 西安电子科技大学 Deep learning-based radar auxiliary camera calibration method
CN115701818B (en) * 2023-01-04 2023-05-09 江苏汉邦智能系统集成有限公司 Intelligent garbage classification control system based on artificial intelligence
CN115932702B (en) * 2023-03-14 2023-05-26 武汉格蓝若智能技术股份有限公司 Virtual standard based voltage transformer online operation calibration method and device
CN117930381A (en) * 2024-03-25 2024-04-26 海南中南标质量科学研究院有限公司 Port non-radiation perspective wave pass inspection system based on big data of Internet of things

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
ITRM20130426A1 (en) * 2013-07-19 2015-01-20 Consiglio Nazionale Ricerche METHOD FOR FILTERING INTERFEROMETRIC DATA ACQUIRED BY SYNTHETIC OPENING RADAR (SAR).
CN103559791B (en) * 2013-10-31 2015-11-18 北京联合大学 A kind of vehicle checking method merging radar and ccd video camera signal
US10602242B2 (en) * 2017-06-14 2020-03-24 GM Global Technology Operations LLC Apparatus, method and system for multi-mode fusion processing of data of multiple different formats sensed from heterogeneous devices
CN110378196B (en) * 2019-05-29 2022-08-02 电子科技大学 Road visual detection method combining laser point cloud data
CN110390697B (en) * 2019-07-11 2021-11-05 浙江大学 Millimeter wave radar and camera combined calibration method based on LM algorithm
CN110674733A (en) * 2019-09-23 2020-01-10 厦门金龙联合汽车工业有限公司 Multi-target detection and identification method and driving assistance method and system

Also Published As

Publication number Publication date
CN111462237A (en) 2020-07-28

Similar Documents

Publication Publication Date Title
CN111462237B (en) Target distance detection method for constructing four-channel virtual image by using multi-source information
Cao et al. An improved faster R-CNN for small object detection
CN110675418B (en) Target track optimization method based on DS evidence theory
CN112149550B (en) Automatic driving vehicle 3D target detection method based on multi-sensor fusion
CN112101092A (en) Automatic driving environment sensing method and system
CN110689562A (en) Trajectory loop detection optimization method based on generation of countermeasure network
CN111781608A (en) Moving target detection method and system based on FMCW laser radar
CN115685185B (en) 4D millimeter wave radar and vision fusion perception method
CN105160649A (en) Multi-target tracking method and system based on kernel function unsupervised clustering
CN115273034A (en) Traffic target detection and tracking method based on vehicle-mounted multi-sensor fusion
CN117274749B (en) Fused 3D target detection method based on 4D millimeter wave radar and image
CN111913177A (en) Method and device for detecting target object and storage medium
CN114758504A (en) Online vehicle overspeed early warning method and system based on filtering correction
CN114280611A (en) Road side sensing method integrating millimeter wave radar and camera
WO2024114119A1 (en) Sensor fusion method based on binocular camera guidance
CN114898314A (en) Target detection method, device and equipment for driving scene and storage medium
CN117111085A (en) Automatic driving automobile road cloud fusion sensing method
CN116978009A (en) Dynamic object filtering method based on 4D millimeter wave radar
Ennajar et al. Deep multi-modal object detection for autonomous driving
CN116817891A (en) Real-time multi-mode sensing high-precision map construction method
CN116482627A (en) Combined calibration method based on millimeter wave radar and monocular camera
CN113221744B (en) Monocular image 3D object detection method based on deep learning
CN116012712A (en) Object general feature-based target detection method, device, equipment and medium
US20240151855A1 (en) Lidar-based object tracking
CN113177966B (en) Three-dimensional scanning coherent laser radar point cloud processing method based on velocity clustering statistics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant