CN111462237A - Target distance detection method for constructing four-channel virtual image by using multi-source information - Google Patents

Target distance detection method for constructing four-channel virtual image by using multi-source information

Info

Publication number
CN111462237A
CN111462237A
Authority
CN
China
Prior art keywords
target
radar
image
information
distance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010258411.6A
Other languages
Chinese (zh)
Other versions
CN111462237B (en)
Inventor
杨殿阁
周韬华
江昆
于春磊
杨蒙蒙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN202010258411.6A priority Critical patent/CN111462237B/en
Publication of CN111462237A publication Critical patent/CN111462237A/en
Application granted granted Critical
Publication of CN111462237B publication Critical patent/CN111462237B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10032 Satellite or aerial image; Remote sensing
    • G06T 2207/10044 Radar image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning

Abstract

The invention relates to a target distance detection method that constructs a four-channel virtual image from multi-source information, comprising the following steps: acquiring raw point cloud data with a millimeter-wave radar and processing it to determine which raw radar points belong to the same target, obtaining the target size and the target reflection center position; according to the target reflection center position in the radar plane and the target center pixel position in the image acquired by a monocular camera, finding the spatial conversion relation between the two sensors by a joint calibration method, and combining it with time synchronization to associate the asynchronous, heterogeneous multi-source information; constructing a virtual four-channel picture containing distance information from the association relation between the millimeter-wave radar and the image data; and building a convolutional neural network on the virtual four-channel picture to realize target detection. The invention improves the distance prediction capability of target detection, allows a lightweight network structure that saves computing resources, and improves the spatial-information prediction accuracy and speed of existing visual 3D target detection algorithms.

Description

Target distance detection method for constructing four-channel virtual image by using multi-source information
Technical Field
The invention relates to the field of environment perception of intelligent automobiles, in particular to a target distance detection method for constructing a four-channel virtual image by utilizing multi-source information.
Background
In an intelligent automobile system, accurate, reliable and robust environmental perception is critical. Image information contains rich semantic features, and with the development of artificial intelligence and deep learning, vision-based target detection realized by convolutional neural networks has matured and become a popular research topic. However, monocular vision cannot directly acquire target distance information, and convolutional neural networks are better suited to classification tasks, so vision-based target distance detection still needs improvement: existing visual 3D target detection algorithms generally cannot meet the distance-accuracy requirement of driving tasks, and with inaccurate distance estimates the tasks of target tracking or driving decision cannot be performed; network performance can be improved by increasing network depth, as in ResNet, or network width, as in the Inception method, but this makes the network structure complex and computationally expensive, so the network is difficult to deploy on a vehicle.
The millimeter-wave radar can accurately measure target distance and speed, works in all weather conditions, and can therefore provide a new data source for the target detection task, making up for the deficiency of vision and enabling detection of target distance information. Research on improving environmental perception through multi-source fusion has accordingly received growing attention. Existing multi-source information fusion detection methods fall mainly into two categories: pre-fusion algorithms use the target position information detected by the millimeter-wave radar to provide a region-of-interest (ROI) position for vision, and then perform visual target recognition and classification; post-fusion processes the radar and image information separately and then associates and fuses the target-level information obtained from each. The former is disturbed by clutter in the radar detections, and a radar missed detection lowers the target detection rate; the latter is time-consuming, and the perception information provided by the millimeter-wave radar is not fully exploited, so the fusion is less effective.
Disclosure of Invention
In view of the above problems, an object of the present invention is to provide a target distance detection method for constructing a four-channel virtual image by using multi-source information, which can improve the distance prediction capability of target detection, realize a lightweight network structure, save computing resources, and improve the spatial-information prediction accuracy and speed of existing visual 3D target detection algorithms.
In order to achieve this purpose, the invention adopts the following technical scheme: a target distance detection method for constructing a four-channel virtual image by using multi-source information, comprising the following steps: 1) acquiring raw point cloud data with a millimeter-wave radar and processing it to determine which raw radar points belong to the same target, obtaining the target size and the target reflection center position; 2) according to the target reflection center position in the radar plane and the target center pixel position in the image acquired by a monocular camera, finding the spatial conversion relation between the two sensors by a joint calibration method, and combining it with time synchronization to associate the asynchronous, heterogeneous multi-source information; 3) constructing a virtual four-channel picture containing distance information from the association relation between the millimeter-wave radar and the image data; 4) building a convolutional neural network on the virtual four-channel picture to realize target detection: an end-to-end target detection convolutional neural network is constructed to realize distance fusion, so that the four-channel virtual picture generated from the raw radar point cloud information and the RGB image can be used for training, and the category, image bounding box and distance information of the target can be predicted.
Further, in step 1), in a traffic scene all targets are regarded as rigid bodies, and the raw point information belonging to the same target is determined by a clustering algorithm according to the similarity of the positions and Doppler velocities provided by the raw points; meanwhile, outliers are eliminated with the RANSAC algorithm, and then the target size (w, h) and the target reflection center position (x̄_r, ȳ_r) are obtained.
Further, step 2) specifically comprises the following steps: 2.1) determining the conversion relation between the target reflection center position (x̄_r, ȳ_r) of the calibration object in the radar coordinate system and the target center pixel position (u_0, v_0) in the image; 2.2) time-synchronizing the asynchronous information: taking the camera acquisition time as reference, each time the camera's perception data is updated its timestamp t_cam is recorded by extrapolation, the radar data closest to that moment, with timestamp t_radar, is found, and the time difference Δt = t_cam - t_radar is recorded; assuming the target speed is unchanged over this short time, the position information is updated by Δx(Δt), yielding time-synchronized multi-sensor perception data.
Further, in step 2.1), joint calibration is used to directly find the relation between the radar measurement data (x_r, y_r, z_r) and the image measurement data (u, v) of the same target; their spatial transformation relation is:
ω · [u, v, 1]^T = P · [x_r, y_r, z_r, 1]^T, with P = A[R | t]
where ω denotes a proportional constant, P denotes a 3 × 4 projection matrix, A denotes the camera intrinsic matrix, R denotes the rotation in the extrinsic calibration, and t denotes the translation in the extrinsic calibration.
Further, the joint calibration process comprises: the millimeter-wave radar detects the target and records data while a photograph records the target position in the image; the target reflection center position is then obtained by the clustering algorithm of step 1), and the corresponding position is found in the image; this process is repeated to obtain multiple sets of data.
Further, in the step 3), a single-channel virtual picture with the same resolution as the RGB image is generated according to target feature information that can be reflected by the radar through a corresponding rule, and then is associated with the RGB color image of the camera to form a 4-channel virtual picture, which is used as input data for target distance detection network training.
Further, the corresponding rule is as follows: (1) determining the center of the region of interest: the target reflection center position (x̄_r, ȳ_r) detected by the radar is projected onto the image using the calibration parameters found in step 2); the spatial conversion relation determines the center pixel position (u_0, v_0) of the region of interest of the radar-detected target on the image; (2) determining the size and pixel values of the target area in the radar single-channel virtual picture: a two-dimensional Gaussian model determines the pixel values and the size of the target area filled into the single-channel virtual picture, where the mean of the Gaussian distribution is the pixel position (u_0, v_0) corresponding to the reflection center determined in (1); according to the 3σ principle, pixels outside 3σ are taken as zero, so the variances (σ_1², σ_2²) of the Gaussian distribution reflect the size of the target area on the image in the length and width dimensions, and their relation to the relative distance r between target and sensor and to the target size (w, h) estimated by the radar in step 1) is expressed by a function g: (σ_1², σ_2²) = g(w, h, r); meanwhile, in a traffic scene, the filled pixel values reflect the degree of attention paid to the detection accuracy of close-range moving targets; furthermore, because the radar provides a confidence σ that a reflection point is a target, these factors are expressed by a scale factor k = f(r, v_rel, σ), which affects the pixel values of the target fill area on the virtual image.
Further, in the step 4), in order to implement deep fusion, the neural network structure modification includes the following aspects: (1) modifying a training data reading function of the selected algorithm, and receiving data reading of the 4-channel image; (2) modifying the convolution layer convolution kernel and extracting the characteristics with higher dimensionality; (3) and modifying the reading mode of the labeling information: adding a true labeling value of a relative distance beyond a true labeling value provided during image target detection training; (4) adding a distance prediction function: a loss function for distance prediction is added.
Further, the method also comprises a step of evaluating and optimizing the convolutional neural network: when the convolutional neural network shows loss-function convergence on the training set and training is mature, a validation set with the same distribution as the training data is constructed for verification and the effect of the convolutional neural network is evaluated.
Further, the logic for quantitatively evaluating the network effect is as follows: a prediction box is judged a positive sample when its overlap rate with the ground truth exceeds the IoU threshold iou; the model is considered to have predicted a target when the prediction box score exceeds the threshold score; each distance interval is evaluated with several indexes, including Precision = TP/(TP + FP), Recall = TP/(TP + FN), the absolute error of the predicted distance, and the error relative to the ground truth; whether the training parameters need to be adjusted to improve the model is judged from the results; TP: the number of examples that are actually positive and predicted positive by the model; FP: the number of examples that are actually negative but predicted positive by the model; FN: the number of examples that are actually positive but predicted negative by the model.
Due to the adoption of the technical scheme, the invention has the following advantages: 1. the method fully utilizes the information characteristics of the original point cloud provided by the radar to construct the virtual picture expressing the target information, and does not cause the loss of the original information. 2. The invention realizes the spatial synchronization of the multi-sensor information by using a low-cost millimeter wave radar and image combined calibration method, and has simple operation and high precision. 3. The method and the device directly acquire the spatial information of the target by fusing the radar and the image information through the virtual four-channel picture structure. 4. The invention adopts the end-to-end neural network to output the target detection information, can be directly used for the driving decision of the vehicle, reduces the intermediate processing links, and improves the accuracy, comprehensiveness and robustness of target identification.
Drawings
FIG. 1 is a schematic overall flow diagram of the present invention;
FIG. 2 is a schematic diagram of spatial joint calibration of millimeter wave radar and camera data for use in the present invention;
FIG. 3 is a diagram of a process for constructing a four-channel virtual picture according to the present invention;
fig. 4 is an example of the prediction result of the fusion model proposed by the present invention.
Detailed Description
The invention is described in detail below with reference to the figures and examples.
As shown in fig. 1, the invention provides a target distance detection method for constructing a four-channel virtual image by using multi-source information, which performs information fusion by using a vehicle-mounted sensor millimeter wave radar and a monocular camera, and realizes traffic target detection containing distance information by using an end-to-end convolutional neural network. The invention specifically comprises the following steps:
1) acquiring original point cloud data by using a millimeter wave radar to perform information processing, determining radar original point information belonging to the same target, and obtaining the target size and the target reflection center position;
The raw radar point information comprises a radial distance r, an angle θ, a Doppler relative velocity v_rel, and a reflection intensity γ;
In a traffic scene, all targets are regarded as rigid bodies, so the raw point information belonging to the same target is determined by a clustering algorithm (such as K-means clustering) according to the similarity of the positions and Doppler velocities provided by the raw points. Meanwhile, to reduce the influence of nearby vehicles or other static targets on the clustering result as much as possible, the RANSAC algorithm is used to eliminate outliers, and then the target size (w, h) and the target reflection center position (x̄_r, ȳ_r) are obtained.
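A minimal sketch of this radar point processing, assuming the raw points are already converted to Cartesian coordinates, using DBSCAN as a stand-in for the clustering step (the description mentions K-means) and a simple median-distance inlier filter in place of full RANSAC; the feature weighting and thresholds are illustrative assumptions, not values from this disclosure:

```python
import numpy as np
from sklearn.cluster import DBSCAN

def extract_targets(points, eps=2.0, min_samples=2, inlier_tol=2.5):
    """points: (N, 4) array of [x_r, y_r, v_rel, intensity] per raw radar point.
    Returns a list of targets with size (w, h) and reflection center."""
    # Cluster on position and Doppler velocity similarity (rigid-body assumption).
    feats = points[:, :3] * np.array([1.0, 1.0, 0.5])   # assumed velocity weight
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(feats)

    targets = []
    for lab in set(labels) - {-1}:                       # label -1 marks noise points
        cluster = points[labels == lab]
        # Crude outlier rejection standing in for RANSAC: keep points close
        # to the cluster median position.
        med = np.median(cluster[:, :2], axis=0)
        cluster = cluster[np.linalg.norm(cluster[:, :2] - med, axis=1) < inlier_tol]
        if len(cluster) == 0:
            continue
        # Target size from the spatial extent of the remaining points.
        w, h = cluster[:, :2].max(axis=0) - cluster[:, :2].min(axis=0)
        # Reflection center: intensity-weighted mean position.
        wgt = cluster[:, 3] / (cluster[:, 3].sum() + 1e-9)
        center = (cluster[:, :2] * wgt[:, None]).sum(axis=0)
        targets.append({"size": (w, h), "center": center,
                        "v_rel": cluster[:, 2].mean()})
    return targets
```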
2) According to the target reflection center position (x̄_r, ȳ_r) in the radar plane and the target center pixel position (u, v) in the image acquired by the monocular camera, the spatial conversion relation between the two sensors is found by a joint calibration method, and combined with time synchronization to associate the asynchronous, heterogeneous multi-source information;
the premise of realizing multi-source information fusion is to realize space-time synchronization of perception information under different acquisition frequencies and observation coordinate systems.
2.1) Determining the conversion relation between the target reflection center position (x̄_r, ȳ_r) of the calibration object in the radar coordinate system and the target center pixel position (u, v) in the image.
Joint calibration involves the interconversion of the following coordinate systems: the millimeter-wave radar coordinate system (x_r, y_r, z_r), the camera coordinate system (x_c, y_c, z_c), the imaging plane coordinate system (x, y) and the image pixel coordinate system (u, v), where z_r is a fixed value set in advance. Sequential calibration would first calibrate the camera intrinsic and extrinsic parameters and then calibrate the millimeter-wave radar extrinsic parameters, thereby determining the parameters of each conversion matrix; this calibration process is cumbersome, has high accuracy requirements, and is easily affected by accumulated errors.
In this embodiment, a joint calibration method is used to directly find the relation between the radar observation data (x_r, y_r, z_r) and the image measurement data (u, v) of the same target; their spatial transformation relation is:
ω · [u, v, 1]^T = P · [x_r, y_r, z_r, 1]^T, P = A[R | t]
where ω denotes a proportional constant, P denotes a 3 × 4 projection matrix containing the intrinsic and extrinsic calibration, A denotes the camera intrinsic matrix, R denotes the rotation in the extrinsic calibration, and t denotes the translation in the extrinsic calibration;
the purpose of the combined calibration is to solve the values of omega and 3 × 4 projection matrix P by collecting data and utilizing an SVD decomposition method, the steps of the experimental process of the combined calibration are shown in figure 2, a millimeter wave radar detects a target and records data at the same time, the position of the target in a recorded image is shot, and the position of the reflection center of the target is obtained by the clustering algorithm in the step 1
Figure BDA0002438338130000052
Then correspondingly finding the target center position (u) in the image0,v0). Repeating this process can result in multiple sets of data.
Meanwhile, since the detection range of the millimeter wave radar is a sector plane with a fixed height and the ground is taken as the zero point of the z axis, the z axis is considered to ber0, the height of the reflection center of the calibration object (angular reflection/rod-shaped reflection object) is a fixed value
Figure BDA0002438338130000053
Can obtain the same
Figure BDA0002438338130000054
Multiple sets of data under value
Figure BDA0002438338130000055
And the calibration precision is further improved by using geometric constraint. The method has the advantages of low cost and simple operation, and can obtain calibration parameters with higher precision.
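A minimal sketch of solving for the projection matrix P from such correspondences by SVD, using the standard direct-linear-transform formulation; the function and variable names are illustrative, not taken from this disclosure:

```python
import numpy as np

def solve_projection(radar_pts, pixel_pts):
    """radar_pts: (N, 3) array of (x_r, y_r, z_r); pixel_pts: (N, 2) array of (u, v).
    Returns the 3x4 projection matrix P (up to scale) minimizing the algebraic error."""
    assert len(radar_pts) >= 6, "need at least 6 correspondences for the 11 DoF of P"
    rows = []
    for (x, y, z), (u, v) in zip(radar_pts, pixel_pts):
        X = [x, y, z, 1.0]
        # Each correspondence contributes two rows of the homogeneous system A p = 0.
        rows.append([*X, 0, 0, 0, 0, *(-u * np.array(X))])
        rows.append([0, 0, 0, 0, *X, *(-v * np.array(X))])
    A = np.asarray(rows)
    # The solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    P = vt[-1].reshape(3, 4)
    return P / np.linalg.norm(P)

def project(P, radar_pt):
    """Project a radar point into the image; returns (u, v)."""
    uvw = P @ np.append(radar_pt, 1.0)
    return uvw[:2] / uvw[2]
```

The same `project` helper can later be reused to place the radar reflection centers onto the image when building the virtual channel.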
2.2) Because the data acquisition frequencies of the millimeter-wave radar and the camera differ, the asynchronous information must be time-synchronized. The acquisition frequency of the millimeter-wave radar is fixed, whereas the camera acquisition frequency is not fixed because of dropped frames, so the camera acquisition time is taken as the reference. By extrapolation, the timestamp t_cam is recorded each time the camera's perception data is updated, the radar data closest to that moment, with timestamp t_radar, is found, and the time difference Δt = t_cam - t_radar is recorded. This time difference is usually less than 5 ms, so the target speed is assumed unchanged over this short interval and the position information is updated by Δx(Δt), yielding time-synchronized multi-sensor perception data.
The correlation of asynchronous heterogeneous multi-source information is realized based on the space-time synchronization method.
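A minimal sketch of this nearest-frame matching and constant-velocity extrapolation, assuming each radar frame carries a timestamp and a 2D velocity vector per target (the Doppler relative velocity of the description would give only the radial component); the data layout is an illustrative assumption:

```python
import numpy as np

def sync_radar_to_camera(t_cam, radar_frames):
    """radar_frames: list of dicts {'t': timestamp,
    'targets': [{'center': (x, y), 'v': (vx, vy), ...}]}.
    Returns the radar targets extrapolated to the camera timestamp t_cam."""
    # Pick the radar frame whose timestamp is closest to the camera frame.
    nearest = min(radar_frames, key=lambda f: abs(t_cam - f['t']))
    dt = t_cam - nearest['t']          # typically a few milliseconds at most
    synced = []
    for tgt in nearest['targets']:
        # Constant-velocity assumption over the short interval dt.
        center = np.asarray(tgt['center']) + dt * np.asarray(tgt['v'])
        synced.append({**tgt, 'center': center})
    return synced
```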
3) Constructing a virtual four-channel picture containing distance information according to the association relation between the millimeter-wave radar and the image data;
A single-channel virtual picture with the same resolution as the RGB image is generated, via a corresponding rule, from the target feature information that the radar can reflect, and is then associated with the RGB color image of the camera to form a 4-channel virtual picture (i.e., the first 3 channels are the RGB color image and the 4th channel is the single-channel virtual picture made from the radar data that is time-space synchronized with that image by step 2)), which is used as the input data for training the target distance detection network. This new structure is convenient for feature extraction and feature learning by the convolutional neural network and, building on radar-extracted ROI algorithms, makes full use of the various data the radar provides, including the target size (w, h) and target center position (x̄_r, ȳ_r) inferred in step 1), the target relative velocity v_rel, the reflection intensity γ, and the confidence σ.
The corresponding rules for making a virtual picture are:
(1) Determining the center of the region of interest
The target reflection center position (x̄_r, ȳ_r) detected by the radar is projected onto the image using the calibration parameters found in step 2); the spatial conversion relation determines the center pixel position (u_0, v_0) of the region of interest (ROI) of the radar-detected target on the image.
(2) Determining size and pixel value of target area in radar single-channel virtual picture
A two-dimensional Gaussian model is adopted to determine the pixel values filled into the target area of the single-channel virtual picture and the size of that area. The size of the target as reflected on the image is related to the relative distance between the target and the ego vehicle and to the size of the target itself. The mean of the Gaussian distribution is the pixel position (u_0, v_0) corresponding to the reflection center determined in (1). The variance of the Gaussian distribution is a key parameter that reflects the shape of the distribution; according to the 3σ principle, pixels outside 3σ are taken as zero. The variances (σ_1², σ_2²) therefore reflect the size of the target area on the image in the length and width dimensions, and their relation to the relative distance r between target and sensor and to the target size (w, h) estimated by the radar clustering of step 1) is expressed by a function g: (σ_1², σ_2²) = g(w, h, r).
Meanwhile, in a traffic scene, driving safety makes the detection accuracy of close-range moving targets more important, so the filled pixel value can reflect this degree of attention. Furthermore, since the radar provides a confidence σ that a reflection point is a target, it is also one of the factors in the pixel fill value; these factors are expressed by a scale factor k = f(r, v_rel, σ), which affects the pixel values of the target fill area on the virtual image.
Combining the above principles, with the correlation coefficient ρ defaulted to 0, the pixel value G(u, v) filled in at an arbitrary pixel position (u, v) of the virtual picture follows the two-dimensional Gaussian distribution:
G(u, v) = k · exp( -[ (u - μ_1)² / (2σ_1²) + (v - μ_2)² / (2σ_2²) ] )
[μ_1, μ_2] = [u_0, v_0], [σ_1², σ_2²] = g(w, h, r), k = f(r, v_rel, σ)
where (μ_1, μ_2) is the mean of the two-dimensional Gaussian model, whose physical meaning is the target center pixel position (u_0, v_0) obtained by projecting the radar target reflection center position (x̄_r, ȳ_r) onto the image; (σ_1², σ_2²) are the variances of the two-dimensional Gaussian model, whose physical meaning is the dimensional relation of the target on the image, related, as analyzed above, to the actual target size (w, h) in the length and width dimensions and to the target-to-sensor relative distance r through the function g: (σ_1², σ_2²) = g(w, h, r); k is the scale factor of the model, whose physical meaning is that it determines the magnitude of the filled pixel values and, per the corresponding rule, is related to the target confidence σ provided by the radar, the relative distance r between target and sensor, and the relative velocity v_rel, a relation expressed by the function f: k = f(r, v_rel, σ).
Because the convolutional neural network extracts features from image-type input data, a single-channel picture that reflects the target position and size information is made from the millimeter-wave radar information and stacked with the RGB three-channel picture to form an RGB-D-style 4-channel picture, which is fed to the network to learn to predict targets and their distance information. The final effect is shown in fig. 3; the four-channel picture data is then used as the input data for model training.
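A minimal sketch of rendering the radar channel and stacking it with the RGB image, with illustrative choices for g and f (the concrete forms of g and f, the focal length, and the normalization are assumptions, not specified here):

```python
import numpy as np

def radar_channel(h_img, w_img, detections, fx=700.0):
    """detections: list of dicts {'uv': (u0, v0), 'size': (w, h) in meters,
    'r': range in meters, 'v_rel': m/s, 'conf': sigma}. fx is an assumed focal length."""
    chan = np.zeros((h_img, w_img), dtype=np.float32)
    vv, uu = np.mgrid[0:h_img, 0:w_img]
    for det in detections:
        u0, v0 = det['uv']
        w, h = det['size']
        r = max(det['r'], 1.0)
        # g(w, h, r): image-plane extent shrinks with distance (pinhole scaling),
        # divided by 3 so that 3-sigma covers the projected target extent.
        sig_u = max(fx * w / r / 3.0, 1.0)
        sig_v = max(fx * h / r / 3.0, 1.0)
        # f(r, v_rel, conf): emphasize close, fast, confident targets (assumed form).
        k = det['conf'] * (1.0 + abs(det['v_rel']) / 10.0) / r
        blob = k * np.exp(-(((uu - u0) ** 2) / (2 * sig_u ** 2)
                            + ((vv - v0) ** 2) / (2 * sig_v ** 2)))
        chan = np.maximum(chan, blob)      # keep the strongest response per pixel
    return chan

def make_rgbd(rgb, detections):
    """rgb: (H, W, 3) uint8 image. Returns an (H, W, 4) float32 virtual picture."""
    h_img, w_img = rgb.shape[:2]
    d = radar_channel(h_img, w_img, detections)
    d = 255.0 * d / (d.max() + 1e-6)       # match the 0-255 range of the RGB channels
    return np.dstack([rgb.astype(np.float32), d])
```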
Meanwhile, since the deep fusion is expected to predict the target's position on the image, its category and its spatial distance, the ground-truth annotation text provides the target category information, the relative position information in the image (using relative positions helps stabilize model training), and the distance information detected by the radar.
4) Building a convolutional neural network according to the virtual four-channel picture to realize target detection;
and (3) constructing an end-to-end target detection convolutional neural network to realize deep fusion, so that the training of a four-channel virtual picture generated by using radar original point cloud information and an RGB image can be realized, and the category, the image boundary frame and the distance information of the target can be predicted.
To achieve deep fusion, the neural network architecture modification includes the following aspects:
(1) modifying a training data reading function of the selected algorithm, and receiving data reading of the 4-channel image;
(2) modifying the convolutional layer convolution kernel to extract higher-dimensional features: since the number of channels of a convolution kernel (or convolution filter) must match the number of channels of its input, changing the input to a four-channel training picture requires increasing the channel count of the corresponding convolution kernels to 4 (see the sketch after this list).
(3) And modifying the reading mode of the labeling information: adding a true labeling value of a relative distance beyond a true labeling value provided during image target detection training;
(4) adding a distance prediction function: a loss function for distance prediction is added, preferably a squared loss
L_dist = λ · (d - d*)²
where λ is a scale parameter, d is the distance value predicted by the model, and d* is the ground-truth distance.
In a preferred embodiment, the YOLOv2 target detection algorithm is used to implement the above 4 aspects. The Darknet53 network corresponding to the YOLOv2 algorithm is a typical end-to-end convolutional neural network for the visual target detection task; it directly takes an image as input and outputs the target detection results, with high speed and relatively high accuracy.
5) Convolutional neural network evaluation and optimization
When the convolutional neural network shows loss-function convergence on the training set and training is mature, a validation set with the same distribution as the training data is constructed for verification and evaluation of the network's effect, which facilitates adjustment of the model parameters.
The logic for quantitatively evaluating the network effect is as follows (a sketch follows the definitions below): a prediction box is judged a positive sample when its overlap rate with the ground truth exceeds the IoU threshold iou; the model is considered to have predicted a target when the prediction box score exceeds the threshold score; each distance interval is evaluated with several indexes, including Precision = TP/(TP + FP), Recall = TP/(TP + FN), the absolute error of the predicted distance, and the error relative to the ground truth; whether the training parameters need to be adjusted to improve the model is judged from the results. Where:
TP (True Positive): the number of examples that are actually positive and predicted positive by the model;
FP (False Positive): the number of examples that are actually negative but predicted positive by the model;
FN (False Negative): the number of examples that are actually positive but predicted negative by the model;
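A minimal sketch of this evaluation logic for a single class, assuming axis-aligned boxes given as (x1, y1, x2, y2) and at most one ground-truth match per prediction; the threshold values are illustrative:

```python
import numpy as np

def iou(a, b):
    """IoU of two boxes (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def evaluate(preds, gts, iou_thr=0.5, score_thr=0.3):
    """preds: list of (box, score, dist); gts: list of (box, dist).
    Returns precision, recall and mean absolute distance error of matched pairs."""
    preds = [p for p in preds if p[1] > score_thr]        # keep confident predictions only
    matched_gt, tp, dist_err = set(), 0, []
    for box, _, d_pred in preds:
        best_j, best_iou = -1, iou_thr
        for j, (gt_box, _) in enumerate(gts):
            if j not in matched_gt and iou(box, gt_box) > best_iou:
                best_j, best_iou = j, iou(box, gt_box)
        if best_j >= 0:                                   # counted as a true positive
            matched_gt.add(best_j)
            tp += 1
            dist_err.append(abs(d_pred - gts[best_j][1]))
    fp = len(preds) - tp
    fn = len(gts) - tp
    precision = tp / (tp + fp + 1e-9)
    recall = tp / (tp + fn + 1e-9)
    return precision, recall, float(np.mean(dist_err)) if dist_err else None
```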
the method for improving the model according to the prediction result comprises the following steps:
when the accuracy precision of the model on the training set and the verification set is lower than a preset threshold value, the model is considered to be under-fitted and needs to be trained continuously;
when the accuracy precision of the model on the training set is higher than a preset threshold value and the accuracy on the verification set is lower than the preset threshold value, the model is considered to be over-fitted, the number of training rounds needs to be reduced, and the data volume of the training set is increased;
meanwhile, according to the evaluation results of different distance intervals, corresponding data amount is increased for the distance intervals with poor evaluation effect.
Adjusting training parameters: the learning rate and the training batch are continuously adjusted to obtain the best evaluation result.
In training the fusion network, the network effect is validated every 10000 training iterations until the network training matures. The network is optimized by adding noise: considering the false detections and missed detections of the millimeter-wave radar, random noise is added to the input data; and by modifying the projection relation: the model effect is related to the projection relation of the target box and is continuously improved, thereby improving the prediction effect of the model.
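A minimal sketch of such radar-noise augmentation applied to the detection list before the virtual channel is rendered; the drop/spurious probabilities and jitter scale are assumptions, not values given here:

```python
import random
import copy

def augment_radar(detections, p_drop=0.1, p_spurious=0.05,
                  jitter_px=3.0, img_size=(1280, 720)):
    """Simulate radar missed detections, false detections and measurement noise."""
    out = []
    for det in detections:
        if random.random() < p_drop:                      # simulate a missed detection
            continue
        det = copy.deepcopy(det)
        u, v = det['uv']
        det['uv'] = (u + random.gauss(0, jitter_px),      # position measurement noise
                     v + random.gauss(0, jitter_px))
        out.append(det)
    if random.random() < p_spurious:                      # simulate a clutter false detection
        out.append({'uv': (random.uniform(0, img_size[0]),
                           random.uniform(0, img_size[1])),
                    'size': (1.0, 1.0), 'r': random.uniform(5, 80),
                    'v_rel': 0.0, 'conf': 0.2})
    return out
```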
In addition, target information and speed information provided by the radar can be subjected to re-matching through data re-association, and a final target detection result of fusion perception is obtained. The finally realized model prediction effect is as shown in fig. 4, and can not only detect the position and the type of the target on the image, but also provide the prediction of the relative distance between the target and the sensor.
In conclusion, through training and verification, a mature model capable of detecting the target distance of the virtual four-channel picture is obtained. The method can better predict the category and distance information of the front target, and realizes the utilization of multi-source information.
In the example, a multi-sensor data acquisition system is carried on the experiment vehicle to synchronize multi-source information and generate a virtual four-channel picture. By means of deep fusion, the obtained target detection result can be applied to a simple algorithm for early warning of front vehicle collision and pedestrian avoidance, and driving decision assistance is achieved.
The method is different from the traditional fusion algorithm, the original point cloud information of the millimeter wave radar is fully utilized, target extraction is carried out through clustering and RANSAC algorithm, and the reflection center position and size of the target are obtained; the time-space synchronization of multi-source information is realized by utilizing a low-cost millimeter wave radar and image combined calibration method; the novel data structure for the fusion of the original point information and the visual information of the millimeter wave radar is provided, a single-channel virtual picture reflecting target position and distance information is generated by utilizing Gaussian distribution and combining radar point cloud information, an RGB-D four-channel virtual picture is constructed by being associated with the visual information, the single-channel virtual picture is synchronously input into a convolutional neural network for deep learning, the convolutional neural network is adjusted, and target detection with spatial information is realized. Since the network input information contains richer spatial information about the target, the distance prediction capability of target detection can be further improved. Meanwhile, deep fusion is realized by utilizing an end-to-end neural network, the lightweight of a network structure can be further realized, the computing resources are saved, and the spatial information prediction precision and speed of the existing visual 3D target detection algorithm are improved.
The above embodiments are only for illustrating the present invention, and the steps may be changed, and on the basis of the technical solution of the present invention, the modification and equivalent changes of the individual steps according to the principle of the present invention should not be excluded from the protection scope of the present invention.

Claims (10)

1. A target distance detection method for constructing a four-channel virtual image by using multi-source information is characterized by comprising the following steps:
1) acquiring original point cloud data by using a millimeter wave radar to perform information processing, determining radar original point information belonging to the same target, and obtaining the target size and the target reflection center position;
2) according to the reflection center position of a target under a radar plane and the target center pixel position in an image acquired by a monocular camera, searching the spatial conversion relation of two sensors by a combined calibration method, and simultaneously combining time synchronization to realize the association of asynchronous heterogeneous multi-source information;
3) constructing a virtual four-channel picture containing distance information according to the association relation between the millimeter-wave radar and the image data;
4) building a convolutional neural network according to the virtual four-channel picture to realize target detection: an end-to-end target detection convolutional neural network is constructed to realize distance fusion, so that the four-channel virtual picture generated from the raw radar point cloud information and the RGB image can be used for training, and the category, the image bounding box and the distance information of the target can be predicted.
2. The object distance detection method according to claim 1, characterized in that: in step 1), in a traffic scene all targets are regarded as rigid bodies, and the raw point information belonging to the same target is determined by a clustering algorithm according to the similarity of the positions and Doppler velocities provided by the raw points; meanwhile, outliers are eliminated with the RANSAC algorithm, and then the target size (w, h) and the target reflection center position (x̄_r, ȳ_r) are obtained.
3. The object distance detection method according to claim 1, characterized in that: step 2) specifically comprises the following steps:
2.1) determining the conversion relation between the target reflection center position (x̄_r, ȳ_r) of the calibration object in the radar coordinate system and the target center pixel position (u_0, v_0) in the image;
2.2) time-synchronizing the asynchronous information: taking the camera acquisition time as reference, each time the camera's perception data is updated its timestamp t_cam is recorded by extrapolation, the radar data closest to that moment, with timestamp t_radar, is found, and the time difference Δt = t_cam - t_radar is recorded; assuming the target speed is unchanged over this short time, the position information is updated by Δx(Δt), yielding time-synchronized multi-sensor perception data.
4. The object distance detection method according to claim 3, characterized in that: in the step 2.1), the measurement data (x) of the radar under the same target is directly searched by using the combined calibrationr,yr,zr) And the measurement data (u, v) of the image, the spatial transformation relationship of the two being:
Figure FDA0002438338120000011
P=A[R|t]
in the formula, omega represents a proportional constant, P represents a projection matrix of 3 × 4, A represents an internal reference matrix of the camera, R represents a rotation relation in the external reference calibration, and t represents a translation relation in the external reference calibration.
5. The object distance detection method according to claim 4, characterized in that: the joint calibration process comprises: the millimeter-wave radar detects the target and records data while a photograph records the target position in the image; the target reflection center position is then obtained by the clustering algorithm of step 1), and the corresponding position is found in the image; this process is repeated to obtain multiple sets of data.
6. The object distance detection method according to claim 1, characterized in that: in the step 3), a single-channel virtual picture with the same resolution as the RGB image is generated according to target characteristic information which can be reflected by the radar through a corresponding rule, and then the single-channel virtual picture is associated with the RGB color image of the camera to form a 4-channel virtual picture which is used as input data of target distance detection network training.
7. The object distance detection method according to claim 6, characterized in that: the corresponding rule is:
(1) determining the center of the region of interest: the target reflection center position (x̄_r, ȳ_r) detected by the radar is projected onto the image using the calibration parameters found in step 2); the spatial conversion relation determines the center pixel position (u_0, v_0) of the region of interest of the radar-detected target on the image;
(2) determining the size and pixel values of the target area in the radar single-channel virtual picture: a two-dimensional Gaussian model determines the pixel values and the size of the target area filled into the single-channel virtual picture, where the mean of the Gaussian distribution is the pixel position (u_0, v_0) corresponding to the reflection center determined in (1); according to the 3σ principle, pixels outside 3σ are taken as zero, so the variances (σ_1², σ_2²) of the Gaussian distribution reflect the size of the target area on the image in the length and width dimensions, and their relation to the relative distance r between target and sensor and to the target size (w, h) estimated by the radar in step 1) is expressed by a function g: (σ_1², σ_2²) = g(w, h, r);
meanwhile, in a traffic scene, the filled pixel values reflect the degree of attention paid to the detection accuracy of close-range moving targets; furthermore, because the radar provides a confidence σ that a reflection point is a target, these factors are expressed by a scale factor k = f(r, v_rel, σ), which affects the pixel values of the target fill area on the virtual image.
8. The object distance detection method according to claim 1, characterized in that: in the step 4), in order to implement deep fusion, the modification of the neural network structure includes the following aspects:
(1) modifying a training data reading function of the selected algorithm, and receiving data reading of the 4-channel image;
(2) modifying the convolution layer convolution kernel and extracting the characteristics with higher dimensionality;
(3) and modifying the reading mode of the labeling information: adding a true labeling value of a relative distance beyond a true labeling value provided during image target detection training;
(4) adding a distance prediction function: a loss function for distance prediction is added.
9. The object distance detection method according to claim 1, characterized in that: and when the convolutional neural network shows loss function convergence on a training set, constructing a verification set which is distributed with the training set data in the same way for verification and evaluating the effect of the convolutional neural network after the training is mature.
10. The object distance detection method according to claim 9, characterized in that: the logic for quantitatively evaluating the network effect is as follows: a prediction box is judged a positive sample when its overlap rate with the ground truth exceeds the IoU threshold iou; the model is considered to have predicted a target when the prediction box score exceeds the threshold score; each distance interval is evaluated with several indexes, including Precision = TP/(TP + FP), Recall = TP/(TP + FN), the absolute error of the predicted distance, and the error relative to the ground truth; whether the training parameters need to be adjusted to improve the model is judged from the results; TP: the number of examples that are actually positive and predicted positive by the model; FP: the number of examples that are actually negative but predicted positive by the model; FN: the number of examples that are actually positive but predicted negative by the model.
CN202010258411.6A 2020-04-03 2020-04-03 Target distance detection method for constructing four-channel virtual image by using multi-source information Active CN111462237B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010258411.6A CN111462237B (en) 2020-04-03 2020-04-03 Target distance detection method for constructing four-channel virtual image by using multi-source information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010258411.6A CN111462237B (en) 2020-04-03 2020-04-03 Target distance detection method for constructing four-channel virtual image by using multi-source information

Publications (2)

Publication Number Publication Date
CN111462237A true CN111462237A (en) 2020-07-28
CN111462237B CN111462237B (en) 2022-09-20

Family

ID=71685888

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010258411.6A Active CN111462237B (en) 2020-04-03 2020-04-03 Target distance detection method for constructing four-channel virtual image by using multi-source information

Country Status (1)

Country Link
CN (1) CN111462237B (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111921199A (en) * 2020-08-25 2020-11-13 腾讯科技(深圳)有限公司 Virtual object state detection method, device, terminal and storage medium
CN112505684A (en) * 2020-11-17 2021-03-16 东南大学 Vehicle multi-target tracking method based on radar vision fusion under road side view angle in severe environment
CN112528763A (en) * 2020-11-24 2021-03-19 浙江大华汽车技术有限公司 Target detection method, electronic device and computer storage medium
CN112766302A (en) * 2020-12-17 2021-05-07 浙江大华技术股份有限公司 Image fusion method and device, storage medium and electronic device
CN113095154A (en) * 2021-03-19 2021-07-09 西安交通大学 Three-dimensional target detection system and method based on millimeter wave radar and monocular camera
CN113221957A (en) * 2021-04-17 2021-08-06 南京航空航天大学 Radar information fusion characteristic enhancement method based on Centernet
CN113222111A (en) * 2021-04-01 2021-08-06 上海智能网联汽车技术中心有限公司 Automatic driving 4D perception method, system and medium suitable for all-weather environment
CN113808219A (en) * 2021-09-17 2021-12-17 西安电子科技大学 Radar-assisted camera calibration method based on deep learning
US11315271B2 (en) * 2020-09-30 2022-04-26 Tsinghua University Point cloud intensity completion method and system based on semantic segmentation
CN115052333A (en) * 2021-03-08 2022-09-13 罗伯特·博世有限公司 Method and device for time synchronization of a first vehicle and a second vehicle
CN115701818A (en) * 2023-01-04 2023-02-14 江苏汉邦智能系统集成有限公司 Intelligent garbage classification control system based on artificial intelligence
CN115932702A (en) * 2023-03-14 2023-04-07 武汉格蓝若智能技术股份有限公司 Voltage transformer online operation calibration method and device based on virtual standard device
CN113808219B (en) * 2021-09-17 2024-05-14 西安电子科技大学 Deep learning-based radar auxiliary camera calibration method

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015008310A1 (en) * 2013-07-19 2015-01-22 Consiglio Nazionale Delle Ricerche Method for filtering of interferometric data acquired by synthetic aperture radar (sar)
CN103559791A (en) * 2013-10-31 2014-02-05 北京联合大学 Vehicle detection method fusing radar and CCD camera signals
CN109086788A (en) * 2017-06-14 2018-12-25 通用汽车环球科技运作有限责任公司 The equipment of the multi-pattern Fusion processing of data for a variety of different-formats from isomery device sensing, method and system
CN110378196A (en) * 2019-05-29 2019-10-25 电子科技大学 A kind of road vision detection method of combination laser point cloud data
CN110390697A (en) * 2019-07-11 2019-10-29 浙江大学 A kind of millimetre-wave radar based on LM algorithm and camera combined calibrating method
CN110674733A (en) * 2019-09-23 2020-01-10 厦门金龙联合汽车工业有限公司 Multi-target detection and identification method and driving assistance method and system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
NINGBO LONG 等: "Unifying obstacle detection, recognition,and fusion based on millimeter wave radar and RGB-depth sensors for the visually impaired", 《REV. SCI. INSTRUM. 90, 044102 (2019)》 *
TAOHUA ZHOU 等: "Object Detection Using Multi-Sensor Fusion Based on Deep Learning", 《CONFERENCE: 19TH COTA INTERNATIONAL CONFERENCE OF TRANSPORTATION PROFESSIONALS》 *
夏朝阳 et al.: "Micro-motion gesture recognition based on multi-channel FMCW millimeter-wave radar", Journal of Electronics & Information Technology *

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111921199A (en) * 2020-08-25 2020-11-13 腾讯科技(深圳)有限公司 Virtual object state detection method, device, terminal and storage medium
CN111921199B (en) * 2020-08-25 2023-09-26 腾讯科技(深圳)有限公司 Method, device, terminal and storage medium for detecting state of virtual object
US11315271B2 (en) * 2020-09-30 2022-04-26 Tsinghua University Point cloud intensity completion method and system based on semantic segmentation
CN112505684A (en) * 2020-11-17 2021-03-16 东南大学 Vehicle multi-target tracking method based on radar vision fusion under road side view angle in severe environment
CN112505684B (en) * 2020-11-17 2023-12-01 东南大学 Multi-target tracking method for radar vision fusion under side view angle of severe environment road
CN112528763A (en) * 2020-11-24 2021-03-19 浙江大华汽车技术有限公司 Target detection method, electronic device and computer storage medium
CN112766302A (en) * 2020-12-17 2021-05-07 浙江大华技术股份有限公司 Image fusion method and device, storage medium and electronic device
CN112766302B (en) * 2020-12-17 2024-03-29 浙江大华技术股份有限公司 Image fusion method and device, storage medium and electronic device
CN115052333A (en) * 2021-03-08 2022-09-13 罗伯特·博世有限公司 Method and device for time synchronization of a first vehicle and a second vehicle
CN113095154A (en) * 2021-03-19 2021-07-09 西安交通大学 Three-dimensional target detection system and method based on millimeter wave radar and monocular camera
CN113222111A (en) * 2021-04-01 2021-08-06 上海智能网联汽车技术中心有限公司 Automatic driving 4D perception method, system and medium suitable for all-weather environment
CN113221957A (en) * 2021-04-17 2021-08-06 南京航空航天大学 Radar information fusion characteristic enhancement method based on Centernet
CN113221957B (en) * 2021-04-17 2024-04-16 南京航空航天大学 Method for enhancing radar information fusion characteristics based on center
CN113808219A (en) * 2021-09-17 2021-12-17 西安电子科技大学 Radar-assisted camera calibration method based on deep learning
CN113808219B (en) * 2021-09-17 2024-05-14 西安电子科技大学 Deep learning-based radar auxiliary camera calibration method
CN115701818A (en) * 2023-01-04 2023-02-14 江苏汉邦智能系统集成有限公司 Intelligent garbage classification control system based on artificial intelligence
CN115701818B (en) * 2023-01-04 2023-05-09 江苏汉邦智能系统集成有限公司 Intelligent garbage classification control system based on artificial intelligence
CN115932702A (en) * 2023-03-14 2023-04-07 武汉格蓝若智能技术股份有限公司 Voltage transformer online operation calibration method and device based on virtual standard device

Also Published As

Publication number Publication date
CN111462237B (en) 2022-09-20

Similar Documents

Publication Publication Date Title
CN111462237B (en) Target distance detection method for constructing four-channel virtual image by using multi-source information
CN110675418B (en) Target track optimization method based on DS evidence theory
CN110689562A (en) Trajectory loop detection optimization method based on generation of countermeasure network
CN112149550A (en) Automatic driving vehicle 3D target detection method based on multi-sensor fusion
CN115685185B (en) 4D millimeter wave radar and vision fusion perception method
CN114495064A (en) Monocular depth estimation-based vehicle surrounding obstacle early warning method
CN115273034A (en) Traffic target detection and tracking method based on vehicle-mounted multi-sensor fusion
CN117274749B (en) Fused 3D target detection method based on 4D millimeter wave radar and image
CN111913177A (en) Method and device for detecting target object and storage medium
CN114280611A (en) Road side sensing method integrating millimeter wave radar and camera
CN114758504A (en) Online vehicle overspeed early warning method and system based on filtering correction
CN115761534A (en) Method for detecting and tracking small target of infrared unmanned aerial vehicle under air background
CN117111085A (en) Automatic driving automobile road cloud fusion sensing method
CN116978009A (en) Dynamic object filtering method based on 4D millimeter wave radar
Ennajar et al. Deep multi-modal object detection for autonomous driving
CN116817891A (en) Real-time multi-mode sensing high-precision map construction method
CN116482627A (en) Combined calibration method based on millimeter wave radar and monocular camera
CN113177966B (en) Three-dimensional scanning coherent laser radar point cloud processing method based on velocity clustering statistics
CN113221744B (en) Monocular image 3D object detection method based on deep learning
CN115546595A (en) Track tracking method and system based on fusion sensing of laser radar and camera
CN115471526A (en) Automatic driving target detection and tracking method based on multi-source heterogeneous information fusion
WO2023009180A1 (en) Lidar-based object tracking
CN116433712A (en) Fusion tracking method and device based on pre-fusion of multi-sensor time sequence sensing results
CN111090105B (en) Vehicle-mounted laser radar point cloud signal ground point separation method
CN116433715A (en) Time sequence tracking method, device and medium based on multi-sensor front fusion result

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant