CN117406212A - Visual fusion detection method for traffic multi-element radar - Google Patents

Visual fusion detection method for traffic multi-element radar

Info

Publication number
CN117406212A
Authority
CN
China
Prior art keywords
target
radar
camera
traffic
information
Prior art date
Legal status
Pending
Application number
CN202311452810.6A
Other languages
Chinese (zh)
Inventor
柯文雄
陈潜
刘剑
席光荣
尹洁珺
王士铎
刘涛
李小柳
李由之
Current Assignee
Shanghai Radio Equipment Research Institute
Original Assignee
Shanghai Radio Equipment Research Institute
Priority date
Filing date
Publication date
Application filed by Shanghai Radio Equipment Research Institute filed Critical Shanghai Radio Equipment Research Institute
Priority to CN202311452810.6A
Publication of CN117406212A
Legal status: Pending


Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S13/00 Systems using the reflection or reradiation of radio waves, e.g. radar systems; Analogous systems using reflection or reradiation of waves whose nature or wavelength is irrelevant or unspecified
    • G01S13/86 Combinations of radar systems with non-radar systems, e.g. sonar, direction finder
    • G01S13/867 Combination of radar systems with cameras
    • G01S13/88 Radar or analogous systems specially adapted for specific applications
    • G01S13/91 Radar or analogous systems specially adapted for specific applications for traffic control
    • G01S13/92 Radar or analogous systems specially adapted for specific applications for traffic control for velocity measurement
    • G08 SIGNALLING
    • G08G TRAFFIC CONTROL SYSTEMS
    • G08G1/00 Traffic control systems for road vehicles
    • G08G1/01 Detecting movement of traffic to be counted or controlled
    • G08G1/017 Detecting movement of traffic to be counted or controlled identifying vehicles
    • G08G1/0175 Detecting movement of traffic to be counted or controlled identifying vehicles by photographing vehicles, e.g. when violating traffic rules
    • G08G1/052 Detecting movement of traffic to be counted or controlled with provision for determining speed or overspeed
    • G08G1/054 Detecting movement of traffic to be counted or controlled with provision for determining speed or overspeed photographing overspeeding vehicles

Abstract

A visual fusion detection method for a traffic multi-element radar includes: collecting original image data of a traffic road through a camera and radar point cloud data through a millimeter wave radar; recoding the original point cloud data detected by the radar into a grid image; performing target detection on the grid image and the camera image with a neural network; representing the target length, target width and speed in the target information detected by the radar and the camera as the length, width and height of three-dimensional feature cubes in the radar coordinate system; calculating the intersection-over-union of the three-dimensional feature cubes and deriving the target association degree from it; carrying out dual-source information fusion on the associated target information; and removing false targets and redundant information. The method fully exploits the advantages of the camera and the millimeter wave radar, detects richer and more accurate traffic target information, associates targets more accurately and rapidly, meets the requirements of intelligent traffic applications, and provides powerful support for the efficient operation and traffic safety of an intelligent traffic system.

Description

Visual fusion detection method for traffic multi-element radar
Technical Field
The invention relates to the technical field of traffic radars, in particular to a visual fusion detection method of a traffic multi-element radar.
Background
In the field of intelligent traffic, accurate and efficient detection of traffic targets is important for realizing intelligent traffic management and ensuring road traffic safety. Conventional traffic target detection methods typically rely on data from a single sensor, for example using only camera images or only radar point clouds for target detection. However, owing to the complexity and variability of road traffic scenes, a single sensor often fails to meet the accuracy and robustness requirements. Camera images are influenced by factors such as illumination and weather, and shadows or blurring may make targets difficult to detect accurately. Radar point clouds, while able to provide distance and speed information, are relatively weak in terms of target classification and detail recognition.
In recent years, with the growing complexity of traffic scenes and the increasing requirements for identifying and tracking traffic targets, the demand for multi-sensor fusion has been rising. However, many difficulties remain in current multi-sensor fusion. The data generated by different sensors may have temporal and spatial inconsistencies, which affect the fusion effect. A suitable data fusion algorithm must be designed to integrate the data generated by the different sensors, extract useful information and eliminate redundancy. Moreover, traffic target identification and tracking usually has to be performed under strict real-time requirements, while multi-sensor fusion increases the complexity of data processing and places higher demands on computing resources and processing speed.
The statements herein merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Disclosure of Invention
The invention aims to provide a visual fusion detection method for a traffic multi-element radar, which comprehensively uses two sensors, a camera and a millimeter wave radar, and fully exploits their respective advantages. It can detect richer and more accurate traffic target information, associate targets more accurately and rapidly, meet the requirements of intelligent traffic applications, and provide powerful support for realizing the efficient operation and traffic safety of an intelligent traffic system.
In order to achieve the above purpose, the invention provides a visual fusion detection method for a traffic multi-element radar, which comprises the following steps:
step S1, acquiring original image data of a traffic road through a camera, recording acquisition time, acquiring point cloud data by a millimeter wave radar, acquiring a point cloud position, a speed, a scattering cross section RCS and the acquisition time to form radar point cloud data, calculating a coordinate transformation matrix of the camera and the radar, unifying pixel point coordinates into a radar coordinate system, and aligning a data time axis;
step S2, recoding original point cloud data detected by a radar into a grid image, and performing target detection on the grid image by using a neural network to obtain the position, the speed and the size of a radar target;
s3, performing target detection on the camera image by using a neural network, extracting the position, speed and size of a target from the image, detecting the category, color and brand, and converting the target detection information of the camera image into a radar coordinate system by adopting the coordinate transformation matrix obtained in the step S1;
s4, in a radar coordinate system, representing the target length, the target width and the target speed in target information detected by a radar and a camera as the length, the width and the height of a three-dimensional feature cube, calculating the intersection ratio of the three-dimensional feature cube, and calculating the target association degree according to the intersection ratio;
and S5, carrying out dual-source information fusion on the associated target information, removing false targets and redundant information, converting the fused result into video, and displaying the video on a user interface.
In step S1, a 4D millimeter wave radar and a 4K resolution camera are selected as sensors to acquire traffic scene information, and the radar and the camera are fixed at adjacent positions on a traffic pole so that their fields of view overlap and cover all traffic areas to be observed.
In step S1, Zhang's calibration algorithm is adopted to calibrate the camera and estimate its internal and external parameters, the external parameters of the radar and the video equipment are calibrated by using a corner reflector, the coordinates of the corner reflector in the radar and video coordinate systems are measured multiple times, a coordinate transformation matrix for converting points in the camera coordinate system into the radar coordinate system is obtained through calculation, and the pixel point coordinates are unified into the radar coordinate system.
In step S1, time synchronization is performed using NTP, ensuring that the time of the radar and of the video device is synchronized with NTP time.
In the step S2, the radar point cloud data is converted into a grid with a fixed height h, width w and cell size s; each cell of the grid contains three channels, which respectively represent the radar cross-section RCS, the x-direction speed and the y-direction speed of the radar points; if a plurality of radar points fall into the same cell, the three channel values of the cell take the average of those radar points, and if a cell is not mapped to any radar point, its three channel values are set to zero.
In the step S2, a neural network YoloV5 is used to detect two-dimensional objects in the radar grid image; when the neural network YoloV5 is trained, each vehicle is classified into one of m classes according to its size; the neural network YoloV5 outputs B predicted target frames, each target frame including the x and y positions of the target center, a height h, a width w and corresponding class information, as well as the probability of the class to which the target belongs and a target confidence; and the average of the speeds of the a cells nearest to the target center is calculated as the measured speed of the target.
In the step S3, the detected vehicle target is used as input, a neural network YoloV5 is used to detect a two-dimensional object in a camera image, so as to detect and position the vehicle target in the image, the neural network outputs predicted C target frames, each of which includes x and y positions of the center of the target, a height h, a width w and corresponding category information, and also includes probability of the category to which the target belongs and target confidence, and between adjacent video frames, a Lucas-Kanade optical flow algorithm is used to track pixels of the target frame, and the movement speed of the target is estimated according to the displacement of the pixels.
In the step S4, using the transverse direction of the plane where the road is located as the x-axis, the longitudinal direction as the y-axis, and the speed corresponding to the target as the z-axis, and representing the target information detected by the radar and the camera as a three-dimensional feature cube;
for the target information detected by the radar, the bottom-edge center point coordinates (Px1, Py1) of each target frame are used as the position of a radar feature cube CU1, the length Δy1 and the width Δx1 of the target frame are used as the length L1 and the width W1 of the feature cube, and the speed v1 corresponding to the target frame is used as the height H1 of the feature cube;
for the target information detected by the camera, the bottom-edge center point coordinates (Px2, Py2) of each target frame in the radar coordinate system are used as the position of a camera feature cube CU2, the length Δy2 and the width Δx2 of the target frame in the radar coordinate system are used as the length L2 and the width W2 of the feature cube, and the speed v2 corresponding to the target frame is used as the height H2 of the feature cube;
calculating the association degree between targets according to the intersection ratio IOU of the feature cubes:

IOU = I / U

wherein U is the union of the radar and camera feature cubes, and I is the intersection of the radar and camera feature cubes;
when the IOU is greater than the threshold th, the radar and video detection targets are deemed to be associated targets.
In the step S5, target pairs with a high association degree are screened out through the target association degree and considered as true associated targets; radar targets corresponding to a low association degree are removed as false targets; and in the case that multiple adjacent radar and video targets are associated with each other, they are merged and de-duplicated as duplicate targets, and only the target with the highest confidence is retained.
In step S5, for the target corresponding to the associated target, the position and speed information detected by the radar is used as the real position and speed of the target, the size, color and brand attribute detected by the camera are added into the target attribute, and all the attribute information and motion state of the target are converted into video to be intuitively displayed on the user interface.
According to the invention, road traffic target detection is carried out with a camera and a millimeter wave radar. The original point cloud data detected by the radar is recoded into a grid image, and target detection is performed on the radar grid image and the camera image with a neural network, so that information such as the position, speed and size of the target can be obtained; for the camera image, attributes such as category, color and brand can also be detected. The length, width and speed of each target are expressed as the three dimensions of a feature cube, and the association relation between targets is obtained by calculating the intersection ratio between the radar and camera feature cubes, realizing rapid and effective target association. Finally, the target attribute information from the radar and the camera is associated together, false targets and redundant information are removed, and the result is displayed on a user interface, so that a user can intuitively understand the detected traffic target information.
Drawings
Fig. 1 is a flow chart of a visual fusion detection method of a traffic multi-element radar.
Fig. 2 is a schematic view of a radar and camera installation in an embodiment of the invention.
Detailed Description
The following describes a preferred embodiment of the present invention in detail with reference to fig. 1 and 2.
As shown in fig. 1, the invention provides a visual fusion detection method for a traffic multi-element radar, which comprises the following steps:
and step S1, the camera and the radar collect data simultaneously.
The method comprises the steps of collecting original image data of a traffic road through a camera, recording collecting time, and collecting radar point cloud data comprising point cloud positions, speeds, scattering cross sections RCS, collecting time and the like through a millimeter wave radar.
And calculating an external parameter conversion matrix of the camera and the radar, unifying pixel point coordinates into a radar coordinate system, aligning a data time axis, and ensuring that the data are positioned on the same coordinate system and time axis.
In this embodiment, in order to obtain richer point cloud information and high-definition images, a 4D millimeter wave radar and a 4K resolution camera are selected as sensors to acquire traffic scene information. As shown in fig. 2, the radar and the camera are fixed at adjacent positions on a traffic pole so that their fields of view overlap and cover all traffic areas to be observed. The 4K resolution camera collects original image data of the traffic road and records the collection time, and the 4D millimeter wave radar collects radar point cloud data of the road scene, including point cloud positions, speeds, radar cross-sections RCS, collection times and the like. The camera is calibrated with Zhang's calibration algorithm, estimating the internal parameters (including focal length, principal point and the like) and the external parameters (the position and pose of the camera). At the same time, the external parameters of the radar and the video equipment are calibrated using a corner reflector: by measuring the coordinates of the corner reflector in the radar and video coordinate systems multiple times, a coordinate transformation matrix for converting points in the camera coordinate system into the radar coordinate system is calculated, and the pixel point coordinates are unified into the radar coordinate system. To achieve time alignment, time synchronization is performed using NTP (Network Time Protocol), ensuring that the time of the radar and video equipment is synchronized with NTP time.
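By way of illustration only, the camera-to-radar extrinsic estimation from repeated corner-reflector measurements can be sketched as a least-squares rigid alignment (Kabsch/SVD). The minimal Python sketch below assumes the reflector's 3D position in the camera frame is already recovered from the intrinsic calibration; the function names and the (N, 3) array layout are assumptions made for this example, not part of the claimed method.

```python
import numpy as np

def estimate_rigid_transform(pts_cam, pts_radar):
    """Least-squares R, t with pts_radar ~ R @ pts_cam + t (Kabsch / SVD).

    pts_cam, pts_radar: (N, 3) corner-reflector positions measured repeatedly
    in the camera and radar coordinate systems (N >= 3, non-collinear).
    Returns a 4x4 homogeneous transform from the camera to the radar frame.
    """
    pts_cam = np.asarray(pts_cam, dtype=float)
    pts_radar = np.asarray(pts_radar, dtype=float)
    c_cam, c_rad = pts_cam.mean(axis=0), pts_radar.mean(axis=0)
    H = (pts_cam - c_cam).T @ (pts_radar - c_rad)        # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))                # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = c_rad - R @ c_cam
    T = np.eye(4)
    T[:3, :3], T[:3, 3] = R, t
    return T

def camera_to_radar(points_cam, T):
    """Map (N, 3) camera-frame points into the radar frame with the 4x4 transform."""
    pts_h = np.hstack([np.asarray(points_cam, float), np.ones((len(points_cam), 1))])
    return (T @ pts_h.T).T[:, :3]
```

Once such a transform has been estimated from several reflector placements, every camera detection expressed in the camera coordinate system can be mapped into the radar coordinate system before the association of step S4.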
Step S2: radar point cloud target detection.
Recoding the original point cloud data detected by the radar into a grid image, and carrying out target detection on the grid image by adopting a neural network to obtain the information such as the position, the speed, the size and the like of a radar target.
In this embodiment, the radar point cloud data is converted into a grid with a fixed height h, width w and cell size s. By mapping the locations of the radar points into the grid, the radar point cloud data can be converted into an image-like form; radar points that fall outside the grid are discarded. Each cell of the grid contains three channels, representing the radar cross-section RCS, the x-direction velocity and the y-direction velocity of the radar points. If multiple radar points fall into the same cell, the three channel values of the cell take the average of those radar points; if a cell is not mapped to any radar point, its three channel values are set to zero to maintain the integrity and consistency of the grid.
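A minimal sketch of this grid encoding is given below. It assumes each radar point carries (x, y, vx, vy, RCS) in the radar coordinate system and that the grid origin coincides with the radar; the grid dimensions, the cell size and the field order are illustrative assumptions.

```python
import numpy as np

def point_cloud_to_grid(points, h=128, w=128, s=0.5):
    """Encode radar points into an (h, w, 3) grid image.

    points: iterable of (x, y, vx, vy, rcs) tuples in the radar frame.
    Channels per cell: mean RCS, mean x-velocity, mean y-velocity.
    Cells with no radar point keep all three channels at zero; points
    outside the h*s (longitudinal) by w*s (lateral) coverage are discarded.
    """
    grid = np.zeros((h, w, 3), dtype=np.float32)
    counts = np.zeros((h, w), dtype=np.int32)
    for x, y, vx, vy, rcs in points:
        row, col = int(y / s), int(x / s)          # y -> row, x -> column
        if 0 <= row < h and 0 <= col < w:
            grid[row, col] += (rcs, vx, vy)
            counts[row, col] += 1
    occupied = counts > 0
    grid[occupied] /= counts[occupied, None]       # average points sharing a cell
    return grid
```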
The neural network YoloV5 is used for two-dimensional object detection in the radar grid images. In radar sensing, the size of a target is related to the number of radar point-cloud points and to the radar cross-section RCS, so the target size can be determined. During network training, each vehicle is classified into one of m classes according to its size. The network outputs B predicted target frames, each of which contains the x and y positions of the target center, a height h, a width w and corresponding category information, as well as the probability of the category to which the target belongs and a target confidence. Further, the average of the velocities of the a cells nearest to the target center is calculated as the measured speed of the target.
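For instance, the measured speed of a detected target frame could be read back from the grid of this step as sketched below; the value a = 5, the channel layout (RCS, vx, vy) and the grid-coordinate frame centre follow the grid-encoding sketch above and are assumptions for illustration.

```python
import numpy as np

def target_speed_from_grid(grid, cx, cy, a=5):
    """Average (vx, vy) over the `a` occupied cells nearest to the box centre (cx, cy).

    grid: (h, w, 3) array from the grid encoding, channels (RCS, vx, vy);
    cx, cy: target-frame centre in grid column/row coordinates.
    """
    rows, cols = np.nonzero(np.any(grid != 0.0, axis=2))   # occupied cells only
    if rows.size == 0:
        return np.zeros(2, dtype=np.float32)
    d2 = (rows - cy) ** 2 + (cols - cx) ** 2
    nearest = np.argsort(d2)[:a]
    return grid[rows[nearest], cols[nearest], 1:3].mean(axis=0)
```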
Step S3: camera image target detection.
And (3) performing target detection on the camera image by using a neural network, extracting information such as the position, the speed, the size and the like of the target from the image, and detecting attributes such as category, color, brand and the like.
And converting the camera target into a radar coordinate system, and ensuring that the data are in the same coordinate system.
In this embodiment, the neural network YoloV5 is used to detect a two-dimensional object in a camera image, so as to detect and locate a target in the image. The neural network outputs predicted C target frames, each of which includes x and y positions of a target center, a height h, a width w, and corresponding category information, as well as probability of a category to which the target belongs and a target confidence. Then, the pixels of the target frame are tracked between adjacent video frames by using a Lucas-Kanade optical flow algorithm, and the motion speed of the target is estimated according to the displacement of the pixels. And finally, converting the image target detection information into a radar coordinate system according to the coordinate transformation matrix calculated in the step S1.
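A possible realisation of the Lucas-Kanade speed estimate is sketched below with OpenCV. The frame interval dt and the local ground-plane scale metres_per_pixel are assumed inputs (in practice such a scale would be derived from the camera-to-radar transform of step S1); the box format (x, y, w, h) is likewise an assumption for this example.

```python
import cv2
import numpy as np

def estimate_box_speed(prev_gray, next_gray, box, dt, metres_per_pixel):
    """Estimate target speed (m/s) from pixel displacement inside its detection box."""
    x, y, w, h = box
    roi = prev_gray[y:y + h, x:x + w]
    pts = cv2.goodFeaturesToTrack(roi, maxCorners=50, qualityLevel=0.01, minDistance=3)
    if pts is None:
        return 0.0
    pts = pts.astype(np.float32) + np.array([[[x, y]]], dtype=np.float32)  # ROI -> image coords
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, next_gray, pts, None)
    good = status.ravel() == 1
    if not good.any():
        return 0.0
    disp = np.linalg.norm((nxt[good] - pts[good]).reshape(-1, 2), axis=1)  # pixels per frame
    return float(np.median(disp)) * metres_per_pixel / dt
```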
Taking the detected vehicle targets as input, the neural network YoloV5 is further used to recognize vehicle attributes in the image. Its output contains a plurality of heads, each head corresponding to the prediction of one target attribute, so that attribute information such as the category, color and brand of the vehicle is finally obtained.
Step S4: calculating the association degree between radar and camera targets.
In a radar coordinate system, the length, the width and the speed of a target in target information detected by a radar and a camera are expressed as the length, the width and the height of a three-dimensional feature cube, the intersection ratio of the three-dimensional feature cube is calculated, and the target association degree is calculated according to the intersection ratio.
In the present embodiment, the lateral direction of the road plane is taken as the x-axis, the longitudinal direction as the y-axis, and the speed corresponding to the target as the z-axis, so that the target information detected by the radar and the camera is represented as a three-dimensional feature cube.
For the target information detected by the radar, the bottom-edge center point coordinates (Px1, Py1) of each target frame are taken as the position of the radar feature cube CU1, the length Δy1 and width Δx1 of the target frame as the length L1 and width W1 of the feature cube, and the speed v1 corresponding to the target frame as the height H1 of the feature cube.
For the target information detected by the camera, the bottom-edge center point coordinates (Px2, Py2) of each target frame in the radar coordinate system are taken as the position of the camera feature cube CU2, the length Δy2 and width Δx2 of the target frame in the radar coordinate system as the length L2 and width W2 of the feature cube, and the speed v2 corresponding to the target frame as the height H2 of the feature cube.
Then, the association degree between targets is calculated according to the intersection ratio (IOU) of the feature cubes, with the formula:

IOU = I / U

where U is the union of the radar and camera feature cubes and I is their intersection.
When the IOU is larger than the threshold th, the radar and video detection targets are determined to be associated targets; by calculating the intersection ratio of the feature cubes, rapid and effective target association is realized.
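Stated as code, the feature-cube construction and the IOU-based association could look like the following sketch. The cube tuple (px, py, L, W, H), the axis-aligned intersection and the threshold value th = 0.3 are assumptions for illustration; the height dimension spans from 0 to the target speed as described above.

```python
def cube_iou(c1, c2):
    """Axis-aligned 3-D IoU of two feature cubes.

    Each cube is (px, py, L, W, H): bottom-edge centre (px, py) in the radar
    frame, length L along y, width W along x, height H equal to the target speed.
    """
    def bounds(c):
        px, py, L, W, H = c
        z0, z1 = sorted((0.0, H))                      # height spans 0 .. speed
        return (px - W / 2, px + W / 2, py - L / 2, py + L / 2, z0, z1)

    ax0, ax1, ay0, ay1, az0, az1 = bounds(c1)
    bx0, bx1, by0, by1, bz0, bz1 = bounds(c2)
    ix = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    iy = max(0.0, min(ay1, by1) - max(ay0, by0))
    iz = max(0.0, min(az1, bz1) - max(az0, bz0))
    inter = ix * iy * iz
    union = ((ax1 - ax0) * (ay1 - ay0) * (az1 - az0)
             + (bx1 - bx0) * (by1 - by0) * (bz1 - bz0) - inter)
    return inter / union if union > 0 else 0.0

def associate(radar_cubes, camera_cubes, th=0.3):
    """Return (radar_idx, camera_idx, iou) for every pair whose IoU exceeds th."""
    pairs = []
    for i, rc in enumerate(radar_cubes):
        for j, cc in enumerate(camera_cubes):
            iou = cube_iou(rc, cc)
            if iou > th:
                pairs.append((i, j, iou))
    return pairs
```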
Step S5: dual-source target information fusion and video conversion.
Dual-source information fusion is carried out on the associated target information, false targets and redundant information are removed, and the fused result is converted into video and displayed on a user interface, so that a user can intuitively understand the detected traffic target information.
In this embodiment, the target pair with high relevance is screened out through the target relevance, and is considered as a true relevance target. And eliminating radar targets corresponding to the low association degree as false targets. In the case where multiple adjacent radar, video targets are interrelated, they may be combined and deduplicated as duplicate targets, leaving only the target with the highest confidence.
For the targets corresponding to an associated pair, the position and speed information detected by the radar is taken as the real position and speed of the target, and the size, color and brand attributes detected by the camera are added to the target attributes. All attribute information and motion states of the targets are converted into video and intuitively displayed on the user interface, so that a user can better understand the state and motion of the targets on the road, providing more comprehensive and accurate information for traffic monitoring and management.
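As a sketch of the fusion and de-duplication step, the function below merges each associated radar/camera pair, keeps only the highest-confidence pair when several adjacent targets are mutually associated, and leaves unassociated radar detections to be discarded as false targets; the dictionary keys and the confidence combination are assumptions made for this illustration.

```python
def fuse_targets(pairs, radar_targets, camera_targets):
    """Merge associated radar/camera detections into fused traffic targets.

    pairs: (radar_idx, camera_idx, iou) tuples from the association step.
    radar_targets[i] is assumed to provide 'position', 'speed', 'confidence';
    camera_targets[j] is assumed to provide 'size', 'category', 'color',
    'brand', 'confidence'.
    """
    # Keep only the best camera match for each radar target.
    best = {}
    for r, c, iou in pairs:
        conf = radar_targets[r]["confidence"] * camera_targets[c]["confidence"]
        if r not in best or conf > best[r][1]:
            best[r] = (c, conf)

    fused, used_cameras = [], set()
    # Resolve duplicates: the highest-confidence pair claims a camera target first.
    for r, (c, conf) in sorted(best.items(), key=lambda kv: -kv[1][1]):
        if c in used_cameras:
            continue
        used_cameras.add(c)
        fused.append({
            "position": radar_targets[r]["position"],  # geometry from the radar
            "speed": radar_targets[r]["speed"],
            "size": camera_targets[c]["size"],         # appearance from the camera
            "category": camera_targets[c]["category"],
            "color": camera_targets[c]["color"],
            "brand": camera_targets[c]["brand"],
            "confidence": conf,
        })
    # Radar detections absent from `best` are treated as false targets upstream.
    return fused
```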
According to the invention, road traffic target detection is carried out with a camera and a millimeter wave radar. The original point cloud data detected by the radar is recoded into a grid image, and target detection is performed on the radar grid image and the camera image with a neural network, so that information such as the position, speed and size of the target can be obtained; for the camera image, attributes such as category, color and brand can also be detected. The length, width and speed of each target are expressed as the three dimensions of a feature cube, and the association relation between targets is obtained by calculating the intersection ratio between the radar and camera feature cubes, realizing rapid and effective target association. Finally, the target attribute information from the radar and the camera is associated together, false targets and redundant information are removed, and the result is displayed on a user interface, so that a user can intuitively understand the detected traffic target information.
It should be noted that, in the embodiments of the present invention, the terms "center", "longitudinal", "lateral", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise", "counterclockwise", "axial", "radial", "circumferential", etc. indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, are merely for convenience in describing the embodiments, and do not indicate or imply that the apparatus or element being referred to must have a specific orientation, be configured and operated in a specific orientation, and thus should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the present invention, unless explicitly specified and limited otherwise, the terms "mounted," "connected," "secured," and the like are to be construed broadly, and may be, for example, fixedly connected, detachably connected, or integrally formed; can be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communicated with the inside of two elements or the interaction relationship of the two elements. The specific meaning of the above terms in the present invention can be understood by those of ordinary skill in the art according to the specific circumstances.
While the present invention has been described in detail through the foregoing description of the preferred embodiment, it should be understood that the foregoing description is not to be considered as limiting the invention. Many modifications and substitutions of the present invention will become apparent to those of ordinary skill in the art upon reading the foregoing. Accordingly, the scope of the invention should be limited only by the attached claims.

Claims (10)

1. The visual fusion detection method of the traffic multi-element radar is characterized by comprising the following steps of:
step S1, acquiring original image data of a traffic road through a camera, recording acquisition time, acquiring point cloud data by a millimeter wave radar, acquiring a point cloud position, a speed, a scattering cross section RCS and the acquisition time to form radar point cloud data, calculating a coordinate transformation matrix of the camera and the radar, unifying pixel point coordinates into a radar coordinate system, and aligning a data time axis;
step S2, recoding original point cloud data detected by a radar into a grid image, and performing target detection on the grid image by using a neural network to obtain the position, the speed and the size of a radar target;
s3, performing target detection on the camera image by using a neural network, extracting the position, speed and size of a target from the image, detecting the category, color and brand, and converting the target detection information of the camera image into a radar coordinate system by adopting the coordinate transformation matrix obtained in the step S1;
s4, in a radar coordinate system, representing the target length, the target width and the target speed in target information detected by a radar and a camera as the length, the width and the height of a three-dimensional feature cube, calculating the intersection ratio of the three-dimensional feature cube, and calculating the target association degree according to the intersection ratio;
and S5, carrying out dual-source information fusion on the associated target information, removing false targets and redundant information, converting the fused result into video, and displaying the video on a user interface.
2. The visual fusion detection method of the traffic multi-element radar according to claim 1, wherein in the step S1, a 4D millimeter wave radar and a 4K resolution camera are selected as sensors to acquire traffic scene information, and the radar and the camera are fixed at adjacent positions on a traffic pole so that their fields of view overlap and cover all traffic areas to be observed.
3. The visual fusion detection method of the traffic multi-element radar according to claim 2, wherein in the step S1, Zhang's calibration algorithm is adopted to calibrate the camera and estimate its internal and external parameters, the external parameters of the radar and the video equipment are calibrated by using a corner reflector, the coordinates of the corner reflector in the radar and video coordinate systems are measured multiple times, a coordinate transformation matrix for converting points in the camera coordinate system into the radar coordinate system is obtained through calculation, and the pixel point coordinates are unified into the radar coordinate system.
4. The traffic multi-element radar vision fusion detection method according to claim 3, wherein in step S1, NTP is used for timing, so as to ensure that the time of the radar and the video device is synchronized with the NTP time.
5. The visual fusion detection method of the traffic multi-element radar according to claim 1, wherein in the step S2, the radar point cloud data is converted into a grid having a fixed height h, width w and cell size s, each cell of the grid contains three channels, which respectively represent the radar cross-section RCS, x-direction speed and y-direction speed of the radar points, and if a plurality of radar points fall into the same cell, the three channel values of the cell take the average of those radar points, and if a cell is not mapped to any radar point, its three channel values are set to zero.
6. The visual fusion detection method of the traffic multi-element radar according to claim 5, wherein in the step S2, a neural network YoloV5 is used to detect two-dimensional objects in the radar grid images, each vehicle is classified into one of m classes according to its size when the neural network YoloV5 is trained, the neural network YoloV5 outputs B predicted target frames, each target frame including the x and y positions of the target center, a height h, a width w and corresponding class information, as well as the probability of the class to which the target belongs and a target confidence, and the average of the speeds of the a cells nearest to the target center is calculated as the measured speed of the target.
7. The visual fusion detection method of traffic multi-element radar according to claim 1, wherein in the step S3, the detected vehicle target is taken as input, a neural network YoloV5 is used to detect a two-dimensional object in a camera image, so as to detect and locate the vehicle target in the image, the neural network outputs predicted C target frames, each of which contains x and y positions of a target center, a height h, a width w and corresponding category information, and also contains probability and target confidence of the category to which the target belongs, and between adjacent video frames, a Lucas-Kanade optical flow algorithm is used to track pixels of the target frame, and the movement speed of the target is estimated according to the displacement of the pixels.
8. The visual fusion detection method of the traffic multi-element radar according to claim 1, wherein in the step S4, the transverse direction of the plane where the road is located is used as an x-axis, the longitudinal direction is used as a y-axis, the speed corresponding to the target is used as a z-axis, and the target information detected by the radar and the camera is represented as a three-dimensional feature cube;
for the target information detected by the radar, the bottom-edge center point coordinates (Px1, Py1) of each target frame are used as the position of a radar feature cube CU1, the length Δy1 and the width Δx1 of the target frame are used as the length L1 and the width W1 of the feature cube, and the speed v1 corresponding to the target frame is used as the height H1 of the feature cube;
for the target information detected by the camera, the bottom-edge center point coordinates (Px2, Py2) of each target frame in the radar coordinate system are used as the position of a camera feature cube CU2, the length Δy2 and the width Δx2 of the target frame in the radar coordinate system are used as the length L2 and the width W2 of the feature cube, and the speed v2 corresponding to the target frame is used as the height H2 of the feature cube;
calculating the association degree between targets according to the intersection ratio IOU of the feature cubes:

IOU = I / U

wherein U is the union of the radar and camera feature cubes, and I is the intersection of the radar and camera feature cubes;
when the IOU is greater than the threshold th, the radar and video detection targets are deemed to be associated targets.
9. The visual fusion detection method of the traffic multi-element radar according to claim 1, wherein in the step S5, target pairs with a high association degree are screened out through the target association degree and considered as true associated targets; radar targets corresponding to a low association degree are removed as false targets; and in the case that multiple adjacent radar and video targets are associated with each other, they are merged and de-duplicated as duplicate targets, and only the target with the highest confidence is retained.
10. The method for detecting the visual fusion of the traffic multi-element radar according to claim 9, wherein in the step S5, for the target corresponding to the associated target, the position and speed information detected by the radar is taken as the real position and speed of the target, the size, color and brand attribute detected by the camera are added into the target attribute, and all the attribute information and motion state of the target are converted into video to be visually displayed on the user interface.
CN202311452810.6A 2023-11-02 2023-11-02 Visual fusion detection method for traffic multi-element radar Pending CN117406212A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311452810.6A CN117406212A (en) 2023-11-02 2023-11-02 Visual fusion detection method for traffic multi-element radar

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311452810.6A CN117406212A (en) 2023-11-02 2023-11-02 Visual fusion detection method for traffic multi-element radar

Publications (1)

Publication Number Publication Date
CN117406212A true CN117406212A (en) 2024-01-16

Family

ID=89488765

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311452810.6A Pending CN117406212A (en) 2023-11-02 2023-11-02 Visual fusion detection method for traffic multi-element radar

Country Status (1)

Country Link
CN (1) CN117406212A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination