CN116778290A - Radar vision data association method based on deep learning algorithm - Google Patents
Radar vision data association method based on deep learning algorithm
- Publication number
- CN116778290A (application CN202310734115.2A)
- Authority
- CN
- China
- Prior art keywords
- target
- visual
- radar
- track
- association
- Prior art date
- Legal status
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S13/00—Systems using the reflection or reradiation of radio waves, e.g. radar systems; Analogous systems using reflection or reradiation of waves whose nature or wavelength is irrelevant or unspecified
- G01S13/86—Combinations of radar systems with non-radar systems, e.g. sonar, direction finder
- G01S13/867—Combination of radar systems with cameras
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/50—Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/62—Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; Pattern tracking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
Abstract
The application provides a radar vision data association method based on a deep learning algorithm, which comprises the following steps: S1, acquiring the current radar target tracks, the current visual image frame and the historical fusion tracks; S2, inputting the current visual image frame into a CenterFusion network to obtain the visual target detection frames of the detected visual targets, and obtaining the position of each visual target in the radar coordinate system through a spatial-synchronization back-projection mechanism; S3, calculating the motion, scale and appearance similarities between radar targets and visual targets, and weighting them with preset coefficients to obtain a first association similarity; S4, performing secondary matching between the historical fusion tracks and the radar and visual targets, and filtering out false-alarm targets; S5, updating the weight coefficients according to the size of the visual target to obtain a second association similarity between the radar and visual targets, establishing the corresponding radar-vision association pairs, and updating the historical fusion tracks; then entering the next moment and repeating steps S1 to S5.
Description
Technical Field
The application relates to the technical field of radar visual information fusion, in particular to a radar visual data association method based on a deep learning algorithm.
Background
As a breakthrough point for improving the sensing capability of detection systems, radar-camera information fusion can give full play to the advantages of each sensor, realize information complementation, make up for the performance limits of any single sensor, and obtain more stable and reliable environmental information; it therefore has broad prospects in military, civil and other fields.
In a radar-video traffic fusion perception system, an important issue is how to determine whether the outputs of different local nodes refer to the same target, i.e. the data association problem. One prior-art approach searches for the radar frame closest in time to the current video frame, takes it as the radar frame matched with the current video frame, and fuses the two. This approach does not make full use of the data characteristics of each sensor, is easily affected by the working state of a single sensor, and has poor anti-interference capability in association. Another approach associates and matches radar echo signals with the target attitudes recognized from video monitoring to achieve matching and fusion of millimeter-wave radar and video targets. This approach is only suitable for scenes in which target motion states differ greatly and change obviously; its matching performance degrades considerably under heavy clutter interference and in complex scenes. In addition, in dense and crowded target scenes, the difficulty of multi-sensor data association increases greatly, which limits the performance of the fusion perception system.
Therefore, how to provide a radar-vision data association method that achieves better fusion perception performance has become one of the key difficulties in this field.
Disclosure of Invention
The application aims to provide a radar vision data association method based on a deep learning algorithm, which extracts multiple types of features of radar and visual targets based on the CenterFusion network architecture, designs a back-projection mechanism to project visual detection information into the radar coordinate system so as to amplify the position changes of visual targets, and performs multi-source data association by combining radar and visual continuous-frame matching information. A large number of false-alarm targets are filtered out in the first-stage association; different feature weight coefficients are then set for targets of different sizes, and a more accurate association result is output in the second-stage association, avoiding false associations in dense scenes. The method has stronger scene adaptability and higher robustness in scenes with false alarms, missed detections, dense targets and the like.
In order to achieve the above object, the present application provides a radar vision data association method based on a deep learning algorithm, comprising the steps of:
s1, acquiring the historical fusion tracks, the current radar target tracks and the current visual image frame; the historical fusion tracks comprise the radar-vision associated target tracks, radar target tracks and visual target tracks at the previous moment;
s2, inputting the current visual image frame into a CenterFusion network, and outputting the visual target detection frames of the detected visual targets; setting a back-projection mechanism, and back-projecting the visual target detection frames into a radar coordinate system to obtain the position of each corresponding visual target in the radar coordinate system;
s3, calculating the motion, scale and appearance similarity of the radar target and the visual target, presetting corresponding weight coefficients for the motion, scale and appearance similarity, and calculating to obtain first association similarity of the radar target and the visual target;
s4, if the first association similarity is larger than a set first association threshold, estimating an updated position of a corresponding radar target based on the history fusion track, matching the updated position with the actual position of the corresponding radar target in the history fusion track, and filtering a false alarm target according to a matching result;
s5, respectively updating the corresponding weight coefficients of the motion, scale and appearance similarities based on the size of the visual target, and updating the first association similarity into a second association similarity; if the second association similarity is larger than a set second association threshold, establishing a corresponding radar-vision association pair based on the corresponding radar target and visual target, and updating the historical fusion track based on the corresponding radar target; then entering the next moment and repeating steps S1 to S5.
Optionally, in the step S1, the point traces of radar targets are generated after signal processing of the echo signals received by the current radar sensor; the current radar target track set R_t = {r_t^i} (i ∈ [1, m]) is then obtained from the point traces through a joint probabilistic data association algorithm and a Kalman filtering algorithm, where r_t^i is the current i-th radar target track, m is the total number of current radar target tracks, and t denotes the frame number of the echo signal;
the set of historical fusion tracks is denoted as F_{t-1} = {f_{t-1}^k} (k ∈ [1, p]), where f_{t-1}^k is the k-th historical fusion track in F_{t-1} and p is the total number of tracks in the historical fusion track set.
Optionally, the step S2 includes:
s21, inputting the current visual image frame into the CenterFusion network to obtain the set of visual targets {v_t^j} (j ∈ [1, n]) marked by visual target detection frames, where v_t^j denotes the j-th detected visual target and n is the total number of detected visual targets;
s22, using the device mounting height H_dev as prior information, the spatial height of the target is taken as −H_dev + H_obj/2, where H_obj is the target's own height; the position conversion matrix between the vision sensor and the radar sensor is obtained through intrinsic calibration of the vision sensor and three-dimensional coordinate conversion; based on the position conversion matrix, the lower edge of the detection frame of visual target v_t^j is taken as the reference point and projected into the radar coordinate system, obtaining the back-projection position of v_t^j in the radar coordinate system, i.e. its lateral and longitudinal position under the radar coordinate system.
Optionally, the step S3 includes:
s31, based on the position of the radar target and the back-projected position of the visual target in the radar coordinate system, obtaining the motion similarity φ_motion(r_t^i, v_t^j) between the radar target and the visual target; the motion similarity is computed from the lateral and longitudinal position of the radar target corresponding to track r_t^i in the radar coordinate system, the back-projected position of visual target v_t^j, and the width and height of the detection frame of v_t^j;
s32, calculating the scale similarity between visual target v_t^j and historical fusion track f_{t-1}^k from the width and height of the detection frame of the visual target associated with f_{t-1}^k and the width and height of the detection frame of v_t^j;
s33, taking the gray-level histogram of the visual target as the appearance feature, calculating with the Bhattacharyya distance the appearance similarity between visual target v_t^j and historical fusion track f_{t-1}^k, based on the gray-histogram features of v_t^j and of the visual target associated with f_{t-1}^k;
s34, setting corresponding weight coefficients β1, β2 and β3 for the motion, scale and appearance similarities and giving them preset values respectively, to obtain the association matrix, in which the first association similarity between visual target v_t^j and the radar target corresponding to track r_t^i is obtained by weighting the three similarities with β1, β2 and β3.
Optionally, step S4 includes:
s41, if the first association similarity is greater than the set first association threshold, performing secondary association matching between the visual target and the historical fusion track; a Kalman filtering algorithm is used to estimate, from the historical fusion track f_{t-1}^k whose latest update time is t', the updated (predicted) lateral and longitudinal positions at time t of the corresponding radar target in the radar coordinate system;
s42, if the predicted position and the back-projected position of the current visual target in the radar coordinate system agree within the matching thresholds, i.e. the lateral difference is within thre_x, the longitudinal difference is within thre_y and the interval t − t' is within thre_t, the radar target and visual target corresponding to track f_{t-1}^k are put into the pool to be associated; otherwise they are taken as false-alarm targets and filtered out; thre_x, thre_y and thre_t denote the matching thresholds for the lateral position, the longitudinal position and the interval time respectively.
Optionally, step S5 includes:
s51, dividing the visual targets in the pool to be associated into large, medium and small targets based on the proportion of the visual image frame occupied by each visual target, and respectively updating the weight coefficients β1, β2 and β3 of the motion, scale and appearance similarities for large, medium and small targets; calculating the second association similarity between the visual targets and the radar targets in the pool to be associated based on the updated β1, β2 and β3;
s52, using the Hungarian algorithm to screen the visual targets and radar targets whose second association similarity is greater than the second association threshold, obtaining one-to-one radar-vision association pairs;
s53, fusing the corresponding radar features and visual features based on the CenterFusion network and the radar-vision association pairs, outputting target information by regression, and storing the regression output into the corresponding historical fusion track as network input information for the next moment; the target information includes any one or more of the position, speed, moving direction and size of the target.
Optionally, in step S34, the preset values of β1, β2 and β3 are each 1/3.
Optionally, in step S51, let σ be the pixel count of the visual target and Size be the pixel count of the visual image frame; σ1 = Size/100 and σ2 = (3 × Size)/100; a target with σ < σ1 is a small target, one with σ1 ≤ σ < σ2 is a medium target, and one with σ ≥ σ2 is a large target; β1, β2 and β3 are then updated accordingly for each size class.
compared with the prior art, the application has the beneficial effects that:
1) The radar vision data association method based on the deep learning algorithm adopts a multi-feature fusion approach: radar targets are projected onto the visual image frame based on the CenterFusion network, and visual targets with their detection frames are back-projected into the radar coordinate system to amplify the position changes of the visual targets. In the first-stage association, the motion, scale and appearance similarities between visual and radar targets are calculated and given corresponding weight coefficients to obtain the first association similarity. Based on the first association similarity and the historical fusion tracks, matching with the visual targets is performed in the spatio-temporal dimension, and false-alarm targets are filtered according to the matching result, which improves the anti-interference capability of radar-vision data association and alleviates the drop in association accuracy caused by large numbers of targets and false detections.
2) According to the different proportions of the visual image frame occupied by visual targets, the application updates the weight coefficients of the motion, scale and appearance similarities according to target size and obtains a more accurate second association similarity in the second-stage association, so as to obtain stable one-to-one radar-vision association pairs, avoid erroneous associations in dense scenes, and resolve the confusion of track assignment that occurs when radar targets are highly similar or occluded.
3) The application combines a deep learning feature extraction algorithm with the historical fusion tracks for multi-source data association, so the data association adapts to more scenes and its accuracy is greatly improved.
Drawings
For a clearer description of the technical solutions of the present application, the drawings that are needed in the description will be briefly introduced below, it being obvious that the drawings in the following description are one embodiment of the present application, and that, without inventive effort, other drawings can be obtained by those skilled in the art from these drawings:
fig. 1 and fig. 2 are flowcharts of a radar vision data association method based on a deep learning algorithm in an embodiment of the present application;
FIG. 3 is a schematic diagram of projecting a radar target onto a visual image frame according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a back projection of a visual target onto a radar coordinate system according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a fusion output radar-vision correlation pair under a radar coordinate system in an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in this specification and the appended claims, the term "if" may be interpreted as "when..once" or "in response to a determination" or "in response to detection" depending on the context. Similarly, the phrase "if a determination" or "if a [ described condition or event ] is detected" may be interpreted in the context of meaning "upon determination" or "in response to determination" or "upon detection of a [ described condition or event ]" or "in response to detection of a [ described condition or event ]".
In addition, in the description of the present application, the terms "first," "second," "third," etc. are used merely to distinguish between descriptions and should not be construed as indicating or implying relative importance.
The application provides a radar vision data association method based on a deep learning algorithm, which is shown in fig. 1 and 2 and comprises the following steps:
s1, acquiring the historical fusion tracks, the current radar target tracks and the current visual image frame;
in the step S1, the point traces of radar targets are generated after signal processing of the echo signals received by the radar sensor; the current radar target track set R_t = {r_t^i} (i ∈ [1, m]) is obtained from the point traces through a joint probabilistic data association algorithm and a Kalman filtering algorithm, where r_t^i is the current i-th radar target track, m is the total number of current radar target tracks, and t denotes the frame number.
The set of historical fusion tracks is denoted as F_{t-1} = {f_{t-1}^k} (k ∈ [1, p]), where f_{t-1}^k is the k-th historical fusion track and p is the total number of tracks in the historical fusion track set. In this embodiment, f_{t-1}^k contains the track, at frame t-1, of a radar target that has an associated visual target.
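For illustration only, the track containers described above might be held in simple data structures like the following sketch; the class and field names are assumptions introduced here, not terms defined by the patent.

```python
from dataclasses import dataclass, field

@dataclass
class RadarTrack:
    """One radar target track r_t^i maintained by JPDA + Kalman filtering (illustrative fields)."""
    track_id: int
    x: float                 # lateral position in the radar coordinate system (m)
    y: float                 # longitudinal position in the radar coordinate system (m)
    vx: float = 0.0          # estimated lateral velocity (m/s)
    vy: float = 0.0          # estimated longitudinal velocity (m/s)
    last_update: int = 0     # frame number t' of the latest update

@dataclass
class FusedTrack:
    """One historical fusion track f_{t-1}^k: a radar track with an associated visual target."""
    radar: RadarTrack
    bbox_wh: tuple = (0.0, 0.0)                    # width/height of the associated detection frame
    gray_hist: list = field(default_factory=list)  # gray-level histogram appearance feature
```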
S2, inputting the current radar target tracks and the current visual image frame into the CenterFusion network, and outputting the visual target detection frames of the detected visual targets; setting a back-projection mechanism, and back-projecting the detection frame of each visual target into the radar coordinate system to obtain the back-projected position of the corresponding visual target in the radar coordinate system.
Step S2 includes:
s21, inputting the current visual image frame into the CenterFusion network to obtain the set of visual targets {v_t^j} (j ∈ [1, n]) marked by visual target detection frames, where v_t^j denotes the j-th detected visual target and n is the total number of detected visual targets. Using the mature multi-sensor spatial synchronization technique, the radar detection points are projected into the pixel space; the projection is shown schematically in fig. 3.
S22, using the device mounting height H_dev as prior information, the spatial height of the target is taken as −H_dev + H_obj/2, where H_obj is the target's own height. Based on the existing spatial synchronization technique, the position conversion matrix between the vision sensor and the radar sensor is obtained through intrinsic calibration of the vision sensor and three-dimensional coordinate conversion; based on the position conversion matrix, the lower edge of the detection frame of visual target v_t^j is taken as the reference point and projected into the radar coordinate system, as shown in fig. 4, obtaining the back-projection position of v_t^j in the radar coordinate system, i.e. its lateral and longitudinal position under the radar coordinate system.
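A minimal sketch of this back projection is given below, under a flat-ground, pinhole-camera assumption; the intrinsic matrix K, the radar-to-camera rotation R and translation t, and the helper names are assumptions for illustration, not quantities fixed by the patent.

```python
import numpy as np

def backproject_to_radar(u, v, K, R, t, plane_z):
    """Back-project pixel (u, v) onto the horizontal plane z = plane_z of the radar coordinate system.

    K is the 3x3 camera intrinsic matrix; R, t map radar coordinates X_r to camera
    coordinates X_c = R @ X_r + t (assumed calibration convention).
    Returns the lateral/longitudinal position (x, y) in the radar coordinate system.
    """
    ray_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])   # viewing ray in camera coordinates
    ray_radar = R.T @ ray_cam                            # same ray expressed in the radar frame
    cam_center = -R.T @ t                                # camera centre in radar coordinates
    s = (plane_z - cam_center[2]) / ray_radar[2]         # intersect the ray with the height plane
    point = cam_center + s * ray_radar
    return float(point[0]), float(point[1])

def bbox_to_radar(bbox, K, R, t, H_dev, H_obj):
    """Use the lower edge (bottom centre) of a detection frame as the reference point."""
    x1, y1, x2, y2 = bbox
    u, v = (x1 + x2) / 2.0, y2
    return backproject_to_radar(u, v, K, R, t, plane_z=-H_dev + H_obj / 2.0)
```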
S3, calculating the motion, scale and appearance similarity of the radar target and the visual target, presetting corresponding weight coefficients for the motion, scale and appearance similarity, and calculating to obtain the first association similarity of the radar target and the visual target.
Step S3 includes:
s31, based on the position of the radar target and the back-projected position of the visual target in the radar coordinate system, obtaining the motion similarity φ_motion(r_t^i, v_t^j) between the radar target and the visual target; the motion similarity is computed from the lateral and longitudinal position of the radar target corresponding to track r_t^i in the radar coordinate system, the back-projected position of visual target v_t^j, and the width and height of the detection frame of v_t^j;
s32, calculating the scale similarity between visual target v_t^j and historical fusion track f_{t-1}^k from the width and height of the detection frame of the visual target associated with f_{t-1}^k and the width and height of the detection frame of v_t^j;
s33, taking the gray-level histogram of the visual target as the appearance feature, calculating with the Bhattacharyya distance the appearance similarity between visual target v_t^j and historical fusion track f_{t-1}^k, based on the gray-histogram features of v_t^j and of the visual target associated with f_{t-1}^k;
s34, setting corresponding weight coefficients β1, β2 and β3 for the motion, scale and appearance similarities and giving them preset values respectively, to obtain the association matrix, in which the first association similarity between visual target v_t^j and the radar target corresponding to track r_t^i is obtained by weighting the three similarities with β1, β2 and β3.
In this embodiment, the preset values of β1, β2 and β3 are each 1/3.
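Since the patent's exact similarity expressions are given as formula images that are not reproduced in this text, the sketch below uses assumed but typical forms: a Gaussian-style motion term normalized by the detection-frame size, a ratio-based scale term, a gray-histogram appearance term compared with the Bhattacharyya distance (here via OpenCV's compareHist), and a weighted sum with the preset coefficients β1 = β2 = β3 = 1/3.

```python
import numpy as np
import cv2

def motion_similarity(radar_xy, vis_xy, box_wh):
    """Closeness of the radar position and the back-projected visual position (assumed Gaussian form)."""
    dx = (radar_xy[0] - vis_xy[0]) / max(box_wh[0], 1e-6)
    dy = (radar_xy[1] - vis_xy[1]) / max(box_wh[1], 1e-6)
    return float(np.exp(-(dx ** 2 + dy ** 2)))

def scale_similarity(box_wh, track_box_wh):
    """Agreement between the current detection frame and the track's associated detection frame."""
    w = min(box_wh[0], track_box_wh[0]) / max(box_wh[0], track_box_wh[0])
    h = min(box_wh[1], track_box_wh[1]) / max(box_wh[1], track_box_wh[1])
    return w * h

def appearance_similarity(patch_bgr, track_hist):
    """Gray-histogram appearance term: 1 minus the Bhattacharyya distance between histograms."""
    gray = cv2.cvtColor(patch_bgr, cv2.COLOR_BGR2GRAY)
    hist = cv2.calcHist([gray], [0], None, [64], [0, 256])
    hist = cv2.normalize(hist, hist).flatten()
    d = cv2.compareHist(hist.astype(np.float32), track_hist.astype(np.float32),
                        cv2.HISTCMP_BHATTACHARYYA)
    return 1.0 - float(d)

def first_association_similarity(phi_m, phi_s, phi_a, betas=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted combination of the three similarities with the preset coefficients."""
    return betas[0] * phi_m + betas[1] * phi_s + betas[2] * phi_a
```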
And S4, if the first association similarity is larger than a set first association threshold, estimating an updated position of the corresponding radar target based on the history fusion track, matching the updated position with the actual position of the corresponding radar target in the history fusion track, and filtering the false alarm target according to a matching result.
Step S4 includes:
s41, if the first association similarity is greater than the set first association threshold, secondary association matching between the visual target and the historical fusion track is performed. A Kalman filtering algorithm is used to estimate, from the historical fusion track f_{t-1}^k whose latest update time is t', the updated (predicted) lateral and longitudinal positions at time t of the corresponding radar target in the radar coordinate system;
s42, if the predicted position and the back-projected position of the current visual target in the radar coordinate system agree within the matching thresholds, i.e. the lateral difference is within thre_x, the longitudinal difference is within thre_y and the interval t − t' is within thre_t, the radar target and visual target corresponding to track f_{t-1}^k are put into the pool to be associated; otherwise they are taken as false-alarm targets and filtered out; thre_x, thre_y and thre_t denote the matching thresholds for the lateral position, the longitudinal position and the interval time respectively.
The first-stage association of radar and visual targets is completed through steps S3 and S4, eliminating the interference introduced by false-alarm detections. In step S3, three types of features, namely motion, scale and appearance similarity, are extracted using the CenterFusion network, and three weight coefficients are applied for radar-vision data association. In step S4, continuous-frame track information is acquired to estimate the target position in the current frame, the current detections are matched in the spatio-temporal dimension, and false-alarm targets are filtered according to the association result.
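The spatio-temporal gate of step S42 can be sketched as follows for a track object carrying x, y, vx, vy and last_update attributes (as in the earlier sketch); the constant-velocity prediction is an illustrative stand-in for the Kalman filter's predicted state, while the threshold names thre_x, thre_y, thre_t follow the text.

```python
def predict_track_position(track, t_now):
    """Constant-velocity prediction of the track's radar-frame position at frame t_now
    (stand-in for the Kalman-predicted state)."""
    dt = t_now - track.last_update
    return track.x + track.vx * dt, track.y + track.vy * dt

def passes_gate(track, vis_xy, t_now, thre_x, thre_y, thre_t):
    """Spatio-temporal gate: keep the radar/visual pair for the pool to be associated,
    otherwise it is treated as a false alarm."""
    px, py = predict_track_position(track, t_now)
    return (abs(px - vis_xy[0]) < thre_x
            and abs(py - vis_xy[1]) < thre_y
            and (t_now - track.last_update) < thre_t)
```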
S5, respectively updating the corresponding weight coefficients of the motion, scale and appearance similarities based on the size of the visual target, and updating the first association similarity into a second association similarity; if the second association similarity is larger than the set second association threshold, establishing a corresponding radar-vision association pair based on the corresponding radar target and visual target, and updating the historical fusion track based on the corresponding radar target; then entering the next moment and repeating steps S1 to S5.
Step S5 includes:
s51, dividing the visual targets in the pool to be associated into large, medium and small targets based on the proportion of the visual image frame occupied by each visual target, and respectively updating the weight coefficients β1, β2 and β3 of the motion, scale and appearance similarities for large, medium and small targets; calculating the second association similarity between the visual targets and the radar targets in the pool to be associated based on the updated β1, β2 and β3;
In this embodiment, let σ be the pixel count of the visual target and Size be the pixel count of the visual image frame; σ1 = Size/100 and σ2 = (3 × Size)/100; a target with σ < σ1 is a small target, one with σ1 ≤ σ < σ2 is a medium target, and one with σ ≥ σ2 is a large target; β1, β2 and β3 are then updated accordingly for each size class.
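A sketch of the size-dependent re-weighting follows; the σ thresholds match the text, while the per-class weight values are placeholders, because the patent's actual updated values of β1, β2, β3 appear in a table/figure that is not reproduced here.

```python
def size_class(sigma, frame_size):
    """Classify a visual target by its pixel count sigma relative to the frame
    (sigma1 = Size/100, sigma2 = 3*Size/100)."""
    sigma1, sigma2 = frame_size / 100.0, 3.0 * frame_size / 100.0
    if sigma < sigma1:
        return "small"
    if sigma < sigma2:
        return "medium"
    return "large"

# Placeholder weight sets (beta1, beta2, beta3) per size class; the patented values are
# not reproduced in this text, so these numbers are illustrative assumptions only.
SIZE_WEIGHTS = {
    "small": (0.5, 0.3, 0.2),      # e.g. lean on motion for small, low-texture targets
    "medium": (1 / 3, 1 / 3, 1 / 3),
    "large": (0.2, 0.3, 0.5),      # e.g. lean on appearance for large targets
}
```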
s52, using the Hungarian algorithm to screen the visual targets and radar targets whose second association similarity is greater than the second association threshold, obtaining one-to-one radar-vision association pairs;
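One common way to realize the one-to-one screening of step S52 is the Hungarian assignment from SciPy, as sketched below; the function name and the use of scipy.optimize.linear_sum_assignment are implementation choices, not requirements of the patent.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def second_stage_match(similarity, threshold):
    """One-to-one radar-vision matching on a (num_radar, num_visual) matrix of second
    association similarities; pairs not exceeding the threshold are discarded."""
    sim = np.asarray(similarity, dtype=float)
    rows, cols = linear_sum_assignment(-sim)          # negate to maximize total similarity
    return [(int(r), int(c)) for r, c in zip(rows, cols) if sim[r, c] > threshold]
```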
s53, based on the CenterFusion network and the radar-vision association pairs, the corresponding radar features and visual features are fused, and information such as the position, speed, moving direction and size of the corresponding target is output by regression; the regression output is used to generate the historical fusion tracks for the next moment.
In order to avoid the influence of abrupt feature changes on the overall similarity, the targets are divided by size according to the result of the first-stage association and different weight coefficients are set for targets of different sizes, which avoids erroneous associations in dense scenes and yields a more accurate association result.
Ten minutes of traffic data were collected on a road section and processed for data association, yielding the evaluation indices shown in Table 1; the fused output according to the association result is shown in fig. 5. Compared with traditional data association, i.e. the Hungarian algorithm and the association of CenterFusion detection points, the method of the present application adds the effect of continuous-frame tracks and the two-stage association over multiple classes of features; the association result is more stable and accurate in occluded and dense scenes, confusion of fusion tracks is largely avoided, and the fusion perception system is more stable and reliable.
Table 1 Analysis of radar-vision association results
Through the above embodiment, accurate association of radar and video target data is achieved, the stability of the radar-video fusion system is enhanced, and the data association is more adaptable; the method also handles occlusion, track jumps and other conditions in dense scenes well, and thus offers multiple advantages.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an execution order; the execution order of the processes should be determined by their functions and internal logic, and shall not limit the implementation of the embodiments of the present application.
While the application has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and equivalent substitutions may be made without departing from the scope of the application. Therefore, the protection scope of the application is subject to the protection scope of the claims.
Claims (8)
1. The radar vision data association method based on the deep learning algorithm is characterized by comprising the following steps:
s1, acquiring a history fusion track, a current radar target track and a visual image frame;
s2, inputting the current visual image frame into a CenterFusion network, and outputting the visual target detection frames of the detected visual targets; setting a back-projection mechanism, and back-projecting the visual target detection frames into a radar coordinate system to obtain the position of each corresponding visual target in the radar coordinate system;
s3, calculating the motion, scale and appearance similarity of the radar target and the visual target, presetting corresponding weight coefficients for the motion, scale and appearance similarity, and calculating to obtain first association similarity of the radar target and the visual target;
s4, if the first association similarity is larger than a set first association threshold, estimating an updated position of a corresponding radar target based on the history fusion track, matching the updated position with the actual position of the corresponding radar target in the history fusion track, and filtering a false alarm target according to a matching result;
s5, respectively updating the corresponding weight coefficients of the motion, scale and appearance similarities based on the size of the visual target, and updating the first association similarity into a second association similarity; if the second association similarity is larger than a set second association threshold, establishing a corresponding radar-vision association pair based on the corresponding radar target and visual target, and updating the historical fusion track based on the corresponding radar target; then entering the next moment and repeating steps S1 to S5.
2. The method for correlating radar visual data based on the deep learning algorithm according to claim 1, wherein in the step S1, the point traces of radar targets are generated after signal processing of the echo signals received by the current radar sensor; the current radar target track set R_t = {r_t^i} (i ∈ [1, m]) is obtained from the point traces through a joint probabilistic data association algorithm and a Kalman filtering algorithm, where r_t^i is the current i-th radar target track, m is the total number of current radar target tracks, and t denotes the frame number of the echo signal;
the set of historical fusion tracks is denoted as F_{t-1} = {f_{t-1}^k} (k ∈ [1, p]), where f_{t-1}^k is the k-th historical fusion track in F_{t-1} and p is the total number of tracks in the historical fusion track set.
3. The method for correlating radar visual data based on the deep learning algorithm according to claim 1, wherein the step S2 comprises:
s21, inputting the current visual image frame into the CenterFusion network to obtain the set of visual targets {v_t^j} (j ∈ [1, n]) marked by visual target detection frames, where v_t^j denotes the j-th detected visual target and n is the total number of detected visual targets;
s22, using the device mounting height H_dev as prior information, the spatial height of the target is taken as −H_dev + H_obj/2, where H_obj is the target's own height; the position conversion matrix between the vision sensor and the radar sensor is obtained through intrinsic calibration of the vision sensor and three-dimensional coordinate conversion; based on the position conversion matrix, the lower edge of the detection frame of visual target v_t^j is taken as the reference point and projected into the radar coordinate system, obtaining the back-projection position of v_t^j in the radar coordinate system, i.e. its lateral and longitudinal position under the radar coordinate system.
4. The method for correlating radar visual data based on a deep learning algorithm according to claim 3, wherein the step S3 comprises:
s31, based on the position of the radar target and the back-projected position of the visual target in the radar coordinate system, obtaining the motion similarity φ_motion(r_t^i, v_t^j) between the radar target and the visual target; the motion similarity is computed from the lateral and longitudinal position of the radar target corresponding to track r_t^i in the radar coordinate system, the back-projected position of visual target v_t^j, and the width and height of the detection frame of v_t^j;
s32, calculating the scale similarity between visual target v_t^j and historical fusion track f_{t-1}^k from the width and height of the detection frame of the visual target associated with f_{t-1}^k and the width and height of the detection frame of v_t^j;
s33, taking the gray-level histogram of the visual target as the appearance feature, calculating with the Bhattacharyya distance the appearance similarity between visual target v_t^j and historical fusion track f_{t-1}^k, based on the gray-histogram features of v_t^j and of the visual target associated with f_{t-1}^k;
s34, setting corresponding weight coefficients β1, β2 and β3 for the motion, scale and appearance similarities and giving them preset values respectively, to obtain the association matrix, in which the first association similarity between visual target v_t^j and the radar target corresponding to track r_t^i is obtained by weighting the three similarities with β1, β2 and β3.
5. the method of correlating radar visual data based on a deep learning algorithm according to claim 4, wherein the step S4 comprises:
s41, if the first association similarity is greater than the set first association threshold, performing secondary association matching between the radar and visual detection targets and the historical fusion track; a Kalman filtering algorithm is used to estimate, from the historical fusion track f_{t-1}^k whose latest update time is t', the updated (predicted) lateral and longitudinal positions at time t of the corresponding radar target in the radar coordinate system;
s42, if the predicted position and the back-projected position of the current visual detection target in the radar coordinate system agree within the matching thresholds for the lateral position, the longitudinal position and the interval time, the radar target and visual target corresponding to track f_{t-1}^k are put into the pool to be associated; otherwise they are taken as false-alarm targets and filtered out; the thresholds are the matching thresholds in terms of lateral position, longitudinal position and interval time between the current-frame radar prediction and the visual detection target under the radar coordinate system.
6. The method of correlating radar visual data based on a deep learning algorithm according to claim 5, wherein the step S5 comprises:
s51, dividing the visual targets in the pool to be associated into large, medium and small targets based on the proportion of the visual image frame occupied by each visual target, and respectively updating the weight coefficients β1, β2 and β3 of the motion, scale and appearance similarities for large, medium and small targets; calculating the second association similarity between the visual targets and the radar targets in the pool to be associated based on the updated β1, β2 and β3;
s52, using the Hungarian algorithm to screen the visual targets and radar targets whose second association similarity is greater than the second association threshold, obtaining one-to-one radar-vision association pairs;
s53, fusing the corresponding radar features and visual features based on the CenterFusion network and the radar-vision association pairs, outputting target information by regression, and storing the regression output into the corresponding historical fusion track as network input information for the next moment; the target information includes any one or more of the position, speed, moving direction and size of the target.
7. The method of correlating radar visual data based on a deep learning algorithm according to claim 4, wherein the preset values of β1, β2 and β3 in step S34 are each 1/3.
8. The method of correlating radar visual data based on a deep learning algorithm according to claim 4, wherein in step S51, σ is the pixel count of the visual target and Size is the pixel count of the visual image frame; σ1 = Size/100 and σ2 = (3 × Size)/100; a target with σ < σ1 is a small target, one with σ1 ≤ σ < σ2 is a medium target, and one with σ ≥ σ2 is a large target; β1, β2 and β3 are then updated accordingly for each size class.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310734115.2A CN116778290A (en) | 2023-06-20 | 2023-06-20 | Radar vision data association method based on deep learning algorithm |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310734115.2A CN116778290A (en) | 2023-06-20 | 2023-06-20 | Radar vision data association method based on deep learning algorithm |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116778290A true CN116778290A (en) | 2023-09-19 |
Family
ID=88009404
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310734115.2A Pending CN116778290A (en) | 2023-06-20 | 2023-06-20 | Radar vision data association method based on deep learning algorithm |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116778290A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117093872A (en) * | 2023-10-19 | 2023-11-21 | 四川数字交通科技股份有限公司 | Self-training method and system for radar target classification model |
CN117093872B (en) * | 2023-10-19 | 2024-01-02 | 四川数字交通科技股份有限公司 | Self-training method and system for radar target classification model |
CN117890903A (en) * | 2024-03-15 | 2024-04-16 | 哈尔滨工业大学(威海) | Unmanned ship track correction method based on radar matching |
CN117890903B (en) * | 2024-03-15 | 2024-06-07 | 哈尔滨工业大学(威海) | Unmanned ship track correction method based on radar matching |
CN118279677A (en) * | 2024-06-03 | 2024-07-02 | 浙江大华技术股份有限公司 | Target identification method and related device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||