CN116778290A - Radar vision data association method based on deep learning algorithm - Google Patents

Radar vision data association method based on deep learning algorithm

Info

Publication number
CN116778290A
CN116778290A (application CN202310734115.2A)
Authority
CN
China
Prior art keywords
target
visual
radar
track
association
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310734115.2A
Other languages
Chinese (zh)
Inventor
李小柳
尹洁珺
魏维伟
付朝伟
席光荣
柯文雄
郑成鑫
李由之
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Radio Equipment Research Institute
Original Assignee
Shanghai Radio Equipment Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Radio Equipment Research Institute filed Critical Shanghai Radio Equipment Research Institute
Priority to CN202310734115.2A priority Critical patent/CN116778290A/en
Publication of CN116778290A publication Critical patent/CN116778290A/en
Pending legal-status Critical Current

Classifications

    • G01S 13/867: Combination of radar systems with cameras (under G01S 13/86: Combinations of radar systems with non-radar systems, e.g. sonar, direction finder)
    • G06N 3/04: Neural networks; Architecture, e.g. interconnection topology
    • G06N 3/08: Neural networks; Learning methods
    • G06V 10/50: Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
    • G06V 10/62: Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; Pattern tracking
    • G06V 10/764: Image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V 10/80: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806: Fusion of extracted features
    • G06V 10/82: Image or video recognition or understanding using pattern recognition or machine learning, using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Medical Informatics (AREA)
  • Remote Sensing (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Radar Systems Or Details Thereof (AREA)

Abstract

The application provides a radar vision data association method based on a deep learning algorithm, which comprises the following steps: S1, acquiring a history fusion track, the current radar target track and the current visual image frame; S2, inputting the current visual image frame into a CenterFusion network to obtain the visual target detection frames from visual detection and recognition, and obtaining the positions of the visual targets in the radar coordinate system through a spatial-synchronization back-projection mechanism; S3, calculating the motion, scale and appearance similarities of the radar targets and the visual targets, and presetting corresponding weight coefficients for these similarities to obtain a first association similarity; S4, performing secondary matching between the history fusion track and the radar and visual targets, and filtering out false alarm targets; S5, updating the weight coefficients according to the sizes of the visual targets to obtain a second association similarity between the radar targets and the visual targets, establishing the corresponding radar-vision association pairs, and updating the history fusion track; then entering the next moment and repeating steps S1 to S5.

Description

Radar vision data association method based on deep learning algorithm
Technical Field
The application relates to the technical field of radar visual information fusion, in particular to a radar visual data association method based on a deep learning algorithm.
Background
As a breakthrough point for improving the sensing capability of detection systems, radar-camera information fusion can give full play to the advantages of each sensor, realize information complementation, compensate for the performance limits of a single sensor, and acquire more stable and reliable environmental perception information, and therefore has broad prospects in military, civil and other fields.
In radar-video traffic fusion perception systems, an important issue is how to determine whether the outputs of different local nodes refer to the same target, i.e. the data association problem. One prior-art approach searches for the radar frame closest in time to the current video frame, takes it as the radar frame matched with the current video frame, and fuses the two. This method does not make full use of the data characteristics of each sensor, is easily affected by the working state of a single sensor, and has poor anti-interference capability in association. Another method associates and matches radar echo signals with target pose recognition from video surveillance to achieve matching and fusion of millimeter-wave radar and video targets. This method is only suitable for scenes in which the target motion states differ greatly and change obviously; its matching performance degrades sharply under heavy clutter interference and in complex scenes. In addition, in dense and crowded target scenes the difficulty of associating data from multiple sensors increases greatly, which limits the performance of fusion perception systems.
Therefore, how to provide a radar visual data association method to achieve better fusion perception performance becomes one of the key difficulties in the field.
Disclosure of Invention
The application aims to provide a radar visual data association method based on a deep learning algorithm, which extracts multiple types of features of radar and visual targets based on the CenterFusion network architecture, designs a back-projection mechanism to project visual detection information into the radar coordinate system so as to amplify the position changes of visual targets, and performs multi-source data association by combining radar and visual continuous-frame matching information. A large number of false alarm targets are filtered out in the first-stage association; then different feature weight coefficients are set for targets of different sizes, and a more accurate association result is output in the second-stage association, so that false associations in dense scenes are avoided. The method has stronger scene adaptability and higher robustness in scenes with false alarms, missed detections, dense targets and the like.
In order to achieve the above object, the present application provides a radar vision data association method based on a deep learning algorithm, comprising the steps of:
S1, acquiring a history fusion track, the current radar target track and the current visual image frame; the history fusion track comprises the radar-vision associated target tracks, radar target tracks and visual target tracks at the previous moment;
S2, inputting the current visual image frame into a CenterFusion network and outputting the visual target detection frames identified by visual detection; setting a back-projection mechanism and back-projecting each visual target detection frame into the radar coordinate system to obtain the position of the corresponding visual target in the radar coordinate system;
S3, calculating the motion, scale and appearance similarities of the radar targets and the visual targets, presetting corresponding weight coefficients for these similarities, and calculating a first association similarity between the radar targets and the visual targets;
S4, if the first association similarity is greater than a set first association threshold, estimating the updated position of the corresponding radar target based on the history fusion track, matching it with the actual position of the corresponding radar target in the history fusion track, and filtering out false alarm targets according to the matching result;
S5, updating the corresponding weight coefficients of the motion, scale and appearance similarities according to the size of the visual target, and updating the first association similarity to a second association similarity; if the second association similarity is greater than a set second association threshold, establishing the corresponding radar-vision association pair from the corresponding radar target and visual target, and updating the history fusion track based on the corresponding radar target; then entering the next moment and repeating steps S1 to S5.
Optionally, in the step S1, the point traces of radar targets are generated after signal processing of the echo signals received by the current radar sensor; the current radar target track set R_t = {r_t^i}, i ∈ [1, m], is then obtained from the point traces through a joint probabilistic data association algorithm and a Kalman filtering algorithm, where r_t^i is the current i-th radar target track, m is the total number of current radar target tracks, and t is the frame number of the echo signal;
the set of history fusion tracks is denoted F_{t-1}, whose k-th element is the k-th history fusion track, k ∈ [1, p], and p is the total number of tracks in the history fusion track set.
Optionally, the step S2 includes:
S21, inputting the current visual image frame into the CenterFusion network to obtain the visual target set marked by visual target detection frames, {v_t^j}, j ∈ [1, n], where v_t^j is the j-th identified visual target and n is the total number of identified visual targets;
S22, using the device mounting height H_dev as prior information, taking the target spatial height as -H_dev + H_obj/2, where H_obj is the height of the target itself; obtaining the position transformation matrix between the vision sensor and the radar sensor through intrinsic calibration of the vision sensor and three-dimensional coordinate transformation; based on the position transformation matrix, projecting the lower edge of the visual target detection frame of visual target v_t^j, taken as the reference point, into the radar coordinate system to obtain the back-projection position of visual target v_t^j in the radar coordinate system, i.e. the lateral and longitudinal positions of the back projection of the visual target in the radar coordinate system.
Optionally, the step S3 includes:
S31, obtaining the motion similarity phi_motion(r_t^i, v_t^j) of the radar target and the visual target from the position of the radar target and the back-projection position of the visual target in the radar coordinate system, where the position terms in the formula denote the lateral and longitudinal positions in the radar coordinate system of the radar target corresponding to track r_t^i, and the remaining terms denote the width and height of the visual target detection frame of visual target v_t^j;
S32, calculating the scale similarity between visual target v_t^j and the history fusion track, where the terms in the formula denote, respectively, the width and length of the visual target detection frame of the visual target associated with the history fusion track and the width and length of the visual target detection frame of visual target v_t^j;
S33, taking the gray-level histogram of the visual target as the appearance feature and using the Bhattacharyya distance to calculate the appearance similarity between visual target v_t^j and the history fusion track, where the terms in the formula denote the gray-level histogram features of the visual target and of the track;
S34, setting the corresponding weight coefficients beta_1, beta_2, beta_3 for the motion, scale and appearance similarities and assigning each a preset value, to obtain the association matrix, in which the first association similarity of visual target v_t^j and the radar target corresponding to track r_t^i is expressed as a weighted combination of the three similarities.
optionally, step S4 includes:
S41, if the first association similarity is greater than the set first association threshold, performing a second-stage association match between the visual target and the history fusion track; a Kalman filtering algorithm is used to predict, from the history fusion track, the updated position of the corresponding radar target, i.e. the lateral and longitudinal positions in the radar coordinate system predicted at the current time t from the track, whose last update time is t';
S42, if the following relation is satisfied, putting the radar target and visual target corresponding to the track into the pool to be associated, and otherwise treating them as false alarm targets and filtering them out: the differences between the predicted lateral and longitudinal positions and the lateral and longitudinal positions of the current visual target back-projected into the radar coordinate system, and the time interval since the track's last update, are respectively smaller than the matching thresholds thre_x, thre_y and thre_t for lateral position, longitudinal position and interval time.
Optionally, step S5 includes:
S51, dividing the visual targets in the pool to be associated into large, middle and small targets according to the proportion of each visual target in the visual image frame, and updating the weight coefficients beta_1, beta_2, beta_3 of the motion, scale and appearance similarities for large, middle and small targets respectively; based on the updated beta_1, beta_2, beta_3, calculating the second association similarity between the visual targets and the radar targets in the pool to be associated;
S52, screening the visual targets and radar targets whose second association similarity is greater than the second association threshold using the Hungarian algorithm, to obtain one-to-one radar-vision association pairs;
S53, fusing the corresponding radar features and visual features based on the CenterFusion network and the radar-vision association pairs, regressing the target information as output, and storing the regression output in the corresponding history fusion track as network input information for the next moment; the target information includes any one or more of the position, speed, moving direction and size of the target.
Optionally, in step S34, beta_1, beta_2 and beta_3 are each preset to 1/3.
Optionally, in step S51, let sigma be the pixel count of the visual target and Size be the pixel count of the visual image frame, with sigma_1 = Size/100 and sigma_2 = (3 x Size)/100; when sigma < sigma_1 the target is a small target, when sigma_1 <= sigma < sigma_2 it is a middle target, and when sigma >= sigma_2 it is a large target; the updated beta_1, beta_2, beta_3 are then set accordingly for each size class.
compared with the prior art, the application has the beneficial effects that:
1) The radar visual data association method based on the deep learning algorithm adopts a multi-feature fusion approach: based on the CenterFusion network, radar targets are projected onto the visual image frame, and the visual targets with their visual target detection frames are back-projected into the radar coordinate system to amplify the position changes of the visual targets. In the first-stage association, the motion, scale and appearance similarities of the visual targets and radar targets are calculated and given corresponding weight coefficients to obtain the first association similarity. Based on the first association similarity and the history fusion track, the visual targets are matched in the space-time dimension and false alarm targets are filtered out according to the matching result, which improves the anti-interference capability of radar-vision data association and solves the problem of reduced association accuracy caused by large numbers of targets and false detections.
2) According to the different proportions of the visual targets in the visual image frame, the application updates the weight coefficients of the motion, scale and appearance similarities for visual targets of different sizes, and obtains a more accurate second association similarity in the second-stage association, so as to obtain stable one-to-one radar-vision association pairs and avoid false associations in dense scenes. It also solves the problem of confused track assignment when radar targets are highly similar or occluded.
3) The application combines a deep-learning feature extraction algorithm with the history fusion track for multi-source data association, so that the data association adapts to more scenes and its accuracy is greatly improved.
Drawings
For a clearer description of the technical solutions of the present application, the drawings that are needed in the description will be briefly introduced below, it being obvious that the drawings in the following description are one embodiment of the present application, and that, without inventive effort, other drawings can be obtained by those skilled in the art from these drawings:
fig. 1 and fig. 2 are flowcharts of a radar vision data association method based on a deep learning algorithm in an embodiment of the present application;
FIG. 3 is a schematic diagram of projecting a radar target onto a visual image frame according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a back projection of a visual target onto a radar coordinate system according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a fusion output radar-vision correlation pair under a radar coordinate system in an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in this specification and the appended claims, the term "if" may be interpreted as "when..once" or "in response to a determination" or "in response to detection" depending on the context. Similarly, the phrase "if a determination" or "if a [ described condition or event ] is detected" may be interpreted in the context of meaning "upon determination" or "in response to determination" or "upon detection of a [ described condition or event ]" or "in response to detection of a [ described condition or event ]".
In addition, in the description of the present application, the terms "first," "second," "third," etc. are used merely to distinguish between descriptions and should not be construed as indicating or implying relative importance.
The application provides a radar vision data association method based on a deep learning algorithm, which is shown in fig. 1 and 2 and comprises the following steps:
S1, acquiring the history fusion track, the current radar target track and the current visual image frame;
In the step S1, the echo signals received by the radar sensor are processed to generate the point traces of radar targets; the current radar target track set R_t = {r_t^i}, i ∈ [1, m], is then obtained from the point traces through a joint probabilistic data association algorithm and a Kalman filtering algorithm, where r_t^i is the current i-th radar target track, m is the total number of current radar target tracks, and t is the frame number.
The set of history fusion tracks is denoted F_{t-1}; its k-th element is the k-th history fusion track, k ∈ [1, p], and p is the total number of tracks in the history fusion track set. In this embodiment, each history fusion track contains the trajectory, up to frame t-1, of a radar target that has an associated visual target.
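For concreteness, the sketches interspersed in the following description assume simple container types for radar tracks, visual detections and history fusion tracks; the field names below are illustrative assumptions and are not taken from the filing.

```python
from dataclasses import dataclass, field
from typing import List, Optional
import numpy as np

@dataclass
class RadarTrack:
    """One radar track r_t^i after JPDA and Kalman filtering (illustrative fields)."""
    track_id: int
    x: float            # lateral position in the radar coordinate system (m)
    y: float            # longitudinal position in the radar coordinate system (m)
    vx: float = 0.0     # lateral velocity (m/s)
    vy: float = 0.0     # longitudinal velocity (m/s)

@dataclass
class VisualTarget:
    """One visual detection v_t^j from the detection head (illustrative fields)."""
    box: np.ndarray                           # [u, v, w, h] detection frame in pixels
    gray_hist: np.ndarray                     # appearance feature: normalized gray-level histogram
    back_proj: Optional[np.ndarray] = None    # (x, y) back-projected into radar coordinates

@dataclass
class FusedTrack:
    """One history fusion track (illustrative fields)."""
    track_id: int
    last_update_frame: int
    radar_states: List[np.ndarray] = field(default_factory=list)  # [x, y, vx, vy] per frame
    last_box: Optional[np.ndarray] = None          # detection frame of the last associated visual target
    last_gray_hist: Optional[np.ndarray] = None    # appearance feature of the last associated visual target
```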
S2, inputting the current radar target track and the visual image frame into a CenterFusion network and outputting the visual target detection frames identified by visual detection; setting a back-projection mechanism and back-projecting the visual target detection frame of each visual target into the radar coordinate system to obtain the position of the corresponding visual target's back projection in the radar coordinate system.
Step S2 includes:
S21, inputting the current visual image frame into the CenterFusion network to obtain the visual target set marked by visual target detection frames, {v_t^j}, j ∈ [1, n], where v_t^j is the j-th identified visual target and n is the total number of identified visual targets. Using mature multi-sensor spatial synchronization techniques, the radar detection points are projected into pixel space; the projection is shown schematically in fig. 3.
S22, using the device mounting height H_dev as prior information, the target spatial height is taken as -H_dev + H_obj/2, where H_obj is the height of the target itself. Based on existing spatial synchronization techniques, the position transformation matrix between the vision sensor and the radar sensor is obtained through intrinsic calibration of the vision sensor and three-dimensional coordinate transformation; based on the position transformation matrix, the lower edge of the visual target detection frame of visual target v_t^j is taken as the reference point and projected into the radar coordinate system, as shown in fig. 4, to obtain the back-projection position of visual target v_t^j in the radar coordinate system, i.e. the lateral and longitudinal positions of the back projection of the visual target in the radar coordinate system.
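A minimal sketch of the back projection in step S22, assuming a pinhole camera model with known intrinsics K and camera-to-radar extrinsics; the height prior derived from H_dev and H_obj fixes the scale of the viewing ray, which is otherwise lost when going from the 2D detection frame back to 3D. All function and parameter names, and the axis conventions, are assumptions rather than the filed formulation.

```python
import numpy as np

def back_project_to_radar(box, K, R_cam2radar, t_cam2radar, H_dev, H_obj):
    """Back-project the lower-edge midpoint of a visual detection frame into the
    radar coordinate system, using the mounting height as a scale prior.

    box          : [u, v, w, h] detection frame (pixel centre and size)
    K            : 3x3 camera intrinsic matrix
    R_cam2radar  : 3x3 rotation from the camera frame to the radar frame
    t_cam2radar  : translation (3-vector) from the camera frame to the radar frame
    H_dev        : mounting height of the device above the ground (m)
    H_obj        : assumed height of the target itself (m)
    """
    u, v, w, h = box
    pixel = np.array([u, v + h / 2.0, 1.0])      # lower-edge midpoint, homogeneous

    ray = np.linalg.inv(K) @ pixel               # viewing ray in the camera frame
    # Height prior: the reference point is assumed to sit H_dev - H_obj/2 below the
    # optical centre, with the camera y-axis pointing downward; this mirrors the
    # -H_dev + H_obj/2 spatial-height prior described above.
    rel_height = H_dev - H_obj / 2.0
    scale = rel_height / ray[1]                  # depth along the ray reaching that height
    p_cam = scale * ray                          # 3D point in the camera frame

    p_radar = R_cam2radar @ p_cam + t_cam2radar  # 3D point in the radar frame
    return p_radar[0], p_radar[2]                # assumed lateral (x) and longitudinal (z) axes
```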
S3, calculating the motion, scale and appearance similarity of the radar target and the visual target, presetting corresponding weight coefficients for the motion, scale and appearance similarity, and calculating to obtain the first association similarity of the radar target and the visual target.
Step S3 includes:
S31, obtaining the motion similarity phi_motion(r_t^i, v_t^j) of the radar target and the visual target from the position of the radar target and the back-projection position of the visual target in the radar coordinate system, where the position terms in the formula denote the lateral and longitudinal positions in the radar coordinate system of the radar target corresponding to track r_t^i, and the remaining terms denote the width and height of the visual target detection frame of visual target v_t^j;
S32, calculating the scale similarity between visual target v_t^j and the history fusion track, where the terms in the formula denote, respectively, the width and length of the visual target detection frame of the visual target associated with the history fusion track and the width and length of the visual target detection frame of visual target v_t^j;
S33, taking the gray-level histogram of the visual target as the appearance feature and using the Bhattacharyya distance to calculate the appearance similarity between visual target v_t^j and the history fusion track, where the terms in the formula denote the gray-level histogram features of the visual target and of the track;
S34, setting the corresponding weight coefficients beta_1, beta_2, beta_3 for the motion, scale and appearance similarities and assigning each a preset value, to obtain the association matrix, in which the first association similarity of visual target v_t^j and the radar target corresponding to track r_t^i is expressed as a weighted combination of the three similarities.
in this embodiment, beta 1 、β 2 、β 3 The preset values of (2) are 1/3.
And S4, if the first association similarity is larger than a set first association threshold, estimating an updated position of the corresponding radar target based on the history fusion track, matching the updated position with the actual position of the corresponding radar target in the history fusion track, and filtering the false alarm target according to a matching result.
Step S4 includes:
S41, if the first association similarity is greater than the set first association threshold, performing a second-stage association match between the visual target and the history fusion track. A Kalman filtering algorithm is used to predict, from the history fusion track, the updated position of the corresponding radar target, i.e. the lateral and longitudinal positions in the radar coordinate system predicted at the current time t from the track, whose last update time is t';
S42, if the following relation is satisfied, the radar target and visual target corresponding to the track are put into the pool to be associated, and otherwise they are treated as false alarm targets and filtered out: the differences between the predicted lateral and longitudinal positions and the lateral and longitudinal positions of the current visual target back-projected into the radar coordinate system, and the time interval since the track's last update, are respectively smaller than the matching thresholds thre_x, thre_y and thre_t for lateral position, longitudinal position and interval time.
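A sketch of the space-time gate of steps S41 and S42 under the data structures assumed earlier, using a constant-velocity prediction of the fused track's radar state to the current frame; thre_x, thre_y, thre_t and the state layout are assumptions.

```python
def predict_track_position(track, t, dt):
    """Constant-velocity prediction of a history fusion track's radar position from
    its last update frame t' to the current frame t (the Kalman predict step)."""
    x, y, vx, vy = track.radar_states[-1]
    gap = (t - track.last_update_frame) * dt     # elapsed time in seconds
    return x + vx * gap, y + vy * gap

def passes_gate(pred_xy, vis_xy, t, t_last, thre_x, thre_y, thre_t):
    """Step S42 gate: keep the radar/visual pair for the pool to be associated if the
    predicted and back-projected positions agree in both axes and the track was
    updated recently; otherwise it is treated as a false alarm and filtered out."""
    return (abs(pred_xy[0] - vis_xy[0]) < thre_x
            and abs(pred_xy[1] - vis_xy[1]) < thre_y
            and (t - t_last) < thre_t)
```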
The first-stage association of radar targets and visual targets is completed through steps S3 and S4, eliminating the interference introduced by false alarm detections. In step S3, three types of features, namely motion, scale and appearance similarity, are extracted with the CenterFusion network and combined with three weight coefficients for radar-vision data association. In step S4, continuous-frame track information is used to predict the target position in the current frame, the current detections are matched in the space-time dimension, and false alarm targets are filtered out according to the association result.
S5, respectively updating corresponding weight coefficients for the motion, the scale and the appearance similarity based on the size of the visual target, and updating the first association similarity into the second association similarity; if the second association similarity is larger than a set second association threshold, establishing a corresponding radar-vision association pair based on the corresponding radar target and the vision target, and updating a history fusion track based on the corresponding radar target; and (5) entering the next moment, and repeating the steps S1 to S5.
Step S5 includes:
S51, dividing the visual targets in the pool to be associated into large, middle and small targets according to the proportion of each visual target in the visual image frame, and updating the weight coefficients beta_1, beta_2, beta_3 of the motion, scale and appearance similarities for large, middle and small targets respectively; based on the updated beta_1, beta_2, beta_3, calculating the second association similarity between the visual targets and the radar targets in the pool to be associated;
In this embodiment, let sigma be the pixel count of the visual target and Size be the pixel count of the visual image frame, with sigma_1 = Size/100 and sigma_2 = (3 x Size)/100; when sigma < sigma_1 the target is a small target, when sigma_1 <= sigma < sigma_2 it is a middle target, and when sigma >= sigma_2 it is a large target; the updated beta_1, beta_2, beta_3 are then set accordingly for each size class.
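A sketch of the size split of step S51; the updated beta values for large, middle and small targets are given in a table image in the original filing, so the per-class weights below are placeholders, not the filed values.

```python
def size_class(sigma, frame_size):
    """Classify a visual target by its pixel count sigma relative to the frame size."""
    sigma1 = frame_size / 100.0
    sigma2 = 3.0 * frame_size / 100.0
    if sigma < sigma1:
        return "small"
    if sigma < sigma2:
        return "middle"
    return "large"

# Placeholder per-class weights (motion, scale, appearance); the actual updated
# beta_1, beta_2, beta_3 values are those specified in the filing's table.
UPDATED_BETAS = {
    "small":  (0.5, 0.2, 0.3),
    "middle": (0.4, 0.3, 0.3),
    "large":  (0.3, 0.3, 0.4),
}
```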
S52, screening the visual targets and radar targets whose second association similarity is greater than the second association threshold using the Hungarian algorithm, to obtain one-to-one radar-vision association pairs;
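The one-to-one assignment of step S52 can be computed with a standard Hungarian solver on the negated similarity matrix (the solver minimizes cost), keeping only pairs whose second association similarity exceeds the second association threshold; a minimal sketch:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_pairs(similarity, threshold):
    """similarity: (num_radar x num_visual) matrix of second association similarities.
    Returns one-to-one (radar_index, visual_index) radar-vision association pairs."""
    sim = np.asarray(similarity, dtype=float)
    rows, cols = linear_sum_assignment(-sim)          # maximize total similarity
    return [(r, c) for r, c in zip(rows, cols) if sim[r, c] > threshold]
```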
S53, fusing the corresponding radar features and visual features of each radar-vision association pair based on the CenterFusion network, regressing information such as the position, speed, moving direction and size of the corresponding target as output, and using the regression output to generate the history fusion track for the next moment.
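A minimal bookkeeping sketch for step S53, appending the newly associated radar state and visual features to the matched history fusion tracks so they can serve as network input at the next frame; the mapping from radar track id to fused track is an assumption.

```python
import numpy as np

def update_history(fused_tracks, pairs, radar_tracks, visual_targets, t):
    """Append the newly associated radar state and visual features to each matched
    history fusion track so it can be fed back to the network at frame t + 1.
    fused_tracks is assumed to map a radar track_id to its FusedTrack."""
    for r_idx, v_idx in pairs:
        r, v = radar_tracks[r_idx], visual_targets[v_idx]
        trk = fused_tracks[r.track_id]
        trk.radar_states.append(np.array([r.x, r.y, r.vx, r.vy]))
        trk.last_box = v.box
        trk.last_gray_hist = v.gray_hist
        trk.last_update_frame = t
```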
To avoid abrupt changes in a single feature from dominating the overall similarity, the targets are divided by size according to the result of the first-stage association and different weight coefficients are set for different targets, which avoids false associations in dense scenes and outputs a more accurate association result.
Traffic data for a road section were collected for 10 minutes and processed with the data association method to obtain the evaluation indicators shown in Table 1; the fused output according to the association results is shown in fig. 5. Compared with conventional data association, namely the Hungarian algorithm and CenterFusion detection-point association, the method of the application adds continuous-frame tracks and two-stage association over multiple feature classes, so the association results are more stable and accurate in occluded and dense scenes, confusion of fused tracks is largely avoided, and the fusion perception system is more stable and reliable.
Table 1 Analysis of radar-vision association results
Through the above embodiment, accurate association of radar and video target data is realized and the stability of the radar-video fusion system is enhanced; the data association adapts to more scenes and handles occlusion, track jumps and similar conditions in dense scenes well, giving the method multiple advantages.
It should be understood that the sequence numbers of the steps in the foregoing embodiment do not imply an execution order; the execution order of each process should be determined by its function and internal logic and should not limit the implementation of the embodiments of the present application.
While the application has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and substitutions of equivalents may be made and equivalents will be apparent to those skilled in the art without departing from the scope of the application. Therefore, the protection scope of the application is subject to the protection scope of the claims.

Claims (8)

1. The radar vision data association method based on the deep learning algorithm is characterized by comprising the following steps:
S1, acquiring a history fusion track, the current radar target track and the current visual image frame;
S2, inputting the current visual image frame into a CenterFusion network and outputting the visual target detection frames obtained by detection and identification; setting a back-projection mechanism and back-projecting each visual target detection frame into the radar coordinate system to obtain the position of the corresponding visual target in the radar coordinate system;
S3, calculating the motion, scale and appearance similarities of the radar targets and the visual targets, presetting corresponding weight coefficients for these similarities, and calculating a first association similarity between the radar targets and the visual targets;
S4, if the first association similarity is greater than a set first association threshold, estimating the updated position of the corresponding radar target based on the history fusion track, matching it with the actual position of the corresponding radar target in the history fusion track, and filtering out false alarm targets according to the matching result;
S5, updating the corresponding weight coefficients of the motion, scale and appearance similarities according to the size of the visual target, and updating the first association similarity to a second association similarity; if the second association similarity is greater than a set second association threshold, establishing the corresponding radar-vision association pair from the corresponding radar target and visual target, and updating the history fusion track based on the corresponding radar target; then entering the next moment and repeating steps S1 to S5.
2. The method for correlating radar visual data based on the deep learning algorithm according to claim 1, wherein in the step S1, the point traces of radar targets are generated after signal processing of the echo signals received by the current radar sensor; the current radar target track set R_t = {r_t^i}, i ∈ [1, m], is obtained from the point traces through a joint probabilistic data association algorithm and a Kalman filtering algorithm, where r_t^i is the current i-th radar target track, m is the total number of current radar target tracks, and t is the frame number of the echo signal;
the set of history fusion tracks is denoted F_{t-1}, whose k-th element is the k-th history fusion track, k ∈ [1, p], and p is the total number of tracks in the history fusion track set.
3. The method for correlating radar visual data based on the deep learning algorithm according to claim 1, wherein the step S2 comprises:
S21, inputting the current visual image frame into the CenterFusion network to obtain the visual target set marked by visual target detection frames, {v_t^j}, j ∈ [1, n], where v_t^j is the j-th identified visual target and n is the total number of identified visual targets;
S22, using the device mounting height H_dev as prior information, taking the target spatial height as -H_dev + H_obj/2, where H_obj is the height of the target itself; obtaining the position transformation matrix between the vision sensor and the radar sensor through intrinsic calibration of the vision sensor and three-dimensional coordinate transformation; based on the position transformation matrix, projecting the lower edge of the visual target detection frame of visual target v_t^j, taken as the reference point, into the radar coordinate system to obtain the back-projection position of visual target v_t^j in the radar coordinate system, i.e. the lateral and longitudinal positions of the back projection of the visual target in the radar coordinate system.
4. The method for correlating radar visual data based on a deep learning algorithm according to claim 3, wherein the step S3 comprises:
S31, obtaining the motion similarity phi_motion(r_t^i, v_t^j) of the radar target and the visual target from the position of the radar target and the back-projection position of the visual target in the radar coordinate system, where the position terms in the formula denote the lateral and longitudinal positions in the radar coordinate system of the radar target corresponding to track r_t^i, and the remaining terms denote the width and height of the visual target detection frame of visual target v_t^j;
S32, calculating the scale similarity between visual target v_t^j and the history fusion track, where the terms in the formula denote, respectively, the width and length of the visual target detection frame of the visual target associated with the history fusion track and the width and length of the visual target detection frame of visual target v_t^j;
S33, taking the gray-level histogram of the visual target as the appearance feature and using the Bhattacharyya distance to calculate the appearance similarity between visual target v_t^j and the history fusion track, where the terms in the formula denote the gray-level histogram features of the visual target and of the track;
S34, setting the corresponding weight coefficients beta_1, beta_2, beta_3 for the motion, scale and appearance similarities and assigning each a preset value, to obtain the association matrix, in which the first association similarity of visual target v_t^j and the radar target corresponding to track r_t^i is expressed as a weighted combination of the three similarities.
5. The method of correlating radar visual data based on a deep learning algorithm according to claim 4, wherein the step S4 comprises:
S41, if the first association similarity is greater than the set first association threshold, performing a second-stage association match between the radar and visual detection targets and the history fusion track; a Kalman filtering algorithm is used to predict, from the history fusion track, the updated position of the corresponding radar target, i.e. the lateral and longitudinal positions in the radar coordinate system predicted at the current time t from the track, whose last update time is t';
S42, if the following relation is satisfied, putting the radar target and visual target corresponding to the track into the pool to be associated, and otherwise treating them as false alarm targets and filtering them out: the differences between the predicted lateral and longitudinal positions of the current frame's radar target and the lateral and longitudinal positions of the visual detection target back-projected into the radar coordinate system, and the time interval since the track's last update, are respectively smaller than the matching thresholds for lateral position, longitudinal position and interval time.
6. The method of correlating radar visual data based on a deep learning algorithm according to claim 5, wherein the step S5 comprises:
S51, dividing the visual targets in the pool to be associated into large, middle and small targets according to the proportion of each visual target in the visual image frame, and updating the weight coefficients beta_1, beta_2, beta_3 of the motion, scale and appearance similarities for large, middle and small targets respectively; based on the updated beta_1, beta_2, beta_3, calculating the second association similarity between the visual targets and the radar targets in the pool to be associated;
S52, screening the visual targets and radar targets whose second association similarity is greater than the second association threshold using the Hungarian algorithm, to obtain one-to-one radar-vision association pairs;
S53, fusing the corresponding radar features and visual features based on the CenterFusion network and the radar-vision association pairs, regressing the target information as output, and storing the regression output in the corresponding history fusion track as network input information for the next moment; the target information includes any one or more of the position, speed, moving direction and size of the target.
7. The method of correlating radar visual data based on a deep learning algorithm according to claim 4, wherein in step S34, beta_1, beta_2 and beta_3 are each preset to 1/3.
8. The method of correlating radar visual data based on a deep learning algorithm according to claim 4, wherein in step S51, sigma is the pixel count of the visual target and Size is the pixel count of the visual image frame, with sigma_1 = Size/100 and sigma_2 = (3 x Size)/100; when sigma < sigma_1 the target is a small target, when sigma_1 <= sigma < sigma_2 it is a middle target, and when sigma >= sigma_2 it is a large target; the updated beta_1, beta_2, beta_3 are then set accordingly for each size class.
CN202310734115.2A 2023-06-20 2023-06-20 Radar vision data association method based on deep learning algorithm Pending CN116778290A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310734115.2A CN116778290A (en) 2023-06-20 2023-06-20 Radar vision data association method based on deep learning algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310734115.2A CN116778290A (en) 2023-06-20 2023-06-20 Radar vision data association method based on deep learning algorithm

Publications (1)

Publication Number Publication Date
CN116778290A true CN116778290A (en) 2023-09-19

Family

ID=88009404

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310734115.2A Pending CN116778290A (en) 2023-06-20 2023-06-20 Radar vision data association method based on deep learning algorithm

Country Status (1)

Country Link
CN (1) CN116778290A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117093872A (en) * 2023-10-19 2023-11-21 四川数字交通科技股份有限公司 Self-training method and system for radar target classification model
CN117093872B (en) * 2023-10-19 2024-01-02 四川数字交通科技股份有限公司 Self-training method and system for radar target classification model
CN117890903A (en) * 2024-03-15 2024-04-16 哈尔滨工业大学(威海) Unmanned ship track correction method based on radar matching
CN117890903B (en) * 2024-03-15 2024-06-07 哈尔滨工业大学(威海) Unmanned ship track correction method based on radar matching
CN118279677A (en) * 2024-06-03 2024-07-02 浙江大华技术股份有限公司 Target identification method and related device

Similar Documents

Publication Publication Date Title
CN116778290A (en) Radar vision data association method based on deep learning algorithm
CN110660082B (en) Target tracking method based on graph convolution and trajectory convolution network learning
CN104282020B (en) A kind of vehicle speed detection method based on target trajectory
CN102542289B (en) Pedestrian volume statistical method based on plurality of Gaussian counting models
CN110689562A (en) Trajectory loop detection optimization method based on generation of countermeasure network
CN109559324B (en) Target contour detection method in linear array image
CN107818571A (en) Ship automatic tracking method and system based on deep learning network and average drifting
CN103177456A (en) Method for detecting moving target of video image
CN102521612B (en) Multiple video object active tracking method based cooperative correlation particle filtering
CN108804992B (en) Crowd counting method based on deep learning
CN110555868A (en) method for detecting small moving target under complex ground background
CN111709968B (en) Low-altitude target detection tracking method based on image processing
CN113030973B (en) Scene monitoring radar signal processing system and method
CN106599918B (en) vehicle tracking method and system
CN109239702B (en) Airport low-altitude flying bird number statistical method based on target state set
CN109636834A (en) Video frequency vehicle target tracking algorism based on TLD innovatory algorithm
WO2019006632A1 (en) Video multi-target tracking method and device
CN110929670A (en) Muck truck cleanliness video identification and analysis method based on yolo3 technology
CN110826575A (en) Underwater target identification method based on machine learning
CN114092404A (en) Infrared target detection method and computer readable storage medium
CN105740819A (en) Integer programming based crowd density estimation method
CN111242972B (en) On-line cross-scale multi-fluid target matching tracking method
CN117008077A (en) Target detection method and marine electronic fence system
CN116862832A (en) Three-dimensional live-action model-based operator positioning method
Zhang et al. Vehicle detection and tracking in remote sensing satellite vidio based on dynamic association

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination