CN116403180A - 4D millimeter wave radar target detection, tracking and speed measurement method based on deep learning - Google Patents

4D millimeter wave radar target detection, tracking and speed measurement method based on deep learning

Info

Publication number
CN116403180A
Authority
CN
China
Prior art keywords
tracking
millimeter wave
features
target detection
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310647626.0A
Other languages
Chinese (zh)
Other versions
CN116403180B (en)
Inventor
娄慧丽
陆新飞
薛旦
史颂华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Geometry Partner Intelligent Driving Co ltd
Original Assignee
Shanghai Geometry Partner Intelligent Driving Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Geometry Partner Intelligent Driving Co ltd filed Critical Shanghai Geometry Partner Intelligent Driving Co ltd
Priority to CN202310647626.0A priority Critical patent/CN116403180B/en
Publication of CN116403180A publication Critical patent/CN116403180A/en
Application granted granted Critical
Publication of CN116403180B publication Critical patent/CN116403180B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S13/00Systems using the reflection or reradiation of radio waves, e.g. radar systems; Analogous systems using reflection or reradiation of waves whose nature or wavelength is irrelevant or unspecified
    • G01S13/02Systems using reflection of radio waves, e.g. primary radar systems; Analogous systems
    • G01S13/50Systems of measurement based on relative movement of target
    • G01S13/58Velocity or trajectory determination systems; Sense-of-movement determination systems
    • G01S13/585Velocity or trajectory determination systems; Sense-of-movement determination systems processing the video signal in order to evaluate or display the velocity value
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S7/00Details of systems according to groups G01S13/00, G01S15/00, G01S17/00
    • G01S7/02Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S13/00
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • G06N3/0455Auto-encoder networks; Encoder-decoder networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/047Probabilistic or stochastic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/766Arrangements for image or video recognition or understanding using pattern recognition or machine learning using regression, e.g. by projecting features on hyperplanes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Probability & Statistics with Applications (AREA)
  • Radar Systems Or Details Thereof (AREA)

Abstract

The invention relates to a method for realizing target detection, tracking and speed measurement of a 4D millimeter wave radar based on a deep learning network model, wherein the method comprises the following steps: performing point cloud clipping on the currently acquired front and rear frames of millimeter wave radar point clouds; extracting features from the selected input features by using a feature pyramid structure; extracting target key points from the extracted features by using the detection head of the anchor-free CenterNet; performing cross-attention fusion matching on the acquired target key points by using a Transformer attention mechanism; and acquiring the classification, regression, tracking confidence and tracking position information of target detection through the multi-task detection head. The invention also relates to a corresponding device, a processor and a storage medium thereof. The method, device, processor and storage medium for realizing target detection, tracking and speed measurement of a 4D millimeter wave radar based on a deep learning network model have obvious technical advantages in usage scenarios and complexity.

Description

4D millimeter wave radar target detection, tracking and speed measurement method based on deep learning
Technical Field
The invention relates to the technical field of automatic driving, in particular to the technical field of radar-based perception, and more particularly to a method, a device, a processor and a computer readable storage medium for realizing target detection, tracking and speed measurement of a 4D millimeter wave radar based on a deep learning network model.
Background
The current mainstream autonomous driving solutions use vision and lidar as the main sensors. Limited by the sensors themselves, the feasibility of these technologies faces great challenges under occlusion and in severe weather such as rain and fog. Because of the sparsity of its observation point clouds, the traditional 3D millimeter wave radar only plays an auxiliary role in target detection and tracking, serving as an additional attribute in multi-source sensor fusion.
Deep-learning-based automatic driving solutions adapt to a large number of scenes in a data-driven manner and have strong generalization capability. In the field of automatic driving, deep learning target detection and tracking algorithms are mostly based on cameras and lidars. Three-dimensional single-target detection and tracking based on lidar is a challenging problem in robotics and automatic driving. Existing lidar detection and tracking methods often suffer from sparse or partially occluded long-distance objects, which makes the features extracted by the model ambiguous. The blurred features make the target object difficult to locate, ultimately resulting in poor tracking results.
4D millimeter wave imaging radar greatly improves angular resolution and pitch angle measurement accuracy, achieves centimeter-level height positioning, offers high resolution, and provides abundant information such as amplitude, phase, energy distribution, intensity and velocity, opening a new path for target detection and tracking. At present, most solutions based on 4D millimeter wave radar perform target detection by clustering and multi-target tracking by filtering and matching, and require a large number of manually added rules for complex scenes. Current deep learning target detection schemes based on 4D millimeter wave radar resemble those for lidar: three-dimensional object detection networks designed for lidar, such as PointPillars and voxel-based detectors, are used as base models, with power measurements over the Doppler, range, azimuth and elevation dimensions or the provided three-dimensional spatial features as input, to achieve accurate three-dimensional perception.
The improved point cloud resolution and accurate velocity information of 4D millimeter wave imaging radar provide new solutions for target detection and tracking. This patent uses data-driven deep learning and exploits the strong correlation of millimeter wave radar velocity between front and rear frames, performing target detection, speed measurement and multi-target tracking of 4D millimeter wave radar point clouds on front and rear frame millimeter wave radar data by means of single-stage deep learning model prediction.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a method, a device, a processor and a computer readable storage medium thereof for realizing target detection, tracking and speed measurement of a 4D millimeter wave radar based on a deep learning network model.
In order to achieve the above object, the method, the device, the processor and the computer readable storage medium thereof for realizing the target detection, tracking and speed measurement of the 4D millimeter wave radar based on the deep learning network model of the invention are as follows:
the method for realizing target detection, tracking and speed measurement of the 4D millimeter wave radar based on the deep learning network model is mainly characterized by comprising the following steps of:
(1) Performing point cloud clipping on the currently acquired front and rear frames of millimeter wave radar point clouds, and taking the result as the input features for target detection, tracking and speed measurement;
(2) Extracting features from the selected input features by using a feature pyramid structure;
(3) Based on the currently extracted features, extracting target key points by using the detection head of the anchor-free CenterNet, and obtaining the actual position information of the target points;
(4) Performing cross-attention fusion matching on the obtained target key points by using a Transformer attention mechanism, so as to obtain the degree of correlation between the target's front and rear frames;
(5) Obtaining the classification, regression, tracking confidence and tracking position information of target detection through the multi-task detection head, thereby realizing detection, tracking and speed measurement of the target points.
Preferably, the step (1) specifically includes:
selecting front and rear frames of millimeter wave radar point clouds with a time interval of 100 ms and performing point cloud clipping; voxelizing the clipped point clouds into voxel grids with a height of 1; selecting, for each voxel grid, the center offset of the point cloud, the point cloud density within the voxel, the radial velocity of the point cloud and the compensated point cloud velocity as input features; and projecting the input features into a 2D BEV grid for subsequent processing.
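A minimal NumPy sketch of this clipping and voxelization step might look as follows; the detection range, voxel size, input field order and the use of per-voxel means are illustrative assumptions rather than values fixed by the patent:

```python
import numpy as np

def radar_points_to_bev(points, x_range=(0.0, 80.0), y_range=(-40.0, 40.0), voxel=(0.5, 0.5)):
    """Crop a radar point cloud and project it into a 2D BEV feature grid.

    points: (N, 5) array of [x, y, z, radial_velocity, compensated_velocity].
    Returns a (5, H, W) BEV pseudo-image whose channels are the per-cell mean
    center offset (dx, dy), point density, and the two velocities.
    """
    # 1) point cloud clipping to the region of interest
    mask = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
            (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]))
    pts = points[mask]

    W = int((x_range[1] - x_range[0]) / voxel[0])
    H = int((y_range[1] - y_range[0]) / voxel[1])
    bev = np.zeros((5, H, W), dtype=np.float32)    # dx, dy, density, v_r, v_comp
    count = np.zeros((H, W), dtype=np.float32)

    gx = ((pts[:, 0] - x_range[0]) / voxel[0]).astype(int)
    gy = ((pts[:, 1] - y_range[0]) / voxel[1]).astype(int)
    cx = x_range[0] + (gx + 0.5) * voxel[0]         # voxel centre coordinates
    cy = y_range[0] + (gy + 0.5) * voxel[1]

    for i in range(pts.shape[0]):                   # accumulate per-voxel sums
        bev[0, gy[i], gx[i]] += pts[i, 0] - cx[i]   # offset from voxel centre (x)
        bev[1, gy[i], gx[i]] += pts[i, 1] - cy[i]   # offset from voxel centre (y)
        bev[3, gy[i], gx[i]] += pts[i, 3]           # radial velocity
        bev[4, gy[i], gx[i]] += pts[i, 4]           # compensated velocity
        count[gy[i], gx[i]] += 1.0

    nonzero = count > 0
    bev[:, nonzero] /= count[nonzero]               # means of offsets / velocities
    bev[2] = count / max(count.max(), 1.0)          # normalized point density
    return bev
```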
Preferably, the step (2) specifically includes:
the feature pyramid structure includes a downsampling layer and an upsampling layer, wherein
the downsampling layer performs feature extraction through a convolution layer and a ResNet-18, and the features extracted by the downsampling layer are downsampled 16 times relative to its input; the features downsampled 2, 4, 8 and 16 times are taken, in sequence, as the input features of the upsampling layer;
the upsampling layer first upsamples the feature layer downsampled 16 times by the downsampling layer to obtain a first upsampling result; the first upsampling result and the 8-times-downsampled features of the downsampling layer undergo channel concatenation, convolution and upsampling to obtain a second upsampling result; the second upsampling result and the 4-times-downsampled features of the downsampling layer undergo channel concatenation, convolution and upsampling to obtain a third upsampling result;
the second upsampling result is directly upsampled to obtain the first-layer output of the feature pyramid structure; the third upsampling result is taken as the second-layer output of the feature pyramid structure; the third upsampling result and the 2-times-downsampled features of the downsampling layer undergo channel concatenation and convolution to obtain the third-layer output of the feature pyramid structure;
and the first-layer output, the second-layer output and the third-layer output are channel-concatenated, and the concatenation result is taken as the final feature extraction result of the feature pyramid structure.
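A compact PyTorch sketch of a pyramid with this shape is given below; it substitutes plain convolution blocks for the ResNet-18 trunk and uses illustrative channel widths, so only the 2×/4×/8×/16× levels and the three concatenated outputs follow the description:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_block(cin, cout, stride=1):
    return nn.Sequential(
        nn.Conv2d(cin, cout, 3, stride=stride, padding=1, bias=False),
        nn.BatchNorm2d(cout),
        nn.ReLU(inplace=True))

class RadarFPN(nn.Module):
    """Down-sampling trunk (2x/4x/8x/16x) followed by an up-sampling path whose
    three outputs are channel-concatenated, mirroring the pyramid described above."""
    def __init__(self, cin=5, c=32):
        super().__init__()
        self.d2  = conv_block(cin,   c,     stride=2)   # 1/2  resolution
        self.d4  = conv_block(c,     2 * c, stride=2)   # 1/4
        self.d8  = conv_block(2 * c, 4 * c, stride=2)   # 1/8
        self.d16 = conv_block(4 * c, 8 * c, stride=2)   # 1/16
        self.u1 = conv_block(8 * c + 4 * c, 4 * c)      # fuse up(1/16) with 1/8
        self.u2 = conv_block(4 * c + 2 * c, 2 * c)      # fuse with 1/4
        self.u3 = conv_block(2 * c + c,     c)          # fuse with 1/2

    def forward(self, x):
        f2  = self.d2(x)
        f4  = self.d4(f2)
        f8  = self.d8(f4)
        f16 = self.d16(f8)

        def up(t):  # 2x bilinear up-sampling
            return F.interpolate(t, scale_factor=2, mode="bilinear", align_corners=False)

        r1 = up(f16)                                    # first up-sampling result (1/8)
        r2 = up(self.u1(torch.cat([r1, f8], dim=1)))    # second up-sampling result (1/4)
        r3 = up(self.u2(torch.cat([r2, f4], dim=1)))    # third up-sampling result (1/2)

        out1 = up(r2)                                   # first pyramid output  (1/2)
        out2 = r3                                       # second pyramid output (1/2)
        out3 = self.u3(torch.cat([r3, f2], dim=1))      # third pyramid output  (1/2)
        return torch.cat([out1, out2, out3], dim=1)     # fused BEV feature map
```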
Preferably, the step (3) specifically includes the following steps:
(3.1) applying two 1×1 convolutions to the extracted features to obtain the key-point confidence of the target frame;
(3.2) extracting the 512 key points with the highest confidence as the tracked target key points;
(3.3) extracting the feature map feature position_feature corresponding to the BEV grid position of each tracked target key point, obtaining the bird's-eye-view-to-key-point projection feature;
(3.4) obtaining the actual position information of the target point (position_x, position_y) from the grid location (grid_x, grid_y), the actual metric size of each grid cell (voxel_x, voxel_y) and the minimum point cloud range (x_min, y_min); the specific calculation formula is as follows:

position_x = grid_x × voxel_x + x_min

position_y = grid_y × voxel_y + y_min
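The key-point extraction and position decoding of step (3) could be sketched as follows in PyTorch; the sigmoid activation, channel widths, voxel size and point cloud range are assumptions, while the top-512 selection and the grid-to-position formula follow the description above:

```python
import torch
import torch.nn as nn

class KeypointHead(nn.Module):
    """Anchor-free (CenterNet-style) key-point head: two 1x1 convolutions produce a
    key-point confidence heat map; the top-K cells give the tracked key points,
    whose BEV grid indices are decoded with position = grid * voxel_size + pc_min."""
    def __init__(self, cin, k=512):
        super().__init__()
        self.k = k
        self.heatmap = nn.Sequential(
            nn.Conv2d(cin, cin, 1), nn.ReLU(inplace=True), nn.Conv2d(cin, 1, 1))

    def forward(self, feat, voxel_size=(0.5, 0.5), pc_min=(0.0, -40.0)):
        # feat: (B, C, H, W) BEV feature map from the feature pyramid
        B, C, H, W = feat.shape
        score = torch.sigmoid(self.heatmap(feat)).view(B, H * W)
        conf, idx = score.topk(self.k, dim=1)                  # top-512 key points

        grid_y = torch.div(idx, W, rounding_mode="floor").float()
        grid_x = (idx % W).float()
        pos_x = grid_x * voxel_size[0] + pc_min[0]             # decode to metres
        pos_y = grid_y * voxel_size[1] + pc_min[1]
        position = torch.stack([pos_x, pos_y], dim=-1)         # (B, K, 2)

        # gather the feature vector of each key-point cell ("position_feature")
        flat = feat.view(B, C, H * W)
        position_feature = flat.gather(2, idx.unsqueeze(1).expand(B, C, self.k))
        return conf, position, position_feature.permute(0, 2, 1)   # (B, K, C)
```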
Preferably, the step (4) specifically includes the following steps:
(4.1) inputting the actual position information of the target point (position_x, position_y) and the feature map feature position_feature into the Transformer attention mechanism;
(4.2) extracting the point sequence features of the front and rear frames by using a self-attention mechanism, in which K (Key) = Q (Query) = V (Value), calculated as follows:
first, the dot product between Q and K is computed and, to prevent the result from becoming too large, divided by sqrt(d_k), where d_k is the dimension of K and K^T is the transpose of the matrix K; the result is then normalized into a probability distribution with the softmax normalized exponential function and multiplied by the matrix V to obtain the final weighted sum:

Attention(Q, K, V) = softmax(Q·K^T / sqrt(d_k)) · V
(4.3) using the obtained weighted-sum result of the current frame and the weighted-sum result of the previous frame as the input of a cross-attention mechanism, and using the key point positions as the position embedding of the Transformer attention mechanism, thereby obtaining the degree of correlation between the target's front and rear frames.
Preferably, the step (4.3) specifically includes:
taking the current-frame weighted-sum result as K (Key) and V (Value) and the previous-frame weighted-sum result as Q (Query), and computing with the formula of step (4.2) to obtain the degree of correlation between the front and rear frame targets.
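A PyTorch sketch of the self-attention and cross-attention matching of step (4) follows; the use of nn.MultiheadAttention and an MLP position embedding are assumptions, while the K = Q = V self-attention per frame and the current-frame-as-K/V, previous-frame-as-Q cross-attention follow the description:

```python
import torch
import torch.nn as nn

class FrameMatcher(nn.Module):
    """Self-attention over each frame's key-point sequence (K = Q = V), then
    cross-attention in which the current frame supplies K/V and the previous
    frame supplies Q. Key-point positions are added as a learned position
    embedding (a small MLP here, which is an assumption)."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.pos_embed = nn.Sequential(nn.Linear(2, dim), nn.ReLU(inplace=True),
                                       nn.Linear(dim, dim))
        self.self_attn  = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def encode(self, feat, pos):
        x = feat + self.pos_embed(pos)            # K = Q = V for self-attention
        out, _ = self.self_attn(x, x, x)
        return out

    def forward(self, feat_cur, pos_cur, feat_prev, pos_prev):
        cur  = self.encode(feat_cur,  pos_cur)    # current-frame weighted sums
        prev = self.encode(feat_prev, pos_prev)   # previous-frame weighted sums
        # cross attention: previous frame as Query, current frame as Key/Value
        fused, attn = self.cross_attn(prev, cur, cur)
        return fused, attn                        # attn ~ frame-to-frame correlation
```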
Preferably, the step (5) specifically includes the following steps:
(5.1) channel-concatenating the output features of the cross-attention mechanism with the position information of the current-frame key points to obtain the concatenated features;
(5.2) extracting the fused features from the concatenated features by using 3 convolution layers and taking the first two dimensions of the fused features as the tracking center points; and taking the result of applying two convolution layers to the concatenated features as the tracking confidence;
(5.3) normalizing the non-center dimensions of the fused features and the current-frame key point features with the normalized exponential function, then channel-concatenating the normalized features to obtain concatenated features, from which the class confidence, regression box and velocity of target detection are obtained through separate two-layer convolutions.
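A sketch of the multi-task detection head of step (5); the 1-D convolutions over the key-point sequence and the class/box/velocity widths are assumptions:

```python
import torch
import torch.nn as nn

class MultiTaskHead(nn.Module):
    """Multi-task head sketch: the cross-attention output is concatenated with the
    current-frame key-point positions; a 3-conv "voting" stack yields the fused
    feature whose first two channels are read as the tracking centre; a 2-conv
    branch gives the tracking confidence; the remaining fused channels and the
    current-frame key-point features are softmax-normalised, concatenated and fed
    to the class / box / velocity branches."""
    def __init__(self, dim, num_classes=3, box_dim=5):
        super().__init__()
        def convs(cin, cout, n):
            layers = []
            for i in range(n):
                layers += [nn.Conv1d(cin if i == 0 else cout, cout, 1),
                           nn.ReLU(inplace=True)]
            return nn.Sequential(*layers[:-1])        # drop the trailing ReLU
        self.fuse       = convs(dim + 2, dim, 3)      # voting layer: 3 convolutions
        self.track_conf = convs(dim + 2, 1, 2)        # tracking confidence branch
        fused_rest = dim - 2                          # fused channels minus the centre
        self.cls_head = convs(fused_rest + dim, num_classes, 2)
        self.box_head = convs(fused_rest + dim, box_dim, 2)
        self.vel_head = convs(fused_rest + dim, 2, 2) # velocity (vx, vy)

    def forward(self, cross_out, pos_cur, feat_cur):
        # cross_out, feat_cur: (B, K, C); pos_cur: (B, K, 2) -> (B, channels, K)
        x = torch.cat([cross_out, pos_cur], dim=-1).permute(0, 2, 1)
        fused = self.fuse(x)
        track_center = fused[:, :2, :]                # first two dims: tracking centre
        track_score  = torch.sigmoid(self.track_conf(x))
        rest = torch.softmax(fused[:, 2:, :], dim=1)  # normalise non-centre channels
        kp   = torch.softmax(feat_cur.permute(0, 2, 1), dim=1)
        y = torch.cat([rest, kp], dim=1)
        return {"track_center": track_center, "track_score": track_score,
                "cls": self.cls_head(y), "box": self.box_head(y),
                "vel": self.vel_head(y)}
```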
Preferably, the cross-attention mechanism module further comprises:
predicting, from the fused features, the center-point position of each current-frame key point in the previous frame, and flexibly adjusting the center offset threshold with respect to the previous frame as the front/rear frame matching criterion, which serves as the final tracking matching rule; specifically:

track_id_i = track_id_{i-1}, if |x_pred − x_i| < x_thresh, |y_pred − y_i| < y_thresh and track_score_i > score_thresh; otherwise a new track id is assigned,

where track_id_i is the tracking id of the current frame, track_id_{i-1} is the tracking id of the previous frame, (x_pred, y_pred) is the center point predicted for the previous frame, (x_i, y_i) is the center point of the target tracked in the current frame, x_thresh and y_thresh are the allowed center offset thresholds, track_score_i is the tracking confidence, and score_thresh is the tracking threshold.
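The matching rule above could be implemented along these lines; the threshold values are illustrative, not the patent's:

```python
def match_tracks(cur_targets, prev_tracks, x_thresh=1.0, y_thresh=1.0,
                 score_thresh=0.3, next_id=0):
    """Assign a track id to every current-frame target.

    cur_targets: list of dicts with 'pred_prev_center' (where the network predicts
    this target was in the previous frame) and 'track_score'.
    prev_tracks: list of dicts with 'center' and 'id' from the previous frame.
    A target inherits a previous id when the predicted centre lies within the
    allowed offsets and the confidence exceeds the threshold; otherwise a new
    track is opened.
    """
    ids = []
    for tgt in cur_targets:
        px, py = tgt["pred_prev_center"]
        match = None
        if tgt["track_score"] > score_thresh:
            for trk in prev_tracks:
                tx, ty = trk["center"]
                if abs(px - tx) < x_thresh and abs(py - ty) < y_thresh:
                    match = trk["id"]            # matched: reuse previous-frame id
                    break
        if match is None:
            match, next_id = next_id, next_id + 1   # unmatched: open a new track
        ids.append(match)
    return ids, next_id
```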
The device for realizing target detection, tracking and speed measurement of the 4D millimeter wave radar based on the deep learning network model is mainly characterized by comprising the following components:
a processor configured to execute computer-executable instructions;
and the memory stores one or more computer executable instructions which, when executed by the processor, implement the steps of the method for implementing 4D millimeter wave radar target detection, tracking and speed measurement based on the deep learning network model.
The processor for realizing the 4D millimeter wave radar target detection, tracking and speed measurement based on the deep learning network model is mainly characterized in that the processor is configured to execute computer executable instructions, and when the computer executable instructions are executed by the processor, the steps of the method for realizing the 4D millimeter wave radar target detection, tracking and speed measurement based on the deep learning network model are realized.
The computer readable storage medium is mainly characterized in that the computer program is stored thereon, and the computer program can be executed by a processor to realize the steps of the method for realizing 4D millimeter wave radar target detection, tracking and speed measurement based on the deep learning network model.
Compared with the traditional clustering-based tracking and speed measurement methods, the method, device, processor and computer readable storage medium for realizing target detection, tracking and speed measurement of a 4D millimeter wave radar based on a deep learning network model adopt a data-driven approach, do not require manually designing a large number of rules and complex post-processing operations for complex and changeable road scenes, and use only the millimeter wave radar, which gives an advantage in cost control. Compared with existing solutions that use multiple networks for the multiple tasks of target detection, tracking and speed measurement, this technical scheme uses a single-stage deep learning design in which the detection and tracking modules share model features, reducing the computation of secondary feature extraction. In addition, the technical scheme performs target detection on the fused features of the front and rear frames, which can reduce missed detections to a certain extent. Meanwhile, the scheme uses the speed measurement capability of the millimeter wave radar to obtain the velocity features of the current frame while detecting and tracking targets, enabling a secondary tracking match, and thus has outstanding technical advantages in use.
Drawings
Fig. 1 is a schematic diagram of a process flow of the method for realizing target detection, tracking and speed measurement of a 4D millimeter wave radar based on a deep learning network model.
Fig. 2 is a schematic diagram of a point cloud structure of 4D millimeter wave radar data of a previous frame in an embodiment of the present invention.
Fig. 3 is a schematic view of a point cloud structure of the next frame of 4D millimeter wave radar data according to an embodiment of the present invention.
FIG. 4 is a diagram showing the result of tracking matching according to the present invention.
Detailed Description
In order to more clearly describe the technical contents of the present invention, a further description will be made below in connection with specific embodiments.
Before describing in detail embodiments that are in accordance with the present invention, it should be observed that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Referring to fig. 1, the method for implementing target detection, tracking and speed measurement of a 4D millimeter wave radar based on a deep learning network model includes the following steps:
(1) Performing point cloud clipping on the currently acquired front and rear frames of millimeter wave radar point clouds, and taking the result as the input features for target detection, tracking and speed measurement;
(2) Extracting features from the selected input features by using a feature pyramid structure;
(3) Based on the currently extracted features, extracting target key points by using the detection head of the anchor-free CenterNet, and obtaining the actual position information of the target points;
(4) Performing cross-attention fusion matching on the obtained target key points by using a Transformer attention mechanism, so as to obtain the degree of correlation between the target's front and rear frames;
(5) Obtaining the classification, regression, tracking confidence and tracking position information of target detection through the multi-task detection head, thereby realizing detection, tracking and speed measurement of the target points.
As a preferred embodiment of the present invention, the step (1) specifically includes:
selecting front and rear frames of millimeter wave radar point clouds with a time interval of 100 ms and performing point cloud clipping; voxelizing the clipped point clouds into voxel grids with a height of 1; selecting, for each voxel grid, the center offset of the point cloud, the point cloud density within the voxel, the radial velocity of the point cloud and the compensated point cloud velocity as input features; and projecting the input features into a 2D BEV grid for subsequent processing.
As a preferred embodiment of the present invention, the step (2) specifically includes:
the feature pyramid structure includes a downsampling layer and an upsampling layer, wherein
the downsampling layer performs feature extraction through a convolution layer and a ResNet-18, and the features extracted by the downsampling layer are downsampled 16 times relative to its input; the features downsampled 2, 4, 8 and 16 times are taken, in sequence, as the input features of the upsampling layer;
the upsampling layer first upsamples the feature layer downsampled 16 times by the downsampling layer to obtain a first upsampling result; the first upsampling result and the 8-times-downsampled features of the downsampling layer undergo channel concatenation, convolution and upsampling to obtain a second upsampling result; the second upsampling result and the 4-times-downsampled features of the downsampling layer undergo channel concatenation, convolution and upsampling to obtain a third upsampling result;
the second upsampling result is directly upsampled to obtain the first-layer output of the feature pyramid structure; the third upsampling result is taken as the second-layer output of the feature pyramid structure; the third upsampling result and the 2-times-downsampled features of the downsampling layer undergo channel concatenation and convolution to obtain the third-layer output of the feature pyramid structure;
and the first-layer output, the second-layer output and the third-layer output are channel-concatenated, and the concatenation result is taken as the final feature extraction result of the feature pyramid structure.
As a preferred embodiment of the present invention, the step (3) specifically includes the following steps:
(3.1) applying two 1×1 convolutions to the extracted features to obtain the key-point confidence of the target frame;
(3.2) extracting the 512 key points with the highest confidence as the tracked target key points;
(3.3) extracting the feature map feature position_feature corresponding to the BEV grid position of each tracked target key point, obtaining the bird's-eye-view-to-key-point projection feature;
(3.4) obtaining the actual position information of the target point (position_x, position_y) from the grid location (grid_x, grid_y), the actual metric size of each grid cell (voxel_x, voxel_y) and the minimum point cloud range (x_min, y_min); the specific calculation formula is as follows:

position_x = grid_x × voxel_x + x_min

position_y = grid_y × voxel_y + y_min
As a preferred embodiment of the present invention, the step (4) specifically includes the following steps:
(4.1) inputting the actual position information of the target point (position_x, position_y) and the feature map feature position_feature into the Transformer attention mechanism;
(4.2) extracting the point sequence features of the front and rear frames by using a self-attention mechanism, in which K (Key) = Q (Query) = V (Value), calculated as follows:
first, the dot product between Q and K is computed and, to prevent the result from becoming too large, divided by sqrt(d_k), where d_k is the dimension of K and K^T is the transpose of the matrix K; the result is then normalized into a probability distribution with the softmax normalized exponential function and multiplied by the matrix V to obtain the final weighted sum:

Attention(Q, K, V) = softmax(Q·K^T / sqrt(d_k)) · V
(4.3) using the obtained weighted-sum result of the current frame and the weighted-sum result of the previous frame as the input of a cross-attention mechanism, and using the key point positions as the position embedding of the Transformer attention mechanism, thereby obtaining the degree of correlation between the target's front and rear frames.
As a preferred embodiment of the present invention, the step (4.3) specifically includes:
taking the current-frame weighted-sum result as K (Key) and V (Value) and the previous-frame weighted-sum result as Q (Query), and computing with the formula of step (4.2) to obtain the degree of correlation between the front and rear frame targets.
The key point features of the current frame, after passing through the Transformer self-attention mechanism, are output as the K and V of the cross-attention mechanism; the output of the previous-frame key point features after the Transformer self-attention mechanism is taken as the Q of the cross-attention module; based on the acquired K, Q and V, the cross-attention module computes the degree of correlation between the front and rear frame targets using the attention calculation formula.
As a preferred embodiment of the present invention, the step (5) specifically includes the steps of:
(5.1) channel-concatenating the output features of the cross-attention mechanism with the position information of the current-frame key points to obtain the concatenated features;
(5.2) extracting the fused features from the concatenated features by using 3 convolution layers and taking the first two dimensions of the fused features as the tracking center points; and taking the result of applying two convolution layers to the concatenated features as the tracking confidence;
(5.3) normalizing the non-center dimensions of the fused features and the current-frame key point features with the normalized exponential function, then channel-concatenating the normalized features to obtain concatenated features, from which the class confidence, regression box and velocity of target detection are obtained through separate two-layer convolutions.
As a preferred embodiment of the present invention, the cross-attention mechanism module further comprises:
predicting, from the fused features, the center-point position of each current-frame key point in the previous frame, and flexibly adjusting the center offset threshold with respect to the previous frame as the front/rear frame matching criterion, which serves as the final tracking matching rule; specifically:

track_id_i = track_id_{i-1}, if |x_pred − x_i| < x_thresh, |y_pred − y_i| < y_thresh and track_score_i > score_thresh; otherwise a new track id is assigned,

where track_id_i is the tracking id of the current frame, track_id_{i-1} is the tracking id of the previous frame, (x_pred, y_pred) is the center point predicted for the previous frame, (x_i, y_i) is the center point of the target tracked in the current frame, x_thresh and y_thresh are the allowed center offset thresholds, track_score_i is the tracking confidence, and score_thresh is the tracking threshold.
In practical applications, referring to figs. 2 and 3, which show front and rear frames of millimeter wave radar point cloud data with a time interval of 100 ms, the point cloud data is rendered by absolute velocity, and the millimeter wave radar point cloud velocities of the front and rear frames are strongly correlated. In this technical scheme, velocity is used as the main feature for tracking and target detection, and the network is designed in a single-stage deep learning manner that fuses the front and rear frame data to realize the tasks of target detection, tracking and speed measurement. In the figures, the RGB rendering encodes velocity.
According to the technical scheme, during testing only the current-frame millimeter wave radar data needs to undergo feature extraction and BEV-to-point projection; the previous-frame point sequence features and the target detection and tracking results of the previous moment are directly reused. With reference to fig. 1, the overall steps are as follows:
1) Two frames of millimeter wave radar point clouds with a time interval of 100 ms are selected and point cloud clipping is performed; the clipped point clouds are voxelized into voxel grids with a height of 1; the center offset of the point cloud of each voxel grid, the point cloud density within the voxel, the radial velocity of the point cloud and the compensated point cloud velocity are selected as features and projected into the 2D BEV grid. In this technical scheme the point cloud radial velocity and the compensated point cloud velocity are used as input features and serve as the main feature input for 4D millimeter wave tracking and speed measurement.
2) Feature extraction is performed with a feature pyramid structure; the network structure is shown in the feature extraction part of fig. 1. The first part is the downsampling layer, which uses one convolution layer and a ResNet-18 for feature extraction, downsampling 16 times in total. The second part is the upsampling layer: the feature layer downsampled 16 times is upsampled to obtain the first upsampling feature; this upsampling result is channel-concatenated and convolved with the 8-times-downsampled result and then upsampled to obtain the second upsampling result; the second upsampling result is channel-concatenated and convolved with the 4-times-downsampled features and then upsampled to obtain the third upsampling result; the second upsampling result is directly upsampled to obtain the first-layer output of the FPN (feature pyramid structure); the third upsampling result is taken as the second-layer output of the FPN; the third upsampling result is channel-concatenated and convolved with the 2-times-downsampled features to obtain the third-layer output of the FPN; and the three layer outputs are channel-concatenated to obtain the final FPN result. Unlike lidar single-target tracking based on a search region and matching, this technical scheme directly uses the extracted point cloud features as the features matched by the tracking module; the same FPN feature layer result is used for both detection and tracking, so the point cloud is not re-sampled and features are not extracted a second time.
3) Key point extraction uses the detection head of the anchor-free CenterNet, replacing the fully-connected layer with 1×1 convolutions and using two convolutions to obtain the key-point confidence of the target frame. The network structure is shown in fig. 1; in this technical scheme the corresponding point sequence features and point sequence positions are decoded from the BEV grid positions. Specifically, the feature map feature position_feature corresponding to the BEV grid position of each target key point is extracted, giving the bird's-eye-view-to-key-point projection feature. The actual position information of the target point (position_x, position_y) is obtained from the grid location (grid_x, grid_y), the actual metric size of each grid cell (voxel_x, voxel_y) and the minimum point cloud range (x_min, y_min):

position_x = grid_x × voxel_x + x_min

position_y = grid_y × voxel_y + y_min
4) The target positions position_x, position_y and the feature position_feature from step 3) are used as input; as shown in fig. 1, this technical scheme uses a Transformer attention mechanism for feature matching. The attention mechanism is computed as follows: first, the dot product between Q and K is calculated and, to prevent the result from becoming too large, divided by sqrt(d_k), where d_k is the dimension of K; the result is then normalized into a probability distribution with the softmax normalized exponential function and multiplied by the matrix V to obtain the final weighted sum:

Attention(Q, K, V) = softmax(Q·K^T / sqrt(d_k)) · V
The front and rear point sequence features are extracted using a Transformer self-attention module, which uses the key point features as the K, Q and V of the Transformer multi-head attention module and the key point positions as the position embedding. The output of the current frame after the self-attention mechanism is used as the K and V of the cross-attention module, and the output of the previous frame after the self-attention mechanism is used as the Q of the cross-attention module; the cross-attention module computes the degree of correlation between the front and rear frame targets, and the output features of the cross-attention mechanism are channel-concatenated with the position information of the current-frame key points to obtain the concatenated features. The concatenated features are sent to a voting layer, where 3 convolution layers extract the fused features, from which the tracking center points are obtained, and the concatenated features pass through two convolution layers to obtain the tracking confidence. Finally, the non-position part of the fused features and the current-frame key point features are normalized with the normalized exponential function and then channel-concatenated; the class confidence of target detection is obtained through a two-layer convolution, and the regression box and velocity are obtained through two-layer convolutions. This can reduce missed detections to a certain extent.
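Tying the illustrative sketches above together, a single two-frame inference step might be wired up as follows (the module names refer to the sketches in this document, not to the patent's actual implementation):

```python
def infer_two_frames(bev_prev, bev_cur, fpn, kp_head, matcher, mt_head):
    """End-to-end sketch of one detection / tracking / speed-measurement step.
    At test time the previous-frame features would normally be cached from the
    last step instead of being recomputed here."""
    feat_prev = fpn(bev_prev)                          # shared FPN features
    feat_cur  = fpn(bev_cur)
    _, pos_prev, pf_prev = kp_head(feat_prev)          # previous-frame key points
    conf_cur, pos_cur, pf_cur = kp_head(feat_cur)      # current-frame key points
    fused, correlation = matcher(pf_cur, pos_cur, pf_prev, pos_prev)
    outputs = mt_head(fused, pos_cur, pf_cur)          # cls / box / vel / tracking
    return conf_cur, pos_cur, correlation, outputs
```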
5) The post-processing part of the tracking module in this technical scheme: the center-point position of each current-frame key point in the previous frame is predicted from the fused features and used as the final tracking matching rule; the center offset threshold with respect to the previous frame can be flexibly adjusted and serves as the front/rear frame matching criterion. The matching rule is as follows, where track_id_i is the tracking id of the current frame, track_id_{i-1} is the tracking id of the previous frame, (x_pred, y_pred) is the center point predicted for the previous frame, (x_i, y_i) is the center point of the target tracked in the current frame, x_thresh and y_thresh are the allowed center offset thresholds, track_score_i is the tracking confidence, and score_thresh is the tracking threshold:

track_id_i = track_id_{i-1}, if |x_pred − x_i| < x_thresh, |y_pred − y_i| < y_thresh and track_score_i > score_thresh; otherwise a new track id is assigned.
The 4D millimeter wave radar outputs the ego-vehicle speed and the target speed at the same time, so the center position of a current-frame target in the previous frame can be further predicted from the speed output by current-frame target detection and the target's heading angle, and used as a secondary match to improve the tracking matching effect.
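A sketch of this velocity-based secondary matching under a constant-velocity assumption; the field names are illustrative, while the 100 ms frame interval follows the description:

```python
import math

def secondary_match_by_velocity(cur_box, dt=0.1):
    """Predict where a current-frame detection was one frame earlier from its
    estimated speed and heading (constant-velocity assumption), for use as a
    secondary matching key with the same centre-offset rule as above.

    cur_box: dict with 'center' (x, y), 'speed' (m/s) and 'yaw' (rad).
    """
    x, y = cur_box["center"]
    v, yaw = cur_box["speed"], cur_box["yaw"]
    prev_x = x - v * math.cos(yaw) * dt     # step back along the heading direction
    prev_y = y - v * math.sin(yaw) * dt
    return prev_x, prev_y
```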
As shown in fig. 4, the left side is the prediction result of the first frame, and the right side is the predicted tracking result, including the predicted boxes and tracking IDs.
The device for realizing target detection, tracking and speed measurement of the 4D millimeter wave radar based on the deep learning network model comprises:
a processor configured to execute computer-executable instructions;
and the memory stores one or more computer executable instructions which, when executed by the processor, implement the steps of the method for implementing 4D millimeter wave radar target detection, tracking and speed measurement based on the deep learning network model.
The processor for realizing the 4D millimeter wave radar target detection, tracking and speed measurement based on the deep learning network model is configured to execute computer executable instructions, and when the computer executable instructions are executed by the processor, the steps of the method for realizing the 4D millimeter wave radar target detection, tracking and speed measurement based on the deep learning network model are realized.
The computer readable storage medium has stored thereon a computer program executable by a processor to perform the steps of the method for achieving 4D millimeter wave radar target detection, tracking and speed measurement based on a deep learning network model described above.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and further implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution device.
Those of ordinary skill in the art will appreciate that all or a portion of the steps carried out in the method of the above-described embodiments may be implemented by a program to instruct related hardware, and the program may be stored in a computer readable storage medium, where the program when executed includes one or a combination of the steps of the method embodiments.
The above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, or the like.
In the description of the present specification, reference to the terms "one embodiment," "some embodiments," "examples," "specific examples," or "embodiments," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the invention, and that variations, modifications, alternatives and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the invention.
Compared with the traditional clustering-based tracking and speed measurement methods, the method, device, processor and computer readable storage medium for realizing target detection, tracking and speed measurement of a 4D millimeter wave radar based on a deep learning network model adopt a data-driven approach, do not require manually designing a large number of rules and complex post-processing operations for complex and changeable road scenes, and use only the millimeter wave radar, which gives an advantage in cost control. Compared with existing solutions that use multiple networks for the multiple tasks of target detection, tracking and speed measurement, this technical scheme uses a single-stage deep learning design in which the detection and tracking modules share model features, reducing the computation of secondary feature extraction. In addition, the technical scheme performs target detection on the fused features of the front and rear frames, which can reduce missed detections to a certain extent. Meanwhile, the scheme uses the speed measurement capability of the millimeter wave radar to obtain the velocity features of the current frame while detecting and tracking targets, enabling a secondary tracking match, and thus has outstanding technical advantages in use.
In this specification, the invention has been described with reference to specific embodiments thereof. It will be apparent, however, that various modifications and changes may be made without departing from the spirit and scope of the invention. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

Claims (11)

1. The method for realizing target detection, tracking and speed measurement of the 4D millimeter wave radar based on the deep learning network model is characterized by comprising the following steps of:
(1) Performing point cloud clipping on the currently acquired front and rear frames of millimeter wave radar point clouds, and taking the result as the input features for target detection, tracking and speed measurement;
(2) Extracting features from the selected input features by using a feature pyramid structure;
(3) Based on the currently extracted features, extracting target key points by using the detection head of the anchor-free CenterNet, and obtaining the actual position information of the target points;
(4) Performing cross-attention fusion matching on the obtained target key points by using a Transformer attention mechanism, so as to obtain the degree of correlation between the target's front and rear frames;
(5) Obtaining the classification, regression, tracking confidence and tracking position information of target detection through the multi-task detection head, thereby realizing detection, tracking and speed measurement of the target points.
2. The method for realizing target detection, tracking and speed measurement of the 4D millimeter wave radar based on the deep learning network model according to claim 1, wherein the step (1) is specifically as follows:
selecting front and rear frames of millimeter wave radar point clouds with a time interval of 100 ms and performing point cloud clipping; voxelizing the clipped point clouds into voxel grids with a height of 1; selecting, for each voxel grid, the center offset of the point cloud, the point cloud density within the voxel, the radial velocity of the point cloud and the compensated point cloud velocity as input features; and projecting the input features into a 2D BEV grid for subsequent processing.
3. The method for realizing target detection, tracking and speed measurement of the 4D millimeter wave radar based on the deep learning network model according to claim 2, wherein the step (2) is specifically:
the feature pyramid structure includes a downsampling layer and an upsampling layer, wherein
the downsampling layer performs feature extraction through a convolution layer and a ResNet-18, and the features extracted by the downsampling layer are downsampled 16 times relative to its input; the features downsampled 2, 4, 8 and 16 times are taken as the input features of the upsampling layer;
the upsampling layer first upsamples the feature layer downsampled 16 times by the downsampling layer to obtain a first upsampling result; the first upsampling result and the 8-times-downsampled features of the downsampling layer undergo channel concatenation, convolution and upsampling to obtain a second upsampling result; the second upsampling result and the 4-times-downsampled features of the downsampling layer undergo channel concatenation, convolution and upsampling to obtain a third upsampling result;
the second upsampling result is directly upsampled to obtain the first-layer output of the feature pyramid structure; the third upsampling result is taken as the second-layer output of the feature pyramid structure; the third upsampling result and the 2-times-downsampled features of the downsampling layer undergo channel concatenation and convolution to obtain the third-layer output of the feature pyramid structure;
and the first-layer output, the second-layer output and the third-layer output are channel-concatenated, and the concatenation result is taken as the final feature extraction result of the feature pyramid structure.
4. The method for realizing target detection, tracking and speed measurement of the 4D millimeter wave radar based on the deep learning network model according to claim 3, wherein the step (3) specifically comprises the following steps:
(3.1) applying two 1×1 convolutions to the extracted features to obtain the key-point confidence of the target frame;
(3.2) extracting the 512 key points with the highest confidence as the tracked target key points;
(3.3) extracting the feature map feature position_feature corresponding to the BEV grid position of each tracked target key point, obtaining the bird's-eye-view-to-key-point projection feature;
(3.4) obtaining the actual position information of the target point (position_x, position_y) from the grid location (grid_x, grid_y), the actual metric size of each grid cell (voxel_x, voxel_y) and the minimum point cloud range (x_min, y_min); the specific calculation formula is as follows:

position_x = grid_x × voxel_x + x_min

position_y = grid_y × voxel_y + y_min
5. the method for realizing target detection, tracking and speed measurement of the 4D millimeter wave radar based on the deep learning network model as claimed in claim 4, wherein the step (4) specifically comprises the following steps:
(4.1) information on the actual position of the target point
Figure QLYQS_8
And inputting the feature position_feature of the feature map into a transducer attention mechanism;
(4.2) extracting the sequence features of the front and rear frames by using a self-Attention mechanism (Attention), wherein the self-Attention mechanism is specifically as follows: k (Key) =q (Query) =v (Valuse), calculated as follows:
first, the dot product between Q and K is calculated; to prevent the result from being too large, it is divided by √d_k, where d_k is the dimension of K and K^T is the transposed matrix of the matrix K; the result is then normalized into a probability distribution by the softmax normalized exponential function and multiplied by the matrix V to obtain the final weighted summation result:

Attention(Q, K, V) = softmax(Q K^T / √d_k) V
(4.3) using the obtained weighted summation result of the current frame and the weighted summation result of the previous frame as the inputs of a cross-attention mechanism, and using the key point positions as the position embedding of the Transformer attention mechanism, thereby obtaining the degree of correlation between the targets of the preceding and current frames.
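A compact sketch of the scaled dot-product attention written out in step (4.2); the tensor shapes are assumptions, and the helper is reused in the cross-attention illustration after claim 6.

import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = k.size(-1)
    scores = torch.matmul(q, k.transpose(-2, -1)) / (d_k ** 0.5)   # dot product scaled by sqrt(d_k)
    weights = F.softmax(scores, dim=-1)                            # normalize into a probability distribution
    return torch.matmul(weights, v)                                # weighted summation over V

def self_attention(frame_feat):
    # Self-attention as used in step (4.2): K = Q = V = the frame's key point features.
    return scaled_dot_product_attention(frame_feat, frame_feat, frame_feat)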
6. The method for realizing target detection, tracking and speed measurement of the 4D millimeter wave radar based on the deep learning network model according to claim 5, wherein the step (4.3) is specifically as follows:
taking the current frame weighted summation result as K (Key) and V (Value), taking the previous frame weighted summation result as Q (Query), and calculating according to step (4.2) to obtain the degree of correlation between the targets of the preceding and current frames.
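Continuing the sketch above, one hedged reading of claim 6: the current-frame weighted summation result supplies K and V, the previous-frame result supplies Q, and the key point positions enter through a position embedding; the linear position embedding and the tensor shapes are assumptions.

import torch.nn as nn

def cross_frame_attention(cur_feat, prev_feat, cur_pos, prev_pos):
    # cur_feat / prev_feat: (B, 512, C) self-attention outputs of the current and previous frame.
    # cur_pos / prev_pos:   (B, 512, 2) key point positions used as position embedding.
    pos_embed = nn.Linear(2, cur_feat.size(-1))        # illustrative position embedding
    q = prev_feat + pos_embed(prev_pos)                # previous-frame result as Query
    k = v = cur_feat + pos_embed(cur_pos)              # current-frame result as Key and Value
    return scaled_dot_product_attention(q, k, v)       # degree of correlation between the two frames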
7. The method for realizing target detection, tracking and speed measurement of the 4D millimeter wave radar based on the deep learning network model as claimed in claim 6, wherein the step (5) specifically comprises the following steps:
(5.1) performing channel splicing on the output features of the cross-attention mechanism and the key point position information of the current frame to obtain spliced features;
(5.2) extracting fused features from the spliced features by using 3 convolution layers, and taking the first two dimensions of the fused features as the tracking center point; taking the result obtained by applying two convolution layers to the spliced features as the tracking confidence;
(5.3) normalizing the dimensions of the fused features other than the first two with a normalized exponential function, then performing channel splicing on the normalized features and the key point features of the current frame to obtain spliced features, from which the class confidence, regression box and velocity of target detection are obtained respectively through two convolution layers.
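As a rough sketch of step (5), the head below (PyTorch, with 1×1 Conv1d layers standing in for the convolution layers applied over the key point set) splices the cross-attention output with the current-frame key point positions, derives fused features, a tracking center and a tracking confidence, then feeds the softmax-normalized remaining dimensions, spliced with the current-frame key point features, into separate two-layer heads for class confidence, regression box and velocity; channel widths, box dimensionality and class count are assumptions.

import torch
import torch.nn as nn

class TrackingHeadSketch(nn.Module):
    # Illustrative head for step (5); channel widths and output sizes are assumptions.
    def __init__(self, feat_ch=256, num_classes=3, box_dim=7):
        super().__init__()
        in_ch = feat_ch + 2                                         # cross-attention features + (x, y) position
        self.fuse = nn.Sequential(                                  # three convolution layers -> fused features
            nn.Conv1d(in_ch, feat_ch, 1), nn.ReLU(inplace=True),
            nn.Conv1d(feat_ch, feat_ch, 1), nn.ReLU(inplace=True),
            nn.Conv1d(feat_ch, feat_ch, 1))
        self.track_conf = nn.Sequential(                            # two convolution layers -> tracking confidence
            nn.Conv1d(in_ch, feat_ch, 1), nn.ReLU(inplace=True), nn.Conv1d(feat_ch, 1, 1))
        def head(cout):                                             # two-layer detection heads
            return nn.Sequential(nn.Conv1d(2 * feat_ch - 2, feat_ch, 1),
                                 nn.ReLU(inplace=True), nn.Conv1d(feat_ch, cout, 1))
        self.cls_head, self.box_head, self.vel_head = head(num_classes), head(box_dim), head(2)

    def forward(self, attn_out, keypoint_pos, keypoint_feat):
        # attn_out: (B, C, N) cross-attention output; keypoint_pos: (B, 2, N); keypoint_feat: (B, C, N)
        spliced = torch.cat([attn_out, keypoint_pos], dim=1)        # channel splicing with key point positions
        fused = self.fuse(spliced)
        track_center = fused[:, :2]                                 # first two dimensions as tracking center
        track_score = self.track_conf(spliced)                      # tracking confidence
        rest = torch.softmax(fused[:, 2:], dim=1)                   # normalize the remaining dimensions
        det_in = torch.cat([rest, keypoint_feat], dim=1)            # splice with current-frame key point features
        return (track_center, track_score,
                self.cls_head(det_in), self.box_head(det_in), self.vel_head(det_in))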
8. The method for implementing 4D millimeter wave radar target detection, tracking and speed measurement based on deep learning network model according to claim 7, wherein the cross attention mechanism module further comprises:
predicting, from the fused features, the center point position of each key point of the current frame in the previous frame, flexibly adjusting the center offset threshold of the previous frame, and using this threshold as the matching criterion between the preceding and current frames and as the final tracking matching rule, specifically:
track_id_i = track_id_{i-1}, if |x_pred - x_cur| ≤ x_thresh, |y_pred - y_cur| ≤ y_thresh and track_score_i ≥ score_thresh

wherein track_id_i is the tracking id of the current frame, track_id_{i-1} is the tracking id of the previous frame, (x_pred, y_pred) is the center point predicted from the previous frame, (x_cur, y_cur) is the center point of the target tracked by the current frame, x_thresh and y_thresh are the allowed center offset thresholds, track_score_i is the tracking confidence, and score_thresh is the threshold for tracking.
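Under the definitions stated above, the matching rule can be read as the simple check sketched below; the threshold values and the new_id handling are assumptions added for illustration.

def match_track_id(pred_center, cur_center, prev_id, track_score,
                   x_thresh=1.0, y_thresh=1.0, score_thresh=0.3, new_id=None):
    # Inherit the previous frame's tracking id only when the predicted center lies within
    # the allowed offsets and the tracking confidence clears the threshold (claim-8 reading).
    dx = abs(pred_center[0] - cur_center[0])
    dy = abs(pred_center[1] - cur_center[1])
    if dx <= x_thresh and dy <= y_thresh and track_score >= score_thresh:
        return prev_id
    return new_id                                    # otherwise open a new track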
9. A device for realizing 4D millimeter wave radar target detection, tracking and speed measurement based on a deep learning network model, characterized in that the device comprises:
a processor configured to execute computer-executable instructions;
a memory storing one or more computer-executable instructions which, when executed by the processor, perform the steps of the method for implementing 4D millimeter wave radar target detection, tracking and speed measurement based on a deep learning network model of any one of claims 1 to 8.
10. A processor for implementing 4D millimeter wave radar target detection, tracking and speed measurement based on a deep learning network model, wherein the processor is configured to execute computer executable instructions that, when executed by the processor, implement the steps of the method for implementing 4D millimeter wave radar target detection, tracking and speed measurement based on a deep learning network model as claimed in any one of claims 1 to 8.
11. A computer readable storage medium having stored thereon a computer program executable by a processor to perform the steps of the method of achieving 4D millimeter wave radar target detection, tracking and speed measurement based on a deep learning network model as claimed in any one of claims 1 to 8.
CN202310647626.0A 2023-06-02 2023-06-02 4D millimeter wave radar target detection, tracking and speed measurement method based on deep learning Active CN116403180B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310647626.0A CN116403180B (en) 2023-06-02 2023-06-02 4D millimeter wave radar target detection, tracking and speed measurement method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310647626.0A CN116403180B (en) 2023-06-02 2023-06-02 4D millimeter wave radar target detection, tracking and speed measurement method based on deep learning

Publications (2)

Publication Number Publication Date
CN116403180A true CN116403180A (en) 2023-07-07
CN116403180B CN116403180B (en) 2023-08-15

Family

ID=87009015

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310647626.0A Active CN116403180B (en) 2023-06-02 2023-06-02 4D millimeter wave radar target detection, tracking and speed measurement method based on deep learning

Country Status (1)

Country Link
CN (1) CN116403180B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022120901A1 (en) * 2020-12-09 2022-06-16 中国科学院深圳先进技术研究院 Image detection model training method based on feature pyramid, medium, and device
CN112990050A (en) * 2021-03-26 2021-06-18 清华大学 Monocular 3D target detection method based on lightweight characteristic pyramid structure
US20220391621A1 (en) * 2021-06-04 2022-12-08 Microsoft Technology Licensing, Llc Occlusion-aware multi-object tracking
CN114067292A (en) * 2021-11-25 2022-02-18 纵目科技(上海)股份有限公司 Image processing method and device for intelligent driving
CN115035565A (en) * 2022-05-06 2022-09-09 中国兵器工业计算机应用技术研究所 Visual cortex imitated multi-scale small target detection method, device and equipment
CN114898403A (en) * 2022-05-16 2022-08-12 北京联合大学 Pedestrian multi-target tracking method based on Attention-JDE network
CN115327529A (en) * 2022-09-05 2022-11-11 中国科学技术大学 3D target detection and tracking method fusing millimeter wave radar and laser radar
CN115690072A (en) * 2022-11-11 2023-02-03 楚雄师范学院 Chest radiography feature extraction and disease classification method based on multi-mode deep learning
CN115861884A (en) * 2022-12-06 2023-03-28 中南大学 Video multi-target tracking method, system, device and medium in complex scene

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ASHISH PANDHARIPANDE et al.: "Sensing and Machine Learning for Automotive Perception: A Review", IEEE SENSORS JOURNAL, vol. 23, no. 11 *
ZHOU Yan et al., Journal of Frontiers of Computer Science and Technology, pages 2695-2717 *

Also Published As

Publication number Publication date
CN116403180B (en) 2023-08-15

Similar Documents

Publication Publication Date Title
Lim et al. Radar and camera early fusion for vehicle detection in advanced driver assistance systems
Liang et al. Multi-task multi-sensor fusion for 3d object detection
CN111201451B (en) Method and device for detecting object in scene based on laser data and radar data of scene
CN113159151B (en) Multi-sensor depth fusion 3D target detection method for automatic driving
Dreher et al. Radar-based 2D car detection using deep neural networks
Deng et al. MLOD: A multi-view 3D object detection based on robust feature fusion method
US20210018615A1 (en) Methods and systems for object detection
Ulrich et al. Improved orientation estimation and detection with hybrid object detection networks for automotive radar
Li et al. A feature pyramid fusion detection algorithm based on radar and camera sensor
Song et al. End-to-end learning for inter-vehicle distance and relative velocity estimation in ADAS with a monocular camera
Dimitrievski et al. Weakly supervised deep learning method for vulnerable road user detection in FMCW radar
CN116681730A (en) Target tracking method, device, computer equipment and storage medium
CN116310673A (en) Three-dimensional target detection method based on fusion of point cloud and image features
CN116246119A (en) 3D target detection method, electronic device and storage medium
Li et al. Vehicle object detection based on rgb-camera and radar sensor fusion
Dimitrievski et al. Semantically aware multilateral filter for depth upsampling in automotive lidar point clouds
EP4152274A1 (en) System and method for predicting an occupancy probability of a point in an environment, and training method thereof
CN116403180B (en) 4D millimeter wave radar target detection, tracking and speed measurement method based on deep learning
CN117274036A (en) Parking scene detection method based on multi-view and time sequence fusion
Gu et al. Radar-enhanced image fusion-based object detection for autonomous driving
US20230281877A1 (en) Systems and methods for 3d point cloud densification
Kim et al. Rcm-fusion: Radar-camera multi-level fusion for 3d object detection
Cai et al. 3D vehicle detection based on LiDAR and camera fusion
Ma et al. Disparity estimation based on fusion of vision and LiDAR
Li et al. Attention-based radar and camera fusion for object detection in severe conditions

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant