CN117333749A - Multimode hybrid automatic driving unified 3D detection and tracking method - Google Patents

Multimode hybrid automatic driving unified 3D detection and tracking method

Info

Publication number
CN117333749A
CN117333749A CN202311382428.2A
Authority
CN
China
Prior art keywords
bev
detection
features
current frame
unified
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311382428.2A
Other languages
Chinese (zh)
Inventor
丁勇
孙瑀
程华元
刘琳琳
牛乐乐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN202311382428.2A
Publication of CN117333749A
Pending legal status (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/64 Three-dimensional objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10032 Satellite or aerial image; Remote sensing
    • G06T2207/10044 Radar image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Optical Radar Systems And Details Thereof (AREA)

Abstract

The invention discloses a multimode hybrid automatic driving unified 3D detection and tracking method, belonging to the technical field of automatic driving. The method mainly comprises the following steps: 1. generating BEV features under different modalities; 2. generating an adaptively fused BEV feature; 3. generating a single-frame 3D target detection result; 4. generating a single-frame 3D target tracking result; 5. iterating the target tracking result from frame to frame. With the unified 3D detection and tracking method provided by the invention, data from different sensors can be fused into a unified BEV feature, and 3D target detection and 3D target tracking are unified into a single model. Compared with separate target detection and target tracking models, the unified model improves real-time performance, accuracy and robustness, thereby improving the performance and safety of the automatic driving system, while also reducing the training cost and deployment difficulty of the model.

Description

Multimode hybrid automatic driving unified 3D detection and tracking method
Technical Field
The invention belongs to the technical field of automatic driving, and particularly relates to a multimode hybrid automatic driving unified 3D detection and tracking method.
Background
With the rapid development of automatic driving technology, efficient and accurate target detection and tracking are critical to the safety and performance of an automatic driving system. Traditional target detection and tracking methods are based primarily on single-modality data, such as images or point clouds. However, single-modality data has limitations in some scenarios: in complex environments, images may be affected by illumination and occlusion, while point clouds may be limited by factors such as sensor resolution and noise.
To overcome the limitations of single-modality data, researchers have begun exploring the fusion of multi-modal data. Multi-modal data is typically collected by different sensors, such as cameras and lidar. Image data provides rich visual information, while point cloud data provides accurate geometric information. By fusing image and point cloud data, more comprehensive and accurate target detection and tracking results can be obtained.
Conventional methods usually perform target detection and target tracking separately: detection is performed first, and tracking is performed afterwards. However, this separation may lead to loss and inconsistency of information. To achieve more accurate and consistent target detection and tracking, a unified fusion method is needed. Such a method jointly models the detection and tracking processes and tightly couples them by sharing features and context information, thereby improving the accuracy and stability of both detection and tracking.
The Transformer is a powerful deep learning model that has achieved remarkable results in fields such as natural language processing and computer vision. By exploiting the self-attention mechanism and the global context modeling capability of the Transformer, the relations and dependencies between multi-modal data can be better captured, thereby improving target detection and tracking performance.
Disclosure of Invention
The invention aims to overcome the defects caused by a single modality and to fully exploit the benefits of unifying target detection and target tracking. It provides a unified 3D detection and tracking method for automatic driving based on the Transformer and on the fusion of the point cloud and image modalities, which reduces training cost and deployment difficulty and, by fully exploiting the correlation between the detection and tracking tasks, allows the two tasks to mutually improve each other's performance.
The technical scheme adopted by the invention is as follows:
a multimode hybrid automatic driving unified 3D detection and tracking method comprises the following steps:
Step (1): inputting the multi-modal data collected by the lidar and camera of an automatic driving system, and extracting BEV features under the different modalities respectively;
Step (2): learning adaptive fusion weights from the BEV features under the different modalities obtained in step (1), and fusing the BEV features of the different modalities with the adaptive fusion weights to generate an adaptively fused BEV feature;
Step (3): encoding the adaptively fused BEV feature obtained in step (2) with a Transformer encoder to obtain the encoding feature of the current frame; meanwhile, passing the adaptively fused BEV feature through a candidate region generation network to complete the 3D target detection task of the current frame and generate a series of 3D candidate boxes of the current frame;
Step (4): concatenating the series of 3D candidate boxes of the current frame with the processed target tracking result of the previous frame, and inputting the concatenation result together with the encoding feature of the current frame into a Transformer decoder to obtain the initial target tracking result of the current frame;
Step (5): generating the processed target tracking result of the current frame from the initial target tracking result of the current frame obtained in step (4), and, through continuous iteration between frames, finally outputting the target tracking result of the whole multi-frame input.
Further, step (5) comprises:
Step (5.1): dividing the initial target tracking result of the current frame obtained in step (4) into a new object set corresponding to the 3D target detection output of the current frame and an old object set corresponding to the processed target tracking result of the previous frame;
Step (5.2): judging the new object set and the old object set obtained in step (5.1) separately, and removing objects that do not meet the requirements from each set;
Step (5.3): combining the processed new object set and old object set to generate the processed target tracking result of the current frame.
Further, in step (5.2), for the new object set, if the detection confidence of an object is greater than the set threshold, the object is kept, otherwise it is removed from the new object set; for the old object set, if the detection confidence of an object stays below the set threshold for 3 consecutive frames, the object is removed from the old object set, otherwise it is kept.
The invention has the beneficial effects that:
the invention designs a complete multimode hybrid automatic driving unified 3D detection and tracking method, which comprises a plurality of stages of multimode BEV feature generation and target detection and target tracking. In the generation stage of the multi-mode BEV features, the method can process the sensor data of the laser radar and the camera, and is fused into a unified BEV feature space, so that the method can flexibly adapt to the change of the number of sensors and can be used as the input features of subsequent target detection and target tracking tasks. In the realization stage of target detection and target tracking, the method designs an encoder and a decoder based on a transducer structure, and effectively combines a target detection task and a target tracking task together, so that the performance of the two tasks is improved by fully utilizing the relevance between the detection task and the tracking task, and meanwhile, the difficulty of improving the training difficulty of a plurality of independent models and the difficulty of model deployment are effectively reduced.
Compared with separate target detection and target tracking models, the unified model improves real-time performance, accuracy and robustness, thereby improving the performance and safety of the automatic driving system.
Drawings
Fig. 1 is a flowchart of a multi-mode hybrid automatic driving unified 3D detection and tracking method according to the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
The multi-modal hybrid automatic driving unified 3D detection and tracking method fuses image and point cloud data and, with a unified fusion strategy combined with the capability of the Transformer model, achieves more accurate and stable target detection and tracking, providing better support for the safety and performance of an automatic driving system.
Variable subscripts "lidar" and "cam" distinguish the two sensors, LiDAR and camera, and subscripts "det" and "track" distinguish the object detection and object tracking tasks. As shown in Fig. 1, the specific implementation steps of the invention are as follows:
Step (1): The point cloud P_lidar collected by the lidar and the images I_cam collected by the cameras of the automatic driving system are input into separate BEV generation networks Ψ_lidar and Ψ_cam, which convert the two different input modalities into a unified BEV view. The generated BEV features of the different modalities have the same spatial resolution W×H and the same feature dimension C, so that the lidar and camera features can subsequently be fused. The specific calculation is:

F_lidar^bev = Ψ_lidar(P_lidar),  F_cam^bev = Ψ_cam(I_cam)

where F_lidar^bev denotes the BEV feature of the lidar modality and F_cam^bev denotes the BEV feature of the camera modality. In this embodiment, the BEV generation networks Ψ_lidar and Ψ_cam may be implemented with existing networks such as BEVFormer or BEVFusion.
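The interface this step assumes can be sketched as follows; the placeholder backbones, tensor shapes and dummy inputs are illustrative only, standing in for real networks such as BEVFormer or BEVFusion.

```python
# A minimal interface sketch (not the patent's implementation): two off-the-shelf
# BEV backbones are assumed to emit features of identical spatial size W x H and
# channel depth C so they can be fused downstream.
import torch

W, H, C = 128, 128, 128                      # BEV grid and feature dimension (illustrative)

def lidar_bev_net(points: torch.Tensor) -> torch.Tensor:
    """Placeholder for the lidar BEV generation network Psi_lidar."""
    return torch.randn(1, C, H, W)           # (B, C, H, W)

def cam_bev_net(images: torch.Tensor) -> torch.Tensor:
    """Placeholder for the camera BEV generation network Psi_cam."""
    return torch.randn(1, C, H, W)

bev_lidar = lidar_bev_net(torch.randn(1, 100000, 4))   # dummy point cloud
bev_cam = cam_bev_net(torch.randn(1, 6, 3, 256, 704))  # dummy multi-view images
assert bev_lidar.shape == bev_cam.shape      # same resolution and feature dimension
```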
Step (2): The BEV features of the different modalities obtained in step (1) are fed into an adaptive BEV feature fusion network, which learns to fuse them in an appropriate way and generates the adaptively fused BEV feature. This step specifically comprises:

Step (2.1): From the BEV feature F_lidar^bev of the lidar point cloud modality, generate the corresponding lidar BEV adaptive fusion weight:

W_lidar = Γ_lidar(F_lidar^bev)

where Γ_lidar is a multi-layer convolutional neural network whose downsampling and upsampling yield an adaptive fusion weight with the same receptive field as the original BEV feature.

Step (2.2): From the BEV feature F_cam^bev of the camera modality, generate the corresponding camera BEV adaptive fusion weight:

W_cam = Γ_cam(F_cam^bev)

where Γ_cam is a multi-layer convolutional neural network whose downsampling and upsampling yield an adaptive fusion weight with the same receptive field as the original BEV feature.

Step (2.3): From the BEV feature F_lidar-cam^bev obtained by fusing the lidar point cloud modality and the camera modality, generate the corresponding fused lidar-camera BEV adaptive fusion weight:

W_lidar-cam = Γ_lidar-cam(F_lidar-cam^bev)

where Γ_lidar-cam is a multi-layer convolutional neural network whose downsampling and upsampling yield an adaptive fusion weight with the same receptive field as the original BEV feature.

Step (2.4): Numerically normalize the fusion weights W_lidar, W_cam and W_lidar-cam generated in steps (2.1) to (2.3):

[A_lidar, A_cam, A_lidar-cam] = σ([W_lidar, W_cam, W_lidar-cam])

where σ is a normalization function, which may be implemented with the Softmax function.

Step (2.5): Process the BEV features of the three modalities, F_lidar^bev, F_cam^bev and F_lidar-cam^bev, with their corresponding normalized adaptive fusion weights A_lidar, A_cam and A_lidar-cam to obtain the adaptively fused BEV feature F_fused^bev:

F_fused^bev = MLP(Concat(A_lidar · F_lidar^bev, A_cam · F_cam^bev, A_lidar-cam · F_lidar-cam^bev))

Here the BEV features of the three modalities are first multiplied by their respective normalized adaptive fusion weights, producing three groups of BEV features of the same size as the original BEV features. Concat denotes the concatenation operation, which stitches the three groups of C-dimensional BEV features into one 3C-dimensional feature; the MLP network then converts the feature dimension back to C, finally yielding the adaptively fused BEV feature F_fused^bev of size W×H×C. This design can flexibly adapt to changes in the number of sensors. The adaptive BEV feature fusion network consists of the Γ_lidar, Γ_cam and Γ_lidar-cam networks and the MLP network.
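A minimal PyTorch-style sketch of the adaptive fusion described in steps (2.1) to (2.5) follows. The class name AdaptiveBEVFusion, the exact layer counts and kernel sizes of the weight networks, and the simple addition used to pre-fuse the lidar and camera features for the third branch are illustrative assumptions, not the patent's prescribed implementation.

```python
# Hedged sketch of adaptive BEV fusion: per-modality weight maps, joint softmax,
# weighted concatenation to 3C channels, then a 1x1 "MLP" back to C channels.
import torch
import torch.nn as nn

class AdaptiveBEVFusion(nn.Module):
    def __init__(self, c: int = 128):
        super().__init__()
        def weight_net():
            # Gamma_*: downsample then upsample so the weight map keeps the
            # receptive field of the original BEV feature (steps 2.1-2.3).
            return nn.Sequential(
                nn.Conv2d(c, c, 3, stride=2, padding=1), nn.ReLU(),
                nn.ConvTranspose2d(c, 1, 4, stride=2, padding=1),
            )
        self.g_lidar = weight_net()
        self.g_cam = weight_net()
        self.g_lidar_cam = weight_net()
        self.mlp = nn.Conv2d(3 * c, c, 1)      # 3C -> C projection (step 2.5)

    def forward(self, f_lidar, f_cam):
        f_lc = f_lidar + f_cam                 # assumed simple pre-fusion of the two modalities
        w = torch.cat([self.g_lidar(f_lidar),
                       self.g_cam(f_cam),
                       self.g_lidar_cam(f_lc)], dim=1)
        w = torch.softmax(w, dim=1)            # numerical normalization (step 2.4)
        fused = torch.cat([w[:, 0:1] * f_lidar,
                           w[:, 1:2] * f_cam,
                           w[:, 2:3] * f_lc], dim=1)
        return self.mlp(fused)                 # adaptively fused BEV feature, W x H x C
```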
Step (3): A Transformer-based encoder is designed to encode the adaptively fused BEV feature F_fused^bev obtained in step (2). Meanwhile, a candidate region generation network (RPN) is designed, and the adaptively fused BEV feature F_fused^bev is passed through the RPN to complete the 3D target detection task of the current frame and generate a series of 3D candidate boxes of the current frame, specifically as follows:

Step (3.1): The adaptively fused BEV feature F_fused^bev is passed through the RPN to generate a series of 3D candidate boxes of the current frame as the 3D target detection output of the current frame:

B_det^t = RPN(F_fused^bev)

where B_det^t denotes the 3D target detection result at time t.
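A hedged sketch of such an RPN-style candidate head on the fused BEV feature is given below. The dense classification/regression split, the seven box parameters and the top-k selection are assumptions chosen for illustration; the patent does not prescribe this particular head.

```python
# Minimal candidate-region head on the fused BEV feature (step 3.1): per-cell
# objectness plus a 7-parameter 3D box, keeping the top-k cells as candidates.
import torch
import torch.nn as nn

class BEVProposalHead(nn.Module):
    def __init__(self, c: int = 128, num_box_params: int = 7):
        super().__init__()
        self.cls = nn.Conv2d(c, 1, 1)                 # objectness / confidence per BEV cell
        self.reg = nn.Conv2d(c, num_box_params, 1)    # (x, y, z, w, l, h, yaw)

    def forward(self, bev_fused, topk: int = 200):
        score = self.cls(bev_fused).sigmoid().flatten(1)          # (B, H*W)
        boxes = self.reg(bev_fused).flatten(2).transpose(1, 2)    # (B, H*W, 7)
        conf, idx = score.topk(topk, dim=1)
        cand = torch.gather(boxes, 1, idx.unsqueeze(-1).expand(-1, -1, boxes.size(-1)))
        return cand, conf    # series of 3D candidate boxes B_det^t with confidences
```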
Step (3.2): The adaptively fused BEV feature F_fused^bev is passed through an encoder ENC based on the Transformer structure to obtain the encoding feature F_enc^t of the current frame:

F_enc^t = ENC(F_fused^bev)

The encoder ENC is composed of several serial encoder modules, with adjacent modules connected through residual connections. Each module is built around an attention mechanism and performs the following main computation:

X' = X_in + ATT(X_in),  X_out = X' + FFN(X')

where ATT performs the following main calculation:

ATT(X_in) = σ(Q K^T / √C) V,  with Q = K = V = X_in

Here X_in is the input of the encoder module and serves simultaneously as the query, key and value of the attention mechanism; the σ function is the Softmax function, used to normalize the correlation matrix; FFN is a two-layer feed-forward neural network; C is the feature dimension of the input adaptively fused BEV feature, which can typically be set to 128. The output of the encoder and the 3D candidate boxes output by the detection task serve as the input of the Transformer decoder.
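One encoder module of this kind can be sketched as follows; the head count, the FFN width, the three-layer stack and the small BEV grid in the usage example are illustrative assumptions made to keep the demo cheap.

```python
# Hedged sketch of one Transformer encoder module (step 3.2): self-attention over
# the flattened BEV tokens with Q = K = V, a two-layer FFN, and residual connections.
import torch
import torch.nn as nn

class BEVEncoderLayer(nn.Module):
    def __init__(self, c: int = 128, heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(c, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(c, 4 * c), nn.ReLU(), nn.Linear(4 * c, c))

    def forward(self, x):                      # x: (B, H*W, C) flattened BEV tokens
        x = x + self.attn(x, x, x)[0]          # Q = K = V = fused BEV feature
        return x + self.ffn(x)                 # two-layer FFN with residual connection

encoder = nn.Sequential(*[BEVEncoderLayer() for _ in range(3)])   # stacked serial modules
tokens = torch.randn(1, 32 * 32, 128)          # small BEV grid for the demo
f_enc = encoder(tokens)                        # encoding feature of the current frame
```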
Step (4): A Transformer decoder is designed. The encoding feature F_enc^t of the current frame output by the Transformer encoder in step (3) is taken, and at the same time the 3D candidate boxes B_det^t of the current frame obtained in step (3) are concatenated with the processed target tracking result of the previous frame; the encoding feature of the current frame and the concatenation result are then input together into the Transformer decoder to obtain the initial target tracking result of the current frame, specifically as follows:

Step (4.1): Concatenate the processed target tracking result of the previous frame with the series of 3D candidate boxes of the current frame generated in step (3.1):

Q_in^t = Concat(Q_track^{t-1}, B_det^t)

where Concat denotes the concatenation operation, B_det^t denotes the 3D candidate boxes of the current frame, and Q_track^{t-1} denotes the processed target tracking result of the previous frame.

Step (4.2): The encoding feature F_enc^t of the current frame obtained in step (3.2) and the concatenation result Q_in^t obtained in step (4.1) are input into the decoder of the Transformer structure to obtain the initial target tracking result Q_init^t of the current frame:

Q_init^t = DEC(Q_in^t, F_enc^t)

where ATT in the decoder performs the following main calculation:

ATT(Q_in^t, F_enc^t) = σ(Q K^T / √C) V,  with Q = Q_in^t and K = V = F_enc^t

Here Q_in^t serves as the decoder query and F_enc^t serves as the decoder key and value; the σ function is the Softmax function, used to normalize the correlation matrix; FFN is a two-layer feed-forward neural network; C is the feature dimension of the input, which can typically be set to 128.
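A minimal sketch of this decoder step is given below; embedding the 7-parameter boxes into C-dimensional queries with a linear layer, the single decoder layer, and all tensor sizes are illustrative assumptions.

```python
# Hedged sketch of step (4.2): the concatenation of current-frame candidate boxes
# and previous-frame tracks forms the query; the encoder output is key and value.
import torch
import torch.nn as nn

class TrackDecoderLayer(nn.Module):
    def __init__(self, c: int = 128, heads: int = 8):
        super().__init__()
        self.cross = nn.MultiheadAttention(c, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(c, 4 * c), nn.ReLU(), nn.Linear(4 * c, c))

    def forward(self, queries, memory):
        q = queries + self.cross(queries, memory, memory)[0]   # Q from boxes/tracks, K = V = F_enc
        return q + self.ffn(q)

embed = nn.Linear(7, 128)                          # hypothetical box -> query embedding
det_boxes = torch.randn(1, 200, 7)                 # B_det^t from the candidate head
prev_tracks = torch.randn(1, 30, 7)                # processed tracks of the previous frame
queries = embed(torch.cat([prev_tracks, det_boxes], dim=1))   # concatenation, step (4.1)
f_enc = torch.randn(1, 32 * 32, 128)               # encoder output (small grid for the demo)
init_tracks = TrackDecoderLayer()(queries, f_enc)  # initial tracking result of the current frame
```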
Step (5): A life-cycle control network for targets is designed. The initial target tracking result Q_init^t of the current frame obtained in step (4) is used to generate the processed target tracking result Q_track^t, which is in turn used for target tracking in the next frame. By iterating in this way from frame to frame, the target tracking result of the whole multi-frame input is finally output, specifically as follows:

Step (5.1): According to the concatenation order of step (4.1), the initial target tracking result Q_init^t of the current frame obtained in step (4.2) is divided into two groups, Q_old^t and Q_new^t, where Q_old^t corresponds to the decoding result of Q_track^{t-1} and Q_new^t corresponds to the decoding result of B_det^t; the numbers of targets they contain are consistent with the numbers of targets in Q_track^{t-1} and B_det^t, respectively. The expression is:

Q_old^t = Q_init^t[1 : d(Q_track^{t-1})],  Q_new^t = Q_init^t[d(Q_track^{t-1})+1 : end]

where d(·) is a function returning the number of detected targets.

Step (5.2): The two groups obtained in step (5.1) are processed separately: Q_old^t is used to handle the objects already being tracked and to control objects leaving the current scene, while Q_new^t is used to handle newly appearing objects and to control objects entering the current scene. For departure, when the detection confidence of an object stays below the set threshold τ_left for 3 consecutive frames, the object is removed from the list of tracked objects. For entry, when the detection confidence of an object is greater than the set threshold τ_en, the object is added to the tracked queue. The specific rules are:

for each q_{k,old} in Q_old^t: remove q_{k,old} if its detection confidences in the last three frames are all below τ_left, otherwise keep it;
for each q_{k,new} in Q_new^t: keep q_{k,new} if its detection confidence in the current frame is greater than τ_en, otherwise remove it;
Q_track^t = {kept objects of Q_old^t} ∪ {kept objects of Q_new^t}

where q_{k,old} denotes the k-th object in the set Q_old^t, q_{k,new} denotes the k-th object in the set Q_new^t, the kept objects of Q_old^t form the set of objects after handling departures, the kept objects of Q_new^t form the set of objects after handling entries, and Q_track^t is the resulting processed target tracking result, which is used by the tracking network for the next frame.
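A hedged sketch of this life-cycle control follows; the threshold values, the Track container and the confidence-history bookkeeping are assumptions made for illustration.

```python
# Life-cycle control (step 5): newborn candidates are kept only if their confidence
# exceeds tau_en; existing tracks are dropped once their confidence stays below
# tau_left for 3 consecutive frames.
from dataclasses import dataclass, field
from typing import List

TAU_EN, TAU_LEFT, MAX_MISSES = 0.5, 0.3, 3   # assumed thresholds

@dataclass
class Track:
    box: List[float]
    conf_history: List[float] = field(default_factory=list)

def update_tracks(old_tracks: List[Track], new_objects: List[Track]) -> List[Track]:
    kept = []
    for trk in old_tracks:                              # departure of existing objects
        recent = trk.conf_history[-MAX_MISSES:]
        if len(recent) == MAX_MISSES and all(c < TAU_LEFT for c in recent):
            continue                                    # below threshold for 3 frames -> remove
        kept.append(trk)
    for obj in new_objects:                             # entry of new objects
        if obj.conf_history and obj.conf_history[-1] > TAU_EN:
            kept.append(obj)
    return kept                                         # processed tracking result for the next frame
```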
The foregoing description covers only specific embodiments of the present application and the technical principles employed. Persons skilled in the art will appreciate that the scope of the invention referred to in this application is not limited to the specific combinations of the features described above, but also covers other embodiments formed by any combination of the above features or their equivalents without departing from the spirit of the application, for example embodiments in which the above features are replaced with technical features having similar functions disclosed in this application (but not limited thereto).

Claims (9)

1. A multi-modal hybrid automatic driving unified 3D detection and tracking method, characterized by comprising the following steps:
step (1): inputting the multi-modal data collected by the lidar and camera of an automatic driving system, and extracting BEV features under the different modalities respectively;
step (2): learning adaptive fusion weights from the BEV features under the different modalities obtained in step (1), and fusing the BEV features of the different modalities with the adaptive fusion weights to generate an adaptively fused BEV feature;
step (3): encoding the adaptively fused BEV feature obtained in step (2) with a Transformer encoder to obtain the encoding feature of the current frame; meanwhile, passing the adaptively fused BEV feature through a candidate region generation network to complete the 3D target detection task of the current frame and generate a series of 3D candidate boxes of the current frame;
step (4): concatenating the series of 3D candidate boxes of the current frame with the processed target tracking result of the previous frame, and inputting the concatenation result together with the encoding feature of the current frame into a Transformer decoder to obtain the initial target tracking result of the current frame;
step (5): generating the processed target tracking result of the current frame from the initial target tracking result of the current frame obtained in step (4), and, through continuous iteration between frames, finally outputting the target tracking result of the whole multi-frame input.
2. The multi-modal hybrid automatic driving unified 3D detection and tracking method according to claim 1, wherein in step (1), the multi-modal data collected by the lidar and camera of the automatic driving system are input and converted to a unified BEV view by their respective BEV generation networks, and the generated BEV features of the different modalities have the same spatial resolution and the same feature dimension.
3. The multi-modal hybrid automatic driving unified 3D detection and tracking method of claim 1, wherein step (2) comprises:
step (2.1): generating the adaptive fusion weight of the corresponding lidar BEV feature from the BEV feature of the lidar modality;
step (2.2): generating the adaptive fusion weight of the corresponding camera BEV feature from the BEV feature of the camera modality;
step (2.3): generating the adaptive fusion weight of the corresponding fused lidar-camera BEV feature from the BEV feature fused from the lidar modality and the camera modality;
step (2.4): numerically normalizing the three adaptive fusion weights generated in steps (2.1) to (2.3);
step (2.5): obtaining the adaptively fused BEV feature from the BEV features of the three modalities and the corresponding normalized adaptive fusion weights.
4. The multi-modal hybrid automatic driving unified 3D detection and tracking method of claim 3, wherein step (2.5) specifically comprises: multiplying the BEV features of the three modalities by the corresponding normalized adaptive fusion weights respectively to obtain three groups of BEV features of the same size as the original BEV features; concatenating the three groups of BEV features to obtain a new BEV feature with 3 times the original feature dimension; and converting the dimension of the new BEV feature with an MLP network into the adaptively fused BEV feature with the same dimension as the original BEV features.
5. The multi-modal hybrid automatic driving unified 3D detection and tracking method of claim 1, wherein step (3) comprises:
step (3.1): passing the adaptively fused BEV feature through an RPN network to generate a series of 3D candidate boxes of the current frame as the 3D target detection output of the current frame;
step (3.2): passing the adaptively fused BEV feature through an encoder based on the Transformer structure to obtain the encoding feature of the current frame.
6. The multi-modal hybrid automatic driving unified 3D detection and tracking method of claim 5, wherein the Transformer-structure-based encoder is composed of a plurality of serial encoder modules, with adjacent modules connected through residual connections; each module is built with an attention mechanism whose queries, keys and values are the adaptively fused BEV feature obtained in step (2).
7. The multi-modal hybrid automatic driving unified 3D detection and tracking method of claim 5, wherein step (4) comprises:
step (4.1): concatenating the processed target tracking result of the previous frame with the 3D target detection output of the current frame generated in step (3.1);
step (4.2): inputting the encoding feature of the current frame obtained in step (3.2) and the concatenation result obtained in step (4.1) into a decoder of the Transformer structure to obtain the initial target tracking result of the current frame.
8. The multi-modal hybrid automatic driving unified 3D detection and tracking method of claim 1, wherein step (5) comprises:
step (5.1): dividing the initial target tracking result of the current frame obtained in step (4) into a new object set corresponding to the 3D target detection output of the current frame and an old object set corresponding to the processed target tracking result of the previous frame;
step (5.2): judging the new object set and the old object set obtained in step (5.1) separately, and removing objects that do not meet the requirements from each set;
step (5.3): combining the processed new object set and old object set to generate the processed target tracking result of the current frame.
9. The multi-modal hybrid automatic driving unified 3D detection and tracking method of claim 8, wherein in step (5.2), for the new object set, if the detection confidence of an object is greater than the set threshold, the object is kept, otherwise the object is removed from the new object set; for the old object set, if the detection confidence of an object stays below the set threshold for 3 consecutive frames, the object is removed from the old object set, otherwise the object is kept.
CN202311382428.2A 2023-10-24 2023-10-24 Multimode hybrid automatic driving unified 3D detection and tracking method Pending CN117333749A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311382428.2A CN117333749A (en) 2023-10-24 2023-10-24 Multimode hybrid automatic driving unified 3D detection and tracking method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311382428.2A CN117333749A (en) 2023-10-24 2023-10-24 Multimode hybrid automatic driving unified 3D detection and tracking method

Publications (1)

Publication Number Publication Date
CN117333749A true CN117333749A (en) 2024-01-02

Family

ID=89279078

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311382428.2A Pending CN117333749A (en) 2023-10-24 2023-10-24 Multimode hybrid automatic driving unified 3D detection and tracking method

Country Status (1)

Country Link
CN (1) CN117333749A (en)

Similar Documents

Publication Publication Date Title
CN112347859B (en) Method for detecting significance target of optical remote sensing image
Yang et al. Spatio-temporal domain awareness for multi-agent collaborative perception
CN115223082A (en) Aerial video classification method based on space-time multi-scale transform
CN116385761A (en) 3D target detection method integrating RGB and infrared information
CN113326735A (en) Multi-mode small target detection method based on YOLOv5
CN116612468A (en) Three-dimensional target detection method based on multi-mode fusion and depth attention mechanism
CN115131281A (en) Method, device and equipment for training change detection model and detecting image change
CN111696136A (en) Target tracking method based on coding and decoding structure
CN115588237A (en) Three-dimensional hand posture estimation method based on monocular RGB image
Xie et al. YOLO-MS: Multispectral object detection via feature interaction and self-attention guided fusion
CN117788544A (en) Image depth estimation method based on lightweight attention mechanism
CN110942463B (en) Video target segmentation method based on generation countermeasure network
CN117058392A (en) Multi-scale Transformer image semantic segmentation method based on convolution local enhancement
CN116740480A (en) Multi-mode image fusion target tracking method
CN114693953B (en) RGB-D significance target detection method based on cross-mode bidirectional complementary network
CN115909408A (en) Pedestrian re-identification method and device based on Transformer network
CN117333749A (en) Multimode hybrid automatic driving unified 3D detection and tracking method
CN113920317A (en) Semantic segmentation method based on visible light image and low-resolution depth image
Yan et al. EMTNet: efficient mobile transformer network for real-time monocular depth estimation
CN111126310A (en) Pedestrian gender identification method based on scene migration
CN118229781B (en) Display screen foreign matter detection method, model training method, device, equipment and medium
Zhou et al. Underwater occluded object recognition with two-stage image reconstruction strategy
Huang et al. SOAda-YOLOR: Small Object Adaptive YOLOR Algorithm for Road Object Detection
CN116680656B (en) Automatic driving movement planning method and system based on generating pre-training converter
Zheng et al. A Dual Encoder-Decoder Network for Self-supervised Monocular Depth Estimation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination