CN116152579A - Point cloud 3D target detection method and model based on discrete Transformer - Google Patents

Point cloud 3D target detection method and model based on discrete Transformer

Info

Publication number
CN116152579A
CN116152579A
Authority
CN
China
Prior art keywords
voxel
discrete
point cloud
module
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310307131.3A
Other languages
Chinese (zh)
Inventor
李志恒
黄迪和
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen International Graduate School of Tsinghua University
Original Assignee
Shenzhen International Graduate School of Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen International Graduate School of Tsinghua University
Priority to CN202310307131.3A
Publication of CN116152579A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715 - Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 - Geometric image transformations in the plane of the image
    • G06T3/40 - Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4038 - Image mosaicing, e.g. composing plane images from plane sub-images
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/10 - Image acquisition modality
    • G06T2207/10028 - Range image; Depth image; 3D point clouds
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20081 - Training; Learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 - Indexing scheme for image analysis or image enhancement
    • G06T2207/20 - Special algorithmic details
    • G06T2207/20084 - Artificial neural networks [ANN]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 - Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 - Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a point cloud 3D target detection method and model based on a discrete Transformer. The method comprises the following steps: S1, acquiring a point cloud data frame of an object in real time; S2, voxelizing the point cloud data frame to obtain initial voxels; S3, extracting voxel features containing dynamic information and static information from the initial voxels through a 3D backbone network based on the discrete Transformer; S4, mapping the voxel features finally output in step S3 to the BEV space to obtain corresponding 2D BEV features; and S5, sending the 2D BEV features through a Neck network to a 3D target detector for 3D target detection, obtaining object attribute information of the object in 3D space.

Description

Point cloud 3D target detection method and model based on discrete Transformer
Technical Field
The invention relates to the field of object perception for autonomous driving, and in particular to a point cloud 3D target detection method and model based on a discrete Transformer.
Background
An autonomous vehicle is a complex unmanned system that perceives its environment and performs decision and control by means of on-board sensors. To realize the decision making and control required for autonomous driving, sensors (usually including a lidar and cameras) are needed to perceive the surrounding environment, and the sensor data are processed to obtain 3D semantic information about the objects in that environment.
Lidar-based 3D object detection is a key technology for the environment perception problem in autonomous driving: the 3D semantic information of objects is obtained by encoding and decoding point cloud data frames acquired in real time with a neural network. To achieve high efficiency, voxel-based point cloud 3D target detection algorithms are currently widely used in the autonomous driving field. These methods first quantize the point cloud into voxels and then extract voxel features with a backbone network based on 3D sparse convolution. However, since a point cloud is an irregular, unstructured and discrete data structure, static convolutions with fixed weights often struggle to extract the diverse geometric structure information of objects. Recently, VoTr proposed using a voxel Transformer to extract the dynamic characteristics of voxels. That algorithm generates a voxel hash table from the voxel coordinates to index the voxel features, and then pads the key voxels of each query voxel to a specified length before applying a conventional full-attention mechanism. However, this Transformer ignores the sparse, discrete nature of the point cloud: requiring a specified number of key voxel features for every query voxel feature significantly increases the computation and the time consumed, and the large receptive field used by VoTr is detrimental to the detection of small objects. In addition, VoTr is composed entirely of Transformers, which makes it difficult to extract the static features of the point cloud. To improve the precision and recall of point cloud 3D target detection algorithms, the existing 3D backbone networks need to be improved so that the dynamic and static characteristics of the point cloud are both effectively retained.
Disclosure of Invention
In order to solve the problem of efficiently extracting the dynamic and static characteristics of the point cloud in existing point cloud 3D detection technology, the invention provides a point cloud 3D target detection method and model based on a discrete Transformer, so that the network can efficiently retain both the static and dynamic characteristics of the point cloud.
According to an embodiment of the present invention, a point cloud 3D target detection method based on a discrete Transformer is provided, comprising the following steps: S1, acquiring a point cloud data frame of an object in real time; S2, voxelizing the point cloud data frame to obtain initial voxels; S3, extracting voxel features containing dynamic information and static information from the initial voxels through a 3D backbone network based on the discrete Transformer; S4, mapping the voxel features finally output in step S3 to the BEV space to obtain corresponding 2D BEV features; and S5, sending the 2D BEV features through a Neck network to a 3D target detector for 3D target detection, obtaining object attribute information of the object in 3D space.
According to another embodiment of the present invention, a point cloud 3D target detection model based on a discrete Transformer is provided, comprising: a point cloud voxelization module, used for voxelizing the point cloud data frame of an object and outputting initial voxels; a 3D backbone network based on the discrete Transformer, connected to the output of the point cloud voxelization module and used for extracting voxel features containing dynamic information and static information from the initial voxels; a voxel feature mapping module, connected to the 3D backbone network based on the discrete Transformer and used for mapping the voxel features containing dynamic and static information to the BEV space to obtain corresponding 2D BEV features; and a Neck network, connected to the output of the voxel feature mapping module and used for sending the 2D BEV features to a 3D object detector for 3D target detection to obtain object attribute information of the object in 3D space.
The invention provides a general grid-based point cloud feature extraction backbone network that can be applied to all existing grid-based point cloud 3D detectors. Compared with the prior art (algorithms such as CenterPoint, PV-RCNN, Focals, Voxel-RCNN, SST, PillarNet and PointPillars), the detection method provided by the invention can effectively extract the dynamic and static features of the point cloud and retain richer 3D geometric information, thereby greatly improving the precision and recall of the 3D target detection algorithm and the perception capability of an autonomous vehicle with respect to its surroundings.
Drawings
Fig. 1 is a schematic flow chart of point cloud 3D target detection performed by a point cloud 3D target detection model based on a discrete Transformer according to an embodiment of the present invention.
Fig. 2 is a schematic flow chart of voxel processing by the discrete Transformer module according to an embodiment of the present invention.
Fig. 3 is a schematic flow chart of voxel processing by the multi-scale discrete Transformer module according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of the discrete attention mechanism according to an embodiment of the present invention.
Detailed Description
The invention will be further described with reference to the drawings and the detailed description. It should be understood that the examples are provided for the purpose of illustration only and are not intended to limit the scope of the invention.
An embodiment of the present invention provides a point cloud 3D target detection model based on a discrete Transformer. The network architecture of the model, shown in Fig. 1, consists of the following components connected in sequence: a point cloud voxelization module, a sub-manifold 3D sparse convolution 10, a discrete Transformer module 20, a multi-scale discrete Transformer module 30, a multi-scale discrete Transformer module 40, a BEV mapping module for voxel features, a Neck network, and a 3D object detector. BEV stands for bird's-eye view.
The sub-manifold 3D sparse convolution 10, the discrete Transformer module 20, the multi-scale discrete Transformer module 30 and the multi-scale discrete Transformer module 40 form the 3D backbone network of the model and are mainly responsible for extracting voxel features containing dynamic and static information from the voxelized point cloud. The 3D backbone network based on the discrete Transformer uses the discrete Transformer module to extract the static and dynamic information of the voxelized point cloud: the module first obtains downsampled voxels through a downsampling 3D sparse convolution, which serve as the query features of the discrete attention mechanism; it then extracts the static features of the voxels through a sub-manifold 3D sparse convolution and the dynamic features through the discrete attention mechanism, and concatenates the dynamic and static features along the channel dimension as the output features. A structural sketch of the module is shown below.
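The module structure can be summarised by the following sketch. It is a minimal illustration rather than the authors' implementation: the sparse convolutions are assumed to be supplied as callables returning (features, coordinates) pairs, the static branch is assumed to act on the downsampled voxels, and all names are illustrative.

```python
import torch

def discrete_transformer_module(feat_in, coords_in,
                                downsample_conv,     # stride-2 3D sparse conv
                                subm_conv_query,     # sub-manifold conv -> query features
                                subm_conv_static,    # sub-manifold conv -> static branch
                                attention_branch):   # discrete attention (dynamic branch)
    feat_ds, coords_ds = downsample_conv(feat_in, coords_in)         # v1: (N, 2C)
    feat_q, _ = subm_conv_query(feat_ds, coords_ds)                   # v2: (N, C), queries
    # dynamic branch: discrete attention between the queries and the input voxels
    feat_dyn = attention_branch(feat_q, coords_ds, feat_in, coords_in)  # F_attention: (N, C)
    # static branch: a second sub-manifold sparse convolution
    feat_static, _ = subm_conv_static(feat_ds, coords_ds)             # v3: (N, C)
    out = torch.cat([feat_dyn, feat_static], dim=1)                   # v4: (N, 2C)
    return out, coords_ds
```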
The discrete Transformer module 20 and the multi-scale discrete Transformer modules 30 and 40 are each composed of a downsampling 3D sparse convolution, a sub-manifold 3D sparse convolution and a 3D discrete attention mechanism. The main differences between the multi-scale discrete Transformer modules (30, 40) and the discrete Transformer module 20 are: 1) the inputs to the multi-scale discrete Transformer modules (30, 40) are voxels of two different scales; 2) the attention calculation differs (detailed later).
Another embodiment of the present invention provides a point cloud 3D target detection method based on a discrete Transformer. The flow of the method is shown in Fig. 1 and comprises: acquiring a point cloud data frame of an object in real time using a lidar; voxelizing the point cloud data frame by mean voxelization or by dynamic voxelization based on a multi-layer perceptron to obtain initial voxels V0 (each voxel contains voxel features and voxel coordinates); enlarging the receptive field of the initial voxels V0 with the sub-manifold 3D sparse convolution 10, the output voxels being denoted V1; sending V1 to the discrete Transformer module 20 to obtain voxels V2; sending V1 and V2 to the multi-scale discrete Transformer module 30 to obtain voxels V3; sending V2 and V3 to the multi-scale discrete Transformer module 40 to obtain voxels V4; mapping V4 to the BEV space to obtain the corresponding 2D BEV features, denoted F_bev; and finally sending F_bev through the Neck network to the 3D target detector for 3D target detection, obtaining object attribute information such as the position of the object in 3D space, the three-dimensional size of its bounding box and its heading angle. The overall data flow is also sketched below.
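For orientation, the data flow can be written compactly over abstract module callables; the names mirror Fig. 1 and are purely illustrative, not a published implementation.

```python
def backbone_forward(v0, subm_conv_10, dt_module_20, msdt_module_30, msdt_module_40):
    # v0: initial voxels (features + coordinates) from the voxelization step
    v1 = subm_conv_10(v0)          # sub-manifold 3D sparse conv: enlarge receptive field
    v2 = dt_module_20(v1)          # discrete Transformer module
    v3 = msdt_module_30(v1, v2)    # multi-scale module: inputs at two scales
    v4 = msdt_module_40(v2, v3)    # multi-scale module: inputs at two scales
    return v4                      # v4 is subsequently mapped to the BEV space
```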
In some embodiments of the present invention, the convolution kernel size of the sub-manifold 3D sparse convolution 10 may be, for example, 3×3×3.
The specific steps by which the voxels V1 are sent to the discrete Transformer module 20 to obtain the voxels V2 are as follows. Referring to Fig. 2, V1 corresponds to voxel v0 and its voxel features have dimension M×C. A voxel hash table is first generated from the voxel coordinates of V1; each row of the table stores a voxel value together with the index id of that voxel. The voxel value is computed as follows: denoting the i-th voxel coordinate as (xi, yi, zi) and the maximum of all voxel coordinates as (xmax, ymax, zmax), the value of the i-th voxel is xi*ymax*zmax + yi*zmax + zi. Then V1 (v0 in Fig. 2) is passed through a downsampling 3D sparse convolution to obtain voxels v1 with feature dimension N×2C, and v1 is passed through the sub-manifold 3D sparse convolution 11 to obtain voxels v2 (feature dimension N×C). Referring also to Fig. 4, attention is then computed between V1 (v0 in Fig. 2) and v2: for each voxel of v2, using a 3×3×3 search space, the hash table of V1 is searched for the voxels in the corresponding range, which serve as the key voxels of the attention calculation, yielding a key index table and a query index table (for the i-th voxel of v2 with coordinates (xi, yi, zi), the voxels of V1 whose coordinates satisfy 2xi-1 ≤ x ≤ 2xi+1, 2yi-1 ≤ y ≤ 2yi+1 and 2zi-1 ≤ z ≤ 2zi+1 are searched, giving the query index table and the key index table). Continuing with Fig. 4, the query features (dimension K×C) are gathered from v2 according to the query index table, and the key features (dimension K×C) are gathered from V1 according to the key index table. The query and key features are multiplied element-wise and summed along the feature dimension to give a K×1 tensor; a discrete Softmax grouped by the query index table then yields K×1 attention scores; the attention scores are multiplied with the query features to give a K×C feature, which is summed discretely according to the query index table to obtain the feature F_attention of dimension N×C. Finally, F_attention is concatenated along the feature dimension with the voxels v3 produced by the sub-manifold 3D sparse convolution 12, giving the output voxels V2 of dimension N×2C (voxel v4 in Fig. 2). A code sketch of this procedure is given below.
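The hash-table construction, the 3×3×3 key search and the discrete Softmax/summation can be illustrated with the following minimal PyTorch sketch. Function names such as build_hash_table and discrete_attention are illustrative and not from the patent; coordinates are assumed to be integer tensors of shape (*, 3), batching is ignored, and the weighted sum uses the gathered key features as the attention values, which is the conventional choice and an assumption here.

```python
import torch

def voxel_value(coords, y_max, z_max):
    # Hash-table key: x * y_max * z_max + y * z_max + z (as defined above).
    return coords[:, 0] * y_max * z_max + coords[:, 1] * z_max + coords[:, 2]

def build_hash_table(coords, y_max, z_max):
    # Map each non-empty voxel's value to its row index in the feature tensor.
    return {int(v): i for i, v in enumerate(voxel_value(coords, y_max, z_max))}

def search_index_tables(query_coords, key_hash, y_max, z_max):
    # For the i-th query voxel at (x, y, z) on the coarser grid, collect the
    # non-empty key voxels on the finer grid with 2x-1 <= x' <= 2x+1 per axis.
    q_idx, k_idx = [], []
    for qi, (x, y, z) in enumerate(query_coords.tolist()):
        for xx in range(2 * x - 1, 2 * x + 2):
            for yy in range(2 * y - 1, 2 * y + 2):
                for zz in range(2 * z - 1, 2 * z + 2):
                    v = xx * y_max * z_max + yy * z_max + zz
                    if v in key_hash:               # only non-empty voxels kept
                        q_idx.append(qi)
                        k_idx.append(key_hash[v])
    return torch.tensor(q_idx), torch.tensor(k_idx)  # query / key index tables

def discrete_attention(query_feat, key_feat, q_idx, k_idx, num_queries):
    q = query_feat[q_idx]                        # (K, C) gathered query features
    k = key_feat[k_idx]                          # (K, C) gathered key features
    logits = (q * k).sum(dim=1)                  # (K,) per-pair dot products
    exp = torch.exp(logits - logits.max())       # shift for numerical stability
    denom = torch.zeros(num_queries).index_add_(0, q_idx, exp)
    attn = exp / denom[q_idx]                    # discrete Softmax per query voxel
    out = torch.zeros(num_queries, key_feat.size(1))
    out.index_add_(0, q_idx, attn.unsqueeze(1) * k)  # discrete summation -> (N, C)
    return out                                   # F_attention
```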
The specific steps by which the voxels V1 and V2 are sent to the multi-scale discrete Transformer module 30 to obtain the voxels V3 are as follows. Referring to Fig. 3, which shows the processing inside a multi-scale discrete Transformer module, V2 plays the role of voxel v0 in Fig. 3 and is processed exactly as in Fig. 2, so this is not repeated here; V1 and V2 together serve as the "multi-scale voxels" in Fig. 3. The internal network architecture of the multi-scale discrete Transformer modules 30 and 40 (Fig. 3) is identical to that of the discrete Transformer module 20 (Fig. 2); the main difference is that the input to the discrete attention calculation also contains multi-scale voxels. Compared with the attention calculation in the preceding step "V1 is sent to the discrete Transformer module 20 to obtain V2", the difference is the following: key index tables and query index tables are computed for both V1 and V2 (two of each), two key features with dimensions K1×C and K2×C and two query features with dimensions K1×C and K2×C are gathered, and the two query features and the two key features are each concatenated to obtain a query feature and a key feature of dimension (K1+K2)×C. In other words, (K1+K2) takes the place of K in the "key feature" and "query feature" dimensions of Fig. 4; the subsequent attention calculation steps are then repeated as before to obtain the feature F_attention' of dimension N×C (see the sketch after this paragraph). Continuing with Fig. 3, the feature F_attention' produced by the discrete attention mechanism is concatenated with the voxels v3 along the feature dimension to obtain the output voxels V3 of dimension N×2C (voxel v4 in Fig. 3).
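Under the same assumptions, the multi-scale variant can be sketched by building index tables against both input scales and concatenating them before running the same discrete attention. The helpers are the illustrative ones from the previous sketch, and the per-scale search-window arithmetic is simplified to the same 3×3×3 rule for both scales.

```python
import torch

def multi_scale_discrete_attention(query_feat, query_coords,
                                   key_feats, key_coords, grid_maxes,
                                   num_queries):
    # Build a (query, key) index table against each input scale, shift the key
    # indices into one concatenated key set, and run the same discrete attention.
    q_parts, k_parts, feat_parts, offset = [], [], [], 0
    for feat, coords, (y_max, z_max) in zip(key_feats, key_coords, grid_maxes):
        table = build_hash_table(coords, y_max, z_max)
        q_idx, k_idx = search_index_tables(query_coords, table, y_max, z_max)
        q_parts.append(q_idx)
        k_parts.append(k_idx + offset)          # shift into the concatenated key set
        feat_parts.append(feat)
        offset += feat.size(0)
    q_idx = torch.cat(q_parts)                  # length K1 + K2
    k_idx = torch.cat(k_parts)
    key_feat = torch.cat(feat_parts, dim=0)
    return discrete_attention(query_feat, key_feat, q_idx, k_idx, num_queries)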
The specific steps by which the voxels V2 and V3 are sent to the multi-scale discrete Transformer module 40 to obtain the voxels V4 are as follows. Since this step and the previous one are both implemented with a multi-scale discrete Transformer module, the processing steps and principles are the same; still with reference to Fig. 3, the only differences in this step are that V2 and V3 serve as the "multi-scale voxels" in Fig. 3, V2 plays the role of voxel v0 in Fig. 3, the feature output by the discrete attention mechanism is a feature F_attention'' of dimension N×C, and V4 corresponds to voxel v4 in Fig. 3.
The specific steps by which the voxels V4 are mapped to the BEV space to obtain the corresponding 2D BEV features F_bev are as follows: V4 is further downsampled by a 3D sparse convolution, the features along the height dimension are then concatenated onto the channel dimension, and the mapping from 3D features to BEV features is completed, giving the 2D features F_bev of the BEV space (a sketch follows).
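A minimal sketch of this height-to-channel folding, assuming the sparse voxels have already been densified into a (B, C, D, H, W) tensor by whatever sparse-convolution library is used:

```python
import torch

def to_bev(dense_voxels: torch.Tensor) -> torch.Tensor:
    # (B, C, D, H, W) -> (B, C*D, H, W): the height axis D is folded into the
    # channel axis, giving the 2D BEV feature map F_bev.
    b, c, d, h, w = dense_voxels.shape
    return dense_voxels.reshape(b, c * d, h, w)

# For example, a 128-channel grid of height 2 over a 180x180 BEV plane becomes
# a 256-channel BEV map:
# to_bev(torch.zeros(1, 128, 2, 180, 180)).shape == (1, 256, 180, 180)
```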
Finally, F_bev is sent through the Neck network to the 3D object detector for 3D target detection, obtaining object attributes such as the position of the object in 3D space, the three-dimensional size of its bounding box, its heading angle and its category. It should be understood that the 3D object detector of the embodiments of the present invention may be a 3D detection head such as CenterHead, PV-RCNN Head or Voxel-RCNN Head; the present invention is not limited in this respect, and different 3D detection heads may output different object attributes.
In the detection method and model provided by the embodiments of the invention, the use of the discrete attention mechanism allows the Transformer to be applied to point clouds efficiently; the discrete Transformer enables the dynamic and static features of the point cloud to be extracted efficiently; and the detection method and model of the embodiments of the invention can be applied to all grid-based point cloud 3D target detection algorithms.
The detection method and model provided by the embodiments of the invention can be applied to environment perception in automatic control scenarios such as autonomous vehicles and robots. Without adding extra time consumption, they improve the performance of existing point cloud 3D target detection algorithms, the perception capability of autonomous vehicles with respect to their environment, and the precision and recall of target detection.
The foregoing is a further detailed description of the invention in connection with the preferred embodiments, and it is not intended that the invention be limited to the specific embodiments described. It will be apparent to those skilled in the art that several equivalent substitutions and obvious modifications can be made without departing from the spirit of the invention, and the same should be considered to be within the scope of the invention.

Claims (10)

1. A point cloud 3D target detection method based on a discrete Transformer, characterized by comprising the following steps:
s1, acquiring a point cloud data frame of an object in real time;
s2, carrying out point cloud voxelization on the point cloud data frame to obtain an initial voxel;
s3, extracting voxel characteristics containing dynamic information and static information from the initial voxels through a 3D backbone network based on a discrete transducer;
s4, mapping the voxel characteristics finally output in the step S3 to BEV space to obtain corresponding 2DBEV characteristics;
and S5, the 2DBEV features are sent to a 3D target detector through a Neck network, 3D target detection is carried out, and object attribute information of an object in a 3D space is obtained.
2. The discrete Transformer-based point cloud 3D target detection method of claim 1, wherein: in step S2, the initial voxel is obtained through mean voxelization or dynamic voxelization based on a multi-layer perceptron.
3. The discrete Transformer-based point cloud 3D target detection method of claim 1, wherein step S3 specifically comprises:
s31, inputting the initial voxel into a first sub-manifold 3D sparse convolution to obtain a first voxel;
s32, inputting the first voxel into a discrete transducer module to obtain a second voxel;
s33, inputting the first voxel and the second voxel into a first multi-scale discrete transducer module to obtain a third voxel;
s34, inputting the second voxel and the third voxel into a second multi-scale discrete transducer module to obtain a fourth voxel, wherein the fourth voxel is used as the voxel characteristic finally output in the step S3.
4. The discrete Transformer-based point cloud 3D target detection method of claim 3, wherein: the discrete Transformer module, the first multi-scale discrete Transformer module and the second multi-scale discrete Transformer module are each composed of a downsampling 3D sparse convolution, a second sub-manifold 3D sparse convolution and a 3D discrete attention mechanism.
5. The discrete Transformer-based point cloud 3D target detection method of claim 1, wherein: the 3D backbone network based on the discrete Transformer uses a discrete Transformer module to extract the static information and dynamic information of the voxelized point cloud;
the discrete Transformer module first obtains downsampled voxels through a downsampling 3D sparse convolution to serve as the query features of a discrete attention mechanism, then extracts the static features of the voxels through a sub-manifold 3D sparse convolution and the dynamic features of the voxels through the discrete attention mechanism, and concatenates the dynamic features and the static features along the channel dimension as the output features.
6. The discrete Transformer-based point cloud 3D target detection method of claim 1, wherein step S4 specifically comprises:
passing the voxel features finally output in step S3 through a 3D sparse convolution, then concatenating the height-dimension features onto the channel dimension, and completing the mapping from 3D space to the BEV space to obtain the 2D BEV features.
7. The discrete Transformer-based point cloud 3D object detection method of claim 1, wherein: the object attribute information includes the position of the object in the 3D space, the three-dimensional size of the bounding box, the heading angle of the object and the category.
8. A discrete Transformer-based point cloud 3D object detection model, comprising:
the point cloud voxelization module is used for carrying out point cloud voxelization on the point cloud data frame of the object and outputting an initial voxel;
the 3D backbone network based on the discrete Transformer is connected to the output end of the point cloud voxelization module and is used for extracting voxel characteristics containing dynamic information and static information from the initial voxels;
the voxel feature mapping module is connected with the 3D backbone network based on the discrete Transformer and is used for mapping the voxel features containing dynamic information and static information to the BEV space to obtain corresponding 2D BEV features;
and the Neck network is connected with the output end of the voxel feature mapping module and is used for sending the 2D BEV features to a 3D object detector for 3D target detection to obtain object attribute information of an object in 3D space.
9. The discrete Transformer-based point cloud 3D object detection model of claim 8, wherein: the 3D backbone network based on the discrete Transformer comprises a first sub-manifold 3D sparse convolution, a discrete Transformer module, a first multi-scale discrete Transformer module and a second multi-scale discrete Transformer module which are connected in sequence;
the first sub-manifold 3D sparse convolution takes the initial voxel as input and outputs a first voxel; the discrete Transformer module takes the first voxel as input and outputs a second voxel; the first multi-scale discrete Transformer module takes the first voxel and the second voxel as input and outputs a third voxel; and the second multi-scale discrete Transformer module takes the second voxel and the third voxel as input and outputs a fourth voxel.
10. The discrete Transformer-based point cloud 3D object detection model of claim 9, wherein: the discrete Transformer module, the first multi-scale discrete Transformer module and the second multi-scale discrete Transformer module are each composed of a downsampling 3D sparse convolution, a second sub-manifold 3D sparse convolution and a 3D discrete attention mechanism.
CN202310307131.3A 2023-03-27 2023-03-27 Point cloud 3D target detection method and model based on discrete Transformer Pending CN116152579A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310307131.3A CN116152579A (en) 2023-03-27 2023-03-27 Point cloud 3D target detection method and model based on discrete Transformer

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310307131.3A CN116152579A (en) 2023-03-27 2023-03-27 Point cloud 3D target detection method and model based on discrete Transformer

Publications (1)

Publication Number Publication Date
CN116152579A (en) 2023-05-23

Family

ID=86340892

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310307131.3A Pending CN116152579A (en) 2023-03-27 2023-03-27 Point cloud 3D target detection method and model based on discrete Transformer

Country Status (1)

Country Link
CN (1) CN116152579A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117237830A (en) * 2023-11-10 2023-12-15 湖南工程学院 Unmanned aerial vehicle small target detection method based on dynamic self-adaptive channel attention
CN117237830B (en) * 2023-11-10 2024-02-20 湖南工程学院 Unmanned aerial vehicle small target detection method based on dynamic self-adaptive channel attention


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination