CN117437404A - Multi-modal target detection method based on virtual point cloud - Google Patents

Multi-modal target detection method based on virtual point cloud

Info

Publication number
CN117437404A
CN117437404A
Authority
CN
China
Prior art keywords
point cloud
network
target detection
virtual point
key points
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311400412.XA
Other languages
Chinese (zh)
Other versions
CN117437404B (en)
Inventor
Cheng Teng
Ni Hao
Zhang Qiang
Shi Qin
Wang Wenchong
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei University of Technology
Original Assignee
Hefei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei University of Technology filed Critical Hefei University of Technology
Priority to CN202311400412.XA priority Critical patent/CN117437404B/en
Publication of CN117437404A publication Critical patent/CN117437404A/en
Application granted granted Critical
Publication of CN117437404B publication Critical patent/CN117437404B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to the technical field of multi-modal target detection, and in particular to a multi-modal target detection method based on virtual point cloud, comprising the following detection steps: inputting a picture into a neural network and extracting its features to obtain the key points of the picture; constructing a virtual point cloud from the key point information in a virtual point cloud construction network; voxelizing the virtual point cloud together with the real point cloud of the picture to obtain a voxel structure; inputting the voxelized structure into a target detection network to obtain a detection result; jointly updating the parameters of the neural network, the virtual point cloud construction network and the target detection network to obtain a multi-modal target detection model consisting of these three networks; and inputting pictures to be classified into the multi-modal target detection model to obtain their categories. The method can effectively improve the accuracy of target detection.

Description

Multi-modal target detection method based on virtual point cloud
Technical Field
The invention relates to the technical field of multi-modal target detection, and in particular to a multi-modal target detection method based on virtual point cloud.
Background
Multi-modal target detection refers to a technique that fuses information from multiple different types of sensors or data sources, such as lidar, cameras and radar, for target detection and localization. It aims to improve the accuracy and robustness of target detection while enabling a more comprehensive understanding of complex scenes.
Currently, there are three main multi-modal environment perception approaches: 1. acquiring each modality's data with multiple sensors and superimposing and fusing the modal data before perception, also called pre-fusion; 2. designing a neural network for each modality, using these networks to extract the required local and global features, and superimposing and fusing the corresponding modal features at the feature level, also called feature fusion; 3. logically accepting or rejecting the perception results of the individual modalities to synthesize a final result, also called post-fusion.
In practical target detection it is found that point cloud data is sparse and the point positions are unordered, so missed detections and false detections easily occur with the prior art, which greatly affects the accuracy of target detection.
Disclosure of Invention
In order to avoid and overcome the technical problems in the prior art, the invention provides a multi-modal target detection method based on virtual point cloud, which can effectively improve the accuracy of target detection.
In order to achieve the above purpose, the present invention provides the following technical solutions:
A multi-modal target detection method based on virtual point cloud comprises the following detection steps:
S1, inputting a picture into a neural network and extracting the features of the picture to obtain the key points of the picture;
S2, constructing a virtual point cloud from the key point information in a virtual point cloud construction network;
S3, voxelizing the virtual point cloud and the real point cloud of the picture to obtain a voxel structure;
S4, inputting the voxelized structure into a target detection network to obtain a detection result;
S5, jointly updating the parameters of the neural network, the virtual point cloud construction network and the target detection network to obtain a multi-modal target detection model consisting of these three networks;
S6, inputting the pictures to be classified into the multi-modal target detection model to obtain the categories of the pictures.
As a still further aspect of the invention, the specific steps of step S1 are as follows:
S11, inputting the picture into a neural network, here a DLA-34 network, for feature extraction to obtain a corresponding feature map;
S12, acquiring the camera coordinates of each point cloud point on the feature map based on CenterNet;
S13, converting the camera coordinates of the point cloud into projection points on the XY plane of the camera coordinate system through a conversion formula;
S14, calculating a two-dimensional Gaussian probability distribution centered on the position of each projection point to generate a Gaussian map;
S15, summing the Gaussian maps generated by all the projection points to form a heat map;
S16, selecting the pixels with the maximum two-dimensional Gaussian probability in the heat map as key points.
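For illustration, a minimal Python sketch of steps S14-S16 follows, assuming the projection points are already given in pixel coordinates and using a fixed, illustrative Gaussian standard deviation (the patent does not specify these values):

    import numpy as np

    def gaussian_map(h, w, center, sigma=2.0):
        # S14: two-dimensional Gaussian probability distribution centered
        # on one projection point (cx, cy).
        ys, xs = np.mgrid[0:h, 0:w]
        cx, cy = center
        return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

    def heat_map(h, w, centers, sigma=2.0):
        # S15: sum the Gaussian maps of all projection points.
        hm = np.zeros((h, w))
        for c in centers:
            hm += gaussian_map(h, w, c, sigma)
        return hm

    def top_keypoints(hm, k=50):
        # S16: take the k pixels with the highest probability as key points.
        idx = np.argsort(hm.ravel())[::-1][:k]
        ys, xs = np.unravel_index(idx, hm.shape)
        return list(zip(xs.tolist(), ys.tolist()))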
As a still further aspect of the invention, the specific steps of step S2 are as follows:
S21, inputting the Gaussian maps of the key points into a coordinate prediction network to obtain predicted values of the Gaussian map offsets;
S22, calculating the mean and variance of the depths of all the key points by mathematical statistics based on the SMOKE algorithm, and combining them with the predicted Gaussian map offsets to obtain the three-dimensional coordinates of the key points;
S23, inputting the key points into a confidence network to obtain the confidence corresponding to each key point;
S24, selecting the key points whose confidence lies within a set range, and combining the depth values of these key points with the camera intrinsic matrix to compute a set number of virtual point cloud points and their coordinates in the point cloud space.
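A minimal sketch of steps S22 and S24, assuming key points given as pixel coordinates with a predicted depth; the depth decoding follows the SMOKE convention z_p = μ_z + δ_z·σ_z, and all names are illustrative:

    import numpy as np

    def decode_depth(delta_z, mu_z, sigma_z):
        # SMOKE-style depth recovery: z_p = mu_z + delta_z * sigma_z.
        return mu_z + delta_z * sigma_z

    def backproject(keypoints, depths, K):
        # Lift 2D key points (u, v) with depth z to virtual 3D points
        # [x_vp, y_vp, z_vp] via the inverse camera intrinsic matrix K.
        K_inv = np.linalg.inv(K)
        pts = [K_inv @ np.array([u * z, v * z, z])
               for (u, v), z in zip(keypoints, depths)]
        return np.stack(pts)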
As a still further aspect of the invention, the specific steps of step S3 are as follows: voxelizing the obtained virtual point cloud and the real point cloud corresponding to the picture to obtain a voxel structure; dividing the voxel structure equally into voxel blocks; then feature-encoding the point cloud in each voxel block; and finally inputting the encoded voxel blocks into the target detection network to predict the category of the picture.
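A minimal sketch of this voxelization step, assuming the merged point array carries extra feature dimensions (the real/virtual flag and the confidence mentioned elsewhere in the description); the voxel size and per-voxel point cap are illustrative:

    import numpy as np

    def voxelize(points, voxel_size=(0.2, 0.2, 0.2), max_pts=5):
        # points: (N, D) array, columns 0-2 are x, y, z; the remaining
        # columns are features (e.g. real/virtual flag, confidence).
        coords = np.floor(points[:, :3] / np.asarray(voxel_size)).astype(np.int64)
        voxels = {}
        for c, p in zip(map(tuple, coords), points):
            bucket = voxels.setdefault(c, [])
            if len(bucket) < max_pts:  # keep at most max_pts points per voxel
                bucket.append(p)
        return voxels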
As a still further aspect of the invention, the key point loss function of the coordinate prediction network and the target loss function of the target detection network are combined into a joint loss function; the parameters of the multi-modal target detection model consisting of the neural network, the virtual point cloud construction network and the target detection network are updated through the joint loss function to obtain the optimal multi-modal target detection model.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention provides a two-stage multi-modal target detection method based on virtual point cloud: target information detected from the image is used to construct a virtual point cloud that assists point-cloud-based target detection. The method first constructs a virtual point cloud from the image detection target information, increasing the density of the point cloud and thereby strengthening the expression of target features. Second, a feature dimension is added to the point cloud to distinguish real points from virtual points, and voxels carrying confidence codes are used to enhance the relevance of the point cloud. Finally, a loss function is designed using the proportion of virtual points, adding supervised training of the image detection stage; this improves the training efficiency of the two-stage network, avoids the model error accumulation typical of two-stage end-to-end network models, and effectively improves the accuracy and robustness of the target detection system.
Drawings
FIG. 1 is a flow chart of the main detection steps of the present invention.
Fig. 2 is an overall block diagram of a model in accordance with the present invention.
Fig. 3 is a schematic view of the virtual point cloud construction positions within a voxel in the present invention.
Detailed Description
The following clearly and completely describes the embodiments of the present invention with reference to the accompanying drawings. Apparently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Referring to fig. 1 to 3, in an embodiment of the present invention, a multi-modal target detection method based on virtual point cloud mainly comprises multi-modal sensor data input, a neural network, a virtual point cloud construction network, a target detection network, and joint loss training. First, the image is fed into the DLA-34 backbone network for feature extraction, and a regression network produces coordinate predictions and target confidences for a certain number of target 3D key points. Corresponding virtual points are then constructed in the lidar point cloud from the generated key point information; a point cloud feature dimension is added to distinguish virtual points from real points, the output target confidence is incorporated into the feature encoding, and the encoded virtual points are fed together with the real point cloud into voxel-based 3D target detection. Meanwhile, to avoid the model error accumulation problem of a two-stage end-to-end serial network, a loss function is designed using the proportion of virtual points, adding supervised training of image detection and thus improving the training efficiency of the image processing module.
The detection network provided by the invention also performs data expansion on the voxel blocks where the virtual points lie, as shown in fig. 3, as follows. The position of the corresponding voxel block is determined from the position information of the virtual point, and the block is marked according to whether voxels already exist at that position. If real points exist there, the voxel is marked by appending a confidence value after the echo (reflection intensity) feature. If no real point exists at that position, the spatial distribution of the whole voxel block is considered and points are added according to a uniform distribution strategy. A single voxel is designed to hold at most 5 point cloud points; since voxel-based 3D object detection methods pay little attention to the height direction, a rectangular cross-section is selected and four points are uniformly constructed on it, which are added to the overall point cloud data together with the virtual points. A sketch of this expansion is given below.
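A minimal sketch of this voxel-block expansion; the placement of the four uniform points on the rectangular cross-section is an illustrative guess rather than the exact layout of fig. 3:

    import numpy as np

    def expand_voxel(center, size, z_plane, conf):
        # Uniformly construct four points on a rectangular cross-section
        # of the voxel block at height z_plane; each point is tagged as
        # virtual (flag 0) with the key point confidence appended.
        cx, cy, _ = center
        dx, dy, _ = size
        offsets = [(-0.25, -0.25), (-0.25, 0.25), (0.25, -0.25), (0.25, 0.25)]
        # columns: x, y, z, reflectance placeholder, real/virtual flag, confidence
        return np.array([[cx + ox * dx, cy + oy * dy, z_plane, 0.0, 0.0, conf]
                         for ox, oy in offsets])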
The main content of the invention is as follows:
a. Image-based key point detection. The monocular image passes through a feature extraction network to obtain a feature map of corresponding size, and, following the idea of CenterNet, the projection points of the target 3D key points on the 2D image are predicted directly. The ground-truth 3D key points in the point cloud data are converted to camera-plane projections through the camera projection formula and then encoded into 2D Gaussian maps. A Gaussian map is a two-dimensional probability distribution that assigns higher probability values to pixels near the center of the object and lower values to pixels farther from it. For each key point, a Gaussian map is generated by computing a two-dimensional Gaussian probability distribution centered on the key point location; the standard deviation of the Gaussian is usually set to a fixed value, which determines the spread of probability values around the key point. The Gaussian maps of all key points are then summed to generate a final heat map representing the likelihood that each pixel belongs to a particular object class, and the pixel with the highest probability value in each heat map is taken as the position of the corresponding key point. The image feature map passes through a series of networks, and the prediction head outputs the predicted Gaussian map offset, where K is the camera intrinsic matrix. Then, borrowing the idea of SMOKE, the mean and variance of the 3D key point depths are first computed by mathematical statistics and combined with the depth offset predicted by the prediction head to obtain the predicted 3D coordinates of the key point [x_p, y_p, z_p].
b. Construction of the virtual point cloud. The N key points with the highest confidence are selected from the final heat map; taking the predicted depth z and combining it with the camera intrinsic transformation matrix yields N virtual 3D points [x_vp, y_vp, z_vp] in the point cloud space. To avoid exceeding the front-view range of the real point cloud, the virtual points are screened and filtered, giving N' points that are added to the point cloud data; their reflection intensity is replaced by the mean value over the whole point cloud.
c. Target detection based on point cloud voxelization. The downstream point cloud 3D object detection network is based on voxel features. The main idea is to divide the whole 3D space into voxel blocks of equal size along the three axes x, y, z, feature-encode the point cloud in each voxel block while fully considering global and local features to obtain voxel features, and then perform target detection with 3D convolutions.
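A minimal sketch of a simple per-voxel feature encoding (the patent does not name the exact encoder, so a mean-style encoding stands in for the global/local feature extraction), reusing the voxelize sketch above:

    import numpy as np

    def encode_voxels(voxels):
        # Encode each voxel block as the mean of its point features.
        coords, feats = [], []
        for c, pts in voxels.items():
            coords.append(c)
            feats.append(np.stack(pts).mean(axis=0))
        return np.asarray(coords), np.stack(feats)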
Specific example: all pictures are resized to a uniform size (1280 x 384 x 3) and input into the network; features are extracted by the DLA-34 backbone to obtain a feature layer, and the parameters required for 3D position prediction are obtained through a prediction head and a regression head. The prediction head predicts the 2D center and class of the target by generating a heat map, while the regression head regresses the offsets required to convert the 2D center to 3D coordinates, etc. The points with the highest feature values in the heat map are taken as key points; the mean and variance of the 3D key point depths are computed by mathematical statistics and combined with the depth offset predicted by the prediction head to obtain the predicted 3D coordinates of the key points. The key points are not all center points: each target has only one center point, and the key points comprise the center point and surrounding points. From the key point coordinates and the heat map values at the key points, i.e. the confidences, the virtual point cloud is constructed: combining with the camera intrinsic transformation matrix yields N virtual 3D points in the point cloud space, a confidence data dimension is added to each virtual point, and the N virtual points are fed together with the real point cloud into the voxel-based point cloud 3D target detection network.
Experiments were performed on the multi-modal detection model proposed by the present invention using the KITTI dataset, and the results were compared with several lidar-only and multi-modal 3D object detection methods. For vehicle detection, the proposed network performs excellently: its detection accuracy surpasses classical 3D point cloud detection networks and certain multi-sensor information fusion networks, reaching 86.9% for vehicle detection.
The 3D detection network provided by the invention performs excellently on unobstructed targets and can still detect well even when the target is occluded. It also performs well on long-range target detection. The accuracy gain comes mainly from processing the image and the lidar point cloud information simultaneously: the virtual point cloud constructed from image key points keeps the point cloud of distant targets from being sparse, giving better detection of distant and small objects.
Different approaches were tried during network training, including adding a loss bias weight and directly summing the two partial losses, and were compared by their training convergence. After the bias weight is introduced into the loss function, the model converges noticeably faster and the detection effect partly improves. This approach not only balances the two partial losses better, but also better expresses the importance of the different detection modalities, improving the performance of the model.
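A minimal sketch of such a weighted joint loss, following the formulas given in claim 5; the schedule for the virtual-point weight is an illustrative assumption, since the full loss optimization formula is not reproduced in the text:

    def virtual_point_weight(n_valid, n_max, beta=1e-6):
        # Assign a larger key point loss weight when few constructed
        # virtual points fall within the 3D space range (n_valid small
        # relative to the configured maximum n_max).
        return 1.0 - n_valid / (n_max + beta)

    def joint_loss(l1, l2, mu1, mu2):
        # Total loss as in the claims: Loss = mu1 * L1 + (1 - mu2) * L2,
        # with L1 the 3D key point localization loss and L2 the loss of
        # the final prediction result.
        return mu1 * l1 + (1.0 - mu2) * l2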
The foregoing is only a preferred embodiment of the present invention, but the protection scope of the present invention is not limited thereto; any equivalent substitution or modification made by a person skilled in the art within the technical scope disclosed by the present invention, according to the technical scheme and inventive concept of the present invention, shall be covered by the protection scope of the present invention.

Claims (5)

1. A multi-modal target detection method based on virtual point cloud, characterized by comprising the following detection steps:
S1, inputting a picture into a neural network and extracting the features of the picture to obtain the key points of the picture;
S2, constructing a virtual point cloud from the key point information in a virtual point cloud construction network;
S3, voxelizing the virtual point cloud and the real point cloud of the picture to obtain a voxel structure;
S4, inputting the voxelized structure into a target detection network to obtain a detection result;
S5, jointly updating the parameters of the neural network, the virtual point cloud construction network and the target detection network to obtain a multi-modal target detection model consisting of these three networks;
S6, inputting the pictures to be classified into the multi-modal target detection model to obtain the categories of the pictures.
2. The multi-modal target detection method based on virtual point cloud as claimed in claim 1, wherein the specific steps of step S1 are as follows:
S11, inputting the picture into a neural network, here a DLA-34 network, for feature extraction to obtain a corresponding feature map;
S12, acquiring the camera coordinates of each point cloud point on the feature map based on CenterNet;
S13, converting the camera coordinates of the point cloud into projection points on the XY plane of the camera coordinate system through a conversion formula;
S14, calculating a two-dimensional Gaussian probability distribution centered on the position of each projection point to generate a Gaussian map;
S15, summing the Gaussian maps generated by all the projection points to form a heat map, the Gaussian kernel being generated as G(x, y) = exp(-((x - p_x)^2 + (y - p_y)^2) / (2σ^2)), where (p_x, p_y) is the projection point position and σ is the fixed standard deviation;
S16, selecting the pixels with the maximum two-dimensional Gaussian probability in the heat map as key points.
3. The multi-modal target detection method based on virtual point cloud as claimed in claim 2, wherein the specific steps of step S2 are as follows:
S21, inputting the Gaussian maps of the key points into a coordinate prediction network to obtain predicted values of the Gaussian map offsets;
S22, calculating the mean and variance of the depths of all the key points by mathematical statistics based on the SMOKE algorithm, and combining them with the predicted Gaussian map offsets to obtain the three-dimensional coordinates of the key points, the depth conversion formula being:
z_p = μ_z + δ_z σ_z
where μ_z and σ_z are the depth mean and standard deviation and δ_z is the predicted depth offset;
S23, inputting the key points into a confidence network to obtain the confidence corresponding to each key point;
S24, selecting the key points whose confidence lies within a set range, and combining the depth values of these key points with the camera intrinsic matrix to compute a set number of virtual point cloud points and their coordinates in the point cloud space.
4. The multi-modal target detection method based on virtual point cloud as claimed in claim 3, wherein the specific steps of step S3 are as follows: voxelizing the obtained virtual point cloud and the real point cloud corresponding to the picture to obtain a voxel structure; dividing the voxel structure equally into voxel blocks; then feature-encoding the point cloud in each voxel block; and finally inputting the encoded voxel blocks into the target detection network to predict the category of the picture.
5. The multi-modal target detection method based on virtual point cloud as claimed in claim 4, wherein the key point loss function of the coordinate prediction network and the target loss function of the target detection network are combined into a joint loss function; the parameters of the multi-modal target detection model consisting of the neural network, the virtual point cloud construction network and the target detection network are updated through the joint loss function to obtain the optimal multi-modal target detection model. The number of 3D key points for which the virtual uniform point cloud needs to be expanded is recorded, indirectly reflecting the accuracy of the monocular network; when this number is small, a large loss weight μ_vp is assigned, further improving the training efficiency of the first-stage monocular network. In the loss optimization calculation, ΔLoss_i and ΔLoss_{i-1} are the loss values of the current round and the previous round, N is the number of training rounds, n is the number of virtual points constructed in the current round that fall within the 3D space range, N_max is the number of 3D key points configured for the key point network, and β is an adjustable small constant.
The total loss is the sum of the two losses:
Loss = μ_1 * L_1 + (1 - μ_2) * L_2
where L_1 is the localization loss of the 3D key points and L_2 is the loss of the final prediction result.
CN202311400412.XA 2023-10-26 2023-10-26 Multi-modal target detection method based on virtual point cloud Active CN117437404B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311400412.XA CN117437404B (en) 2023-10-26 2023-10-26 Multi-modal target detection method based on virtual point cloud

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311400412.XA CN117437404B (en) 2023-10-26 2023-10-26 Multi-modal target detection method based on virtual point cloud

Publications (2)

Publication Number Publication Date
CN117437404A true CN117437404A (en) 2024-01-23
CN117437404B CN117437404B (en) 2024-07-19

Family

ID=89549356

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311400412.XA Active CN117437404B (en) 2023-10-26 2023-10-26 Multi-modal target detection method based on virtual point cloud

Country Status (1)

Country Link
CN (1) CN117437404B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105654492A (en) * 2015-12-30 2016-06-08 哈尔滨工业大学 Robust real-time three-dimensional (3D) reconstruction method based on consumer camera
CN113205466A (en) * 2021-05-10 2021-08-03 南京航空航天大学 Incomplete point cloud completion method based on hidden space topological structure constraint
US20210365712A1 (en) * 2019-01-30 2021-11-25 Baidu Usa Llc Deep learning-based feature extraction for lidar localization of autonomous driving vehicles
CN114359660A (en) * 2021-12-20 2022-04-15 合肥工业大学 Multi-modal target detection method and system suitable for modal intensity change
WO2022141720A1 (en) * 2020-12-31 2022-07-07 罗普特科技集团股份有限公司 Three-dimensional heat map-based three-dimensional point cloud target detection method and device
US20230080678A1 (en) * 2021-08-26 2023-03-16 The Hong Kong University Of Science And Technology Method and electronic device for performing 3d point cloud object detection using neural network
EP4194807A1 (en) * 2021-12-10 2023-06-14 Beijing Baidu Netcom Science Technology Co., Ltd. High-precision map construction method and apparatus, electronic device, and storage medium

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105654492A (en) * 2015-12-30 2016-06-08 哈尔滨工业大学 Robust real-time three-dimensional (3D) reconstruction method based on consumer camera
US20210365712A1 (en) * 2019-01-30 2021-11-25 Baidu Usa Llc Deep learning-based feature extraction for lidar localization of autonomous driving vehicles
WO2022141720A1 (en) * 2020-12-31 2022-07-07 罗普特科技集团股份有限公司 Three-dimensional heat map-based three-dimensional point cloud target detection method and device
CN113205466A (en) * 2021-05-10 2021-08-03 南京航空航天大学 Incomplete point cloud completion method based on hidden space topological structure constraint
US20230080678A1 (en) * 2021-08-26 2023-03-16 The Hong Kong University Of Science And Technology Method and electronic device for performing 3d point cloud object detection using neural network
EP4194807A1 (en) * 2021-12-10 2023-06-14 Beijing Baidu Netcom Science Technology Co., Ltd. High-precision map construction method and apparatus, electronic device, and storage medium
CN114359660A (en) * 2021-12-20 2022-04-15 合肥工业大学 Multi-modal target detection method and system suitable for modal intensity change

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZECHEN LIU et al.: "SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation", arXiv:2002.10111v1, 24 February 2020 (2020-02-24), pages 1-10 *
WANG Hongren et al.: "Research on a Two-Stage Object Detection Method Based on Keypoint Detection", Journal of Integration Technology, vol. 10, no. 5, 30 September 2021 (2021-09-30), pages 34-42 *

Also Published As

Publication number Publication date
CN117437404B (en) 2024-07-19

Similar Documents

Publication Publication Date Title
Yi et al. Segvoxelnet: Exploring semantic context and depth-aware features for 3d vehicle detection from point cloud
CN108648161A (en) The binocular vision obstacle detection system and method for asymmetric nuclear convolutional neural networks
CN113052109A (en) 3D target detection system and 3D target detection method thereof
CN111998862B (en) BNN-based dense binocular SLAM method
Wang et al. VoPiFNet: Voxel-Pixel Fusion Network for Multi-Class 3D Object Detection
CN115512132A (en) 3D target detection method based on point cloud data and multi-view image data fusion
CN116246119A (en) 3D target detection method, electronic device and storage medium
CN116612468A (en) Three-dimensional target detection method based on multi-mode fusion and depth attention mechanism
CN114067075A (en) Point cloud completion method and device based on generation of countermeasure network
CN115100741B (en) Point cloud pedestrian distance risk detection method, system, equipment and medium
CN114608522B (en) Obstacle recognition and distance measurement method based on vision
CN116664856A (en) Three-dimensional target detection method, system and storage medium based on point cloud-image multi-cross mixing
CN116703996A (en) Monocular three-dimensional target detection algorithm based on instance-level self-adaptive depth estimation
CN115511759A (en) Point cloud image depth completion method based on cascade feature interaction
CN117576665B (en) Automatic driving-oriented single-camera three-dimensional target detection method and system
Feng et al. Object detection and localization based on binocular vision for autonomous vehicles
CN116778262B (en) Three-dimensional target detection method and system based on virtual point cloud
CN117315518A (en) Augmented reality target initial registration method and system
Lyu et al. 3DOPFormer: 3D occupancy perception from multi-camera images with directional and distance enhancement
Miao et al. 3D Object Detection with Normal-map on Point Clouds.
Zhao et al. DHA: Lidar and vision data fusion-based on road object classifier
CN117437404B (en) Multi-modal target detection method based on virtual point cloud
CN115937520A (en) Point cloud moving target segmentation method based on semantic information guidance
CN113569803A (en) Multi-mode data fusion lane target detection method and system based on multi-scale convolution
Vatavu et al. Environment perception using dynamic polylines and particle based occupancy grids

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant