CN111145174B - 3D target detection method for point cloud screening based on image semantic features - Google Patents

3D target detection method for point cloud screening based on image semantic features

Info

Publication number
CN111145174B
CN111145174B (application CN202010000186.6A)
Authority
CN
China
Prior art keywords
point cloud
image
semantic
reg
bounding box
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010000186.6A
Other languages
Chinese (zh)
Other versions
CN111145174A (en)
Inventor
吴飞
杨永光
荆晓远
葛琦
季一木
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202010000186.6A priority Critical patent/CN111145174B/en
Publication of CN111145174A publication Critical patent/CN111145174A/en
Application granted granted Critical
Publication of CN111145174B publication Critical patent/CN111145174B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • G06T2207/10012Stereo images

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a 3D target detection method that screens point clouds based on image semantic features. The method comprises the following steps. First, a 2D semantic segmentation method is used to segment the image data and obtain a semantic prediction. The generated semantic prediction is projected into the LIDAR point cloud space through a known projection matrix, so that each point in the point cloud obtains the semantic category of its corresponding image position. Points related to vehicles, pedestrians and cyclists are extracted from the original point cloud to form view frustums. Second, the view frustums are used as the input of a deep 3D object detector, and a loss function suited to the characteristics of the frustums is designed for network training. The invention designs a 3D target detection algorithm that screens point clouds based on image semantic features, thereby greatly reducing the time and computation required for 3D detection. Finally, the performance of the method on the KITTI benchmark dataset for 3D target detection shows that it achieves good real-time target detection performance.

Description

3D target detection method for point cloud screening based on image semantic features
Technical Field
The invention relates to 3D target detection, in particular to a 3D target detection algorithm for point cloud screening based on image semantic features, and belongs to the field of pattern recognition.
Background
Point cloud based 3D object detection plays an important role in many real-life applications, such as autonomous driving, domestic robots, augmented reality and virtual reality. Compared with traditional image-based detection methods, LIDAR point clouds provide more accurate depth information that can be used to localize objects and delineate their shapes. However, unlike conventional images, LIDAR point clouds are sparse and exhibit large variations in local density, owing to factors such as non-uniform sampling of 3D space, the effective range of the sensor, and object occlusion and relative position. To address these problems, many methods convert the 3D point cloud into features that a corresponding object detector can process, using hand-crafted feature extraction. However, methods that take the entire point cloud as input require substantial computing resources and cannot achieve real-time detection.
Disclosure of Invention
The purpose of the invention is as follows: aiming at the problems in the prior art, a 3D target detection algorithm that screens point clouds based on image semantic features is provided. The algorithm is an end-to-end deep 3D object detection method. A 2D image semantic segmentation method is used to obtain the category of each pixel in an image of the same scene; the prediction result serves as a prior category attribute, each point in the point cloud is labeled through a known projection matrix, and the points whose categories are car, pedestrian or cyclist are extracted from the point cloud to form view frustums, which are used as the input of a 3D target detection network. At the same time, a 3D object detector that handles the frustums is designed. In addition to the basic components of the object detector, namely a grid-based point cloud feature extractor, convolutional intermediate layers and a regional pre-selection network (RPN), the loss function is optimized to make the whole network more sensitive to objects in the view frustum that lack surrounding references. The algorithm includes the following steps:
step (1): performing semantic segmentation on the two-dimensional images of the image data to obtain semantic predictions;
step (2): projecting the semantic predictions into the point cloud space, and screening points of specific categories to form view frustums;
step (3): building a 3D target detection network, with the view frustums used as the input of the 3D target detector;
step (4): enhancing the sensitivity of the loss function to the position of the 3D target box;
step (5): obtaining the total objective function and performing algorithm optimization.
Further, the specific method for performing semantic segmentation on the image data in step (1) is as follows:
the images are segmented using the DeepLabv3+ semantic segmentation method: first, the image part of the training set in the dataset is manually annotated; then DeepLabv3+ is pre-trained for 200 epochs on the Cityscapes dataset and fine-tuned for 50 epochs on the manually annotated semantic label data; the resulting semantic segmentation network is trained to classify each pixel in a picture into one of 19 classes.
Further, in step (2), based on the result predicted by the 2D semantic segmentation method, the region of each category in each image is projected into the LIDAR point cloud space using a known projection matrix, so that the corresponding region of the LIDAR point cloud space has a category attribute consistent with the image region; points belonging to vehicles, pedestrians and cyclists are then screened from the original point cloud and extracted to form view frustums.
Further, in step (3), a deep object detection network is constructed using PyTorch, and the network comprises three parts: a grid-based point cloud feature extractor, convolutional intermediate layers and a regional pre-selection network RPN:
in the grid point cloud feature extractor, the whole view frustum is partitioned in an orderly way with a 3D grid of a set size, and all points in each grid cell are fed into the grid feature extractor, which consists of a linear layer, a batch normalization layer BatchNorm and a nonlinear activation layer ReLU;
in the convolutional intermediate layers, 3 convolutional intermediate modules are used; each module consists of a 3D convolutional layer, a batch normalization layer and a nonlinear activation layer connected in sequence, takes the output of the grid point cloud extractor as input, and converts the feature with 3D structure into a 2D pseudo-image feature as output;
the input of the regional pre-selection network RPN is provided by the convolutional intermediate layers; the architecture of the RPN consists of three fully convolutional modules, each containing a downsampling convolutional layer followed by two convolutional layers that preserve the feature map size, with BatchNorm and ReLU applied after each convolutional layer; the output of each module is then upsampled to feature maps of the same size, and these feature maps are concatenated into a whole; finally, three 1 × 1 2D convolutional layers are applied for the desired learning objectives to generate: (1) a probability score map, (2) regression offsets, and (3) direction predictions.
Further, in step (4), an overall loss function L_total is added to the model, as follows:
L_total = β1·L_cls + β2·(L_reg_θ + L_reg_other) + β3·L_dir + β4·L_corner
where L_cls is the predicted classification loss, L_reg_θ is the predicted angle loss of the 3D bounding box, L_reg_other is the predicted correction loss for the remaining parameters of the 3D bounding box, L_dir is the predicted direction loss, and L_corner is the predicted vertex-coordinate loss of the 3D bounding box; β1, β2, β3 and β4 are hyper-parameters, set to 1.0, 2.0, 0.2 and 0.5, respectively;
for L reg_θ And L reg_other The following variables were used:
Figure GDA0003725931860000031
Figure GDA0003725931860000032
Δθ=θ ga
wherein
Figure GDA0003725931860000033
The parameters for each bounding box provided for the tag,
Figure GDA0003725931860000034
is a prediction parameter of an anchor point, where x c ,y c ,z c W, l, h and theta respectively refer to the central coordinate of the three-dimensional bounding box, the length, the width, the height and the overlooking course angle of the three-dimensional bounding box; wherein d is a =(l a ) 2 +(w a ) 2 As anchor point bottomThe length of the diagonal of the face; for the predicted angle theta p Angle loss L reg_θ The concrete expression is as follows:
L reg_θ =SmoothL1(sin(θ p -Δθ))
correcting the loss L for a parameter reg_other In particular the SmoothL1 function of the differences Δ x, Δ y, Δ z, Δ w, Δ L, Δ h, Δ θ, whereas the loss of coordinates of the vertices L of the 3 dboungbox is L corner The composition of (A) is as follows:
Figure GDA0003725931860000035
wherein NS, NH traverses all bounding boxes; p, P * ,P ** Denotes the predicted bounding box vertex, the vertex of the label bounding box, the vertex of the inverse bounding box, delta ij To balance the coefficients, i, j are the indices of the targets generated by the final profile.
Further, in step (4), the balance between positive and negative anchors is adjusted using the focal loss:
FL(p_t) = −α_t·(1 − p_t)^γ·log(p_t)
where p_t is the estimated probability of the model, and α_t and γ are hyper-parameter adjustment coefficients, set to 0.5 and 2, respectively.
Further, in step (5), the whole model is trained according to steps (2), (3) and (4); that is, the 3D target detection network is trained on the KITTI dataset, with the following specific parameters and implementation: the model is trained for 200,000 iterations (160 epochs) on a 1080Ti GPU using stochastic gradient descent (SGD) and the Adam optimizer; the initial learning rate is set to 0.0002, with an exponential decay factor of 0.8 applied every 15 epochs.
Beneficial effects: in the invention, semantic segmentation is performed on the image, the resulting prediction is used as a prior classification label, and the corresponding points in the point cloud are screened out to form view frustums. This operation greatly reduces the high complexity of the original input, so that the 3D object detector achieves good results while maintaining real-time detection.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
The invention provides a 3D target detection method that screens point clouds based on image semantic features. The method comprises the following steps. First, a 2D semantic segmentation method is used to segment the image data and obtain a semantic prediction. The generated semantic prediction is projected into the LIDAR point cloud space through a known projection matrix, so that each point in the point cloud obtains the semantic category of its corresponding image position. Points related to vehicles, pedestrians and cyclists are extracted from the original point cloud to form view frustums. Second, the view frustums are used as the input of a deep 3D object detector, and a loss function suited to the characteristics of the frustums is designed for network training. Because of the large amount of background and noise in the point cloud, the original unstructured point cloud data is very difficult to process and requires many special considerations, consuming large amounts of computing resources and training and inference time. The invention designs a 3D target detection algorithm that screens point clouds based on image semantic features, thereby greatly reducing the time and computation required for 3D detection. Finally, the performance of the method on the KITTI benchmark dataset for 3D target detection shows that it achieves good real-time target detection performance.
The invention is further described with reference to the following figures and examples.
Examples
The invention provides a 3D target detection algorithm that screens point clouds based on image semantic features; the specific flow is shown in FIG. 1.
Step (1): we segment the images using DeepLabv3+, a leading semantic segmentation method. The image data of the 3D object detection dataset does not contain segmentation labels, so we first hand-label the image portion of the training set. We pre-train DeepLabv3+ on the Cityscapes dataset for 200 epochs and then fine-tune it for 50 epochs on the manually labeled semantic labels. The resulting semantic segmentation network is trained to classify each pixel in a picture into one of 19 classes.
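By way of illustration, a minimal sketch of the per-pixel inference of this step is given below. It uses torchvision's DeepLabv3 (ResNet-101 backbone) as a stand-in for DeepLabv3+, and the preprocessing, the fine-tuned 19-class head and the example file path are assumptions for the sketch rather than the exact configuration used in the invention.

```python
import torch
import torchvision
from torchvision import transforms
from PIL import Image

# Stand-in model: torchvision's DeepLabv3 (ResNet-101 backbone). For the described
# setup, a 19-class Cityscapes-style head would be attached before fine-tuning.
model = torchvision.models.segmentation.deeplabv3_resnet101(weights="DEFAULT")
model.eval()

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def semantic_prediction(image_path: str) -> torch.Tensor:
    """Return an (H, W) tensor of per-pixel class IDs for one image."""
    img = Image.open(image_path).convert("RGB")
    x = preprocess(img).unsqueeze(0)          # (1, 3, H, W)
    with torch.no_grad():
        logits = model(x)["out"]              # (1, num_classes, H, W)
    return logits.argmax(dim=1).squeeze(0)    # (H, W) class IDs

# Example (hypothetical KITTI-style image path):
# pred = semantic_prediction("training/image_2/000000.png")
```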
Step (2): the semantic prediction is projected into the point cloud space, and points of specific categories are screened to form view frustums. Specifically, based on the result predicted by the 2D semantic segmentation method, the region of each category in each image is projected into the point cloud space using a known projection matrix, so that the corresponding region of the point cloud space carries the category attribute of the image region. Points belonging to cars, pedestrians and cyclists are then screened from the original point cloud to form the view frustums.
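A minimal sketch of this screening step is given below, assuming a KITTI-style calibration in which a single LiDAR-to-image projection matrix is available; the class IDs used for car, pedestrian and cyclist are hypothetical placeholders.

```python
import numpy as np

def screen_points_by_semantics(points, P_lidar_to_img, sem_map, keep_ids=(0, 1, 2)):
    """Keep LiDAR points whose image projection falls on a target class.

    points:          (N, 3) LiDAR xyz coordinates.
    P_lidar_to_img:  (3, 4) projection matrix from LiDAR to pixel coordinates
                     (e.g. P2 @ R0_rect @ Tr_velo_to_cam in KITTI conventions).
    sem_map:         (H, W) per-pixel class IDs from the 2D segmentation.
    keep_ids:        class IDs treated as car / pedestrian / cyclist (assumed values).
    """
    H, W = sem_map.shape
    pts_h = np.hstack([points, np.ones((points.shape[0], 1))])   # (N, 4) homogeneous
    proj = pts_h @ P_lidar_to_img.T                              # (N, 3)
    depth = proj[:, 2]
    u = proj[:, 0] / depth
    v = proj[:, 1] / depth

    # Keep points in front of the camera and inside the image bounds.
    valid = (depth > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    labels = np.full(points.shape[0], -1, dtype=np.int64)
    labels[valid] = sem_map[v[valid].astype(np.int64), u[valid].astype(np.int64)]

    mask = np.isin(labels, keep_ids)           # car / pedestrian / cyclist points
    return points[mask], labels[mask]          # screened frustum points + labels
```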
Step (3): we build a deep target detection network using the PyTorch deep learning framework. The network comprises three parts: a grid-based point cloud feature extractor, convolutional intermediate layers, and a regional pre-selection network (RPN).
In the grid point cloud feature extractor, the view frustum is first partitioned in an orderly way with a 3D grid of a set size. All points in each grid cell are taken as input to the grid feature extractor, which consists of a linear layer, a batch normalization layer (BatchNorm) and a nonlinear activation layer (ReLU).
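The following sketch illustrates one such grid feature layer. The feature dimensions and the max-pooling aggregation over the points of a cell are assumptions for the example, not the exact layer configuration of the invention.

```python
import torch
import torch.nn as nn

class GridFeatureLayer(nn.Module):
    """Linear + BatchNorm + ReLU applied per point, then pooled per grid cell."""

    def __init__(self, in_dim: int = 4, out_dim: int = 64):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.bn = nn.BatchNorm1d(out_dim)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, cell_points: torch.Tensor) -> torch.Tensor:
        # cell_points: (num_cells, max_points_per_cell, in_dim), zero-padded.
        B, N, C = cell_points.shape
        x = self.linear(cell_points.view(B * N, C))
        x = self.relu(self.bn(x)).view(B, N, -1)
        # Aggregate the points of each cell into a single cell feature.
        return x.max(dim=1).values             # (num_cells, out_dim)

# Example: 200 occupied cells, up to 35 points per cell, (x, y, z, reflectance).
# feats = GridFeatureLayer()(torch.randn(200, 35, 4))   # -> (200, 64)
```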
In the convolutional intermediate layers, we use 3 convolutional intermediate modules in order to enlarge the receptive field and gather more context. Each module consists of a 3D convolutional layer, a batch normalization layer (BatchNorm) and a nonlinear activation layer (ReLU) connected in sequence. It takes the output of the grid point cloud extractor as input and converts this feature with 3D structure into a 2D pseudo-image feature, which is the final output.
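A sketch of the convolutional intermediate layers under assumed channel widths and strides is given below; collapsing the depth dimension into channels is one common way to obtain the 2D pseudo-image feature.

```python
import torch
import torch.nn as nn

class ConvMiddle(nn.Module):
    """Three Conv3d+BN+ReLU blocks, then collapse depth into channels (2D pseudo-image)."""

    def __init__(self, in_ch: int = 64):
        super().__init__()
        def block(cin, cout, stride):
            return nn.Sequential(
                nn.Conv3d(cin, cout, kernel_size=3, stride=stride, padding=1),
                nn.BatchNorm3d(cout),
                nn.ReLU(inplace=True),
            )
        # Strides along (depth, height, width) are illustrative assumptions.
        self.blocks = nn.Sequential(
            block(in_ch, 64, (2, 1, 1)),
            block(64, 64, (1, 1, 1)),
            block(64, 64, (2, 1, 1)),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, C, D, H, W) dense grid of cell features.
        x = self.blocks(x)
        b, c, d, h, w = x.shape
        return x.view(b, c * d, h, w)           # 2D pseudo-image feature map

# Example: pseudo = ConvMiddle()(torch.randn(1, 64, 10, 100, 88))  # -> (1, 192, 100, 88)
```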
The input to the regional pre-selection network (RPN) is provided by the convolutional intermediate layers. The architecture of the RPN consists of three fully convolutional modules. Each module contains a downsampling convolutional layer followed by two convolutional layers that preserve the feature map size; after each convolutional layer we apply BatchNorm and ReLU. We then upsample the output of each module to feature maps of the same size and concatenate them into a whole. Finally, three 1 × 1 2D convolutional layers are applied for the desired learning objectives to generate: (1) a probability score map, (2) regression offsets, and (3) direction predictions.
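The RPN described above can be sketched as follows; the channel widths, upsampling factors, and the number of anchors and box parameters per location are assumptions for the example.

```python
import torch
import torch.nn as nn

class RPN(nn.Module):
    """Three down-sampling conv blocks, upsample-and-concatenate, then three 1x1 heads."""

    def __init__(self, in_ch: int = 192, num_anchors: int = 2, box_dim: int = 7):
        super().__init__()
        def block(cin, cout, stride):
            layers = [nn.Conv2d(cin, cout, 3, stride=stride, padding=1),
                      nn.BatchNorm2d(cout), nn.ReLU(inplace=True)]
            for _ in range(2):                  # two size-preserving conv layers
                layers += [nn.Conv2d(cout, cout, 3, padding=1),
                           nn.BatchNorm2d(cout), nn.ReLU(inplace=True)]
            return nn.Sequential(*layers)

        self.block1 = block(in_ch, 128, stride=2)
        self.block2 = block(128, 128, stride=2)
        self.block3 = block(128, 256, stride=2)
        # Upsample each block's output back to the resolution of block1's output.
        self.up1 = nn.Sequential(nn.ConvTranspose2d(128, 128, 1, stride=1),
                                 nn.BatchNorm2d(128), nn.ReLU(inplace=True))
        self.up2 = nn.Sequential(nn.ConvTranspose2d(128, 128, 2, stride=2),
                                 nn.BatchNorm2d(128), nn.ReLU(inplace=True))
        self.up3 = nn.Sequential(nn.ConvTranspose2d(256, 128, 4, stride=4),
                                 nn.BatchNorm2d(128), nn.ReLU(inplace=True))
        # Three 1x1 heads: probability score map, box regression offsets, direction.
        self.cls_head = nn.Conv2d(384, num_anchors, 1)
        self.reg_head = nn.Conv2d(384, num_anchors * box_dim, 1)
        self.dir_head = nn.Conv2d(384, num_anchors * 2, 1)

    def forward(self, x):
        x1 = self.block1(x)
        x2 = self.block2(x1)
        x3 = self.block3(x2)
        feat = torch.cat([self.up1(x1), self.up2(x2), self.up3(x3)], dim=1)
        return self.cls_head(feat), self.reg_head(feat), self.dir_head(feat)

# Example: scores, boxes, dirs = RPN()(torch.randn(1, 192, 200, 176))
```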
Step (4): the point cloud screening removes the original surrounding context from the view frustums. The resulting reference-free target point cloud data makes the detection task more difficult, so a dedicated loss function is added to the model to strengthen its sensitivity to the target. The overall loss function L_total is as follows:
L_total = β1·L_cls + β2·(L_reg_θ + L_reg_other) + β3·L_dir + β4·L_corner
where L_cls is the classification loss, L_reg_θ is the angle loss of the 3D bounding box, L_reg_other is the correction loss for the remaining 3D bounding box parameters, L_dir is the direction loss, and L_corner is the vertex-coordinate loss of the 3D bounding box. β1, β2, β3 and β4 are hyper-parameters, set to 1.0, 2.0, 0.2 and 0.5, respectively.
For L_reg_θ and L_reg_other, the following regression variables are used:
Δx = (x_g^c − x_a^c) / d_a,  Δy = (y_g^c − y_a^c) / d_a,  Δz = (z_g^c − z_a^c) / h_a,
Δw = log(w_g / w_a),  Δl = log(l_g / l_a),  Δh = log(h_g / h_a),
Δθ = θ_g − θ_a
where (x_g^c, y_g^c, z_g^c, w_g, l_g, h_g, θ_g) are the parameters of the bounding box provided by the label and (x_a^c, y_a^c, z_a^c, w_a, l_a, h_a, θ_a) are the anchor parameters; x^c, y^c, z^c, w, l, h and θ denote the center coordinates of the three-dimensional bounding box and its length, width, height and bird's-eye-view heading angle, respectively. d_a = √((l_a)² + (w_a)²) is the diagonal length of the anchor box in the bird's-eye view. For the predicted angle θ_p, the angle loss L_reg_θ can be expressed as:
L_reg_θ = SmoothL1(sin(θ_p − Δθ))
The parameter correction loss L_reg_other is the SmoothL1 function of the differences Δx, Δy, Δz, Δw, Δl, Δh, Δθ. The vertex-coordinate loss L_corner of the 3D bounding box is composed as follows:
L_corner = Σ_{i=1..NS} Σ_{j=1..NH} δ_ij · min( Σ_{k=1..8} ||P_k^{ij} − P_k^*||, Σ_{k=1..8} ||P_k^{ij} − P_k^{**}|| )
where NS and NH traverse all bounding boxes, and P, P^* and P^{**} denote the predicted bounding-box vertices, the vertices of the label bounding box and the vertices of the flipped (inverse) label bounding box, respectively. In addition to the above bounding-box-based losses, the focal loss is added to address the imbalance between positive and negative anchors in the RPN:
FL(p_t) = −α_t·(1 − p_t)^γ·log(p_t)
where p_t is the estimated probability of the model, and α_t and γ are hyper-parameter adjustment coefficients, set to 0.5 and 2, respectively; the logarithm log(p_t) follows the cross-entropy convention, and its base may be taken as e or 10.
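The loss terms described above can be sketched as follows. The sketch assumes per-anchor prediction and target tensors and uses a binary sigmoid formulation of the focal loss with the natural logarithm; L_dir and L_corner are passed in as already-computed values since their tensor layout is not spelled out here.

```python
import torch
import torch.nn.functional as F

def angle_loss(theta_pred, delta_theta):
    """L_reg_theta = SmoothL1(sin(theta_p - delta_theta))."""
    return F.smooth_l1_loss(torch.sin(theta_pred - delta_theta),
                            torch.zeros_like(theta_pred))

def other_reg_loss(reg_pred, reg_target):
    """SmoothL1 over the encoded residuals (dx, dy, dz, dw, dl, dh, ...)."""
    return F.smooth_l1_loss(reg_pred, reg_target)

def focal_loss(logits, labels, alpha=0.5, gamma=2.0):
    """FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), per-anchor binary version."""
    p = torch.sigmoid(logits)
    p_t = torch.where(labels == 1, p, 1 - p)
    alpha_t = torch.where(labels == 1, torch.full_like(p, alpha),
                          torch.full_like(p, 1 - alpha))
    return (-alpha_t * (1 - p_t).pow(gamma) * torch.log(p_t.clamp(min=1e-6))).mean()

def total_loss(l_cls, l_reg_theta, l_reg_other, l_dir, l_corner,
               betas=(1.0, 2.0, 0.2, 0.5)):
    """L_total = b1*L_cls + b2*(L_reg_theta + L_reg_other) + b3*L_dir + b4*L_corner."""
    b1, b2, b3, b4 = betas
    return b1 * l_cls + b2 * (l_reg_theta + l_reg_other) + b3 * l_dir + b4 * l_corner
```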
Step (5): the total objective function is obtained and the algorithm is optimized:
The entire model is trained according to steps (2), (3) and (4) above; that is, we train the 3D target detection network on the KITTI dataset. Training is run on a 1080Ti GPU using stochastic gradient descent (SGD) and the Adam optimizer. The model is trained for 200,000 iterations (160 epochs). The initial learning rate is set to 0.0002, with an exponential decay factor of 0.8 applied every 15 epochs.
Result analysis:
To verify the superiority of the algorithm, we compare the proposed method with several recently published state-of-the-art detectors, including MV3D, MV3D (LIDAR), F-PointNet, AVOD, AVOD-FCN and VoxelNet. As shown in Tables 1 and 2, our method achieves the best performance on the hardest detection cases. Furthermore, Table 3 provides a runtime comparison of the methods; considering that our method already includes a 2D semantic segmentation step, which by itself consumes considerable time, it still qualifies as a real-time target detection method.
The experimental results are as follows:
table 1 counts the AP (%) values of 3D detection on the KITTI dataset.
Table 2 counts the AP (%) value of BEV detection on the KITTI dataset.
Table 3 counts the time(s) required for each method to process a scene over the KITTI dataset.
TABLE 1 AP-value comparison of 3D detection on KITTI data set
[Table 1 data appears as an image in the original publication]
TABLE 2 AP-value comparison of BEV detection on KITTI data set
[Table 2 data appears as an image in the original publication]
TABLE 3 comparison of time spent by each method on KITTI data sets
Method      MV3D    MV3D(LIDAR)    F-PointNet    AVOD    AVOD-FCN    VoxelNet    Ours
Time (s)    0.36    0.24           0.17          0.08    0.10        0.23        0.18
The above description covers only the preferred embodiments of the present invention. It should be noted that those skilled in the art can make various modifications and adaptations without departing from the principles of the invention, and such modifications and adaptations are also intended to fall within the scope of the invention.

Claims (5)

1. A 3D target detection method for point cloud screening based on image semantic features, characterized by comprising the following steps:
step (1): performing semantic segmentation on the two-dimensional images of the image data to obtain semantic predictions;
step (2): projecting the semantic predictions into the point cloud space, and screening points of specific categories to form view frustums;
step (3): building a 3D target detection network, and taking the view frustums as the input of the 3D target detector;
step (4): enhancing the sensitivity of the loss function to the position of the 3D target box;
step (5): obtaining the total objective function and performing algorithm optimization;
in step (3), a deep target detection network is constructed using PyTorch, and the network comprises three parts: a grid-based point cloud feature extractor, convolutional intermediate layers and a regional pre-selection network RPN:
in the grid point cloud feature extractor, the whole view frustum is partitioned in an orderly way with a 3D grid of a set size, and all points in each grid cell are fed into the grid feature extractor, which consists of a linear layer, a batch normalization layer BatchNorm and a nonlinear activation layer ReLU;
in the convolutional intermediate layers, 3 convolutional intermediate modules are used; each module consists of a 3D convolutional layer, a batch normalization layer and a nonlinear activation layer connected in sequence, takes the output of the grid point cloud extractor as input, and converts the feature with 3D structure into a 2D pseudo-image feature as output;
the input of the regional pre-selection network RPN is provided by the convolutional intermediate layers; the architecture of the RPN consists of three fully convolutional modules, each containing a downsampling convolutional layer followed by two convolutional layers that preserve the feature map size, with BatchNorm and ReLU applied after each convolutional layer; the output of each module is then upsampled to feature maps of the same size, and these feature maps are concatenated into a whole; finally, three 1 × 1 2D convolutional layers are applied for the desired learning objectives to generate: (1) a probability score map, (2) regression offsets and (3) direction predictions;
in step (4), an overall loss function L_total is added to the model, as follows:
L_total = β1·L_cls + β2·(L_reg_θ + L_reg_other) + β3·L_dir + β4·L_corner
where L_cls is the predicted classification loss, L_reg_θ is the predicted angle loss of the 3D bounding box, L_reg_other is the predicted correction loss for the remaining parameters of the 3D bounding box, L_dir is the predicted direction loss, and L_corner is the predicted vertex-coordinate loss of the 3D bounding box; β1, β2, β3 and β4 are hyper-parameters, set to 1.0, 2.0, 0.2 and 0.5, respectively;
for L_reg_θ and L_reg_other, the following regression variables are used:
Δx = (x_g^c − x_a^c) / d_a,  Δy = (y_g^c − y_a^c) / d_a,  Δz = (z_g^c − z_a^c) / h_a,
Δw = log(w_g / w_a),  Δl = log(l_g / l_a),  Δh = log(h_g / h_a),
Δθ = θ_g − θ_a
where (x_g^c, y_g^c, z_g^c, w_g, l_g, h_g, θ_g) are the bounding-box parameters provided by the label and (x_a^c, y_a^c, z_a^c, w_a, l_a, h_a, θ_a) are the anchor parameters; x^c, y^c, z^c, w, l, h and θ denote the center coordinates of the three-dimensional bounding box and its length, width, height and bird's-eye-view heading angle, respectively; d_a = √((l_a)² + (w_a)²) is the diagonal length of the anchor's bottom face; for the predicted angle θ_p, the angle loss L_reg_θ is expressed as:
L_reg_θ = SmoothL1(sin(θ_p − Δθ))
the parameter correction loss L_reg_other is the SmoothL1 function of the differences Δx, Δy, Δz, Δw, Δl, Δh, Δθ, while the vertex-coordinate loss L_corner of the 3D bounding box is composed as follows:
L_corner = Σ_{i=1..NS} Σ_{j=1..NH} δ_ij · min( Σ_{k=1..8} ||P_k^{ij} − P_k^*||, Σ_{k=1..8} ||P_k^{ij} − P_k^{**}|| )
where NS and NH traverse all bounding boxes; P, P^* and P^{**} denote the predicted bounding-box vertices, the vertices of the label bounding box and the vertices of the flipped (inverse) label bounding box, respectively; δ_ij are balance coefficients, and i, j are the indices of the targets generated by the final feature map.
2. The 3D target detection method for point cloud screening based on image semantic features as claimed in claim 1, wherein: the specific method for performing semantic segmentation on the image data in the step (1) is as follows:
the images are segmented using the DeepLabv3+ semantic segmentation method: first, the image part of the training set in the dataset is manually annotated; then DeepLabv3+ is pre-trained for 200 epochs on the Cityscapes dataset and fine-tuned for 50 epochs on the manually annotated semantic label data; the resulting semantic segmentation network is trained to classify each pixel in a picture into one of 19 classes.
3. The 3D object detection method for point cloud screening based on image semantic features as claimed in claim 1, wherein in step (2), based on the result predicted by the 2D semantic segmentation method, the region of each category in each image is projected into the LIDAR point cloud space using a known projection matrix, so that the corresponding region of the LIDAR point cloud space has a category attribute consistent with the image region; points belonging to vehicles, pedestrians and cyclists are then screened from the original point cloud and extracted to form view frustums.
4. The 3D target detection method for point cloud screening based on image semantic features as claimed in claim 1, wherein in step (4), the balance between positive and negative anchors is adjusted using the focal loss:
FL(p_t) = −α_t·(1 − p_t)^γ·log(p_t)
where p_t is the estimated probability of the model, and α_t and γ are hyper-parameter adjustment coefficients, set to 0.5 and 2, respectively.
5. The 3D object detection method for point cloud screening based on image semantic features as claimed in claim 1, wherein in step (5), the whole model is trained according to steps (2), (3) and (4); that is, the 3D target detection network is trained on the KITTI dataset, with the following specific parameters and implementation: the model is trained for 200,000 iterations (160 epochs) on a 1080Ti GPU using stochastic gradient descent (SGD) and the Adam optimizer; the initial learning rate is set to 0.0002, with an exponential decay factor of 0.8 applied every 15 epochs.
CN202010000186.6A 2020-01-02 2020-01-02 3D target detection method for point cloud screening based on image semantic features Active CN111145174B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010000186.6A CN111145174B (en) 2020-01-02 2020-01-02 3D target detection method for point cloud screening based on image semantic features

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010000186.6A CN111145174B (en) 2020-01-02 2020-01-02 3D target detection method for point cloud screening based on image semantic features

Publications (2)

Publication Number Publication Date
CN111145174A CN111145174A (en) 2020-05-12
CN111145174B true CN111145174B (en) 2022-08-09

Family

ID=70523228

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010000186.6A Active CN111145174B (en) 2020-01-02 2020-01-02 3D target detection method for point cloud screening based on image semantic features

Country Status (1)

Country Link
CN (1) CN111145174B (en)

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DK201970115A1 (en) 2018-11-08 2020-06-09 Aptiv Technologies Limited Deep learning for object detection using pillars
GB2591171B (en) * 2019-11-14 2023-09-13 Motional Ad Llc Sequential fusion for 3D object detection
CN112200303B (en) * 2020-09-28 2022-10-21 杭州飞步科技有限公司 Laser radar point cloud 3D target detection method based on context-dependent encoder
CN112183358B (en) * 2020-09-29 2024-04-23 新石器慧通(北京)科技有限公司 Training method and device for target detection model
CN112184589B (en) 2020-09-30 2021-10-08 清华大学 Point cloud intensity completion method and system based on semantic segmentation
CN112464905B (en) * 2020-12-17 2022-07-26 湖南大学 3D target detection method and device
CN112598635B (en) * 2020-12-18 2024-03-12 武汉大学 Point cloud 3D target detection method based on symmetric point generation
CN112541081B (en) * 2020-12-21 2022-09-16 中国人民解放军国防科技大学 Migratory rumor detection method based on field self-adaptation
CN112562093B (en) * 2021-03-01 2021-05-18 湖北亿咖通科技有限公司 Object detection method, electronic medium, and computer storage medium
CN113343886A (en) * 2021-06-23 2021-09-03 贵州大学 Tea leaf identification grading method based on improved capsule network
CN113378760A (en) * 2021-06-25 2021-09-10 北京百度网讯科技有限公司 Training target detection model and method and device for detecting target
CN113984037B (en) * 2021-09-30 2023-09-12 电子科技大学长三角研究院(湖州) Semantic map construction method based on target candidate frame in any direction
CN114677568B (en) * 2022-05-30 2022-08-23 山东极视角科技有限公司 Linear target detection method, module and system based on neural network
CN116385452A (en) * 2023-03-20 2023-07-04 广东科学技术职业学院 LiDAR point cloud panorama segmentation method based on polar coordinate BEV graph
CN116912238B (en) * 2023-09-11 2023-11-28 湖北工业大学 Weld joint pipeline identification method and system based on multidimensional identification network cascade fusion

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190108639A1 (en) * 2017-10-09 2019-04-11 The Board Of Trustees Of The Leland Stanford Junior University Systems and Methods for Semantic Segmentation of 3D Point Clouds
CN109523552A (en) * 2018-10-24 2019-03-26 青岛智能产业技术研究院 Three-dimension object detection method based on cone point cloud
CN109784333A (en) * 2019-01-22 2019-05-21 中国科学院自动化研究所 Based on an objective detection method and system for cloud bar power channel characteristics
CN110032962A (en) * 2019-04-03 2019-07-19 腾讯科技(深圳)有限公司 A kind of object detecting method, device, the network equipment and storage medium

Also Published As

Publication number Publication date
CN111145174A (en) 2020-05-12

Similar Documents

Publication Publication Date Title
CN111145174B (en) 3D target detection method for point cloud screening based on image semantic features
Yu et al. A real-time detection approach for bridge cracks based on YOLOv4-FPM
CN111598030B (en) Method and system for detecting and segmenting vehicle in aerial image
CN110188705B (en) Remote traffic sign detection and identification method suitable for vehicle-mounted system
CN110728200B (en) Real-time pedestrian detection method and system based on deep learning
CN111832655B (en) Multi-scale three-dimensional target detection method based on characteristic pyramid network
CN109886066B (en) Rapid target detection method based on multi-scale and multi-layer feature fusion
CN110222626B (en) Unmanned scene point cloud target labeling method based on deep learning algorithm
CN111914795B (en) Method for detecting rotating target in aerial image
CN111640125B (en) Aerial photography graph building detection and segmentation method and device based on Mask R-CNN
CN111695514B (en) Vehicle detection method in foggy days based on deep learning
CN110909623B (en) Three-dimensional target detection method and three-dimensional target detector
CN111681212B (en) Three-dimensional target detection method based on laser radar point cloud data
CN113160062B (en) Infrared image target detection method, device, equipment and storage medium
CN109801297B (en) Image panorama segmentation prediction optimization method based on convolution
CN112560675B (en) Bird visual target detection method combining YOLO and rotation-fusion strategy
CN112598635B (en) Point cloud 3D target detection method based on symmetric point generation
CN111738206A (en) Excavator detection method for unmanned aerial vehicle inspection based on CenterNet
CN115424017B (en) Building inner and outer contour segmentation method, device and storage medium
CN111640116B (en) Aerial photography graph building segmentation method and device based on deep convolutional residual error network
CN115393587A (en) Expressway asphalt pavement disease sensing method based on fusion convolutional neural network
CN114463736A (en) Multi-target detection method and device based on multi-mode information fusion
EP4174792A1 (en) Method for scene understanding and semantic analysis of objects
CN113191204B (en) Multi-scale blocking pedestrian detection method and system
CN113496480A (en) Method for detecting weld image defects

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 210000, 66 new model street, Gulou District, Jiangsu, Nanjing

Applicant after: NANJING University OF POSTS AND TELECOMMUNICATIONS

Address before: Yuen Road Qixia District of Nanjing City, Jiangsu Province, No. 9 210000

Applicant before: NANJING University OF POSTS AND TELECOMMUNICATIONS

CB02 Change of applicant information
GR01 Patent grant
GR01 Patent grant