CN111145174A - 3D target detection method for point cloud screening based on image semantic features - Google Patents

3D target detection method for point cloud screening based on image semantic features

Info

Publication number
CN111145174A
CN111145174A (application number CN202010000186.6A)
Authority
CN
China
Prior art keywords
point cloud
image
semantic
reg
target detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010000186.6A
Other languages
Chinese (zh)
Other versions
CN111145174B (en)
Inventor
吴飞
杨永光
荆晓远
葛琦
季一木
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202010000186.6A priority Critical patent/CN111145174B/en
Publication of CN111145174A publication Critical patent/CN111145174A/en
Application granted granted Critical
Publication of CN111145174B publication Critical patent/CN111145174B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • G06T2207/10012Stereo images

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a 3D target detection method that screens point clouds based on image semantic features. The method comprises the following steps. First, a 2D semantic segmentation method is applied to the image data to obtain a semantic prediction. The semantic prediction is projected into the LIDAR point cloud space through a known projection matrix, so that each point in the point cloud acquires the semantic category attribute of its corresponding image position. Points related to vehicles, pedestrians and cyclists are then extracted from the original point cloud to form viewing cones. Second, the viewing cones are used as the input of a deep 3D target detector, and a loss function matched to the characteristics of the viewing cones is designed for network training. The invention designs a 3D target detection algorithm that screens point clouds based on image semantic features, thereby greatly reducing the time and computation required for 3D detection. Finally, the performance of the method on the KITTI benchmark dataset for 3D target detection shows that it achieves good real-time target detection performance.

Description

3D target detection method for point cloud screening based on image semantic features
Technical Field
The invention relates to 3D target detection, in particular to a 3D target detection algorithm for point cloud screening based on image semantic features, and belongs to the field of pattern recognition.
Background
Point-cloud-based 3D object detection plays an important role in many real-world applications, such as autonomous driving, home robots, augmented reality and virtual reality. Compared with traditional image-based target detection methods, LIDAR point clouds provide more accurate depth information that can be used to locate objects and delineate their shapes. However, unlike conventional images, LIDAR point clouds are sparse and vary greatly in density across regions, due to factors such as non-uniform sampling of 3D space, the effective range of the sensor, and object occlusion and relative position. To address this, many methods convert the 3D point cloud into features that a corresponding target detector can process, using hand-designed feature extraction. However, these methods take the entire point cloud as input, require substantial computing resources and cannot achieve real-time detection.
Disclosure of Invention
The purpose of the invention is as follows: to address the problems in the prior art, a 3D target detection algorithm that screens point clouds based on image semantic features is provided. The algorithm is an end-to-end deep 3D target detection method. A 2D image semantic segmentation method is used to obtain the category attribute of each pixel in the image of the same scene; the prediction result serves as a prior category attribute, each point in the point cloud is labeled through a known projection matrix, and the points whose categories are car, pedestrian or cyclist are extracted from the point cloud to form viewing cones, which are used as the input of the 3D target detection network. At the same time, a 3D object detector that handles these viewing cones is designed. In addition to the basic components of the object detector, i.e. a grid-based point cloud feature extractor, convolutional intermediate extraction layers and a region pre-selection network (RPN), the loss function is optimized so that the entire network is more sensitive to objects in the view frustum that lack surrounding references. The algorithm includes the following steps:
Step (1): performing semantic segmentation of the two-dimensional image on the image data to obtain a semantic prediction;
Step (2): projecting the semantic prediction into the point cloud space and screening points of specific categories to form viewing cones;
Step (3): building a 3D target detection network and taking the viewing cones as the input of the 3D target detector;
Step (4): enhancing the sensitivity of the loss function to the position of the 3D target frame;
Step (5): obtaining the total objective function and carrying out algorithm optimization.
Further, the specific method for performing semantic segmentation on the image data in step (1) is as follows:
the images are segmented using the DeepLabv3+ semantic segmentation method: first, the image portion of the training set in the dataset is manually labeled; then DeepLabv3+ is pre-trained for 200 epochs on the Cityscapes dataset and fine-tuned for 50 epochs on the manually labeled semantic label dataset; the resulting semantic segmentation network classifies each pixel in the picture into one of 19 classes.
Further, in step (2), based on the result predicted by the 2D semantic segmentation method, the region of each category in each image is projected into the LIDAR point cloud space using a known projection matrix, so that the corresponding region of the LIDAR point cloud space carries a category attribute consistent with the image region; points belonging to vehicles, pedestrians and cyclists are then screened and extracted from the original point cloud to form the viewing cones.
Further, in step (3), a deep object detection network is constructed using PyTorch, and the network comprises three parts: a grid-based point cloud feature extractor, convolutional intermediate extraction layers and a region pre-selection network (RPN):
in the grid point cloud feature extractor, the whole viewing cone is partitioned in order by a 3D grid of a set size, and all points in each grid cell are fed to the grid feature extractor, which consists of a linear layer, a batch normalization layer (BatchNorm) and a nonlinear activation layer (ReLU);
in the convolutional intermediate layer, three convolutional intermediate modules are used, each formed by a 3D convolution layer, a batch normalization layer and a nonlinear activation layer connected in sequence; the output of the grid point cloud extractor is taken as input, and the 3D-structured feature is converted into a 2D pseudo-image feature as output;
the input of the region pre-selection network RPN is provided by the convolutional intermediate layer; the RPN architecture consists of three fully convolutional modules, each containing a downsampling convolution layer followed by two convolution layers matching the feature map size, with BatchNorm and ReLU applied after each convolution layer; the output of each block is then upsampled to feature maps of the same size and these feature maps are concatenated into a whole; finally, three 1 × 1 2D convolutional layers are applied to produce the desired learning targets: (1) a probability score map, (2) regression offsets, and (3) a direction prediction.
Further, in step (4), an overall loss function Ltotal is added to the model, as follows:
Ltotal = β1·Lcls + β2·(Lreg_θ + Lreg_other) + β3·Ldir + β4·Lcorner
where Lcls is the predicted classification loss, Lreg_θ is the predicted angle loss of the 3D bounding box, Lreg_other is the predicted correction loss of the remaining 3D bounding box parameters, Ldir is the predicted direction loss, and Lcorner is the predicted vertex-coordinate loss of the 3D bounding box; β1, β2, β3, β4 are hyperparameters, set to 1.0, 2.0, 0.2 and 0.5, respectively;
for Lreg_θ and Lreg_other the following variables are used:
Δx = (xg − xa)/da, Δy = (yg − ya)/da, Δz = (zg − za)/ha
Δw = log(wg/wa), Δl = log(lg/la), Δh = log(hg/ha)
Δθ = θg − θa
where the subscript g denotes the parameters of each bounding box provided by the label (xg, yg, zg, wg, lg, hg, θg) and the subscript a denotes the prediction parameters of the anchor (xa, ya, za, wa, la, ha, θa); x, y, z, w, l, h and θ refer respectively to the center coordinates, the length, the width, the height and the top-view heading angle of a three-dimensional bounding box, and da = sqrt((la)² + (wa)²) is the length of the diagonal of the anchor floor; for the predicted angle θp, the angle loss Lreg_θ is expressed as:
Lreg_θ = SmoothL1(sin(θp − Δθ))
the parameter correction loss Lreg_other is the SmoothL1 function of the differences Δx, Δy, Δz, Δw, Δl, Δh, Δθ, while the vertex-coordinate loss Lcorner of the 3D bounding box is composed as follows:
Lcorner = Σ(i=1..NS) Σ(j=1..NH) δij · min( Σ(k=1..8) ‖Pk − P*k‖ , Σ(k=1..8) ‖Pk − P**k‖ )
where NS and NH traverse all bounding boxes; P, P* and P** denote the vertices of the predicted bounding box, of the labeled bounding box and of the flipped (direction-reversed) label bounding box, respectively; δij are balancing coefficients, and i, j are the indices of the targets generated by the final feature map.
Further, in step (4), the balance between positive and negative anchors is adjusted using the focal loss:
FL(pt) = −αt·(1 − pt)^γ·log(pt)
where pt is the model's estimated probability, and αt and γ are hyperparameter adjustment coefficients, set to 0.5 and 2, respectively.
Further, in step (5), the whole model is trained according to steps (2), (3) and (4), i.e. the 3D target detection network is trained on the KITTI dataset; the specific parameters and implementation are as follows: the network is trained for 200,000 iterations (160 epochs) on a 1080Ti GPU using stochastic gradient descent (SGD) and the Adam optimizer; the initial learning rate is set to 0.0002, with an exponential decay factor of 0.8 applied every 15 epochs.
Beneficial effects: in the invention, the image is semantically segmented, the resulting prediction is used as a prior classification label, and the corresponding points in the point cloud are screened out to form viewing cones. This operation greatly reduces the complexity of the original input, so that the 3D object detector achieves good results while maintaining real-time detection.
Drawings
FIG. 1 is a flow chart of the present invention.
Detailed Description
The invention provides a 3D target detection method that screens point clouds based on image semantic features. The method comprises the following steps. First, a 2D semantic segmentation method is applied to the image data to obtain a semantic prediction. The semantic prediction is projected into the LIDAR point cloud space through a known projection matrix, so that each point in the point cloud acquires the semantic category attribute of its corresponding image position. We extract the points related to vehicles, pedestrians and cyclists from the original point cloud and form the viewing cones. Second, the viewing cones are used as the input of the deep 3D target detector, and a loss function matched to the characteristics of the viewing cones is designed for network training. Because of the large amount of background and noise in the point cloud, raw unstructured point cloud data is very difficult to process and requires many special considerations, so it consumes a large amount of computing resources and training and inference time. The invention designs a 3D target detection algorithm that screens point clouds based on image semantic features, thereby greatly reducing the time and computation required for 3D detection. Finally, the performance of the method on the KITTI benchmark dataset for 3D target detection shows that it achieves good real-time target detection performance.
The invention is further described with reference to the following figures and examples.
Examples
The invention provides a 3D target detection algorithm for point cloud screening based on image semantic features, and the specific flow is shown in figure 1.
Step (1): we segment the image using DeepLabv3+, a currently outstanding semantic segmentation method. The image data of the 3D object detection dataset does not contain segmentation labels, so we first hand-label the image portion of the training set in the dataset. We pre-train DeepLabv3+ on the Cityscapes dataset for 200 epochs and then fine-tune it for 50 epochs on the manually labeled semantic labels. The resulting semantic segmentation network classifies each pixel in the picture into one of 19 classes.
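For illustration only, the following sketch shows how such a per-pixel inference step could be invoked; the model object seg_model, the function name predict_semantics and the assumed output shape are hypothetical placeholders for the fine-tuned DeepLabv3+ network and are not part of the invention itself:

```python
import numpy as np
import torch

def predict_semantics(seg_model: torch.nn.Module, image_bchw: torch.Tensor) -> np.ndarray:
    """Run the fine-tuned segmentation network and return an (H, W) map of
    class ids (one of the 19 classes) for every pixel of the input image.
    `seg_model` is assumed to return raw logits of shape (1, 19, H, W)."""
    seg_model.eval()
    with torch.no_grad():
        logits = seg_model(image_bchw)     # assumed output shape: (1, 19, H, W)
        class_map = logits.argmax(dim=1)   # most likely class per pixel
    return class_map.squeeze(0).cpu().numpy()
```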
Step (2): the semantic prediction is projected into the point cloud space, and points of specific categories are screened to form viewing cones. The specific method is as follows: based on the result predicted by the 2D semantic segmentation method, the region of each category in each image is projected into the point cloud space using the known projection matrix, so that the corresponding region of the point cloud space carries the category attribute of the image region. We then screen the points belonging to cars, pedestrians and cyclists from the original point cloud, forming the viewing cones.
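A minimal sketch of this screening step follows, assuming a single 3×4 LIDAR-to-image projection matrix P (such as can be assembled from the KITTI calibration files) and illustrative class ids; the function name, the class-id constants and the dense projection loop are assumptions made for the example only:

```python
import numpy as np

# Hypothetical class ids for the categories of interest; the real ids depend
# on the label map used when fine-tuning the segmentation network.
CAR, PEDESTRIAN, CYCLIST = 13, 11, 12

def build_frustum(points_xyz: np.ndarray, P: np.ndarray, sem_map: np.ndarray) -> np.ndarray:
    """points_xyz: (N, 3) LIDAR points; P: 3x4 LIDAR-to-image projection matrix;
    sem_map: (H, W) per-pixel class ids. Returns the screened frustum points."""
    h, w = sem_map.shape
    homo = np.hstack([points_xyz, np.ones((len(points_xyz), 1))])  # (N, 4) homogeneous coords
    uvw = homo @ P.T                                               # (N, 3) image-plane coords
    valid = uvw[:, 2] > 0                                          # keep points in front of the camera
    z = np.where(valid, uvw[:, 2], 1.0)                            # avoid dividing by non-positive depth
    u = np.round(uvw[:, 0] / z).astype(np.int64)
    v = np.round(uvw[:, 1] / z).astype(np.int64)
    valid &= (u >= 0) & (u < w) & (v >= 0) & (v < h)               # inside the image
    labels = np.full(len(points_xyz), -1, dtype=np.int64)
    labels[valid] = sem_map[v[valid], u[valid]]                    # semantic class of each projected point
    keep = np.isin(labels, [CAR, PEDESTRIAN, CYCLIST])
    return points_xyz[keep]
```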
Step (3): we build a deep target detection network using the PyTorch deep learning framework. The network comprises three parts: a grid-based point cloud feature extractor, convolutional intermediate extraction layers and a region pre-selection network (RPN).
In the grid point cloud feature extractor, the viewing cone is first partitioned in order by a 3D grid of a set size. All points within each grid cell are taken as input to the grid feature extractor, which consists of a linear layer, a batch normalization layer (BatchNorm) and a nonlinear activation layer (ReLU).
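The sketch below illustrates one possible form of this grid feature extractor; the feature dimensions and the max-pooling over the points of each grid cell are assumptions in the style of voxel feature encoders, since only the three layer types are specified above:

```python
import torch
import torch.nn as nn

class GridFeatureExtractor(nn.Module):
    """Per-grid feature extractor: linear -> BatchNorm -> ReLU, followed by a
    max over the points inside each grid cell (the pooling step is an assumed
    detail; the text above only names the three layers)."""
    def __init__(self, in_dim: int = 4, out_dim: int = 128):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.bn = nn.BatchNorm1d(out_dim)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, pts: torch.Tensor) -> torch.Tensor:
        # pts: (num_grids, max_points_per_grid, in_dim), e.g. (x, y, z, intensity)
        g, p, c = pts.shape
        x = self.linear(pts.reshape(g * p, c))       # point-wise linear layer
        x = self.relu(self.bn(x)).reshape(g, p, -1)  # BatchNorm + ReLU
        return x.max(dim=1).values                   # one feature vector per grid cell
```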
In the convolutional middle layer we use three convolutional middle blocks in order to enlarge the receptive field and gather more context. Each block consists of a 3D convolution layer, a batch normalization layer (BatchNorm) and a nonlinear activation layer (ReLU) connected in sequence. It takes the output of the grid point cloud extractor as input and converts this 3D-structured feature into a 2D pseudo-image feature, which is taken as the final output.
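One possible arrangement of these middle blocks is sketched below; the channel counts and strides are illustrative assumptions, and only the layer ordering (Conv3d, BatchNorm3d, ReLU) and the final reshaping into a 2D pseudo-image follow the description above:

```python
import torch
import torch.nn as nn

class ConvMiddleLayers(nn.Module):
    """Three Conv3d -> BatchNorm3d -> ReLU blocks that shrink the vertical
    axis of the voxelized frustum and reshape the result into a 2D
    pseudo-image; channel counts and strides are illustrative assumptions."""
    def __init__(self, in_ch: int = 128, mid_ch: int = 64):
        super().__init__()
        def block(cin, cout, stride):
            return nn.Sequential(
                nn.Conv3d(cin, cout, kernel_size=3, stride=stride, padding=1),
                nn.BatchNorm3d(cout),
                nn.ReLU(inplace=True),
            )
        self.blocks = nn.Sequential(
            block(in_ch, mid_ch, (2, 1, 1)),   # downsample the height (z) axis
            block(mid_ch, mid_ch, (1, 1, 1)),
            block(mid_ch, mid_ch, (2, 1, 1)),
        )

    def forward(self, voxels: torch.Tensor) -> torch.Tensor:
        # voxels: (B, C, D, H, W) dense grid filled with the per-cell features
        x = self.blocks(voxels)
        b, c, d, h, w = x.shape
        return x.reshape(b, c * d, h, w)       # 2D pseudo-image feature map
```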
The input to the region pre-selection network (RPN) is provided by the convolutional middle layer. The RPN architecture consists of three fully convolutional blocks. Each block contains a downsampling convolution layer followed by two convolution layers matching the feature map size; after each convolution layer we apply BatchNorm and ReLU. We then upsample the output of each block to feature maps of the same size and concatenate these feature maps into one whole. Finally, three 1 × 1 2D convolutional layers are applied to produce the desired learning targets: (1) a probability score map, (2) regression offsets, and (3) a direction prediction.
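A sketch of such an RPN is given below; the channel widths, the number of anchors per location and the use of transposed convolutions for the upsampling step are assumptions, while the three blocks and the three 1 × 1 output heads follow the description above:

```python
import torch
import torch.nn as nn

def conv_bn_relu(cin: int, cout: int, stride: int = 1) -> nn.Sequential:
    return nn.Sequential(nn.Conv2d(cin, cout, 3, stride=stride, padding=1),
                         nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

class RPN(nn.Module):
    """Three blocks of (strided conv + two convs), outputs brought back to a
    common resolution and concatenated, then three 1x1 heads: score map,
    regression offsets (7 box parameters per anchor) and direction."""
    def __init__(self, in_ch: int = 128, ch: int = 128, num_anchors: int = 2):
        super().__init__()
        def stage(cin):
            return nn.Sequential(conv_bn_relu(cin, ch, 2), conv_bn_relu(ch, ch), conv_bn_relu(ch, ch))
        self.block1, self.block2, self.block3 = stage(in_ch), stage(ch), stage(ch)
        self.up1 = nn.ConvTranspose2d(ch, ch, 1, stride=1)  # already at the target size
        self.up2 = nn.ConvTranspose2d(ch, ch, 2, stride=2)
        self.up3 = nn.ConvTranspose2d(ch, ch, 4, stride=4)
        self.cls_head = nn.Conv2d(3 * ch, num_anchors, 1)        # (1) probability score map
        self.reg_head = nn.Conv2d(3 * ch, num_anchors * 7, 1)    # (2) regression offsets
        self.dir_head = nn.Conv2d(3 * ch, num_anchors * 2, 1)    # (3) direction prediction

    def forward(self, x: torch.Tensor):
        x1 = self.block1(x)     # 1/2 resolution
        x2 = self.block2(x1)    # 1/4 resolution
        x3 = self.block3(x2)    # 1/8 resolution
        fused = torch.cat([self.up1(x1), self.up2(x2), self.up3(x3)], dim=1)
        return self.cls_head(fused), self.reg_head(fused), self.dir_head(fused)
```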
Step (4): the point cloud screening process deprives the viewing cones of their original context information. Target point cloud data without such references makes the detection task more difficult, so a dedicated loss function is added to the model to strengthen its sensitivity to the target. The overall loss function Ltotal is as follows:
Ltotal = β1·Lcls + β2·(Lreg_θ + Lreg_other) + β3·Ldir + β4·Lcorner
where Lcls is the classification loss, Lreg_θ is the angle loss of the 3D bounding box, Lreg_other is the correction loss of the remaining 3D bounding box parameters, Ldir is the direction loss, and Lcorner is the vertex-coordinate loss of the 3D bounding box; β1, β2, β3, β4 are hyperparameters, set to 1.0, 2.0, 0.2 and 0.5, respectively.
Lreg_θ and Lreg_other are determined from the following variables:
Δx = (xg − xa)/da, Δy = (yg − ya)/da, Δz = (zg − za)/ha
Δw = log(wg/wa), Δl = log(lg/la), Δh = log(hg/ha)
Δθ = θg − θa
where the subscript g denotes the parameters of the corresponding bounding box provided by the label (xg, yg, zg, wg, lg, hg, θg) and the subscript a denotes the prediction parameters of the anchor (xa, ya, za, wa, la, ha, θa); x, y, z, w, l, h and θ refer respectively to the center coordinates, the length, the width, the height and the top-view heading angle of a three-dimensional bounding box, and da = sqrt((la)² + (wa)²) is the length of the diagonal of the anchor box floor in the top view. For the predicted angle θp, the angle loss Lreg_θ can be expressed as:
Lreg_θ = SmoothL1(sin(θp − Δθ))
The parameter correction loss Lreg_other is the SmoothL1 function of the differences Δx, Δy, Δz, Δw, Δl, Δh, Δθ. The vertex-coordinate loss Lcorner of the 3D bounding box is composed as follows:
Lcorner = Σ(i=1..NS) Σ(j=1..NH) δij · min( Σ(k=1..8) ‖Pk − P*k‖ , Σ(k=1..8) ‖Pk − P**k‖ )
where NS and NH traverse all bounding boxes; P, P* and P** denote the vertices of the predicted bounding box, of the labeled bounding box and of the flipped (direction-reversed) label bounding box, respectively; δij are balancing coefficients. In addition to the above losses based on bounding box prediction, the focal loss is added to address the imbalance between positive and negative anchors in the RPN:
FL(pt) = −αt·(1 − pt)^γ·log(pt)
where pt is the model's estimated probability, αt and γ are hyperparameter adjustment coefficients set to 0.5 and 2, respectively, and log(pt) is the cross-entropy logarithm, whose base may be taken as either e or 10.
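For illustration, the focal loss term and the sine-encoded angle loss can be sketched as follows; the natural logarithm, the elementwise form and the tensor shapes are assumptions made for the example:

```python
import torch
import torch.nn.functional as F

def focal_loss(p_t: torch.Tensor, alpha_t: float = 0.5, gamma: float = 2.0) -> torch.Tensor:
    """FL(pt) = -alpha_t * (1 - pt)^gamma * log(pt); the natural logarithm is
    assumed here, and pt is the model's estimated probability for the true class."""
    return -alpha_t * (1.0 - p_t).pow(gamma) * torch.log(p_t.clamp_min(1e-6))

def angle_loss(theta_p: torch.Tensor, theta_g: torch.Tensor, theta_a: torch.Tensor) -> torch.Tensor:
    """Lreg_θ = SmoothL1(sin(θp − Δθ)) with Δθ = θg − θa, as given in step (4)."""
    delta_theta = theta_g - theta_a
    return F.smooth_l1_loss(torch.sin(theta_p - delta_theta), torch.zeros_like(theta_p))
```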
Step (5): the total objective function is obtained and the algorithm is optimized:
the entire model is trained according to the previous steps 2, 3, 4. We train the 3D target detection network on the KITTI dataset. We trained on a 1080Ti GPU using random gradient descent (SGD) and Adam optimizers. Our model was trained 20 ten thousand times (160 epochs). The initial learning rate was set to 0.0002, the exponential decay factor was 0.8 and decayed every 15 epochs.
Result analysis:
to verify the superiority of the algorithm, we compared the proposed method with several of the most advanced target tests published recently, including MV3D, MV3D (LIDAR), F-PointNet, AVOD, AVOD-FCN and VoxelNet. As shown in tables 1 and 2, our method achieved the best performance in the most difficult target tests. Furthermore, table 3 provides a time-efficient comparison of the respective methods, and our method is also a real-time target detection method considering that it itself has used a 2D semantic segmentation method, which consumes too much time.
The experimental results are as follows:
table 1 counts the AP (%) values of 3D detection on the KITTI dataset.
Table 2 counts the AP (%) value of BEV detection on the KITTI dataset.
Table 3 counts the time(s) required for each method to process a scene over the KITTI dataset.
TABLE 1 AP-value comparison of 3D detection on KITTI data set
[table image]
TABLE 2 AP-value comparison of BEV detection on KITTI data set
[table image]
TABLE 3 comparison of time spent by each method on KITTI data sets
Method:   MV3D   MV3D(LIDAR)   F-PointNet   AVOD   AVOD-FCN   VoxelNet   Ours
Time (s): 0.36   0.24          0.17         0.08   0.10       0.23       0.18
The above description covers only the preferred embodiments of the present invention. It should be noted that various modifications and adaptations will be apparent to those skilled in the art without departing from the principles of the invention, and such modifications are also intended to fall within the scope of the invention.

Claims (7)

1. A 3D target detection algorithm for point cloud screening based on image semantic features, characterized by comprising the following steps:
Step (1): performing semantic segmentation of the two-dimensional image on the image data to obtain a semantic prediction;
Step (2): projecting the semantic prediction into the point cloud space and screening points of specific categories to form viewing cones;
Step (3): building a 3D target detection network and taking the viewing cones as the input of the 3D target detector;
Step (4): enhancing the sensitivity of the loss function to the position of the 3D target frame;
Step (5): obtaining the total objective function and carrying out algorithm optimization.
2. The 3D target detection algorithm for point cloud screening based on image semantic features as claimed in claim 1, wherein the specific method for performing semantic segmentation on the image data in step (1) is as follows:
the images are segmented using the DeepLabv3+ semantic segmentation method: first, the image portion of the training set in the dataset is manually labeled; then DeepLabv3+ is pre-trained for 200 epochs on the Cityscapes dataset and fine-tuned for 50 epochs on the manually labeled semantic label dataset; the resulting semantic segmentation network classifies each pixel in the picture into one of 19 classes.
3. The 3D object detection algorithm for point cloud screening based on image semantic features as claimed in claim 1, wherein in step (2), based on the result predicted by the 2D semantic segmentation method, the region of each category in each image is projected into the LIDAR point cloud space using a known projection matrix, so that the corresponding region of the LIDAR point cloud space carries a category attribute consistent with the image region; points belonging to vehicles, pedestrians and cyclists are then screened and extracted from the original point cloud to form the viewing cones.
4. The 3D object detection algorithm for point cloud screening based on image semantic features as claimed in claim 1, wherein in step (3), a deep object detection network is constructed using PyTorch, and the network comprises three parts: a grid-based point cloud feature extractor, convolutional intermediate extraction layers and a region pre-selection network (RPN):
in the grid point cloud feature extractor, the whole viewing cone is partitioned in order by a 3D grid of a set size, and all points in each grid cell are fed to the grid feature extractor, which consists of a linear layer, a batch normalization layer (BatchNorm) and a nonlinear activation layer (ReLU);
in the convolutional intermediate layer, three convolutional intermediate modules are used, each formed by a 3D convolution layer, a batch normalization layer and a nonlinear activation layer connected in sequence; the output of the grid point cloud extractor is taken as input, and the 3D-structured feature is converted into a 2D pseudo-image feature as output;
the input of the region pre-selection network RPN is provided by the convolutional intermediate layer; the RPN architecture consists of three fully convolutional modules, each containing a downsampling convolution layer followed by two convolution layers matching the feature map size, with BatchNorm and ReLU applied after each convolution layer; the output of each block is then upsampled to feature maps of the same size and these feature maps are concatenated into a whole; finally, three 1 × 1 2D convolutional layers are applied to produce the desired learning targets: (1) a probability score map, (2) regression offsets, and (3) a direction prediction.
5. The 3D target detection algorithm for point cloud screening based on image semantic features as claimed in claim 1, wherein in step (4), an overall loss function Ltotal is added to the model, as follows:
Ltotal = β1·Lcls + β2·(Lreg_θ + Lreg_other) + β3·Ldir + β4·Lcorner
where Lcls is the predicted classification loss, Lreg_θ is the predicted angle loss of the 3D bounding box, Lreg_other is the predicted correction loss of the remaining 3D bounding box parameters, Ldir is the predicted direction loss, and Lcorner is the predicted vertex-coordinate loss of the 3D bounding box; β1, β2, β3, β4 are hyperparameters, set to 1.0, 2.0, 0.2 and 0.5, respectively;
for Lreg_θ and Lreg_other the following variables are used:
Δx = (xg − xa)/da, Δy = (yg − ya)/da, Δz = (zg − za)/ha
Δw = log(wg/wa), Δl = log(lg/la), Δh = log(hg/ha)
Δθ = θg − θa
where the subscript g denotes the parameters of each bounding box provided by the label (xg, yg, zg, wg, lg, hg, θg) and the subscript a denotes the prediction parameters of the anchor (xa, ya, za, wa, la, ha, θa); x, y, z, w, l, h and θ refer respectively to the center coordinates, the length, the width, the height and the top-view heading angle of a three-dimensional bounding box, and da = sqrt((la)² + (wa)²) is the length of the diagonal of the anchor floor; for the predicted angle θp, the angle loss Lreg_θ is expressed as:
Lreg_θ = SmoothL1(sin(θp − Δθ))
the parameter correction loss Lreg_other is the SmoothL1 function of the differences Δx, Δy, Δz, Δw, Δl, Δh, Δθ, while the vertex-coordinate loss Lcorner of the 3D bounding box is composed as follows:
Lcorner = Σ(i=1..NS) Σ(j=1..NH) δij · min( Σ(k=1..8) ‖Pk − P*k‖ , Σ(k=1..8) ‖Pk − P**k‖ )
where NS and NH traverse all bounding boxes; P, P* and P** denote the vertices of the predicted bounding box, of the labeled bounding box and of the flipped (direction-reversed) label bounding box, respectively; δij are balancing coefficients, and i, j are the indices of the targets generated by the final feature map.
6. The 3D target detection algorithm for point cloud screening based on image semantic features as claimed in claim 1, wherein in step (4), the balance between positive and negative anchors is adjusted using the focal loss:
FL(pt) = −αt·(1 − pt)^γ·log(pt)
where pt is the model's estimated probability, and αt and γ are hyperparameter adjustment coefficients, set to 0.5 and 2, respectively.
7. The 3D object detection algorithm for point cloud screening based on image semantic features as claimed in claim 1, wherein in step (5), the whole model is trained according to steps (2), (3) and (4), i.e. the 3D object detection network is trained on the KITTI dataset, with the following specific parameters and implementation: the network is trained for 200,000 iterations (160 epochs) on a 1080Ti GPU using stochastic gradient descent (SGD) and the Adam optimizer; the initial learning rate is set to 0.0002, with an exponential decay factor of 0.8 applied every 15 epochs.
CN202010000186.6A 2020-01-02 2020-01-02 3D target detection method for point cloud screening based on image semantic features Active CN111145174B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010000186.6A CN111145174B (en) 2020-01-02 2020-01-02 3D target detection method for point cloud screening based on image semantic features

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010000186.6A CN111145174B (en) 2020-01-02 2020-01-02 3D target detection method for point cloud screening based on image semantic features

Publications (2)

Publication Number Publication Date
CN111145174A true CN111145174A (en) 2020-05-12
CN111145174B CN111145174B (en) 2022-08-09

Family

ID=70523228

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010000186.6A Active CN111145174B (en) 2020-01-02 2020-01-02 3D target detection method for point cloud screening based on image semantic features

Country Status (1)

Country Link
CN (1) CN111145174B (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112183358A (en) * 2020-09-29 2021-01-05 新石器慧拓(北京)科技有限公司 Training method and device for target detection model
CN112184589A (en) * 2020-09-30 2021-01-05 清华大学 Point cloud intensity completion method and system based on semantic segmentation
CN112200303A (en) * 2020-09-28 2021-01-08 杭州飞步科技有限公司 Laser radar point cloud 3D target detection method based on context-dependent encoder
CN112464905A (en) * 2020-12-17 2021-03-09 湖南大学 3D target detection method and device
CN112541081A (en) * 2020-12-21 2021-03-23 中国人民解放军国防科技大学 Migratory rumor detection method based on field self-adaptation
CN112562093A (en) * 2021-03-01 2021-03-26 湖北亿咖通科技有限公司 Object detection method, electronic medium, and computer storage medium
CN112598635A (en) * 2020-12-18 2021-04-02 武汉大学 Point cloud 3D target detection method based on symmetric point generation
GB2591171A (en) * 2019-11-14 2021-07-21 Motional Ad Llc Sequential fusion for 3D object detection
CN113343886A (en) * 2021-06-23 2021-09-03 贵州大学 Tea leaf identification grading method based on improved capsule network
CN113378760A (en) * 2021-06-25 2021-09-10 北京百度网讯科技有限公司 Training target detection model and method and device for detecting target
CN113984037A (en) * 2021-09-30 2022-01-28 电子科技大学长三角研究院(湖州) Semantic map construction method based on target candidate box in any direction
CN114677568A (en) * 2022-05-30 2022-06-28 山东极视角科技有限公司 Linear target detection method, module and system based on neural network
US11500063B2 (en) 2018-11-08 2022-11-15 Motional Ad Llc Deep learning for object detection using pillars
CN116385452A (en) * 2023-03-20 2023-07-04 广东科学技术职业学院 LiDAR point cloud panorama segmentation method based on polar coordinate BEV graph
CN116912238A (en) * 2023-09-11 2023-10-20 湖北工业大学 Weld joint pipeline identification method and system based on multidimensional identification network cascade fusion

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109523552A (en) * 2018-10-24 2019-03-26 青岛智能产业技术研究院 Three-dimension object detection method based on cone point cloud
US20190108639A1 (en) * 2017-10-09 2019-04-11 The Board Of Trustees Of The Leland Stanford Junior University Systems and Methods for Semantic Segmentation of 3D Point Clouds
CN109784333A (en) * 2019-01-22 2019-05-21 中国科学院自动化研究所 Based on an objective detection method and system for cloud bar power channel characteristics
CN110032962A (en) * 2019-04-03 2019-07-19 腾讯科技(深圳)有限公司 A kind of object detecting method, device, the network equipment and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190108639A1 (en) * 2017-10-09 2019-04-11 The Board Of Trustees Of The Leland Stanford Junior University Systems and Methods for Semantic Segmentation of 3D Point Clouds
CN109523552A (en) * 2018-10-24 2019-03-26 青岛智能产业技术研究院 Three-dimension object detection method based on cone point cloud
CN109784333A (en) * 2019-01-22 2019-05-21 中国科学院自动化研究所 Based on an objective detection method and system for cloud bar power channel characteristics
CN110032962A (en) * 2019-04-03 2019-07-19 腾讯科技(深圳)有限公司 A kind of object detecting method, device, the network equipment and storage medium

Cited By (26)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11500063B2 (en) 2018-11-08 2022-11-15 Motional Ad Llc Deep learning for object detection using pillars
US11214281B2 (en) 2019-11-14 2022-01-04 Motional Ad Llc Sequential fusion for 3D object detection
GB2591171B (en) * 2019-11-14 2023-09-13 Motional Ad Llc Sequential fusion for 3D object detection
US11634155B2 (en) 2019-11-14 2023-04-25 Motional Ad Llc Sequential fusion for 3D object detection
GB2591171A (en) * 2019-11-14 2021-07-21 Motional Ad Llc Sequential fusion for 3D object detection
CN112200303A (en) * 2020-09-28 2021-01-08 杭州飞步科技有限公司 Laser radar point cloud 3D target detection method based on context-dependent encoder
CN112200303B (en) * 2020-09-28 2022-10-21 杭州飞步科技有限公司 Laser radar point cloud 3D target detection method based on context-dependent encoder
CN112183358B (en) * 2020-09-29 2024-04-23 新石器慧通(北京)科技有限公司 Training method and device for target detection model
CN112183358A (en) * 2020-09-29 2021-01-05 新石器慧拓(北京)科技有限公司 Training method and device for target detection model
US11315271B2 (en) 2020-09-30 2022-04-26 Tsinghua University Point cloud intensity completion method and system based on semantic segmentation
CN112184589A (en) * 2020-09-30 2021-01-05 清华大学 Point cloud intensity completion method and system based on semantic segmentation
CN112464905A (en) * 2020-12-17 2021-03-09 湖南大学 3D target detection method and device
CN112464905B (en) * 2020-12-17 2022-07-26 湖南大学 3D target detection method and device
CN112598635A (en) * 2020-12-18 2021-04-02 武汉大学 Point cloud 3D target detection method based on symmetric point generation
CN112598635B (en) * 2020-12-18 2024-03-12 武汉大学 Point cloud 3D target detection method based on symmetric point generation
CN112541081A (en) * 2020-12-21 2021-03-23 中国人民解放军国防科技大学 Migratory rumor detection method based on field self-adaptation
CN112541081B (en) * 2020-12-21 2022-09-16 中国人民解放军国防科技大学 Migratory rumor detection method based on field self-adaptation
CN112562093A (en) * 2021-03-01 2021-03-26 湖北亿咖通科技有限公司 Object detection method, electronic medium, and computer storage medium
CN113343886A (en) * 2021-06-23 2021-09-03 贵州大学 Tea leaf identification grading method based on improved capsule network
CN113378760A (en) * 2021-06-25 2021-09-10 北京百度网讯科技有限公司 Training target detection model and method and device for detecting target
CN113984037B (en) * 2021-09-30 2023-09-12 电子科技大学长三角研究院(湖州) Semantic map construction method based on target candidate frame in any direction
CN113984037A (en) * 2021-09-30 2022-01-28 电子科技大学长三角研究院(湖州) Semantic map construction method based on target candidate box in any direction
CN114677568A (en) * 2022-05-30 2022-06-28 山东极视角科技有限公司 Linear target detection method, module and system based on neural network
CN116385452A (en) * 2023-03-20 2023-07-04 广东科学技术职业学院 LiDAR point cloud panorama segmentation method based on polar coordinate BEV graph
CN116912238A (en) * 2023-09-11 2023-10-20 湖北工业大学 Weld joint pipeline identification method and system based on multidimensional identification network cascade fusion
CN116912238B (en) * 2023-09-11 2023-11-28 湖北工业大学 Weld joint pipeline identification method and system based on multidimensional identification network cascade fusion

Also Published As

Publication number Publication date
CN111145174B (en) 2022-08-09

Similar Documents

Publication Publication Date Title
CN111145174B (en) 3D target detection method for point cloud screening based on image semantic features
CN111598030B (en) Method and system for detecting and segmenting vehicle in aerial image
CN108647585B (en) Traffic identifier detection method based on multi-scale circulation attention network
CN109886066B (en) Rapid target detection method based on multi-scale and multi-layer feature fusion
CN111832655B (en) Multi-scale three-dimensional target detection method based on characteristic pyramid network
CN111640125B (en) Aerial photography graph building detection and segmentation method and device based on Mask R-CNN
CN111461212B (en) Compression method for point cloud target detection model
CN111695514B (en) Vehicle detection method in foggy days based on deep learning
CN110084817B (en) Digital elevation model production method based on deep learning
CN113160062B (en) Infrared image target detection method, device, equipment and storage medium
CN110909623B (en) Three-dimensional target detection method and three-dimensional target detector
CN104134234A (en) Full-automatic three-dimensional scene construction method based on single image
CN112560675B (en) Bird visual target detection method combining YOLO and rotation-fusion strategy
CN109583483A (en) A kind of object detection method and system based on convolutional neural networks
CN109801297B (en) Image panorama segmentation prediction optimization method based on convolution
CN111640116B (en) Aerial photography graph building segmentation method and device based on deep convolutional residual error network
CN113191204B (en) Multi-scale blocking pedestrian detection method and system
CN111738206A (en) Excavator detection method for unmanned aerial vehicle inspection based on CenterNet
CN114463736A (en) Multi-target detection method and device based on multi-mode information fusion
CN114519819B (en) Remote sensing image target detection method based on global context awareness
CN115424017B (en) Building inner and outer contour segmentation method, device and storage medium
CN108074232A (en) A kind of airborne LIDAR based on volume elements segmentation builds object detecting method
CN111738114A (en) Vehicle target detection method based on anchor-free accurate sampling remote sensing image
EP4174792A1 (en) Method for scene understanding and semantic analysis of objects
CN115115917A (en) 3D point cloud target detection method based on attention mechanism and image feature fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 210000, 66 new model street, Gulou District, Jiangsu, Nanjing

Applicant after: NANJING University OF POSTS AND TELECOMMUNICATIONS

Address before: Yuen Road Qixia District of Nanjing City, Jiangsu Province, No. 9 210000

Applicant before: NANJING University OF POSTS AND TELECOMMUNICATIONS

GR01 Patent grant