CN114882224B - Model structure, model training method, singulation method, device and medium - Google Patents

Model structure, model training method, singulation method, device and medium Download PDF

Info

Publication number
CN114882224B
CN114882224B (application CN202210629730.2A)
Authority
CN
China
Prior art keywords
feature
feature vector
point cloud
ground
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210629730.2A
Other languages
Chinese (zh)
Other versions
CN114882224A
Inventor
谭可成
刘昊
何维
刘承照
许强红
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
PowerChina Zhongnan Engineering Corp Ltd
Original Assignee
PowerChina Zhongnan Engineering Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by PowerChina Zhongnan Engineering Corp Ltd filed Critical PowerChina Zhongnan Engineering Corp Ltd
Priority to CN202210629730.2A
Publication of CN114882224A
Application granted
Publication of CN114882224B
Legal status: Active
Anticipated expiration

Classifications

    • G06V 10/267: Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06F 18/2415: Classification techniques relating to the classification model, based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N 3/047: Probabilistic or stochastic networks
    • G06N 3/048: Activation functions
    • G06N 3/08: Learning methods
    • G06T 7/73: Determining position or orientation of objects or cameras using feature-based methods
    • G06V 10/42: Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • G06V 10/764: Image or video recognition or understanding using pattern recognition or machine learning, using classification, e.g. of video objects
    • G06V 10/806: Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
    • G06V 10/82: Image or video recognition or understanding using neural networks
    • G06T 2207/10028: Range image; Depth image; 3D point clouds
    • G06T 2207/20081: Training; Learning
    • G06T 2207/20084: Artificial neural networks [ANN]
    • G06T 2207/20132: Image cropping

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Medical Informatics (AREA)
  • Molecular Biology (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a model structure, a model training method, a singulation method, a device and a medium. The model training method comprises: obtaining original three-dimensional point cloud data of ground objects in a large scene; converting the original three-dimensional point cloud data into a standard sample format file; preprocessing the point cloud samples in the standard sample format file to generate a PKL-format sample file; constructing a large-scene ground object singulation model comprising an encoding module, a backbone network, a target generation module, a feature fusion module, a Point-RoIAlign module and an instance prediction network; and training the large-scene ground object singulation model with the point cloud samples in the PKL-format sample file to obtain a trained model. The invention predicts individual ground objects by minimizing a matching cost function and achieves the final ground object segmentation through point mask prediction, thereby effectively avoiding the shortcomings of traditional processing means such as clustering.

Description

Model structure, model training method, singulation method, device and medium
Technical Field
The invention belongs to the field of computer vision, and particularly relates to a model structure, a training method for a large-scene ground object singulation model, a large-scene ground object singulation method, an electronic device and a computer-readable storage medium.
Background
Oblique-photography three-dimensional modeling has become an important means of large-area, large-scene three-dimensional reconstruction because of its high efficiency, strong realism and low production cost. Limited by its data structure, however, an oblique-photography three-dimensional model does not allow individual ground objects to be selected independently, which reduces the value and practicality of the model data. Taking land statistics as an example, the areas of houses, farmland, forest land and the like are mainly obtained either by large numbers of field investigators through on-site visits and measurements or by manual sketching on satellite images, both of which are extremely laborious. An oblique-photography singulation technique is therefore a bottleneck that needs to be broken through.
Ground object recognition based on remote sensing images suffers from several problems: only the eave-level roof area of a house can be extracted, cement roofs are difficult to distinguish from cement ground, and roofs may be occluded by trees. Moreover, a two-dimensional image contains only RGB colour information and cannot be linked to the three-dimensional model in use.
Compared with two-dimensional images, a three-dimensional point cloud carries richer spatial structure information and is more advantageous for capturing local detail features in the oblique-photography singulation process. With the application of deep learning to three-dimensional point clouds, singulation based on point cloud data has become a new way of approaching the problem.
The patent application with publication number CN113822914A, entitled "Oblique photogrammetry model singulation method, computer device, product, and medium", realizes large-scene ground object singulation of three-dimensional point clouds by clustering. In practice, however, it is very difficult to cluster a point cloud directly into multiple instance objects, for the following reasons:
(1) A point cloud typically contains a very large number of points, so clustering is extremely slow;
(2) The number of instances varies greatly between different 3D scenes, and a clustering algorithm cannot adapt its parameters automatically;
(3) The scale difference between instances is significant: ground objects of the same class may be very small or very large, and a clustering algorithm struggles to extract each instance in its entirety;
(4) Each point carries only very weak features, namely its 3D coordinates and colour, leaving a huge semantic gap between a point and the definition of an instance.
As a result, such a singulation method tends to over-segment or under-segment large-scene ground objects, and the technical route is too idealized to be applied in practice.
Disclosure of Invention
The invention aims to provide a model structure, a model training method, a singulation method, a device and a medium, so as to solve the difficulty of singulating and segmenting small target ground objects in large three-dimensional point cloud scenes, as well as the low efficiency and poor accuracy of clustering algorithms for ground object singulation in large scenes.
The invention solves the technical problems by the following technical scheme: a structure of a model, comprising:
the encoding module is used for encoding the large scene ground feature point cloud in the PKL format into an input vector;
the backbone network is used for extracting the characteristics of the input vector to obtain a first characteristic vector;
the target generation module is used for carrying out feature extraction on the first feature vector to obtain a global feature vector, and carrying out feature extraction on the global feature vector to obtain a second feature vector; calculating the second feature vector to obtain a third feature vector, and carrying out normalization processing on each element in the third feature vector to obtain the confidence score of each candidate frame; calculating the second feature vector to obtain a fifth feature vector, wherein each (1, 6) dimension of the fifth feature vector represents a maximum coordinate point and a minimum coordinate point of a candidate frame; splicing the maximum coordinate point, the minimum coordinate point and the corresponding confidence score of the candidate frame to obtain a parameter vector of the candidate frame;
the feature fusion module is used for extracting features of the first feature vector to obtain a sixth feature vector; splicing the sixth feature vector and the global feature vector, and extracting features to obtain an eighth feature vector;
the Point-RoIAlign module is used for carrying out coordinate mapping processing on the parameter vector and the eighth feature vector of the candidate frame to obtain a Point cloud set corresponding to each candidate frame;
and the example prediction network is used for outputting a prediction Point cloud set of a single ground feature according to the Point cloud set of each candidate frame output by the Point-RoIAlign module.
Further, the backbone network adopts a RandLA-Net structure.
Further, the target generation module comprises a first feature extraction layer, a second feature extraction layer, a prediction branch, a regression branch and a splicing layer;
the first feature extraction layer comprises 1 MLP layer, and the first feature extraction layer performs feature extraction on the first feature vector by using the 1 MLP layer to obtain a global feature vector;
the second feature extraction layer comprises 2 MLP layers, and the second feature extraction layer performs feature extraction on the global feature vector by using the 2 MLP layers to obtain a second feature vector;
the prediction branch comprises a first full-connection layer and a first activation layer, the second feature vector is calculated through the first full-connection layer to obtain a third feature vector, and each element in the third feature vector is normalized through the first activation layer to obtain the confidence score of each candidate frame;
the regression branch comprises a second full-connection layer, and the second feature vector is calculated through the second full-connection layer to obtain a fifth feature vector;
and the splicing layer is used for splicing the maximum coordinate point, the minimum coordinate point and the corresponding confidence score of the candidate frame to obtain the parameter vector of the candidate frame.
Further, the feature fusion module comprises a third feature extraction layer, a splicing layer and a fourth feature extraction layer;
the third feature extraction layer comprises 2 MLP layers, the third feature extraction layer performs feature extraction on the first feature vector by using 1 MLP layer to obtain a point feature vector, and then performs feature extraction on the point feature vector by using another 1 MLP layer to obtain a sixth feature vector;
the splicing layer is used for splicing the sixth feature vector and the global feature vector to obtain a seventh feature vector;
the fourth feature extraction layer comprises 2 MLP layers, and the fourth feature extraction layer performs depth feature extraction on the seventh feature vector by using the 2 MLP layers to obtain an eighth feature vector.
Further, the instance prediction network comprises a fifth feature extraction layer, a Mask prediction branch and an instance output layer;
the fifth feature extraction layer adopts a PointNet network structure, and performs feature extraction on the Point cloud set of the candidate frame output by the Point-RoIAlign module by using the PointNet network structure to obtain a ninth feature vector;
the Mask prediction branch comprises an MLPs layer and a second activation layer, and the ninth feature vector is calculated through the MLPs layer and the second activation layer to obtain a prediction Mask of the ground object;
the example output layer is configured to reject noise points in the ninth feature vector by using a prediction mask to obtain a tenth feature vector; and calculating the tenth feature vector through the MLPs layer and the third activation layer to obtain the confidence score of each ground feature, selecting the category with the highest confidence score as the prediction category of the ground feature, and outputting the prediction point cloud set of the ground features of different categories.
The invention also provides a large-scene ground feature monomer model training method, which comprises the following steps:
acquiring original three-dimensional point cloud data of a large-scene ground object;
manufacturing the original three-dimensional point cloud data into a standard sample format file;
preprocessing the point cloud sample in the standard sample format file to generate a PKL format sample file;
constructing a large-scene ground feature monomerization model, wherein the large-scene ground feature monomerization model comprises:
the encoding module is used for encoding the large scene ground feature point cloud in the PKL format into an input vector;
the backbone network is used for extracting the characteristics of the input vector to obtain a first characteristic vector;
the target generation module is used for carrying out feature extraction on the first feature vector to obtain a global feature vector, and carrying out feature extraction on the global feature vector to obtain a second feature vector; calculating the second feature vector to obtain a third feature vector, and carrying out normalization processing on each element in the third feature vector to obtain the confidence score of each candidate frame; calculating the second feature vector to obtain a fifth feature vector, wherein each (1, 6) dimension of the fifth feature vector represents a maximum coordinate point and a minimum coordinate point of a candidate frame; splicing the maximum coordinate point, the minimum coordinate point and the corresponding confidence score of the candidate frame to obtain a parameter vector of the candidate frame;
the feature fusion module is used for extracting features of the first feature vector to obtain a sixth feature vector; splicing the sixth feature vector and the global feature vector, and extracting features to obtain an eighth feature vector;
the Point-RoIAlign module is used for carrying out coordinate mapping processing on the parameter vector and the eighth feature vector of the candidate frame to obtain a Point cloud set corresponding to each candidate frame;
an example prediction network, configured to output a prediction Point cloud set of a single ground feature according to a Point cloud set of each candidate frame output by the Point-RoIAlign module;
and training the large-scene ground feature monomerization model by using the point cloud sample in the PKL format sample file to obtain a trained large-scene ground feature monomerization model.
Further, the specific implementation process of manufacturing the original three-dimensional point cloud data into the standard sample format file comprises the following steps:
importing the original three-dimensional point cloud data into CloudCompare software, and manually dividing each real ground object by utilizing a cutting function of the CloudCompare software;
labeling each real ground object with a classification label mask, merging all real ground objects carrying their classification label masks, and exporting a txt-format point cloud file;
and converting the txt-format point cloud file into the Semantic3D data set format to obtain the standard sample format file.
Further, the specific implementation process of preprocessing the point cloud sample of the standard sample format file is as follows:
performing grid sampling on the point cloud samples in the standard sample format file;
and carrying out normalization processing on the sampled sample data, and establishing a data index structure on the sample data subjected to normalization processing by using a Kd tree algorithm to generate a PKL format sample file.
Further, the specific implementation process for training the large-scene ground feature monomerization model comprises the following steps:
constructing an objective function and solving for an optimal matching index matrix, the objective function being expressed as:

$$A^{*}=\arg\min_{A}\sum_{i=1}^{H}\sum_{j=1}^{T}A_{ij}C_{ij},\qquad \text{s.t.}\;\sum_{i=1}^{H}A_{ij}=1\;(j=1,\dots,T),\;\;\sum_{j=1}^{T}A_{ij}\leq 1\;(i=1,\dots,H)$$

wherein A is the optimal allocation index matrix, H is the number of candidate frames, T is the number of bounding boxes of real ground objects, A_ij is the matching coefficient of the i-th candidate frame and the j-th bounding box (A_ij = 1 means the i-th candidate frame is associated with the j-th bounding box and A_ij = 0 means it is not), and C_ij is the association cost of assigning the i-th candidate frame to the j-th bounding box;
searching for the corresponding candidate frame of each bounding box according to the optimal matching index matrix, obtaining T candidate frames matched one-to-one with the T bounding boxes;
performing parameter optimization on the T candidate frames through a loss function so that the coordinate values of each candidate frame approach the coordinate values of the bounding box it matches, the loss function being:

$$l_{box}=\sum_{t=1}^{T}C_{tt}$$

wherein C_tt is the association cost of assigning the t-th matched candidate frame to the t-th bounding box;
optimizing the confidence scores of the T matched candidate frames with a confidence-score optimizing function so that they approach 1, and setting the confidence scores of the remaining H-T candidate frames to 0, wherein B_s^t denotes the confidence score assigned to the t-th matched candidate frame;
training the prediction mask according to the prediction mask calculated by the instance prediction network and the classification label mask to obtain a trained mask, the prediction-mask training loss function being:

$$L_{mask}=\frac{1}{N_{ins}}\sum_{i=1}^{N_{ins}}\operatorname{sign}(iou_{i}>0.5)\cdot\frac{1}{N_{i}}\sum_{j=1}^{N_{i}}\Big[-y_{j}\log\hat{y}_{j}-(1-y_{j})\log\big(1-\hat{y}_{j}\big)\Big]$$

wherein N_ins is the number of ground object instances, N_i is the number of points of the i-th ground object instance, iou_i is the intersection-over-union of the i-th ground object instance, L_mask is the mask loss value, y_j is the label of a point in the ground object instance (1 for a positive label, 0 for a negative label), and ŷ_j is the predicted probability that the point is a positive label of the ground object instance; sign() is an indicator function with sign(iou_i > 0.5) = 1 when iou_i > 0.5 and sign(iou_i > 0.5) = 0 when iou_i ≤ 0.5;
And removing noise points by using the trained mask, calculating the confidence score of the ground object, selecting the category with the highest confidence score as the prediction category of the ground object, and outputting the prediction point cloud set of the ground objects of different categories.
The invention also provides a large-scene ground feature monomerization method, which comprises the following steps:
acquiring original three-dimensional point cloud data of a ground object of a target scene;
converting and preprocessing the original three-dimensional point cloud data to generate a PKL format file;
classifying and predicting the point clouds in the PKL format file by using the large-scene ground feature monomerization model trained by the large-scene ground feature monomerization model training method to obtain a classification label of each point cloud;
and outputting a point cloud set of a single ground object according to the classification label of each point cloud, so as to realize the ground object individualization.
Further, the specific implementation process of converting and preprocessing the original three-dimensional point cloud data is as follows:
converting the original three-dimensional point cloud data into the Semantic3D data set format;
performing grid sampling and normalization on the three-dimensional point cloud data in Semantic3D data set format, building an index structure on the normalized data with a Kd-tree algorithm, and generating the PKL-format file.
The invention also provides electronic equipment, which comprises a memory and a processor, wherein the memory stores a computer program capable of running on the processor, and the processor executes the steps of the large-scene ground feature monomalization model training method when running the computer program.
The present invention also provides a computer-readable storage medium, which is a non-volatile storage medium or a non-transitory storage medium, having stored thereon a computer program which, when executed by a processor, performs the steps of the large-scene ground feature monomerization model training method as described above.
Advantageous effects
Compared with the prior art, the invention has the advantages that:
according to the model structure, the model training method, the singulation method, the device and the medium, the single ground feature is predicted by minimizing the matching cost function, and the final ground feature segmentation is realized by the point mask prediction, so that defects of traditional processing means such as clustering are effectively eliminated, and compared with the traditional means, the method has higher precision and efficiency.
Drawings
In order to more clearly illustrate the technical solutions of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawing in the description below is only one embodiment of the present invention, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a large scene ground feature monomer model training method in an embodiment of the invention;
FIG. 2 is a network structure diagram of a large scene ground feature monomer model in an embodiment of the invention;
FIG. 3 is a network configuration diagram of a target generation module in an embodiment of the invention;
FIG. 4 is a diagram of an example predictive network architecture in an embodiment of the invention;
FIG. 5 is original three-dimensional point cloud data for scene one in an embodiment of the invention;
FIG. 6 is original three-dimensional point cloud data for scene two in an embodiment of the invention;
FIG. 7 is a diagram showing the recognition result of scene one by the method according to the embodiment of the present invention;
FIG. 8 is a diagram showing the recognition result of scene two by the method according to the embodiment of the present invention;
FIG. 9 is an enlarged view of the recognition result of scene two in the embodiment of the present invention;
fig. 10 is an enlarged view of the connected object recognition result of scene two in the embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made more apparent and fully by reference to the accompanying drawings, in which it is shown, however, only some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The technical scheme of the present application is described in detail below with specific examples. The following embodiments may be combined with each other, and some embodiments may not be repeated for the same or similar concepts or processes.
As shown in fig. 1, the method for training the large-scene ground feature monomerized model provided by the embodiment of the invention comprises the following steps:
S1, acquiring the original three-dimensional point cloud data of the large-scene ground objects.
S2, converting the original three-dimensional point cloud data obtained in step S1 into a standard sample format file.
The original three-dimensional point cloud data are in PLY format; each point carries six-dimensional information (x, y, z, r, g, b), where x, y, z are the three-dimensional coordinates of the point and r, g, b its RGB information. To train a model with the original three-dimensional point cloud data, a standard sample (namely a sample of singulated ground objects) is first produced from the original three-dimensional point cloud data, as follows:
S11, importing the original three-dimensional point cloud data into CloudCompare software and manually segmenting each real ground object with the cutting function of CloudCompare, i.e. drawing a bounding box with the cutting tool and cutting out each real ground object;
S12, labeling each real ground object with a classification label mask, merging all real ground objects carrying their classification label masks, and exporting a txt-format point cloud file;
S13, extracting the first 7 columns of the txt-format point cloud file, storing columns 1-6 in a txt-format point cloud data file and the 7th column in a txt-format label file, i.e. converting the txt-format point cloud file into the Semantic3D data set format to obtain the standard sample format file. Each row of the txt-format point cloud file represents one point and has N columns; the first 7 columns are x, y, z, r, g, b, label, where label is the class label, represented by the numbers 1 to N.
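Purely by way of illustration (and not as part of the claimed method), the column split of step S13 can be sketched as follows, assuming the merged txt export carries the point fields in its first seven columns; the function and file names are hypothetical.

```python
# Hedged sketch of step S13: split an exported txt point cloud (x, y, z, r, g, b, label, ...)
# into a Semantic3D-style pair of files. File names are illustrative only.
import numpy as np

def txt_to_semantic3d(src_txt, points_txt, labels_txt):
    data = np.loadtxt(src_txt)             # one point per row, first 7 columns used
    points = data[:, 0:6]                   # x, y, z, r, g, b
    labels = data[:, 6].astype(np.int64)    # class label 1..N
    np.savetxt(points_txt, points, fmt="%.6f")
    np.savetxt(labels_txt, labels, fmt="%d")

# txt_to_semantic3d("scene_merged.txt", "scene_points.txt", "scene_labels.txt")
```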
S3, preprocessing the point cloud samples in the standard sample format file of step S2 to generate a PKL-format sample file that meets the input data format required by the model, as follows:
S31, performing grid sampling on the point cloud samples in the standard sample format file; in this embodiment the sampling rate is set to 0.06;
S32, normalizing the sampled sample data, building a data index structure on the normalized sample data with a Kd-tree algorithm, and generating the PKL-format sample file.
In this embodiment, processing the normalized sample data with the Kd-tree algorithm and generating the PKL-format sample file follow the prior art; see Hu, Qingyong, et al. "RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.
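The preprocessing of steps S31 and S32 can be illustrated with the rough sketch below; interpreting the 0.06 sampling rate as a voxel size, the centring-based normalisation and the use of scikit-learn's KDTree are assumptions made for illustration only.

```python
# Minimal preprocessing sketch: voxel-grid sampling, simple normalisation, KD-tree index, PKL dump.
import pickle
import numpy as np
from sklearn.neighbors import KDTree

def preprocess(points, colors, labels, grid=0.06, out_pkl="sample.pkl"):
    # keep one point per occupied voxel of size `grid` (assumed meaning of the 0.06 sampling rate)
    voxel = np.floor(points / grid).astype(np.int64)
    _, keep = np.unique(voxel, axis=0, return_index=True)
    pts, cols, labs = points[keep], colors[keep], labels[keep]

    pts = pts - pts.mean(axis=0)            # centre the coordinates (normalisation choice assumed)
    cols = cols / 255.0                     # scale RGB to [0, 1]

    tree = KDTree(pts)                      # spatial index for neighbour queries
    with open(out_pkl, "wb") as f:
        pickle.dump({"points": pts, "colors": cols, "labels": labs, "kdtree": tree}, f)

preprocess(np.random.rand(1000, 3),
           np.random.randint(0, 256, (1000, 3)),
           np.random.randint(1, 5, 1000))
```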
S4, constructing a large-scene ground feature monomerization model
As shown in fig. 2, the structure of the large-scene ground feature monomerization model includes an encoding module, a backbone network, a target generation module, a feature fusion module, a Point-RoIAlign module, and an instance prediction network.
The encoding module encodes the large-scene ground feature point cloud samples in PKL format into input vectors (N, d), where N is the number of points and d is the feature dimension of each point; in this embodiment d is 6, i.e. (x, y, z, r, g, b) for each point.
The backbone network adopts the RandLA-Net structure (see Hu, Qingyong, et al. "RandLA-Net: Efficient Semantic Segmentation of Large-Scale Point Clouds." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020) to extract point cloud features; its random sampling strategy and feature aggregation module make it well suited to feature extraction from large-scale point cloud data. The backbone network performs feature extraction on the input vector (N, 6) to obtain a first feature vector (N/256, 512).
As shown in fig. 3, the target generation module comprises 3 MLP layers, a prediction branch, a regression branch and a splicing layer. The target generation module first performs feature extraction on the first feature vector (N/256, 512) with 1 MLP layer to obtain a global feature vector (1, k), a one-dimensional vector of 1 x k in which k is the feature dimension determined by the structure of the MLP layer. It then performs feature extraction on the global feature vector (1, k) with 2 MLP layers to obtain a second feature vector (1, 256), which is fed into both the prediction branch and the regression branch: the prediction branch predicts the confidence score B_s^i of each candidate frame, i.e. the confidence score of a single predicted ground object, and the regression branch predicts the maximum and minimum coordinate points that determine the extent of the corresponding candidate frame. The prediction branch comprises a fully connected layer and an activation layer: the second feature vector (1, 256) is passed through the fully connected layer fc to obtain a third feature vector (1, H), where H is the number of candidate frames (i.e. the number of predicted ground objects), and each element of the third feature vector (1, H) is normalized to the interval [0, 1] by the sigmoid activation layer to give the confidence score B_s^i of each candidate frame. The regression branch comprises a fully connected layer: the second feature vector (1, 256) is passed through the fully connected layer fc to obtain a fifth feature vector (1, 6H), each (1, 6) slice of which represents the maximum and minimum coordinate points of one candidate frame. The splicing layer concatenates the maximum coordinate point, the minimum coordinate point and the corresponding confidence score of each candidate frame to obtain the candidate frame parameter vector (x_max^i, y_max^i, z_max^i, x_min^i, y_min^i, z_min^i, B_s^i), where (x_max^i, y_max^i, z_max^i) are the coordinates of the maximum coordinate point of the candidate frame and (x_min^i, y_min^i, z_min^i) the coordinates of its minimum coordinate point.
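For illustration only, the target generation module described above might be sketched in PyTorch roughly as follows; the global max pooling used to collapse the (N/256, 512) point features into the (1, k) global feature vector, the layer widths and the number of candidate boxes H are assumptions, not taken from the embodiment.

```python
# Hedged sketch of the target generation module (shapes follow the text; pooling is assumed).
import torch
import torch.nn as nn

class TargetGeneration(nn.Module):
    def __init__(self, in_dim=512, k=256, num_boxes=64):   # H = num_boxes (illustrative)
        super().__init__()
        self.mlp_global = nn.Sequential(nn.Linear(in_dim, k), nn.LeakyReLU())
        self.mlp_second = nn.Sequential(nn.Linear(k, 256), nn.LeakyReLU(),
                                        nn.Linear(256, 256), nn.LeakyReLU())
        self.score_fc = nn.Linear(256, num_boxes)           # prediction branch
        self.box_fc = nn.Linear(256, 6 * num_boxes)         # regression branch

    def forward(self, feats):                                # feats: (N/256, 512)
        global_feat = self.mlp_global(feats).max(dim=0, keepdim=True)[0]   # (1, k), pooling assumed
        second = self.mlp_second(global_feat)                              # (1, 256)
        scores = torch.sigmoid(self.score_fc(second))                      # (1, H) confidence scores
        boxes = self.box_fc(second).view(-1, 6)                            # (H, 6): max/min corners
        params = torch.cat([boxes, scores.view(-1, 1)], dim=1)             # (H, 7) candidate parameters
        return params, global_feat
```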
The feature fusion module comprises 5 MLP layers and a splicing layer, wherein the feature extraction is firstly carried out on a first feature vector (N/256, 512) by using 1 MLP layer to obtain a point feature vector (N/256, k), and then the feature extraction is carried out on the point feature vector (N/256, k) by using 1 MLP layer to obtain a sixth feature vector (N/256,256); and splicing the sixth feature vector (N/256,256) and the global feature vector (1, k) (k=256 in the embodiment) by using a splicing layer to obtain a seventh feature vector (N/256, 512), and performing deep feature extraction on the seventh feature vector (N/256, 512) through two MLPs to obtain an eighth feature vector (N/256,128).
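A corresponding hedged sketch of the feature fusion module, under the same assumptions about layer widths, is given below; the global feature vector is simply broadcast to every point before concatenation.

```python
# Sketch of the feature fusion module: point MLPs, concatenation with the global feature, reduction to 128 dims.
import torch
import torch.nn as nn

class FeatureFusion(nn.Module):
    def __init__(self, in_dim=512, k=256):
        super().__init__()
        self.point_mlp = nn.Sequential(nn.Linear(in_dim, k), nn.LeakyReLU(),
                                       nn.Linear(k, 256), nn.LeakyReLU())
        self.fuse_mlp = nn.Sequential(nn.Linear(256 + k, 256), nn.LeakyReLU(),
                                      nn.Linear(256, 128), nn.LeakyReLU())

    def forward(self, feats, global_feat):                   # feats: (M, 512), global_feat: (1, k)
        point_feat = self.point_mlp(feats)                   # sixth feature vector (M, 256)
        rep = global_feat.expand(feats.shape[0], -1)         # broadcast global feature to every point
        fused = torch.cat([point_feat, rep], dim=1)          # seventh feature vector (M, 256 + k)
        return self.fuse_mlp(fused)                          # eighth feature vector (M, 128)
```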
The Point-RoIAlign module performs coordinate mapping on the candidate frame parameter vectors and the eighth feature vector (N/256, 128) to obtain the point cloud set of each candidate frame, i.e. the point cloud set of each predicted ground object.
In this example, the coordinate mapping process is prior art, see Li Yi, "GSPN: generative Shape Proposal Network for 3D Instance Segmentation in Point Cloud",2019IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp.3942-3951.
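The idea behind the coordinate mapping can be illustrated with the following simplified sketch, which merely gathers the points (and their fused features) lying inside each candidate box; the fixed-size resampling used in the cited GSPN work is omitted here.

```python
# Illustrative crop of points inside each candidate box (max corner, min corner, score).
import numpy as np

def point_roi_crop(coords, feats, boxes):
    """coords: (M, 3); feats: (M, 128); boxes: (H, 7) = (max_xyz, min_xyz, score)."""
    crops = []
    for box in boxes:
        p_max, p_min = box[0:3], box[3:6]
        inside = np.all((coords <= p_max) & (coords >= p_min), axis=1)
        crops.append((coords[inside], feats[inside]))
    return crops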
As shown in fig. 4, the instance prediction network comprises a fifth feature extraction layer, a Mask prediction branch and an instance output layer. The fifth feature extraction layer adopts the PointNet network structure (see Qi C. R., Su H., Mo K., et al. "PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation" [C]) and uses it to perform feature extraction on the point cloud set (N, 6) of each candidate frame output by the Point-RoIAlign module, obtaining a ninth feature vector (N, 256). The Mask prediction branch comprises an MLPs layer and a sigmoid activation layer: the ninth feature vector (N, 256) is passed through the MLPs layer and the second activation layer to obtain the prediction mask of the ground object. The instance output layer performs a Hadamard product between the prediction mask and the ninth feature vector (N, 256) to remove noise points and obtain a tenth feature vector (N1, 256); finally, the tenth feature vector (N1, 256) is passed through an MLPs layer and a sigmoid activation layer to obtain the confidence score of each ground object class, the class with the highest confidence score is selected as the predicted class of the ground object, and the prediction point cloud sets of ground objects of different classes are output.
Each MLP layer comprises a plurality of fully connected layers fc and an activation function LRelu, and each MLPs layer comprises a plurality of fully connected layers fc.
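For illustration, the instance prediction network might be sketched roughly as follows; the per-point feature extractor below is only a stand-in for the PointNet structure, and the pooling before classification and the 0.5 mask threshold are assumptions.

```python
# Hedged sketch of the instance prediction head: per-point features, mask branch, class branch.
import torch
import torch.nn as nn

class InstanceHead(nn.Module):
    def __init__(self, num_classes=5):
        super().__init__()
        self.point_net = nn.Sequential(nn.Linear(6, 128), nn.LeakyReLU(),
                                       nn.Linear(128, 256), nn.LeakyReLU())   # PointNet stand-in
        self.mask_branch = nn.Sequential(nn.Linear(256, 64), nn.LeakyReLU(),
                                         nn.Linear(64, 1))
        self.cls_branch = nn.Sequential(nn.Linear(256, 64), nn.LeakyReLU(),
                                        nn.Linear(64, num_classes))

    def forward(self, roi_points):                        # roi_points: (N, 6) for one candidate box
        feat = self.point_net(roi_points)                  # ninth feature vector (N, 256)
        mask = torch.sigmoid(self.mask_branch(feat))       # per-point foreground probability (N, 1)
        kept = feat * (mask > 0.5)                         # Hadamard-style suppression of noise points
        logits = self.cls_branch(kept.max(dim=0)[0])       # pool kept features, then classify
        scores = torch.sigmoid(logits)                     # per-class confidence scores
        return mask.squeeze(1), scores
```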
And S5, training the large-scene ground feature monomerization model by using the point cloud sample in the PKL format sample file to obtain a trained large-scene ground feature monomerization model.
Training the large-scene ground feature monomerization model involves optimizing and adjusting the candidate frames, optimizing the instance masks and optimizing the instance class confidence, so that each candidate frame predicted by the model is correspondingly associated with a bounding box of a real ground object; that is, the training process is converted into an optimal matching problem. Let A be a binary matching index matrix, A = {A_ij | i = 1, 2, 3, ..., H; j = 1, 2, 3, ..., T}, where H is the number of candidate frames and T is the number of bounding boxes of real ground objects; A_ij = 1 if and only if the i-th candidate frame is assigned to (i.e. associated or matched with) the j-th bounding box, and A_ij = 0 otherwise. Let C be an association cost matrix, C = {C_ij | i = 1, 2, 3, ..., H; j = 1, 2, 3, ..., T}, in which each element C_ij corresponds one-to-one with A_ij and is the association cost of assigning the i-th candidate frame to the j-th bounding box. The closer the i-th candidate frame is to the j-th bounding box, the smaller C_ij becomes; the association cost is computed from the distances between the maximum coordinate point p_max^i and minimum coordinate point p_min^i of the candidate frame and the maximum coordinate point g_max^j and minimum coordinate point g_min^j of the bounding box.
The optimal matching problem between candidate frames and bounding boxes is thus converted into finding the optimal matching index matrix A with the minimum total association cost, so the constructed objective function is:

$$A^{*}=\arg\min_{A}\sum_{i=1}^{H}\sum_{j=1}^{T}A_{ij}C_{ij},\qquad \text{s.t.}\;\sum_{i=1}^{H}A_{ij}=1\;(j=1,\dots,T),\;\;\sum_{j=1}^{T}A_{ij}\leq 1\;(i=1,\dots,H)$$

wherein the first constraint indicates that every bounding box must have exactly one candidate frame associated or matched with it, and the second indicates that some candidate frames may remain unassociated with any bounding box.
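The assignment between candidate boxes and real bounding boxes can be illustrated with the sketch below, which builds a cost matrix from corner distances (a Euclidean corner distance is assumed here, since the exact cost formula is not reproduced) and solves the assignment with the Hungarian algorithm; the box loss over the matched pairs, described next, is also shown.

```python
# Illustrative candidate-to-bounding-box matching via the Hungarian algorithm.
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_candidates(cand_boxes, gt_boxes):
    """cand_boxes: (H, 6), gt_boxes: (T, 6); each row = (max_xyz, min_xyz)."""
    cost = np.linalg.norm(cand_boxes[:, None, 0:3] - gt_boxes[None, :, 0:3], axis=2) \
         + np.linalg.norm(cand_boxes[:, None, 3:6] - gt_boxes[None, :, 3:6], axis=2)   # (H, T)
    cand_idx, gt_idx = linear_sum_assignment(cost)     # each bounding box gets exactly one candidate
    return cand_idx, gt_idx, cost

H, T = 8, 3
cand = np.random.rand(H, 6)
gt = np.random.rand(T, 6)
cand_idx, gt_idx, cost = match_candidates(cand, gt)
l_box = cost[cand_idx, gt_idx].sum()                   # box loss over the T matched pairs
```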
Corresponding candidate frames are then found for each bounding box according to the obtained optimal matching index matrix, giving T candidate frames matched one-to-one with the T bounding boxes. The number of selected candidate frames equals the number of bounding boxes, namely T, and parameter tuning is performed on these T candidate frames through the loss function of formula (4):

$$l_{box}=\sum_{t=1}^{T}C_{tt}$$

wherein C_tt denotes the association cost of assigning the t-th matched candidate frame to the t-th bounding box. By minimizing the loss value l_box, the coordinate values of each candidate frame approach those of the bounding box it matches.
During model training, the confidence scores B_s^i obtained from the prediction branch are assigned one-to-one to the H candidate frames predicted by the regression branch. The confidence scores of the T candidate frames associated with the T bounding boxes are optimized with formula (5) so that they approach 1, the confidence scores of the remaining H-T candidate frames are set to 0, and the T candidate frames with high confidence scores are retained as the subsequent input of the Point-RoIAlign module. In formula (5), B_s^t is the confidence score of the t-th matched candidate frame; by minimizing the loss value l_score, the confidence scores of the T candidate frames approach 1.
Intercepting the corresponding point cloud set with a candidate frame may incorrectly include parts belonging to other ground object instances, leading to inaccurate instance predictions. An instance prediction network is therefore proposed to further refine the instances: the point cloud set (N, 6) intercepted by a candidate frame is taken as the input of the instance prediction network, and semantic features are extracted with the PointNet++ network to obtain a ninth feature vector (N, 256); the ninth feature vector (N, 256) is then used as the input of the mask branch, and a prediction mask, which is a binary vector, is obtained through the MLPs layer and the sigmoid activation layer. The prediction mask is trained by computing the IoU value between the prediction mask and the classification label mask (i.e. the ratio of their intersection to their union) as a constraint: prediction masks whose IoU value is higher than 0.5 are used as training samples, with the portion overlapping the label mask assigned positive labels and the remaining portion assigned negative labels, while masks whose IoU is 0.5 or below are ignored and do not take part in the prediction-mask training process. The prediction-mask training loss function is as follows:

$$L_{mask}=\frac{1}{N_{ins}}\sum_{i=1}^{N_{ins}}\operatorname{sign}(iou_{i}>0.5)\cdot\frac{1}{N_{i}}\sum_{j=1}^{N_{i}}\Big[-y_{j}\log\hat{y}_{j}-(1-y_{j})\log\big(1-\hat{y}_{j}\big)\Big]$$

wherein N_ins is the number of ground object instances, N_i is the number of points of the i-th ground object instance, iou_i is the intersection-over-union of the i-th ground object instance, L_mask is the mask loss value, y_j is the label of a point in the ground object instance (1 for a positive label, 0 for a negative label), and ŷ_j is the predicted probability that the point is a positive label of the ground object instance; sign() is an indicator function with sign(iou_i > 0.5) = 1 when iou_i > 0.5 and sign(iou_i > 0.5) = 0 when iou_i ≤ 0.5.
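A minimal sketch of this IoU-gated mask loss, assuming a plain binary cross-entropy per point, is given below for illustration.

```python
# Illustrative IoU-gated mask loss: only instances whose predicted mask overlaps the label mask
# with IoU > 0.5 contribute a per-point binary cross-entropy term.
import torch
import torch.nn.functional as F

def mask_loss(pred_masks, label_masks):
    """pred_masks / label_masks: lists of per-instance tensors of shape (N_i,) in [0, 1] / {0, 1}."""
    terms = []
    for pred, label in zip(pred_masks, label_masks):
        inter = ((pred > 0.5) & (label > 0.5)).sum().float()
        union = ((pred > 0.5) | (label > 0.5)).sum().float().clamp(min=1.0)
        if inter / union > 0.5:                                   # sign(iou_i > 0.5)
            terms.append(F.binary_cross_entropy(pred, label.float()))
        else:
            terms.append(torch.zeros(()))                         # ignored instance
    return torch.stack(terms).mean()
```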
A Hadamard product between the prediction mask and the ninth feature vector (N, 256) removes noise points and yields a tenth feature vector (N1, 256), i.e. the instance feature vector; the tenth feature vector (N1, 256) is passed through the MLPs layer and the sigmoid activation layer to obtain the confidence score of the ground object instance. The IoU value between the prediction mask and the label mask is taken as a measure of the quality of the prediction mask, and the prediction mask is used to improve the accuracy of the instance confidence score.
Noise points are removed according to the prediction mask, and class prediction training is performed on the remaining features with the following formula:

$$L_{cls}=-\frac{1}{N_{ins}}\sum_{i=1}^{N_{ins}}iou_{i}\sum_{C=1}^{M}y_{iC}\log\big(p_{iC}\big)$$

wherein L_cls is the class confidence loss of the ground object instances, N_ins is the number of ground object instances, iou_i is the intersection-over-union of the i-th ground object instance, p_iC is the predicted probability that the i-th ground object instance belongs to class C, M is the number of classes, and y_iC is an indicator (0 or 1) that takes 1 if the true class of the i-th ground object instance equals C and 0 otherwise.
And sorting the prediction categories of the ground object examples according to the confidence scores, and selecting the category with the highest confidence score as the prediction category of the ground object examples, namely outputting the point cloud set of the target ground object examples.
The embodiment of the invention also provides a large-scene ground feature monomerization method, which comprises the following steps:
step 1: the original three-dimensional point cloud data of the ground object of the target scene is obtained, as shown in fig. 5 and 6, fig. 5 is the original three-dimensional point cloud data of the first scene, and fig. 6 is the original three-dimensional point cloud data of the second scene.
Step 2: converting and preprocessing original three-dimensional point cloud data to generate PKL format files, wherein the specific implementation process is as follows:
step 2.1: importing the original three-dimensional point cloud data into CloudCompare software, and then exporting txt format point cloud files;
step 2.2: extracting the first 6 columns of the txt-format point cloud file and storing them in a txt-format point cloud data file, i.e. converting the txt-format point cloud file into the Semantic3D data set format;
step 2.3: performing grid sampling and normalization on the three-dimensional point cloud data in Semantic3D data set format, building an index structure on the normalized data with the Kd-tree algorithm, and generating the PKL-format file.
Step 3: and (3) carrying out classification prediction on the point clouds in the PKL format file in the step (2) by using the large-scene ground feature monomerization model trained by the large-scene ground feature monomerization model training method to obtain classification labels of each point cloud.
Step 4: and outputting a point cloud set of a single ground object according to the classification label of each point cloud to realize ground object singulation, wherein as shown in fig. 7 and 8, fig. 7 is a recognition result of a first scene, and fig. 8 is a recognition result of a second scene. Fig. 9 is an enlarged view of the recognition result, and the black frame in fig. 9 represents the small target, which indicates that the invention can accurately recognize and monomer the small target, and solves the problem of poor recognition effect of the small target object in the prior algorithm. The black dotted line box in fig. 10 indicates that the method can effectively monomer the adhesion target, and solves the problem that the conventional clustering algorithm has poor segmentation effect on the connected objects.
The foregoing disclosure is merely illustrative of specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art will readily recognize that changes and modifications are possible within the scope of the present invention.

Claims (13)

1. A structure of a model, comprising:
the encoding module is used for encoding the large scene ground feature point cloud in the PKL format into an input vector;
the backbone network is used for extracting the characteristics of the input vector to obtain a first characteristic vector;
the target generation module is used for carrying out feature extraction on the first feature vector to obtain a global feature vector, and carrying out feature extraction on the global feature vector to obtain a second feature vector; calculating the second feature vector to obtain a third feature vector, and carrying out normalization processing on each element in the third feature vector to obtain the confidence score of each candidate frame; calculating the second feature vector to obtain a fifth feature vector, wherein each (1, 6) dimension of the fifth feature vector represents a maximum coordinate point and a minimum coordinate point of a candidate frame; splicing the maximum coordinate point, the minimum coordinate point and the corresponding confidence score of the candidate frame to obtain a parameter vector of the candidate frame;
the feature fusion module is used for extracting features of the first feature vector to obtain a sixth feature vector; splicing the sixth feature vector and the global feature vector, and extracting features to obtain an eighth feature vector;
the Point-RoIAlign module is used for carrying out coordinate mapping processing on the parameter vector and the eighth feature vector of the candidate frame to obtain a Point cloud set corresponding to each candidate frame;
and the example prediction network is used for outputting a prediction Point cloud set of a single ground feature according to the Point cloud set of each candidate frame output by the Point-RoIAlign module.
2. The structure of the model of claim 1, wherein: the backbone network adopts a RandLA-Net structure.
3. The structure of the model of claim 1, wherein: the target generation module comprises a first feature extraction layer, a second feature extraction layer, a prediction branch, a regression branch and a splicing layer;
the first feature extraction layer comprises 1 MLP layer, and the first feature extraction layer performs feature extraction on the first feature vector by using the 1 MLP layer to obtain a global feature vector;
the second feature extraction layer comprises 2 MLP layers, and the second feature extraction layer performs feature extraction on the global feature vector by using the 2 MLP layers to obtain a second feature vector;
the prediction branch comprises a first full-connection layer and a first activation layer, the second feature vector is calculated through the first full-connection layer to obtain a third feature vector, and each element in the third feature vector is normalized through the first activation layer to obtain the confidence score of each candidate frame;
the regression branch comprises a second full-connection layer, and the second feature vector is calculated through the second full-connection layer to obtain a fifth feature vector;
and the splicing layer is used for splicing the maximum coordinate point, the minimum coordinate point and the corresponding confidence score of the candidate frame to obtain the parameter vector of the candidate frame.
4. The structure of the model of claim 1, wherein: the feature fusion module comprises a third feature extraction layer, a splicing layer and a fourth feature extraction layer;
the third feature extraction layer comprises 2 MLP layers, the third feature extraction layer performs feature extraction on the first feature vector by using 1 MLP layer to obtain a point feature vector, and then performs feature extraction on the point feature vector by using another 1 MLP layer to obtain a sixth feature vector;
the splicing layer is used for splicing the sixth feature vector and the global feature vector to obtain a seventh feature vector;
the fourth feature extraction layer comprises 2 MLP layers, and the fourth feature extraction layer performs depth feature extraction on the seventh feature vector by using the 2 MLP layers to obtain an eighth feature vector.
5. The structure of a model according to any one of claims 1 to 4, characterized in that: the instance prediction network comprises a fifth feature extraction layer, mask prediction branches and an instance output layer;
the fifth feature extraction layer adopts a PointNet network structure, and performs feature extraction on the Point cloud set of the candidate frame output by the Point-RoIAlign module by using the PointNet network structure to obtain a ninth feature vector;
the Mask prediction branch comprises an MLPs layer and a second activation layer, and the ninth feature vector is calculated through the MLPs layer and the second activation layer to obtain a prediction Mask of the ground object;
the example output layer is configured to reject noise points in the ninth feature vector by using a prediction mask to obtain a tenth feature vector; and calculating the tenth feature vector through the MLPs layer and the third activation layer to obtain the confidence score of each ground feature, selecting the category with the highest confidence score as the prediction category of the ground feature, and outputting the prediction point cloud set of the ground features of different categories.
6. The large-scene ground feature monomer model training method is characterized by comprising the following steps of:
acquiring original three-dimensional point cloud data of a large-scene ground object;
manufacturing the original three-dimensional point cloud data into a standard sample format file;
preprocessing the point cloud sample in the standard sample format file to generate a PKL format sample file;
constructing a large-scene ground feature monomerization model, wherein the large-scene ground feature monomerization model comprises:
the encoding module is used for encoding the large scene ground feature point cloud in the PKL format into an input vector;
the backbone network is used for extracting the characteristics of the input vector to obtain a first characteristic vector;
the target generation module is used for carrying out feature extraction on the first feature vector to obtain a global feature vector, and carrying out feature extraction on the global feature vector to obtain a second feature vector; calculating the second feature vector to obtain a third feature vector, and carrying out normalization processing on each element in the third feature vector to obtain the confidence score of each candidate frame; calculating the second feature vector to obtain a fifth feature vector, wherein each (1, 6) dimension of the fifth feature vector represents a maximum coordinate point and a minimum coordinate point of a candidate frame; splicing the maximum coordinate point, the minimum coordinate point and the corresponding confidence score of the candidate frame to obtain a parameter vector of the candidate frame;
the feature fusion module is used for extracting features of the first feature vector to obtain a sixth feature vector; splicing the sixth feature vector and the global feature vector, and extracting features to obtain an eighth feature vector;
the Point-RoIAlign module is used for carrying out coordinate mapping processing on the parameter vector and the eighth feature vector of the candidate frame to obtain a Point cloud set corresponding to each candidate frame;
an example prediction network, configured to output a prediction Point cloud set of a single ground feature according to a Point cloud set of each candidate frame output by the Point-RoIAlign module;
and training the large-scene ground feature monomerization model by using the point cloud sample in the PKL format sample file to obtain a trained large-scene ground feature monomerization model.
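As an illustration of the candidate-frame parameter vector that the target generation module recited above is described as producing (box corner coordinates spliced with a normalized confidence score), the following sketch assumes sigmoid normalization and a [min corner, max corner] ordering, neither of which is specified by the claim:

```python
import torch

def build_candidate_parameters(box_coords, box_scores_raw):
    """Illustrative only.  box_coords: (H, 6) tensor holding [x_min, y_min, z_min,
    x_max, y_max, z_max] for each candidate frame (the fifth feature vector);
    box_scores_raw: (H,) unnormalized elements of the third feature vector.
    Returns the (H, 7) candidate-frame parameter vectors of claim 6."""
    confidence = torch.sigmoid(box_scores_raw)              # assumed normalization step
    return torch.cat([box_coords, confidence.unsqueeze(-1)], dim=-1)

# usage sketch with 20 hypothetical candidate frames
params = build_candidate_parameters(torch.rand(20, 6), torch.randn(20))
```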
7. The large-scene ground feature monomerization model training method according to claim 6, wherein the specific implementation process of making the original three-dimensional point cloud data into a standard sample format file is as follows:
importing the original three-dimensional point cloud data into CloudCompare software, and manually segmenting each real ground object by using the cutting function of the CloudCompare software;
labeling each real ground object with a classification label mask, merging all the real ground objects carrying classification label masks, and exporting a txt-format point cloud file;
and converting the txt-format point cloud file into the Semantic3D data set format to obtain the standard sample format file.
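The txt-to-Semantic3D conversion step could look roughly like the sketch below; the column order of the exported txt file (x, y, z, r, g, b, label), the zero intensity value and the output file naming are assumptions, since the claim only names the target data set format:

```python
import numpy as np

def txt_to_semantic3d(txt_path, out_prefix):
    """Illustrative sketch: split an exported 'x y z r g b label' txt file into
    a Semantic3D-style point file and a companion .labels file."""
    data = np.loadtxt(txt_path)                     # (N, 7): x, y, z, r, g, b, label
    labels = data[:, 6].astype(int)
    intensity = np.zeros((data.shape[0], 1))        # placeholder intensity column
    # Semantic3D point rows carry x, y, z, intensity, r, g, b.
    points = np.hstack([data[:, :3], intensity, data[:, 3:6]])
    np.savetxt(f"{out_prefix}.txt", points, fmt="%.3f %.3f %.3f %d %d %d %d")
    np.savetxt(f"{out_prefix}.labels", labels, fmt="%d")

# usage sketch (file names are hypothetical)
txt_to_semantic3d("merged_ground_objects.txt", "scene_sample")
```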
8. The large-scene ground feature monomerization model training method according to claim 6, wherein the specific implementation process of preprocessing the point cloud sample in the standard sample format file is as follows:
performing grid sampling on the point cloud samples in the standard sample format file;
and normalizing the sampled sample data, and establishing a data index structure on the normalized sample data by using a Kd-tree algorithm to generate a PKL format sample file.
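A minimal preprocessing sketch along the lines of claim 8 is given below; the voxel-style grid sampling, min-max coordinate normalization, the scipy Kd-tree and the use of pickle for the PKL file are assumptions about one reasonable realization, not the patented procedure itself:

```python
import pickle
import numpy as np
from scipy.spatial import cKDTree

def preprocess_to_pkl(points, labels, pkl_path, grid_size=0.1):
    """points: (N, 3) xyz array; labels: (N,) class labels per point."""
    # Grid sampling: keep one point per occupied voxel of edge length grid_size.
    voxel_idx = np.floor(points / grid_size).astype(np.int64)
    _, keep = np.unique(voxel_idx, axis=0, return_index=True)
    pts, lbl = points[keep], labels[keep]
    # Normalization: rescale the sampled coordinates into the unit cube.
    mins, maxs = pts.min(axis=0), pts.max(axis=0)
    pts_norm = (pts - mins) / np.maximum(maxs - mins, 1e-6)
    # Kd-tree index over the normalized points (cKDTree instances can be pickled).
    tree = cKDTree(pts_norm)
    with open(pkl_path, "wb") as f:
        pickle.dump({"points": pts_norm, "labels": lbl, "kdtree": tree}, f)

# usage sketch with random stand-in data
preprocess_to_pkl(np.random.rand(10000, 3) * 50.0,
                  np.random.randint(0, 5, size=10000), "sample.pkl")
```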
9. The large-scene ground feature monomerization model training method according to any one of claims 6 to 8, wherein the training of the large-scene ground feature monomerization model is specifically implemented as follows:
constructing an objective function, and solving an optimal matching index matrix, wherein the specific expression of the objective function is as follows:
min Σ_{i=1..H} Σ_{j=1..T} A_ij · C_ij, subject to Σ_{i=1..H} A_ij = 1 for each j and A_ij ∈ {0, 1},
wherein A is the optimal matching index matrix, H is the number of candidate frames, T is the number of boundary frames of the real ground objects, A_ij is the matching coefficient of the i-th candidate frame and the j-th boundary frame, A_ij = 1 indicating that the i-th candidate frame is associated with the j-th boundary frame and A_ij = 0 indicating that the i-th candidate frame is not associated with the j-th boundary frame, and C_ij is the associated cost of assigning the i-th candidate frame to the j-th boundary frame;
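Solving for the optimal matching index matrix is a linear assignment problem; a sketch using the Hungarian solver shipped with scipy (the claim does not name a particular solver) is:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def optimal_matching(cost):
    """cost: (H, T) array where cost[i, j] is C_ij, the cost of assigning
    candidate frame i to boundary frame j.  Returns the (H, T) 0/1 matrix A
    that minimizes sum(A * cost) with one candidate frame per boundary frame."""
    rows, cols = linear_sum_assignment(cost)     # Hungarian algorithm
    A = np.zeros(cost.shape, dtype=int)
    A[rows, cols] = 1
    return A

# usage sketch: 5 candidate frames, 3 boundary frames of real ground objects
A = optimal_matching(np.random.rand(5, 3))
```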
searching the corresponding candidate frame for each boundary frame according to the optimal matching index matrix to obtain the T candidate frames matched with the T boundary frames;
performing parameter optimization on the T matched candidate frames through a loss function, so that the coordinate value of each candidate frame approximates the coordinate value of the boundary frame matched with it, the loss function being expressed as:
(1/T) · Σ_{t=1..T} C_tt,
wherein C_tt is the associated cost of assigning the t-th candidate frame to the t-th boundary frame;
optimizing the confidence scores of the T matched candidate frames so that they approach 1, and setting the target confidence scores of the remaining H-T candidate frames to 0, the confidence score optimization function being expressed as:
-(1/H) · [ Σ_{t=1..T} log(p_t) + Σ_{l=T+1..H} log(1 - p_l) ],
wherein p_l is the confidence score assigned to the l-th candidate frame;
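Under this reading, the two optimization terms can be sketched as follows; the mean reduction and the reordering of matched candidates to the first T positions are assumptions consistent with, but not dictated by, the definitions above:

```python
import torch
import torch.nn.functional as F

def box_and_score_losses(matched_costs, scores, T):
    """matched_costs: (T,) tensor of C_tt values for the matched candidate frames;
    scores: (H,) predicted confidence of every candidate frame, with the matched
    candidates assumed to occupy indices 0..T-1 after reordering."""
    l_box = matched_costs.mean()          # pull matched box coordinates toward their boundary frames
    target = torch.zeros_like(scores)
    target[:T] = 1.0                      # matched scores toward 1, remaining H-T toward 0
    l_score = F.binary_cross_entropy(scores, target)
    return l_box, l_score
```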
training the prediction mask according to the prediction mask calculated by the instance prediction network and the classification label mask to obtain a trained mask, the prediction mask training loss function being expressed as:
L_mask = (1/N_ins) · Σ_{i=1..N_ins} sign(iou_i > 0.5) · ( -(1/N_i) · Σ_{j=1..N_i} [ y_j·log(p_j) + (1 - y_j)·log(1 - p_j) ] ),
wherein N_ins is the number of ground object instances, N_i is the number of points of the i-th ground object instance, iou_i is the intersection-over-union of the i-th ground object instance, L_mask is the mask loss value, y_j is the label of a point in the ground object instance (a positive label is 1 and a negative label is 0), p_j is the probability that the point is predicted as a positive label of the ground object instance, and sign() is the sign function, with sign(iou_i > 0.5) = 1 when iou_i > 0.5 and sign(iou_i > 0.5) = 0 when iou_i ≤ 0.5;
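A per-instance sketch consistent with these variable definitions (per-point binary cross-entropy gated by the sign(iou_i > 0.5) term) is shown below; the exact weighting in the patented formula may differ:

```python
import torch
import torch.nn.functional as F

def mask_training_loss(pred_probs, gt_masks, ious):
    """pred_probs: list of (N_i,) tensors of predicted positive-label probabilities;
    gt_masks: list of (N_i,) tensors of 0/1 point labels y_j;
    ious: (N_ins,) tensor of per-instance intersection-over-union values."""
    per_instance = []
    for probs, y, iou in zip(pred_probs, gt_masks, ious):
        bce = F.binary_cross_entropy(probs, y.float())   # per-point cross-entropy, averaged
        gate = (iou > 0.5).float()                       # sign(iou_i > 0.5)
        per_instance.append(gate * bce)
    return torch.stack(per_instance).mean()              # average over the N_ins instances
```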
and removing noise points by using the trained mask, calculating the confidence score of each ground object, selecting the category with the highest confidence score as the prediction category of the ground object, and outputting the prediction point cloud set of ground objects of different categories.
10. A large-scene ground object singulation method, characterized by comprising the following steps:
acquiring original three-dimensional point cloud data of a ground object of a target scene;
converting and preprocessing the original three-dimensional point cloud data to generate a PKL format file;
classifying and predicting point clouds in the PKL format file by using a large-scene ground feature monomerization model trained by the large-scene ground feature monomerization model training method according to any one of claims 6 to 9 to obtain a classification label of each point cloud;
and outputting a point cloud set of a single ground object according to the classification label of each point cloud, thereby realizing ground object singulation.
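End to end, the singulation method of claim 10 could be exercised roughly as in the sketch below; the `model.predict` interface and the PKL field names are placeholders for the trained model's actual inference call, which the claim does not specify:

```python
import pickle
from collections import defaultdict

def singulate(pkl_path, model):
    """Illustrative only: predict a classification label per point and group the
    points by label so that each ground object comes out as its own point set."""
    with open(pkl_path, "rb") as f:
        sample = pickle.load(f)
    # `model.predict` stands in for the trained monomerization model's inference
    # call; it is assumed to return one classification label per input point.
    labels = model.predict(sample["points"])
    objects = defaultdict(list)
    for point, label in zip(sample["points"], labels):
        objects[int(label)].append(point)
    return dict(objects)   # {label: list of points forming one ground object's cloud}
```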
11. The large-scene ground object singulation method according to claim 10, wherein the specific implementation process of converting and preprocessing the original three-dimensional point cloud data is as follows:
converting the original three-dimensional point cloud data into the Semantic3D data set format;
and performing grid sampling and normalization processing on the three-dimensional point cloud data in the Semantic3D data set format, establishing an index structure on the normalized data by using a Kd-tree algorithm, and generating the PKL format file.
12. An electronic device, characterized in that: the electronic device comprises a memory and a processor, wherein the memory stores a computer program capable of running on the processor, and when running the computer program, the processor executes the steps of the large-scene ground feature monomerization model training method according to any one of claims 6 to 9.
13. A computer-readable storage medium, the computer-readable storage medium being a non-volatile storage medium or a non-transitory storage medium having a computer program stored thereon, characterized in that: the computer program, when run by a processor, performs the steps of the large-scene ground feature monomerization model training method according to any one of claims 6 to 9.
CN202210629730.2A 2022-06-06 2022-06-06 Model structure, model training method, singulation method, device and medium Active CN114882224B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210629730.2A CN114882224B (en) 2022-06-06 2022-06-06 Model structure, model training method, singulation method, device and medium


Publications (2)

Publication Number Publication Date
CN114882224A CN114882224A (en) 2022-08-09
CN114882224B (en) 2024-04-05

Family

ID=82679613

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210629730.2A Active CN114882224B (en) 2022-06-06 2022-06-06 Model structure, model training method, singulation method, device and medium

Country Status (1)

Country Link
CN (1) CN114882224B (en)


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11004202B2 (en) * 2017-10-09 2021-05-11 The Board Of Trustees Of The Leland Stanford Junior University Systems and methods for semantic segmentation of 3D point clouds

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110660062A (en) * 2019-08-31 2020-01-07 南京理工大学 Point cloud instance segmentation method and system based on PointNet
CN111968121A (en) * 2020-08-03 2020-11-20 电子科技大学 Three-dimensional point cloud scene segmentation method based on instance embedding and semantic fusion
WO2022088676A1 (en) * 2020-10-29 2022-05-05 平安科技(深圳)有限公司 Three-dimensional point cloud semantic segmentation method and apparatus, and device and medium
CN114120067A (en) * 2021-12-03 2022-03-01 杭州安恒信息技术股份有限公司 Object identification method, device, equipment and medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LiDAR point cloud ground object classification method based on multi-scale features and PointNet; Zhao Zhongyang; Cheng Yinglei; Shi Xiaosong; Qin Xianxiang; Li Xin; Laser & Optoelectronics Progress; 2018-10-07 (05); full text *
Three-dimensional object recognition and model segmentation method based on point cloud data; Niu Chengeng; Liu Yujie; Li Zongmin; Li Hua; Journal of Graphics; 2019-04-15 (02); full text *

Also Published As

Publication number Publication date
CN114882224A (en) 2022-08-09

Similar Documents

Publication Publication Date Title
CN111310861B (en) License plate recognition and positioning method based on deep neural network
CN111062282B (en) Substation pointer instrument identification method based on improved YOLOV3 model
CN109919108B (en) Remote sensing image rapid target detection method based on deep hash auxiliary network
CN108564097B (en) Multi-scale target detection method based on deep convolutional neural network
CN112418117B (en) Small target detection method based on unmanned aerial vehicle image
CN104599275B (en) The RGB-D scene understanding methods of imparametrization based on probability graph model
CN105701502B (en) Automatic image annotation method based on Monte Carlo data equalization
CN106845430A (en) Pedestrian detection and tracking based on acceleration region convolutional neural networks
CN113033520B (en) Tree nematode disease wood identification method and system based on deep learning
CN110033002A (en) Detection method of license plate based on multitask concatenated convolutional neural network
CN102324038B (en) Plant species identification method based on digital image
CN105956560A (en) Vehicle model identification method based on pooling multi-scale depth convolution characteristics
CN110222767B (en) Three-dimensional point cloud classification method based on nested neural network and grid map
CN103984953A (en) Cityscape image semantic segmentation method based on multi-feature fusion and Boosting decision forest
CN114821014B (en) Multi-mode and countermeasure learning-based multi-task target detection and identification method and device
CN110334584B (en) Gesture recognition method based on regional full convolution network
CN112016605A (en) Target detection method based on corner alignment and boundary matching of bounding box
CN112200846A (en) Forest stand factor extraction method fusing unmanned aerial vehicle image and ground radar point cloud
Lin et al. Building damage assessment from post-hurricane imageries using unsupervised domain adaptation with enhanced feature discrimination
CN112396655B (en) Point cloud data-based ship target 6D pose estimation method
CN110827312A (en) Learning method based on cooperative visual attention neural network
CN104050460B (en) The pedestrian detection method of multiple features fusion
CN111652273A (en) Deep learning-based RGB-D image classification method
CN115861619A (en) Airborne LiDAR (light detection and ranging) urban point cloud semantic segmentation method and system of recursive residual double-attention kernel point convolution network
CN114140665A (en) Dense small target detection method based on improved YOLOv5

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant