CN114792372B - Three-dimensional point cloud semantic segmentation method and system based on multi-head two-stage attention - Google Patents

Three-dimensional point cloud semantic segmentation method and system based on multi-head two-stage attention

Info

Publication number
CN114792372B
CN114792372B (application CN202210709918.8A)
Authority
CN
China
Prior art keywords
point cloud
point
plant
dimensional
semantic segmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210709918.8A
Other languages
Chinese (zh)
Other versions
CN114792372A (en)
Inventor
潘丹
罗琳
曾安
廖清青
杨宝瑶
张逸群
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN202210709918.8A priority Critical patent/CN114792372B/en
Publication of CN114792372A publication Critical patent/CN114792372A/en
Application granted granted Critical
Publication of CN114792372B publication Critical patent/CN114792372B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217Validation; Performance evaluation; Active pattern learning techniques
    • G06F18/2193Validation; Performance evaluation; Active pattern learning techniques based on specific statistical tests
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/005Tree description, e.g. octree, quadtree

Abstract

The invention provides a three-dimensional point cloud semantic segmentation method and system based on multi-head two-stage attention, comprising the following steps: acquiring 2D sequence images of a plant and performing three-dimensional reconstruction to obtain a 3D point cloud of the plant; preprocessing and manually labeling the plant 3D point cloud data; and, in consideration of the complexity of plant structural morphology, constructing a multi-head two-stage attention semantic segmentation network based on an attention mechanism, which acquires the geometric features of the point cloud hierarchically, predicts the semantic label of each point directly from the fully labeled point cloud data, and finally obtains the segmentation result of the plant organs. By constructing this semantic segmentation network, the method and system provide the agricultural field with a data-driven, end-to-end deep learning model that directly processes unordered 3D point clouds and can automatically and efficiently perform organ-level segmentation of plant three-dimensional point clouds.

Description

Three-dimensional point cloud semantic segmentation method and system based on multi-head two-stage attention
Technical Field
The invention relates to the technical field of three-dimensional point cloud segmentation, in particular to a multi-head two-stage attention-based three-dimensional point cloud semantic segmentation method and system.
Background
Current segmentation tasks for plants include methods based on 2D plant images and methods based on 3D plant point cloud data. The 2D-image-based methods are mainly classified into segmentation based on color indices, segmentation based on thresholds, and segmentation based on learning (including supervised and unsupervised machine learning methods). Compared with computer vision and machine learning methods that rely on hand-crafted features, deep learning models have been applied in many agricultural phenotyping studies owing to their strong feature extraction and autonomous learning capabilities; classical models such as SegNet, U-Net and Mask R-CNN have been used for semantic segmentation of plants and outperform the feature-based methods. To overcome the dimensional constraints of 2D images and their inability to handle overlap and occlusion between leaves, three-dimensional reconstruction of plant point clouds is used to acquire complete, occlusion-free three-dimensional geometric information.
In previous three-dimensional plant segmentation research, segmentation of plant 3D point clouds has mostly relied on the following approaches: segmentation methods based on local surface features, such as local covariance matrices, tensors and surface curvatures; semantic segmentation of plant 3D point clouds with supervised learning methods, such as support vector machines and random forests; and deep-learning-based plant 3D point cloud segmentation, including multi-view-based methods, in which several two-dimensional projections are generated from the three-dimensional point cloud, a 2D-image-based deep learning segmentation method is applied, and the two-dimensional segmentation results from different angles are then combined into a final three-dimensional segmentation result; voxel-based methods, such as the VCNN designed for the classification and segmentation of corn stems and leaves; and point-based methods that directly process unordered three-dimensional point clouds, such as PointNet and PointNet++. Although point-based 3D deep learning is developing rapidly, there has been little research on plant segmentation.
The prior art discloses a PointNet-based point cloud instance segmentation method and system, in which a point cloud data preprocessing module performs partitioning, sampling, translation and normalization; a PointNet neural network training module extracts a point cloud feature matrix through a PointNet neural network; a matrix calculation module trains a similarity network, a confidence network and a semantic segmentation network, extracting a similarity matrix, a confidence matrix and a semantic segmentation matrix of the point cloud features through the three network branches; and, after determining the valid segmentation instance groups, a clustering and merging module performs denoising and de-duplication to complete segmentation of the instance objects. Although this scheme achieves instance segmentation of point cloud data, it is not suited to the field of three-dimensional plant segmentation, and its application to three-dimensional plant phenotype analysis has obvious shortcomings.
Disclosure of Invention
In order to solve at least one technical defect, the invention provides a three-dimensional point cloud semantic segmentation method and a three-dimensional point cloud semantic segmentation system based on multi-head two-stage attention, which can automatically and efficiently realize high-precision segmentation of plant three-dimensional point cloud organ levels.
In order to solve the technical problems, the technical scheme of the invention is as follows:
a three-dimensional point cloud semantic segmentation method based on multi-head two-stage attention comprises the following steps:
s1: constructing an image acquisition platform, and acquiring a high-precision and multi-angle plant 2D sequence image through a camera;
s2: performing three-dimensional reconstruction according to the collected 2D sequence image of the plant to obtain a 3D point cloud of the plant;
s3: preprocessing and manually labeling the 3D point cloud of the plant to obtain a labeled point cloud;
s4: and constructing a multi-head two-stage attention three-dimensional point cloud semantic segmentation network, taking the marked point cloud as input, and performing semantic label prediction by the three-dimensional point cloud semantic segmentation network to finish the segmentation of the three-dimensional point cloud semantic.
In the scheme, the image acquisition platform consists of a camera shed, a support, a white circular turntable and a camera; wherein: the camera shed (0.8 m × 0.8 m) comprises a white background plate of the same size and 3 LED lamp tubes that supplement the light when illumination is insufficient; the support fixes the position of the camera; the white circular turntable is used for placing the target plant to be three-dimensionally reconstructed, and the plant rotates with the turntable along a specific motion track; the camera photographs the plant to collect the plant 2D sequence images. To ensure shooting stability, the camera is set to autofocus mode throughout the shooting process, and all camera parameters are kept unchanged. The scheme builds a platform for automatically acquiring plant 2D sequence images and performs three-dimensional reconstruction on the uncalibrated 2D sequence images captured from multiple viewpoints; it is subject to few environmental constraints, is self-correcting, and places low demands on the camera: unlike a Kinect-type 3D camera, only an ordinary RGB camera is needed, so the robustness is strong.
In the scheme, by constructing the semantic segmentation network, a data-driven, end-to-end deep learning model that directly processes unordered 3D point clouds is provided in the agricultural field, and organ-level segmentation can be performed automatically and efficiently on plant three-dimensional point clouds.
Wherein, the step S2 specifically includes the following steps:
s21: extracting feature points from the plant 2D sequence images by adopting a Scale Invariant Feature Transform (SIFT) operator, establishing a K-dimensional space binary tree model by using a nearest field search algorithm, and calculating Euclidean distance between the feature points of every two plant 2D sequence images through the K-dimensional space binary tree model to perform stereo matching of the feature points to obtain matching points;
s22: solving the camera attitude by adopting a consistency algorithm, screening the matching points, and providing wrong or outlier matching points;
s23: based on the obtained camera pose, recovering three-dimensional point coordinates corresponding to the matching points by using a triangulation algorithm, and then performing iterative optimization on the camera pose and the three-dimensional point coordinates by using a beam adjustment algorithm to obtain a sparse point cloud;
s24: and expanding pixels around the feature points of the sparse point cloud by adopting CMVS (Cluster Multi View selector) and PMVS (batch-based Multi View selector) algorithms to form dense point cloud, namely obtaining the 3D point cloud of the plant.
In the scheme, the CMVS algorithm is a multi-view three-dimensional clustering algorithm that clusters the input images to reduce the image data volume; the PMVS algorithm is a multi-view stereo vision algorithm that generates a dense point cloud through the three steps of matching, expansion and filtering.
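As an illustration of the feature-matching step S21, the following NumPy/SciPy sketch matches SIFT-style descriptors between two images with a k-d tree and a nearest-neighbor distance test. The function name, the 128-dimensional descriptor data and the ratio test are assumptions for the example, not part of the patented method:

```python
import numpy as np
from scipy.spatial import cKDTree

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Match descriptors of image A against image B via a k-d tree.

    desc_a, desc_b: (N, 128) arrays of SIFT-style descriptors (hypothetical data).
    Returns (i, j) index pairs whose nearest neighbour clearly beats the
    second-nearest one (Lowe's ratio test), mimicking the Euclidean-distance
    stereo matching of feature points described in step S21.
    """
    tree = cKDTree(desc_b)                  # K-dimensional space binary tree over image B
    dist, idx = tree.query(desc_a, k=2)     # two nearest neighbours per query descriptor
    keep = dist[:, 0] < ratio * dist[:, 1]  # keep only unambiguous matches
    return [(int(i), int(idx[i, 0])) for i in np.flatnonzero(keep)]
```

Wrong or outlier matches that survive this test would then be removed by the consistency screening of step S22 (e.g. a RANSAC-style algorithm).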
In the scheme, during point cloud acquisition, noise is easily introduced by environmental factors, the acquisition equipment, human disturbance and the like, which biases the segmentation and measurement results. The acquired 3D point cloud therefore needs to be preprocessed to improve the accuracy of subsequent prediction. Accordingly, in the step S3, the preprocessing specifically includes a background removal process and a point cloud filtering process; wherein:
in the background removing process, the color features of the 3D point cloud of the plant obtained in the step S2 are taken as different bases, a method based on a color threshold value is used, a green plant point cloud and a red paper point cloud are extracted by utilizing an RGB channel, and irrelevant background parts of the point cloud are removed; the color threshold is specifically set as: G-B is more than or equal to 15 or R-B is more than or equal to 15 and R is more than or equal to 120 and R-G is more than or equal to 45 and R-B is more than or equal to 45; then, in the point cloud filtering process, filtering the point cloud with the irrelevant background part removed by using the statistical outlierremove in the PCL point cloud library, and the specific process is as follows:
All points in the point cloud are traversed, and for each point the average distance to its K nearest neighbor points is computed. The mean μ and standard deviation σ of all these average distances are then calculated. It is assumed here that the average distances follow a normal distribution whose shape is determined by the mean μ and the standard deviation σ, so points whose average distance exceeds a threshold are defined as outliers. The distance threshold is computed as:

threshold = μ + α·σ

where α is a constant. Finally, all points are traversed again, and every point whose average distance to its K nearest neighbors exceeds the distance threshold is removed.
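A minimal NumPy sketch of the two preprocessing stages (the RGB color thresholds and the statistical outlier filter) may clarify the computation. This is an illustrative re-implementation for small clouds, not PCL's StatisticalOutlierRemoval itself; the function names and the brute-force neighbor search are assumptions:

```python
import numpy as np

def remove_background(points, colors):
    """Keep green-plant and red-paper points via the RGB thresholds in the text."""
    # cast to int so uint8 color channels cannot underflow when subtracted
    r, g, b = (colors[:, i].astype(int) for i in range(3))
    green = (g - b) >= 15
    red = (r - b >= 15) & (r >= 120) & (r - g >= 45) & (r - b >= 45)
    return points[green | red]

def statistical_outlier_removal(points, k=8, alpha=1.0):
    """Drop points whose mean k-NN distance exceeds threshold = mu + alpha * sigma."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                # a point is not its own neighbour
    knn = np.sort(d, axis=1)[:, :k]            # k nearest-neighbour distances per point
    mean_d = knn.mean(axis=1)
    mu, sigma = mean_d.mean(), mean_d.std()
    return points[mean_d <= mu + alpha * sigma]
```

For real clouds, PCL (or at least a k-d tree) would replace the O(N²) distance matrix, but the thresholding logic is the same.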
In the step S3, the manual labeling specifically includes:
importing the point cloud to be manually labeled into CloudCompare software, using a segmentation tool to perform the semantic segmentation and labeling process on the point cloud, and assigning each point in the cloud a category label of leaf, stem or non-plant, thereby completing the manual labeling.
In the above scheme, since the subsequently constructed semantic segmentation network needs to be trained before use, a data set for training the semantic segmentation network can be obtained through preprocessing and manual labeling before actual segmentation, and the data set is divided into a training set and a test set according to a ratio of 3.
In the step S4, the three-dimensional point cloud semantic segmentation network adopts an encoder-decoder structure, and specifically performs the following steps:
s41: inputting the manually marked point cloud into a semantic segmentation network, and performing dimensionality increasing operation on the point cloud by a Full Connected (FC) layer, wherein dimensionality is changed into (9 → 32);
s42: encoding the point clouds subjected to dimensionality increase through four encoders, gradually reducing the number of the point clouds and increasing the dimensionality of each point cloud; each encoder consists of a down-sampling module and a multi-head two-stage attention module, the point cloud is down-sampled at four times of sampling rate, each layer only retains 25% of point characteristics, namely the cardinality of the generated point cloud is changed intoN→N/4→N/16→N/64→N/256,NThe number of representative points; meanwhile, the multi-head two-stage attention module is used for acquiring the geometric features of the plant point cloud in a layered mode, the feature dimension of each layer is gradually increased to reserve more information, namely the feature transformation is 32 → 64 → 128 → 256 → 512;
s43: after passing through the encoder, four decoders are used to restore the point number of the point cloud intoN(ii) a For each layer in the decoder, an up-sampling module and a multi-layer perceptron are included; the up-sampling module firstly uses a KNN algorithm to inquire K nearest neighbor points of each point cloud, and then up-samples the point cloud through a nearest neighbor interpolation algorithm; then, splicing the up-sampled features and the intermediate features generated by the corresponding encoders through jump connection to obtain a fused feature map; finally, inputting the fused feature map into a multilayer perceptron to obtain output;
s44: with three shared fully connected layers, namely dimension change to (N, 128) → (N, 32) → (N, C), and a random loss rate set applied after the first fully connected layer
Figure 799206DEST_PATH_IMAGE006
The drop layer(s) of (a),
Figure 262549DEST_PATH_IMAGE006
is a natural number less than 1; the output of the semantic segmentation network isN×CThe semantic tag of (1), whereinCRepresenting the number of categories.
In the scheme, the down-sampling module down-samples the input point cloud, reducing the number of points while increasing the feature dimension of each point. The scheme uses farthest point sampling: the point farthest from the existing sample set is iteratively selected to obtain the sampled point cloud; then, for each sampled point, a ball query with a fixed radius finds all points within that radius, generating several local regions; each local region is passed through a multilayer perceptron and finally max-pooled, and the resulting features are the output of the down-sampling module.
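The farthest point sampling loop described above can be sketched in a few lines of NumPy; the function name and the fixed seed point are assumptions for illustration:

```python
import numpy as np

def farthest_point_sampling(points, m):
    """Iteratively pick the point farthest from the already-selected sample set."""
    chosen = [0]                                            # arbitrary seed point
    min_d = np.linalg.norm(points - points[0], axis=1)      # distance to sample set
    for _ in range(m - 1):
        nxt = int(np.argmax(min_d))                         # farthest from current set
        chosen.append(nxt)
        min_d = np.minimum(min_d, np.linalg.norm(points - points[nxt], axis=1))
    return points[chosen]
```

On a line of 10 evenly spaced points, sampling 3 returns the two endpoints plus a near-middle point, showing how the method spreads samples over the whole cloud.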
In the scheme, to extract deep semantic features of the point cloud, a multi-head two-stage attention module is designed, consisting of a local attention module and a global attention module. The local attention module uses a ball query to search local regions, focusing on important neighbor features and giving them greater weight. The global attention module captures the feature dependencies among all points based on a self-attention mechanism. Finally, the output features of the local and global attention modules are concatenated to obtain the output of the whole attention module. In addition, a multi-head attention mechanism is introduced to obtain more complete feature information and improve the generalization capability of the network.
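To make the two-stage structure concrete, here is a toy single-head NumPy sketch in which the local stage masks self-attention to ball-query neighborhoods and the global stage attends over all points, with the two outputs concatenated. The learned projections (W_q, W_k, W_v) of a real attention module are replaced by identities, so this shows only the data flow, not the patented module:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def two_stage_attention(xyz, feats, radius=0.5):
    """Single-head sketch: local (ball-query-masked) + global self-attention.

    xyz: (n, 3) point coordinates; feats: (n, c) point features.
    Returns (n, 2c): local and global attention outputs concatenated.
    """
    n, c = feats.shape
    scores = feats @ feats.T / np.sqrt(c)          # scaled dot-product scores
    global_out = softmax(scores) @ feats           # global stage: attend to all points
    d = np.linalg.norm(xyz[:, None] - xyz[None, :], axis=-1)
    masked = np.where(d <= radius, scores, -np.inf)  # local stage: ball-query mask
    local_out = softmax(masked) @ feats
    return np.concatenate([local_out, global_out], axis=1)
```

With a radius large enough to cover the whole cloud, the local stage degenerates to the global one, which makes a handy sanity check.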
In the above scheme, for dense tasks such as semantic segmentation, which classify every point in the point cloud, the point set needs to be up-sampled back to the original number of points. Similar to the deconvolution operation in a convolutional neural network, an up-sampling module is used to transition features from the shape level to the point level: the KNN algorithm finds the K nearest neighbor points of each point, and interpolation in three-dimensional space is based on the Euclidean distances between the point and its K neighbors.
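A small NumPy sketch of the interpolation performed by the up-sampling module: each dense-level point receives a weighted average of the features of its K nearest sparse-level points. The inverse-distance weighting and the function name are assumptions made for the example (the text only specifies KNN plus nearest-neighbor interpolation in 3D space):

```python
import numpy as np

def knn_interpolate(dense_xyz, sparse_xyz, sparse_feats, k=3, eps=1e-8):
    """Upsample features from a sparse point set onto a dense one.

    dense_xyz: (n, 3); sparse_xyz: (m, 3); sparse_feats: (m, c).
    Each dense point takes an inverse-distance weighted average of the
    features of its k nearest sparse points, before the skip connection.
    """
    d = np.linalg.norm(dense_xyz[:, None] - sparse_xyz[None, :], axis=-1)
    idx = np.argsort(d, axis=1)[:, :k]                 # k nearest sparse points
    nd = np.take_along_axis(d, idx, axis=1)
    w = 1.0 / (nd + eps)
    w /= w.sum(axis=1, keepdims=True)                  # normalised inverse-distance weights
    return (sparse_feats[idx] * w[..., None]).sum(axis=1)
```

A dense point coinciding with a sparse point recovers that point's feature almost exactly, while a point midway between two sparse points gets their average.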
The scheme also provides a three-dimensional point cloud semantic segmentation system based on multi-head two-stage attention, which comprises an image acquisition platform, a three-dimensional reconstruction module, a preprocessing module, a manual labeling module and a semantic segmentation network construction module; wherein:
the image acquisition platform acquires a high-precision and multi-angle plant 2D sequence image through a camera;
the three-dimensional reconstruction module is used for performing three-dimensional reconstruction according to the collected 2D sequence images of the plants to obtain 3D point clouds of the plants;
the preprocessing module is used for preprocessing the 3D point cloud of the plant;
the manual marking module is used for manually marking the preprocessed point cloud;
the semantic segmentation network construction module is used for constructing a semantic segmentation network, the marked point cloud is used as input, and the semantic label prediction is carried out by the semantic segmentation network to obtain the segmentation result of the plant organ level.
The three-dimensional reconstruction module comprises a matching point acquisition unit, a matching point screening unit, a sparse point cloud acquisition unit and a dense point cloud acquisition unit; wherein:
the matching point obtaining unit adopts a Scale-Invariant Feature Transform (SIFT) operator to extract feature points from the plant 2D sequence images, and establishes a K-dimensional space binary tree model with a nearest neighbor search algorithm to calculate the Euclidean distance between the feature points of every two plant 2D sequence images for stereo matching of the feature points, obtaining matching points;
the matching point screening unit adopts a consistency algorithm to solve the camera attitude, screens the matching points and eliminates wrong or outlier matching points;
the sparse point cloud obtaining unit recovers three-dimensional point coordinates corresponding to the matching points by using a triangulation algorithm based on the obtained camera pose, and then performs iterative optimization on the camera pose and the three-dimensional point coordinates by using a light beam adjustment algorithm to obtain a sparse point cloud;
and the dense point cloud obtaining unit adopts CMVS and PMVS algorithms to expand the pixels around the feature points of the sparse point cloud to form dense point cloud, namely obtaining the 3D point cloud of the plant.
The preprocessing module comprises a background removing unit and a point cloud filtering unit; wherein:
in the background removal unit, based on the different color features, a color-threshold-based method uses the RGB (red, green and blue) channels to extract the green plant point cloud and the red paper point cloud from the plant 3D point cloud, removing the irrelevant background portion of the point cloud;
in the point cloud filtering unit, the point cloud with the irrelevant background removed is filtered with the StatisticalOutlierRemoval filter in the PCL point cloud library, as follows:
All points in the point cloud are traversed, and for each point the average distance to its K nearest neighbor points is computed. The mean μ and standard deviation σ of all these average distances are then calculated. It is assumed here that the average distances follow a normal distribution whose shape is determined by the mean μ and the standard deviation σ, so points whose average distance exceeds a threshold are defined as outliers, and the points to be filtered out can thereby be determined. The distance threshold is computed as:

threshold = μ + α·σ

where α is a constant. Finally, the point cloud is traversed again, and every point whose average distance to its K nearest neighbors exceeds the distance threshold is removed.
The manual labeling module is implemented with CloudCompare software, and specifically includes:
importing the point cloud to be manually labeled into CloudCompare, using a segmentation tool to perform the semantic segmentation and labeling process on the point cloud, and assigning each point one of the labels leaf, stem or non-plant, thereby completing the manual labeling.
The semantic segmentation network constructed by the semantic segmentation network construction module specifically executes the following operations: predicting the category labels of the points in the point cloud to complete semantic label prediction, wherein:
the three-dimensional point cloud semantic segmentation network adopts an encoder-decoder structure and specifically executes the following steps:
inputting the manually labeled point cloud into the semantic segmentation network, where a Fully Connected (FC) layer performs a dimension-raising operation (9 → 32) on the point cloud;
encoding the up-dimensioned point cloud through four encoders, gradually reducing the number of points while increasing the feature dimension of each point; each encoder consists of a down-sampling module and a multi-head two-stage attention module. The point cloud is down-sampled at a 4× rate, each layer retaining only 25% of the point features, i.e., the cardinality of the generated point cloud changes as N → N/4 → N/16 → N/64 → N/256, where N is the number of points; meanwhile, the multi-head two-stage attention module acquires the geometric features of the plant point cloud hierarchically, and the feature dimension of each layer is gradually increased to retain more information, i.e., the features transform as 32 → 64 → 128 → 256 → 512;
after the encoder, four decoders are used to restore the number of points in the point cloud to N. Each decoder layer contains an up-sampling module and a multilayer perceptron. The up-sampling module first queries the K nearest neighbor points of each point with the KNN algorithm and then up-samples the point cloud through a nearest-neighbor interpolation algorithm; next, the up-sampled features are concatenated with the intermediate features generated by the corresponding encoder through a skip connection to obtain a fused feature map; finally, the fused feature map is input into the multilayer perceptron to obtain the output;
the output of a dropout layer semantic segmentation network with three shared FC layers, namely dimension change of (N, 128) → (N, 32) → (N, C), and a random loss rate set to 0.5 applied after the first fully connected layer isN×CThe semantic tag of (1), whereinCRepresenting the number of categories.
Compared with the prior art, the technical scheme of the invention has the beneficial effects that:
the invention provides a three-dimensional point cloud semantic segmentation method and system based on multi-head two-stage attention, and provides a deep learning model for directly processing disordered 3D point cloud from end to end based on data drive in the agricultural field by constructing a semantic segmentation network, so that organ-level segmentation can be automatically and efficiently carried out on plant three-dimensional point cloud.
Drawings
FIG. 1 is a schematic flow chart of the method according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of the step S2 according to an embodiment of the present invention;
FIG. 3 is a diagram of an image acquisition platform constructed in the step S1 according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the manual labeling in the step S3 according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating a semantic segmentation network according to an embodiment of the present invention;
FIG. 6 is a diagram of a multi-head two-level attention module in the semantic segmentation network according to an embodiment of the present invention;
fig. 7 is a visualization diagram of the plant three-dimensional point cloud segmentation result in an embodiment of the present invention.
Detailed Description
The drawings are for illustrative purposes only and are not to be construed as limiting the patent;
the embodiment is a complete use example and has rich content
For the purpose of better illustrating the embodiments, certain features of the drawings may be omitted, enlarged or reduced, and do not represent the size of an actual product;
it will be understood by those skilled in the art that certain well-known structures in the drawings and descriptions thereof may be omitted.
The technical solution of the present invention is further described below with reference to the accompanying drawings and examples.
Example 1
As shown in fig. 1, a three-dimensional point cloud semantic segmentation method based on multi-head two-stage attention includes the following steps:
s1: constructing an image acquisition platform, and acquiring a high-precision and multi-angle plant 2D sequence image through a camera;
s2: performing three-dimensional reconstruction according to the collected 2D sequence image of the plant to obtain a 3D point cloud of the plant;
s3: preprocessing and manually marking the 3D point cloud of the plant;
s4: and constructing a semantic segmentation network, taking the marked point cloud as input, and performing semantic label prediction by the semantic segmentation network to obtain a segmentation result of the plant organs.
In a specific implementation process, as shown in fig. 3, the image acquisition platform consists of a camera shed, a support, a white circular turntable and a camera; wherein: the camera shed (0.8 m × 0.8 m) comprises a white background plate of the same size and 3 LED lamp tubes that supplement the light when illumination is insufficient; the support fixes the position of the camera; the white circular turntable is used for placing the target plant to be three-dimensionally reconstructed, and the plant rotates with the turntable along a specific motion track; the camera photographs the plant to collect the plant 2D sequence images. To ensure shooting stability, the camera is set to autofocus mode throughout the shooting process, and all camera parameters are kept unchanged. The scheme builds a platform for automatically acquiring plant 2D sequence images and performs three-dimensional reconstruction on the uncalibrated 2D sequence images captured from multiple viewpoints; it is subject to few environmental constraints, is self-correcting, and places low demands on the camera: unlike a Kinect-type 3D camera, only an ordinary RGB camera is needed, so the robustness is strong.
In the specific implementation process, the method provides a deep learning model for directly processing unordered 3D point clouds from end to end based on data driving by constructing a semantic segmentation network in the agricultural field, and can automatically and efficiently perform organ-level segmentation on the plant three-dimensional point clouds.
More specifically, as shown in fig. 2, the step S2 specifically includes the following steps:
s21: extracting feature points from the plant 2D sequence images by adopting a Scale Invariant Feature Transform (SIFT) operator, then establishing a K-dimensional space binary tree model by using a nearest neighbor search algorithm to calculate the Euclidean distance between the feature points of every two plant 2D sequence images so as to perform stereo matching on the feature points, and obtaining matching points;
s22: solving the camera attitude by adopting a consistency algorithm, screening the matching points, and eliminating wrong or outlier matching points;
s23: based on the obtained camera pose, recovering three-dimensional point coordinates corresponding to the matching points by using a triangulation algorithm, and performing iterative optimization on the camera pose and the three-dimensional point coordinates by using a beam adjustment algorithm to obtain sparse point cloud;
s24: expanding pixels around the feature points of the sparse point cloud with the CMVS (Cluster-based Multi-View Stereo) and PMVS (Patch-based Multi-View Stereo) algorithms to form a dense point cloud, i.e., the 3D point cloud of the plant.
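As a concrete illustration of the stereo-matching in step S21, the following is a minimal numpy sketch of matching SIFT-like descriptors between two images by Euclidean distance. It uses a brute-force distance matrix rather than a K-dimensional tree, and the ratio test used to reject ambiguous matches is an assumption of this sketch (the function name and threshold are illustrative, not from the original):

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.75):
    """Match descriptors of two images by Euclidean distance.

    A match (i, j) is kept only if the best distance is clearly
    smaller than the second-best (Lowe's ratio test, an assumption
    added here to filter ambiguous matches)."""
    # Pairwise squared Euclidean distances, shape (n_a, n_b)
    d2 = ((desc_a[:, None, :] - desc_b[None, :, :]) ** 2).sum(-1)
    matches = []
    for i, row in enumerate(d2):
        order = np.argsort(row)
        best, second = order[0], order[1]
        if row[best] < (ratio ** 2) * row[second]:
            matches.append((i, int(best)))
    return matches
```

A KD-tree (e.g. `scipy.spatial.cKDTree`) would replace the brute-force distance matrix for large descriptor sets, as the original text describes.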
In a specific implementation process, the CMVS algorithm is a multi-view three-dimensional clustering algorithm that clusters massive images to reduce the image data volume; the PMVS algorithm is a multi-view stereo vision algorithm that generates a dense point cloud through the three steps of matching, expansion and filtering.
In the specific implementation process, the point cloud acquisition procedure is easily affected by environmental factors, the acquisition equipment, human disturbance and the like, which introduces noise and biases the segmentation and measurement results. Therefore, the obtained 3D point cloud needs to be preprocessed to improve the accuracy of subsequent prediction. More specifically, in the step S3, the preprocessing specifically includes a background removal process and a point cloud filtering process; wherein:
in the background removal process, based on the differences in color characteristics, a color-threshold method operating on the RGB channels is used to extract the plant from the 3D point cloud obtained in the step S2 and remove the irrelevant red paper background part of the point cloud; the color thresholds are specifically set as: G − B ≥ 15 or R − B ≥ 15, and R ≥ 120 and R − G ≥ 45 and R − B ≥ 45;
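The color-threshold background removal can be sketched as below. The threshold values are those quoted above; the exact grouping of the conditions into a "red background" test and a "plant" test is this sketch's assumption, as is the function name:

```python
import numpy as np

def remove_red_background(points, colors):
    """Separate plant points from the red-paper background using
    RGB thresholds (values from the text; condition grouping is an
    assumption of this sketch)."""
    r = colors[:, 0].astype(int)
    g = colors[:, 1].astype(int)
    b = colors[:, 2].astype(int)
    # Red-paper background: strong, dominant red channel
    is_red = (r >= 120) & (r - g >= 45) & (r - b >= 45)
    # Plant points: green/blue-dominant colors, excluding the background
    is_plant = ((g - b >= 15) | (r - b >= 15)) & ~is_red
    return points[is_plant], colors[is_plant]
```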
in the point cloud filtering process, the StatisticalOutlierRemoval filter in the PCL point cloud library is used to filter the point cloud obtained in the last step. Its implementation principle is as follows: the average neighbor distances are assumed to follow a normal distribution whose shape is determined by the mean μ and the standard deviation σ, so an outlier is defined as a point whose average distance exceeds a threshold. All points in the point cloud are traversed and the average distance between each point and its K nearest neighboring points is calculated; then the mean μ and the standard deviation σ of all the average distances are calculated. The distance threshold is computed by the following formula:

threshold = μ + α · σ

where α is a constant; finally, all points are traversed again, and every point whose average distance to its K nearest neighboring points is greater than the distance threshold is removed.
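The filtering principle just described can be sketched directly in numpy (a minimal brute-force version of what PCL's StatisticalOutlierRemoval does internally; parameter names are illustrative):

```python
import numpy as np

def statistical_outlier_removal(points, k=3, alpha=1.0):
    """Remove points whose mean distance to their k nearest
    neighbours exceeds mu + alpha * sigma, where mu and sigma are
    the mean and standard deviation of all such mean distances."""
    # Full pairwise distance matrix (fine for small clouds)
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    # Drop the zero self-distance (column 0 after sorting), keep k nearest
    knn = np.sort(d, axis=1)[:, 1:k + 1]
    mean_dist = knn.mean(axis=1)
    mu, sigma = mean_dist.mean(), mean_dist.std()
    return points[mean_dist <= mu + alpha * sigma]
```

For real data, PCL's filter (or Open3D's `remove_statistical_outlier`) uses a KD-tree instead of the full distance matrix.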
More specifically, in the step S3, the manual labeling is implemented with the CloudCompare software, specifically:
the point cloud to be manually labeled is input into the CloudCompare software, and a segmentation tool is used to carry out the semantic segmentation and labeling process on the point cloud, assigning each point one of three category labels: Leaf, Stem or Non-plant. Manual labeling is thereby completed; as shown in fig. 4, the numbers in the figure serve as labels.
In a specific implementation process, because a subsequently constructed semantic segmentation network needs to be trained first for use, a data set for training the semantic segmentation network can be obtained through preprocessing and manual labeling before actual segmentation, the data set is divided into a training set and a test set according to the proportion of 3.
More specifically, in step S4, the semantic segmentation network specifically performs the following operations: predicting the category labels of the points in the point cloud to complete semantic label prediction; wherein:
the semantic segmentation network adopts an encoder-decoder structure, as shown in fig. 5, in which the MLP is a multi-layer perceptron, and specifically performs the following steps:
s41: the manually labeled point cloud is input into the semantic segmentation network, and a multilayer perceptron performs a dimension-raising operation on the point cloud; the input point cloud has dimension N × 9, where N represents the number of points (8092 in this embodiment) and 9 represents the feature dimension of each point; the multilayer perceptron raises the per-point dimension from 9 to 32;
s42: the point cloud after the dimension raising is encoded by four encoders, which gradually reduce the number of points while increasing the feature dimension of each point; each encoder is composed of a down-sampling module and an attention module. The point cloud is down-sampled at a four-fold sampling rate, i.e., only 25% of the point features are retained after each layer, so the cardinality of the point cloud changes from N to N/256 (N → N/4 → N/16 → N/64 → N/256); meanwhile, the feature dimension of each layer is gradually increased to retain more information, changing from M to 2⁴M, where M represents the per-point dimension after the dimension raising in step S41, i.e., [32 → 64 → 128 → 256 → 512] in this embodiment;
S43: after the encoders, four decoders are used to restore the number of points of the point cloud to N; each layer of the decoder comprises an up-sampling module and a multilayer perceptron: first, the K nearest neighbor points of each point are queried with the KNN algorithm, and the point cloud is up-sampled by nearest-neighbor interpolation; then the up-sampled features are spliced, through a skip connection, with the intermediate features generated by the corresponding encoder to obtain a fused feature map; the fused feature map is then input into the multilayer perceptron to obtain the output;
s44: the result passes through three shared FC layers, i.e., (N, 128) → (N, 32) → (N, C), and after the first FC layer a dropout layer with the random loss rate set to 0.5 is applied. The output of the network is the N × C semantic label prediction result, where C represents the number of categories.
In a specific implementation process, the down-sampling module can perform down-sampling on the input point clouds, so that the number of the point clouds is reduced, and the feature of each point is subjected to dimension increasing. According to the scheme, farthest point sampling is used, farthest points from an existing sampling point set are selected continuously and iteratively to obtain sampled point clouds, then a ball query algorithm is used for fixing a radius for each point cloud to find all points in the radius, a plurality of local areas are generated, each local area passes through a multilayer perceptron, and finally maximum pooling is carried out, so that the obtained characteristic is the output of downsampling.
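The farthest point sampling used by the down-sampling module can be sketched as below (seeding with the first point is an assumption of this sketch; any starting point works):

```python
import numpy as np

def farthest_point_sampling(points, n_samples):
    """Iteratively select the point farthest from the set of
    already-selected points, as in the down-sampling module."""
    selected = [0]                       # seed with the first point
    min_dist = np.linalg.norm(points - points[0], axis=1)
    for _ in range(n_samples - 1):
        nxt = int(np.argmax(min_dist))   # farthest from current set
        selected.append(nxt)
        # Keep, per point, its distance to the nearest selected point
        min_dist = np.minimum(min_dist,
                              np.linalg.norm(points - points[nxt], axis=1))
    return points[selected]
```

After sampling, a ball query gathers every point within a fixed radius of each sampled point to build the local regions described above.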
In a specific implementation process, in order to extract deep semantic features of the point cloud, a multi-head two-stage attention module is designed in this embodiment, consisting of a local attention module and a global attention module, as shown in fig. 6. The local attention module uses ball query to search for local regions, focusing on important neighbor features and giving them more weight. The global attention module focuses on the feature dependencies among all points based on a self-attention mechanism. Finally, the output features of the local attention module and the global attention module are spliced to obtain the output of the whole attention module. In addition, a multi-head attention mechanism is introduced to obtain more complete feature information and improve the generalization capability of the network.
First, a position embedding module is introduced that explicitly expresses the spatial position encoding as:

r_i^k = ( p_i ⊕ p_i^k ⊕ (p_i − p_i^k) ⊕ ‖p_i − p_i^k‖ )

Then the corresponding point features are spliced to it:

f̂_i^k = r_i^k ⊕ f_i^k

where p_i and p_i^k respectively represent the coordinates of the center point and of its k-th neighboring point, ‖·‖ represents the Euclidean distance between the neighboring point and the center point, and ⊕ denotes the splicing (concatenation) operation on features. Then, instead of max/average pooling, which often loses most of the information, a powerful attention mechanism is used to automatically aggregate the feature information. In detail, this embodiment uses a function g(·) to learn attention scores and computes the weighted sum of these features:

s_i^k = g(f̂_i^k, W),    f̃_i = Σ_k ( s_i^k · f̂_i^k )

where the function g is a shared multilayer perceptron (MLP) and W represents its learnable weights.
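A minimal numpy sketch of this local attentive aggregation for one center point follows. For simplicity the learnable scoring function g is a single linear map W rather than a shared MLP, and scores are normalized per feature channel over the neighbors (both assumptions of this sketch):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attentive_pooling(center, neighbors, feats, W):
    """Position-encode each neighbour as [p ; p_k ; p - p_k ; ||p - p_k||],
    append its feature, score with a learnable map W, and aggregate by a
    softmax-weighted sum instead of max pooling."""
    diff = center[None, :] - neighbors                       # (k, 3)
    dist = np.linalg.norm(diff, axis=1, keepdims=True)       # (k, 1)
    r = np.concatenate([np.broadcast_to(center, neighbors.shape),
                        neighbors, diff, dist], axis=1)      # (k, 10)
    f_hat = np.concatenate([r, feats], axis=1)               # (k, 10 + d)
    scores = softmax(f_hat @ W, axis=0)                      # weights over neighbours
    return (scores * f_hat).sum(axis=0)                      # aggregated feature
```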
After the local features are aggregated, a global attention module is designed to update the global features based on self-attention, computing the attention scores of all points with matrix dot products. Let Q, K, V respectively denote the query, key and value matrices, generated from linear mappings of the input features. First, the attention weights are computed with the matrix dot product:

Ã = Q · Kᵀ

The attention scores are then normalized with the softmax function; to enhance the normalization effect, this embodiment additionally applies the L1 norm:

ā_{i,j} = softmax(ã_{i,j}),    a_{i,j} = ā_{i,j} / Σ_k ā_{i,k}

The feature F_s is obtained by multiplying the normalized attention by the value matrix: F_s = A · V. On this basis, element-wise subtraction is introduced to compute the offset between the input features and F_s; the function g(·), two shared multilayer perceptrons followed by a nonlinear ReLU layer, maps this offset and the result is added back to the input:

F_out = g(F_in − F_s) + F_in

In addition, a multi-head attention mechanism is introduced to obtain more complete information and further improve the generalization capability of the network; the outputs of the M attention heads are spliced:

F = F¹ ⊕ F² ⊕ … ⊕ F^M

where F^m represents the output of the m-th attention head and the number of attention heads M is 4 in this embodiment.
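A single-head numpy sketch of the offset-attention step described above follows. The linear maps are passed in as plain matrices, g is supplied as a callable, and the exact order of the softmax and L1 normalizations (softmax over keys, then L1 over queries) is an assumed reading of the text:

```python
import numpy as np

def offset_attention(F_in, Wq, Wk, Wv, g):
    """Global attention with an offset: dot-product scores, softmax
    plus an L1 normalisation, then g(F_in - F_s) added back to F_in."""
    Q, K, V = F_in @ Wq, F_in @ Wk, F_in @ Wv
    A = Q @ K.T                                    # raw attention weights
    A = np.exp(A - A.max(axis=1, keepdims=True))
    A = A / A.sum(axis=1, keepdims=True)           # softmax over keys
    A = A / (np.abs(A).sum(axis=0, keepdims=True) + 1e-9)  # L1 normalisation
    F_s = A @ V                                    # attended features
    return g(F_in - F_s) + F_in                    # offset-attention output
```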
In a specific implementation process, for dense tasks such as semantic segmentation that classify every point in the point cloud, the point set needs to be up-sampled and restored to the original number of points. Analogous to the deconvolution operation in a convolutional neural network, the point cloud is up-sampled with the up-sampling module to transfer features from the shape level to the point level. The KNN algorithm is used for each point p to find its k nearest neighboring points, and interpolation is performed based on the Euclidean distances in three-dimensional space between p and its k neighbors. The specific calculation formulas are:

f(p) = ( Σ_{i=1}^{k} w_i · f(p_i) ) / ( Σ_{i=1}^{k} w_i ),    w_i = 1 / d(p, p_i)

where p_i is a neighboring point of p, the weight w_i is inversely proportional to the distance between p and p_i (the farther p_i is from p, the smaller this weight), and d(p, p_i) represents the distance function between p and p_i; the Euclidean distance is used in this embodiment.
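The inverse-distance interpolation used by the up-sampling module can be sketched as below (brute-force nearest-neighbor search; `eps` guards against division by zero when a query coincides with a support point):

```python
import numpy as np

def idw_upsample(query, support, support_feats, k=2, eps=1e-8):
    """Interpolate features for query points as the inverse-distance
    weighted average of their k nearest support points."""
    d = np.linalg.norm(query[:, None, :] - support[None, :, :], axis=-1)
    idx = np.argsort(d, axis=1)[:, :k]                 # k nearest supports
    nd = np.take_along_axis(d, idx, axis=1)            # their distances
    w = 1.0 / (nd + eps)                               # inverse-distance weights
    w = w / w.sum(axis=1, keepdims=True)               # normalise per query
    return (w[..., None] * support_feats[idx]).sum(axis=1)
```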
Example 2
The scheme also provides a three-dimensional point cloud semantic segmentation system based on multi-head two-stage attention, as shown in FIG. 4, the three-dimensional point cloud semantic segmentation system based on multi-head two-stage attention is used for realizing the three-dimensional point cloud semantic segmentation method based on multi-head two-stage attention, and comprises an image acquisition platform, a three-dimensional reconstruction module, a preprocessing module, an artificial labeling module and a semantic segmentation network construction module; wherein:
the image acquisition platform acquires 2D sequence images of plants with high precision and multiple angles through a camera;
the three-dimensional reconstruction module is used for performing three-dimensional reconstruction according to the acquired 2D sequence images of the plants to obtain 3D point clouds of the plants;
the preprocessing module is used for preprocessing the 3D point cloud of the plant;
the manual marking module is used for manually marking the preprocessed point cloud;
the semantic segmentation network construction module is used for constructing a semantic segmentation network, the marked point cloud is used as input, and semantic label prediction is carried out by the semantic segmentation network to obtain the segmentation result of the plant organs.
More specifically, the three-dimensional reconstruction module comprises a matching point acquisition unit, a matching point screening unit, a sparse point cloud acquisition unit and a dense point cloud acquisition unit; wherein:
the matching point obtaining unit adopts a scale-invariant local feature description operator to extract feature points from the plant 2D sequence images, and establishes a K-dimensional space binary tree model to calculate the Euclidean distance between the feature points of every two plant 2D sequence images by using a nearest neighbor search algorithm so as to perform three-dimensional matching on the feature points, so as to obtain matching points;
the matching point screening unit adopts a consistency algorithm to solve the camera attitude, screens the matching points and eliminates wrong or outlier matching points;
the sparse point cloud obtaining unit recovers three-dimensional point coordinates corresponding to the matching points by using a triangulation algorithm based on the obtained camera pose, and then performs iterative optimization on the camera pose and the three-dimensional point coordinates by using a light beam adjustment algorithm to obtain a sparse point cloud;
and the dense point cloud obtaining unit adopts CMVS and PMVS algorithms to expand the pixels around the feature points of the sparse point cloud to form dense point cloud, namely obtaining the 3D point cloud of the plant.
More specifically, the preprocessing module comprises a background removing unit and a point cloud filtering unit; wherein:
in a background removing unit, extracting green plant point cloud and red paper background by using RGB (red, green and blue) channels according to different color characteristics of the 3D point cloud of the plant by using a method based on a color threshold value, and removing irrelevant red paper background parts of the point cloud;
in the point cloud filtering unit, the StatisticalOutlierRemoval filter in the PCL point cloud library is used to filter the point cloud obtained in the last step. The principle is as follows: the average neighbor distances are assumed to follow a normal distribution whose shape is determined by the mean μ and the standard deviation σ, so an outlier is defined as a point whose average distance exceeds a threshold. All point data in the point cloud are traversed and the average distance between each point and its K nearest neighboring points is calculated; then the mean μ and the standard deviation σ of all the average distances are calculated, and the distance threshold is computed by the following formula:

threshold = μ + α · σ

where α is a constant.
Finally, the point cloud is traversed again, and every point whose average distance to its K neighboring points is greater than the threshold is removed.
More specifically, the manual labeling module is implemented with the CloudCompare software, specifically:
the point cloud to be manually labeled is input into the CloudCompare software, a segmentation tool is used to carry out the semantic segmentation and labeling process on the point cloud, and each point is assigned one of the category labels leaf, stem or non-plant, thereby completing the manual labeling.
More specifically, the semantic segmentation network constructed by the semantic segmentation network construction module specifically comprises the following operations: predicting the category labels of the points in the point cloud to complete semantic label prediction; wherein:
the semantic segmentation network adopts an encoder-decoder structure, and specifically comprises the following steps:
a 32 → 64 dimension-raising operation is performed on the point cloud by a fully connected layer (FC); then four encoders are used for encoding, gradually reducing the number of points and increasing the dimension of each point; each encoder consists of a down-sampling module and an attention module; the point cloud is down-sampled at a four-fold sampling rate, i.e., only 25% of the point features remain after each layer, so the cardinality of the resulting point cloud changes as (N → N/4 → N/16 → N/64 → N/256), while the feature dimension of each layer is gradually increased to retain more information (32 → 64 → 128 → 256 → 512);
then the encoded point cloud is decoded by four decoders, restoring the number of points of the point cloud to N; each layer of the decoder comprises an up-sampling module and a multilayer perceptron: first, the KNN algorithm is used to query the K nearest neighbor points of each point, and the point cloud is up-sampled by nearest-neighbor interpolation; then the up-sampled features are spliced, through a skip connection, with the intermediate features generated by the corresponding encoder to obtain a fused feature map; the fused feature map is then input into the multilayer perceptron.
Finally, the output semantic tag prediction result is obtained through three shared FC layers, namely dimension change is (N, 128) → (N, 32) → (N, C), and a dropout layer with a random loss rate set to 0.5 is applied after the first FC layer.
In the specific implementation process, the system is used for realizing the point cloud segmentation method, and is simple to realize, convenient to operate and easy to apply and popularize in reality.
Example 3
More specifically, in order to further explain the technical effects of the present solution, the present embodiment will explain the solution in more detail.
In this embodiment, the segmentation result of the proposed point cloud semantic segmentation method is evaluated at the point-wise level. The common semantic segmentation evaluation indexes are the mean accuracy (mAcc) and the mean intersection-over-union (mIoU) over the segmented categories, calculated as:

mAcc = (1/k) · Σ_i p_ii / Σ_j p_ij

mIoU = (1/k) · Σ_i p_ii / ( Σ_j p_ij + Σ_j p_ji − p_ii )

where k represents the number of categories, p_ij represents the points belonging to class i that are predicted as class j, p_ji represents the points belonging to class j that are predicted as class i (both are cases of classification error), and p_ii indicates the correctly classified points.
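These indexes (together with overall accuracy OA) can be computed from a confusion matrix whose entry p[i, j] counts points of class i predicted as class j; a minimal sketch:

```python
import numpy as np

def segmentation_metrics(y_true, y_pred, num_classes):
    """Compute OA, mAcc and mIoU from per-point labels via the
    confusion-matrix entries p_ij."""
    p = np.zeros((num_classes, num_classes), dtype=np.int64)
    for t, q in zip(y_true, y_pred):
        p[t, q] += 1
    tp = np.diag(p).astype(float)                    # p_ii, correct points
    oa = tp.sum() / p.sum()                          # overall accuracy
    acc = tp / p.sum(axis=1)                         # per-class accuracy
    iou = tp / (p.sum(axis=1) + p.sum(axis=0) - tp)  # per-class IoU
    return oa, acc.mean(), iou.mean()
```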
The final experimental result of this embodiment is that the semantic segmentation of the 3D plant point cloud aims to segment the point into three categories, namely Leaf, stem, and Non-plant. The semantic segmentation results are shown in table 1, and the visualization of the segmentation results is shown in fig. 7. In addition, the method provided by the embodiment is compared with the performance of other advanced deep neural networks.
TABLE 1 Semantic segmentation results
The columns of the table correspond to the experimental results of four segmentation models. It can be noted in table 1 that in all the models the segmentation precision of the stem is lower than that of the other two categories, because the stem is a slender plant organ and the average number of stem points in each point cloud model is only a small fraction of the number of leaf points. Since the number of stem points is small, few points can be predicted accurately, and each mispredicted point has a large influence on the accuracy of stem segmentation. It can be seen that this embodiment obtains the best segmentation performance, with 99.17% OA, 95.62% mAcc and 93.62% mIoU, and has certain advantages in local-information perception capability and segmentation accuracy compared with other mainstream algorithms. In addition, this embodiment uses other advanced deep learning models as backbone networks: 1) PointNet, 2) PointNet++, 3) DGCNN, 4) ShellNet, 5) PointWeb; their semantic segmentation performance was then explored and compared with the method herein. It can be seen from the table that the present scheme achieves the highest segmentation accuracy in all 3 categories of the data set.
It should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention and are not intended to limit the embodiments of the present invention. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. It is neither necessary nor possible to exhaustively list all embodiments here. Any modification, equivalent replacement and improvement made within the spirit and principle of the present invention should be included in the protection scope of the claims of the present invention.

Claims (8)

1. A three-dimensional point cloud semantic segmentation method based on multi-head two-stage attention is characterized by comprising the following steps:
s1: constructing an image acquisition platform, and acquiring a high-precision and multi-angle plant 2D sequence image through a camera;
s2: performing three-dimensional reconstruction according to the acquired 2D sequence image of the plant to obtain a 3D point cloud of the plant;
s3: preprocessing and manually marking the 3D point cloud of the plant to obtain marked point cloud;
s4: constructing a multi-head two-stage attention three-dimensional point cloud semantic segmentation network, taking the marked point cloud as input, and performing semantic label prediction by the three-dimensional point cloud semantic segmentation network to finish the segmentation of three-dimensional point cloud semantics;
in the step S4, the three-dimensional point cloud semantic segmentation network adopts an encoder-decoder structure, and specifically performs the following steps:
s41: the manually labeled point cloud is input into the semantic segmentation network, and a fully connected layer performs a dimension-raising operation on the point cloud;
s42: the point cloud after the dimension raising is encoded by four encoders, gradually reducing the number of points and increasing the dimension of each point; each encoder consists of a down-sampling module and a multi-head two-stage attention module; the point cloud is down-sampled at a four-fold sampling rate, each layer retaining only 25% of the point features, i.e., the cardinality of the resulting point cloud changes as N → N/4 → N/16 → N/64 → N/256, where N represents the number of points; meanwhile, the multi-head two-stage attention module is used to acquire the geometric features of the plant point cloud in a layered manner, and the feature dimension of each layer is gradually increased to retain more information, i.e., the feature dimension is transformed as 32 → 64 → 128 → 256 → 512;
s43: after the encoders, four decoders are used to restore the number of points of the point cloud to N; each layer of the decoder comprises an up-sampling module and a multilayer perceptron; the up-sampling module first uses the KNN algorithm to query the K nearest neighbor points of each point, and then up-samples the point cloud through a nearest-neighbor interpolation algorithm; then, the up-sampled features are spliced, through a skip connection, with the intermediate features generated by the corresponding encoder to obtain a fused feature map; finally, the fused feature map is input into the multilayer perceptron to obtain the output;
s44: the result passes through three shared fully connected layers, the dimension changing as (N, 128) → (N, 32) → (N, C), and after the first fully connected layer a dropout layer with random loss rate p is applied, where p is a number less than 1; the output of the semantic segmentation network is the N × C semantic label, where C represents the number of categories.
2. The method for semantically segmenting the three-dimensional point cloud based on the multi-head two-stage attention according to claim 1, wherein the step S2 specifically comprises the following steps:
s21: extracting feature points from the plant 2D sequence images by adopting a scale-invariant feature transformation operator, establishing a K-dimensional space binary tree model by using a nearest neighbor search algorithm, and calculating the Euclidean distance between the feature points of every two plant 2D sequence images through the K-dimensional space binary tree model to perform stereo matching of the feature points, obtaining matching points;
s22: solving the camera attitude by adopting a consistency algorithm, screening the matching points, and eliminating wrong or outlier matching points;
s23: based on the obtained camera pose, recovering three-dimensional point coordinates corresponding to the matching points by using a triangulation algorithm, and performing iterative optimization on the camera pose and the three-dimensional point coordinates by using a beam adjustment algorithm to obtain sparse point cloud;
s24: and expanding pixels around the feature points of the sparse point cloud by adopting CMVS and PMVS algorithms to form dense point cloud, namely obtaining the 3D point cloud of the plant.
3. The method for semantically segmenting the three-dimensional point cloud based on the multi-head two-stage attention according to claim 1, wherein in the step S3, the preprocessing specifically comprises a background removal process and a point cloud filtering process; wherein:
in the background removal process, based on the differences in the color features of the 3D point cloud of the plant obtained in the step S2, a color-threshold method operating on the RGB channels is used to separate the green plant point cloud from the red paper background and remove the irrelevant background part of the point cloud; then, in the point cloud filtering process, the StatisticalOutlierRemoval filter in the PCL point cloud library is used to filter the point cloud with the irrelevant background part removed, the specific process being as follows:
all point data in the point cloud are traversed and the average distance between each point and its K nearest neighboring points is calculated; then the mean μ and the standard deviation σ of all the average distances are calculated; here, the average distances are assumed to follow a normal distribution whose shape is determined by the mean μ and the standard deviation σ, so a point whose average distance exceeds the threshold is defined as an outlier. The distance threshold is calculated by the following formula:

threshold = μ + α · σ

where α is a constant; finally, all points are traversed again, and every point whose average distance to its K neighboring points is greater than the distance threshold is removed.
4. The method for semantic segmentation of the three-dimensional point cloud based on multi-head two-stage attention according to claim 3, wherein in the step S3, the manual labeling specifically comprises:
the point cloud to be manually labeled is input into the CloudCompare software, a segmentation tool is used to carry out the semantic segmentation and labeling process on the point cloud, and each point in the point cloud is assigned one of the category labels leaf, stem or non-plant, thereby completing the manual labeling.
5. A three-dimensional point cloud semantic segmentation system based on multi-head two-stage attention is characterized by comprising an image acquisition platform, a three-dimensional reconstruction module, a preprocessing module, a manual labeling module and a semantic segmentation network construction module; wherein:
the image acquisition platform acquires a high-precision and multi-angle plant 2D sequence image through a camera;
the three-dimensional reconstruction module is used for performing three-dimensional reconstruction according to the collected 2D sequence images of the plants to obtain 3D point clouds of the plants;
the preprocessing module is used for preprocessing the 3D point cloud of the plant;
the manual marking module is used for manually marking the preprocessed point cloud;
the semantic segmentation network construction module is used for constructing a semantic segmentation network, the marked point cloud is used as input, and the semantic segmentation network is used for performing semantic label prediction to obtain a segmentation result of a plant organ level;
the semantic segmentation network constructed by the semantic segmentation network construction module specifically executes the following operations: predicting the category labels of the points in the point cloud to complete semantic label prediction, wherein:
the three-dimensional point cloud semantic segmentation network adopts an encoder-decoder structure and specifically executes the following steps:
the manually labeled point cloud is input into the semantic segmentation network, and a fully connected layer performs a dimension-raising operation on the point cloud;
the point cloud after the dimension raising is encoded by four encoders, gradually reducing the number of points and increasing the dimension of each point; each encoder consists of a down-sampling module and a multi-head two-stage attention module; the point cloud is down-sampled at a four-fold sampling rate, each layer retaining only 25% of the point features, i.e., the cardinality of the resulting point cloud changes as N → N/4 → N/16 → N/64 → N/256, where N represents the number of points; meanwhile, the multi-head two-stage attention module is used to acquire the geometric features of the plant point cloud in a layered manner, and the feature dimension of each layer is gradually increased to retain more information, i.e., the feature dimension is transformed as 32 → 64 → 128 → 256 → 512;
after the encoder, four decoders restore the number of points in the point cloud to N; each decoder layer contains an up-sampling module and a multi-layer perceptron; the up-sampling module first queries the K nearest neighbors of each point with the KNN algorithm, then up-samples the point cloud by nearest-neighbor interpolation; the up-sampled features are then concatenated with the intermediate features produced by the corresponding encoder through a skip connection to obtain a fused feature map; finally, the fused feature map is fed into the multi-layer perceptron to obtain the output;
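A minimal sketch of the decoder's up-sampling step: nearest-neighbor interpolation (K = 1 for brevity) via a KD-tree, followed by the skip-connection concatenation. The shapes and random features are illustrative, not from the patent.

```python
import numpy as np
from scipy.spatial import cKDTree

def upsample_nn(sparse_xyz, sparse_feats, dense_xyz):
    """Nearest-neighbor interpolation: each dense point takes the
    feature of its closest sparse point."""
    tree = cKDTree(sparse_xyz)
    _, idx = tree.query(dense_xyz, k=1)
    return sparse_feats[idx]

rng = np.random.default_rng(1)
dense_xyz = rng.standard_normal((64, 3))      # target resolution of this layer
sparse_xyz = dense_xyz[:16]                   # coarser layer (a subset here)
sparse_feats = rng.standard_normal((16, 128))

up = upsample_nn(sparse_xyz, sparse_feats, dense_xyz)
skip = rng.standard_normal((64, 128))         # encoder features via the skip connection
fused = np.concatenate([up, skip], axis=1)    # concatenation before the shared MLP
print(fused.shape)                            # (64, 256)
```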
the decoder output then passes through three shared fully connected layers, i.e. the dimensions change as (N, 128) → (N, 32) → (N, C), with a dropout layer of drop rate p (p a constant less than 1) applied after the first fully connected layer; the output of the semantic segmentation network is an N × C semantic label map, where C is the number of categories.
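As an illustration of this segmentation head, a sketch with random placeholder weights; the ReLU nonlinearity and inverted-dropout scaling are assumptions for the sketch, not details specified in the claim.

```python
import numpy as np

def seg_head(feats, num_classes, p=0.5, train=True, rng=None):
    """(N, 128) -> (N, 32) -> (N, C) shared fully connected layers,
    with dropout (rate p < 1) after the first layer."""
    rng = rng or np.random.default_rng(0)
    w1 = rng.standard_normal((feats.shape[1], 32)) * 0.1
    w2 = rng.standard_normal((32, num_classes)) * 0.1
    h = np.maximum(feats @ w1, 0.0)            # assumed ReLU between layers
    if train:                                  # inverted dropout with rate p
        mask = rng.random(h.shape) >= p
        h = h * mask / (1.0 - p)
    logits = h @ w2                            # (N, C) semantic label scores
    return logits.argmax(axis=1)               # per-point predicted category

N, C = 100, 3                                  # e.g. leaf / stem / non-plant
labels = seg_head(np.random.default_rng(2).standard_normal((N, 128)), C)
print(labels.shape)                            # (100,)
```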
6. The multi-head two-stage attention-based three-dimensional point cloud semantic segmentation system according to claim 5, wherein the three-dimensional reconstruction module comprises a matching point acquisition unit, a matching point screening unit, a sparse point cloud acquisition unit and a dense point cloud acquisition unit; wherein:
the matching point acquisition unit extracts feature points from the plant 2D sequence images with a scale-invariant local feature descriptor, builds a K-dimensional binary search tree (KD-tree), and uses a nearest-neighbor search algorithm to compute the Euclidean distances between the feature points of each pair of plant 2D sequence images for stereo matching of the feature points, obtaining the matching points;
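The KD-tree nearest-neighbor matching of descriptors can be sketched as follows. Random 128-d vectors stand in for the actual scale-invariant descriptors, and the ratio test used to keep only distinctive matches is a common filtering heuristic, not something mandated by the claim.

```python
import numpy as np
from scipy.spatial import cKDTree

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """KD-tree nearest-neighbor matching between two images' feature
    descriptors (Euclidean distance), filtered by a ratio test."""
    tree = cKDTree(desc_b)
    dists, idx = tree.query(desc_a, k=2)       # two closest candidates in image B
    keep = dists[:, 0] < ratio * dists[:, 1]   # distinctive matches only
    return np.stack([np.nonzero(keep)[0], idx[keep, 0]], axis=1)

rng = np.random.default_rng(3)
desc_b = rng.standard_normal((200, 128))       # descriptors of image B
noise = rng.standard_normal((50, 128)) * 0.01
desc_a = desc_b[:50] + noise                   # 50 true correspondences in image A

matches = match_descriptors(desc_a, desc_b)    # pairs (index in A, index in B)
print(matches.shape)
```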
the matching point screening unit solves the camera pose with a sample-consensus (RANSAC-type) algorithm, screens the matching points, and eliminates wrong or outlying matches;
based on the obtained camera pose, the sparse point cloud acquisition unit recovers the three-dimensional coordinates corresponding to the matching points with a triangulation algorithm, then iteratively optimizes the camera pose and the three-dimensional point coordinates with a bundle adjustment algorithm to obtain the sparse point cloud;
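The triangulation step can be illustrated with a standard direct-linear-transform (DLT) solver for two views; the toy camera matrices below are assumptions for the example, and bundle adjustment would further refine the recovered point.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Recover the 3D point whose projections through camera matrices
    P1, P2 are the pixel coordinates x1, x2 (linear triangulation)."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)                # null vector of A = homogeneous point
    X = vt[-1]
    return X[:3] / X[3]

def project(P, X):
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Two toy cameras: identity pose, and a unit translation along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
X_true = np.array([0.5, 0.2, 4.0])

X_hat = triangulate(P1, P2, project(P1, X_true), project(P2, X_true))
print(np.round(X_hat, 6))                      # recovers [0.5, 0.2, 4.0]
```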
and the dense point cloud acquisition unit uses the CMVS and PMVS algorithms to expand the pixels around the feature points of the sparse point cloud into a dense point cloud, i.e. the 3D point cloud of the plant.
7. The multi-head two-stage attention-based three-dimensional point cloud semantic segmentation system according to claim 5, wherein the preprocessing module comprises a background removal unit and a point cloud filtering unit; wherein:
in the background removal unit, a color-threshold method on the RGB channels extracts the green plant points and the red reference-paper points from the 3D point cloud of the plant according to their distinct color characteristics, removing the irrelevant background portion of the point cloud;
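A color-threshold split of this kind might look as follows; the channel margin of 20 is an illustrative threshold, not a value taken from the patent.

```python
import numpy as np

def remove_background(xyz, rgb):
    """Keep points that are predominantly green (plant) or predominantly
    red (the reference paper); drop everything else as background."""
    r = rgb[:, 0].astype(int)
    g = rgb[:, 1].astype(int)
    b = rgb[:, 2].astype(int)
    green = (g - r > 20) & (g - b > 20)   # plant points
    red = (r - g > 20) & (r - b > 20)     # reference-paper points
    keep = green | red
    return xyz[keep], rgb[keep]

xyz = np.zeros((3, 3))
rgb = np.array([[30, 200, 40],            # leaf-like green -> kept
                [220, 30, 25],            # paper-like red  -> kept
                [128, 128, 128]])         # grey background -> removed
kept_xyz, kept_rgb = remove_background(xyz, rgb)
print(kept_rgb.shape)                     # (2, 3)
```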
in the point cloud filtering unit, the point cloud with the background removed is filtered with the StatisticalOutlierRemoval filter from the PCL point cloud library; the specific process is as follows:
traverse all points in the point cloud and compute the average distance from each point to its K nearest neighbors; then compute the mean μ and standard deviation σ of all these average distances. It is assumed that the average distances follow a normal distribution whose shape is determined by μ and σ, so a point is defined as an outlier when its average distance exceeds the distance threshold, computed as:

threshold = μ + α · σ

where α is a constant; finally, traverse the point cloud again and remove every point whose average distance to its K nearest neighbors exceeds the distance threshold.
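The filter described above is straightforward to reimplement directly; the values of k and α below are illustrative defaults, not parameters from the patent.

```python
import numpy as np
from scipy.spatial import cKDTree

def statistical_outlier_removal(points, k=8, alpha=1.0):
    """Per point, compute the mean distance to its k nearest neighbors;
    remove points whose mean distance exceeds mu + alpha * sigma, where
    mu and sigma are taken over all points' mean distances."""
    tree = cKDTree(points)
    dists, _ = tree.query(points, k=k + 1)     # first neighbor is the point itself
    mean_d = dists[:, 1:].mean(axis=1)
    mu, sigma = mean_d.mean(), mean_d.std()
    threshold = mu + alpha * sigma
    return points[mean_d <= threshold]

rng = np.random.default_rng(4)
cluster = rng.standard_normal((200, 3)) * 0.1  # dense plant-like cluster
outliers = np.array([[10.0, 0, 0], [0, 10, 0], [0, 0, 10],
                     [-10, 0, 0], [0, -10, 0]])  # stray points far away
cloud = np.vstack([cluster, outliers])

filtered = statistical_outlier_removal(cloud)
print(len(cloud), len(filtered))               # 205 -> 200: the 5 outliers are removed
```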
8. The multi-head two-stage attention-based three-dimensional point cloud semantic segmentation system according to claim 6, wherein the manual labeling module is implemented with CloudCompare software, specifically: the point cloud to be manually labeled is loaded into CloudCompare, a segmentation tool is used to carry out the semantic segmentation and labeling process, and each point is assigned one of the labels leaf, stem, or non-plant, completing the manual labeling.
CN202210709918.8A 2022-06-22 2022-06-22 Three-dimensional point cloud semantic segmentation method and system based on multi-head two-stage attention Active CN114792372B (en)

Publications (2)

Publication Number Publication Date
CN114792372A CN114792372A (en) 2022-07-26
CN114792372B true CN114792372B (en) 2022-11-04

Family

ID=82463395





Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant