CN110378909B - Single wood segmentation method for laser point cloud based on Faster R-CNN

Single wood segmentation method for laser point cloud based on Faster R-CNN

Info

Publication number
CN110378909B
CN110378909B (application CN201910551190.9A)
Authority
CN
China
Prior art keywords
point cloud
tree
trunk
forest
cnn
Prior art date
Legal status
Active
Application number
CN201910551190.9A
Other languages
Chinese (zh)
Other versions
CN110378909A (en)
Inventor
云挺 (Yun Ting)
陈鑫鑫 (Chen Xinxin)
王佳敏 (Wang Jiamin)
曹林 (Cao Lin)
Current Assignee
Nanjing Forestry University
Original Assignee
Nanjing Forestry University
Priority date
Filing date
Publication date
Application filed by Nanjing Forestry University filed Critical Nanjing Forestry University
Priority to CN201910551190.9A priority Critical patent/CN110378909B/en
Publication of CN110378909A publication Critical patent/CN110378909A/en
Application granted granted Critical
Publication of CN110378909B publication Critical patent/CN110378909B/en

Classifications

    • G06N 3/045 — Combinations of networks (G PHYSICS; G06 COMPUTING, CALCULATING OR COUNTING; G06N Computing arrangements based on specific computational models; G06N 3/00 Computing arrangements based on biological models; G06N 3/02 Neural networks; G06N 3/04 Architecture, e.g. interconnection topology)
    • G06T 7/11 — Region-based segmentation (G06T Image data processing or generation, in general; G06T 7/00 Image analysis; G06T 7/10 Segmentation; edge detection)
    • G06T 2207/10028 — Range image; depth image; 3D point clouds (G06T 2207/10 Image acquisition modality)
    • G06T 2207/20081 — Training; learning (G06T 2207/20 Special algorithmic details)
    • G06T 2207/30188 — Vegetation; agriculture (G06T 2207/30 Subject of image; G06T 2207/30181 Earth observation)
    • Y02A 90/10 — Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation (Y02A Technologies for adaptation to climate change)


Abstract

The invention discloses a single-tree segmentation method for laser point clouds based on Faster R-CNN, which comprises: acquiring forest stand point cloud data; computing the point cloud features of the scanned forest stand to separate the branches and leaves in the stand point cloud data; performing adaptive voxelization on the woody (branch and trunk) point cloud data of the forest stand and projecting it from multiple angles to generate corresponding depth images; detecting trunks in the generated depth images with a deep learning method; and back-projecting the detected trunk positions in the depth images to obtain the corresponding three-dimensional trunk point clouds. The obtained trunk point clouds serve as seed points and, in combination with a region growing algorithm, yield individual tree separation. By adopting deep learning and training on large data samples, the invention achieves high single-tree segmentation accuracy and makes it possible to accurately solve the problem of segmenting individual rubber trees in ground-based LiDAR data with deep learning.

Description

Single wood segmentation method for laser point cloud based on Faster R-CNN
Technical Field
The invention relates to the technical field of individual tree segmentation, and in particular to a laser point cloud-oriented single-tree segmentation method based on Faster R-CNN.
Background
Natural rubber, widely planted in tropical regions, is an important industrial raw material and strategic resource, and its role in national economic construction is increasingly prominent. However, the rubber tree originates in the wild from the Amazon basin of South America and has no wild resources in China, so Chinese rubber forests are mostly pure artificial plantations. Hainan, the largest rubber production base in China, has nearly 8 million mu of rubber forest, forming the largest artificial ecosystem there. Owing to its geographical location, it is frequently disturbed by typhoons: statistics show that Hainan Island has been struck by more than 100 typhoons over the past 60 years. A typhoon strikes within a short time and causes serious damage to rubber trees, such as trunk and branch breakage and uprooting. Against the background of global warming, although the number of typhoons making landfall in Hainan each year tends to decrease, their average intensity generally tends to increase, which is bound to seriously affect rubber planting and production. To determine the wind-resistance indices of rubber trees and breed strongly resistant varieties, an accurate single rubber tree segmentation algorithm is an indispensable precondition for obtaining the structural parameters and dynamic change information of rubber trees.
Laser radar, i.e. Light Detection and Ranging (LiDAR), with its ability to accurately record three-dimensional laser points, provides a promising approach for obtaining three-dimensional (3D) phenotypic characteristics of plants; it mainly works in the ultraviolet, visible and infrared bands. According to the working platform, laser scanning systems can be divided into five types: satellite laser scanning (SLS), airborne laser scanning (ALS), mobile laser scanning (MLS), vehicle-borne laser scanning (VLS) and terrestrial laser scanning (TLS). SLS and ALS adopt a top-down scanning mode, while MLS, VLS and TLS adopt a bottom-up scanning mode. Top-down scanning can clearly capture the vegetation canopy and has great potential and advantages for recording the vertical structural characteristics of forests and extracting canopy structure parameters. Bottom-up scanning can clearly record the lower parts of the canopy (such as trunks and leaves) and is better suited to ground forest inventory work, but if the leaf area density of the vegetation below the canopy is high, data from the upper part of the forest may be lost.
To date, scholars have proposed many single-tree segmentation methods for airborne LiDAR data, realizing individual tree segmentation from the geometric features of the forest canopy; these can be roughly divided into two types: segmentation algorithms based on a Canopy Height Model (CHM) and algorithms working directly on the scattered point cloud. Accurate individual crown segmentation with ground-based mobile LiDAR remains challenging, especially in ecological forests where crown heights are irregular and crowns intersect severely. Although some creative studies on detecting trunk positions from ground-based mobile LiDAR data for tree segmentation have been reported, this type of approach still faces two problems: (1) the woody components of the surveyed trees may be deformed by long-term exposure to typhoon disasters; (2) the robustness and versatility of single-tree segmentation models for ground-based mobile LiDAR data require further investigation.
Disclosure of Invention
The technical problem to be solved by the invention is to address the shortcomings of the prior art by providing a laser point cloud-oriented single-tree segmentation method based on Faster R-CNN. The method adopts deep learning, trains on large data samples, achieves high single-tree segmentation accuracy, and makes it possible to accurately solve the problem of segmenting individual rubber trees in ground-based LiDAR data with deep learning.
In order to achieve the technical purpose, the invention adopts the following technical scheme:
A laser point cloud-oriented single-tree segmentation method based on Faster R-CNN comprises the following steps:
step 1: acquiring forest stand point cloud data with ground-based mobile LiDAR;
step 2: computing the point cloud features of the scanned forest stand to separate the branches and leaves in the stand point cloud data;
step 3: performing adaptive voxelization on the woody point cloud data of the forest stand and projecting it from multiple angles to generate corresponding depth images;
step 4: detecting trunks in the generated depth images with a deep learning method;
step 5: back-projecting the detected trunk positions in the depth images to obtain the spatial three-dimensional point clouds of the corresponding trunks;
step 6: using the obtained trunk point clouds as seed points, in combination with a region growing algorithm, to achieve individual tree separation.
As a further improved technical scheme of the present invention, the step 1 includes:
and (3) measuring by a Velodyne HDL-32E scanner according to a set measuring route by moving back and forth in a forest stand, and splicing the measured point cloud data by a SLAM algorithm.
As a further improved technical scheme of the present invention, the step 2 includes:
For each laser point $p_i$ in the forest stand point cloud data, the following feature vector is acquired:

$$F_i = \{e_{ix}, e_{iy}, e_{iz}, c_{0,i}, c_{1,i}, c_{2,i}, l_{0,i}, l_{1,i}, l_{2,i}\}$$

where $\{e_{ix}, e_{iy}, e_{iz}\}$ is the normal vector of point $p_i$, $\{c_{0,i}, c_{1,i}, c_{2,i}\}$ are the structure tensor features of $p_i$, and $\{l_{0,i}, l_{1,i}, l_{2,i}\}$ are the shape eigenvalues of $p_i$;

according to the feature vector $F_i$ obtained for each laser point $p_i$, the branches and leaves of the forest stand are separated with a Gaussian classifier.
As a further improved technical scheme of the present invention, the step 3 includes:
in the adaptive voxelization process, defining the length of a voxel block according to the planting row spacing of the forest, the width according to the plant spacing within a row, and the height according to the under-crown trunk height of the forest tree clone;
after the voxelization operation, assigning the forest stand point cloud data to the corresponding voxels and using a value V_i to mark each valid voxel;
for each voxel V_i, generating two corresponding depth images by projection from the positive Y direction and the positive X direction.
As a further improved technical solution of the present invention, the step 4 includes:
(4.1) modeling:
selecting a training plot, processing the forest stand in the training plot with the methods of steps 1 to 3 in sequence to obtain a number of depth images, marking the trunk positions of the trees in each obtained depth image, and taking the marked depth images as the training set;
selecting a test plot, processing the forest stand in the test plot with the methods of steps 1 to 3 in sequence to obtain a number of depth images, and taking the obtained depth images as the test set;
training the ability of the Faster R-CNN model to identify the trunk position of the tree in the depth image by using the training set to obtain a trained Faster R-CNN network model;
testing the test set by using a trained Faster R-CNN network model;
(4.2) detecting tree trunks of depth images generated by forest segments needing single-tree segmentation:
The depth images of the forest stand requiring single-tree segmentation, obtained with the methods of steps 1 to 3 in sequence, are input into the trained Faster R-CNN network model, which outputs a predicted bounding box and a prediction confidence for each tree trunk position in each depth image; the predicted bounding boxes with a prediction confidence above 80% are retained.
As a further improved technical solution of the present invention, the step 5 includes:
The x, y and z values of the spatial trunk position are calculated by combining the position and size information of the voxel block with the position of the predicted bounding box in the depth image generated from that voxel block; using these x, y and z values, the trunk in the two-dimensional predicted bounding box is back-projected into the corresponding 3D space, yielding the three-dimensional point cloud of the corresponding trunk part.
As a further improved technical solution of the present invention, the step 6 includes:
based on the trunk point cloud obtained in step 5, an octree-based region growing algorithm is used to segment the point cloud of the tree skeleton, where the tree skeleton is the remaining structure of the tree with the leaves removed;
for the unclassified leaf point clouds, the segmented tree skeleton point clouds are used as centres, and a clustering algorithm assigns the leaf points to the point cloud of the corresponding tree skeleton.
The beneficial effects of the invention are as follows: the method projects the target point cloud data into two-dimensional depth images through adaptive voxelization, and then detects the trunk parts of the forest in the depth images with a deep learning method; the detected two-dimensional trunks are back-projected to obtain the three-dimensional point cloud data of the corresponding trunk parts; based on the acquired trunk point cloud data, final single-tree segmentation is achieved in combination with a region growing algorithm. The method therefore differs from single-tree segmentation algorithms based on crown detection: it first detects the trunk of each individual tree, which fundamentally reduces the influence of crown detection errors on the segmentation result. Existing trunk-detection-based single-tree segmentation algorithms for ground-based mobile LiDAR data can produce errors when applied to stands whose tree form is severely deformed after wind damage. In general, the invention shows great promise in applying this deep-learning-based method to single rubber tree segmentation research on ground-based mobile LiDAR scanning data.
Drawings
Fig. 1 is a flow chart of the single-tree segmentation method of the present embodiment.
Fig. 2 is a view showing an outline of the study area of this embodiment.
Fig. 3 is a result diagram of data preprocessing, adaptive voxel formation and multi-angle projection generation of depth images of three rubber forest segments according to the present embodiment.
Fig. 4 is a flow chart of the detection of the trunk of the rubber tree based on the fast R-CNN model in the present embodiment.
Fig. 5 is a diagram showing an example of the training sample of the present embodiment.
FIG. 6 is a training loss diagram of the fast R-CNN model of the present embodiment.
Fig. 7 is a diagram showing an example of the test results of the test sample according to the present embodiment.
FIG. 8 is a diagram showing the result of the trunk detection of the rubber tree according to the present embodiment.
Fig. 9 is a graph of the region growing result based on the point cloud of the trunk portion in the present embodiment.
Fig. 10 is a graph of the final single-tree segmentation results after clustering the leaf point clouds onto the skeleton part point clouds in this embodiment.
Detailed Description
The following further describes embodiments of the present invention with reference to fig. 1 to 10:
a single wood segmentation method facing to laser point cloud based on Faster R-CNN, the flow is shown in figure 1, includes:
step 1: and acquiring forest stand point cloud data based on the ground mobile LiDAR.
Specifically: FIG. 2 is a schematic diagram of the study area. An experimenter carrying a Velodyne HDL-32E scanner walked through three rubber forest stands at a speed of 0.5 m/s along established measurement routes to obtain point cloud data of the target rubber forest stands. The data of the whole system are stitched with a simultaneous localization and mapping (SLAM) algorithm.
In this example, a forest stand subset was selected from each of the three rubber forest stands (rubber forest stand 1, rubber forest stand 2 and rubber forest stand 3) and used as a training plot in the subsequent experiments. Each training plot covers an area of approximately 0.6 × 0.6 km representing the corresponding rubber forest stand, and tree height measurements were made in the three training plots with a Vertex IV hypsometer on 11 February 2016 (see Table 1 for specific values). Furthermore, three further subsets (disjoint from the subsets used as training plots) were selected from the three rubber forest stands, each covering an area of about 0.3 × 0.6 km; these were used as the subsequent test plots.
Step 2: and calculating the point cloud characteristics of the scanned forest stand, so that branch and leaf separation of the forest stand point cloud data is realized.
Specifically: the feature vector of each laser point $p_i$ is computed as

$$F_i = \{e_{ix}, e_{iy}, e_{iz}, c_{0,i}, c_{1,i}, c_{2,i}, l_{0,i}, l_{1,i}, l_{2,i}\}$$

where $\{e_{ix}, e_{iy}, e_{iz}\}$ is the normal vector of point $p_i$, $\{c_{0,i}, c_{1,i}, c_{2,i}\}$ are the structure tensor features of $p_i$, and $\{l_{0,i}, l_{1,i}, l_{2,i}\}$ are the shape eigenvalues of $p_i$. According to the feature vector $F_i$ obtained for each scanned laser point $p_i$, the branches and leaves of the rubber forest stand are separated with a Gaussian classifier.
Step 3: performing adaptive voxelization on the woody point cloud data of the forest stand, and projecting it from multiple angles to generate the corresponding depth images. In the adaptive voxelization process, the length of a voxel block is defined according to the planting row spacing of the forest, the width according to the plant spacing within a row, and the height according to the under-crown trunk height of the forest tree clone; after the voxelization operation, the forest stand point cloud data are assigned to the corresponding voxels, and a value V_i is used to mark each valid voxel.
In artificial rubber plantations, the row spacing is generally about 6-8 m and the plant spacing about 2.8-3 m, so in the adaptive voxelization process this embodiment defines the voxel block length as 8 m and the width as 3 m. Furthermore, considering the growth and structural characteristics of the different rubber tree clones, different heights are set for the voxels of the three training plots according to the under-crown trunk height of the three clones (see Table 1): the voxel heights are 5.62 m (training plot 1), 8.40 m (training plot 2) and 9.35 m (training plot 3). After the voxelization operation, the point cloud data of the three training plots are assigned to the respective voxels, and a value V_i marks each valid voxel. For each voxel V_i, two corresponding depth images are generated by projection from the positive Y direction and the positive X direction. The numbers of depth images generated from the scan points of the three training plots are 233, 268 and 301, respectively, all of which are used to construct the training set; the total number of depth images in the training set is 802. In Fig. 3, (a), (b) and (c) are the result diagrams of data preprocessing, adaptive voxelization and multi-angle projection for rubber forest stands 1, 2 and 3, respectively.
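As an illustration only, the sketch below voxelizes a stand point cloud with the block dimensions above and renders one depth image per block for the +Y and +X viewing directions; the image resolution and depth encoding are assumptions, not taken from the patent.

```python
import numpy as np

def voxelize_and_project(points, voxel_size=(8.0, 3.0, 5.62), img_w=256, img_h=256):
    """Assign points to length x width x height voxel blocks and render a
    depth image per block for the +Y and +X projections."""
    size = np.asarray(voxel_size)
    origin = points.min(axis=0)
    keys = np.floor((points - origin) / size).astype(int)
    images = {}
    for key in {tuple(k) for k in keys}:
        pts = points[(keys == np.asarray(key)).all(axis=1)]
        q = (pts - (origin + np.asarray(key) * size)) / size   # in [0, 1) per axis
        for view, u_axis, d_axis in (("Y+", 0, 1), ("X+", 1, 0)):
            u = np.minimum((q[:, u_axis] * (img_w - 1)).astype(int), img_w - 1)
            v = np.minimum(((1.0 - q[:, 2]) * (img_h - 1)).astype(int), img_h - 1)
            img = np.zeros((img_h, img_w))
            np.maximum.at(img, (v, u), 1.0 - q[:, d_axis])     # nearer -> brighter
            images[(key, view)] = img
    return images

# e.g. a random 40 m x 30 m x 6 m patch of points
depth_images = voxelize_and_project(np.random.rand(20000, 3) * [40.0, 30.0, 6.0])
```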
Table 1: parameters related to three rubber forest stands
[Table 1 appears only as an image in the source; values cited in the text include the under-crown trunk heights of 5.62 m, 8.40 m and 9.35 m, per-plot depth-image counts of 233, 268 and 301, and the clones PR107, CATAS 7-20-59 and CATAS 8-7-9.]
Step 4: detecting the trunks in the generated depth images with a deep learning method.
(4.1) Modelling: a training plot is selected, the forest stand in the training plot is processed with the methods of steps 1 to 3 in sequence to obtain a number of depth images, the trunk positions of the trees in each obtained depth image are marked, and the marked depth images form the training set. A test plot is selected, the forest stand in the test plot is processed with the methods of steps 1 to 3 in sequence to obtain a number of depth images, and the obtained depth images form the test set. The training set is used to train the ability of the Faster R-CNN model to identify tree trunk positions in the depth images, yielding a trained Faster R-CNN network model. The test set is then tested with the trained Faster R-CNN network model.
The Faster Region-based Convolutional Neural Network (Faster R-CNN) model consists of two convolutional neural networks: a Region Proposal Network (RPN) and a Fast Region-based Convolutional Neural Network (Fast R-CNN) detection network that uses the proposed regions. The RPN samples regions of the image as proposal regions and is trained to determine which regions may contain targets. The Fast R-CNN detection network further processes the region information collected by the RPN, determines the target class in each region, and adjusts the size and position of the region to locate the precise position of the target in the image. FIG. 4 shows the model architecture for automatic rubber tree trunk detection and identification based on Faster R-CNN.
The training process of the Faster R-CNN network model mainly comprises four steps. First, the parameters of the entire Faster R-CNN network model are initialized with a pretrained model. Second, the RPN is trained with the training set. Third, the proposal regions generated by the RPN are used to train the Fast R-CNN detection network. Finally, the RPN and Fast R-CNN form a joint network, and the weights of the joint network are adjusted by repeating the above procedure.
Based on the pretrained convolutional neural network, the training set generated from the training plots is used to train and optimize the parameters of the Faster R-CNN network model. As described in step 3, the training samples are generated by multi-angle projection of the adaptive voxel blocks. According to the number and morphological characteristics of the rubber tree trunks in the depth images, this embodiment analyses the depth images in six cases: (a) the depth image contains only one complete rubber tree trunk; (b) the depth image contains two complete rubber tree trunks; (c) several branched stems appear in the voxel; (d) the trunk information is obscured by leaves or branches; (e) trunk information belonging to several trees overlaps in one voxel; (f) the trunk of the same rubber tree appears in two adjacent images generated by the projection of two adjacent voxels. To prepare for the subsequent training process, the training samples must first be marked (i.e. the trunk positions in the obtained depth images are labelled); for the six cases above, this embodiment adopts different marking modes: (a) the entire trunk is marked; (b) the trunks of both trees in the image are marked; (c) the trunks of the trees in the image are marked, including branches; (d) only the trunk part is marked, not the leaf part; (e) all overlapping trunks are marked; (f) only the trunk part within the voxel is marked, while branches or the upper part of the trunk appearing in the neighbouring voxel are not marked. The labelling results of depth images for the six cases in part of the training set are shown in Fig. 5, where a rectangular box tightly enclosing the rubber tree trunk serves as the ground truth in subsequent training.
Specific training process of model:
A. Selection of a feature extraction network model:
The choice of convolutional neural network for feature extraction determines the accuracy of the final recognition result of the training process, so selecting an appropriate convolutional neural network framework is important.
In computer vision, network depth is an important factor in achieving good results: as the depth increases, the representational capacity increases accordingly. However, too great a depth can cause the vanishing gradient phenomenon. Therefore, the VGG16 network is selected as the feature extraction network model in this embodiment. The VGG16 network consists of 13 convolutional layers, each followed by a ReLU layer, a common activation function in artificial neural networks. Some convolutional layers are also followed by a max-pooling layer to preserve features and reduce unnecessary parameters, increasing computation speed.
B. Pretraining the CNN model:
As shown in Table 1, the total number of samples in the training set is 802, but high-precision training normally rests on a large number of samples. Transfer learning can use general data to obtain a pretrained model and use that model to construct the RPN and the Fast R-CNN detection network, alleviating the problem of a training set that is too small. In this embodiment, a training set from ImageNet (approximately 100,000 images, 1000 classes) is used to pretrain the VGG16 network model.
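As a hedged illustration (not the authors' code), an ImageNet-pretrained VGG16 convolutional backbone with the classification head removed can be obtained in TensorFlow as follows; the input size is an arbitrary example.

```python
import tensorflow as tf

# ImageNet-pretrained VGG16 backbone without the fully connected head,
# standing in for the feature extraction network described above.
backbone = tf.keras.applications.VGG16(include_top=False, weights="imagenet")
feature_map = backbone(tf.random.uniform((1, 560, 800, 3)))
print(feature_map.shape)   # five 2x poolings: (1, 17, 25, 512)
```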
C. Training of RPN network:
As an important component of Faster R-CNN, the RPN takes depth images as input and outputs a set of rectangular proposal regions to the Fast R-CNN detection network. This dedicated network structure improves the speed of region extraction. At the same time, it scales and translates the size and position of each proposal region according to the ground truth, making the localization more accurate.
The feature map of the input image is extracted with the convolutional layers of the VGG16 feature extraction network. For each position of the feature map, a convolution is performed through a 3 × 3 sliding window to obtain a multi-dimensional feature vector for that position, reflecting the depth features within the sliding window. This feature vector is fed to two sibling fully connected layers: a regression layer (reg layer) and a classification layer (cls layer). Anchors are centred on the 3 × 3 sliding window; each sliding window has nine anchors, whose sizes correspond to the combinations of three scales $[128^2, 256^2, 512^2]$ and three aspect ratios $[1{:}1, 1{:}2, 2{:}1]$. An anchor whose overlap with any ground truth box is higher than 0.7 is assigned a positive label (i.e. a positive anchor). In some rare cases no positive anchor is found this way; then the anchor with the highest overlap with a ground truth box is designated the positive anchor. A non-positive anchor whose overlap with all ground truth boxes is below 0.3 is designated a negative anchor. For each anchor, the probability that it belongs to the foreground (i.e. the information in the anchor is judged to be a target) or the background (i.e. non-target) is computed, together with the positional offset of the anchor relative to the ground truth. When the information in an anchor is identified as a target (i.e. the anchor belongs to the foreground), the anchor is retained as a proposal region and used for subsequent training. Before the Fast R-CNN detection network is trained further, the proposal regions generated by the RPN are first mapped onto the feature map, generating a series of ROIs (regions of interest) of random size on the feature map.
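For illustration, here is a minimal sketch of the nine-anchor grid described above (three areas × three aspect ratios on a stride-16 feature map); the stride and the box layout are standard Faster R-CNN conventions, assumed rather than quoted from the patent.

```python
import numpy as np

def make_anchors(feat_h, feat_w, stride=16,
                 scales=(128.0, 256.0, 512.0), ratios=(1.0, 0.5, 2.0)):
    """Nine anchors per feature-map cell: areas 128^2, 256^2, 512^2 crossed
    with aspect ratios 1:1, 1:2 and 2:1, centred on each cell."""
    base = []
    for s in scales:
        for r in ratios:
            w, h = s * np.sqrt(r), s / np.sqrt(r)   # area ~ s^2, aspect w/h = r
            base.append([-w / 2, -h / 2, w / 2, h / 2])
    base = np.asarray(base)                          # (9, 4) as x1, y1, x2, y2
    cx, cy = np.meshgrid(np.arange(feat_w) * stride, np.arange(feat_h) * stride)
    centres = np.stack([cx, cy, cx, cy], axis=-1).reshape(-1, 1, 4)
    return (centres + base).reshape(-1, 4)

anchors = make_anchors(35, 50)   # ~1750 positions x 9 anchors, matching N_cls below
print(anchors.shape)             # (15750, 4)
```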
D. Training of Fast R-CNN detection network:
The ROIs of random size are normalized to a fixed size by the pooling operation of the ROI pooling layer (see Fig. 4). Each normalized ROI is fed into two fully connected layers: one (the cls layer) performs the classification of the target in the ROI, while the other (the reg layer) adjusts the position of the ROI according to the ground truth through two translation and two scaling parameters, bringing the ROI closer to the true position of the target.
E. Loss function in training process
During training, the parameters of the neural network are adjusted through the loss function, which is defined as follows:

$$L(\{e_i\},\{t_i\}) = \frac{1}{N_{cls}}\sum_i L_{cls}(e_i, e_i^*) + \lambda\,\frac{1}{N_{reg}}\sum_i e_i^*\,L_{reg}(t_i, t_i^*) \quad (1)$$

where $i$ is the index of an anchor in the training process. The classification loss $L_{cls}(e_i, e_i^*)$ is a two-class log loss (whether the anchor belongs to the foreground or the background):

$$L_{cls}(e_i, e_i^*) = -\log\left[e_i^* e_i + (1 - e_i^*)(1 - e_i)\right] \quad (2)$$

where $e_i$ is the probability that the anchor is predicted as foreground and $e_i^*$ is the ground truth label: $e_i^* = 1$ if the anchor is a positive anchor and $e_i^* = 0$ if it is a negative anchor.

$L_{reg}(t_i, t_i^*)$ is the regression loss:

$$L_{reg}(t_i, t_i^*) = R(t_i - t_i^*) \quad (3)$$

For each anchor, the regression loss is multiplied by $e_i^*$, which means the loss is only counted when the anchor is judged to be foreground and ignored otherwise. Here $t_i = \{t_x, t_y, t_w, t_h\}$ is the vector of offsets predicted for the anchor, and $t_i^* = \{t_x^*, t_y^*, t_w^*, t_h^*\}$, a vector of the same dimension, is the actual offset of the anchor relative to the ground truth (see Fig. 4). As shown in equation (4), $R$ is a smooth loss function whose smoothed region is controlled by a parameter $\sigma$ (typically $\sigma = 1$; in the Faster R-CNN function $\sigma = 3$):

$$R(x) = \begin{cases} 0.5\,(\sigma x)^2, & |x| < 1/\sigma^2 \\ |x| - 0.5/\sigma^2, & \text{otherwise} \end{cases} \quad (4)$$

As shown in equation (1), the outputs of the cls and reg layers (see Fig. 4) consist of $\{e_i\}$ and $\{t_i\}$ respectively. $N_{cls}$ is the size of the feature map (about 1750, i.e. 50 × 35), and $N_{reg}$ is the batch size ($N_{reg} = 256$ in the RPN and $N_{reg} = 128$ in the Fast R-CNN detection network). Since $N_{cls}$ and $N_{reg}$ differ greatly, the parameter $\lambda$ is used to balance them so that the total loss $L(\{e_i\},\{t_i\})$ accounts for both losses uniformly.
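A minimal numerical sketch of equations (1)-(4), using numpy arrays for the anchor scores and offsets; the value λ = 10 comes from the original Faster R-CNN paper and is an assumption here, as are the example inputs.

```python
import numpy as np

def smooth_l1(x, sigma=3.0):                          # R(x), Eq. (4)
    absx = np.abs(x)
    return np.where(absx < 1.0 / sigma**2,
                    0.5 * (sigma * x) ** 2,
                    absx - 0.5 / sigma**2)

def total_loss(e, e_star, t, t_star, n_cls=1750, n_reg=256, lam=10.0):
    """Eq. (1): log loss over anchor scores plus smooth-L1 regression loss,
    the latter counted only for positive anchors (multiplied by e*)."""
    eps = 1e-12                                       # numerical safety only
    l_cls = -np.log(e_star * e + (1 - e_star) * (1 - e) + eps)   # Eq. (2)
    l_reg = smooth_l1(t - t_star).sum(axis=-1)                   # Eq. (3)
    return l_cls.sum() / n_cls + lam * (e_star * l_reg).sum() / n_reg

# two anchors: one positive (e* = 1), one negative (e* = 0)
e      = np.array([0.9, 0.2])
e_star = np.array([1.0, 0.0])
t      = np.array([[0.1, 0.0, 0.2, 0.1], [0.0, 0.0, 0.0, 0.0]])
t_star = np.zeros((2, 4))
print(total_loss(e, e_star, t, t_star))
```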
Using the manually marked rubber tree trunks as references, the difference between the predicted region information and the ground truth can be computed, and according to this difference the weights and offsets of the neural network are adjusted with the back propagation algorithm. Therefore, with sufficient training, Faster R-CNN can accurately detect the position and class of rubber tree trunks.
The convolutional neural network of this experiment is built on the TensorFlow deep learning framework. Experiments were performed on a PC with an Intel i7-8550U CPU, 16 GB RAM and an NVIDIA GTX 1070 GPU. During the experiment, the momentum method was used to train Faster R-CNN, with a weight decay of 0.0001 and a momentum of 0.9. The learning rate during training was 0.001 and the number of iterations was 70000. The training loss is shown in Fig. 6; the total training time was about 2 hours.
Although the overall loss curve is not entirely smooth, the overall downward trend is evident (Fig. 6). The decrease in loss occurs mainly in the first 100 iterations. The final loss is about 0.002, which means the error between the predicted results and the corresponding ground truth is small.
After the Faster Region-based Convolutional Neural Network (Faster R-CNN) model is trained, it needs to be tested. The testing process comprises four main steps: first, for the forest stand point cloud data of each test plot, the points are assigned to different voxels by the voxelization method; second, the depth images for testing are generated by multi-angle projection of the voxels; third, the test plot is tested with the trained Faster R-CNN network model to predict the positions of the rubber tree trunks in the images; finally, only the predictions with a confidence above 80% are retained. FIG. 7 shows the test recognition results for rubber tree trunks in the six cases.
(4.2) Detecting tree trunks in the depth images generated from the forest stands requiring single-tree segmentation with the trained Faster R-CNN network model:
The depth images of the forest stand requiring single-tree segmentation, obtained with the methods of steps 1 to 3 in sequence, are input into the trained Faster R-CNN network model, which outputs a predicted bounding box and a prediction confidence for each tree trunk position in each depth image; the predicted bounding boxes with a prediction confidence above 80% are retained.
Step 5: and obtaining the space three-dimensional point cloud of the corresponding trunk by back projection by utilizing the detected position information of the trunk in the depth image.
The specific steps are as follows: the x, y and z values of the rubber tree trunk's spatial position are calculated by combining the position and size information of the voxel block with the position of the predicted bounding box in the corresponding depth image. Using these x, y and z values, the trunk in the two-dimensional predicted bounding box is back-projected into the corresponding 3D space, yielding the three-dimensional point cloud of the corresponding rubber tree trunk.
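A minimal sketch of this back-projection, assuming the same block-local projection conventions as the voxelization sketch in step 3 (image size and axis mapping are assumptions): the 3D points of a voxel block whose projections fall inside a detected 2D box are returned as the trunk point cloud.

```python
import numpy as np

def bbox_to_points(points, voxel_origin, voxel_size, bbox, view="Y+",
                   img_w=256, img_h=256):
    """Return the 3D points of one voxel block whose 2D projections fall
    inside a predicted bounding box (x1, y1, x2, y2) in that block's image."""
    lo = np.asarray(voxel_origin)
    size = np.asarray(voxel_size)
    q = (points - lo) / size                       # block-local coordinates
    in_voxel = ((q >= 0.0) & (q < 1.0)).all(axis=1)
    u_axis = 0 if view == "Y+" else 1              # +Y view: image u follows x
    u = q[:, u_axis] * (img_w - 1)
    v = (1.0 - q[:, 2]) * (img_h - 1)
    x1, y1, x2, y2 = bbox
    inside = (u >= x1) & (u <= x2) & (v >= y1) & (v <= y2)
    return points[in_voxel & inside]               # trunk part point cloud
```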
Step 6: based on the trunk point cloud obtained in step 5, an octree-based region growing algorithm is used to segment the rubber tree skeleton point cloud; for the unclassified leaf point clouds, the segmented rubber tree skeleton point clouds are used as centres, and a clustering algorithm assigns the leaf point clouds to the corresponding skeleton point clouds.
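The following is a simplified sketch of this step under stated assumptions: a greedy radius-based region growing stands in for the octree-based variant, and leaf points are assigned to the nearest grown skeleton point (the search radius is an assumption).

```python
import numpy as np
from scipy.spatial import cKDTree

def region_grow(points, seed_mask, radius=0.2):
    """Grow a region from trunk seed points by repeatedly absorbing any point
    within `radius` of the current frontier (a stand-in for octree growing)."""
    tree = cKDTree(points)
    grown = seed_mask.copy()
    frontier = np.where(seed_mask)[0]
    while frontier.size:
        nbr_lists = tree.query_ball_point(points[frontier], r=radius)
        new = np.unique(np.concatenate([np.asarray(n, int) for n in nbr_lists]))
        new = new[~grown[new]]
        grown[new] = True
        frontier = new
    return grown                                    # skeleton-part mask

def cluster_leaves(skeleton_pts, skeleton_tree_ids, leaf_pts):
    """Assign each leaf point to the tree whose skeleton point is nearest."""
    _, idx = cKDTree(skeleton_pts).query(leaf_pts, k=1)
    return skeleton_tree_ids[idx]                   # per-leaf-point tree label
```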
In Fig. 8, (a), (b) and (c) show the single rubber tree trunk location results for the three test plots (i.e. Fig. 8(a) corresponds to test plot 1 selected from rubber forest stand 1, Fig. 8(b) to test plot 2 selected from rubber forest stand 2, and Fig. 8(c) to test plot 3 selected from rubber forest stand 3). Each rubber tree trunk is represented by light vertical lines, which serve as seed points for further segmenting the unclassified point cloud. The single rubber tree skeleton point segmentation results of the three test plots (test plot 1, test plot 2 and test plot 3) are shown in Fig. 9 (a), (b) and (c), respectively. In Fig. 10, (a), (b) and (c) show the leaf clustering results based on the skeleton point segmentation for the three plots (plot 1, plot 2 and plot 3), respectively.
For the three rubber forest test plots, this experiment evaluates the segmentation results at the individual rubber tree level. If a rubber tree is marked and classified as class A, it is a true positive (TP); if a rubber tree is marked as class A but not segmented (assigned to another class), it is a false negative (FN); if a rubber tree does not exist but is segmented, it is a false positive (FP). Higher TP, lower FN and lower FP are expected for higher accuracy. Further, the trunk detection rate r (recall), the correctness P (precision) of trunk detection, and the overall accuracy F (F score) of trunk detection are calculated with the following equations:
$$r = \frac{TP}{TP + FN}$$

$$P = \frac{TP}{TP + FP}$$

$$F = \frac{2 \times r \times P}{r + P}$$
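As a small worked example of these formulas (the counts below are illustrative, not taken from the paper):

```python
def detection_scores(tp, fn, fp):
    """Recall r, precision P and F score as defined above."""
    r = tp / (tp + fn)
    p = tp / (tp + fp)
    return r, p, 2 * r * p / (r + p)

# e.g. 100 correctly detected trunks, 0 missed, 2 spurious detections
print(detection_scores(tp=100, fn=0, fp=2))   # -> (1.0, ~0.98, ~0.99)
```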
As shown in Table 2, the segmentation results for all three plots are accurate, with r, P and F values greater than 0.98. For plot 1 (PR107), the values of r, P and F are 1, 0.98 and 0.99, respectively. For plot 2 (CATAS 7-20-59), the values are 0.99, 0.99 and 0.99. For plot 3 (CATAS 8-7-9), the values are 0.98, 0.99 and 0.98. The segmentation accuracies are almost identical, although the planted clones differ.
Table 2: evaluation of accuracy of rubber tree segmentation results of test sample sites in three rubber forest lands
Test plot 1 (clone PR107): r = 1.00, P = 0.98, F = 0.99
Test plot 2 (clone CATAS 7-20-59): r = 0.99, P = 0.99, F = 0.99
Test plot 3 (clone CATAS 8-7-9): r = 0.98, P = 0.99, F = 0.98
(Reconstructed from the values cited in the text; the full table appears only as an image in the source.)
To further locate the error conditions in the detection results, this embodiment divides the test set into six cases according to the number and state of the rubber tree trunks contained in the images. The numbers of images in these six cases are 228, 22, 48, 18, 14 and 29, respectively. This example analyses the accuracy of rubber tree trunk detection in these six cases, calculates the average precision P, average recall r and average F score for each case, and compares the results using the average F score. As shown in Table 3, the F values for cases a, c and d are as high as 100%, while for the other cases b, e and f the accuracy F reaches 90%, although there are some false positive and false negative errors.
Many factors can lead to errors in the detection results for the three cases (b, e and f) above. First, as can be seen from Table 3, relatively few training pictures correspond to these three cases, which may lead to underfitting. For case e, the projection angle can also cause multiple trunks to overlap heavily in the projected picture, making them difficult to distinguish. In case f, owing to typhoon damage, a trunk may lean severely, so that during voxelization the upper and lower parts of the trunk fall into different voxel blocks; during labelling, this embodiment marks only the lower half of the trunk in the corresponding picture, leaving the upper half unmarked. However, because of the similarity of shape characteristics, the model may erroneously detect the upper trunk part during detection.
Table 3: evaluation of accuracy of rubber tree segmentation under six different conditions
[Table 3 appears only as an image in the source; the text cites per-case image counts of 228, 22, 48, 18, 14 and 29, F scores of 100% for cases a, c and d, and about 90% for the remaining cases.]
Single-tree segmentation is an important prerequisite for retrieving forest attributes from various types of forest remote sensing data. Previous single-tree segmentation studies rely on individual tree detection from single-wavelength airborne laser scanning and focus on using the geometric spatial information of the point cloud. However, these methods have difficulty extracting clustered crowns with similar height and density distributions, because clustered trees do not satisfy the assumed geometric constraints. For example, a cluster of crowns with similar height and density profiles may be falsely detected as a single treetop; conversely, a local maximum that is not a treetop may be erroneously detected as one.
This embodiment proposes a deep learning based approach to improve single-tree segmentation. Since typhoons occur frequently in the study area, the rubber trees lean severely and the morphological structure of the crowns is indistinct. Single rubber tree segmentation based on canopy features is therefore difficult, and the accuracy of the segmentation results is easily disturbed. The Faster R-CNN method learns the characteristics of the target from a large number of data samples, so the single-tree segmentation method based on the Faster R-CNN model is more robust. As shown in Table 3, the overall accuracy of our method reaches 90% despite some false positive and false negative errors in the test results. Although the method of this embodiment cannot reach the 100% accuracy of purely manual detection, it can effectively reduce the labour cost of manual detection. In general, this work demonstrates the possibility of using deep learning to solve the single rubber tree segmentation problem for ground-based LiDAR data.
The scope of the present invention includes, but is not limited to, the above embodiments, and any alterations, modifications, and improvements made by those skilled in the art are intended to fall within the scope of the invention.

Claims (4)

1. A laser point cloud-oriented single-tree segmentation method based on Faster R-CNN, characterized by comprising the following steps:
step 1: acquiring forest stand point cloud data with ground-based mobile LiDAR;
step 2: computing the point cloud features of the scanned forest stand to separate the branches and leaves in the stand point cloud data;
step 3: performing adaptive voxelization on the woody point cloud data of the forest stand and projecting it from multiple angles to generate corresponding depth images;
step 4: detecting trunks in the generated depth images with a deep learning method;
step 5: back-projecting the detected trunk positions in the depth images to obtain the spatial three-dimensional point clouds of the corresponding trunks;
step 6: using the obtained trunk point clouds as seed points, in combination with a region growing algorithm, to achieve individual tree separation;
the step 3 comprises the following steps:
in the adaptive voxelization process, defining the length of a voxel block according to the planting row spacing of the forest, the width according to the plant spacing within a row, and the height according to the under-crown trunk height of the forest tree clone;
after the voxelization operation, assigning the forest stand point cloud data to the corresponding voxels and using a value V_i to mark each valid voxel;
for each voxel V_i, generating two corresponding depth images by projection from the positive Y direction and the positive X direction;
the step 4 comprises the following steps:
(4.1) modeling:
selecting a training plot, processing the forest stand in the training plot with the methods of steps 1 to 3 in sequence to obtain a number of depth images, marking the trunk positions of the trees in each obtained depth image, and taking the marked depth images as the training set;
selecting a test plot, processing the forest stand in the test plot with the methods of steps 1 to 3 in sequence to obtain a number of depth images, and taking the obtained depth images as the test set;
training the ability of the Faster R-CNN model to identify the trunk position of the tree in the depth image by using the training set to obtain a trained Faster R-CNN network model;
testing the test set by using a trained Faster R-CNN network model;
(4.2) detecting tree trunks of depth images generated by forest segments needing single-tree segmentation:
inputting the depth images, obtained with the methods of steps 1 to 3 in sequence, into the trained Faster R-CNN network model, outputting a predicted bounding box and a prediction confidence for each tree trunk position in each depth image, and retaining the predicted bounding boxes with a prediction confidence above 80%;
the step 5 comprises the following steps:
calculating the x, y and z values of the spatial trunk position by combining the position and size information of the voxel block with the position of the predicted bounding box in the depth image generated from that voxel block, and using these x, y and z values to back-project the trunk in the two-dimensional predicted bounding box into the corresponding 3D space, thereby obtaining the three-dimensional point cloud of the corresponding trunk part.
2. The laser point cloud-oriented single-tree segmentation method based on Faster R-CNN according to claim 1, wherein step 1 comprises:
measuring with a Velodyne HDL-32E scanner moved back and forth through the forest stand along a preset measuring route, and registering the measured point cloud data with a SLAM algorithm.
3. The laser point cloud-oriented single-tree segmentation method based on Faster R-CNN according to claim 2, wherein step 2 comprises:
acquiring for each laser point $p_i$ in the forest stand point cloud data the feature vector

$$F_i = \{e_{ix}, e_{iy}, e_{iz}, c_{0,i}, c_{1,i}, c_{2,i}, l_{0,i}, l_{1,i}, l_{2,i}\}$$

wherein $\{e_{ix}, e_{iy}, e_{iz}\}$ is the normal vector of point $p_i$, $\{c_{0,i}, c_{1,i}, c_{2,i}\}$ are the structure tensor features of $p_i$, and $\{l_{0,i}, l_{1,i}, l_{2,i}\}$ are the shape eigenvalues of $p_i$; and

separating the branches and leaves of the forest stand with a Gaussian classifier according to the feature vector $F_i$ obtained for each laser point $p_i$.
4. The laser point cloud-oriented single-tree segmentation method based on Faster R-CNN according to claim 1, wherein step 6 comprises:
based on the trunk point cloud of the tree obtained in step 5, using an octree-based region growing algorithm to segment the point cloud of the tree skeleton, wherein the tree skeleton is the remaining structure of the tree with the leaves removed;
for the unclassified leaf point clouds, using the segmented tree skeleton point clouds as centres and clustering the leaf points into the point cloud of the corresponding tree skeleton through a clustering algorithm.
CN201910551190.9A 2019-06-24 2019-06-24 Single wood segmentation method for laser point cloud based on Faster R-CNN Active CN110378909B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910551190.9A CN110378909B (en) 2019-06-24 2019-06-24 Single wood segmentation method for laser point cloud based on Faster R-CNN

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910551190.9A CN110378909B (en) 2019-06-24 2019-06-24 Single wood segmentation method for laser point cloud based on Faster R-CNN

Publications (2)

Publication Number Publication Date
CN110378909A CN110378909A (en) 2019-10-25
CN110378909B (en) 2023-06-06

Family

ID=68250686

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910551190.9A Active CN110378909B (en) 2019-06-24 2019-06-24 Single wood segmentation method for laser point cloud based on Faster R-CNN

Country Status (1)

Country Link
CN (1) CN110378909B (en)

Families Citing this family (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110929722A (en) * 2019-11-04 2020-03-27 浙江农林大学 Tree detection method based on whole tree image
CN110743818A (en) * 2019-11-29 2020-02-04 苏州嘉诺环境工程有限公司 Garbage sorting system and garbage sorting method based on vision and deep learning
CN111275724B (en) * 2020-02-26 2022-06-07 武汉大学 Airborne point cloud roof plane segmentation method based on octree and boundary optimization
CN111696122A (en) * 2020-06-12 2020-09-22 北京数字绿土科技有限公司 Crop phenotype parameter extraction method and device
CN111738151B (en) * 2020-06-22 2023-10-10 佛山科学技术学院 Grape fruit stem accurate identification method based on deep learning post-optimization
CN111814666B (en) * 2020-07-07 2021-09-24 华中农业大学 Single tree parameter extraction method, system, medium and equipment under complex forest stand
CN111898688B (en) * 2020-08-04 2023-12-05 沈阳建筑大学 Airborne LiDAR data tree classification method based on three-dimensional deep learning
CN112561985B (en) * 2020-10-27 2021-07-20 广西大学 Hedgerow nursery stock trimming and centering method based on binocular vision
CN112101488B (en) * 2020-11-18 2021-06-25 北京沃东天骏信息技术有限公司 Training method and device for machine learning model and storage medium
CN112991300B (en) * 2021-03-04 2023-09-26 中国林业科学研究院资源信息研究所 Single wood skeleton extraction and visualization method based on neighborhood characteristics
CN113205543A (en) * 2021-05-27 2021-08-03 南京林业大学 Laser radar point cloud trunk extraction method based on machine learning
CN113642475B (en) * 2021-08-17 2023-04-25 中国气象局上海台风研究所(上海市气象科学研究所) Atlantic hurricane strength estimation method based on convolutional neural network model
CN113935428A (en) * 2021-10-25 2022-01-14 山东大学 Three-dimensional point cloud clustering identification method and system based on image identification
CN114494586B (en) * 2022-01-10 2024-03-19 南京林业大学 Lattice projection deep learning network broadleaf branch and leaf separation and skeleton reconstruction method
CN114862872B (en) * 2022-05-10 2024-05-07 桂林理工大学 Mangrove single wood segmentation method based on Faster R-CNN
CN116188489A (en) * 2023-02-01 2023-05-30 中国科学院植物研究所 Wheat head point cloud segmentation method and system based on deep learning and geometric correction
CN116486261B (en) * 2023-04-14 2024-09-17 中山大学 Method and system for separating tree point cloud wood components from blade components
CN116893428B (en) * 2023-09-11 2023-12-08 山东省地质测绘院 Forest resource investigation and monitoring method and system based on laser point cloud
CN117710601B (en) * 2023-12-27 2024-05-24 南京林业大学 Single wood skeleton extraction method and system based on laser point cloud and image information

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107705309A (en) * 2017-10-15 2018-02-16 南京林业大学 Forest parameter evaluation method in laser point cloud

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107705309A (en) * 2017-10-15 2018-02-16 南京林业大学 Forest parameter evaluation method in laser point cloud

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Apple tree trunk and branch segmentation for automatic trellis training using convolutional neural network based semantic segmentation; Yaqoob Majeed et al.; IFAC-PapersOnLine; 15 July 2018; Vol. 51, No. 17; Section 2 *

Also Published As

Publication number Publication date
CN110378909A (en) 2019-10-25


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant