CN113989535A

CN113989535A - Point cloud classification method combining region growing and random forest

Info

Publication number: CN113989535A
Application number: CN202111239501.1A
Authority: CN
Inventors: 王竞雪; 宿颖; 刘肃艳
Original assignee: Liaoning Technical University
Current assignee: Liaoning Technical University
Priority date: 2021-10-25
Filing date: 2021-10-25
Publication date: 2022-01-28

Abstract

The invention provides a point cloud classification method combining region growing and random forests. A region growing algorithm segments the airborne LiDAR point cloud data of a training set into a plurality of segmented patches; features of the segmented patches are extracted; an optimal feature combination is determined based on feature importance and the out-of-bag errors of different feature combinations, realizing feature selection for a random forest classifier; a random forest classifier is trained with the optimal feature combination and used to classify the test data set; and topology optimization is performed on the patches in the classification result to obtain the final point cloud classification result. Compared with the prior art, the invention realizes a segmentation-object-oriented random forest classification algorithm: taking the segmented patch as the primitive makes ground-object features easy to express and extract, and combining it with a random forest classifier classifies the point cloud accurately and provides basic data support for subsequent point cloud data statistics.

Description

Point cloud classification method combining region growing and random forest

Technical Field

The invention relates to the technical field of remote sensing data processing, in particular to a point cloud classification method combining region growing and random forests.

Background

Airborne laser radar (LiDAR) directly obtains high-precision, high-density three-dimensional point coordinates (LiDAR point cloud for short) and is widely applied in three-dimensional reconstruction, city planning and management, vehicle navigation, disaster emergency response and evaluation, and the like, in all of which point cloud classification plays a key role. Point cloud classification mainly distinguishes ground points, building points and vegetation points; existing LiDAR point cloud classification methods are mainly divided into unsupervised and supervised point cloud classification methods.

The unsupervised point cloud classification method constructs relations among three-dimensional points by calculating various features of the point cloud; in effect it is a point cloud clustering process based on feature similarity. One work (A point cloud data classification method based on height difference [J]. Mapping and Reporting, 2018(06): 46-49) proposes a point cloud classification method based on the second derivative of the height difference and effectively separates buildings from vegetation. ZHANG W M et al. (ZHANG W M, QI J B, XIE D H. An easy-to-use airborne LiDAR data filtering method based on cloth simulation [J]. Remote Sensing, 2016, 8(6): 501.) propose a cloth simulation filtering algorithm, which separates ground points from non-ground points by using the change before and after cloth turning. Unsupervised point cloud classification algorithms generally extract only certain specific categories of ground objects, so their application scenarios are quite limited and they lack universality.

The supervised point cloud classification method mainly comprises artificial neural networks, support vector machines, random forests, decision trees and the like. These methods first select manually labeled training samples to train a classifier, and then use the classifier to classify the point cloud of the test samples. One work proposes classifying the point cloud with an information vector machine instead of a support vector machine, which solves the problem of weak model sparsity when a support vector machine is used for point cloud classification. Schlemia beans et al. (Schlemia beans, ChengYing bud, Sao Xiao Song, Qin Xian Xiong, Wen Peu. Point cloud classification algorithm integrating cloth filtering and improved random forest [J]. Laser & Optoelectronics Progress, 2020, 57(22): 192 & 200.) improve the traditional random forest algorithm and propose a point cloud classification algorithm that integrates cloth filtering with a weighted weakly correlated random forest model. Hodelog (Hodelog. Point cloud single point classification method based on a curvature-considered adaptive neighborhood [J]. Modern Manufacturing Technology and Equipment, 2021, 57(07): 119-122.) proposes a curvature-considered adaptive neighborhood point cloud classification method, which can generate an ideal three-dimensional point cloud neighborhood and enhance the separability of point cloud features. The above point cloud classification methods calculate point cloud features with a single laser foot point as the minimum classification unit and train the classifier with all features; they do not consider the influence of feature calculation and selection on the classification performance of the classifier, and the excessively high feature dimensionality reduces the operating efficiency of the algorithm.

Disclosure of Invention

Based on the technical problem, the invention provides a point cloud classification method combining region growing and random forests, which comprises the following steps:

step 1: utilizing a region growing algorithm to carry out segmentation processing on LiDAR point cloud of a training set;

step 2: extracting the characteristics of the patches obtained by fitting each divided unit;

step 3: determining an optimal feature combination based on feature importance and out-of-bag errors of different feature combinations, and realizing feature selection of a random forest classifier;

step 4: training a random forest classifier by adopting the optimal feature combination, and classifying a test set by using the trained classifier;

step 5: carrying out topology optimization on the classification result to obtain a final point cloud classification result.

The step 1 comprises the following steps:

step 1.1: carrying out normal vector and curvature estimation on the point cloud of the LiDAR in the training set point by adopting a random sampling consistency method and a principal component analysis method;

step 1.2: selecting a point with the minimum curvature as an initial seed point;

step 1.3: searching k adjacent points of the seed points by adopting a KD tree, and performing region growth by taking two characteristic similarities of a vertical distance and a normal vector included angle as growth conditions;

step 1.4: until no new adjacent point appears, the region growth is finished, and the seed point clustering result point set is separated from the original point cloud and stored as an independent unit;

step 1.5: repeating the step 1.2 to the step 1.4 until all the point clouds are segmented to obtain a plurality of segmentation units;

step 1.6: and performing surface patch fitting on each unit, calculating the elevation and normal vector characteristics of the segmented surface patches, and performing optimization integration on adjacent surface patches to obtain the final point cloud segmentation result.
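
Steps 1.2 to 1.5 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `region_grow` and its parameters are hypothetical, a brute-force neighbour search stands in for the KD tree, the "vertical distance" is interpreted as the perpendicular distance from a candidate point to the tangent plane of the current seed, and the default thresholds are placeholders.

```python
import numpy as np

def region_grow(points, normals, curvature, k=8, dist_thresh=0.5, angle_thresh_deg=10.0):
    """Return one segment id per point (hypothetical helper, steps 1.2 to 1.5)."""
    n = len(points)
    labels = np.full(n, -1)
    cos_thresh = np.cos(np.radians(angle_thresh_deg))
    # ASSUMPTION: brute-force k nearest neighbours instead of a KD tree (O(n^2) memory).
    d2 = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    knn = np.argsort(d2, axis=1)[:, 1:k + 1]
    segment = 0
    while (labels == -1).any():
        unlabeled = np.flatnonzero(labels == -1)
        # Step 1.2: the unlabeled point with minimum curvature is the initial seed.
        seed = unlabeled[np.argmin(curvature[unlabeled])]
        labels[seed] = segment
        stack = [seed]
        # Steps 1.3 and 1.4: grow until no new adjacent point qualifies.
        while stack:
            s = stack.pop()
            for j in knn[s]:
                if labels[j] != -1:
                    continue
                # ASSUMPTION: perpendicular distance to the seed's tangent plane.
                perp = abs(np.dot(points[j] - points[s], normals[s]))
                # Normals are unsigned, so compare the absolute cosine.
                cos_angle = abs(np.dot(normals[j], normals[s]))
                if perp < dist_thresh and cos_angle >= cos_thresh:
                    labels[j] = segment
                    stack.append(j)
        segment += 1  # Step 1.5: start the next segmentation unit.
    return labels
```

For real airborne data the pairwise distance matrix would be replaced by a spatial index; the growth logic itself would stay the same.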

The step 1.1 comprises the following steps:

step 1.1.1: selecting k adjacent points of the current point based on a KD tree principle;

step 1.1.2: randomly selecting 3 points from the k adjacent points to establish an initial fitting plane and obtain a plane fitting equation, and calculating the distances from the other adjacent points to the fitting plane;

step 1.1.3: estimating the distance threshold T_d from the standard deviation of the point-to-plane distances, marking the adjacent points whose distance to the fitting plane is less than T_d as interior points, and counting the number of interior points conforming to the plane model;

step 1.1.4: repeating the step 1.1.2-step 1.1.3 for N times to obtain N plane equations, and selecting a fitting plane containing the largest number of interior points as a best fitting plane model of the points;

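step 1.1.5: performing principal component analysis on the interior point data contained in the best fitting plane model to obtain the covariance matrix C, expressed as:

$$C = \begin{bmatrix} \operatorname{cov}(X,X) & \operatorname{cov}(X,Y) & \operatorname{cov}(X,Z) \\ \operatorname{cov}(Y,X) & \operatorname{cov}(Y,Y) & \operatorname{cov}(Y,Z) \\ \operatorname{cov}(Z,X) & \operatorname{cov}(Z,Y) & \operatorname{cov}(Z,Z) \end{bmatrix}$$

wherein X, Y, Z respectively represent the one-dimensional vectors of the X coordinates, Y coordinates and Z coordinates of all interior points obtained by applying the random sampling consistency method to the k neighborhood points of the current point, and cov(·,·) represents the covariance of two components;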

step 1.1.6: and calculating the eigenvalue and the eigenvector according to the covariance matrix C, wherein the eigenvector corresponding to the minimum eigenvalue is the normal vector of the point, and the ratio of the minimum eigenvalue to the sum of all eigenvalues is defined as the curvature of the point.
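
Steps 1.1.2 to 1.1.6 can be sketched with NumPy as below. The function name `estimate_normal_curvature` is hypothetical, and the text does not say exactly how T_d is derived from the standard deviation; the sketch assumes T_d equals one standard deviation of the point-to-plane distances and uses `<=` so that perfectly planar neighbourhoods still yield interior points.

```python
import numpy as np

def estimate_normal_curvature(neigh, n_iter=100, seed=0):
    """neigh: (k, 3) array holding the k adjacent points of the current point."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_iter):
        # Step 1.1.2: plane through 3 randomly chosen adjacent points.
        p0, p1, p2 = neigh[rng.choice(len(neigh), size=3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        length = np.linalg.norm(n)
        if length < 1e-12:  # collinear sample, no plane defined
            continue
        dist = np.abs((neigh - p0) @ (n / length))
        # Step 1.1.3, ASSUMPTION: T_d is one standard deviation of the distances.
        inliers = dist <= dist.std()
        # Step 1.1.4: keep the plane containing the most interior points.
        if best is None or inliers.sum() > best.sum():
            best = inliers
    if best is None or best.sum() < 3:
        raise ValueError("no valid plane found")
    # Step 1.1.5: covariance matrix of the interior points (rows = X, Y, Z).
    C = np.cov(neigh[best].T)
    # Step 1.1.6: eigh returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(C)
    normal = eigvecs[:, 0]
    curvature = eigvals[0] / eigvals.sum()
    return normal, curvature
```

The returned normal has an arbitrary sign, which is why the region growing step compares included angles rather than raw vectors.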

The step 3 comprises the following steps:

step 3.1: training a random forest classifier by using the features extracted in the step 2;

step 3.2: testing the classification precision of the random forest classifier according to the data outside the bag, and simultaneously obtaining the importance index of each characteristic variable;

step 3.3: arranging the characteristic variables from high to low according to the importance indexes, and deleting the characteristic with the minimum importance index to form a group of new characteristic combinations;

step 3.4: repeating the step 3.1 to the step 3.3 until the number of the residual characteristic variables is equal to a given threshold value, and ending the iteration;

step 3.5: and selecting the feature combination with the minimum random forest out-of-bag error as the optimal feature combination.
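
The elimination loop of steps 3.1 to 3.5 can be sketched as follows. The names `select_features`, `fit_and_score` and `min_features` are hypothetical: `fit_and_score` stands for whatever routine trains a random forest on the given features and returns its out-of-bag error together with a per-feature importance mapping, and `min_features` plays the role of the given threshold in step 3.4.

```python
def select_features(features, fit_and_score, min_features=1):
    """Backward elimination guided by importance; returns (best_features, best_oob_error)."""
    current = list(features)
    best_err, best_set = float("inf"), list(current)
    while True:
        # Steps 3.1 and 3.2: train, then read the OOB error and importances.
        oob_err, importance = fit_and_score(current)
        # Step 3.5: remember the combination with the smallest OOB error.
        if oob_err < best_err:
            best_err, best_set = oob_err, list(current)
        # Step 3.4: stop once the threshold number of features is reached.
        if len(current) <= min_features:
            break
        # Step 3.3: sort from high to low importance and drop the last feature.
        ranked = sorted(current, key=lambda f: importance[f], reverse=True)
        current = ranked[:-1]
    return best_set, best_err
```

Because every intermediate combination is scored, the selected set need not be the smallest one; it is simply the one whose out-of-bag error was lowest along the elimination path.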

The step 4 comprises the following steps:

step 4.1: training a random forest classifier by using the optimal feature combination;

step 4.2: segmenting the test set data by using a region growing algorithm, and extracting the characteristics of a surface patch obtained by fitting each segmented unit;

step 4.3: inputting the result obtained in step 4.2 into the trained random forest classifier; for each segmentation object, each decision tree in the random forest gives a test result; the test results of all decision trees are counted, and the class with the most votes is taken as the final classification result.
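
The voting in step 4.3 amounts to a majority vote over the per-tree predictions for each segmented patch. A minimal sketch, with hypothetical function names and class labels:

```python
from collections import Counter

def majority_vote(tree_predictions):
    """Return the class predicted by the largest number of decision trees."""
    # On a tie, Counter.most_common keeps the class that was counted first.
    return Counter(tree_predictions).most_common(1)[0][0]

def classify_patches(votes_per_patch):
    """votes_per_patch: one list of per-tree predictions for each segmented patch."""
    return [majority_vote(votes) for votes in votes_per_patch]
```

Every point of a patch then inherits the class assigned to that patch.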

The step 5 comprises the following steps:

step 5.1: for each segmented patch whose number of points is less than a given threshold T_n, searching for its adjacent patches and judging whether the attributes of the patch and of its adjacent patches are the same; if the attributes of the patch differ from those of its adjacent patches, defining the patch as an island patch;

step 5.2: calculating the three-dimensional distance between the island patch and its adjacent patches; if the distance is less than a given distance threshold T_D, merging the island patch into the adjacent segmented patch with the larger area and re-classifying it; otherwise, keeping the original classification result of the segmented patch.
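
A sketch of steps 5.1 and 5.2 follows. The data layout (`patches` as a dictionary of per-patch records, `adjacency` as a dictionary of neighbour ids) and the function name are hypothetical. Three points are assumptions rather than statements of the method: a patch counts as an island only when its class differs from all adjacent patches, the three-dimensional distance is measured between patch centroids, and re-classification means adopting the class of the absorbing patch.

```python
import math

def optimize_topology(patches, adjacency, t_n=50, t_d=5.0):
    """Return a new {patch_id: label} mapping after island-patch merging."""
    new_labels = {pid: p["label"] for pid, p in patches.items()}
    for pid, p in patches.items():
        neighbours = adjacency.get(pid, [])
        # Step 5.1: only small patches that have neighbours are candidates.
        if p["n_points"] >= t_n or not neighbours:
            continue
        # ASSUMPTION: an island differs in class from every adjacent patch.
        if any(patches[q]["label"] == p["label"] for q in neighbours):
            continue
        # Step 5.2, ASSUMPTION: centroid-to-centroid 3D distance.
        near = [q for q in neighbours
                if math.dist(p["centroid"], patches[q]["centroid"]) < t_d]
        if near:
            # Merge into the nearby adjacent patch with the largest area.
            target = max(near, key=lambda q: patches[q]["area"])
            new_labels[pid] = patches[target]["label"]
    return new_labels
```

Labels are read from the original records throughout, so the outcome does not depend on the order in which patches are visited.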

The step 3.1 comprises the following steps:

step 3.1.1: given the number N_t of decision trees contained in the random forest, randomly selecting w characteristic variables from all the features as the split nodes of each decision tree, and generating the decision tree by continuously splitting the nodes;

step 3.1.2: assuming that a training set is segmented to obtain M segmented patches, taking the M segmented patches as M sample data, wherein each sample data contains l-dimensional features;

step 3.1.3: extracting h samples from the M samples by adopting a random sampling method to serve as a training sample set constructed by a single decision tree, wherein the samples which are not extracted are regarded as corresponding data outside the bag;

step 3.1.4: repeating step 3.1.3 to select N_t training sample sets, each used to train one of the N_t decision trees; the N_t decision trees thus generated form a random forest classifier.
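
The sampling in steps 3.1.3 and 3.1.4 can be illustrated as below. The function names are hypothetical, and the sketch assumes the random sampling is done with replacement, as in the usual bootstrap, so that the samples never drawn form the out-of-bag data of that tree.

```python
import random

def bootstrap_split(m, h, rng):
    """Draw h sample indices out of m with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(m) for _ in range(h)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(m) if i not in drawn]
    return in_bag, out_of_bag

def make_training_sets(m, h, n_trees, seed=0):
    """Step 3.1.4: one (in_bag, out_of_bag) pair per decision tree."""
    rng = random.Random(seed)
    return [bootstrap_split(m, h, rng) for _ in range(n_trees)]
```

Each pair would feed one decision tree: the in-bag indices for training, the out-of-bag indices for the error estimate used in step 3.2.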

The step 3.2 comprises:

step 3.2.1: testing a single decision tree with its corresponding out-of-bag sample data and calculating the out-of-bag error of the decision tree, recorded as err₁; then randomly adding noise interference to the characteristic variable l_x in the out-of-bag data and calculating the out-of-bag error again, recorded as err₂; the importance of the characteristic variable l_x in this decision tree is V = |err₁ - err₂|;

step 3.2.2: assuming that the characteristic variable l_x is present in r decision trees, the total importance of l_x is the average of the sum of its importance over all these decision trees, which gives the importance index of the characteristic variable l_x:

$$\bar{V}(l_x) = \frac{1}{r}\sum_{j=1}^{r} V_j$$

wherein V_j is the importance of l_x in the j-th of these decision trees.
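
The importance measure of steps 3.2.1 and 3.2.2 can be sketched as follows. All names are hypothetical; `predict` stands for a single trained decision tree, and the "noise interference" is realised here as a random permutation of the feature column, which is one common choice and an assumption about the method.

```python
import random

def oob_error(predict, X, y):
    """Fraction of out-of-bag samples that the tree misclassifies."""
    wrong = sum(predict(row) != label for row, label in zip(X, y))
    return wrong / len(y)

def tree_importance(predict, X_oob, y_oob, feature, rng):
    """V = |err1 - err2| for one feature in one decision tree (step 3.2.1)."""
    err1 = oob_error(predict, X_oob, y_oob)
    column = [row[feature] for row in X_oob]
    rng.shuffle(column)  # ASSUMPTION: noise = random permutation of the column
    X_noisy = [row[:feature] + [v] + row[feature + 1:]
               for row, v in zip(X_oob, column)]
    err2 = oob_error(predict, X_noisy, y_oob)
    return abs(err1 - err2)

def feature_importance(per_tree_values):
    """Average of the per-tree importances over the r trees (step 3.2.2)."""
    return sum(per_tree_values) / len(per_tree_values)
```

A feature that a tree never uses leaves the out-of-bag error unchanged when perturbed, so its importance in that tree is zero.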

The invention has the beneficial effects that:

the invention provides a point cloud classification method combining region growing and random forests, which has the following beneficial effects:

(1) the invention improves the traditional point-based random forest point cloud classification method, and the original point cloud data is segmented, and the segmented surface patches are used as the minimum units to calculate the point cloud characteristics, so that the characteristics have more accurate semantic information.

(2) According to the invention, after patch feature calculation, feature importance measurement based on random forests is introduced, and an optimal feature combination is selected from the feature importance measurement to train a random forest classifier, so that the classification precision of the classifier is effectively improved.

Drawings

FIG. 1 is a flow chart of a point cloud classification method combining region growing and random forests in an embodiment of the present invention;

FIG. 2 is a flow chart of a point cloud segmentation process in accordance with an embodiment of the present invention;

FIG. 3 is a schematic diagram of a random forest classifier constructed according to an embodiment of the present invention;

FIG. 4 is a plot of an experimental region of a training data set in accordance with an embodiment of the present invention;

FIG. 5 is a plot of an experimental region of a test data set in accordance with an embodiment of the present invention;

FIG. 6 is a graph of experimental results of a test data set in accordance with an embodiment of the present invention.

Detailed Description

The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. Aiming at the problems in the prior art, the invention provides a point cloud classification method combining region growing and random forests. For two groups of acquired LiDAR point clouds, one group is used as a training set, and the other group is used as a testing set; LiDAR point cloud data with existing standard classification results are used as a training set, a random forest classifier is trained by using the LiDAR point cloud data, and the LiDAR point cloud to be classified (namely a test set) is classified by using the trained random forest classifier. The feature determines the effectiveness and accuracy of machine learning and directly influences the classification precision, so that the invention strengthens the calculation and selection of the feature through the following two aspects: on one hand, the original point cloud data is divided into a plurality of divided surface patches, and the divided surface patches are used as minimum units to calculate various features, so that the features have more accurate semantic information; on the other hand, random forests are used for feature selection, and classifiers are retrained through optimal feature combinations for testing point cloud classification, so that the point cloud classification precision is improved.

A point cloud classification method combining region growing and random forests is shown in FIG. 1 and comprises the following steps:

step 1: segmenting the LiDAR point cloud of the training set by using a region growing algorithm, wherein the specific principle is as shown in figure 2;

step 1.1: carrying out normal vector and curvature estimation on LiDAR point cloud data of a training set point by adopting a random sampling consistency method (RANSAC) and a Principal Component Analysis (PCA);

step 1.1.1: selecting k adjacent points of the current point based on a KD tree (K-dimensional index tree data structure), where k = 20 in this example;

step 1.1.4: repeating the step 1.1.2 to the step 1.1.3 for N times to obtain N plane equations, and selecting a fitting plane containing the largest number of interior points as a best fitting plane model of the points, wherein in the example, N is 100;

step 1.1.6: calculating eigenvalues and eigenvectors according to the covariance matrix C, wherein the eigenvector corresponding to the minimum eigenvalue is the normal vector of the point, and the ratio of the minimum eigenvalue to the sum of all eigenvalues is defined as the curvature of the point;

step 1.3: searching k adjacent points of the seed points by adopting a KD tree, and performing region growth by taking two characteristic similarities of a vertical distance and a normal vector included angle as growth conditions, wherein the vertical distance threshold and the normal vector included angle threshold are respectively 0.5 and 10 in the example;

step 1.5: repeating the step 1.2-step 1.4 until all the point clouds are segmented to obtain a plurality of segmentation units;

step 1.6: performing patch fitting on each unit, calculating the elevation and normal vector features of the segmented patches, and performing optimization integration on adjacent patches to obtain the final point cloud segmentation result; this is realized as a region growing process over patches: the patch with the smallest curvature in the patch set is selected in turn as the seed patch, and the height difference and the normal vector included angle between the seed patch and the patch to be grown serve as the similarity judgment conditions during region growing; in this example, the height difference threshold between two patches is 0.5, and the normal vector included angle between two patches is not more than 30.

Step 2: extracting the characteristics of the patches obtained by fitting each divided unit; the features include elevation-related features, geometric-related features, eigenvalue and eigenvector-related features, echo and reflection intensity features, and others. Respectively extracting the following characteristics of each divided surface patch, specifically:

(1) the elevation related characteristics of the divided surface patches comprise three types, namely normalized average elevation of the divided surface patches, elevation difference of the divided surface patches and elevation variance of the divided surface patches:

average elevation H_a of the segmented patch: the average of the elevations of all points contained in the segmented patch, as shown below:

$$H_a = \frac{1}{N'}\sum_{i=1}^{N'} Z_i$$

wherein N' is the number of points contained in the segmented patch, and Z_i is the normalized elevation value of the i-th point in the segmented patch;

the elevation difference of the segmentation surface patches refers to the difference between the maximum elevation value and the minimum elevation value of the point cloud in the segmentation surface patches;

elevation variance H_v of the segmented patch: the variance of the elevation values of all points in the segmented patch;

(2) the geometric correlation characteristics of the divided patches are five types:

plane fitting index S_n: the value is the average value of the distances from all points in the patch to the fitting plane;

surface roughness S_r: the value is the curvature average value of all laser foot points in the surface patch;

area S_m: the value is the area of the divided patch projected on the two-dimensional XOY plane;

rectangular degree S_j: the value is the ratio of the area of a polygon projected to a two-dimensional XOY plane by the divided surface patches to the area of the minimum circumscribed rectangle of the polygon;

narrow length S_x: the value is the ratio of the short side to the long side in the minimum circumscribed rectangle of the polygon projected to the two-dimensional XOY plane by the segmentation surface patch;

(3) the eigenvalue and eigenvector related features comprise the point feature λ_p = λ₃/λ₂, the linear feature λ_l = (λ₁ - λ₂)/λ₁ and the planar feature λ_s = (λ₂ - λ₃)/λ₁ of the segmented patch, wherein λ₁, λ₂, λ₃ are the three eigenvalues obtained by solving the covariance matrix of the point cloud data, whose magnitudes satisfy λ₁ > λ₂ > λ₃;

(4) the echo and reflection intensity features comprise the multi-echo ratio, average intensity and intensity variance of the segmented patches, specifically:

the invention calculates the echo feature of a segmented patch as a multi-echo ratio: assuming that the number of points with multiple echoes in a segmented patch is N_m and the number of points contained in the segmented patch is N', the multi-echo ratio N_I is calculated as N_I = N_m/N';

the invention uses two reflection intensity related features, the intensity mean I_a and the intensity variance I_v; with I_i denoting the intensity value of the i-th laser foot point in the segmented patch, I_a is the mean and I_v is the variance of the values I_i over the N' points of the segmented patch;

(5) the other features are the number N' of point clouds included in the segmented patch.
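
Several of the patch features listed above can be computed with NumPy as in the sketch below. The function name, argument names and dictionary keys are hypothetical. The sketch assumes population variance (division by N') for H_v and I_v, treats a point as multi-echo when its return count exceeds 1, and omits the geometric features of item (2), which require a fitted plane and a minimum bounding rectangle.

```python
import numpy as np

def patch_features(xyz, z_norm, intensity, n_returns):
    """xyz: (N', 3) coordinates; z_norm: normalized elevations; others: per-point arrays."""
    n = len(xyz)
    # Eigenvalues of the 3x3 covariance matrix, sorted so l1 >= l2 >= l3.
    l1, l2, l3 = np.sort(np.linalg.eigvalsh(np.cov(xyz.T)))[::-1]
    z = xyz[:, 2]
    return {
        "H_a": z_norm.mean(),            # normalized average elevation
        "H_d": z.max() - z.min(),        # elevation difference
        "H_v": z.var(),                  # ASSUMPTION: population variance
        "lambda_p": l3 / l2,             # point feature
        "lambda_l": (l1 - l2) / l1,      # linear feature
        "lambda_s": (l2 - l3) / l1,      # planar feature
        "N_I": np.count_nonzero(n_returns > 1) / n,  # multi-echo ratio N_m / N'
        "I_a": intensity.mean(),         # intensity mean
        "I_v": intensity.var(),          # ASSUMPTION: population variance
        "N": n,                          # number of points in the patch
    }
```

For a flat, square patch the planar feature approaches 1 while the point and linear features approach 0, which is what makes these ratios useful for separating roofs and ground from vegetation.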

Step 3: determining an optimal feature combination based on feature importance and the out-of-bag errors of different feature combinations, realizing feature selection for the random forest classifier. Random-forest-based feature selection is performed on the segmented patch features calculated in step 2; the construction principle of the random forest classifier is shown in FIG. 3, and the specific method is as follows:

step 3.1.1: given the number N_t of decision trees contained in the random forest, randomly selecting w characteristic variables from all characteristic variables as the split nodes of each decision tree, and generating the decision trees through continuous splitting of the nodes; in this example the random forest contains 200 decision trees, namely N_t = 200;

step 3.1.2: assuming that the training set is segmented into M segmented patches, taking the M segmented patches as M sample data, each containing l-dimensional features; 15 features in total are calculated in this example, so l = 15;

step 3.1.3: extracting h samples from the M samples by adopting a random sampling (Bootstrap) method to serve as a training sample set constructed by a single decision tree, wherein the samples which are not extracted are regarded as corresponding data outside the bag;

step 3.1.4: repeating step 3.1.3 to select N_t training sample sets, each used to train one of the N_t decision trees; the N_t decision trees thus generated form a random forest classifier;

step 3.2.2: assuming that the characteristic variable l_x is present in r decision trees, the total importance of l_x is the average of the sum of its importance over all these decision trees, which gives the importance index of the characteristic variable l_x;

step 3.5: selecting the feature combination with the minimum random forest out-of-bag error as the optimal feature combination; in this example the out-of-bag error is smallest when the feature combination consists of the normalized average elevation, elevation difference, average curvature, elevation variance, intensity variance, linear feature of the segmented patch, planar feature of the segmented patch, plane fitting index, and number of points contained in the segmented patch, so this combination is used as the optimal feature combination of the invention.

Step 4: training a random forest classifier with the optimal feature combination and classifying the test data set with the trained classifier, specifically as follows:

step 4.1: training a random forest classifier by using the optimal feature combination (the training method is the same as the step 3.1.1 to the step 3.1.4);

step 4.2: dividing the test set data by using a region growing algorithm, and extracting features of a patch obtained by fitting each divided unit, wherein the extracted features comprise elevation related features, geometric related features, eigenvalue and eigenvector related features, echo and reflection intensity features and other features (the dividing method is the same as the step 1.1-the step 1.6);

Step 5: performing topology optimization on the classification result to obtain the final point cloud classification result, specifically as follows:

step 5.1: for each segmented patch whose number of points is less than a given threshold T_n, searching for its adjacent patches and judging whether the attributes of the patch and of its adjacent patches are the same; if they differ, defining the patch as an 'island patch'; in this example, segmented patches containing fewer than 50 points are taken as the patches to be processed and a neighborhood search is performed;

step 5.2: calculating the three-dimensional distance between the 'island patch' and its adjacent patches; if it is less than a given distance threshold T_D, merging the island patch into the adjacent segmented patch with the larger area and re-classifying it; otherwise, retaining the original classification result of the segmented patch; in this example, T_D is set to 5.

The experimental data set used by the invention is the benchmark test point cloud data of the Vaihingen region provided by ISPRS Commission III. The data cover the central area of the city and were collected in midsummer, so the vegetation is luxuriant. With 30 percent course overlap and 60 percent side overlap, the average point density over the whole area is 6.7 pts/m². The training set comprises 753876 points and the test set comprises 411722 points; the original point cloud data include three-dimensional coordinate information, intensity information and echo information. The point cloud data of the training set and the test set are shown in FIG. 4 and FIG. 5 respectively, colored by elevation.

The experiment was implemented by programming on the MATLAB 7.11.0 platform on a system with a dual-core 3.30 GHz CPU, 4 GB of memory and Windows 7 Ultimate; a confusion matrix was established against the standard data to evaluate the point cloud classification accuracy of the invention. To verify the effectiveness of the algorithm, its classification results were compared with those of an original elevation feature classification method and a support vector machine (SVM) classification method. The comparative analysis shows that the highest overall classification accuracy of the algorithm is 87%, with a Kappa coefficient of 0.7965; the classification accuracy of the original elevation feature classification method is only 52.88%, which verifies the importance of the normalized elevation feature; the SVM-based point cloud classification accuracy is 85%, with a Kappa coefficient of 0.7683, which shows that in this example the random forest classifier outperforms the support vector machine classifier. FIG. 6 shows the final classification result of this example.
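
For reference, overall accuracy and the Kappa coefficient are both derived from the confusion matrix. The sketch below uses the standard definitions; the function name and the example matrix are hypothetical and are not the experimental data of the invention.

```python
def accuracy_and_kappa(confusion):
    """confusion[i][j]: number of samples of reference class i predicted as class j."""
    size = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: share of samples on the diagonal (overall accuracy).
    observed = sum(confusion[i][i] for i in range(size)) / total
    # Chance agreement: sum over classes of (row total * column total) / total^2.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(size)
    ) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa

# Hypothetical 2-class example: 80 of 100 samples lie on the diagonal.
overall, kappa = accuracy_and_kappa([[40, 10], [10, 40]])
```

Kappa discounts the agreement expected by chance, so it is lower than the overall accuracy whenever the classes could be matched partly at random.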

Claims

1. A point cloud classification method combining region growing and random forests is characterized by comprising the following steps:

2. The method for point cloud classification in combination with region growing and random forests as claimed in claim 1, wherein said step 1 comprises:

3. A method of point cloud classification in conjunction with region growing and random forests as claimed in claim 2 wherein said step 1.1 comprises:

4. The method for point cloud classification in combination with region growing and random forests as claimed in claim 1, wherein said step 3 comprises:

5. A method for point cloud classification in combination with region growing and random forests as claimed in claim 1 or 4 wherein said step 4 comprises:

6. The method for point cloud classification in combination with region growing and random forests as claimed in claim 1, wherein said step 5 comprises:

step 5.1: for each segmented patch whose number of points is less than a given threshold T_n, searching for its adjacent patches and judging whether the attributes of the patch are the same as those of its adjacent patches; if they differ, defining the patch as an island patch;

7. A method of point cloud classification combining region growing and random forests according to claim 4, wherein said step 3.1 comprises:

8. A method of point cloud classification combining region growing and random forests according to claim 4, wherein said step 3.2 comprises:

step 3.2.2: assuming that the characteristic variable l_x is present in r decision trees, the total importance of l_x is the average of the sum of its importance over all these decision trees, which gives the importance index of the characteristic variable l_x.