CN115439653A - Substation scene point cloud semantic segmentation method - Google Patents

Substation scene point cloud semantic segmentation method

Info

Publication number
CN115439653A
CN115439653A
Authority
CN
China
Prior art keywords
point cloud
mlp
res
features
semantic segmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211275613.7A
Other languages
Chinese (zh)
Inventor
胡帆
杨罡
王大伟
张娜
张兴忠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Electric Power Research Institute Of Sepc
Original Assignee
Shanxi Hongshuntong Technology Co ltd
State Grid Electric Power Research Institute Of Sepc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanxi Hongshuntong Technology Co ltd, State Grid Electric Power Research Institute Of Sepc filed Critical Shanxi Hongshuntong Technology Co ltd
Priority to CN202211275613.7A priority Critical patent/CN115439653A/en
Publication of CN115439653A publication Critical patent/CN115439653A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/64 Three-dimensional objects

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a transformer substation scene point cloud semantic segmentation method, belonging to the technical field of semantic segmentation. It addresses the problems that substation scenes are complex, the point clouds are numerous, features are difficult to extract, and discriminative features cannot be extracted accurately, which readily leads to under-segmentation or over-segmentation. The method comprises the following steps: establish a transformer substation point cloud semantic segmentation data set; label the point cloud data set; construct a transformer substation point cloud semantic segmentation model, the Seg-PointNet model, whose main contributions are a proposed multi-scale residual structure (RES-MLP), a proposed 3D point cloud feature pyramid (3DP-SSP), and the integration of the attention mechanism module SENet; train and test the model, where the model is trained on the public S3DIS data set established by Stanford University and verified on the self-built substation point cloud data set SCP, realizing segmentation of the substation point cloud data. The method is mainly applied to scene modeling of transformer substations and is applicable to mobile devices.

Description

Substation scene point cloud semantic segmentation method
Technical Field
The invention provides a transformer substation scene point cloud semantic segmentation method, and belongs to the technical fields of computer vision and pattern recognition, deep learning, and point cloud semantic segmentation.
Background
With the digital operation of power grids and the development of artificial intelligence technology, intelligent inspection of substations is gradually replacing manual inspection as the primary inspection mode. During intelligent inspection, the environment must be sensed and the scene understood so that an effective obstacle-avoidance method can be designed. 3D semantic segmentation, a key link in scene understanding, means assigning each point in the point cloud a specific semantic label, giving each point a specific meaning so as to accurately describe the type of object in the space; the assigned semantic label has a concrete meaning in the real world.
The main challenges of 3D point cloud semantic segmentation in the substation scenario are: (1) data collection is difficult: substation scenes are complex, many areas are inaccessible, and data acquisition is limited; (2) data annotation is complex: annotation of a 3D point cloud data set is performed point by point, so the labeling process is time-consuming; (3) feature extraction is difficult: the 3D point cloud of a substation is relatively sparse and irregular, and semantic segmentation must assign a specific semantic label to every point in the cloud, which is challenging.
Disclosure of Invention
The invention provides a transformer substation scene point cloud semantic segmentation method. It aims to solve the problems that substation scenes are complex, the point clouds are large and features are difficult to extract, that local features are ignored and only shallow features are extracted during feature extraction, and that discriminative features cannot be extracted accurately, so the segmentation result is prone to under-segmentation or over-segmentation.
In order to solve the technical problems, the invention adopts the technical scheme that: a transformer substation scene point cloud semantic segmentation method comprises the following steps:
step 1: establishing a transformer substation point cloud semantic segmentation data set;
step 2: point cloud data set labeling: a point cloud annotation tool is adopted, and the semantic categories in the scene are divided into 6 classes: 10 kV transformer room outer wall, transformer, person, enclosing wall, ground, and fire-fighting sandbox;
step 3: constructing the transformer substation point cloud semantic segmentation model: performing model optimization on the basis of PointNet and constructing a depth model, Seg-PointNet, for semantic segmentation of the substation point cloud;
the network structure of the Seg-PointNet model mainly comprises the multi-scale residual perception module RES-MLP, the 3D point cloud feature pyramid 3DP-SSP, and the attention mechanism SENet;
assuming the number of sampled points in the point cloud is N, the N points are first raised in dimension by the residual-structure-based multi-scale perception module RES-MLP; feature vectors of different scales are then extracted by the spatial pyramid module 3DP-SSP of the three-dimensional point cloud, and the global and local features are spliced, with an SENet attention mechanism weighting the features during splicing; finally, the spliced features are copied to each of the N points and reduced in dimension by the residual-structure-based multi-scale perception module RES-MLP, yielding the semantic segmentation result of each point;
step 4: model training and testing: training and verifying the constructed Seg-PointNet model, and deploying the model on equipment to perform point cloud semantic segmentation.
The residual-structure-based multi-scale perception module RES-MLP comprises four residual multi-layer perceptrons, RES-MLP-1, RES-MLP-2, RES-MLP-3 and RES-MLP-4, each twice as deep as the MLP at the corresponding position in the original PointNet structure.
RES-MLP-1 and RES-MLP-2 perform the dimension-raising operations on the original point cloud data to obtain the raised features;
RES-MLP-1 raises the point cloud data from N × 6: each point generates a 64-dimensional feature, producing an N × 64 point cloud feature matrix; RES-MLP-2 raises the features from 64 to 1024 dimensions, producing an N × 1024 point cloud feature matrix.
RES-MLP-3 and RES-MLP-4 perform the dimension-reducing operations after the spliced features have been copied to each of the N points;
RES-MLP-3 reduces the N × 1088 combined features to N × 128; RES-MLP-4 further reduces the N × 128 features to the m segmentation categories and outputs the final semantic segmentation result.
RES-MLP-1 comprises two residual blocks, each containing two convolutional layers, and raises the point cloud data from dimension N × 6 to N × 64.
RES-MLP-2 comprises three residual blocks, each containing two convolutional layers, and raises the point cloud data from dimension N × 64 to N × 1024.
RES-MLP-3 comprises three residual blocks, each containing two convolutional layers, and reduces the 1088-dimensional features obtained by splicing the global and local features down to 128 dimensions.
RES-MLP-4 comprises two residual blocks, each containing two convolutional layers, and maps the reduced 128-dimensional features to the m semantic labels.
The 3DP-SSP obtains multi-dimensional local features through multi-window pooling and is expressed by the following formula:

$$G(x) = \mathrm{Con}\big(g(f(x), W_1),\ g(f(x), W_2),\ \ldots,\ g(f(x), W_n)\big)$$

In the above formula, $W_n$ is the window size of the pyramid pooling, $f$ denotes the features extracted by the MLP, $g$ denotes the max pooling operation, and $\mathrm{Con}$ denotes the merging of the multi-scale features.
Compared with the prior art, the invention has the following beneficial effects: for the semantic segmentation task on lidar point cloud data in complex transformer substation scenes, the invention proposes an improved PointNet model, named Seg-PointNet, based on a multi-scale residual structure and a 3D point cloud feature pyramid (3DP-SSP). On top of PointNet, the model introduces a multi-scale residual structure (RES) and proposes a multi-scale residual perception module (RES-MLP), which fully mines features of different scales and improves the representation of complex features; on this basis, a 3D point cloud feature pyramid module (3DP-SSP) is introduced to represent the deep semantic features of the substation scene. The Seg-PointNet model is trained and tested on the public S3DIS data set constructed by Stanford University, and the trained model is verified on the self-built substation point cloud data set (SCP dataset).
Drawings
The invention is further described below with reference to the accompanying drawings:
FIG. 1 is a network structure diagram of a Seg-PointNet model according to the present invention;
FIG. 2 is a block diagram of the RES-MLP module according to the present invention;
FIG. 3 is a block diagram of a 3DP-SSP module according to the present invention;
FIG. 4 is a graph comparing semantic segmentation results of the S3DIS data set using the Seg-PointNet model of the present invention and the conventional PointNet model.
Detailed Description
As shown in Figs. 1 to 4, the invention provides a transformer substation point cloud semantic segmentation method based on Seg-PointNet, which starts from the PointNet network model and realizes 3D semantic segmentation of the substation point cloud by optimizing that model. The core steps are as follows:
step 1: establish the transformer substation point cloud semantic segmentation data set. The point cloud data are collected with an Ouster OS-1-64 line laser radar in 1024 × 10 mode, i.e., ten revolutions per second with 1024 points per revolution on each line; with 64 lines, one revolution yields 64 × 1024 = 65,536 points. The constructed substation point cloud data set is named the SCP dataset and contains 320 samples.
step 2: label the point cloud data set. Data annotation uses the semantic-segmentation-editor point cloud annotation tool and, according to the problem the invention addresses, divides the semantic categories in the scene into 6 classes: 10 kV transformer room outer wall, transformer, person, enclosing wall, ground, and fire-fighting sandbox. For illustration, these classes can be mapped to integer labels as sketched below.
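A minimal label-map sketch in Python; the identifier names are hypothetical, since the patent does not specify them:

```python
# Hypothetical label mapping for the 6 annotated SCP categories; the patent
# does not specify identifier names, so these are illustrative only.
SCP_CLASSES = {
    "10kv_transformer_room_outer_wall": 0,
    "transformer": 1,
    "person": 2,
    "enclosing_wall": 3,
    "ground": 4,
    "fire_fighting_sandbox": 5,
}
NUM_CLASSES = len(SCP_CLASSES)  # m = 6 segmentation categories
```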
step 3: construct the transformer substation point cloud semantic segmentation model, the Seg-PointNet model. Based on the PointNet idea, the model is optimized to obtain a depth model, Seg-PointNet, suited to substation point cloud semantic segmentation.
During semantic segmentation, PointNet ignores local features, extracts mostly shallow features, and assigns all features the same importance, which leads to problems such as under-segmentation and over-segmentation. To address these problems, the invention proposes the Seg-PointNet model for semantic segmentation of complex substation scenes.
The network structure of the Seg-PointNet model is shown in FIG. 1: it mainly comprises the multi-scale residual perception module (RES-MLP), the 3D point cloud feature pyramid (3DP-SSP), and the attention mechanism SENet.
Assume the number of sampled points in the point cloud is N; the input dimension of each point is 6.
Feature extraction is first performed by the proposed residual-based multi-scale perception module (RES-MLP), which comprises 4 different RES-MLP structures (RES-MLP-1 to RES-MLP-4). RES-MLP-1 raises the input to N × 64 features, and the RES-MLP-2 structure transforms these into N × 1024 output features. The spatial pyramid module 3DP-SSP of the three-dimensional point cloud then extracts feature vectors of different scales, after which the global and local features are spliced together, with an SENet attention mechanism weighting the features during splicing. Finally, the spliced features are copied to each of the N points, reduced to 128-dimensional features by RES-MLP-3, and further reduced by RES-MLP-4 to obtain the semantic segmentation result of each point. The overall dimension flow is sketched below.
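A minimal PyTorch sketch of this dimension flow (PyTorch is an assumption; the patent names no framework). The residual blocks, 3DP-SSP, and SENet are replaced here by plain per-point MLPs and a single global max pool, so that only the N × 6 → N × 64 → N × 1024 → N × 1088 → N × 128 → N × m tensor shapes are shown; the three components are sketched individually further below:

```python
import torch
import torch.nn as nn

def mlp1d(c_in, c_out):
    # Shared per-point MLP implemented as a 1x1 convolution over the points.
    return nn.Sequential(nn.Conv1d(c_in, c_out, 1), nn.BatchNorm1d(c_out), nn.ReLU())

class SegPointNetFlow(nn.Module):
    """Dimension-flow sketch only: the RES-MLP residual blocks, 3DP-SSP and
    SENet described in this patent are replaced by plain per-point MLPs and
    a single global max pool."""
    def __init__(self, num_classes=6):
        super().__init__()
        self.up1 = mlp1d(6, 64)        # stand-in for RES-MLP-1: N x 6  -> N x 64
        self.up2 = mlp1d(64, 1024)     # stand-in for RES-MLP-2: N x 64 -> N x 1024
        self.down1 = mlp1d(1088, 128)  # stand-in for RES-MLP-3: N x 1088 -> N x 128
        self.down2 = nn.Conv1d(128, num_classes, 1)  # stand-in for RES-MLP-4

    def forward(self, x):                             # x: (B, 6, N)
        local = self.up1(x)                           # (B, 64, N) local features
        deep = self.up2(local)                        # (B, 1024, N)
        glob = deep.max(dim=-1, keepdim=True).values  # (B, 1024, 1) global feature
        glob = glob.expand(-1, -1, x.shape[-1])       # copied to every point
        fused = torch.cat([local, glob], dim=1)       # (B, 1088, N) spliced features
        return self.down2(self.down1(fused))          # (B, m, N) per-point logits

logits = SegPointNetFlow()(torch.randn(2, 6, 4096))
print(logits.shape)  # torch.Size([2, 6, 4096])
```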
The residual-based multi-scale perception (RES-MLP) module proposed by the invention is further explained below.
Based on the idea of residual networks, the invention designs a multi-scale perception module (RES-MLP) based on a residual structure. RES-MLP denotes a residual multi-layer perceptron; it comprises RES-MLP-1 to RES-MLP-4, each twice as deep as the MLP at the corresponding position in the original PointNet structure. The structures of RES-MLP-1 to RES-MLP-4 are shown in FIG. 2: RES-MLP-1 and RES-MLP-2 perform the dimension-raising operations on the point cloud data, and RES-MLP-3 and RES-MLP-4 perform the dimension-reducing operations.
The RES-MLP-1 structure raises the point cloud data from N × 6: each point generates a 64-dimensional feature, producing an N × 64 point cloud feature matrix. RES-MLP-2 raises the features from 64 to 1024 dimensions, producing an N × 1024 point cloud feature matrix. Following the Seg-PointNet model structure, the global features and the local features extracted by the pyramid are spliced, i.e., the N × (64 + 1024) part in FIG. 1, into combined features of dimension N × 1088. The RES-MLP-3 module reduces the N × 1088 combined features to N × 128, and the RES-MLP-4 structure further reduces the N × 128 features to the m segmentation categories, outputting the final semantic segmentation result.
FIG. 2 (a) shows RES-MLP-1, which contains two residual blocks, each containing two convolutional layers. It raises the point cloud data from N × 6 to N × 64 and is twice as deep as the MLP at the corresponding position in the original structure, so RES-MLP-1 can capture more point cloud features than the original MLP.
FIG. 2 (b) shows RES-MLP-2, which contains three residual blocks, each comprising two convolutional layers. It raises the point cloud data from N × 64 to N × 1024; it is twice as deep as the MLP at the corresponding position and deeper than RES-MLP-1, because the subsequent max pooling discards many features, so this stage must raise the feature dimension to a greater extent to retain rich feature information.
FIG. 2 (c) shows RES-MLP-3, which likewise contains three residual blocks, each comprising two convolutional layers. It reduces the 1088-dimensional features obtained by splicing the global and local features down to 128 dimensions.
FIG. 2 (d) shows RES-MLP-4, which contains two residual blocks, each comprising two convolutional layers. Its function is to map the reduced 128-dimensional features to the m semantic labels. Here, too, the network has twice as many layers as the MLP in the original structure.
The construction principle of the above network structure is that, with a residual connection between input and output, a neural network trains more easily and is less prone to gradient explosion and vanishing gradients. Based on this view, residual blocks are used to deepen the MLPs of PointNet; the deepened network extracts deep semantic features more strongly, and the network therefore performs better.
Taking the architecture of RES-MLP-2 as an example to analyze the overall behavior of the network: when the weights of all stacked layers in a residual block are set to 0, the residual branch outputs 0 and the block reduces to its shortcut mapping. RES-MLP-2 then degenerates to an MLP, i.e., the shallow neural network of standard PointNet.
In the whole network, if the outputs of all residual branches are zero, the deep network degenerates into a shallow one, which makes the whole network more flexible. At the start of training, all residual blocks of the deep network participate, so the network can learn rich and accurate features; as the network approaches stability, the residual-block weights are gradually set toward 0 and the network gradually reverts to a shallow one. A minimal sketch of such a residual block follows.
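The sketch below assumes the two convolution layers of each block are 1x1 (point-wise) convolutions, that a 1x1 convolution on the shortcut matches channels when the block changes width, and that RES-MLP-1 uses an intermediate width of 32; the patent does not specify these details:

```python
import torch
import torch.nn as nn

class ResBlock1D(nn.Module):
    """One RES-MLP residual block: two 1x1 convolution layers plus a skip
    connection (the channel-matching shortcut convolution is an assumption)."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv1d(c_in, c_out, 1), nn.BatchNorm1d(c_out), nn.ReLU(),
            nn.Conv1d(c_out, c_out, 1), nn.BatchNorm1d(c_out),
        )
        self.skip = nn.Identity() if c_in == c_out else nn.Conv1d(c_in, c_out, 1)
        self.relu = nn.ReLU()

    def forward(self, x):
        # If the body's weights go to zero, the block reduces to its shortcut
        # mapping, so the deep network can fall back toward a shallow one.
        return self.relu(self.body(x) + self.skip(x))

# RES-MLP-1 as described above: two residual blocks raising N x 6 to N x 64.
res_mlp_1 = nn.Sequential(ResBlock1D(6, 32), ResBlock1D(32, 64))
print(res_mlp_1(torch.randn(2, 6, 4096)).shape)  # torch.Size([2, 64, 4096])
```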
The spatial pyramid (3DP-SSP) module of the three-dimensional point cloud proposed by the invention is further explained below.
The PointNet model obtains global features through pooling but lacks extraction of local features, and local feature extraction plays an important role in the performance of the whole network; the following module is therefore designed under the influence of the feature pyramid idea.
As shown in FIG. 3, 3DP-SSP obtains multi-scale local features through multi-window pooling. The black bars represent the Dim-dimensional information of the point cloud, and the pyramid levels are represented by cones of different sizes. The sizes of the pooling windows are $N/W_1, N/W_2, \ldots, N/W_n$, and the pooling stride of each window equals the window size: the $N/W_1$ window yields a feature of size $W_1 \times Dim$, the $N/W_2$ window a feature of size $W_2 \times Dim$, and the $N/W_n$ window a feature of size $W_n \times Dim$. 3DP-SSP aggregates the pooled features of the different sizes to obtain a $(W_1 + W_2 + \cdots + W_n) \times Dim$ feature. The features obtained by the proposed 3DP-SSP module therefore retain the global feature while also containing local features of different scales.
3DP-SSP can be expressed as equation (1):

$$G(x) = \mathrm{Con}\big(g(f(x), W_1),\ g(f(x), W_2),\ \ldots,\ g(f(x), W_n)\big) \tag{1}$$

In equation (1), $W_n$ is the window size of the pyramid pooling, $f$ denotes the features extracted by the MLP, $g$ denotes the max pooling operation, $\mathrm{Con}$ denotes the merging of the multi-scale features, and $G(\cdot)$ denotes the whole feature pyramid processing procedure applied to the input features $x$.
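A minimal sketch of this multi-window pooling, assuming a (batch, Dim, N) feature layout and pyramid levels W = (1, 2, 4); the actual window sizes are not specified in the patent:

```python
import torch
import torch.nn.functional as F

def pyramid_pool_3d(feat, windows=(1, 2, 4)):
    """3DP-SSP sketch over a (B, Dim, N) feature map: each level max-pools the
    N points with window size N // W and an equal stride, giving W cells of
    Dim features; the levels are concatenated into (W1 + W2 + ... + Wn) x Dim."""
    b, dim, n = feat.shape
    pooled = []
    for w in windows:
        k = n // w                                                  # window = stride
        pooled.append(F.max_pool1d(feat, kernel_size=k, stride=k))  # (B, Dim, W)
    return torch.cat(pooled, dim=-1)  # (B, Dim, W1 + W2 + ... + Wn)

feat = torch.randn(2, 1024, 4096)
print(pyramid_pool_3d(feat).shape)  # torch.Size([2, 1024, 7]); 1 + 2 + 4 = 7
```

The W = 1 level is the global max pool of the original PointNet, while the larger levels retain local features at progressively finer scales.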
(3) SENet attention mechanism
SENet is a convolutional structure for image recognition proposed by the autonomous-driving company Momenta, also known as the squeeze-and-excitation module. The invention integrates the SENet module into the model to improve the quality of the generated features by explicitly modeling the dependencies between convolutional feature channels. The principle is to compress (squeeze) the features and then weight the compressed features (excitation) to obtain new features with strong discriminative capability; the SENet structure uses global information to emphasize important features and suppress unimportant ones. The invention adopts SENet as the attention module of the network; the attention step is shown in the boxed portion of FIG. 1. Because the feature dimensions differ, a convolution is used in the shortcut connection to match dimensions, and the shortcut prevents the features from vanishing when the SENet module weights are 0.
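A minimal squeeze-and-excitation sketch for per-point features; the global-average squeeze and the reduction ratio r = 16 are the standard SENet choices, assumed here because the patent does not state these hyperparameters, and the dimension-matching shortcut convolution mentioned above is omitted for brevity:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation for per-point features (B, C, N): squeeze by
    global average over the points, excite through a two-layer bottleneck,
    then re-weight the channels (r = 16 is the common SENet default)."""
    def __init__(self, channels, r=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // r), nn.ReLU(),
            nn.Linear(channels // r, channels), nn.Sigmoid(),
        )

    def forward(self, x):                # x: (B, C, N)
        s = x.mean(dim=-1)               # squeeze: (B, C)
        w = self.fc(s).unsqueeze(-1)     # excitation: per-channel weights in (0, 1)
        return x * w                     # emphasize informative channels

print(SEBlock(1088)(torch.randn(2, 1088, 4096)).shape)  # torch.Size([2, 1088, 4096])
```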
step 4: train and test the model. The Seg-PointNet model constructed above is trained and verified, and is deployed in mobile-end embedded equipment for point cloud semantic segmentation.
Training is first performed on the S3DIS data set (6 areas, 271 rooms) constructed at Stanford University to verify the feasibility and effectiveness of the proposed algorithm. Every point in the S3DIS scenes is semantically labeled; the data set is divided by building area into 6 areas, Area1, Area2, Area3, Area4, Area5 and Area6, and the annotations cover 13 categories, including desk, chair, floor, wall and ceiling. Each point is represented by a six-dimensional (X, Y, Z, R, G, B) vector, where X, Y, Z are the position channels of the point cloud and R, G, B are the color channels. Training and testing use cross validation: in turn, 5 areas serve as the training set and the remaining area as the test set, as sketched below.
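A sketch of this area-wise cross validation; the loader and training routine named in the comments are hypothetical placeholders, not functions defined by the patent or any library:

```python
# Area-wise 6-fold cross validation on S3DIS: train on 5 areas, test on the
# remaining one, in turn.
AREAS = ["Area1", "Area2", "Area3", "Area4", "Area5", "Area6"]

for test_area in AREAS:
    train_areas = [a for a in AREAS if a != test_area]
    # train_set = load_s3dis_areas(train_areas)      # hypothetical loader
    # test_set = load_s3dis_areas([test_area])
    # miou = train_one_fold(train_set, test_set)     # hypothetical trainer
    print(f"fold: train on {train_areas}, test on {test_area}")
```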
To verify the effectiveness of the proposed algorithm, both it and PointNet are trained on the S3DIS data set. FIG. 4 shows the segmentation results of Seg-PointNet and PointNet for each semantic category, comparing scenes such as conference rooms, offices and corridors; the segmentation of ceilings, floors, walls, tables, chairs, bookshelves and the like is more accurate. As the effect diagrams in FIG. 4 show, the proposed Seg-PointNet network performs better on semantic segmentation than PointNet, with clearer segmentation results for each component.
It should be noted that, as regards the specific structure of the invention, the connection relationships between the modules adopted by the invention are determinate and realizable; except where specifically described in the embodiments, these specific connection relationships bring the corresponding technical effects and solve the technical problem posed by the invention without depending on the execution of corresponding software programs.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (9)

1. A transformer substation scene point cloud semantic segmentation method, characterized by comprising the following steps:
step 1: establishing a transformer substation point cloud semantic segmentation data set;
step 2: point cloud data set labeling: a point cloud annotation tool is adopted, and the semantic categories in the scene are divided into 6 classes: 10 kV transformer room outer wall, transformer, person, enclosing wall, ground, and fire-fighting sandbox;
step 3: constructing the transformer substation point cloud semantic segmentation model: performing model optimization on the basis of PointNet and constructing a depth model, Seg-PointNet, for semantic segmentation of the substation point cloud;
the network structure of the Seg-PointNet model mainly comprises the multi-scale residual perception module RES-MLP, the 3D point cloud feature pyramid 3DP-SSP, and the attention mechanism SENet;
assuming the number of sampled points in the point cloud is N, the N points are first raised in dimension by the residual-structure-based multi-scale perception module RES-MLP; feature vectors of different scales are then extracted by the spatial pyramid module 3DP-SSP of the three-dimensional point cloud, and the global and local features are spliced, with an SENet attention mechanism weighting the features during splicing; finally, the spliced features are copied to each of the N points and reduced in dimension by the residual-structure-based multi-scale perception module RES-MLP, yielding the semantic segmentation result of each point;
step 4: model training and testing: training and verifying the constructed Seg-PointNet model, and deploying the model on equipment to perform point cloud semantic segmentation.
2. The transformer substation scene point cloud semantic segmentation method according to claim 1, characterized in that: the residual-structure-based multi-scale perception module RES-MLP comprises four residual multi-layer perceptrons, RES-MLP-1, RES-MLP-2, RES-MLP-3 and RES-MLP-4, each twice as deep as the MLP at the corresponding position in the original PointNet structure.
3. The transformer substation scene point cloud semantic segmentation method according to claim 2, characterized in that: RES-MLP-1 and RES-MLP-2 perform the dimension-raising operations on the original point cloud data to obtain the raised features;
RES-MLP-1 raises the point cloud data from N × 6: each point generates a 64-dimensional feature, producing an N × 64 point cloud feature matrix; RES-MLP-2 raises the features from 64 to 1024 dimensions, producing an N × 1024 point cloud feature matrix.
4. The transformer substation scene point cloud semantic segmentation method according to claim 3, characterized in that: RES-MLP-3 and RES-MLP-4 perform the dimension-reducing operations after the spliced features have been copied to each of the N points;
RES-MLP-3 reduces the N × 1088 combined features to N × 128; RES-MLP-4 further reduces the N × 128 features to the m segmentation categories and outputs the final semantic segmentation result.
5. The transformer substation scene point cloud semantic segmentation method according to claim 2, characterized in that: RES-MLP-1 comprises two residual blocks, each containing two convolutional layers, and raises the point cloud data from dimension N × 6 to N × 64.
6. The transformer substation scene point cloud semantic segmentation method according to claim 5, characterized in that: RES-MLP-2 comprises three residual blocks, each containing two convolutional layers, and raises the point cloud data from dimension N × 64 to N × 1024.
7. The transformer substation scene point cloud semantic segmentation method according to claim 6, characterized in that: RES-MLP-3 comprises three residual blocks, each containing two convolutional layers, and reduces the 1088-dimensional features obtained by splicing the global and local features down to 128 dimensions.
8. The transformer substation scene point cloud semantic segmentation method according to claim 7, characterized in that: RES-MLP-4 comprises two residual blocks, each containing two convolutional layers, and maps the reduced 128-dimensional features to the m semantic labels.
9. The transformer substation scene point cloud semantic segmentation method according to claim 1, characterized in that: the 3DP-SSP obtains multi-dimensional local features through multi-window pooling and is expressed by the following formula:

$$G(x) = \mathrm{Con}\big(g(f(x), W_1),\ g(f(x), W_2),\ \ldots,\ g(f(x), W_n)\big)$$

where $W_n$ is the window size of the pyramid pooling, $f$ denotes the features extracted by the MLP, $g$ denotes the max pooling operation, and $\mathrm{Con}$ denotes the merging of the multi-scale features.
CN202211275613.7A 2022-10-18 2022-10-18 Substation scene point cloud semantic segmentation method Pending CN115439653A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211275613.7A CN115439653A (en) 2022-10-18 2022-10-18 Substation scene point cloud semantic segmentation method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211275613.7A CN115439653A (en) 2022-10-18 2022-10-18 Substation scene point cloud semantic segmentation method

Publications (1)

Publication Number Publication Date
CN115439653A true CN115439653A (en) 2022-12-06

Family

ID=84250650

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211275613.7A Pending CN115439653A (en) 2022-10-18 2022-10-18 Substation scene point cloud semantic segmentation method

Country Status (1)

Country Link
CN (1) CN115439653A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116310349A (en) * 2023-05-25 2023-06-23 西南交通大学 Large-scale point cloud segmentation method, device, equipment and medium based on deep learning
CN116310349B (en) * 2023-05-25 2023-08-15 西南交通大学 Large-scale point cloud segmentation method, device, equipment and medium based on deep learning
CN117058472A (en) * 2023-10-12 2023-11-14 华侨大学 3D target detection method, device and equipment based on self-attention mechanism
CN117058472B (en) * 2023-10-12 2024-02-20 华侨大学 3D target detection method, device and equipment based on self-attention mechanism

Similar Documents

Publication Publication Date Title
CN110458939B (en) Indoor scene modeling method based on visual angle generation
CN110135319B (en) Abnormal behavior detection method and system
CN106778502B (en) Crowd counting method based on deep residual error network
CN115439653A (en) Substation scene point cloud semantic segmentation method
Xiong et al. Automatic creation of semantically rich 3D building models from laser scanner data
CN110956094A (en) RGB-D multi-mode fusion personnel detection method based on asymmetric double-current network
Ochmann et al. Automatic generation of structural building descriptions from 3D point cloud scans
CN110097553A (en) The semanteme for building figure and three-dimensional semantic segmentation based on instant positioning builds drawing system
CN110852347A (en) Fire detection method using improved YOLO v3
CN112132197A (en) Model training method, image processing method, device, computer equipment and storage medium
CN110390308B (en) Video behavior identification method based on space-time confrontation generation network
CN114758337A (en) Semantic instance reconstruction method, device, equipment and medium
CN110287798A (en) Vector network pedestrian detection method based on characteristic module and context fusion
CN105354528A (en) Depth image sequence based human body action identification method and system
Hu et al. Fast and Regularized Reconstruction of Building Fa\c {c} ades from Street-View Images using Binary Integer Programming
CN111369539A (en) Building facade window detecting system based on multi-feature map fusion
CN114519819B (en) Remote sensing image target detection method based on global context awareness
CN112884758A (en) Defective insulator sample generation method and system based on style migration method
CN114120413A (en) Model training method, image synthesis method, device, equipment and program product
CN117094895B (en) Image panorama stitching method and system
CN116486489B (en) Three-dimensional hand object posture estimation method and system based on semantic perception graph convolution
CN116129118B (en) Urban scene laser LiDAR point cloud semantic segmentation method based on graph convolution
CN117152630A (en) Optical remote sensing image change detection method based on deep learning
Nivaggioli et al. Using 3D models to generate labels for panoptic segmentation of industrial scenes
CN116912486A (en) Target segmentation method based on edge convolution and multidimensional feature fusion and electronic device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20231030

Address after: 030001 No. 6, Qingnian Road, Shanxi, Taiyuan

Applicant after: STATE GRID ELECTRIC POWER Research Institute OF SEPC

Address before: 030001 No. 6, Qingnian Road, Shanxi, Taiyuan

Applicant before: STATE GRID ELECTRIC POWER Research Institute OF SEPC

Applicant before: SHANXI HONGSHUNTONG TECHNOLOGY Co.,Ltd.

TA01 Transfer of patent application right