CN113256640B - Method and device for point cloud segmentation and virtual environment generation based on a PointNet network - Google Patents

Method and device for point cloud segmentation and virtual environment generation based on a PointNet network

Info

Publication number
CN113256640B
CN113256640B
Authority
CN
China
Prior art keywords
point cloud
point
pointnet
virtual environment
room
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110603532.4A
Other languages
Chinese (zh)
Other versions
CN113256640A (en)
Inventor
Yao Shouwen
Lan Zeling
Wang Yu
Li Lihui
Chang Fuxiang
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN202110603532.4A priority Critical patent/CN113256640B/en
Priority to PCT/CN2021/099276 priority patent/WO2022252274A1/en
Publication of CN113256640A publication Critical patent/CN113256640A/en
Application granted granted Critical
Publication of CN113256640B publication Critical patent/CN113256640B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/10 Constructive solid geometry [CSG] using solid primitives, e.g. cylinders, cubes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Geometry (AREA)
  • Computer Graphics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention discloses a method and a device for point cloud segmentation and virtual environment generation based on a PointNet network, comprising the following steps: acquiring the point clouds of a data set to be processed in a virtual environment; performing point cloud semantic segmentation on the point clouds by adopting an improved PointNet network; and replacing each object in the virtual environment with a virtual model having physical attributes according to the semantically segmented point cloud, generating virtual objects that contain all physical attributes. The technical scheme of the invention solves the problems that the huge number of points makes real-time data transmission and environment reconstruction difficult, and that an operator can hardly distinguish objects in a raw point cloud environment.

Description

Point cloud segmentation and virtual environment generation method and device based on PointNet
Technical Field
The invention belongs to the technical field of virtual presentation, and particularly relates to a method and a device for point cloud segmentation and virtual environment generation based on a PointNet network.
Background
With the development of sensor technology, sensors such as laser radars (LiDAR) and depth cameras are widely applied in fields such as automatic driving, teleoperation and virtual reality. Since a three-dimensional point cloud captures the depth information of an environment, rendering the environment from three-dimensional point cloud data greatly helps an operator understand the surroundings (e.g., the surroundings of a vehicle). Reconstructing the three-dimensional environment from the point cloud improves the operator's perception of the environment, but the huge number of points makes real-time data transmission and environment reconstruction difficult, and in a raw point cloud environment the operator may find it hard to distinguish objects.
Disclosure of Invention
To address these technical problems, the invention provides a method and a device for point cloud segmentation and virtual environment generation based on a PointNet network.
In order to achieve the purpose, the invention adopts the following technical scheme:
a method for partitioning and generating a virtual environment based on a PointNet network point cloud comprises the following steps:
step S1, point clouds in a data set to be processed in a virtual environment are obtained;
step S2, performing point cloud semantic segmentation on the point cloud by adopting an improved PointNet network;
step S3, replacing the object in the virtual environment with a virtual model with physical attributes according to the point cloud after semantic segmentation, and generating a virtual object containing all physical attributes.
Preferably, the data set covers six indoor scenes in three buildings with eleven room types in total, namely conference room, lounge, auditorium, toilet, copy room, pantry, storage room, corridor, office, lobby and open space; the semantic categories of the data set are ceiling, chair, door, floor, table, wall, beam, column, window, sofa, bookcase, board and clutter; each point in the data set carries coordinate position information XYZ and color information RGB.
Preferably, the structure of the PointNet network is as shown in fig. 2 and includes: a first T-Net layer, a second T-Net layer, multi-layer perceptrons (MLPs) and a feature fusion layer.
Preferably, the improved PointNet network structure is divided into two parts, feature extraction and point cloud semantic segmentation, wherein,
the feature extraction process comprises: realizing global feature extraction by extracting the point cloud local features, specifically: acquiring the d-dimensional features of the n points, the d-dimensional features comprising spatial coordinate values, color information and point normal information; continuously classifying and learning the local features of the point cloud, and finally extracting the global features through max pooling;
the point cloud semantic segmentation process comprises: splicing the local features and the global features, performing dimension reduction through several MLP layers, and finally predicting the category of each point, thereby realizing the segmentation of the point cloud.
Preferably, the point cloud semantic segmentation realized by the improved PointNet network structure comprises the following steps:
step 21, aligning the positions of the point clouds in the data set to be processed through a first T-Net layer;
step 22, increasing the dimensionality of the local features of the point cloud from 3 dimensions to 64 dimensions through MLP;
step 23, performing feature alignment on the point cloud through a second T-Net layer;
step 24, increasing the dimensionality of the local features of the point cloud from 64 to 128 dimensions through an MLP, and then to 1024 dimensions;
step 25, pooling the point cloud through a maximum-value symmetric function to obtain the point cloud global features;
step 26, splicing the point cloud global features and the point cloud local features through the feature fusion layer;
step 27, performing dimension reduction on the spliced point cloud features through an MLP (multi-layer perceptron), realizing semantic segmentation of the point cloud.
Preferably, the process of extracting the local features of the point cloud is as follows: acquiring the spatial position information of the point cloud and the number n and dimension d of the points; performing farthest point sampling on the point cloud, and indexing the obtained center points to obtain their position information and number; grouping all points by the ball query method with each center point as the sphere center, and extracting the local features of the point cloud.
Preferably, the farthest point sampling of the point cloud is specifically as follows: randomly initializing a point as the farthest point; after obtaining its spatial position coordinates, comparing the Euclidean distances between all remaining points and the current point to obtain the coordinates and distance of the farthest point, and storing the distance values in a distance matrix; then taking the obtained point as the query point, calculating the distance from each remaining point to the current point and taking the maximum; and repeating these steps until i farthest points have been sampled.
Preferably, the grouping of the point cloud by the ball query method is specifically: calculating the Euclidean distance L between each of the S center points determined after sampling and all points; setting a distance threshold R and selecting the points within the spherical region of radius R around the center point, i.e. the points satisfying L < R²; if the number M of selected points is smaller than the required number of points NS, taking the point with the maximum distance and supplementing NS − M points to reach the required number; and then performing feature extraction.
The invention provides a device for point cloud segmentation and virtual environment generation based on a PointNet network, comprising:
the acquisition module is used for acquiring a point cloud of a data set to be processed in a virtual environment;
the segmentation module is used for performing point cloud semantic segmentation on the point cloud by adopting a PointNet network;
and the generation module is used for replacing each object in the virtual environment with a virtual model having physical attributes according to the segmented point cloud, and generating virtual objects containing all physical attributes.
Preferably, the data set covers six indoor scenes in three buildings with eleven room types in total, namely conference room, lounge, auditorium, toilet, copy room, pantry, storage room, corridor, office, lobby and open space; the semantic categories of the data set are ceiling, chair, door, floor, table, wall, beam, column, window, sofa, bookcase, board and clutter; each point in the data set carries coordinate position information XYZ and color information RGB.
The invention designs a point cloud segmentation algorithm based on the PointNet neural network model that takes the extraction of local point cloud features into account, so as to realize semantic segmentation of the point cloud. The point cloud data set is expanded, the designed neural network model is trained on the expanded data set, the segmentation results are analyzed, and the point cloud segmentation results are transmitted to the virtual environment, realizing the generation of the models of the target objects corresponding to the point cloud in the virtual environment. The method and the device solve the problems that the huge number of points makes real-time data transmission and environment reconstruction difficult, and that an operator may find it hard to distinguish objects in a point cloud environment.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flow chart of the point cloud segmentation and virtual environment generation method based on a PointNet network;
FIG. 2 is a schematic diagram of a PointNet network;
FIG. 3 is a schematic diagram of a T-Net transformation flow;
FIG. 4 is a schematic structural framework diagram of a PointNet network;
FIG. 5 is a schematic view of a process of local point cloud feature extraction;
FIG. 6 is a schematic view of a farthest point sampling flow;
FIG. 7 is a schematic view of the ball query flow;
FIG. 8 is a schematic diagram of local point cloud feature extraction;
fig. 9 is a schematic diagram of generation of a model of a target object (indoor scene) in a virtual environment.
Detailed Description
The present invention will be described in detail with reference to the following embodiments, wherein like or similar elements are designated by like reference numerals throughout the several views, and wherein the shape, thickness or height of the various elements may be expanded or reduced in practice. The examples are given solely for the purpose of illustration and are not intended to limit the scope of the invention. Any obvious modifications or variations can be made to the present invention without departing from the spirit or scope of the present invention.
As shown in fig. 1, the invention provides a point cloud segmentation and virtual environment generation method based on a PointNet network, comprising:
step S1, point clouds in a data set to be processed in a virtual environment are obtained;
step S2, carrying out point cloud semantic segmentation on the point cloud by adopting an improved PointNet network;
step S3, replacing each object in the virtual environment with a virtual model having physical attributes according to the semantically segmented point cloud, and generating virtual objects containing all physical attributes.
Further, the data set covers six different large indoor scenes in three buildings with eleven room types in total, namely conference room, lounge, auditorium, toilet, copy room, pantry, storage room, corridor, office, lobby and open space. The thirteen semantic categories of the data set are ceiling, chair, door, floor, table, wall, beam, column, window, sofa, bookcase, board and clutter. Each point in the data set carries coordinate position information XYZ and color information RGB.
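For concreteness, the following is a minimal sketch of the assumed data layout; the file names and the preprocessing steps are illustrative assumptions, not specified by the patent.

```python
import numpy as np

# Assumed layout: each room is stored as an (N, 6) array of XYZ coordinates
# plus RGB colors, with an (N,) array of per-point labels (13 categories).
points = np.loadtxt("office_1.txt")          # shape (N, 6): x, y, z, r, g, b
xyz, rgb = points[:, :3], points[:, 3:6]
labels = np.loadtxt("office_1_labels.txt", dtype=np.int64)   # shape (N,)

# Normalize colors to [0, 1] and recentre coordinates — a common
# preprocessing step, assumed here rather than taken from the patent.
rgb = rgb / 255.0
xyz = xyz - xyz.mean(axis=0)
```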
Further, in step S2, the structure of the PointNet network is shown in fig. 2 and includes: a first T-Net layer, a second T-Net layer, multi-layer perceptrons (MLPs), and a feature fusion layer.
The method for realizing point cloud semantic segmentation through the PointNet network comprises the following steps:
step 21, aligning the positions of the point clouds in the data set to be processed through a first T-Net layer;
step 22, increasing the dimensionality of the local features of the point cloud from 3 dimensions to 64 dimensions through MLP;
step 23, performing feature alignment on the point cloud through a second T-Net layer;
step 24, increasing the dimensionality of the local features of the point cloud from 64 to 128 dimensions through an MLP, and then to 1024 dimensions;
step 25, pooling the point cloud through a maximum-value symmetric function to obtain the point cloud global features;
step 26, splicing the point cloud global features and the point cloud local features through the feature fusion layer;
step 27, performing dimension reduction on the spliced point cloud features through an MLP (multi-layer perceptron), realizing semantic segmentation of the point cloud, as sketched below.
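The following is a minimal PyTorch sketch of steps 21 to 27. The feature dimensions (3 → 64 → 128 → 1024, fused with the 64-dimensional local features) follow the text; the head widths (512, 256) and the treatment of the T-Net blocks as optional callables returning alignment matrices are assumptions.

```python
import torch
import torch.nn as nn

class PointNetSegSketch(nn.Module):
    """Sketch of the segmentation pipeline of steps 21-27. The T-Nets are
    abstracted as callables returning (B, d, d) alignment matrices."""
    def __init__(self, num_classes=13, tnet1=None, tnet2=None):
        super().__init__()
        self.tnet1, self.tnet2 = tnet1, tnet2             # 3x3 and 64x64 T-Nets
        self.mlp1 = nn.Sequential(nn.Conv1d(3, 64, 1), nn.ReLU())     # 3 -> 64
        self.mlp2 = nn.Sequential(nn.Conv1d(64, 128, 1), nn.ReLU(),   # 64 -> 128
                                  nn.Conv1d(128, 1024, 1), nn.ReLU()) # -> 1024
        self.head = nn.Sequential(nn.Conv1d(1024 + 64, 512, 1), nn.ReLU(),
                                  nn.Conv1d(512, 256, 1), nn.ReLU(),
                                  nn.Conv1d(256, num_classes, 1))     # per-point scores

    def forward(self, xyz):                   # xyz: (B, 3, N)
        if self.tnet1 is not None:            # step 21: input alignment
            xyz = torch.bmm(self.tnet1(xyz), xyz)
        local = self.mlp1(xyz)                # step 22: 3 -> 64 local features
        if self.tnet2 is not None:            # step 23: feature alignment
            local = torch.bmm(self.tnet2(local), local)
        x = self.mlp2(local)                  # step 24: 64 -> 128 -> 1024
        glob = torch.max(x, dim=2, keepdim=True).values   # step 25: max pooling
        glob = glob.expand(-1, -1, local.shape[2])
        fused = torch.cat([local, glob], dim=1)           # step 26: feature fusion
        return self.head(fused)               # step 27: per-point class scores
```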
The PointNet network structure contains two T-Net layers, a first T-Net layer and a second T-Net layer. The first T-Net layer is located directly after the point cloud input and aligns the position of the input point cloud, giving the network invariance to rigid transformations. The second T-Net layer is located after the first MLP and performs feature alignment on the point cloud output by that MLP. In both cases, a transformation matrix is first obtained through the T-Net layer and is matrix-multiplied with the input point cloud matrix to obtain the transformed, aligned data, as shown in fig. 3, where d is the dimension of the point cloud data. For the first T-Net layer, the input is the point cloud spatial matrix and the output is the aligned point cloud spatial matrix; when the input point cloud contains only spatial position information, d = 3, and when color information is included, d = 6, so the matrix of the first T-Net layer is a 3 × 3 or 6 × 6 matrix. For the second T-Net layer, the input is the high-dimensional point cloud feature matrix produced by the MLP and the output is the aligned point cloud feature matrix; since the multi-layer perceptron has raised the dimensionality of the point cloud from 3 to 64, the matrix of the second T-Net layer is a 64 × 64 matrix. T-Net is in fact a small PointNet: its internal structure is the same as that of PointNet, except that its final output is a transformation matrix, which facilitates the subsequent operations.
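A hedged sketch of a d × d T-Net follows. The internal layer widths mirror the standard PointNet design and are assumptions here, since the text only states that T-Net is a small PointNet whose output is a transformation matrix.

```python
import torch
import torch.nn as nn

class TNetSketch(nn.Module):
    """Sketch of a d x d T-Net (d = 3 for the first, 64 for the second):
    a small PointNet whose output is a transformation matrix."""
    def __init__(self, d=3):
        super().__init__()
        self.d = d
        self.feat = nn.Sequential(
            nn.Conv1d(d, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.ReLU())
        self.fc = nn.Sequential(
            nn.Linear(1024, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, d * d))

    def forward(self, x):                     # x: (B, d, N)
        g = self.feat(x).max(dim=2).values    # global feature, (B, 1024)
        m = self.fc(g).view(-1, self.d, self.d)
        # Bias toward the identity so early training leaves inputs intact.
        return m + torch.eye(self.d, device=x.device)

# Assumed usage: points is (B, 3, N)
# aligned = torch.bmm(TNetSketch(3)(points), points)
```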
The MLP in PointNet performs dimension-raising processing on the point cloud data, so that as many point cloud features as possible are retained for the subsequent point cloud processing and segmentation. Unlike a traditional MLP layer, the MLP layers in the PointNet network share their weights across all points. Because point clouds are unordered, a point cloud with n points can be presented to the network in any of its n! permutations, and the model is required to produce the same output for all of them. The PointNet network realizes this through a maximum-value symmetric function whose output is the feature of the point cloud, as shown in the following formula:

f({x1, …, xn}) ≈ g(h(x1), …, h(xn))

where f denotes the feature extraction function, h denotes the feature extraction layer of each MLP layer, and g is the maximum-value symmetric function. A small check of this property is sketched below.
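The following tiny example, with an arbitrary fixed linear map standing in for the shared feature extraction layers h and an element-wise maximum as g, verifies that shuffling the points leaves the extracted feature unchanged:

```python
import torch

torch.manual_seed(0)
h = torch.nn.Linear(3, 8)                    # shared per-point map (stand-in for MLP)
points = torch.randn(16, 3)                  # 16 points, xyz

feat1 = h(points).max(dim=0).values                      # original order
feat2 = h(points[torch.randperm(16)]).max(dim=0).values  # shuffled order
assert torch.allclose(feat1, feat2)          # same global feature either way
```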
After feature extraction, the PointNet network obtains a 1024-dimensional global feature. To obtain higher point cloud segmentation accuracy, the feature fusion layer splices this global feature with the specified 64-dimensional point features to obtain new features based on both local and global information, and the prediction of the point cloud categories is then obtained through several MLP layers.
As shown in fig. 4, the PointNet network structure is divided into feature extraction and point cloud semantic segmentation. Feature extraction is the process of obtaining global features via the local features of the point cloud: the input is the d-dimensional features of n points, where the d-dimensional features are the original features of the point cloud, comprising spatial coordinate values, color information and point normal information; by continuously classifying and learning the local features of the point cloud, the global features are finally obtained through max pooling. Point cloud semantic segmentation splices the intermediate (local) features from feature extraction with the final global features, performs dimension reduction through several MLP layers, and finally predicts the category of each point, thereby realizing segmentation of the point cloud.
As shown in fig. 5, after the spatial position information xyz and the original information of the input point cloud, such as the number n and dimension d of the points, are obtained, farthest point sampling is performed on the point cloud, and the obtained center points are indexed to give their position information new_xyz and number new_n; all input points are then grouped by the ball query method with each obtained center point as the sphere center, and local point cloud features are extracted to obtain the n_sample feature vectors; whenever new feature dimensions appear, the features are spliced so as to retain the features of the point cloud as completely as possible.
The input point cloud is first grouped, i.e. sampled; farthest point sampling covers the whole point set better than random sampling. Therefore, the center points of the point cloud are selected by farthest point sampling (FPS), and the number of center points finally obtained is the number of groups, so that the points in the point cloud are connected with one another to a certain extent. Specifically, a point i is first selected at random from the whole point set as the initial farthest point and its coordinate values are obtained; the Euclidean distances from all points to this center point are then compared to find the point with the maximum distance, which is stored in the distance matrix; the distances between all points and the points stored in the matrix are compared, and if the distance of some point is smaller than the one stored in the matrix, the matrix entry is updated, ensuring that the matrix holds each point's distance to its nearest sampled point; the point with the maximum distance is then selected again, and the procedure iterates until the target number of points has been collected, as shown in FIG. 6.
In the input point cloud, a point is randomly initialized as the farthest point; after its spatial position coordinates are obtained, the Euclidean distances between all remaining points and the current point are compared to obtain the coordinates and distance of the farthest point, and the distance values are stored in the distance matrix; the obtained point is then taken as the query point, the distance from each remaining point to the current point is calculated and the maximum is taken; these steps are repeated until i farthest points have been sampled, as sketched below.
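A compact numpy sketch of this sampling procedure, assuming the point cloud is an (n, 3) coordinate array:

```python
import numpy as np

def farthest_point_sampling(xyz, i):
    """Sketch of the FPS procedure described above: keep a distance matrix
    of each point's distance to its nearest chosen sample, and repeatedly
    pick the point with the maximum entry as the next farthest point."""
    n = xyz.shape[0]
    chosen = np.zeros(i, dtype=np.int64)
    chosen[0] = np.random.randint(n)          # random initial farthest point
    dist = np.full(n, np.inf)                 # distance matrix (per point)
    for k in range(1, i):
        d = np.linalg.norm(xyz - xyz[chosen[k - 1]], axis=1)
        dist = np.minimum(dist, d)            # distance to nearest sample so far
        chosen[k] = int(np.argmax(dist))      # next farthest point
    return chosen                             # indices of the i sampled centers

centers = farthest_point_sampling(np.random.rand(1000, 3), 32)
```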
After the number of center points has been selected, the point cloud grouping within each group needs to be completed to determine the number of points contained in each group. Two methods are mainly used: K nearest neighbor search and ball query. K nearest neighbor search is a machine learning method that finds the K neighboring points to complete the point cloud grouping; ball query searches for points around each query point within a set radius, under a preset upper limit on the number of queried points. Compared with the K nearest neighbor method, the local neighborhood of ball query guarantees a region of fixed size, so that the features of the local region are more generalizable over the whole space; the invention therefore groups the point cloud by ball query, as shown in fig. 7. For the input point cloud, the Euclidean distance L between each of the S center points determined after sampling and all points is calculated; a distance threshold R is set and the points within the spherical region of radius R around the center point, i.e. the points satisfying L < R², are selected; if the number M of selected points is smaller than the required number of points NS, the point with the maximum distance is taken and NS − M points are supplemented to reach the required number, after which feature extraction is performed.
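A numpy sketch of the ball query grouping follows; the rule used to supplement groups containing fewer than NS points is an assumption consistent with the text.

```python
import numpy as np

def ball_query(xyz, centers, radius, ns):
    """Sketch of ball query grouping: for each of the S centers, gather up
    to NS points within `radius`; if fewer than NS fall inside the sphere,
    pad with repeated indices so every group has NS points (the padding
    rule here is an assumption consistent with the text)."""
    groups = []
    for c in centers:                         # centers: (S, 3)
        l = np.linalg.norm(xyz - c, axis=1)   # Euclidean distance L
        inside = np.flatnonzero(l < radius)   # equivalent to L**2 < R**2
        if len(inside) >= ns:
            idx = inside[:ns]
        else:                                 # M < NS: supplement NS - M points
            pad = np.repeat(inside[:1] if len(inside) else [0], ns - len(inside))
            idx = np.concatenate([inside, pad])
        groups.append(idx)
    return np.stack(groups)                   # (S, NS) point indices per group
```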
Global feature extraction from the input point cloud data is an important step in point cloud segmentation. After the grouping of the point cloud has been completed by farthest point sampling and ball query as described above, the global features of the local point clouds need to be calculated; the points obtained after grouping are regrouped and then learned, thereby realizing global feature extraction over all input points.
In this process, feature extraction from the grouped local point cloud data is the key step. The network input is all point cloud information of each group after grouping, i.e. each grouped point cloud is viewed as a whole and the global features of each group are extracted; these steps are the same as those for extracting the features of the whole point cloud in the PointNet structure. The invention therefore borrows the idea of local point cloud feature extraction from the PointNet network structure, specifically as follows:
order to
Figure BDA0003093653590000081
f is a continuous function of distance for any point cloud feature on χ → R, for
Figure BDA0003093653590000082
Any one continuous function h and a symmetric function g (x)1,x2,x3,…,xn) Make a pair
Figure BDA0003093653590000083
The method comprises the following steps of (1) preparing,
|f(S)-γ(MAX{h(xi)})|<ε
wherein x is1,x2,x3,…,xnIs all elements in S, γ is a continuous function, MAX indicates to perform MAX firing operation, i.e. to input n vectors and output a new vector with maximum value for each element.
In the PointNet network structure, the continuous function h is fitted by the multi-layer perceptron MLP, and the γ function is the activation function, as shown in fig. 8. The input is the three-dimensional coordinate information (x, y, z) of the point cloud; the input points are converted from three dimensions to a high dimension through the MLP and then processed by the maximum symmetric function g and the γ activation function, thereby extracting the local point cloud features.
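A short sketch of this local feature extraction on grouped points, with assumed batch and group sizes and assumed MLP widths:

```python
import torch
import torch.nn as nn

# Each group of NS points is lifted to a high dimension by a shared MLP
# (playing the role of the fitted continuous function h) and max-pooled
# (the symmetric function g) into one local feature per group.
B, S, NS = 4, 32, 16                          # batch, groups, points per group
grouped = torch.randn(B, S, NS, 3)            # grouped xyz from ball query

h = nn.Sequential(nn.Linear(3, 64), nn.ReLU(),
                  nn.Linear(64, 128), nn.ReLU())   # widths are assumptions
local = h(grouped).max(dim=2).values          # (B, S, 128): one feature per group
```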
Semantic segmentation assigns to each point the object category it belongs to, so that the objects in the point cloud environment can be distinguished. In semantic segmentation, the obtained point cloud global features are first reduced in dimension by a multi-layer perceptron (MLP); the points are then classified by a softmax function to obtain each point's probability score for every category; finally, label assignment is performed, realizing the semantic segmentation of the point cloud.
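A minimal numpy sketch of this classification step, assuming per-point class scores of shape (N, C):

```python
import numpy as np

def segment_labels(point_scores):
    """Sketch of the final step: softmax over per-point class scores,
    then label assignment by the highest-probability category."""
    e = np.exp(point_scores - point_scores.max(axis=1, keepdims=True))
    probs = e / e.sum(axis=1, keepdims=True)  # per-point category probabilities
    return probs.argmax(axis=1), probs        # labels: (N,), probs: (N, C)

labels, probs = segment_labels(np.random.randn(1000, 13))   # 13 categories
```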
Further, in step S3, since the huge data volume of the point cloud would increase the burden of data transmission and processing, after the point cloud has been segmented and identified with the improved PointNet network, each object is replaced in the virtual environment by a virtual model with physical attributes according to the point cloud data, so as to better represent the surrounding environment, as shown in fig. 9.
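A hedged sketch of such a replacement is given below; the model library, the class names and the treatment of all points of one category as a single object are illustrative simplifications, since the text only specifies transmitting the object categories and spatial coordinates to the virtual environment and calling the pre-built model library.

```python
import numpy as np

MODEL_LIBRARY = {"chair": "chair_prefab", "table": "table_prefab"}  # hypothetical

def object_placements(xyz, labels, class_names):
    """For each segmented category with a library model, derive a position
    and bounding-box extent from its points. Instance separation within a
    category is omitted here for simplicity."""
    placements = []
    for cls_id, name in enumerate(class_names):
        pts = xyz[labels == cls_id]
        if name in MODEL_LIBRARY and len(pts):
            center = pts.mean(axis=0)                    # object position
            size = pts.max(axis=0) - pts.min(axis=0)     # bounding-box extent
            placements.append((MODEL_LIBRARY[name], center, size))
    return placements                          # (model, position, scale) tuples
```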
The invention also provides a device for point cloud segmentation and virtual environment generation based on a PointNet network, which implements the above point cloud segmentation and virtual environment generation method and comprises:
the acquisition module is used for acquiring a point cloud of a data set to be processed in a virtual environment;
the segmentation module is used for performing point cloud semantic segmentation on the point cloud by adopting a PointNet network;
and the generation module is used for replacing each object in the virtual environment with a virtual model having physical attributes according to the segmented point cloud, and generating virtual objects containing all physical attributes.
Further, the data set covers six indoor scenes in three buildings with eleven room types in total, namely conference room, lounge, auditorium, toilet, copy room, pantry, storage room, corridor, office, lobby and open space; the semantic categories of the data set are ceiling, chair, door, floor, table, wall, beam, column, window, sofa, bookcase, board and clutter; each point in the data set carries coordinate position information XYZ and color information RGB.
The invention has the following beneficial effects:
(1) The invention solves the problems that the huge number of points makes real-time data transmission and environment reconstruction difficult, and that an operator finds it hard to distinguish objects in a point cloud environment.
(2) Aiming at the difficulties and challenges of point cloud processing, a deep-learning point cloud segmentation network model is designed based on the PointNet network structure, and the acquired point cloud data is segmented directly, without any conversion before processing. The point cloud is first grouped and sampled, its local features are then obtained, global feature extraction is performed, and finally its semantic segmentation is realized; training the network model improves the precision of deep-learning point cloud segmentation.
(3) In order to cover more object categories, the invention expands the data set, enriching the object categories. Meanwhile, a method for presenting object models in the virtual environment based on the point cloud segmentation results is studied: the segmented object categories and the corresponding spatial coordinates are transmitted to the virtual environment, the model library established in the virtual environment is called, and the object models corresponding to the point cloud are presented in the virtual environment, realizing the display of the object models in the virtual environment.
It should be understood that although the present description refers to embodiments, not every embodiment contains only a single technical solution, and such description is for clarity only, and those skilled in the art should take the description as a whole, and the technical solutions in the embodiments may be appropriately combined to form other embodiments understood by those skilled in the art.

Claims (3)

1. A point cloud segmentation and virtual environment generation method based on a PointNet network, characterized by comprising the following steps:
step S1, point clouds in a data set to be processed in a virtual environment are obtained;
step S2, carrying out point cloud semantic segmentation on the point cloud by adopting an improved PointNet network;
step S3, replacing each object in the virtual environment with a virtual model having physical attributes according to the semantically segmented point cloud, and generating virtual objects containing all physical attributes;
wherein the PointNet network comprises: a first T-Net layer, a second T-Net layer, multi-layer perceptrons MLP and a feature fusion layer; the PointNet network structure is divided into two parts, feature extraction and point cloud semantic segmentation;
the feature extraction process comprises: realizing global feature extraction by extracting the point cloud local features, specifically: acquiring the d-dimensional features of the n points, the d-dimensional features comprising spatial coordinate values, color information and point normal information; continuously classifying and learning the local features of the point cloud, and obtaining the global features through max pooling;
the process of extracting the local features of the point cloud comprises: acquiring the spatial position information of the point cloud and the number n and dimension d of the points; performing farthest point sampling on the point cloud, and indexing the obtained center points to obtain their position information and number; grouping all points by the ball query method with each center point as the sphere center, and extracting the local features of the point cloud;
the process of extracting the local features of the point cloud comprises the following steps:
let χ = {S : S ⊆ [0, 1]^d, |S| = n}, and let f : χ → R be a set function continuous with respect to the Hausdorff distance; then for any ε > 0 there exist a continuous function h and a symmetric function g(x1, x2, x3, …, xn) = γ(MAX{h(xi)}) such that, for any S ∈ χ, |f(S) − γ(MAX{h(xi)})| < ε, where x1, x2, x3, …, xn are all the elements of S, γ is a continuous function, and MAX denotes the max pooling operation, i.e. n vectors are input and a new vector formed by the element-wise maximum is output;
fitting the continuous function h through the multi-layer perceptron MLP, the γ function being the activation function; the input is the three-dimensional coordinate information (x, y, z) of the point cloud, which is converted from three dimensions to a high dimension through the MLP and then processed through the maximum symmetric function g and the γ activation function, so as to extract the local point cloud features;
the point cloud semantic segmentation process comprises the following steps: splicing the local features and the global features, performing dimensionality reduction processing through multiple layers of MLPs, and finally predicting the category of the point cloud so as to realize the segmentation of the point cloud; the point cloud semantic segmentation comprises the following steps:
step 21, aligning the positions of the point clouds in the data set to be processed through a first T-Net layer;
step 22, increasing the dimensionality of the local features of the point cloud from 3 dimensions to 64 dimensions through MLP;
step 23, performing feature alignment on the point cloud through the second T-Net layer;
step 24, increasing the dimensionality of the local features of the point cloud from 64 to 128 dimensions through an MLP, and then to 1024 dimensions;
step 25, pooling the point cloud through a maximum-value symmetric function to obtain the point cloud global features;
step 26, splicing the point cloud global features and the point cloud local features through the feature fusion layer;
step 27, performing dimension reduction on the spliced point cloud features through an MLP (multi-layer perceptron), realizing semantic segmentation of the point cloud;
in the process of performing dimension-raising processing on the point cloud local features through the MLP, invariance to the ordering of the points is guaranteed by the maximum-value symmetric function, as shown in the following formula:

f({x1, x2, x3, …, xn}) ≈ g(h(x1), …, h(xn))

where f denotes the feature extraction function, h denotes the feature extraction layer of each MLP layer, and g is the maximum-value symmetric function;
the farthest point sampling of the point cloud is specifically: randomly initializing a point as the farthest point; after obtaining its spatial position coordinates, comparing the Euclidean distances between all remaining points and the current point to obtain the coordinates and distance of the farthest point, and storing the distance value in a distance matrix; then taking the obtained point as the query point, calculating the distance from each remaining point to the current point and taking the maximum; and repeating these steps until i farthest points have been sampled;
the ball query method groups the point cloud specifically as follows: calculating the Euclidean distance L between each of the S center points determined after sampling and all points; setting a distance threshold R and selecting the points within the spherical region of radius R around the center point, i.e. the points satisfying L < R²; if the number M of selected points is smaller than the required number of points NS, taking the point with the maximum distance and supplementing NS − M points to reach the required number; and then performing feature extraction.
2. The point cloud segmentation and virtual environment generation method based on a PointNet network according to claim 1, wherein the data set covers six indoor scenes in three buildings with eleven room types in total, namely conference room, lounge, auditorium, toilet, copy room, pantry, storage room, corridor, office, lobby and open space; the semantic categories of the data set are ceiling, chair, door, floor, table, wall, beam, column, window, sofa, bookcase, board and clutter; each point in the data set carries coordinate position information XYZ and color information RGB.
3. A point cloud segmentation and virtual environment generation device based on a PointNet network, wherein the device is configured to implement the generation method according to any one of claims 1 to 2, and the device comprises:
the acquisition module is used for acquiring a point cloud of a data set to be processed in a virtual environment;
the segmentation module is used for performing point cloud semantic segmentation on the point cloud by adopting an improved PointNet network;
the generation module, used for replacing each object in the virtual environment with a virtual model having physical attributes according to the segmented point cloud, and generating virtual objects containing all physical attributes;
the data set covers six indoor scenes in three buildings with eleven room types in total, namely conference room, lounge, auditorium, toilet, copy room, pantry, storage room, corridor, office, lobby and open space; the semantic categories of the data set are ceiling, chair, door, floor, table, wall, beam, column, window, sofa, bookcase, board and clutter; each point in the data set carries coordinate position information XYZ and color information RGB.
CN202110603532.4A 2021-05-31 2021-05-31 Method and device for point cloud segmentation and virtual environment generation based on a PointNet network Active CN113256640B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110603532.4A CN113256640B (en) 2021-05-31 2021-05-31 Method and device for point cloud segmentation and virtual environment generation based on a PointNet network
PCT/CN2021/099276 WO2022252274A1 (en) 2021-05-31 2021-06-10 Point cloud segmentation and virtual environment generation method and apparatus based on pointnet network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110603532.4A CN113256640B (en) 2021-05-31 2021-05-31 Method and device for point cloud segmentation and virtual environment generation based on a PointNet network

Publications (2)

Publication Number Publication Date
CN113256640A CN113256640A (en) 2021-08-13
CN113256640B true CN113256640B (en) 2022-05-24

Family

ID=77185510

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110603532.4A Active CN113256640B (en) 2021-05-31 2021-05-31 Method and device for point cloud segmentation and virtual environment generation based on a PointNet network

Country Status (2)

Country Link
CN (1) CN113256640B (en)
WO (1) WO2022252274A1 (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113810736B (en) * 2021-08-26 2022-11-01 北京邮电大学 AI-driven real-time point cloud video transmission method and system
CN114511682B (en) * 2022-04-19 2022-07-15 清华大学 Three-dimensional scene reconstruction method and device based on laser radar and electronic equipment
CN115937043B (en) * 2023-01-04 2023-07-04 南京邮电大学 Touch-assisted point cloud completion method
CN115984489B (en) * 2023-03-21 2023-09-19 广东数字生态科技有限责任公司 Three-dimensional reconstruction method, device and processing equipment of power transmission line
CN116824188B (en) * 2023-06-05 2024-04-09 腾晖科技建筑智能(深圳)有限公司 Hanging object type identification method and system based on multi-neural network integrated learning
CN116704137B (en) * 2023-07-27 2023-10-24 山东科技大学 Reverse modeling method for point cloud deep learning of offshore oil drilling platform
CN116882031B (en) * 2023-09-01 2023-11-17 临沂大学 Building model construction method and system based on point cloud
CN117132501B (en) * 2023-09-14 2024-02-23 武汉纺织大学 Human body point cloud cavity repairing method and system based on depth camera
CN117315146B (en) * 2023-09-22 2024-04-05 武汉大学 Reconstruction method and storage method of three-dimensional model based on trans-scale multi-source data
CN117496161B (en) * 2023-12-29 2024-04-05 武汉理工大学 Point cloud segmentation method and device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111815776A (en) * 2020-02-04 2020-10-23 山东水利技师学院 Three-dimensional building fine geometric reconstruction method integrating airborne and vehicle-mounted three-dimensional laser point clouds and streetscape images
CN112712589A (en) * 2021-01-08 2021-04-27 浙江工业大学 Plant 3D modeling method and system based on laser radar and deep learning
CN112785694A (en) * 2021-02-05 2021-05-11 希盟泰克(重庆)实业发展有限公司 BIM three-dimensional reconstruction method based on deep learning

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110660062B (en) * 2019-08-31 2022-10-18 南京理工大学 Point cloud instance segmentation method and system based on PointNet
CN111192270A (en) * 2020-01-03 2020-05-22 中山大学 Point cloud semantic segmentation method based on point global context reasoning

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111815776A (en) * 2020-02-04 2020-10-23 山东水利技师学院 Three-dimensional building fine geometric reconstruction method integrating airborne and vehicle-mounted three-dimensional laser point clouds and streetscape images
CN112712589A (en) * 2021-01-08 2021-04-27 浙江工业大学 Plant 3D modeling method and system based on laser radar and deep learning
CN112785694A (en) * 2021-02-05 2021-05-11 希盟泰克(重庆)实业发展有限公司 BIM three-dimensional reconstruction method based on deep learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Fast segmentation and classification system for 3D objects in LiDAR point clouds; Zou Shuangwei; China Master's Theses Full-text Database, Engineering Science & Technology II; 20210215; pp. 31-33 *
Ontology Based Semantic Understanding for 3D Indoor Scenes;X H Wang et al;《ISEITCE 2020》;20201231;正文2-3页 *

Also Published As

Publication number Publication date
CN113256640A (en) 2021-08-13
WO2022252274A1 (en) 2022-12-08

Similar Documents

Publication Publication Date Title
CN113256640B (en) Method and device for point cloud segmentation and virtual environment generation based on a PointNet network
CN111489358B (en) Three-dimensional point cloud semantic segmentation method based on deep learning
Engelmann et al. Exploring spatial context for 3D semantic segmentation of point clouds
US20210326597A1 (en) Video processing method and apparatus, and electronic device and storage medium
CN108334830B (en) Scene recognition method based on target semantic and depth appearance characteristic fusion
AU2018236433B2 (en) Room layout estimation methods and techniques
WO2021175050A1 (en) Three-dimensional reconstruction method and three-dimensional reconstruction device
Cheraghian et al. Zero-shot learning of 3d point cloud objects
Xiao et al. Multiple view semantic segmentation for street view images
WO2021022521A1 (en) Method for processing data, and method and device for training neural network model
CN112907602B (en) Three-dimensional scene point cloud segmentation method based on improved K-nearest neighbor algorithm
CN105574510A (en) Gait identification method and device
CN111339942B (en) Method and system for recognizing skeleton action of graph convolution circulation network based on viewpoint adjustment
CN112132197A (en) Model training method, image processing method, device, computer equipment and storage medium
EP4002161A1 (en) Image retrieval method and apparatus, storage medium, and device
CN108596329A (en) Threedimensional model sorting technique based on end-to-end Deep integrating learning network
Lu et al. Localize me anywhere, anytime: a multi-task point-retrieval approach
Guerrero et al. Learning shape placements by example
CN113743544A (en) Cross-modal neural network construction method, pedestrian retrieval method and system
CN115995039A (en) Enhanced semantic graph embedding for omni-directional location identification
CN112906520A (en) Gesture coding-based action recognition method and device
Peng Machines' perception of space
CN114973418A (en) Behavior identification method of cross-modal three-dimensional point cloud sequence space-time characteristic network
CN113011359A (en) Method for simultaneously detecting plane structure and generating plane description based on image and application
Li et al. Few-shot meta-learning on point cloud for semantic segmentation

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant