CN112785694A - BIM three-dimensional reconstruction method based on deep learning - Google Patents


Info

Publication number: CN112785694A
Application number: CN202110160200.3A
Authority: CN (China)
Prior art keywords: output, layer, matrix, point cloud, network
Other languages: Chinese (zh)
Inventor: 姚鸿方
Original/Current Assignee: Ximengtek Chongqing Industrial Development Co., Ltd.
Legal status: Pending
Application CN202110160200.3A filed by Ximengtek Chongqing Industrial Development Co., Ltd.; priority to CN202110160200.3A; publication of CN112785694A.

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects


Abstract

The invention provides a BIM three-dimensional reconstruction method based on deep learning, which comprises the following steps: S1, acquiring point cloud data of an indoor building scene to be identified with a scanner; S2, designing the PA-Net network adopted by the invention, which mainly performs feature extraction on the point cloud data and joint segmentation on the feature matrix; S3, training the deep learning framework PA-Net provided by the invention with a training data set to obtain a trained model; S4, inputting the point cloud data of the indoor building scene to be identified, acquired in S1, into the PA-Net model trained in S3 and outputting an instance segmentation matrix, namely a point cloud instance segmentation matrix containing specific labels; S5, reversely generating a BIM model from the instance segmentation matrix containing semantic information obtained in S4 by using the IFC standard. The invention can reversely generate the BIM model after the point cloud data are collected, assisting building planning and building repair.

Description

BIM three-dimensional reconstruction method based on deep learning
Technical Field
The invention relates to the field of BIM model reconstruction, in particular to a BIM three-dimensional reconstruction method based on deep learning.
Background
Deep learning has been successful in many areas, such as natural language processing, computer vision and speech recognition. It is widely applied in computer vision for object recognition, image classification, semantic segmentation, instance segmentation and the like. Building Information Model (BIM) technology, first proposed by Autodesk in 2002, has gained worldwide acceptance; it helps integrate building information so that, from design and construction through operation to the end of the building's whole life cycle, all kinds of information are kept in one three-dimensional model information database and all working units collaborate on it, which effectively improves engineering efficiency, saves resources and reduces cost. For some aged buildings, or buildings urgently in need of repair, a suitable BIM model cannot be obtained from the construction drawings, or no construction drawings exist at all. To obtain an applicable BIM model faster and better for this problem, fast scanning equipment is first used to collect point cloud data of the building to be examined, and a 3D building model is then generated from the point cloud data. However, this process still requires a large amount of manual work; therefore, the invention provides a method for converting point cloud data into a 3D model based on deep learning.
Disclosure of Invention
The invention aims to at least solve the technical problems in the prior art, and particularly creatively provides a BIM three-dimensional reconstruction method based on deep learning.
In order to achieve the above object, the present invention provides a BIM three-dimensional reconstruction method based on deep learning, including the following steps:
S1, acquiring point cloud data of the indoor building scene to be identified with a scanner;
S2, constructing the PA-Net network, which performs feature extraction on the point cloud data and joint segmentation on the feature matrix;
S3, training the PA-Net network with a training data set to obtain a trained model;
S4, inputting the point cloud data of the indoor building scene to be identified, acquired in S1, into the PA-Net model trained in S3, and outputting an instance segmentation matrix, namely a point cloud instance segmentation matrix containing specific labels;
S5, reversely generating the BIM model from the instance segmentation matrix containing semantic information obtained in S4 by using the IFC standard.
Further, the S2 includes:
the PA-Net network comprises a point cloud processing network PointNet and a joint segmentation module ASIS,
S2-1, designing and building the point cloud processing network PointNet, which completes feature extraction of the point cloud data;
S2-2, designing the joint segmentation module ASIS to complete semantic segmentation and instance segmentation of the feature matrix extracted by PointNet.
Further, the S3 includes:
The joint segmentation network system consists of 3 parts: a backbone network PointNet responsible for point cloud feature extraction; a joint segmentation module responsible for fusing instance and semantic information, namely the ASIS module; and a classifier responsible for instance segmentation.
The input point cloud matrix lies in $\mathbb{R}^{N \times 6}$ and contains only the coordinate information (x, y, z) and color information (R, G, B) of each point; the matrix dimension is N × 6, where N is the number of point cloud points.
The backbone network of the PA-Net framework is responsible for primary feature extraction on the input point cloud and produces a feature matrix.
Further, the PointNet comprises the following network structure:
Let $l_{k-1} = \left[l_{k-1,1},\, l_{k-1,2},\, \ldots,\, l_{k-1,N_{k-1}}\right]^T$ be the output vector of layer $k-1$ of a fully connected (FC) network, where $l_{k-1,1}$ denotes the 1st neuron, $l_{k-1,2}$ the 2nd neuron, and $l_{k-1,N_{k-1}}$ the $N_{k-1}$-th neuron of layer $k-1$. The output of each neuron of the $k$-th layer is:

$$l_k(i) = \sum_{j=1}^{N_{k-1}} W_k(i,j)\, l_{k-1}(j) + b_k(i) = W_k(i)\, l_{k-1} + b_k(i), \quad i = 1, \ldots, N_k$$

where $N_{k-1}$ is the number of neurons on the $(k-1)$-th hidden layer; $W_k(i,j)$ is the element in row $i$, column $j$ of the weight matrix $W_k$ of the $k$-th hidden layer; $l_{k-1}(j)$ is the $j$-th component of the layer-$(k-1)$ output, i.e. of the input to the $k$-th layer; $b_k(i)$ is the $i$-th element of the bias vector $b_k$ of the $k$-th hidden layer; $W_k(i)$ is the $i$-th row of $W_k$; $(\cdot)^T$ denotes the transpose; $N_k$ is the number of neurons on the $k$-th hidden layer; and $W_k$ and $b_k$ are the weight matrix and bias vector of the $k$-th hidden layer.
In matrix form, the output of the $k$-th hidden layer is:

$$l_k = W_k\, l_{k-1} + b_k$$

where $W_k$ and $b_k$ are the weight matrix and bias vector of the $k$-th hidden layer and $l_{k-1}$ is the input to the $k$-th layer (the output of layer $k-1$); the resulting $l_k$ is used as the input to the next network layer;
the 2D convolutional layer network consists of a number of convolution filters; each filter processes the data on a different channel, and the data are convolved and summed through a sliding window.
Let $u_{k-1} = \left[u_{k-1}^1,\, u_{k-1}^2,\, \ldots,\, u_{k-1}^{C_{k-1}}\right] \in \mathbb{R}^{H_{k-1} \times W_{k-1} \times C_{k-1}}$ be the output data of layer $k-1$ of the CNN, where $u_{k-1}^1$, $u_{k-1}^2$ and $u_{k-1}^{C_{k-1}}$ denote the outputs of the 1st, 2nd and $C_{k-1}$-th channels of the layer-$(k-1)$ output matrix, $u_{k-1}^c \in \mathbb{R}^{H_{k-1} \times W_{k-1}}$ is the output of the $c$-th channel, $H_{k-1}$ and $W_{k-1}$ are the numbers of rows and columns of the matrix, and $C_{k-1}$ is the number of channels output by the layer-$(k-1)$ convolutional neural network. The output at row $i$, column $j$ on channel $c$ of the $k$-th convolutional layer is:

$$u_k^c(i,j) = \sum_{z=1}^{C_{k-1}} \sum_{m=1}^{M} \sum_{n=1}^{M} w_k^{c,z}(m,n)\, u_{k-1}^z\big(s\,(i-1)+m,\; s\,(j-1)+n\big) + b_k^c$$

where $w_k^c$ and $b_k^c$ are the weight tensor and bias of the $c$-th output channel of layer $k$ (the weight tensor spans the $C_{k-1}$ input channels); $C_k$ is the number of channels output by the $k$-th convolutional layer and $C_{k-1}$ the number output by the $(k-1)$-th; $M$ is the number of rows and columns of the convolution filter; $w_k^{c,z}$ is the slice of the layer-$k$ weight tensor connecting input channel $z$ ($z = 1, \ldots, C_{k-1}$) to output channel $c$; $m$ and $n$ index the rows and columns of the convolution filter; $u_{k-1}^z$ is the output of channel $z$ of layer $k-1$ of the CNN; $b_k^c$ is the bias for the $c$-th channel of layer $k$; and $s$ is the stride with which the convolution filter slides over the input data.
The output matrix on channel $c$ of the $k$-th convolutional layer is therefore:

$$u_k^c = \begin{bmatrix}
u_k^c(1,1) & u_k^c(1,2) & \cdots & u_k^c\!\left(1,\; \left\lfloor \tfrac{W_{k-1}-M}{s} \right\rfloor + 1\right) \\
u_k^c(2,1) & u_k^c(2,2) & \cdots & u_k^c\!\left(2,\; \left\lfloor \tfrac{W_{k-1}-M}{s} \right\rfloor + 1\right) \\
\vdots & \vdots & \ddots & \vdots \\
u_k^c\!\left(\left\lfloor \tfrac{H_{k-1}-M}{s} \right\rfloor + 1,\; 1\right) & u_k^c\!\left(\left\lfloor \tfrac{H_{k-1}-M}{s} \right\rfloor + 1,\; 2\right) & \cdots & u_k^c\!\left(\left\lfloor \tfrac{H_{k-1}-M}{s} \right\rfloor + 1,\; \left\lfloor \tfrac{W_{k-1}-M}{s} \right\rfloor + 1\right)
\end{bmatrix}$$

where $H_{k-1}$ and $W_{k-1}$ are the numbers of rows and columns of the layer-$(k-1)$ output, $M$ is the filter size, and $s$ is the stride.
The output data of the $k$-th CNN layer are therefore:

$$u_k = \left[u_k^1,\, u_k^2,\, \ldots,\, u_k^{C_k}\right] \in \mathbb{R}^{\left(\left\lfloor \frac{H_{k-1}-M}{s} \right\rfloor + 1\right) \times \left(\left\lfloor \frac{W_{k-1}-M}{s} \right\rfloor + 1\right) \times C_k}$$

where $u_k^1$, $u_k^2$ and $u_k^{C_k}$ denote the outputs of the 1st, 2nd and $C_k$-th channels of the layer-$k$ output matrix, and $C_k$ is the number of channels output by the $k$-th convolutional layer.
Similarly, to apply a nonlinear transformation to the data, each convolutional layer is followed by an activation function, so the transformation of each convolutional layer can be abbreviated as:

$$u_k = f\!\left(w_k \ast u_{k-1} + b_k\right)$$

where $u_k$ is the output data of layer $k$ of the CNN, $w_k$ and $b_k$ are the weight tensor and bias vector of layer $k$, $u_{k-1}$ is the output of layer $k-1$, $\ast$ denotes the convolution operation, and $f(\cdot)$ is the ReLU activation function $f(x) = \max(0, x)$.
Further, the joint segmentation comprises the joint segmentation (ASIS) module structure:
ASIS consists of 2 paths, where the in1 and in2 paths are both outputs of the PointNet network, i.e. the output feature matrix $U \in \mathbb{R}^{N \times 40}$, and the two inputs match each other. in1 is the semantic-aware instance segmentation path: the semantic feature matrix is first passed through a fully connected layer with a ReLU activation function, and the semantic feature matrix of in2 is added to in1, which can be described as in1 + FC(in2), where FC(·) is the fully connected FC operation; the result is then passed through a fully connected layer and output as out1, with shape N × E, where N is the number of point cloud points and E is the embedding dimension. The embedding of the point cloud expresses the instance relation between points, i.e. points close to each other in the embedding space belong to the same instance, so this path can separate the point cloud data into different instances;
in addition, the in2 path is the instance-fused semantic segmentation: a fixed number of neighbouring points of each point (including the point itself) is found in the instance embedding space by the K-nearest-neighbour algorithm; after the K-nearest-neighbour step, the point clouds belonging to the same instance are obtained, and aggregation with the in2 path is then performed.
Further, the ASIS module structure includes the network structure of the ASIS joint segmentation module:
The ReLU activation function gives the network the ability to learn nonlinear features; its formula is:

$$f(s) = \max(0, s)$$

where s is the input sample.
The K-nearest-neighbour algorithm is used, and its details can be described as follows:
S-A, calculating the distance between each point in the known-category data set and the current point, using the Euclidean distance formula:

$$d = \sqrt{(x_1 - x_0)^2 + (y_1 - y_0)^2 + (z_1 - z_0)^2}$$

where d is the distance between a point and the current point, $(x_0, y_0, z_0)$ are the coordinates of the current point cloud point, and $(x_1, y_1, z_1)$ are the coordinates of another point cloud point;
S-B, calculating the distances between all data in the data set and the current point by the method in S-A, and sorting them in increasing order to obtain $[d_1, d_2, \ldots, d_N]$, where $d_1$ is the distance between the 1st point and the current point, $d_N$ is the distance between the N-th point and the current point, and N is the number of point cloud points;
S-C, selecting the first p entries of $[d_1, d_2, \ldots, d_N]$, i.e. the p points closest to the current point;
S-D, determining the frequency of each category among these p points;
S-E, returning the category with the highest frequency among the p points as the predicted classification of the current point.
The instance segmentation classifier is realized by mean-shift clustering; the classifier takes the instance segmentation feature matrix with semantic perception capability as its input, and the offset mean is calculated as:

$$M_t = \frac{1}{v} \sum_{x_i \in S_h} (x_i - x)$$

where $S_h$ is the high-dimensional sphere region of radius h centred at x, v is the number of points contained in $S_h$, and $x_i$ denotes the points contained in $S_h$. The centre point update formula is:

$$x_{t+1} = M_t + x_t$$

where $M_t$ is the offset mean obtained at state t, $x_t$ is the centre at state t, and $x_{t+1}$ is the centre at time t+1.
Further, the S3 includes:
S3-1, adopting the S3DIS indoor point cloud data set, produced jointly by the Stanford University Department of Computer Science and the Department of Civil and Environmental Engineering, as the PA-Net training data set. The input point cloud matrix contains only the coordinate information (x, y, z) and color information (R, G, B) of each point. The annotated categories comprise building elements, furniture elements and clutter: the building elements include ceilings, floors, walls, columns, beams, windows and doors; the furniture elements include tables, chairs, bookcases, sofas and blackboards; other elements that occur rarely or are not of interest are classified as clutter;
S3-2, training PA-Net with the S3DIS indoor point cloud data set;
S3-3, training the model multiple times by continuously adjusting the training parameters, and selecting the model with the best effect as the model used in S3.
Further, the S4 includes:
S4-1, inputting the point cloud data of the indoor building scene to be identified, collected in S1, into the PA-Net model trained in S3;
S4-2, the backbone network of the PA-Net framework is responsible for primary feature extraction on the input point cloud; the point cloud data pass through multiple convolution and pooling layers to obtain a primary feature matrix;
S4-3, segmenting the feature matrix with the joint segmentation module in PA-Net to obtain an instance segmentation feature matrix with semantic perception capability;
S4-4, sending the obtained instance segmentation feature matrix with semantic perception information into the instance segmentation classifier, and performing instance segmentation to obtain the instance segmentation matrix.
Further, the S5 includes:
S5-1, according to the IFC standard, selecting the Autodesk Revit desktop application as the IFC generator;
S5-2, sending the obtained instance segmentation matrix into the IFC generator to obtain the BIM model.
Further, the point cloud data are data containing three-dimensional coordinates and RGB color information;
the training data set is S3DIS, established by Stanford University from indoor scenes;
the scanner is a Matterport three-dimensional rapid scanner;
before acquisition, small furniture and sundries should be removed from the indoor building scene to be identified, so as to ensure accurate identification after the point cloud data are successfully acquired.
In conclusion, by adopting the above technical scheme, the method can help complete the conversion from point cloud data to a BIM model in building-repair scenarios where the BIM model is missing, reducing the consumption of manpower and material resources to a certain extent.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is the joint segmentation network system of the present invention;
FIG. 2 is the PointNet network architecture of the present invention;
FIG. 3 is the architecture of the joint segmentation (ASIS) module of the present invention;
FIG. 4 is a flow chart of steps of an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In this application, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The invention provides a BIM three-dimensional reconstruction method based on deep learning, which comprises the following steps:
S1, collecting point cloud data of the indoor building scene to be identified with a scanner, the point cloud data containing three-dimensional coordinates and RGB color information;
S2, designing the PA-Net network adopted by the invention, which mainly performs feature extraction on the point cloud data and joint segmentation on the feature matrix;
S3, training the deep learning framework PA-Net provided by the invention with the S3DIS data set established by Stanford University from indoor scenes, to obtain a trained model;
S4, inputting the point cloud data of the indoor building scene to be identified, acquired in S1, into the PA-Net model trained in S3, and outputting an instance segmentation matrix, namely a point cloud instance segmentation matrix containing specific labels;
S5, reversely generating the BIM model from the instance segmentation matrix containing semantic information obtained in S4 by using the IFC standard.
The scanner is a Matterport three-dimensional rapid scanner.
Before acquisition, small furniture and sundries should be removed from the indoor building scene to be identified, so as to ensure accurate identification after the point cloud data are successfully acquired.
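To make the overall S1-S5 flow concrete, the following is a minimal, hypothetical sketch of the pipeline. All function names, file names and the synthetic data are placeholders introduced here for illustration; they are not APIs defined by the patent or by any scanner or IFC software.

```python
# Hypothetical outline of steps S1-S5; every name below is a stand-in, not a real API.
import numpy as np

def acquire_point_cloud(n_points=1024):
    """S1 stand-in: synthesize an N x 6 array of (x, y, z, R, G, B) values."""
    xyz = np.random.rand(n_points, 3) * 10.0   # coordinates (assumed to be metres)
    rgb = np.random.rand(n_points, 3)          # normalized colors
    return np.hstack([xyz, rgb])

def pa_net_inference(points, model):
    """S4 stand-in: run a trained model to get per-point instance labels (N x 1)."""
    return model(points)

def export_for_ifc(points, labels, path="labelled_scene.csv"):
    """S5 stand-in: write (x, y, z, label) rows for an external IFC generator."""
    np.savetxt(path, np.hstack([points[:, :3], labels]), delimiter=",")

cloud = acquire_point_cloud()                         # S1
dummy_model = lambda p: np.zeros((p.shape[0], 1))     # placeholder for the trained PA-Net
labels = pa_net_inference(cloud, dummy_model)         # S4
export_for_ifc(cloud, labels)                         # S5
```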
The BIM three-dimensional model reconstruction method based on PA-Net is characterized in that the S2 comprises the following steps:
the PA-Net network comprises a point cloud processing network PointNet and a joint segmentation module ASIS,
S2-1, designing and building the point cloud processing network PointNet, which mainly completes feature extraction of the point cloud data;
S2-2, designing the joint segmentation module ASIS to complete semantic segmentation and instance segmentation of the feature matrix extracted by PointNet.
The following explains the network structure of PA-Net:
the PA-Net network construction is completed under a tensoflow platform, and referring to a combined segmentation network system shown in FIG. 1, a PA-Net network framework can be used for simultaneously carrying out instance segmentation and semantic segmentation on a 3D point cloud and closely connecting the instance segmentation and the semantic segmentation together. The whole joint division network system consists of 3 parts: a main network (PointNet) in charge of point cloud feature extraction, wherein the structure of the PointNet can refer to FIG. 2; a joint segmentation module which is responsible for fusing examples and semantic information, namely an ASIS module; and the classifier is responsible for example segmentation.
The input point cloud matrix lies in $\mathbb{R}^{N \times 6}$ and contains only the coordinate information (x, y, z) and color information (R, G, B) of each point; the matrix dimension is N × 6, where N is the number of point cloud points. The backbone network of the PA-Net framework is responsible for primary feature extraction on the input point cloud and produces a feature matrix. In this network, features are extracted from the point cloud data through multiple Conv2D convolutional layers and pooling layers in order to find the classification targets; the convolution kernel sizes of the Conv2D layers are (1, 1) and (3, 3). The process is similar to a funnel in which the feature map gradually shrinks, and the subsequent semantic segmentation and instance segmentation need to restore the classified feature map to the original size. The point cloud feature extraction backbone is implemented with PointNet, which applies two spatial transformations; the rotation matrices obtained from the two transformations come from the T-Net structure in FIG. 2. As FIG. 2 shows, the first (input) transformation adjusts the point cloud in space, and the second (feature) transformation aligns the extracted 64-dimensional point cloud features, i.e. it transforms the point cloud at the feature level, multiplying the high-dimensional rotation matrix with the point cloud data to correct the spatial position. The max pooling layer provides a global feature for the whole point cloud, and the 2D convolution and fully connected (FC) layers in the network extract the basic features of the data, finally producing the output feature matrix $U \in \mathbb{R}^{N \times 40}$, whose dimension is N × 40, where N is the number of point cloud points. It can therefore be concluded that the PointNet family of networks can directly process unordered point clouds without any pre-processing such as conversion into depth images or voxels.
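As an illustration of this backbone idea, the following is a minimal NumPy sketch of a PointNet-style shared per-point MLP followed by a global max-pool, with the global feature concatenated back to each point and projected to a 40-dimensional per-point feature. The layer widths and random weights are assumptions for illustration only, the T-Net spatial transforms are omitted, and this is not the trained PA-Net implementation.

```python
# A minimal sketch of a PointNet-like backbone (shared MLP + max pooling), not PA-Net itself.
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def shared_mlp(points, dims):
    """Apply the same fully connected layer to every point (the 1x1 Conv2D trick)."""
    feat = points
    for d in dims:
        w = np.random.randn(feat.shape[1], d) * 0.1   # illustrative random weights
        b = np.zeros(d)
        feat = relu(feat @ w + b)
    return feat

def pointnet_backbone(points):
    """points: (N, 6) array of (x, y, z, R, G, B) -> (N, 40) feature matrix."""
    local = shared_mlp(points, dims=[64, 64, 128])    # per-point features
    global_feat = local.max(axis=0)                   # max pooling over all N points
    fused = np.hstack([local, np.tile(global_feat, (points.shape[0], 1))])
    return shared_mlp(fused, dims=[128, 40])          # per-point output, N x 40

cloud = np.random.rand(2048, 6)        # N = 2048 points with xyz + RGB
features = pointnet_backbone(cloud)
print(features.shape)                  # (2048, 40), matching the U in R^{N x 40} above
```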
Referring to FIG. 2, the PointNet network structure is described in detail as follows.
For the fully connected layers used in the network structure, between layer $k-1$ and layer $k$ every neuron of layer $k-1$ is connected to all nodes of layer $k$; that is, each neuron of layer $k$ is obtained by a weighted sum over layer $k-1$. Let $l_{k-1} = \left[l_{k-1,1},\, l_{k-1,2},\, \ldots,\, l_{k-1,N_{k-1}}\right]^T$ be the output vector of layer $k-1$ of the fully connected (FC) network, where $l_{k-1,1}$ denotes the 1st neuron, $l_{k-1,2}$ the 2nd neuron, and $l_{k-1,N_{k-1}}$ the $N_{k-1}$-th neuron of layer $k-1$. The output of each neuron of the $k$-th layer is:

$$l_k(i) = \sum_{j=1}^{N_{k-1}} W_k(i,j)\, l_{k-1}(j) + b_k(i) = W_k(i)\, l_{k-1} + b_k(i), \quad i = 1, \ldots, N_k$$

where $N_{k-1}$ is the number of neurons on the $(k-1)$-th hidden layer; $W_k(i,j)$ is the element in row $i$, column $j$ of the weight matrix $W_k$ of the $k$-th hidden layer; $l_{k-1}(j)$ is the $j$-th component of the layer-$(k-1)$ output, i.e. of the input to the $k$-th layer; $b_k(i)$ is the $i$-th element of the bias vector $b_k$ of the $k$-th hidden layer; $W_k(i)$ is the $i$-th row of $W_k$; $(\cdot)^T$ denotes the transpose; $N_k$ is the number of neurons on the $k$-th hidden layer; and $W_k$ and $b_k$ are the weight matrix and bias vector of the $k$-th hidden layer.
In matrix form, the output of the $k$-th hidden layer is:

$$l_k = W_k\, l_{k-1} + b_k$$

where $W_k$ and $b_k$ are the weight matrix and bias vector of the $k$-th hidden layer and $l_{k-1}$ is the input to the $k$-th layer (the output of layer $k-1$); the resulting $l_k$ is used as the input to the next network layer.
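To make the index bookkeeping concrete, the following small NumPy check, with arbitrary layer sizes chosen only for illustration, verifies that the per-neuron sum and the matrix form above give the same result; it is a sketch, not part of the PA-Net code.

```python
# Check that l_k(i) = sum_j W_k(i, j) l_{k-1}(j) + b_k(i) equals l_k = W_k l_{k-1} + b_k.
import numpy as np

N_prev, N_k = 5, 3
l_prev = np.random.randn(N_prev)          # output of layer k-1
W_k = np.random.randn(N_k, N_prev)        # weight matrix of layer k
b_k = np.random.randn(N_k)                # bias vector of layer k

# per-neuron form
l_k_elem = np.array([sum(W_k[i, j] * l_prev[j] for j in range(N_prev)) + b_k[i]
                     for i in range(N_k)])

# matrix form
l_k_mat = W_k @ l_prev + b_k

assert np.allclose(l_k_elem, l_k_mat)     # the two forms agree
```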
For the 2D convolutional layers used in the network structure: a 2D convolutional layer network consists of a number of convolution filters; each filter processes the data on a different channel, and the data are convolved and summed through a sliding window.
Let $u_{k-1} = \left[u_{k-1}^1,\, u_{k-1}^2,\, \ldots,\, u_{k-1}^{C_{k-1}}\right] \in \mathbb{R}^{H_{k-1} \times W_{k-1} \times C_{k-1}}$ be the output data of layer $k-1$ of the CNN, where $u_{k-1}^1$, $u_{k-1}^2$ and $u_{k-1}^{C_{k-1}}$ denote the outputs of the 1st, 2nd and $C_{k-1}$-th channels of the layer-$(k-1)$ output matrix, $u_{k-1}^c \in \mathbb{R}^{H_{k-1} \times W_{k-1}}$ is the output of the $c$-th channel, $H_{k-1}$ and $W_{k-1}$ are the numbers of rows and columns of the matrix, and $C_{k-1}$ is the number of channels output by the layer-$(k-1)$ convolutional neural network. The output at row $i$, column $j$ on channel $c$ of the $k$-th convolutional layer is:

$$u_k^c(i,j) = \sum_{z=1}^{C_{k-1}} \sum_{m=1}^{M} \sum_{n=1}^{M} w_k^{c,z}(m,n)\, u_{k-1}^z\big(s\,(i-1)+m,\; s\,(j-1)+n\big) + b_k^c$$

where $w_k^c$ and $b_k^c$ are the weight tensor and bias of the $c$-th output channel of layer $k$ (the weight tensor spans the $C_{k-1}$ input channels); $C_k$ is the number of channels output by the $k$-th convolutional layer and $C_{k-1}$ the number output by the $(k-1)$-th; $M$ is the number of rows and columns of the convolution filter; $w_k^{c,z}$ is the slice of the layer-$k$ weight tensor connecting input channel $z$ ($z = 1, \ldots, C_{k-1}$) to output channel $c$; $m$ and $n$ index the rows and columns of the convolution filter; $u_{k-1}^z$ is the output of channel $z$ of layer $k-1$ of the CNN; $b_k^c$ is the bias for the $c$-th channel of layer $k$; and $s$ is the stride with which the convolution filter slides over the input data.
The output matrix on channel $c$ of the $k$-th convolutional layer is therefore:

$$u_k^c = \begin{bmatrix}
u_k^c(1,1) & u_k^c(1,2) & \cdots & u_k^c\!\left(1,\; \left\lfloor \tfrac{W_{k-1}-M}{s} \right\rfloor + 1\right) \\
u_k^c(2,1) & u_k^c(2,2) & \cdots & u_k^c\!\left(2,\; \left\lfloor \tfrac{W_{k-1}-M}{s} \right\rfloor + 1\right) \\
\vdots & \vdots & \ddots & \vdots \\
u_k^c\!\left(\left\lfloor \tfrac{H_{k-1}-M}{s} \right\rfloor + 1,\; 1\right) & u_k^c\!\left(\left\lfloor \tfrac{H_{k-1}-M}{s} \right\rfloor + 1,\; 2\right) & \cdots & u_k^c\!\left(\left\lfloor \tfrac{H_{k-1}-M}{s} \right\rfloor + 1,\; \left\lfloor \tfrac{W_{k-1}-M}{s} \right\rfloor + 1\right)
\end{bmatrix}$$

where $H_{k-1}$ and $W_{k-1}$ are the numbers of rows and columns of the layer-$(k-1)$ output, $M$ is the filter size, and $s$ is the stride.
The output data of the $k$-th CNN layer are therefore:

$$u_k = \left[u_k^1,\, u_k^2,\, \ldots,\, u_k^{C_k}\right] \in \mathbb{R}^{\left(\left\lfloor \frac{H_{k-1}-M}{s} \right\rfloor + 1\right) \times \left(\left\lfloor \frac{W_{k-1}-M}{s} \right\rfloor + 1\right) \times C_k}$$

where $u_k^1$, $u_k^2$ and $u_k^{C_k}$ denote the outputs of the 1st, 2nd and $C_k$-th channels of the layer-$k$ output matrix, and $C_k$ is the number of channels output by the $k$-th convolutional layer.
Similarly, to apply a nonlinear transformation to the data, each convolutional layer is followed by an activation function, so the transformation of each convolutional layer can be abbreviated as:

$$u_k = f\!\left(w_k \ast u_{k-1} + b_k\right)$$

where $u_k$ is the output data of layer $k$ of the CNN, $w_k$ and $b_k$ are the weight tensor and bias vector of layer $k$, $u_{k-1}$ is the output of layer $k-1$, $\ast$ denotes the convolution operation, and $f(\cdot)$ is the ReLU activation function $f(x) = \max(0, x)$.
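The following naive NumPy implementation makes the per-channel convolution formula above concrete. Shapes, weights and the stride are arbitrary illustrative choices, and a real network would use an optimized library kernel rather than explicit loops; this is only a sketch of the indexing.

```python
# Naive convolution matching u_k^c(i,j) = sum_z sum_m sum_n w * u + b, followed by ReLU.
import numpy as np

def conv2d_naive(u_prev, w, b, s):
    """u_prev: (H, W, C_in); w: (M, M, C_in, C_out); b: (C_out,); s: stride."""
    H, W_in, C_in = u_prev.shape
    M, _, _, C_out = w.shape
    H_out = (H - M) // s + 1
    W_out = (W_in - M) // s + 1
    u_k = np.zeros((H_out, W_out, C_out))
    for c in range(C_out):                      # output channel
        for i in range(H_out):
            for j in range(W_out):
                patch = u_prev[i * s:i * s + M, j * s:j * s + M, :]
                u_k[i, j, c] = np.sum(patch * w[:, :, :, c]) + b[c]
    return np.maximum(0.0, u_k)                 # followed by the ReLU activation

x = np.random.rand(8, 8, 3)                     # H_{k-1} = W_{k-1} = 8, C_{k-1} = 3
w = np.random.randn(3, 3, 3, 4) * 0.1           # M = 3, C_k = 4
out = conv2d_naive(x, w, np.zeros(4), s=1)
print(out.shape)                                # (6, 6, 4) = ((8-3)//1+1, (8-3)//1+1, C_k)
```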
Referring to FIG. 3, the joint segmentation (ASIS) module structure: ASIS consists of 2 paths, where the in1 and in2 paths are both outputs of the PointNet network, i.e. the output feature matrix $U \in \mathbb{R}^{N \times 40}$, and the two inputs match each other. in1 is the semantic-aware instance segmentation path: the semantic feature matrix is first passed through a fully connected layer with a ReLU activation function, and the semantic feature matrix of in2 is then added to in1, which can be described as in1 + FC(in2), where FC(·) is the fully connected (FC) operation. The result is then passed through a fully connected layer and output as out1, with shape N × E, where N is the number of point cloud points and E is the embedding dimension; the embedding of the point cloud expresses the instance relation between points, i.e. points close to each other in the embedding space belong to the same instance, so this path can separate the point cloud data into different instances. In addition, the in2 path is the instance-fused semantic segmentation: a fixed number of neighbouring points of each point (including the point itself) is found in the instance embedding space by the K-nearest-neighbour algorithm. The K-nearest-neighbour algorithm is used here because point clouds belonging to the same instance are close to each other in space while points of different instances are separated from each other. After the K-nearest-neighbour step, the point clouds belonging to the same instance are obtained, and aggregation with the in2 path is then performed; A in FIG. 3 denotes Aggregation. Finally, the data are output as out2 through a fully connected layer, with shape N × C, where N is the number of point cloud points and C is the number of semantic categories. Through the joint segmentation module the feature matrix further strengthens its feature content and becomes an instance segmentation feature matrix with semantic perception capability, which is superior to using the feature matrix directly. Finally, the final instance labels are obtained with an instance segmentation classifier realized by mean-shift clustering.
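The sketch below illustrates this two-path fusion in NumPy under simplifying assumptions: the dimensions N, E, C and K, the random weights, and the use of mean aggregation over the K neighbours are all illustrative choices made here, not the exact operations of the ASIS module.

```python
# Simplified ASIS-style fusion: in1 + FC(in2) for the instance branch, and kNN
# aggregation in the instance-embedding space for the semantic branch.
import numpy as np

def fc(x, out_dim):
    w = np.random.randn(x.shape[1], out_dim) * 0.1   # illustrative random weights
    return np.maximum(0.0, x @ w)                    # fully connected layer + ReLU

N, D, E, C, K = 1024, 40, 5, 13, 30
sem_in = np.random.rand(N, D)                   # in2: semantic feature matrix
ins_in = np.random.rand(N, D)                   # in1: instance feature matrix

# instance branch: in1 + FC(in2), then project to the N x E embedding (out1)
ins_emb = fc(ins_in + fc(sem_in, D), E)

# semantic branch: average the semantic features of the K nearest neighbours of each
# point in the instance embedding space, then classify into C categories (out2)
dists = np.linalg.norm(ins_emb[:, None, :] - ins_emb[None, :, :], axis=-1)
knn_idx = np.argsort(dists, axis=1)[:, :K]      # includes the point itself
sem_agg = sem_in[knn_idx].mean(axis=1)
sem_logits = fc(sem_agg, C)                     # N x C semantic scores

print(ins_emb.shape, sem_logits.shape)          # (1024, 5) (1024, 13)
```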
For the network structure of the ASIS joint segmentation module, the fully connected layer is used as above, and the details are as follows.
The ReLU activation function gives the network the ability to learn nonlinear features; its formula is:

$$f(s) = \max(0, s)$$

where s is the input sample.
The K-nearest-neighbour algorithm is used; its details can be described as follows (a sketch is given after the steps):
S-A, calculating the distance between each point in the known-category data set and the current point, using the Euclidean distance formula:

$$d = \sqrt{(x_1 - x_0)^2 + (y_1 - y_0)^2 + (z_1 - z_0)^2}$$

where d is the distance between a point and the current point, $(x_0, y_0, z_0)$ are the coordinates of the current point cloud point, and $(x_1, y_1, z_1)$ are the coordinates of another point cloud point.
S-B, calculating the distances between all data in the data set and the current point by the method in S-A, and sorting them in increasing order to obtain $[d_1, d_2, \ldots, d_N]$, where $d_1$ is the distance between the 1st point and the current point, $d_N$ is the distance between the N-th point and the current point, and N is the number of point cloud points.
S-C, selecting the first p entries of $[d_1, d_2, \ldots, d_N]$, i.e. the p points closest to the current point.
S-D, determining the frequency of each category among these p points.
S-E, returning the category with the highest frequency among the p points as the predicted classification of the current point.
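The following is a direct NumPy transcription of steps S-A to S-E, assuming a labelled reference point cloud; the value of p and the synthetic data are illustrative only.

```python
# K-nearest-neighbour classification of one query point, following S-A .. S-E.
import numpy as np
from collections import Counter

def knn_predict(points, labels, query, p=5):
    # S-A / S-B: Euclidean distances to the current point, sorted in increasing order
    d = np.sqrt(((points - query) ** 2).sum(axis=1))
    order = np.argsort(d)
    # S-C: take the p closest points; S-D: count their class frequencies
    votes = Counter(labels[order[:p]])
    # S-E: return the most frequent class as the prediction for the current point
    return votes.most_common(1)[0][0]

pts = np.random.rand(500, 3)                  # known-category point cloud (x, y, z)
lbl = np.random.randint(0, 3, size=500)       # their class labels
print(knn_predict(pts, lbl, query=np.array([0.5, 0.5, 0.5]), p=7))
```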
The instance segmentation classifier can be realized by mean-shift clustering; the classifier takes the instance segmentation feature matrix with semantic perception capability as its input and yields the semantic label and instance label of each predicted point. The mean-shift clustering algorithm involves the following formulas. The offset mean is calculated as:

$$M_t = \frac{1}{v} \sum_{x_i \in S_h} (x_i - x)$$

where $S_h$ is the high-dimensional sphere region of radius h centred at x, v is the number of points contained in $S_h$, and $x_i$ denotes the points contained in $S_h$. The centre point update formula is:

$$x_{t+1} = M_t + x_t$$

where $M_t$ is the offset mean obtained at state t, $x_t$ is the centre at state t, and $x_{t+1}$ is the centre at time t+1.
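The minimal NumPy sketch below follows these two formulas: points inside the radius-h sphere $S_h$ around the current centre are averaged to give the offset mean $M_t$, and the centre is moved until the offset becomes negligible. The radius h, tolerance and synthetic 2D data are illustrative assumptions, not values from the patent.

```python
# One mean-shift trajectory implementing M_t = mean(x_i - x) and x_{t+1} = M_t + x_t.
import numpy as np

def mean_shift(points, x0, h=0.5, max_iter=100, tol=1e-4):
    x = x0.copy()
    for _ in range(max_iter):
        in_sphere = points[np.linalg.norm(points - x, axis=1) < h]   # S_h
        if len(in_sphere) == 0:
            break
        m_t = (in_sphere - x).mean(axis=0)        # offset mean M_t over the v points
        x = m_t + x                               # x_{t+1} = M_t + x_t
        if np.linalg.norm(m_t) < tol:             # converged: offset is negligible
            break
    return x

data = np.vstack([np.random.randn(100, 2) * 0.2,            # one cluster near (0, 0)
                  np.random.randn(100, 2) * 0.2 + 3.0])     # another near (3, 3)
print(mean_shift(data, x0=np.array([2.5, 2.5])))             # typically ends near (3, 3)
```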
The BIM three-dimensional model reconstruction method based on PA-Net is characterized in that the S3 comprises the following steps:
S3-1, adopting the S3DIS indoor point cloud data set, produced jointly by the Stanford University Department of Computer Science and the Department of Civil and Environmental Engineering, as the PA-Net training data set. The input point cloud matrix contains only the coordinate information (x, y, z) and color information (R, G, B) of each point. The annotations cover 13 categories of building elements, furniture elements and clutter: the building elements include ceilings, floors, walls, columns, beams, windows and doors; the furniture elements include tables, chairs, bookcases, sofas and blackboards; other elements that occur rarely or are not of interest are classified as clutter. Notably, because the data set carries an instance-level annotation for every point (so that noise attributed to clutter has been removed), no point cloud denoising is required when using it. The S3DIS indoor point cloud data set meets the requirements of point cloud instance segmentation, so it is chosen as the segmentation and experiment object.
S3-2, training PA-Net with the S3DIS indoor point cloud data set.
S3-3, training the model multiple times by continuously adjusting the training parameters, and selecting the model with the best effect as the model used in S3.
The BIM three-dimensional model reconstruction method based on PA-Net is characterized in that the S4 comprises the following steps:
S4-1, inputting the point cloud data of the indoor building scene to be identified, collected in S1, into the PA-Net model trained in S3;
S4-2, the backbone network of the PA-Net framework is responsible for primary feature extraction on the input point cloud; the point cloud data pass through multiple convolution and pooling layers to obtain a primary feature matrix;
S4-3, segmenting the feature matrix with the joint segmentation module in PA-Net to obtain an instance segmentation feature matrix with semantic perception capability;
S4-4, sending the obtained instance segmentation feature matrix with semantic perception information into the instance segmentation classifier, and performing instance segmentation to obtain the instance segmentation matrix.
The instance segmentation classifier has semantic perception capability and performs instance segmentation according to the instance segmentation feature matrix with semantic perception information; the segmented result is the instance segmentation matrix, which itself no longer contains the semantic information.
The BIM three-dimensional model reconstruction method based on PA-Net is characterized in that the S5 comprises the following steps:
S5-1, according to the IFC standard, selecting the Autodesk Revit desktop application as the IFC generator;
S5-2, sending the obtained instance segmentation matrix into the IFC generator to obtain the BIM model.
The flow chart of the steps of the embodiment of the invention is shown in FIG. 4:
s001, start.
S002, collecting point cloud data of the indoor building scene to be identified with a Matterport scanner, from which the required instance segmentation matrix will later be generated.
S003, training the model with the S3DIS indoor point cloud data set, which was produced jointly by the Stanford University Department of Computer Science and the Department of Civil and Environmental Engineering and was collected with a Matterport scanner; the data set contains the three-dimensional coordinates (x, y, z) and color (R, G, B) information of the point cloud.
S004, splitting the data set for training and testing of PA-Net: all samples of the S3DIS indoor point cloud data set are divided into a training set and a test set at a ratio of 9:1; the training set is used to train the model and the test set to test it.
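A small NumPy sketch of this 9:1 split follows; the sample shapes and the random seed are illustrative assumptions, not details taken from the patent.

```python
# Shuffle the samples and split them 9:1 into training and test sets, as in S004.
import numpy as np

samples = np.random.rand(100, 1024, 6)           # e.g. 100 blocks of 1024 points (x,y,z,R,G,B)
rng = np.random.default_rng(seed=0)
idx = rng.permutation(len(samples))
cut = int(0.9 * len(samples))                    # 9:1 ratio
train_set, test_set = samples[idx[:cut]], samples[idx[cut:]]
print(train_set.shape[0], test_set.shape[0])     # 90 10
```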
S005, the PA-Net training scheme is designed as follows: the number of epochs is set to 1000, i.e. the training set is traversed 1000 times; the training samples are randomly shuffled; the batch size is set to 200, i.e. 200 samples are fed into the network in each training step; the optimizer is ADAM (adaptive moment estimation); the loss function is MSE (mean square error); and the model is saved every 10 training epochs.
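The skeleton below reflects these S005 settings (1000 epochs, shuffled batches of 200, a checkpoint every 10 epochs). The `train_step` and `save_model` callables are placeholders, not TensorFlow or PA-Net APIs; a real run would plug in the actual model update and checkpointing code.

```python
# Schematic training loop for the S005 scheme; the update and save hooks are stand-ins.
import numpy as np

EPOCHS, BATCH_SIZE, SAVE_EVERY = 1000, 200, 10

def train(train_x, train_y, train_step, save_model):
    n = train_x.shape[0]
    for epoch in range(1, EPOCHS + 1):
        order = np.random.permutation(n)                  # random shuffling of samples
        for start in range(0, n, BATCH_SIZE):
            idx = order[start:start + BATCH_SIZE]
            train_step(train_x[idx], train_y[idx])        # one ADAM/MSE update (placeholder)
        if epoch % SAVE_EVERY == 0:                       # save the model every 10 epochs
            save_model(epoch)

checkpoints = []
train(np.random.rand(1000, 6), np.random.rand(1000, 1),
      train_step=lambda x, y: None,                      # placeholder update step
      save_model=checkpoints.append)                     # records the "saved" epochs
print(len(checkpoints))                                  # 100 checkpoints over 1000 epochs
```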
S006, loading the trained model, feeding all sample features of the test set into the model as input data for prediction, and comparing the predicted target values with the actual target values, i.e. computing their MSE and averaging it over all samples. When the MSE is less than $10^{-3}$, the generalization performance of the model is considered good, online prediction is possible, and step S007 is executed. When the MSE is greater than $10^{-3}$, the generalization performance of the model is considered weak, the hyper-parameters of the network need to be adjusted, and step S005 is executed to retrain, save and test the model again. The ADAM optimizer parameter update formulas are:

$$W = W - \alpha\, \frac{V_{dW}}{\sqrt{S_{dW}} + \varepsilon}, \qquad b = b - \alpha\, \frac{V_{db}}{\sqrt{S_{db}} + \varepsilon}$$

where W and b are the network weight and bias parameters being updated, $\alpha$ is the learning rate, $V_{dW}$ and $V_{db}$ are the exponential moving averages of the gradients, $S_{dW}$ and $S_{db}$ are the exponential moving averages of the squared gradients, and $\varepsilon$ is a very small number that prevents the denominator from being 0.
The mean square error MSE is calculated as follows:

$$MSE = \frac{1}{T} \sum_{q=1}^{T} \left(y_q - \hat{y}_q\right)^2$$

where $y_q$ is the q-th real value, $\hat{y}_q$ is the q-th prediction of the network, and T is the total number of data.
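The sketch below shows one ADAM-style update and the MSE criterion in NumPy. The decay rates beta1 and beta2 and the epsilon value are the commonly used defaults and are assumptions here (the text above does not specify them), and no bias correction is applied, matching the update formulas as written.

```python
# One ADAM update of a parameter vector, plus the MSE used for the 1e-3 acceptance test.
import numpy as np

def adam_step(w, grad, v, s, alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    v = beta1 * v + (1 - beta1) * grad          # exponential moving average of the gradient
    s = beta2 * s + (1 - beta2) * grad ** 2     # exponential moving average of its square
    w = w - alpha * v / (np.sqrt(s) + eps)      # parameter update with small eps in denominator
    return w, v, s

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)      # (1/T) * sum_q (y_q - y_hat_q)^2

w = np.zeros(4)
v = np.zeros(4)
s = np.zeros(4)
w, v, s = adam_step(w, grad=np.array([0.2, -0.1, 0.05, 0.3]), v=v, s=s)
print(w)
print(mse(np.array([1.0, 2.0]), np.array([1.01, 1.98])) < 1e-3)   # generalization check
```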
S007, repeating steps S005 and S006 continuously, and selecting the PA-Net model with the best effect to generate the instance segmentation matrix.
S008, inputting the collected point cloud data of the indoor building scene to be identified into the trained PA-Net model; PA-Net computes and outputs the instance segmentation matrix H.
S009, using the Autodesk Revit desktop application as the IFC generator and taking the instance segmentation matrix H as its input, the required BIM indoor building model is obtained.
And S010, ending.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.

Claims (10)

1. A BIM three-dimensional reconstruction method based on deep learning is characterized by comprising the following steps:
S1, acquiring point cloud data of the indoor building scene to be identified with a scanner;
S2, constructing the PA-Net network, which performs feature extraction on the point cloud data and joint segmentation on the feature matrix;
S3, training the PA-Net network with a training data set to obtain a trained model;
S4, inputting the point cloud data of the indoor building scene to be identified, collected in S1, into the PA-Net model trained in S3, and outputting an instance segmentation matrix;
S5, reversely generating the BIM model from the instance segmentation matrix containing semantic information obtained in S4 by using the IFC standard.
2. The BIM three-dimensional reconstruction method based on deep learning of claim 1, wherein the S2 comprises:
the PA-Net network comprises a point cloud processing network PointNet and a joint segmentation module ASIS,
S2-1, designing and building the point cloud processing network PointNet, which completes feature extraction of the point cloud data;
S2-2, designing the joint segmentation module ASIS to complete semantic segmentation and instance segmentation of the feature matrix extracted by PointNet.
3. The BIM three-dimensional reconstruction method based on deep learning of claim 2, wherein the S3 comprises:
the joint segmentation network system consists of 3 parts: a backbone network PointNet responsible for point cloud feature extraction; a joint segmentation module responsible for fusing instance and semantic information, namely the ASIS module; and a classifier responsible for instance segmentation;
the input point cloud matrix lies in $\mathbb{R}^{N \times 6}$ and contains only the coordinate information (x, y, z) and color information (R, G, B) of each point; the matrix dimension is N × 6, where N is the number of point cloud points;
the backbone network of the PA-Net framework is responsible for primary feature extraction on the input point cloud and produces a feature matrix.
4. The BIM three-dimensional reconstruction method based on deep learning of claim 2, wherein the PointNet comprises a PointNet network structure:
order to
Figure FDA0002935167160000021
Is the output vector of the k-1 layer of a fully connected FC network, lk-1,11 st neuron, l, of layer k-1 representing a fully connected networkk-1,2Representing the 2 nd neuron at layer k-1 of the fully-connected network,
Figure FDA0002935167160000022
represents the output of the nth neuron at layer k-1 of the fully-connected network, so the output of each neuron on the k-layer neural network is:
Figure FDA0002935167160000023
wherein N isk-1The number of the neurons on the hidden layer of the k-1 layer is shown,
Wk(i, j) weight matrix W representing the k-th hidden layerkThe ith row and the jth column of (1),
lk-1(j) the jth column offset representing the k-1 th layer,
bk(i) bias vector b representing the k-th hidden layerkThe number of the ith row of (a),
Wk(i) weight matrix W representing the k-th hidden layerkThe number of the ith row of (a),
lk-1indicating the bias of the k-1 th layer i.e. the input to the k-1 th layer network,
(·)Tthe transpose of the matrix is represented,
Nkindicates the number of neurons on the k-th hidden layer,
Wkand bkWeight matrix and bias vector for the k-th hidden layer respectively,
the output of the k-th hidden layer is then:
lk=Wklk-1+bk
wherein, WkAnd bkRespectively is a weight matrix and a bias vector of a k-th hidden layer;
lk-1is the input of the k-1 layer network;
will find lkAs input to the next network layer;
the 2D convolutional layer network consists of a plurality of convolutional filters, each filter processes data on different channels, and the data are subjected to convolutional summation through a sliding window;
order to
Figure FDA0002935167160000031
Is output data of the k-1 layer of CNN, wherein
Figure FDA0002935167160000032
The output of the k-1 th layer is shown,
Figure FDA0002935167160000033
representing the 1 st channel output of the k-1 th layer output matrix,
Figure FDA0002935167160000034
represents the 2 nd channel output of the k-1 th layer output matrix,
Figure FDA0002935167160000035
c for representing k-1 layer output matrixk-1The output of each channel is used as the output,
Figure FDA0002935167160000036
dimension size H representing k-1 layer outputk-1×Wk-1×Ck-1
Figure FDA0002935167160000037
Represents the output of the c-th channel of the k-1 th layer output matrix,
Figure FDA0002935167160000038
dimension size H representing k-1 layer outputk-1×Wk-1,Hk-1And Wk-1Respectively representing the number of rows and columns of the matrix, Ck-1Representing the number of channels output by the k-1 layer convolutional neural network; then, the output of the ith row and the jth column on the c channel of the kth convolutional neural network is:
Figure FDA0002935167160000039
wherein,
Figure FDA00029351671600000310
and bkRespectively representing the weight tensor and bias vector of the k-th layer,
Ckrepresents the number of channels output by the k-th convolutional neural network,
Ck-1represents the number of channels output by the k-1 layer convolutional neural network,
m denotes the total number of rows and the total number of columns of the convolution filter,
Figure FDA00029351671600000311
represents the c-th channel, z represents the incremental number of channels for the k-1 layer, the weight tensor for the k-th layer,
CKdenotes the number of channels, CK=1...Ck-1
m represents the number of rows of the convolution filter,
n represents the number of columns of the product filter,
Figure FDA00029351671600000312
is the output of the k-1 th layer of the z-channel CNN,
Figure FDA00029351671600000313
representing the bias vector for the c-th channel of the k-th layer,
s is the step size of the convolution filter sliding over the input data,
therefore, we can obtain the output matrix on the c channel on the k convolutional neural network as:
Figure FDA0002935167160000041
wherein,
Figure FDA0002935167160000042
represents the output of row 1 and column 1 on the c channel on the k convolutional neural network,
Figure FDA0002935167160000043
represents the output of row 2 and column 1 on the c channel on the k convolutional neural network,
Figure FDA0002935167160000044
indicating the c channel on the k convolutional neural network
Figure FDA0002935167160000045
The output of column 1 of the row,
Figure FDA0002935167160000046
represents the output of row 1 and column 2 on the c channel on the k convolutional neural network,
Figure FDA0002935167160000047
represents the output of row 2 and column 2 on the c channel on the k convolutional neural network,
Figure FDA0002935167160000048
indicating the c channel on the k convolutional neural network
Figure FDA0002935167160000049
The output of column 2 of the row,
Figure FDA00029351671600000410
represents line 1, on the c channel on the k convolutional neural network
Figure FDA00029351671600000411
The output of the column is then,
Figure FDA00029351671600000412
represents line 2, on the c channel on the k convolutional neural network
Figure FDA00029351671600000413
The output of the column is then,
Figure FDA00029351671600000414
indicating the c channel on the k convolutional neural network
Figure FDA00029351671600000415
Go to the first
Figure FDA00029351671600000416
The output of the column is then,
Hk-1and Wk-1Respectively representing the number of rows and the number of columns of the matrix,
m denotes the total number of rows and the total number of columns of the convolution filter,
s is the step size of the convolution filter sliding over the input data,
therefore, the output data of the $k$-th CNN layer is:

$$X^{k}=\left[X^{k}_{1},X^{k}_{2},\ldots,X^{k}_{C_{k}}\right]$$

wherein $X^{k}_{1}$, $X^{k}_{2}$ and $X^{k}_{C_{k}}$ respectively represent the 1st, 2nd and $C_{k}$-th channel outputs of the $k$-th layer output matrix, the dimension of the matrix is $\left(\frac{H_{k-1}-m}{s}+1\right)\times\left(\frac{W_{k-1}-m}{s}+1\right)\times C_{k}$, $H_{k-1}$ and $W_{k-1}$ respectively represent the number of rows and columns of the matrix, $m$ denotes the total number of rows and the total number of columns of the convolution filter, $s$ is the stride with which the convolution filter slides over the input data, and $C_{k}$ represents the number of channels output by the $k$-th convolutional neural network layer;
similarly, in order to apply a nonlinear transformation to the data, each convolutional layer must be followed by an activation function, so the transformation of each convolutional layer can be abbreviated as:

$$X^{k}=f\!\left(W^{k}\ast X^{k-1}+b^{k}\right)$$

wherein $X^{k}$ is the output data of the $k$-th CNN layer, $W^{k}$ and $b^{k}$ respectively represent the weight tensor and bias vector of the $k$-th layer, $X^{k-1}$ is the output of the $(k-1)$-th CNN layer, and $f(\cdot)$ is the ReLU activation function $f(x)=\max(0,x)$.
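As an aid to reading the formulas above, a minimal NumPy sketch of one per-channel convolution plus ReLU; the input size, the filter width m and the stride s used in the toy example are illustrative assumptions, not values fixed by the claims:

```python
import numpy as np

def conv_relu(x, w, b, s):
    """One convolutional layer X^k = f(W^k * X^{k-1} + b^k) with ReLU.

    x : (H, W, C_in)        output of layer k-1
    w : (m, m, C_in, C_out) weight tensor of layer k (m rows = m columns)
    b : (C_out,)            bias vector of layer k
    s : stride of the filter over the input
    """
    H_in, W_in, _ = x.shape
    m, _, _, C_out = w.shape
    H_out, W_out = (H_in - m) // s + 1, (W_in - m) // s + 1   # (H_{k-1}-m)/s + 1
    out = np.zeros((H_out, W_out, C_out))
    for i in range(H_out):
        for j in range(W_out):
            patch = x[i * s:i * s + m, j * s:j * s + m, :]     # m x m x C_in window
            out[i, j, :] = np.tensordot(patch, w, axes=([0, 1, 2], [0, 1, 2])) + b
    return np.maximum(out, 0)                                  # ReLU f(x) = max(0, x)

# toy example: 8x8 input with 3 channels, 3x3 filter, stride 1, 4 output channels
x = np.random.rand(8, 8, 3)
y = conv_relu(x, 0.1 * np.random.randn(3, 3, 3, 4), np.zeros(4), s=1)
print(y.shape)  # (6, 6, 4)
```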
5. The BIM three-dimensional reconstruction method based on deep learning of claim 2, wherein the joint segmentation comprises joint segmentation with the ASIS module structure:
the ASIS module is composed of 2 paths, where the in1 and in2 inputs are both output feature matrices of the PointNet network and match each other; in1 is the instance segmentation path: the semantic feature matrix of in2 is first passed through a fully connected layer with a ReLU activation function and is then added to in1, which can be described as in1 + FC(in2), where FC(·) denotes the fully connected operation; the result is then passed through another fully connected layer and output as out1 with shape N × E, where N represents the number of all point cloud points and E is the embedding dimension; the embedding of the point cloud encodes the instance relation between points, i.e. points that are close to each other in the embedding space belong to the same instance, so this path separates the point cloud data into different instances;
in addition, the in2 path is semantic segmentation fused with instance information: for each point, a fixed number of neighboring points in the instance embedding space (including the point itself) are found with the K-nearest-neighbor algorithm; after the K-nearest-neighbor search, the point clouds belonging to the same instance are obtained and their features are then aggregated along the in2 path.
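As an illustration of the two-path fusion just described, a minimal PyTorch-style sketch; the feature width, the embedding dimension E, the class count, the value of K and the max-pooling aggregation are assumptions made for the sketch rather than values or operations fixed by the claim:

```python
import torch
import torch.nn as nn

class ASISFusion(nn.Module):
    """Sketch of the two-path joint segmentation head: in1 (instance features)
    and in2 (semantic features) are per-point matrices of shape (N, F)."""

    def __init__(self, feat_dim=128, embed_dim=5, num_classes=13, k=30):
        super().__init__()
        self.fc_sem = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU())
        self.fc_ins = nn.Linear(feat_dim, embed_dim)        # produces out1: N x E embedding
        self.fc_cls = nn.Linear(feat_dim, num_classes)      # semantic logits
        self.k = k

    def forward(self, in1, in2):
        # instance path: in1 + FC(in2), then project to the E-dimensional embedding space
        fused = in1 + self.fc_sem(in2)
        out1 = self.fc_ins(fused)                           # (N, E)

        # semantic path fused with instances: K nearest neighbours of each point in the
        # instance embedding space, followed by aggregation of the in2 features
        dist = torch.cdist(out1, out1)                      # (N, N) pairwise distances
        knn_idx = dist.topk(self.k, largest=False).indices  # (N, k) neighbour indices
        sem_agg = in2[knn_idx].max(dim=1).values            # max-pool neighbour features
        logits = self.fc_cls(sem_agg)                       # (N, num_classes)
        return out1, logits

# usage: 1024 points carrying 128-dim PointNet features on both paths
net = ASISFusion()
emb, logits = net(torch.randn(1024, 128), torch.randn(1024, 128))
print(emb.shape, logits.shape)  # torch.Size([1024, 5]) torch.Size([1024, 13])
```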
6. The BIM three-dimensional reconstruction method based on deep learning of claim 5, wherein the network structure of the ASIS joint segmentation module is as follows:
the ReLU activation function gives the network the ability to learn nonlinear features; its formula is
$$f(s)=\max(0,s),$$
where $s$ is the input sample;
we use the K-nearest-neighbor algorithm, whose details can be described as follows:
S-A, calculating the distance between each point in the known-category data set and the current point using the Euclidean distance formula
$$d=\sqrt{(x_{1}-x_{0})^{2}+(y_{1}-y_{0})^{2}+(z_{1}-z_{0})^{2}},$$
where $d$ represents the distance between a point and the current point, $(x_{0},y_{0},z_{0})$ are the coordinates of the current point cloud point, and $(x_{1},y_{1},z_{1})$ are the coordinates of the other point cloud point;
S-B, calculating the distances between all data in the data set and the current point by the method of S-A and sorting them in increasing order to obtain $[d_{1},d_{2},\ldots,d_{N}]$, where $d_{1}$ represents the distance between the 1st point and the current point, $d_{N}$ represents the distance between the $N$-th point and the current point, and $N$ represents the number of all point cloud points;
S-C, selecting from $[d_{1},d_{2},\ldots,d_{N}]$ the first $p$ points, i.e. the $p$ points with the smallest distances to the current point;
S-D, counting the frequency with which each category occurs among these $p$ points;
S-E, returning the category with the highest frequency among the $p$ points as the predicted classification of the current point;
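A short Python sketch of steps S-A through S-E; the variable names and the value of p are illustrative assumptions:

```python
import numpy as np
from collections import Counter

def knn_classify(query, points, labels, p=5):
    """Predict the category of `query` from its p nearest labelled points.

    query  : (3,) xyz coordinates of the current point
    points : (N, 3) coordinates of the known-category data set
    labels : (N,) category of each known point
    """
    d = np.sqrt(((points - query) ** 2).sum(axis=1))   # Euclidean distances (S-A)
    order = np.argsort(d)                              # increasing order     (S-B)
    nearest = labels[order[:p]]                        # first p points       (S-C)
    freq = Counter(nearest.tolist())                   # category frequencies (S-D)
    return freq.most_common(1)[0][0]                   # most frequent label  (S-E)

# toy usage
pts = np.random.rand(100, 3)
lbl = np.random.randint(0, 3, size=100)
print(knn_classify(np.array([0.5, 0.5, 0.5]), pts, lbl))
```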
the instance segmentation classifier is realized by the mean shift clustering method; the classifier takes the instance segmentation feature matrix with semantic awareness as input, and the offset mean is calculated as:
$$M_{h}(x)=\frac{1}{v}\sum_{x_{i}\in S_{h}}\left(x_{i}-x\right)$$
wherein $S_{h}$ represents the high-dimensional sphere region of radius $h$ centered at the point $x$, $v$ is the number of points contained in the range $S_{h}$, and $x_{i}$ denotes a point contained in the range $S_{h}$; the center point update formula is as follows:
$$x_{t+1}=M_{t}+x_{t}$$
wherein $M_{t}$ is the offset mean obtained in state $t$, $x_{t}$ is the center in state $t$, and $x_{t+1}$ is the center at time $t+1$.
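A minimal sketch of the offset-mean iteration $x_{t+1}=M_{t}+x_{t}$ used by the instance segmentation classifier; the bandwidth h, the stopping tolerance and the toy data are assumptions for illustration:

```python
import numpy as np

def mean_shift_center(points, x0, h=0.5, tol=1e-4, max_iter=100):
    """Iterate x_{t+1} = M_t + x_t until the shift M_t becomes negligible.

    points : (N, E) instance-embedding vectors
    x0     : (E,) initial center
    h      : radius of the high-dimensional sphere S_h
    """
    x = x0.copy()
    for _ in range(max_iter):
        in_sphere = points[np.linalg.norm(points - x, axis=1) < h]   # points inside S_h
        if len(in_sphere) == 0:
            break
        m = (in_sphere - x).mean(axis=0)     # offset mean M_h(x) = (1/v) * sum(x_i - x)
        x = x + m                            # x_{t+1} = M_t + x_t
        if np.linalg.norm(m) < tol:
            break
    return x

# toy usage: find the center of a blob of 2-D embeddings
pts = np.random.randn(200, 2) * 0.2 + np.array([1.0, 2.0])
print(mean_shift_center(pts, x0=np.zeros(2), h=3.0))
```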
7. The BIM three-dimensional reconstruction method based on deep learning of claim 1, wherein the S3 comprises:
S3-1, adopting the S3DIS indoor point cloud data set, produced jointly by the Department of Computer Science and the Department of Civil and Environmental Engineering at Stanford University, as the PA-Net training data set, wherein the input point cloud matrix contains only the coordinate information (x, y, z) and color information (R, G, B) of each point; the annotated categories of the data set comprise building elements, furniture elements and clutter, the building elements comprising ceilings, floors, walls, columns, beams, windows and doors, and the furniture elements comprising tables, chairs, bookcases, sofas and blackboards;
S3-2, training PA-Net with the S3DIS indoor point cloud data set;
and S3-3, training the model multiple times while continuously adjusting the training parameters, and selecting the model with the best performance as the trained model obtained in S3.
8. The BIM three-dimensional reconstruction method based on deep learning of claim 1, wherein the S4 comprises:
S4-1, inputting the point cloud data of the indoor building scene to be identified, collected in S1, into the PA-Net model trained in S3;
S4-2, the backbone network of the PA-Net framework is responsible for the preliminary feature extraction of the input point cloud: the point cloud data passes through multiple convolution and pooling layers to obtain a preliminary feature matrix;
S4-3, segmenting the feature matrix with the joint segmentation module in PA-Net to obtain an instance segmentation feature matrix with semantic awareness;
and S4-4, sending the obtained instance segmentation feature matrix with semantic awareness into the instance segmentation classifier and performing instance segmentation to obtain the instance segmentation matrix.
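Pulling S4-1 through S4-4 together, a hedged end-to-end inference sketch; the backbone interface, the ASISFusion head from the earlier sketch, the scikit-learn MeanShift call and the bandwidth value are assumptions, not the patent's actual implementation:

```python
import torch
from sklearn.cluster import MeanShift

def run_inference(points_xyzrgb, backbone, asis_head, bandwidth=0.6):
    """Sketch of S4: an (N, 6) point cloud -> per-point instance ids and semantic labels.

    backbone  : callable mapping an (N, 6) tensor to two (N, F) feature matrices
                (the preliminary feature extraction of S4-2)
    asis_head : the joint segmentation module of S4-3 (e.g. the ASISFusion sketch above)
    """
    x = torch.as_tensor(points_xyzrgb, dtype=torch.float32)
    with torch.no_grad():
        f_ins, f_sem = backbone(x)               # S4-2: preliminary feature matrices
        emb, logits = asis_head(f_ins, f_sem)    # S4-3: joint segmentation
    sem_labels = logits.argmax(dim=1).numpy()    # semantic class per point
    # S4-4: instance segmentation classifier = mean shift clustering of the embeddings
    inst_ids = MeanShift(bandwidth=bandwidth).fit_predict(emb.numpy())
    return inst_ids, sem_labels
```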
9. The BIM three-dimensional reconstruction method based on deep learning of claim 1, wherein the S5 comprises:
S5-1, selecting, in accordance with the IFC standard, the Autodesk Revit desktop application as the IFC generator;
and S5-2, sending the obtained instance segmentation matrix into the IFC generator to obtain the BIM model.
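The claim names Autodesk Revit as the IFC generator; purely to illustrate how labelled, segmented elements could be emitted as IFC entities programmatically, a minimal sketch using the open-source ifcopenshell library (the label-to-class mapping, the entity names and the file name are assumptions, and geometry/placement are omitted):

```python
import ifcopenshell
import ifcopenshell.guid

# hypothetical mapping from predicted semantic labels to IFC entity types
LABEL_TO_IFC = {"wall": "IfcWall", "door": "IfcDoor", "window": "IfcWindow",
                "column": "IfcColumn", "beam": "IfcBeam", "floor": "IfcSlab"}

def add_elements(model, instances):
    """instances: iterable of (semantic_label, instance_id) pairs, one per segmented object."""
    for label, inst_id in instances:
        ifc_class = LABEL_TO_IFC.get(label)
        if ifc_class is None:
            continue  # furniture / clutter classes are not turned into building elements here
        model.create_entity(ifc_class,
                            GlobalId=ifcopenshell.guid.new(),
                            Name=f"{label}_{inst_id}")
    return model

model = ifcopenshell.file(schema="IFC4")
add_elements(model, [("wall", 0), ("wall", 1), ("door", 2)])
model.write("reconstructed.ifc")
```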
10. The BIM three-dimensional reconstruction method based on deep learning of claim 1, wherein the point cloud data is data containing three-dimensional coordinates and RGB color information;
the training data set is S3DIS, built by Stanford University from indoor scenes;
the scanner is a Matterport three-dimensional rapid scanner;
and small pieces of furniture and clutter should first be removed from the indoor building scene to be identified, so that recognition after the point cloud data is acquired is accurate.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110160200.3A 2021-02-05 2021-02-05 BIM three-dimensional reconstruction method based on deep learning

Publications (1)

Publication Number Publication Date
CN112785694A true 2021-05-11

Family

ID=75760989

Country Status (1)

Country Link
CN (1) CN112785694A (en)
Patent Citations (3)

Publication number Priority date Publication date Assignee Title
CN110349247A * 2018-04-08 2019-10-18 哈尔滨工业大学 A kind of indoor scene CAD 3D method for reconstructing based on semantic understanding
CN111192270A * 2020-01-03 2020-05-22 中山大学 Point cloud semantic segmentation method based on point global context reasoning
WO2022252274A1 * 2021-05-31 2022-12-08 北京理工大学 Point cloud segmentation and virtual environment generation method and apparatus based on pointnet network

Non-Patent Citations (4)

Title
Charles R. Qi et al.: "PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation", 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017, pages 77-85 *
Xinlong Wang et al.: "Associatively Segmenting Instances and Semantics in Point Clouds", 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pages 4091-4100 *
严飞: "基于Mean Shift算法的目标跟踪系统设计" [Design of a target tracking system based on the Mean Shift algorithm], 电子测量技术, vol. 43, no. 23, 2020, pages 6-11 *
朱攀, 史健勇: "基于AISI网络的BIM三维重建方法研究" [Research on a BIM three-dimensional reconstruction method based on the AISI network], 图学学报, vol. 41, no. 5, 2020, pages 839-846 *

Cited By (6)

Publication number Priority date Publication date Assignee Title
CN113256640A * 2021-05-31 2021-08-13 北京理工大学 Method and device for partitioning network point cloud and generating virtual environment based on PointNet
CN113256640B * 2021-05-31 2022-05-24 北京理工大学 Method and device for partitioning network point cloud and generating virtual environment based on PointNet
CN113673423A * 2021-08-19 2021-11-19 丽水学院 Point cloud feature extraction method based on affinity and sparsity matrix
WO2023155113A1 * 2022-02-18 2023-08-24 Huawei Technologies Co., Ltd. Computer-implemented building modeling method and system
CN116704137A * 2023-07-27 2023-09-05 山东科技大学 Reverse modeling method for point cloud deep learning of offshore oil drilling platform
CN116704137B * 2023-07-27 2023-10-24 山东科技大学 Reverse modeling method for point cloud deep learning of offshore oil drilling platform

* Cited by examiner, † Cited by third party


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination