CN112785694A - BIM three-dimensional reconstruction method based on deep learning - Google Patents


Info

Publication number: CN112785694A
Application number: CN202110160200.3A
Authority: CN (China)
Prior art keywords: output, layer, matrix, point cloud, network
Other languages: Chinese (zh)
Inventor: 姚鸿方
Original/Current Assignee: Ximengtek Chongqing Industrial Development Co., Ltd.
Legal status: Pending
Application CN202110160200.3A filed by Ximengtek Chongqing Industrial Development Co., Ltd.; priority to CN202110160200.3A; publication of CN112785694A.

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects


Abstract

The invention provides a BIM three-dimensional reconstruction method based on deep learning, which comprises the following steps: S1, acquiring point cloud data of an indoor building scene to be identified with a scanner; S2, designing the PA-Net network adopted by the invention, which mainly performs feature extraction on the point cloud data and joint segmentation on the feature matrix; S3, training the deep learning framework PA-Net provided by the invention with a training data set to obtain a trained model; S4, inputting the point cloud data of the indoor building scene to be identified, acquired in S1, into the PA-Net model trained in S3 and outputting an instance segmentation matrix, namely a point cloud instance segmentation matrix containing specific labels; S5, reversely generating a BIM model from the instance segmentation matrix containing semantic information obtained in S4 by using the IFC standard. The invention can reversely generate the BIM model after the point cloud data are collected, assisting building planning and building repair.

Description

BIM three-dimensional reconstruction method based on deep learning
Technical Field
The invention relates to the field of BIM model reconstruction, in particular to a BIM three-dimensional reconstruction method based on deep learning.
Background
Deep learning has been successful in many areas, such as natural language processing, computer vision and speech recognition. It is widely applied in computer vision for object recognition, image classification, semantic segmentation, instance segmentation and the like. Building Information Model (BIM) technology, first proposed by Autodesk in 2002, has gained worldwide acceptance; it helps integrate building information so that, from design and construction through operation to the end of the building's whole life cycle, all kinds of information are kept in one three-dimensional model information database and all working units collaborate on it, which effectively improves engineering efficiency, saves resources and reduces cost. For some aged buildings, or buildings urgently in need of repair, a suitable BIM model cannot be obtained from the construction drawings, or no construction drawings exist at all. To obtain an applicable BIM model faster and better for this problem, fast scanning equipment is first used to collect point cloud data of the building to be examined, and a 3D building model is then generated from the point cloud data. However, this process still requires a large amount of manual work; therefore, the invention provides a method for converting point cloud data into a 3D model based on deep learning.
Disclosure of Invention
The invention aims to at least solve the technical problems in the prior art, and particularly creatively provides a BIM three-dimensional reconstruction method based on deep learning.
In order to achieve the above object, the present invention provides a BIM three-dimensional reconstruction method based on deep learning, including the following steps:
S1, acquiring point cloud data of the indoor building scene to be identified with a scanner;
S2, constructing the PA-Net network, which performs feature extraction on the point cloud data and joint segmentation on the feature matrix;
S3, training the PA-Net network with a training data set to obtain a trained model;
S4, inputting the point cloud data of the indoor building scene to be identified, acquired in S1, into the PA-Net model trained in S3, and outputting an instance segmentation matrix, namely a point cloud instance segmentation matrix containing specific labels;
S5, reversely generating the BIM model from the instance segmentation matrix containing semantic information obtained in S4 by using the IFC standard.
Further, the S2 includes:
the PA-Net network comprises a point cloud processing network PointNet and a joint segmentation module ASIS,
S2-1, designing and building the point cloud processing network PointNet, which completes feature extraction of the point cloud data;
S2-2, designing the joint segmentation module ASIS to complete semantic segmentation and instance segmentation of the feature matrix extracted by PointNet.
Further, the S3 includes:
The joint segmentation network system consists of 3 parts: a backbone network PointNet responsible for point cloud feature extraction; a joint segmentation module responsible for fusing instance and semantic information, namely the ASIS module; and a classifier responsible for instance segmentation.
The input point cloud matrix lies in $\mathbb{R}^{N \times 6}$ and contains only the coordinate information (x, y, z) and color information (R, G, B) of each point; the matrix dimension is N × 6, where N is the number of point cloud points.
The backbone network of the PA-Net framework is responsible for primary feature extraction on the input point cloud and produces a feature matrix.
Further, the PointNet comprises the following network structure:
Let $l_{k-1} = \left[l_{k-1,1},\, l_{k-1,2},\, \ldots,\, l_{k-1,N_{k-1}}\right]^T$ be the output vector of layer $k-1$ of a fully connected (FC) network, where $l_{k-1,1}$ denotes the 1st neuron, $l_{k-1,2}$ the 2nd neuron, and $l_{k-1,N_{k-1}}$ the $N_{k-1}$-th neuron of layer $k-1$. The output of each neuron of the $k$-th layer is:

$$l_k(i) = \sum_{j=1}^{N_{k-1}} W_k(i,j)\, l_{k-1}(j) + b_k(i) = W_k(i)\, l_{k-1} + b_k(i), \quad i = 1, \ldots, N_k$$

where $N_{k-1}$ is the number of neurons on the $(k-1)$-th hidden layer; $W_k(i,j)$ is the element in row $i$, column $j$ of the weight matrix $W_k$ of the $k$-th hidden layer; $l_{k-1}(j)$ is the $j$-th component of the layer-$(k-1)$ output, i.e. of the input to the $k$-th layer; $b_k(i)$ is the $i$-th element of the bias vector $b_k$ of the $k$-th hidden layer; $W_k(i)$ is the $i$-th row of $W_k$; $(\cdot)^T$ denotes the transpose; $N_k$ is the number of neurons on the $k$-th hidden layer; and $W_k$ and $b_k$ are the weight matrix and bias vector of the $k$-th hidden layer.
In matrix form, the output of the $k$-th hidden layer is:

$$l_k = W_k\, l_{k-1} + b_k$$

where $W_k$ and $b_k$ are the weight matrix and bias vector of the $k$-th hidden layer and $l_{k-1}$ is the input to the $k$-th layer (the output of layer $k-1$); the resulting $l_k$ is used as the input to the next network layer;
the 2D convolutional layer network consists of a number of convolution filters; each filter processes the data on a different channel, and the data are convolved and summed through a sliding window.
Let $u_{k-1} = \left[u_{k-1}^1,\, u_{k-1}^2,\, \ldots,\, u_{k-1}^{C_{k-1}}\right] \in \mathbb{R}^{H_{k-1} \times W_{k-1} \times C_{k-1}}$ be the output data of layer $k-1$ of the CNN, where $u_{k-1}^1$, $u_{k-1}^2$ and $u_{k-1}^{C_{k-1}}$ denote the outputs of the 1st, 2nd and $C_{k-1}$-th channels of the layer-$(k-1)$ output matrix, $u_{k-1}^c \in \mathbb{R}^{H_{k-1} \times W_{k-1}}$ is the output of the $c$-th channel, $H_{k-1}$ and $W_{k-1}$ are the numbers of rows and columns of the matrix, and $C_{k-1}$ is the number of channels output by the layer-$(k-1)$ convolutional neural network. The output at row $i$, column $j$ on channel $c$ of the $k$-th convolutional layer is:

$$u_k^c(i,j) = \sum_{z=1}^{C_{k-1}} \sum_{m=1}^{M} \sum_{n=1}^{M} w_k^{c,z}(m,n)\, u_{k-1}^z\big(s\,(i-1)+m,\; s\,(j-1)+n\big) + b_k^c$$

where $w_k^c$ and $b_k^c$ are the weight tensor and bias of the $c$-th output channel of layer $k$ (the weight tensor spans the $C_{k-1}$ input channels); $C_k$ is the number of channels output by the $k$-th convolutional layer and $C_{k-1}$ the number output by the $(k-1)$-th; $M$ is the number of rows and columns of the convolution filter; $w_k^{c,z}$ is the slice of the layer-$k$ weight tensor connecting input channel $z$ ($z = 1, \ldots, C_{k-1}$) to output channel $c$; $m$ and $n$ index the rows and columns of the convolution filter; $u_{k-1}^z$ is the output of channel $z$ of layer $k-1$ of the CNN; $b_k^c$ is the bias for the $c$-th channel of layer $k$; and $s$ is the stride with which the convolution filter slides over the input data.
The output matrix on channel $c$ of the $k$-th convolutional layer is therefore:

$$u_k^c = \begin{bmatrix}
u_k^c(1,1) & u_k^c(1,2) & \cdots & u_k^c\!\left(1,\; \left\lfloor \tfrac{W_{k-1}-M}{s} \right\rfloor + 1\right) \\
u_k^c(2,1) & u_k^c(2,2) & \cdots & u_k^c\!\left(2,\; \left\lfloor \tfrac{W_{k-1}-M}{s} \right\rfloor + 1\right) \\
\vdots & \vdots & \ddots & \vdots \\
u_k^c\!\left(\left\lfloor \tfrac{H_{k-1}-M}{s} \right\rfloor + 1,\; 1\right) & u_k^c\!\left(\left\lfloor \tfrac{H_{k-1}-M}{s} \right\rfloor + 1,\; 2\right) & \cdots & u_k^c\!\left(\left\lfloor \tfrac{H_{k-1}-M}{s} \right\rfloor + 1,\; \left\lfloor \tfrac{W_{k-1}-M}{s} \right\rfloor + 1\right)
\end{bmatrix}$$

where $H_{k-1}$ and $W_{k-1}$ are the numbers of rows and columns of the layer-$(k-1)$ output, $M$ is the filter size, and $s$ is the stride.
The output data of the $k$-th CNN layer are therefore:

$$u_k = \left[u_k^1,\, u_k^2,\, \ldots,\, u_k^{C_k}\right] \in \mathbb{R}^{\left(\left\lfloor \frac{H_{k-1}-M}{s} \right\rfloor + 1\right) \times \left(\left\lfloor \frac{W_{k-1}-M}{s} \right\rfloor + 1\right) \times C_k}$$

where $u_k^1$, $u_k^2$ and $u_k^{C_k}$ denote the outputs of the 1st, 2nd and $C_k$-th channels of the layer-$k$ output matrix, and $C_k$ is the number of channels output by the $k$-th convolutional layer.
Similarly, to apply a nonlinear transformation to the data, each convolutional layer is followed by an activation function, so the transformation of each convolutional layer can be abbreviated as:

$$u_k = f\!\left(w_k \ast u_{k-1} + b_k\right)$$

where $u_k$ is the output data of layer $k$ of the CNN, $w_k$ and $b_k$ are the weight tensor and bias vector of layer $k$, $u_{k-1}$ is the output of layer $k-1$, $\ast$ denotes the convolution operation, and $f(\cdot)$ is the ReLU activation function $f(x) = \max(0, x)$.
Further, the joint segmentation comprises the joint segmentation (ASIS) module structure:
ASIS consists of 2 paths, where the in1 and in2 paths are both outputs of the PointNet network, i.e. the output feature matrix $U \in \mathbb{R}^{N \times 40}$, and the two inputs match each other. in1 is the semantic-aware instance segmentation path: the semantic feature matrix is first passed through a fully connected layer with a ReLU activation function, and the semantic feature matrix of in2 is added to in1, which can be described as in1 + FC(in2), where FC(·) is the fully connected FC operation; the result is then passed through a fully connected layer and output as out1, with shape N × E, where N is the number of point cloud points and E is the embedding dimension. The embedding of the point cloud expresses the instance relation between points, i.e. points close to each other in the embedding space belong to the same instance, so this path can separate the point cloud data into different instances;
in addition, the in2 path is the instance-fused semantic segmentation: a fixed number of neighbouring points of each point (including the point itself) is found in the instance embedding space by the K-nearest-neighbour algorithm; after the K-nearest-neighbour step, the point clouds belonging to the same instance are obtained, and aggregation with the in2 path is then performed.
Further, the ASIS module structure includes the network structure of the ASIS joint segmentation module:
The ReLU activation function gives the network the ability to learn nonlinear features; its formula is:

$$f(s) = \max(0, s)$$

where s is the input sample.
The K-nearest-neighbour algorithm is used, and its details can be described as follows:
S-A, calculating the distance between each point in the known-category data set and the current point, using the Euclidean distance formula:

$$d = \sqrt{(x_1 - x_0)^2 + (y_1 - y_0)^2 + (z_1 - z_0)^2}$$

where d is the distance between a point and the current point, $(x_0, y_0, z_0)$ are the coordinates of the current point cloud point, and $(x_1, y_1, z_1)$ are the coordinates of another point cloud point;
S-B, calculating the distances between all data in the data set and the current point by the method in S-A, and sorting them in increasing order to obtain $[d_1, d_2, \ldots, d_N]$, where $d_1$ is the distance between the 1st point and the current point, $d_N$ is the distance between the N-th point and the current point, and N is the number of point cloud points;
S-C, selecting the first p entries of $[d_1, d_2, \ldots, d_N]$, i.e. the p points closest to the current point;
S-D, determining the frequency of each category among these p points;
S-E, returning the category with the highest frequency among the p points as the predicted classification of the current point.
The instance segmentation classifier is realized by mean-shift clustering; the classifier takes the instance segmentation feature matrix with semantic perception capability as its input, and the offset mean is calculated as:

$$M_t = \frac{1}{v} \sum_{x_i \in S_h} (x_i - x)$$

where $S_h$ is the high-dimensional sphere region of radius h centred at x, v is the number of points contained in $S_h$, and $x_i$ denotes the points contained in $S_h$. The centre point update formula is:

$$x_{t+1} = M_t + x_t$$

where $M_t$ is the offset mean obtained at state t, $x_t$ is the centre at state t, and $x_{t+1}$ is the centre at time t+1.
Further, the S3 includes:
S3-1, adopting the S3DIS indoor point cloud data set, produced jointly by the Stanford University Department of Computer Science and the Department of Civil and Environmental Engineering, as the PA-Net training data set. The input point cloud matrix contains only the coordinate information (x, y, z) and color information (R, G, B) of each point. The annotated categories comprise building elements, furniture elements and clutter: the building elements include ceilings, floors, walls, columns, beams, windows and doors; the furniture elements include tables, chairs, bookcases, sofas and blackboards; other elements that occur rarely or are not of interest are classified as clutter;
S3-2, training PA-Net with the S3DIS indoor point cloud data set;
S3-3, training the model multiple times by continuously adjusting the training parameters, and selecting the model with the best effect as the model used in S3.
Further, the S4 includes:
S4-1, inputting the point cloud data of the indoor building scene to be identified, collected in S1, into the PA-Net model trained in S3;
S4-2, the backbone network of the PA-Net framework is responsible for primary feature extraction on the input point cloud; the point cloud data pass through multiple convolution and pooling layers to obtain a primary feature matrix;
S4-3, segmenting the feature matrix with the joint segmentation module in PA-Net to obtain an instance segmentation feature matrix with semantic perception capability;
S4-4, sending the obtained instance segmentation feature matrix with semantic perception information into the instance segmentation classifier, and performing instance segmentation to obtain the instance segmentation matrix.
Further, the S5 includes:
S5-1, according to the IFC standard, selecting the Autodesk Revit desktop application as the IFC generator;
S5-2, sending the obtained instance segmentation matrix into the IFC generator to obtain the BIM model.
Further, the point cloud data are data containing three-dimensional coordinates and RGB color information;
the training data set is S3DIS, established by Stanford University from indoor scenes;
the scanner is a Matterport three-dimensional rapid scanner;
before acquisition, small furniture and sundries should be removed from the indoor building scene to be identified, so as to ensure accurate identification after the point cloud data are successfully acquired.
In conclusion, by adopting the above technical scheme, the method can help complete the conversion from point cloud data to a BIM model in building-repair scenarios where the BIM model is missing, reducing the consumption of manpower and material resources to a certain extent.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is the joint segmentation network system of the present invention;
FIG. 2 is the PointNet network architecture of the present invention;
FIG. 3 is the architecture of the joint segmentation (ASIS) module of the present invention;
FIG. 4 is a flow chart of steps of an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In this application, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The invention provides a BIM three-dimensional reconstruction method based on deep learning, which comprises the following steps:
S1, collecting point cloud data of the indoor building scene to be identified with a scanner, the point cloud data containing three-dimensional coordinates and RGB color information;
S2, designing the PA-Net network adopted by the invention, which mainly performs feature extraction on the point cloud data and joint segmentation on the feature matrix;
S3, training the deep learning framework PA-Net provided by the invention with the S3DIS data set established by Stanford University from indoor scenes, to obtain a trained model;
S4, inputting the point cloud data of the indoor building scene to be identified, acquired in S1, into the PA-Net model trained in S3, and outputting an instance segmentation matrix, namely a point cloud instance segmentation matrix containing specific labels;
S5, reversely generating the BIM model from the instance segmentation matrix containing semantic information obtained in S4 by using the IFC standard.
The scanner is a Matterport three-dimensional rapid scanner.
Before acquisition, small furniture and sundries should be removed from the indoor building scene to be identified, so as to ensure accurate identification after the point cloud data are successfully acquired.
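To make the overall S1-S5 flow concrete, the following is a minimal, hypothetical sketch of the pipeline. All function names, file names and the synthetic data are placeholders introduced here for illustration; they are not APIs defined by the patent or by any scanner or IFC software.

```python
# Hypothetical outline of steps S1-S5; every name below is a stand-in, not a real API.
import numpy as np

def acquire_point_cloud(n_points=1024):
    """S1 stand-in: synthesize an N x 6 array of (x, y, z, R, G, B) values."""
    xyz = np.random.rand(n_points, 3) * 10.0   # coordinates (assumed to be metres)
    rgb = np.random.rand(n_points, 3)          # normalized colors
    return np.hstack([xyz, rgb])

def pa_net_inference(points, model):
    """S4 stand-in: run a trained model to get per-point instance labels (N x 1)."""
    return model(points)

def export_for_ifc(points, labels, path="labelled_scene.csv"):
    """S5 stand-in: write (x, y, z, label) rows for an external IFC generator."""
    np.savetxt(path, np.hstack([points[:, :3], labels]), delimiter=",")

cloud = acquire_point_cloud()                         # S1
dummy_model = lambda p: np.zeros((p.shape[0], 1))     # placeholder for the trained PA-Net
labels = pa_net_inference(cloud, dummy_model)         # S4
export_for_ifc(cloud, labels)                         # S5
```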
The BIM three-dimensional model reconstruction method based on PA-Net is characterized in that the S2 comprises the following steps:
the PA-Net network comprises a point cloud processing network PointNet and a joint segmentation module ASIS,
S2-1, designing and building the point cloud processing network PointNet, which mainly completes feature extraction of the point cloud data;
S2-2, designing the joint segmentation module ASIS to complete semantic segmentation and instance segmentation of the feature matrix extracted by PointNet.
The following explains the network structure of PA-Net:
the PA-Net network construction is completed under a tensoflow platform, and referring to a combined segmentation network system shown in FIG. 1, a PA-Net network framework can be used for simultaneously carrying out instance segmentation and semantic segmentation on a 3D point cloud and closely connecting the instance segmentation and the semantic segmentation together. The whole joint division network system consists of 3 parts: a main network (PointNet) in charge of point cloud feature extraction, wherein the structure of the PointNet can refer to FIG. 2; a joint segmentation module which is responsible for fusing examples and semantic information, namely an ASIS module; and the classifier is responsible for example segmentation.
The input point cloud matrix lies in $\mathbb{R}^{N \times 6}$ and contains only the coordinate information (x, y, z) and color information (R, G, B) of each point; the matrix dimension is N × 6, where N is the number of point cloud points. The backbone network of the PA-Net framework is responsible for primary feature extraction on the input point cloud and produces a feature matrix. In this network, features are extracted from the point cloud data through multiple Conv2D convolutional layers and pooling layers in order to find the classification targets; the convolution kernel sizes of the Conv2D layers are (1, 1) and (3, 3). The process is similar to a funnel in which the feature map gradually shrinks, and the subsequent semantic segmentation and instance segmentation need to restore the classified feature map to the original size. The point cloud feature extraction backbone is implemented with PointNet, which applies two spatial transformations; the rotation matrices obtained from the two transformations come from the T-Net structure in FIG. 2. As FIG. 2 shows, the first (input) transformation adjusts the point cloud in space, and the second (feature) transformation aligns the extracted 64-dimensional point cloud features, i.e. it transforms the point cloud at the feature level, multiplying the high-dimensional rotation matrix with the point cloud data to correct the spatial position. The max pooling layer provides a global feature for the whole point cloud, and the 2D convolution and fully connected (FC) layers in the network extract the basic features of the data, finally producing the output feature matrix $U \in \mathbb{R}^{N \times 40}$, whose dimension is N × 40, where N is the number of point cloud points. It can therefore be concluded that the PointNet family of networks can directly process unordered point clouds without any pre-processing such as conversion into depth images or voxels.
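As an illustration of this backbone idea, the following is a minimal NumPy sketch of a PointNet-style shared per-point MLP followed by a global max-pool, with the global feature concatenated back to each point and projected to a 40-dimensional per-point feature. The layer widths and random weights are assumptions for illustration only, the T-Net spatial transforms are omitted, and this is not the trained PA-Net implementation.

```python
# A minimal sketch of a PointNet-like backbone (shared MLP + max pooling), not PA-Net itself.
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def shared_mlp(points, dims):
    """Apply the same fully connected layer to every point (the 1x1 Conv2D trick)."""
    feat = points
    for d in dims:
        w = np.random.randn(feat.shape[1], d) * 0.1   # illustrative random weights
        b = np.zeros(d)
        feat = relu(feat @ w + b)
    return feat

def pointnet_backbone(points):
    """points: (N, 6) array of (x, y, z, R, G, B) -> (N, 40) feature matrix."""
    local = shared_mlp(points, dims=[64, 64, 128])    # per-point features
    global_feat = local.max(axis=0)                   # max pooling over all N points
    fused = np.hstack([local, np.tile(global_feat, (points.shape[0], 1))])
    return shared_mlp(fused, dims=[128, 40])          # per-point output, N x 40

cloud = np.random.rand(2048, 6)        # N = 2048 points with xyz + RGB
features = pointnet_backbone(cloud)
print(features.shape)                  # (2048, 40), matching the U in R^{N x 40} above
```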
Referring to FIG. 2, the PointNet network structure is described in detail as follows.
For the fully connected layers used in the network structure, between layer $k-1$ and layer $k$ every neuron of layer $k-1$ is connected to all nodes of layer $k$; that is, each neuron of layer $k$ is obtained by a weighted sum over layer $k-1$. Let $l_{k-1} = \left[l_{k-1,1},\, l_{k-1,2},\, \ldots,\, l_{k-1,N_{k-1}}\right]^T$ be the output vector of layer $k-1$ of the fully connected (FC) network, where $l_{k-1,1}$ denotes the 1st neuron, $l_{k-1,2}$ the 2nd neuron, and $l_{k-1,N_{k-1}}$ the $N_{k-1}$-th neuron of layer $k-1$. The output of each neuron of the $k$-th layer is:

$$l_k(i) = \sum_{j=1}^{N_{k-1}} W_k(i,j)\, l_{k-1}(j) + b_k(i) = W_k(i)\, l_{k-1} + b_k(i), \quad i = 1, \ldots, N_k$$

where $N_{k-1}$ is the number of neurons on the $(k-1)$-th hidden layer; $W_k(i,j)$ is the element in row $i$, column $j$ of the weight matrix $W_k$ of the $k$-th hidden layer; $l_{k-1}(j)$ is the $j$-th component of the layer-$(k-1)$ output, i.e. of the input to the $k$-th layer; $b_k(i)$ is the $i$-th element of the bias vector $b_k$ of the $k$-th hidden layer; $W_k(i)$ is the $i$-th row of $W_k$; $(\cdot)^T$ denotes the transpose; $N_k$ is the number of neurons on the $k$-th hidden layer; and $W_k$ and $b_k$ are the weight matrix and bias vector of the $k$-th hidden layer.
In matrix form, the output of the $k$-th hidden layer is:

$$l_k = W_k\, l_{k-1} + b_k$$

where $W_k$ and $b_k$ are the weight matrix and bias vector of the $k$-th hidden layer and $l_{k-1}$ is the input to the $k$-th layer (the output of layer $k-1$); the resulting $l_k$ is used as the input to the next network layer.
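To make the index bookkeeping concrete, the following small NumPy check, with arbitrary layer sizes chosen only for illustration, verifies that the per-neuron sum and the matrix form above give the same result; it is a sketch, not part of the PA-Net code.

```python
# Check that l_k(i) = sum_j W_k(i, j) l_{k-1}(j) + b_k(i) equals l_k = W_k l_{k-1} + b_k.
import numpy as np

N_prev, N_k = 5, 3
l_prev = np.random.randn(N_prev)          # output of layer k-1
W_k = np.random.randn(N_k, N_prev)        # weight matrix of layer k
b_k = np.random.randn(N_k)                # bias vector of layer k

# per-neuron form
l_k_elem = np.array([sum(W_k[i, j] * l_prev[j] for j in range(N_prev)) + b_k[i]
                     for i in range(N_k)])

# matrix form
l_k_mat = W_k @ l_prev + b_k

assert np.allclose(l_k_elem, l_k_mat)     # the two forms agree
```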
For the 2D convolutional layers used in the network structure: a 2D convolutional layer network consists of a number of convolution filters; each filter processes the data on a different channel, and the data are convolved and summed through a sliding window.
Let $u_{k-1} = \left[u_{k-1}^1,\, u_{k-1}^2,\, \ldots,\, u_{k-1}^{C_{k-1}}\right] \in \mathbb{R}^{H_{k-1} \times W_{k-1} \times C_{k-1}}$ be the output data of layer $k-1$ of the CNN, where $u_{k-1}^1$, $u_{k-1}^2$ and $u_{k-1}^{C_{k-1}}$ denote the outputs of the 1st, 2nd and $C_{k-1}$-th channels of the layer-$(k-1)$ output matrix, $u_{k-1}^c \in \mathbb{R}^{H_{k-1} \times W_{k-1}}$ is the output of the $c$-th channel, $H_{k-1}$ and $W_{k-1}$ are the numbers of rows and columns of the matrix, and $C_{k-1}$ is the number of channels output by the layer-$(k-1)$ convolutional neural network. The output at row $i$, column $j$ on channel $c$ of the $k$-th convolutional layer is:

$$u_k^c(i,j) = \sum_{z=1}^{C_{k-1}} \sum_{m=1}^{M} \sum_{n=1}^{M} w_k^{c,z}(m,n)\, u_{k-1}^z\big(s\,(i-1)+m,\; s\,(j-1)+n\big) + b_k^c$$

where $w_k^c$ and $b_k^c$ are the weight tensor and bias of the $c$-th output channel of layer $k$ (the weight tensor spans the $C_{k-1}$ input channels); $C_k$ is the number of channels output by the $k$-th convolutional layer and $C_{k-1}$ the number output by the $(k-1)$-th; $M$ is the number of rows and columns of the convolution filter; $w_k^{c,z}$ is the slice of the layer-$k$ weight tensor connecting input channel $z$ ($z = 1, \ldots, C_{k-1}$) to output channel $c$; $m$ and $n$ index the rows and columns of the convolution filter; $u_{k-1}^z$ is the output of channel $z$ of layer $k-1$ of the CNN; $b_k^c$ is the bias for the $c$-th channel of layer $k$; and $s$ is the stride with which the convolution filter slides over the input data.
The output matrix on channel $c$ of the $k$-th convolutional layer is therefore:

$$u_k^c = \begin{bmatrix}
u_k^c(1,1) & u_k^c(1,2) & \cdots & u_k^c\!\left(1,\; \left\lfloor \tfrac{W_{k-1}-M}{s} \right\rfloor + 1\right) \\
u_k^c(2,1) & u_k^c(2,2) & \cdots & u_k^c\!\left(2,\; \left\lfloor \tfrac{W_{k-1}-M}{s} \right\rfloor + 1\right) \\
\vdots & \vdots & \ddots & \vdots \\
u_k^c\!\left(\left\lfloor \tfrac{H_{k-1}-M}{s} \right\rfloor + 1,\; 1\right) & u_k^c\!\left(\left\lfloor \tfrac{H_{k-1}-M}{s} \right\rfloor + 1,\; 2\right) & \cdots & u_k^c\!\left(\left\lfloor \tfrac{H_{k-1}-M}{s} \right\rfloor + 1,\; \left\lfloor \tfrac{W_{k-1}-M}{s} \right\rfloor + 1\right)
\end{bmatrix}$$

where $H_{k-1}$ and $W_{k-1}$ are the numbers of rows and columns of the layer-$(k-1)$ output, $M$ is the filter size, and $s$ is the stride.
The output data of the $k$-th CNN layer are therefore:

$$u_k = \left[u_k^1,\, u_k^2,\, \ldots,\, u_k^{C_k}\right] \in \mathbb{R}^{\left(\left\lfloor \frac{H_{k-1}-M}{s} \right\rfloor + 1\right) \times \left(\left\lfloor \frac{W_{k-1}-M}{s} \right\rfloor + 1\right) \times C_k}$$

where $u_k^1$, $u_k^2$ and $u_k^{C_k}$ denote the outputs of the 1st, 2nd and $C_k$-th channels of the layer-$k$ output matrix, and $C_k$ is the number of channels output by the $k$-th convolutional layer.
Similarly, to apply a nonlinear transformation to the data, each convolutional layer is followed by an activation function, so the transformation of each convolutional layer can be abbreviated as:

$$u_k = f\!\left(w_k \ast u_{k-1} + b_k\right)$$

where $u_k$ is the output data of layer $k$ of the CNN, $w_k$ and $b_k$ are the weight tensor and bias vector of layer $k$, $u_{k-1}$ is the output of layer $k-1$, $\ast$ denotes the convolution operation, and $f(\cdot)$ is the ReLU activation function $f(x) = \max(0, x)$.
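The following naive NumPy implementation makes the per-channel convolution formula above concrete. Shapes, weights and the stride are arbitrary illustrative choices, and a real network would use an optimized library kernel rather than explicit loops; this is only a sketch of the indexing.

```python
# Naive convolution matching u_k^c(i,j) = sum_z sum_m sum_n w * u + b, followed by ReLU.
import numpy as np

def conv2d_naive(u_prev, w, b, s):
    """u_prev: (H, W, C_in); w: (M, M, C_in, C_out); b: (C_out,); s: stride."""
    H, W_in, C_in = u_prev.shape
    M, _, _, C_out = w.shape
    H_out = (H - M) // s + 1
    W_out = (W_in - M) // s + 1
    u_k = np.zeros((H_out, W_out, C_out))
    for c in range(C_out):                      # output channel
        for i in range(H_out):
            for j in range(W_out):
                patch = u_prev[i * s:i * s + M, j * s:j * s + M, :]
                u_k[i, j, c] = np.sum(patch * w[:, :, :, c]) + b[c]
    return np.maximum(0.0, u_k)                 # followed by the ReLU activation

x = np.random.rand(8, 8, 3)                     # H_{k-1} = W_{k-1} = 8, C_{k-1} = 3
w = np.random.randn(3, 3, 3, 4) * 0.1           # M = 3, C_k = 4
out = conv2d_naive(x, w, np.zeros(4), s=1)
print(out.shape)                                # (6, 6, 4) = ((8-3)//1+1, (8-3)//1+1, C_k)
```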
Referring to FIG. 3, the joint segmentation (ASIS) module structure: ASIS consists of 2 paths, where the in1 and in2 paths are both outputs of the PointNet network, i.e. the output feature matrix $U \in \mathbb{R}^{N \times 40}$, and the two inputs match each other. in1 is the semantic-aware instance segmentation path: the semantic feature matrix is first passed through a fully connected layer with a ReLU activation function, and the semantic feature matrix of in2 is then added to in1, which can be described as in1 + FC(in2), where FC(·) is the fully connected (FC) operation. The result is then passed through a fully connected layer and output as out1, with shape N × E, where N is the number of point cloud points and E is the embedding dimension; the embedding of the point cloud expresses the instance relation between points, i.e. points close to each other in the embedding space belong to the same instance, so this path can separate the point cloud data into different instances. In addition, the in2 path is the instance-fused semantic segmentation: a fixed number of neighbouring points of each point (including the point itself) is found in the instance embedding space by the K-nearest-neighbour algorithm. The K-nearest-neighbour algorithm is used here because point clouds belonging to the same instance are close to each other in space while points of different instances are separated from each other. After the K-nearest-neighbour step, the point clouds belonging to the same instance are obtained, and aggregation with the in2 path is then performed; A in FIG. 3 denotes Aggregation. Finally, the data are output as out2 through a fully connected layer, with shape N × C, where N is the number of point cloud points and C is the number of semantic categories. Through the joint segmentation module the feature matrix further strengthens its feature content and becomes an instance segmentation feature matrix with semantic perception capability, which is superior to using the feature matrix directly. Finally, the final instance labels are obtained with an instance segmentation classifier realized by mean-shift clustering.
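The sketch below illustrates this two-path fusion in NumPy under simplifying assumptions: the dimensions N, E, C and K, the random weights, and the use of mean aggregation over the K neighbours are all illustrative choices made here, not the exact operations of the ASIS module.

```python
# Simplified ASIS-style fusion: in1 + FC(in2) for the instance branch, and kNN
# aggregation in the instance-embedding space for the semantic branch.
import numpy as np

def fc(x, out_dim):
    w = np.random.randn(x.shape[1], out_dim) * 0.1   # illustrative random weights
    return np.maximum(0.0, x @ w)                    # fully connected layer + ReLU

N, D, E, C, K = 1024, 40, 5, 13, 30
sem_in = np.random.rand(N, D)                   # in2: semantic feature matrix
ins_in = np.random.rand(N, D)                   # in1: instance feature matrix

# instance branch: in1 + FC(in2), then project to the N x E embedding (out1)
ins_emb = fc(ins_in + fc(sem_in, D), E)

# semantic branch: average the semantic features of the K nearest neighbours of each
# point in the instance embedding space, then classify into C categories (out2)
dists = np.linalg.norm(ins_emb[:, None, :] - ins_emb[None, :, :], axis=-1)
knn_idx = np.argsort(dists, axis=1)[:, :K]      # includes the point itself
sem_agg = sem_in[knn_idx].mean(axis=1)
sem_logits = fc(sem_agg, C)                     # N x C semantic scores

print(ins_emb.shape, sem_logits.shape)          # (1024, 5) (1024, 13)
```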
For the network structure of the ASIS joint segmentation module, the fully connected layer is used as above, and the details are as follows.
The ReLU activation function gives the network the ability to learn nonlinear features; its formula is:

$$f(s) = \max(0, s)$$

where s is the input sample.
The K-nearest-neighbour algorithm is used; its details can be described as follows (a sketch is given after the steps):
S-A, calculating the distance between each point in the known-category data set and the current point, using the Euclidean distance formula:

$$d = \sqrt{(x_1 - x_0)^2 + (y_1 - y_0)^2 + (z_1 - z_0)^2}$$

where d is the distance between a point and the current point, $(x_0, y_0, z_0)$ are the coordinates of the current point cloud point, and $(x_1, y_1, z_1)$ are the coordinates of another point cloud point.
S-B, calculating the distances between all data in the data set and the current point by the method in S-A, and sorting them in increasing order to obtain $[d_1, d_2, \ldots, d_N]$, where $d_1$ is the distance between the 1st point and the current point, $d_N$ is the distance between the N-th point and the current point, and N is the number of point cloud points.
S-C, selecting the first p entries of $[d_1, d_2, \ldots, d_N]$, i.e. the p points closest to the current point.
S-D, determining the frequency of each category among these p points.
S-E, returning the category with the highest frequency among the p points as the predicted classification of the current point.
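The following is a direct NumPy transcription of steps S-A to S-E, assuming a labelled reference point cloud; the value of p and the synthetic data are illustrative only.

```python
# K-nearest-neighbour classification of one query point, following S-A .. S-E.
import numpy as np
from collections import Counter

def knn_predict(points, labels, query, p=5):
    # S-A / S-B: Euclidean distances to the current point, sorted in increasing order
    d = np.sqrt(((points - query) ** 2).sum(axis=1))
    order = np.argsort(d)
    # S-C: take the p closest points; S-D: count their class frequencies
    votes = Counter(labels[order[:p]])
    # S-E: return the most frequent class as the prediction for the current point
    return votes.most_common(1)[0][0]

pts = np.random.rand(500, 3)                  # known-category point cloud (x, y, z)
lbl = np.random.randint(0, 3, size=500)       # their class labels
print(knn_predict(pts, lbl, query=np.array([0.5, 0.5, 0.5]), p=7))
```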
The instance segmentation classifier can be realized by mean-shift clustering; the classifier takes the instance segmentation feature matrix with semantic perception capability as its input and yields the semantic label and instance label of each predicted point. The mean-shift clustering algorithm involves the following formulas. The offset mean is calculated as:

$$M_t = \frac{1}{v} \sum_{x_i \in S_h} (x_i - x)$$

where $S_h$ is the high-dimensional sphere region of radius h centred at x, v is the number of points contained in $S_h$, and $x_i$ denotes the points contained in $S_h$. The centre point update formula is:

$$x_{t+1} = M_t + x_t$$

where $M_t$ is the offset mean obtained at state t, $x_t$ is the centre at state t, and $x_{t+1}$ is the centre at time t+1.
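The minimal NumPy sketch below follows these two formulas: points inside the radius-h sphere $S_h$ around the current centre are averaged to give the offset mean $M_t$, and the centre is moved until the offset becomes negligible. The radius h, tolerance and synthetic 2D data are illustrative assumptions, not values from the patent.

```python
# One mean-shift trajectory implementing M_t = mean(x_i - x) and x_{t+1} = M_t + x_t.
import numpy as np

def mean_shift(points, x0, h=0.5, max_iter=100, tol=1e-4):
    x = x0.copy()
    for _ in range(max_iter):
        in_sphere = points[np.linalg.norm(points - x, axis=1) < h]   # S_h
        if len(in_sphere) == 0:
            break
        m_t = (in_sphere - x).mean(axis=0)        # offset mean M_t over the v points
        x = m_t + x                               # x_{t+1} = M_t + x_t
        if np.linalg.norm(m_t) < tol:             # converged: offset is negligible
            break
    return x

data = np.vstack([np.random.randn(100, 2) * 0.2,            # one cluster near (0, 0)
                  np.random.randn(100, 2) * 0.2 + 3.0])     # another near (3, 3)
print(mean_shift(data, x0=np.array([2.5, 2.5])))             # typically ends near (3, 3)
```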
The BIM three-dimensional model reconstruction method based on PA-Net is characterized in that the S3 comprises the following steps:
S3-1, adopting the S3DIS indoor point cloud data set, produced jointly by the Stanford University Department of Computer Science and the Department of Civil and Environmental Engineering, as the PA-Net training data set. The input point cloud matrix contains only the coordinate information (x, y, z) and color information (R, G, B) of each point. The annotations cover 13 categories of building elements, furniture elements and clutter: the building elements include ceilings, floors, walls, columns, beams, windows and doors; the furniture elements include tables, chairs, bookcases, sofas and blackboards; other elements that occur rarely or are not of interest are classified as clutter. Notably, because the data set carries an instance-level annotation for every point (so that noise attributed to clutter has been removed), no point cloud denoising is required when using it. The S3DIS indoor point cloud data set meets the requirements of point cloud instance segmentation, so it is chosen as the segmentation and experiment object.
S3-2, training PA-Net with the S3DIS indoor point cloud data set.
S3-3, training the model multiple times by continuously adjusting the training parameters, and selecting the model with the best effect as the model used in S3.
The BIM three-dimensional model reconstruction method based on PA-Net is characterized in that the S4 comprises the following steps:
S4-1, inputting the point cloud data of the indoor building scene to be identified, collected in S1, into the PA-Net model trained in S3;
S4-2, the backbone network of the PA-Net framework is responsible for primary feature extraction on the input point cloud; the point cloud data pass through multiple convolution and pooling layers to obtain a primary feature matrix;
S4-3, segmenting the feature matrix with the joint segmentation module in PA-Net to obtain an instance segmentation feature matrix with semantic perception capability;
S4-4, sending the obtained instance segmentation feature matrix with semantic perception information into the instance segmentation classifier, and performing instance segmentation to obtain the instance segmentation matrix.
The instance segmentation classifier has semantic perception capability and performs instance segmentation according to the instance segmentation feature matrix with semantic perception information; the segmented result is the instance segmentation matrix, which itself no longer contains the semantic information.
The BIM three-dimensional model reconstruction method based on PA-Net is characterized in that the S5 comprises the following steps:
S5-1, according to the IFC standard, selecting the Autodesk Revit desktop application as the IFC generator;
S5-2, sending the obtained instance segmentation matrix into the IFC generator to obtain the BIM model.
The flow chart of the steps of the embodiment of the invention is shown in FIG. 4:
s001, start.
S002, collecting point cloud data of the indoor building scene to be identified with a Matterport scanner, from which the required instance segmentation matrix will later be generated.
S003, training the model with the S3DIS indoor point cloud data set, which was produced jointly by the Stanford University Department of Computer Science and the Department of Civil and Environmental Engineering and was collected with a Matterport scanner; the data set contains the three-dimensional coordinates (x, y, z) and color (R, G, B) information of the point cloud.
S004, splitting the data set for training and testing of PA-Net: all samples of the S3DIS indoor point cloud data set are divided into a training set and a test set at a ratio of 9:1; the training set is used to train the model and the test set to test it.
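A small NumPy sketch of this 9:1 split follows; the sample shapes and the random seed are illustrative assumptions, not details taken from the patent.

```python
# Shuffle the samples and split them 9:1 into training and test sets, as in S004.
import numpy as np

samples = np.random.rand(100, 1024, 6)           # e.g. 100 blocks of 1024 points (x,y,z,R,G,B)
rng = np.random.default_rng(seed=0)
idx = rng.permutation(len(samples))
cut = int(0.9 * len(samples))                    # 9:1 ratio
train_set, test_set = samples[idx[:cut]], samples[idx[cut:]]
print(train_set.shape[0], test_set.shape[0])     # 90 10
```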
S005, the PA-Net training scheme is designed as follows: the number of epochs is set to 1000, i.e. the training set is traversed 1000 times; the training samples are randomly shuffled; the batch size is set to 200, i.e. 200 samples are fed into the network in each training step; the optimizer is ADAM (adaptive moment estimation); the loss function is MSE (mean square error); and the model is saved every 10 training epochs.
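The skeleton below reflects these S005 settings (1000 epochs, shuffled batches of 200, a checkpoint every 10 epochs). The `train_step` and `save_model` callables are placeholders, not TensorFlow or PA-Net APIs; a real run would plug in the actual model update and checkpointing code.

```python
# Schematic training loop for the S005 scheme; the update and save hooks are stand-ins.
import numpy as np

EPOCHS, BATCH_SIZE, SAVE_EVERY = 1000, 200, 10

def train(train_x, train_y, train_step, save_model):
    n = train_x.shape[0]
    for epoch in range(1, EPOCHS + 1):
        order = np.random.permutation(n)                  # random shuffling of samples
        for start in range(0, n, BATCH_SIZE):
            idx = order[start:start + BATCH_SIZE]
            train_step(train_x[idx], train_y[idx])        # one ADAM/MSE update (placeholder)
        if epoch % SAVE_EVERY == 0:                       # save the model every 10 epochs
            save_model(epoch)

checkpoints = []
train(np.random.rand(1000, 6), np.random.rand(1000, 1),
      train_step=lambda x, y: None,                      # placeholder update step
      save_model=checkpoints.append)                     # records the "saved" epochs
print(len(checkpoints))                                  # 100 checkpoints over 1000 epochs
```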
S006, loading the trained model, feeding all sample features of the test set into the model as input data for prediction, and comparing the predicted target values with the actual target values, i.e. computing their MSE and averaging it over all samples. When the MSE is less than $10^{-3}$, the generalization performance of the model is considered good, online prediction is possible, and step S007 is executed. When the MSE is greater than $10^{-3}$, the generalization performance of the model is considered weak, the hyper-parameters of the network need to be adjusted, and step S005 is executed to retrain, save and test the model again. The ADAM optimizer parameter update formulas are:

$$W = W - \alpha\, \frac{V_{dW}}{\sqrt{S_{dW}} + \varepsilon}, \qquad b = b - \alpha\, \frac{V_{db}}{\sqrt{S_{db}} + \varepsilon}$$

where W and b are the network weight and bias parameters being updated, $\alpha$ is the learning rate, $V_{dW}$ and $V_{db}$ are the exponential moving averages of the gradients, $S_{dW}$ and $S_{db}$ are the exponential moving averages of the squared gradients, and $\varepsilon$ is a very small number that prevents the denominator from being 0.
The mean square error MSE is calculated as follows:

$$MSE = \frac{1}{T} \sum_{q=1}^{T} \left(y_q - \hat{y}_q\right)^2$$

where $y_q$ is the q-th real value, $\hat{y}_q$ is the q-th prediction of the network, and T is the total number of data.
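The sketch below shows one ADAM-style update and the MSE criterion in NumPy. The decay rates beta1 and beta2 and the epsilon value are the commonly used defaults and are assumptions here (the text above does not specify them), and no bias correction is applied, matching the update formulas as written.

```python
# One ADAM update of a parameter vector, plus the MSE used for the 1e-3 acceptance test.
import numpy as np

def adam_step(w, grad, v, s, alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    v = beta1 * v + (1 - beta1) * grad          # exponential moving average of the gradient
    s = beta2 * s + (1 - beta2) * grad ** 2     # exponential moving average of its square
    w = w - alpha * v / (np.sqrt(s) + eps)      # parameter update with small eps in denominator
    return w, v, s

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)      # (1/T) * sum_q (y_q - y_hat_q)^2

w = np.zeros(4)
v = np.zeros(4)
s = np.zeros(4)
w, v, s = adam_step(w, grad=np.array([0.2, -0.1, 0.05, 0.3]), v=v, s=s)
print(w)
print(mse(np.array([1.0, 2.0]), np.array([1.01, 1.98])) < 1e-3)   # generalization check
```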
S007, repeating steps S005 and S006 continuously, and selecting the PA-Net model with the best effect to generate the instance segmentation matrix.
S008, inputting the collected point cloud data of the indoor building scene to be identified into the trained PA-Net model; PA-Net computes and outputs the instance segmentation matrix H.
S009, using the Autodesk Revit desktop application as the IFC generator and taking the instance segmentation matrix H as its input, the required BIM indoor building model is obtained.
And S010, ending.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.

Claims (10)

1. A BIM three-dimensional reconstruction method based on deep learning is characterized by comprising the following steps:
S1, acquiring point cloud data of the indoor building scene to be identified with a scanner;
S2, constructing the PA-Net network, which performs feature extraction on the point cloud data and joint segmentation on the feature matrix;
S3, training the PA-Net network with a training data set to obtain a trained model;
S4, inputting the point cloud data of the indoor building scene to be identified, collected in S1, into the PA-Net model trained in S3, and outputting an instance segmentation matrix;
S5, reversely generating the BIM model from the instance segmentation matrix containing semantic information obtained in S4 by using the IFC standard.
2. The BIM three-dimensional reconstruction method based on deep learning of claim 1, wherein the S2 comprises:
the PA-Net network comprises a point cloud processing network PointNet and a joint segmentation module ASIS,
S2-1, designing and building the point cloud processing network PointNet, which completes feature extraction of the point cloud data;
S2-2, designing the joint segmentation module ASIS to complete semantic segmentation and instance segmentation of the feature matrix extracted by PointNet.
3. The BIM three-dimensional reconstruction method based on deep learning of claim 2, wherein the S3 comprises:
the joint segmentation network system consists of 3 parts: a backbone network PointNet responsible for point cloud feature extraction; a joint segmentation module responsible for fusing instance and semantic information, namely the ASIS module; and a classifier responsible for instance segmentation;
the input point cloud matrix lies in $\mathbb{R}^{N \times 6}$ and contains only the coordinate information (x, y, z) and color information (R, G, B) of each point; the matrix dimension is N × 6, where N is the number of point cloud points;
the backbone network of the PA-Net framework is responsible for primary feature extraction on the input point cloud and produces a feature matrix.
4. The BIM three-dimensional reconstruction method based on deep learning of claim 2, wherein the PointNet comprises a PointNet network structure:
order to
Figure FDA0002935167160000021
Is the output vector of the k-1 layer of a fully connected FC network, lk-1,11 st neuron, l, of layer k-1 representing a fully connected networkk-1,2Representing the 2 nd neuron at layer k-1 of the fully-connected network,
Figure FDA0002935167160000022
represents the output of the nth neuron at layer k-1 of the fully-connected network, so the output of each neuron on the k-layer neural network is:
Figure FDA0002935167160000023
wherein N isk-1The number of the neurons on the hidden layer of the k-1 layer is shown,
Wk(i, j) weight matrix W representing the k-th hidden layerkThe ith row and the jth column of (1),
lk-1(j) the jth column offset representing the k-1 th layer,
bk(i) bias vector b representing the k-th hidden layerkThe number of the ith row of (a),
Wk(i) weight matrix W representing the k-th hidden layerkThe number of the ith row of (a),
lk-1indicating the bias of the k-1 th layer i.e. the input to the k-1 th layer network,
(·)Tthe transpose of the matrix is represented,
Nkindicates the number of neurons on the k-th hidden layer,
Wkand bkWeight matrix and bias vector for the k-th hidden layer respectively,
the output of the k-th hidden layer is then:
lk=Wklk-1+bk
wherein, WkAnd bkRespectively is a weight matrix and a bias vector of a k-th hidden layer;
lk-1is the input of the k-1 layer network;
will find lkAs input to the next network layer;
the 2D convolutional layer network consists of a plurality of convolutional filters, each filter processes data on different channels, and the data are subjected to convolutional summation through a sliding window;
order to
Figure FDA0002935167160000031
Is output data of the k-1 layer of CNN, wherein
Figure FDA0002935167160000032
The output of the k-1 th layer is shown,
Figure FDA0002935167160000033
representing the 1 st channel output of the k-1 th layer output matrix,
Figure FDA0002935167160000034
represents the 2 nd channel output of the k-1 th layer output matrix,
Figure FDA0002935167160000035
c for representing k-1 layer output matrixk-1The output of each channel is used as the output,
Figure FDA0002935167160000036
dimension size H representing k-1 layer outputk-1×Wk-1×Ck-1
Figure FDA0002935167160000037
Represents the output of the c-th channel of the k-1 th layer output matrix,
Figure FDA0002935167160000038
dimension size H representing k-1 layer outputk-1×Wk-1,Hk-1And Wk-1Respectively representing the number of rows and columns of the matrix, Ck-1Representing the number of channels output by the k-1 layer convolutional neural network; then, the output of the ith row and the jth column on the c channel of the kth convolutional neural network is:
Figure FDA0002935167160000039
wherein,
Figure FDA00029351671600000310
and bkRespectively representing the weight tensor and bias vector of the k-th layer,
Ckrepresents the number of channels output by the k-th convolutional neural network,
Ck-1represents the number of channels output by the k-1 layer convolutional neural network,
m denotes the total number of rows and the total number of columns of the convolution filter,
Figure FDA00029351671600000311
represents the c-th channel, z represents the incremental number of channels for the k-1 layer, the weight tensor for the k-th layer,
CKdenotes the number of channels, CK=1...Ck-1
m represents the number of rows of the convolution filter,
n represents the number of columns of the product filter,
Figure FDA00029351671600000312
is the output of the k-1 th layer of the z-channel CNN,
Figure FDA00029351671600000313
representing the bias vector for the c-th channel of the k-th layer,
s is the step size of the convolution filter sliding over the input data,
therefore, we can obtain the output matrix on the c channel on the k convolutional neural network as:
Figure FDA0002935167160000041
wherein,
Figure FDA0002935167160000042
represents the output of row 1 and column 1 on the c channel on the k convolutional neural network,
Figure FDA0002935167160000043
represents the output of row 2 and column 1 on the c channel on the k convolutional neural network,
Figure FDA0002935167160000044
indicating the c channel on the k convolutional neural network
Figure FDA0002935167160000045
The output of column 1 of the row,
Figure FDA0002935167160000046
represents the output of row 1 and column 2 on the c channel on the k convolutional neural network,
Figure FDA0002935167160000047
represents the output of row 2 and column 2 on the c channel on the k convolutional neural network,
Figure FDA0002935167160000048
indicating the c channel on the k convolutional neural network
Figure FDA0002935167160000049
The output of column 2 of the row,
Figure FDA00029351671600000410
represents line 1, on the c channel on the k convolutional neural network
Figure FDA00029351671600000411
The output of the column is then,
Figure FDA00029351671600000412
represents line 2, on the c channel on the k convolutional neural network
Figure FDA00029351671600000413
The output of the column is then,
Figure FDA00029351671600000414
indicating the c channel on the k convolutional neural network
Figure FDA00029351671600000415
Go to the first
Figure FDA00029351671600000416
The output of the column is then,
Hk-1and Wk-1Respectively representing the number of rows and the number of columns of the matrix,
m denotes the total number of rows and the total number of columns of the convolution filter,
s is the step size of the convolution filter sliding over the input data,
therefore, the output data of the $k$-th CNN layer is:

$$X^{k}=\left[X^{k}_{1},X^{k}_{2},\ldots,X^{k}_{C_{k}}\right]$$

wherein $X^{k}_{1}$, $X^{k}_{2}$ and $X^{k}_{C_{k}}$ respectively represent the 1st, 2nd and $C_{k}$-th channel outputs of the $k$-th layer output matrix, the dimension of the matrix is $\left(\frac{H_{k-1}-m}{s}+1\right)\times\left(\frac{W_{k-1}-m}{s}+1\right)\times C_{k}$, $H_{k-1}$ and $W_{k-1}$ respectively represent the number of rows and columns of the matrix, $m$ denotes the total number of rows and the total number of columns of the convolution filter, $s$ is the stride with which the convolution filter slides over the input data, and $C_{k}$ represents the number of channels output by the $k$-th convolutional neural network layer;
similarly, in order to apply a nonlinear transformation to the data, each convolutional layer must be followed by an activation function, so the transformation of each convolutional layer can be abbreviated as:

$$X^{k}=f\!\left(W^{k}\ast X^{k-1}+b^{k}\right)$$

wherein $X^{k}$ is the output data of the $k$-th CNN layer, $W^{k}$ and $b^{k}$ respectively represent the weight tensor and bias vector of the $k$-th layer, $X^{k-1}$ is the output of the $(k-1)$-th CNN layer, and $f(\cdot)$ is the ReLU activation function $f(x)=\max(0,x)$.
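As an aid to reading the formulas above, a minimal NumPy sketch of one per-channel convolution plus ReLU; the input size, the filter width m and the stride s used in the toy example are illustrative assumptions, not values fixed by the claims:

```python
import numpy as np

def conv_relu(x, w, b, s):
    """One convolutional layer X^k = f(W^k * X^{k-1} + b^k) with ReLU.

    x : (H, W, C_in)        output of layer k-1
    w : (m, m, C_in, C_out) weight tensor of layer k (m rows = m columns)
    b : (C_out,)            bias vector of layer k
    s : stride of the filter over the input
    """
    H_in, W_in, _ = x.shape
    m, _, _, C_out = w.shape
    H_out, W_out = (H_in - m) // s + 1, (W_in - m) // s + 1   # (H_{k-1}-m)/s + 1
    out = np.zeros((H_out, W_out, C_out))
    for i in range(H_out):
        for j in range(W_out):
            patch = x[i * s:i * s + m, j * s:j * s + m, :]     # m x m x C_in window
            out[i, j, :] = np.tensordot(patch, w, axes=([0, 1, 2], [0, 1, 2])) + b
    return np.maximum(out, 0)                                  # ReLU f(x) = max(0, x)

# toy example: 8x8 input with 3 channels, 3x3 filter, stride 1, 4 output channels
x = np.random.rand(8, 8, 3)
y = conv_relu(x, 0.1 * np.random.randn(3, 3, 3, 4), np.zeros(4), s=1)
print(y.shape)  # (6, 6, 4)
```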
5. The BIM three-dimensional reconstruction method based on deep learning of claim 2, wherein the joint segmentation comprises joint segmentation with the ASIS module structure:
the ASIS module is composed of 2 paths, where the in1 and in2 inputs are both output feature matrices of the PointNet network and match each other; in1 is the instance segmentation path: the semantic feature matrix of in2 is first passed through a fully connected layer with a ReLU activation function and is then added to in1, which can be described as in1 + FC(in2), where FC(·) denotes the fully connected operation; the result is then passed through another fully connected layer and output as out1 with shape N × E, where N represents the number of all point cloud points and E is the embedding dimension; the embedding of the point cloud encodes the instance relation between points, i.e. points that are close to each other in the embedding space belong to the same instance, so this path separates the point cloud data into different instances;
in addition, the in2 path is semantic segmentation fused with instance information: for each point, a fixed number of neighboring points in the instance embedding space (including the point itself) are found with the K-nearest-neighbor algorithm; after the K-nearest-neighbor search, the point clouds belonging to the same instance are obtained and their features are then aggregated along the in2 path.
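As an illustration of the two-path fusion just described, a minimal PyTorch-style sketch; the feature width, the embedding dimension E, the class count, the value of K and the max-pooling aggregation are assumptions made for the sketch rather than values or operations fixed by the claim:

```python
import torch
import torch.nn as nn

class ASISFusion(nn.Module):
    """Sketch of the two-path joint segmentation head: in1 (instance features)
    and in2 (semantic features) are per-point matrices of shape (N, F)."""

    def __init__(self, feat_dim=128, embed_dim=5, num_classes=13, k=30):
        super().__init__()
        self.fc_sem = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.ReLU())
        self.fc_ins = nn.Linear(feat_dim, embed_dim)        # produces out1: N x E embedding
        self.fc_cls = nn.Linear(feat_dim, num_classes)      # semantic logits
        self.k = k

    def forward(self, in1, in2):
        # instance path: in1 + FC(in2), then project to the E-dimensional embedding space
        fused = in1 + self.fc_sem(in2)
        out1 = self.fc_ins(fused)                           # (N, E)

        # semantic path fused with instances: K nearest neighbours of each point in the
        # instance embedding space, followed by aggregation of the in2 features
        dist = torch.cdist(out1, out1)                      # (N, N) pairwise distances
        knn_idx = dist.topk(self.k, largest=False).indices  # (N, k) neighbour indices
        sem_agg = in2[knn_idx].max(dim=1).values            # max-pool neighbour features
        logits = self.fc_cls(sem_agg)                       # (N, num_classes)
        return out1, logits

# usage: 1024 points carrying 128-dim PointNet features on both paths
net = ASISFusion()
emb, logits = net(torch.randn(1024, 128), torch.randn(1024, 128))
print(emb.shape, logits.shape)  # torch.Size([1024, 5]) torch.Size([1024, 13])
```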
6. The BIM three-dimensional reconstruction method based on deep learning of claim 5, wherein the network structure of the ASIS joint segmentation module is as follows:
the ReLU activation function gives the network the ability to learn nonlinear features; its formula is
$$f(s)=\max(0,s),$$
where $s$ is the input sample;
we use the K-nearest-neighbor algorithm, whose details can be described as follows:
S-A, calculating the distance between each point in the known-category data set and the current point using the Euclidean distance formula
$$d=\sqrt{(x_{1}-x_{0})^{2}+(y_{1}-y_{0})^{2}+(z_{1}-z_{0})^{2}},$$
where $d$ represents the distance between a point and the current point, $(x_{0},y_{0},z_{0})$ are the coordinates of the current point cloud point, and $(x_{1},y_{1},z_{1})$ are the coordinates of the other point cloud point;
S-B, calculating the distances between all data in the data set and the current point by the method of S-A and sorting them in increasing order to obtain $[d_{1},d_{2},\ldots,d_{N}]$, where $d_{1}$ represents the distance between the 1st point and the current point, $d_{N}$ represents the distance between the $N$-th point and the current point, and $N$ represents the number of all point cloud points;
S-C, selecting from $[d_{1},d_{2},\ldots,d_{N}]$ the first $p$ points, i.e. the $p$ points with the smallest distances to the current point;
S-D, counting the frequency with which each category occurs among these $p$ points;
S-E, returning the category with the highest frequency among the $p$ points as the predicted classification of the current point;
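A short Python sketch of steps S-A through S-E; the variable names and the value of p are illustrative assumptions:

```python
import numpy as np
from collections import Counter

def knn_classify(query, points, labels, p=5):
    """Predict the category of `query` from its p nearest labelled points.

    query  : (3,) xyz coordinates of the current point
    points : (N, 3) coordinates of the known-category data set
    labels : (N,) category of each known point
    """
    d = np.sqrt(((points - query) ** 2).sum(axis=1))   # Euclidean distances (S-A)
    order = np.argsort(d)                              # increasing order     (S-B)
    nearest = labels[order[:p]]                        # first p points       (S-C)
    freq = Counter(nearest.tolist())                   # category frequencies (S-D)
    return freq.most_common(1)[0][0]                   # most frequent label  (S-E)

# toy usage
pts = np.random.rand(100, 3)
lbl = np.random.randint(0, 3, size=100)
print(knn_classify(np.array([0.5, 0.5, 0.5]), pts, lbl))
```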
the instance segmentation classifier is realized by the mean shift clustering method; the classifier takes the instance segmentation feature matrix with semantic awareness as input, and the offset mean is calculated as:
$$M_{h}(x)=\frac{1}{v}\sum_{x_{i}\in S_{h}}\left(x_{i}-x\right)$$
wherein $S_{h}$ represents the high-dimensional sphere region of radius $h$ centered at the point $x$, $v$ is the number of points contained in the range $S_{h}$, and $x_{i}$ denotes a point contained in the range $S_{h}$; the center point update formula is as follows:
$$x_{t+1}=M_{t}+x_{t}$$
wherein $M_{t}$ is the offset mean obtained in state $t$, $x_{t}$ is the center in state $t$, and $x_{t+1}$ is the center at time $t+1$.
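A minimal sketch of the offset-mean iteration $x_{t+1}=M_{t}+x_{t}$ used by the instance segmentation classifier; the bandwidth h, the stopping tolerance and the toy data are assumptions for illustration:

```python
import numpy as np

def mean_shift_center(points, x0, h=0.5, tol=1e-4, max_iter=100):
    """Iterate x_{t+1} = M_t + x_t until the shift M_t becomes negligible.

    points : (N, E) instance-embedding vectors
    x0     : (E,) initial center
    h      : radius of the high-dimensional sphere S_h
    """
    x = x0.copy()
    for _ in range(max_iter):
        in_sphere = points[np.linalg.norm(points - x, axis=1) < h]   # points inside S_h
        if len(in_sphere) == 0:
            break
        m = (in_sphere - x).mean(axis=0)     # offset mean M_h(x) = (1/v) * sum(x_i - x)
        x = x + m                            # x_{t+1} = M_t + x_t
        if np.linalg.norm(m) < tol:
            break
    return x

# toy usage: find the center of a blob of 2-D embeddings
pts = np.random.randn(200, 2) * 0.2 + np.array([1.0, 2.0])
print(mean_shift_center(pts, x0=np.zeros(2), h=3.0))
```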
7. The BIM three-dimensional reconstruction method based on deep learning of claim 1, wherein the S3 comprises:
S3-1, adopting the S3DIS indoor point cloud data set, produced jointly by the Department of Computer Science and the Department of Civil and Environmental Engineering at Stanford University, as the PA-Net training data set, wherein the input point cloud matrix contains only the coordinate information (x, y, z) and color information (R, G, B) of each point; the annotated categories of the data set comprise building elements, furniture elements and clutter, the building elements comprising ceilings, floors, walls, columns, beams, windows and doors, and the furniture elements comprising tables, chairs, bookcases, sofas and blackboards;
S3-2, training PA-Net with the S3DIS indoor point cloud data set;
and S3-3, training the model multiple times while continuously adjusting the training parameters, and selecting the model with the best performance as the trained model obtained in S3.
8. The BIM three-dimensional reconstruction method based on deep learning of claim 1, wherein the S4 comprises:
S4-1, inputting the point cloud data of the indoor building scene to be identified, collected in S1, into the PA-Net model trained in S3;
S4-2, the backbone network of the PA-Net framework is responsible for the preliminary feature extraction of the input point cloud: the point cloud data passes through multiple convolution and pooling layers to obtain a preliminary feature matrix;
S4-3, segmenting the feature matrix with the joint segmentation module in PA-Net to obtain an instance segmentation feature matrix with semantic awareness;
and S4-4, sending the obtained instance segmentation feature matrix with semantic awareness into the instance segmentation classifier and performing instance segmentation to obtain the instance segmentation matrix.
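Pulling S4-1 through S4-4 together, a hedged end-to-end inference sketch; the backbone interface, the ASISFusion head from the earlier sketch, the scikit-learn MeanShift call and the bandwidth value are assumptions, not the patent's actual implementation:

```python
import torch
from sklearn.cluster import MeanShift

def run_inference(points_xyzrgb, backbone, asis_head, bandwidth=0.6):
    """Sketch of S4: an (N, 6) point cloud -> per-point instance ids and semantic labels.

    backbone  : callable mapping an (N, 6) tensor to two (N, F) feature matrices
                (the preliminary feature extraction of S4-2)
    asis_head : the joint segmentation module of S4-3 (e.g. the ASISFusion sketch above)
    """
    x = torch.as_tensor(points_xyzrgb, dtype=torch.float32)
    with torch.no_grad():
        f_ins, f_sem = backbone(x)               # S4-2: preliminary feature matrices
        emb, logits = asis_head(f_ins, f_sem)    # S4-3: joint segmentation
    sem_labels = logits.argmax(dim=1).numpy()    # semantic class per point
    # S4-4: instance segmentation classifier = mean shift clustering of the embeddings
    inst_ids = MeanShift(bandwidth=bandwidth).fit_predict(emb.numpy())
    return inst_ids, sem_labels
```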
9. The BIM three-dimensional reconstruction method based on deep learning of claim 1, wherein the S5 comprises:
S5-1, selecting, in accordance with the IFC standard, the Autodesk Revit desktop application as the IFC generator;
and S5-2, sending the obtained instance segmentation matrix into the IFC generator to obtain the BIM model.
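The claim names Autodesk Revit as the IFC generator; purely to illustrate how labelled, segmented elements could be emitted as IFC entities programmatically, a minimal sketch using the open-source ifcopenshell library (the label-to-class mapping, the entity names and the file name are assumptions, and geometry/placement are omitted):

```python
import ifcopenshell
import ifcopenshell.guid

# hypothetical mapping from predicted semantic labels to IFC entity types
LABEL_TO_IFC = {"wall": "IfcWall", "door": "IfcDoor", "window": "IfcWindow",
                "column": "IfcColumn", "beam": "IfcBeam", "floor": "IfcSlab"}

def add_elements(model, instances):
    """instances: iterable of (semantic_label, instance_id) pairs, one per segmented object."""
    for label, inst_id in instances:
        ifc_class = LABEL_TO_IFC.get(label)
        if ifc_class is None:
            continue  # furniture / clutter classes are not turned into building elements here
        model.create_entity(ifc_class,
                            GlobalId=ifcopenshell.guid.new(),
                            Name=f"{label}_{inst_id}")
    return model

model = ifcopenshell.file(schema="IFC4")
add_elements(model, [("wall", 0), ("wall", 1), ("door", 2)])
model.write("reconstructed.ifc")
```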
10. The BIM three-dimensional reconstruction method based on deep learning of claim 1, wherein the point cloud data is data containing three-dimensional coordinates and RGB color information;
the training data set is S3DIS, built by Stanford University from indoor scenes;
the scanner is a Matterport three-dimensional rapid scanner;
and small pieces of furniture and clutter should first be removed from the indoor building scene to be identified, so that recognition after the point cloud data is acquired is accurate.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110160200.3A 2021-02-05 2021-02-05 BIM three-dimensional reconstruction method based on deep learning

Publications (1)

Publication Number Publication Date
CN112785694A true 2021-05-11

Family

ID=75760989

Country Status (1)

Country Link
CN (1) CN112785694A (en)
Patent Citations (3)

Publication number Priority date Publication date Assignee Title
CN110349247A * 2018-04-08 2019-10-18 哈尔滨工业大学 A kind of indoor scene CAD 3D method for reconstructing based on semantic understanding
CN111192270A * 2020-01-03 2020-05-22 中山大学 Point cloud semantic segmentation method based on point global context reasoning
WO2022252274A1 * 2021-05-31 2022-12-08 北京理工大学 Point cloud segmentation and virtual environment generation method and apparatus based on pointnet network

Non-Patent Citations (4)

Title
Charles R. Qi et al.: "PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation", 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017, pages 77-85 *
Xinlong Wang et al.: "Associatively Segmenting Instances and Semantics in Point Clouds", 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pages 4091-4100 *
严飞: "基于Mean Shift算法的目标跟踪系统设计" [Design of a target tracking system based on the Mean Shift algorithm], 电子测量技术, vol. 43, no. 23, 2020, pages 6-11 *
朱攀, 史健勇: "基于AISI网络的BIM三维重建方法研究" [Research on a BIM three-dimensional reconstruction method based on the AISI network], 图学学报, vol. 41, no. 5, 2020, pages 839-846 *

Cited By (6)

Publication number Priority date Publication date Assignee Title
CN113256640A * 2021-05-31 2021-08-13 北京理工大学 Method and device for partitioning network point cloud and generating virtual environment based on PointNet
CN113256640B * 2021-05-31 2022-05-24 北京理工大学 Method and device for partitioning network point cloud and generating virtual environment based on PointNet
CN113673423A * 2021-08-19 2021-11-19 丽水学院 Point cloud feature extraction method based on affinity and sparsity matrix
WO2023155113A1 * 2022-02-18 2023-08-24 Huawei Technologies Co., Ltd. Computer-implemented building modeling method and system
CN116704137A * 2023-07-27 2023-09-05 山东科技大学 Reverse modeling method for point cloud deep learning of offshore oil drilling platform
CN116704137B * 2023-07-27 2023-10-24 山东科技大学 Reverse modeling method for point cloud deep learning of offshore oil drilling platform

* Cited by examiner, † Cited by third party


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination