CN113920256A - Three-dimensional reconstruction method, device and equipment for large scene - Google Patents

Three-dimensional reconstruction method, device and equipment for large scene

Info

Publication number
CN113920256A
CN113920256A (Application CN202111529101.4A)
Authority
CN
China
Prior art keywords: room, dimensional, sub, embedded, initial
Prior art date
Legal status: Pending
Application number
CN202111529101.4A
Other languages
Chinese (zh)
Inventor
方璐
郑添
张国庆
季梦奇
Current Assignee
Tsinghua University
Original Assignee
Tsinghua University
Priority date
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN202111529101.4A
Publication of CN113920256A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention discloses a three-dimensional reconstruction method, a three-dimensional reconstruction device and three-dimensional reconstruction equipment for a large scene. A plurality of sub-graphs are constructed from the collected RGBD key frames, and an initial three-dimensional model corresponding to a target scene is constructed based on the plurality of sub-graphs; semantic segmentation and instance segmentation are performed on the initial three-dimensional model to obtain the semantic tags and instance tags contained in the initial three-dimensional model; a room contained in the initial three-dimensional model is detected according to the semantic tags and determined as a new room; loop detection is performed on the history rooms according to the instance tags contained in the new room to obtain a history room that belongs to the same room as the new room; the poses of the sub-graphs are then optimized, and a target three-dimensional model is constructed based on the optimized sub-graphs. The three-dimensional reconstruction method for a large scene provided by the embodiment of the invention can realize three-dimensional reconstruction of a large-scale scene and improve the precision of the three-dimensional reconstruction.

Description

Three-dimensional reconstruction method, device and equipment for large scene
Technical Field
The embodiment of the invention relates to the technical field of image processing, in particular to a three-dimensional reconstruction method, a three-dimensional reconstruction device and three-dimensional reconstruction equipment for a large scene.
Background
Indoor three-dimensional reconstruction is an important problem in the field of computer vision; its aim is to obtain a three-dimensional geometric model of an indoor scene. With the popularization of consumer-grade RGBD depth cameras in recent years, reconstructing indoor three-dimensional models with RGBD depth cameras has become a popular research direction. Compared with traditional laser ranging equipment, RGBD three-dimensional modeling has the advantages of low sensor cost and strong portability, and such sensors are now carried on many portable devices such as mobile phones and tablet computers. Reconstructing indoor three-dimensional models with mobile equipment is therefore of great significance for intelligent mobile applications; for example, virtual reality games and intelligent navigation require knowledge of the geometric structure of the scene.
Existing three-dimensional modeling methods can achieve relatively high-precision geometric modeling over a small range, but they are limited by the complexity of the reconstruction system, so the area of the scene they can reconstruct is usually limited to the room level. When the reconstruction area grows, existing methods inevitably run into insufficient memory and greatly reduced performance.
Disclosure of Invention
The embodiment of the invention provides a three-dimensional reconstruction method, a three-dimensional reconstruction device and three-dimensional reconstruction equipment for a large scene, which can realize the three-dimensional reconstruction of the large-scale scene and improve the three-dimensional reconstruction precision.
In a first aspect, an embodiment of the present invention provides a three-dimensional reconstruction method for a large scene, including:
constructing a plurality of sub-graphs according to the collected RGBD key frames, and constructing an initial three-dimensional model corresponding to a target scene based on the plurality of sub-graphs;
performing semantic segmentation and instance segmentation on the initial three-dimensional model to obtain semantic tags and instance tags contained in the initial three-dimensional model;
detecting a room contained in the initial three-dimensional model according to the semantic tag, and determining the room as a new room;
performing loop detection on the historical room according to the instance tag contained in the new room to obtain the historical room which belongs to the same room as the new room;
optimizing the poses of the sub-graphs, and constructing a target three-dimensional model based on the optimized sub-graphs.
In a second aspect, an embodiment of the present invention further provides a device for three-dimensional reconstruction of a large scene, including:
the initial three-dimensional model building module is used for building a plurality of sub-graphs according to the collected RGBD key frames and building an initial three-dimensional model corresponding to the target scene based on the sub-graphs;
the semantic and instance segmentation module is used for performing semantic segmentation and instance segmentation on the initial three-dimensional model to obtain semantic tags and instance tags contained in the initial three-dimensional model;
a new room determining module, configured to detect, according to the semantic tag, a room included in the initial three-dimensional model, and determine the room as a new room;
the loop detection module is used for carrying out loop detection on the history rooms according to the instance tags contained in the new room to obtain a history room that belongs to the same room as the new room;
and the pose optimization module is used for optimizing the poses of the sub-graphs and constructing a target three-dimensional model based on the optimized sub-graphs.
In a third aspect, an embodiment of the present invention further provides a computer device, where the computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the three-dimensional reconstruction method for the large scene according to the embodiment of the invention when executing the program.
The embodiment of the invention discloses a three-dimensional reconstruction method, device, equipment and storage medium for a large scene. A plurality of sub-graphs are constructed from the collected RGBD key frames, and an initial three-dimensional model corresponding to a target scene is constructed based on the plurality of sub-graphs; semantic segmentation and instance segmentation are performed on the initial three-dimensional model to obtain the semantic tags and instance tags contained in the initial three-dimensional model; a room contained in the initial three-dimensional model is detected according to the semantic tags and determined as a new room; loop detection is performed on the history rooms according to the instance tags contained in the new room to obtain a history room that belongs to the same room as the new room; the poses of the sub-graphs are then optimized, and a target three-dimensional model is constructed based on the optimized sub-graphs. According to the three-dimensional reconstruction method for a large scene provided by the embodiment of the invention, when a new room and a loop-detected history room belong to the same room, the poses of the plurality of sub-graphs are optimized; three-dimensional reconstruction of a large-scale scene can thereby be realized, and the three-dimensional reconstruction precision is improved.
Drawings
Fig. 1 is a flowchart of a three-dimensional reconstruction method of a large scene in a first embodiment of the present invention;
FIG. 2 is a diagram illustrating a three-dimensional sparse convolutional neural network according to a first embodiment of the present invention;
fig. 3 is a schematic structural diagram of a three-dimensional reconstruction apparatus for a large scene according to a second embodiment of the present invention;
fig. 4 is a schematic structural diagram of a computer device in a third embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1 is a flowchart of a three-dimensional reconstruction method for a large scene according to an embodiment of the present invention, where the present embodiment is applicable to a case of three-dimensional reconstruction for a large-scale scene, and the method may be executed by a three-dimensional reconstruction apparatus for a scene, where the apparatus may be composed of hardware and/or software, and may be generally integrated in a device having a three-dimensional reconstruction function for a scene, and the device may be an electronic device such as a server or a server cluster. As shown in fig. 1, the method specifically includes the following steps:
s110, constructing a plurality of sub-graphs according to the collected RGBD key frames, and constructing an initial three-dimensional model corresponding to the target scene based on the plurality of sub-graphs.
Wherein the RGBD key frames may be acquired using an RGB-D depth sensor. Each sub-graph may be constructed from a set number of RGBD key frames. The set number can be set arbitrarily, for example: a sub-graph is constructed for every 12 RGBD key frames. In this embodiment, a sub-graph strategy is adopted to realize the reconstruction of the three-dimensional model; that is, the acquired RGBD key frames are first constructed into a plurality of sub-graphs, and the three-dimensional model is then constructed from the plurality of sub-graphs.
Specifically, the process of constructing a plurality of sub-graphs according to the collected RGBD key frames may be: constructing a first set number of RGBD key frames into a sub-graph; storing the sub-graph into a set queue; and when the set queue is full, sending the earliest-stored sub-graph in the set queue to an external memory for storage, simplifying that sub-graph, and storing the simplified sub-graph into a set cache.
In this embodiment, a set queue (for storing sub-graphs) is maintained, and the maximum number of sub-graphs the set queue can cache needs to be set first; this maximum number can be determined according to the performance of the computer. The RGB-D depth sensor collects RGBD key frames in real time; when the collected RGBD key frames reach the set number, the set number of RGBD key frames are fused by truncated signed distance field fusion to construct a new sub-graph, and the new sub-graph is stored in the set queue. The sub-graphs in the set queue are high-precision three-dimensional models.
When the set queue is full, the sub-graph stored earliest in the set queue (i.e., the sub-graph at the head of the queue) is sent to an external memory for storage; the head-of-queue sub-graph is simplified, and the simplified sub-graph is stored in the set cache.
The sub-graph simplification may be realized by any three-dimensional mesh simplification method. The simplified sub-graph is a three-dimensional model that is small in size and retains most of the original information.
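For illustration only, the queue strategy described above can be sketched in Python. The queue size, the number of key frames per sub-graph, the dictionary representation of a sub-graph, and the list stand-ins for external storage and the set cache are all illustrative assumptions, not details of the disclosed method:

```python
from collections import deque

# Illustrative parameters (the patent leaves both configurable):
FRAMES_PER_SUBGRAPH = 12   # "first set number" of RGBD key frames per sub-graph
MAX_QUEUE_SIZE = 4         # maximum cacheable sub-graphs, machine-dependent

def simplify(subgraph):
    # Placeholder for any three-dimensional mesh simplification method.
    return {"id": subgraph["id"], "simplified": True}

class SubgraphManager:
    def __init__(self):
        self.frame_buffer = []     # key frames awaiting fusion
        self.queue = deque()       # high-precision sub-graphs (the set queue)
        self.cache = []            # simplified sub-graphs (the set cache)
        self.external = []         # stand-in for external storage

    def add_keyframe(self, frame):
        self.frame_buffer.append(frame)
        if len(self.frame_buffer) < FRAMES_PER_SUBGRAPH:
            return
        # A real system would fuse the buffered frames here (e.g. TSDF fusion).
        subgraph = {"id": len(self.external) + len(self.queue),
                    "frames": list(self.frame_buffer)}
        self.frame_buffer.clear()
        if len(self.queue) == MAX_QUEUE_SIZE:
            oldest = self.queue.popleft()        # sub-graph at the head
            self.external.append(oldest)         # sent to external storage
            self.cache.append(simplify(oldest))  # simplified copy kept
        self.queue.append(subgraph)

    def render_model(self):
        # Initial model: queue sub-graphs plus simplified cached sub-graphs.
        return list(self.queue) + list(self.cache)
```

The first-in-first-out eviction matches the description of sending the earliest-stored (head-of-queue) sub-graph to external storage.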
Accordingly, the process of constructing the initial three-dimensional model corresponding to the target scene based on the plurality of sub-graphs may be: constructing the initial three-dimensional model corresponding to the target scene based on the sub-graphs in the set queue and the simplified sub-graphs.
Specifically, the high-precision sub-graphs in the set queue and the simplified sub-graphs in the set cache are rendered together, thereby constructing the initial three-dimensional model corresponding to the target scene.
And S120, performing semantic segmentation and instance segmentation on the initial three-dimensional model to obtain the semantic labels and instance labels contained in the initial three-dimensional model.
Semantic segmentation can be understood as predicting which class of label each three-dimensional point in the initial three-dimensional model belongs to; instance segmentation can be understood as distinguishing, on the basis of semantic segmentation, different individuals belonging to the same semantic class. In this embodiment, the semantic label may be an object category found in the house, for example: walls, floors, windows, tables, chairs, etc. In this embodiment, the semantic segmentation and instance segmentation of the initial three-dimensional model may be implemented by a machine learning model.
When performing semantic segmentation and instance segmentation on the initial three-dimensional model, if the number of sub-graphs participating in the construction of the three-dimensional model exceeds a set value, then in order to reduce memory consumption and improve prediction speed, three-dimensional semantic and instance segmentation is performed each time only on the most recent set-value number of sub-graphs in the three-dimensional model. The set value may be set to 20.
And S130, detecting the room contained in the initial three-dimensional model according to the semantic label, and determining the room as a new room.
The semantic tags may include ground tags and wall tags. In this embodiment, connectivity of a three-dimensional point whose semantic label is a wall may be detected, if the three-dimensional point of the wall forms a closed region, a space enclosed by the closed region is a detected new room, otherwise, the initial three-dimensional model does not include a room.
Optionally, detecting a room included in the initial three-dimensional model according to the semantic tag, and determining the room as a new room may be: determining a plane formed by three-dimensional points with semantic labels as the ground as a target plane; projecting the three-dimensional points with semantic labels as walls to a target plane along the normal direction to obtain a two-dimensional structure chart; performing connectivity detection on the two-dimensional structure diagram, and if a closed area is formed by three-dimensional points of a wall projected to a target plane, the initial three-dimensional model comprises a room; and determining the three-dimensional space corresponding to the closed area as a new room.
Specifically, first, the ground is taken as the target plane, and the three-dimensional points whose semantic labels are wall are projected onto the target plane along the normal direction of the target plane to form a two-dimensional structure diagram; connectivity detection is then performed on the two-dimensional structure diagram, and if the three-dimensional points of the wall projected onto the target plane form a closed area, the three-dimensional space corresponding to the closed area is determined as a new room. Finally, the three-dimensional model of the new room is extracted and processed, thereby obtaining the semantic labels and instance labels contained in the new room and facilitating subsequent processing.
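The projection-and-connectivity room test above can be illustrated with a minimal sketch that rasterizes wall-labeled points onto a grid over the target plane and flood-fills the free space from the border; the grid discretization and the flood-fill are assumptions about one plausible implementation, not the patent's exact procedure:

```python
import numpy as np
from collections import deque

def detect_room(wall_points, grid_size=16):
    # Rasterize wall-labeled 3-D points onto the ground plane (drop z).
    grid = np.zeros((grid_size, grid_size), dtype=bool)
    for x, y, _z in wall_points:
        grid[int(x), int(y)] = True
    # Flood-fill free space from the border; free cells the fill cannot
    # reach are enclosed by wall cells and form a candidate new room.
    reached = np.zeros_like(grid)
    q = deque()
    for i in range(grid_size):
        for j in range(grid_size):
            border = i in (0, grid_size - 1) or j in (0, grid_size - 1)
            if border and not grid[i, j]:
                reached[i, j] = True
                q.append((i, j))
    while q:
        i, j = q.popleft()
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < grid_size and 0 <= nj < grid_size \
                    and not grid[ni, nj] and not reached[ni, nj]:
                reached[ni, nj] = True
                q.append((ni, nj))
    enclosed = ~grid & ~reached
    return enclosed if enclosed.any() else None
```

An isolated wall fragment leaves no enclosed cells, matching the case where the initial model does not yet contain a room.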
And S140, performing loop detection on the history rooms according to the instance labels contained in the new room, and obtaining a history room that belongs to the same room as the new room.
Wherein a history room may be understood as a room detected in a history period. The purpose of the loop detection of the historical rooms is to judge whether the new room belongs to the same room as one of the historical rooms.
Optionally, the process of performing loop detection on the history rooms according to the instance tags contained in the new room to obtain a history room that belongs to the same room as the new room may be: acquiring a first embedded vector set of at least one instance contained in the new room and a second embedded vector set of at least one instance contained in a history room; calculating the similarity of the new room and the history room based on the first embedded vector set and the second embedded vector set; sorting the similarities in descending order, and determining the top second-set-number of history rooms as target rooms; and performing geometric verification on the new room and the target rooms to obtain a history room that belongs to the same room as the new room.
Each room comprises at least one instance, each instance corresponds to one embedded vector, and therefore each room corresponds to one embedded vector set. Obtaining the embedded vector corresponding to the instance may be understood as performing feature extraction on the instance, and obtaining a vector formed by the features of the instance, that is, the embedded vector may be understood as a feature vector of the instance.
Specifically, the manner of obtaining the first embedded vector set of the at least one instance contained in the new room and the second embedded vector set of the at least one instance contained in the history room may be: acquiring the first three-dimensional sub-graphs respectively corresponding to the at least one instance contained in the new room, and inputting the first three-dimensional sub-graphs into a set convolutional neural network to obtain the first embedded vector set; and acquiring the second three-dimensional sub-graphs respectively corresponding to the at least one instance contained in the history room, and inputting the second three-dimensional sub-graphs into the set convolutional neural network to obtain the second embedded vector set.
In this embodiment, each of the divided instances corresponds to a three-dimensional sub-graph in the three-dimensional model. And inputting at least one first three-dimensional subgraph corresponding to the new room into a set convolutional neural network to obtain a first embedded vector set. And inputting at least one second three-dimensional subgraph corresponding to the historical room into a set convolutional neural network to obtain a second embedded vector set. The number of the history rooms may be 1 or more, and for a plurality of history rooms, a second embedded vector set corresponding to each history room needs to be determined.
Wherein, the set convolutional neural network may be a three-dimensional sparse convolutional neural network, including: sparse convolution layers, normal convolution layers, max pooling layers, and fully connected layers. Exemplarily, fig. 2 is an exemplary diagram of the three-dimensional sparse convolutional neural network in the present embodiment; as shown in fig. 2, SSC represents a sparse convolution layer, SC represents a normal convolution layer, MP a max pooling layer, and FC a fully connected layer. For example, SSC 8 × (3,1) represents a sparse convolution layer with 8 output channels, a convolution kernel size of 3, and a stride of 1, and the output of the network is a vector of dimension 256. The loss function used for training the three-dimensional sparse convolutional neural network is:

L = d\big(f(P_i), f(\tilde{P}_i)\big) + \max\big(0,\, m - d(f(P_i), f(P_j))\big)

where d(\cdot,\cdot) is the Euclidean distance function, f(\cdot) denotes the set convolutional neural network, P_i and P_j are the point cloud data of the three-dimensional sub-graphs of two randomly selected instances, \tilde{P}_i is the point cloud obtained by randomly rotating P_i along the z-axis, and m is a margin (boundary) parameter, which may be set to 0.5. The network can be pre-trained on the ScanNet public dataset.
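The training objective, as the symbols here describe it (the embedding of a rotated copy of an instance is pulled toward its anchor, while a different instance is pushed at least the margin m away), can be sketched numerically. The exact combination of the two terms is an assumption consistent with the stated definitions, since the original formula is given only as an image:

```python
import numpy as np

def contrastive_loss(f_anchor, f_rotated, f_other, margin=0.5):
    # Positive term d(f(Pi), f(Pi~)): the embedding of a randomly rotated
    # copy of the instance should coincide with the anchor embedding.
    positive = np.linalg.norm(f_anchor - f_rotated)
    # Negative term max(0, m - d(f(Pi), f(Pj))): a different instance's
    # embedding should sit at least `margin` away from the anchor.
    negative = max(0.0, margin - np.linalg.norm(f_anchor - f_other))
    return positive + negative
```

The rotation about the z-axis makes the learned embedding invariant to the heading with which an instance was scanned.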
The similarity between rooms can be represented by the distance between rooms, wherein the larger the distance is, the smaller the room similarity is represented, and conversely, the smaller the distance is, the larger the room similarity is represented.
Specifically, the way of calculating the similarity between the new room and the history room based on the first embedded vector set and the second embedded vector set may be: calculating the distance between each first embedded vector in the first embedded vector set and each second embedded vector in the second embedded vector set to obtain a plurality of distance values; determining the first number of embedded vectors contained in the first embedded vector set and the second number of embedded vectors contained in the second embedded vector set; calculating the product of the first number and the second number; and dividing the sum of the distance values by the product to obtain the similarity of the new room and the history room.
The calculation of the similarity can be expressed by the following formula:

s(P, Q) = \frac{1}{|E_P|\,|E_Q|} \sum_{x_i \in E_P} \sum_{y_j \in E_Q} \varphi(x_i, y_j)

where P denotes the new room, Q denotes the history room, x_i is the ith first embedded vector, y_j is the jth second embedded vector, E_P is the first embedded vector set, and |E_P| denotes the potential (cardinality) of the first embedded vector set, i.e., the first number of embedded vectors contained in the first embedded vector set; E_Q is the second embedded vector set, and |E_Q| denotes the potential of the second embedded vector set, i.e., the second number of embedded vectors contained in the second embedded vector set. In the above formula, \varphi(x_i, y_j) represents the distance between the ith first embedded vector and the jth second embedded vector.
Specifically, the way of calculating the distance between each first embedded vector in the first embedded vector set and each second embedded vector in the second embedded vector set may be: calculating the Euclidean distance between each first embedded vector and each second embedded vector, and then computing the distance from the Euclidean distance according to the following formula:

\varphi(x_i, y_j) = c \cdot \exp\!\big(-d(x_i, y_j)^2 / \sigma^2\big)

where x_i is the ith first embedded vector, y_j is the jth second embedded vector, d is the Euclidean distance between the first embedded vector and the second embedded vector, \sigma is a first set value, \varphi(x_i, y_j) represents the distance between the two embedded vectors, and c is a constant.
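One plausible reading of the room-similarity computation, summing a per-pair score over all embedded-vector pairs and dividing by the product of the two set sizes, can be sketched as follows. The Gaussian-kernel form of the hypothetical `pair_similarity` and the values of `sigma` and `c` are assumptions, since the patent's exact per-pair formula is given only as an image:

```python
import numpy as np

def pair_similarity(x, y, sigma=1.0, c=1.0):
    # Assumed Gaussian-kernel score for one embedded-vector pair:
    # close embeddings score near c, distant embeddings near 0.
    d = np.linalg.norm(x - y)
    return c * np.exp(-d ** 2 / sigma ** 2)

def room_similarity(E_P, E_Q, sigma=1.0, c=1.0):
    # Sum the per-pair score over every (first, second) vector pair,
    # then divide by the product of the two set cardinalities.
    total = sum(pair_similarity(x, y, sigma, c) for x in E_P for y in E_Q)
    return total / (len(E_P) * len(E_Q))
```

Normalizing by the product of set sizes keeps rooms with many instances from dominating the ranking.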
Specifically, after the similarity between the new room and each history room is obtained, the similarities are sorted in descending order, the top second-set-number of history rooms are extracted as target rooms, and geometric verification is then performed between the new room and each target room to obtain a history room that belongs to the same room as the new room. The second set number may be set to 15.
Specifically, the way of performing geometric verification on the new room and the target rooms to obtain a history room that belongs to the same room as the new room may be: for each target room, forming an embedded vector pair from each first embedded vector and the second embedded vector closest to it in Euclidean distance, to obtain a plurality of embedded vector pairs; acquiring the three-dimensional sub-graph pair corresponding to each embedded vector pair, wherein the three-dimensional sub-graph pair is composed of a first three-dimensional sub-graph and a second three-dimensional sub-graph; extracting the centers of gravity of the first three-dimensional sub-graph and the second three-dimensional sub-graph in each three-dimensional sub-graph pair to obtain a plurality of gravity-center pairs; fitting a transformation matrix to the plurality of gravity-center pairs with a random sample consensus algorithm; and if the proportion of the gravity-center pairs participating in the fitting exceeds a second set value, the target room and the new room are the same room.
Assume a pair of rooms P and Qi is given, where P is the new room and Qi is the ith target room. For each first embedded vector in P, the second embedded vector in Qi closest to it in Euclidean distance is selected to form an embedded vector pair. The instance three-dimensional sub-graph pair corresponding to each embedded vector pair is then obtained, and the centers of gravity of the two three-dimensional sub-graphs in each pair are extracted to obtain a gravity-center pair. Over the plurality of gravity-center pairs, a Random Sample Consensus (RANSAC) algorithm is used to fit the transformation matrix between P and Qi. Gravity-center pairs that participate in the fitting are called inlier pairs, and those that do not are called outlier pairs. If the ratio of inlier pairs to all gravity-center pairs exceeds the second set value, P and Qi form a set of loops, that is, P and Qi are the same room.
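The geometric verification step can be sketched with a RANSAC loop over gravity-center pairs. The SVD-based (Kabsch) rigid fit, the iteration count, the inlier tolerance, and the inlier-ratio threshold are illustrative assumptions, not values from the patent:

```python
import numpy as np

def fit_rigid(src, dst):
    # Kabsch/SVD least-squares rotation and translation mapping src -> dst.
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ S @ U.T
    return R, cd - R @ cs

def ransac_verify(src, dst, iters=100, tol=0.1, min_inlier_ratio=0.6, seed=0):
    # src, dst: (N, 3) arrays of matched gravity centers from rooms P and Qi.
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(src), size=3, replace=False)
        R, t = fit_rigid(src[idx], dst[idx])
        err = np.linalg.norm(src @ R.T + t - dst, axis=1)
        inliers = err < tol
        if inliers.sum() > best.sum():
            best = inliers
    ratio = best.sum() / len(src)
    return ratio >= min_inlier_ratio, ratio  # same-room decision, inlier ratio
```

Pairs inside the tolerance correspond to the inlier (interior) pairs described above; the decision compares their proportion with the second set value.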
S150, optimizing the poses of the plurality of sub-graphs, and constructing a target three-dimensional model based on the plurality of optimized sub-graphs.
In this embodiment, the poses of the sub-graphs can be optimized by the Levenberg-Marquardt (LM) algorithm.
In this embodiment, when optimizing the poses of the sub-graphs, the sub-graph constraint term, the room constraint term, and the gravity constraint term need to be acquired, and the poses of the sub-graphs are then optimized based on these three constraint terms.
The connections between sub-graphs are established through matched frames. For a pair of matched frames, corresponding points in three-dimensional space are obtained through two-dimensional feature matching and filtering, and the distances of all corresponding points form the sub-graph constraint term, as follows:

E_{sub} = \frac{1}{|M|} \sum_{(i,j) \in M} \sum_{(p,q) \in C_{ij}} \left\| T_i T_i^f\, p - T_j T_j^f\, q \right\|^2

where M represents the set formed by all matching sub-graph pairs, T_i represents the pose of sub-graph i, T_j represents the pose of sub-graph j that matches sub-graph i, T_i^f represents the pose of the matching frame within sub-graph i relative to sub-graph i, T_j^f represents the pose of the matching frame within sub-graph j relative to sub-graph j, C_{ij} is the set of corresponding points, and |M| is the number of matching sub-graph pairs.
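Evaluating the sub-graph constraint for given poses can be sketched as follows, under the assumption that each pose is a 4×4 homogeneous matrix; the representation and function names are illustrative:

```python
import numpy as np

def apply_pose(T, p):
    # Apply a 4x4 homogeneous transform to a 3-D point.
    return T[:3, :3] @ p + T[:3, 3]

def subgraph_constraint(T_i, T_j, Tf_i, Tf_j, pairs):
    # Sum of squared distances between corresponding points, with p expressed
    # in the matching frame of sub-graph i and q in that of sub-graph j.
    return sum(np.linalg.norm(apply_pose(T_i @ Tf_i, p)
                              - apply_pose(T_j @ Tf_j, q)) ** 2
               for p, q in pairs)
```

Chaining the sub-graph pose with the in-sub-graph frame pose places both points in the global frame before the distance is taken.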
In the loop detection performed on the history rooms according to the instance labels contained in the new room, a history room that belongs to the same room as the new room is obtained, and the new room and that history room form a set of loops. The gravity-center pairs participating in the random sample consensus fitting within one set of loops constrain the two rooms, and all looped room sets together form the room constraint term, expressed as:

E_{room} = \frac{1}{|L|} \sum_{(i,j) \in L} \sum_{(p,q) \in I_{ij}} \left\| R_i\, p - R_j\, q \right\|^2

where L represents the set of all looped room groups, R_i represents the pose of room i, R_j represents the pose of room j that forms a loop with room i, I_{ij} is the set of inlier pairs, and |L| is the number of looped room groups.
In this embodiment, when the range of the reconstructed scene is large, errors in visual feature matching may cause the reconstructed ground to bend. Through the semantic information of the sub-graphs, the ground plane of each sub-graph and of each room can be extracted, and the constraint in the gravity direction can be realized by aligning the normal vectors of the ground. The gravity constraint term is calculated as:

E_g = \frac{1}{|S|} \sum_{i \in S} \left\| \mathrm{rot}(T_i)\, n_i - G \right\|^2 + \frac{1}{|R|} \sum_{i \in R} \left\| \mathrm{rot}(R_i)\, n_i - G \right\|^2

where \mathrm{rot}(\cdot) denotes the rotation part of a pose, G is the gravity direction, n_i is the normal vector of the ground, T_i represents the pose of sub-graph i, R_i represents the pose of room i, |S| denotes the potential of the sub-graph set, i.e., the number of sub-graphs, and |R| denotes the potential of the room set, i.e., the number of rooms.
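The residuals of the gravity constraint, one per sub-graph or room ground plane, can be sketched as follows; the choice of gravity direction and the plain residual form are illustrative assumptions:

```python
import numpy as np

def gravity_residuals(rotations, normals, g=np.array([0.0, 0.0, -1.0])):
    # One residual per sub-graph (or room): the ground normal rotated by the
    # pose's rotation part should align with the gravity direction g.
    return np.array([np.linalg.norm(R @ n - g)
                     for R, n in zip(rotations, normals)])

def gravity_energy(rotations, normals):
    # Sum of squared residuals, the quantity the optimizer drives toward zero.
    return float((gravity_residuals(rotations, normals) ** 2).sum())
```

A nonzero energy indicates that some reconstructed ground planes have drifted away from the gravity direction, the bending effect described above.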
The hardware components for implementing the embodiment may be: an RGB-D camera and a portable computer, both connected via a Universal Serial Bus (USB) or bluetooth or wireless network. The RGB-D camera is used for collecting RGBD data, and the portable computer is used for reconstructing a three-dimensional model according to the RGBD data.
According to the technical scheme of this embodiment, a plurality of sub-graphs are constructed from the collected RGBD key frames, and an initial three-dimensional model corresponding to the target scene is constructed based on the plurality of sub-graphs; semantic segmentation and instance segmentation are performed on the initial three-dimensional model to obtain the semantic tags and instance tags contained in the initial three-dimensional model; a room contained in the initial three-dimensional model is detected according to the semantic tags and determined as a new room; loop detection is performed on the history rooms according to the instance tags contained in the new room to obtain a history room that belongs to the same room as the new room; and the poses of the plurality of sub-graphs are optimized, and the target three-dimensional model is constructed based on the plurality of optimized sub-graphs. According to the three-dimensional reconstruction method for a large scene provided by the embodiment of the invention, when the new room and a loop-detected history room belong to the same room, the poses of the plurality of sub-graphs are optimized; three-dimensional reconstruction of a large-scale scene can thereby be realized, and the three-dimensional reconstruction precision is improved.
Example two
Fig. 3 is a schematic structural diagram of a three-dimensional reconstruction apparatus for a large scene according to a second embodiment of the present invention. As shown in fig. 3, the apparatus includes:
the initial three-dimensional model building module 210 is configured to build a plurality of sub-graphs according to the acquired RGBD key frames, and build an initial three-dimensional model corresponding to the target scene based on the plurality of sub-graphs;
a semantic and instance segmentation module 220, configured to perform semantic segmentation and instance segmentation on the initial three-dimensional model to obtain semantic tags and instance tags included in the initial three-dimensional model;
a new room determining module 230, configured to detect, according to the semantic tag, a room included in the initial three-dimensional model, and determine the room as a new room;
a loop detection module 240, configured to perform loop detection on the history room according to the instance tag included in the new room, so as to obtain a history room belonging to the same room as the new room;
and the pose optimization module 250 is configured to optimize the poses of the plurality of sub-graphs and construct a target three-dimensional model based on the plurality of optimized sub-graphs.
Optionally, the initial three-dimensional model building module 210 is further configured to:
constructing a first set number of RGBD key frames into a sub-graph;
storing the subgraph into a set queue;
when the set queue is full, sending the earliest sub-graph stored in the set queue to an external memory for storage, simplifying that sub-graph, and storing the simplified sub-graph in a set cache;
and constructing an initial three-dimensional model corresponding to the target scene based on the subgraphs in the set queue and the simplified subgraphs.
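The queue-and-cache bookkeeping described above can be sketched as follows (the queue size and the simplification step are assumptions; a real system would, for example, voxel-downsample the sub-graph's point cloud):

```python
from collections import deque

class SubgraphBuffer:
    """Sketch of the set-queue / set-cache policy: full-resolution
    sub-graphs live in a bounded queue; once the queue is full the
    oldest sub-graph is written to external storage and a simplified
    copy of it is kept in a cache."""

    def __init__(self, queue_size, simplify):
        self.queue = deque()        # full-resolution sub-graphs
        self.cache = []             # simplified sub-graphs
        self.disk = []              # stands in for external memory
        self.queue_size = queue_size
        self.simplify = simplify    # e.g. a downsampling function

    def add(self, subgraph):
        if len(self.queue) == self.queue_size:
            oldest = self.queue.popleft()
            self.disk.append(oldest)                 # archive full copy
            self.cache.append(self.simplify(oldest)) # keep reduced copy
        self.queue.append(subgraph)

    def model_inputs(self):
        # the initial model is built from queued + simplified sub-graphs
        return list(self.queue) + list(self.cache)
```

Usage is straightforward: key-frame batches become sub-graphs, each sub-graph is pushed through `add`, and the initial model is rebuilt from `model_inputs()`.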
Optionally, the semantic tags include ground tags and wall tags; the new room determination module 230, further configured to:
determining a plane formed by the three-dimensional points whose semantic label is ground as a target plane;
projecting the three-dimensional points whose semantic label is wall onto the target plane along the normal direction to obtain a two-dimensional structure map;
performing connectivity detection on the two-dimensional structure map; if the three-dimensional points of the wall projected onto the target plane form a closed area, the initial three-dimensional model contains a room;
and determining the three-dimensional space corresponding to the closed area as a new room.
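A minimal sketch of this floor-projection and connectivity check, assuming the ground normal is the z axis and using a fixed-resolution occupancy grid (the grid size and resolution are illustrative, not taken from the text):

```python
import numpy as np
from scipy import ndimage

def detect_rooms(wall_points, resolution=0.05, grid_size=200):
    """Project wall-labelled 3D points along the ground normal onto a
    2D occupancy grid, then find free-space connected components; any
    component fully enclosed by wall cells (i.e. not touching the grid
    border) is taken as a closed area, namely a room."""
    grid = np.zeros((grid_size, grid_size), dtype=bool)
    for x, y, _z in wall_points:            # drop height: project to plane
        i, j = int(x / resolution), int(y / resolution)
        if 0 <= i < grid_size and 0 <= j < grid_size:
            grid[i, j] = True               # wall-occupied cell
    free = ~grid
    labels, n = ndimage.label(free)         # connectivity detection
    border = (set(labels[0, :]) | set(labels[-1, :])
              | set(labels[:, 0]) | set(labels[:, -1]))
    # components not reaching the border are enclosed by walls -> rooms
    return [k for k in range(1, n + 1) if k not in border]
```

For instance, a square ring of wall points yields exactly one enclosed component, which would then be promoted to a new room.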
Optionally, the loop detection module 240 is further configured to:
acquiring a first embedded vector set of at least one instance contained in a new room and a second embedded vector set of at least one instance contained in a historical room;
calculating the similarity of the new room and the historical room based on the first embedded vector set and the second embedded vector set;
sorting the similarities in descending order, and determining the top second set number of historical rooms as target rooms;
and performing geometric verification on the new room and the target room to obtain a historical room which belongs to the same room as the new room.
Optionally, the loop detection module 240 is further configured to:
acquiring first three-dimensional subgraphs respectively corresponding to at least one instance contained in a new room, and inputting the first three-dimensional subgraphs into a set convolutional neural network to obtain a first embedded vector set;
and acquiring second three-dimensional subgraphs respectively corresponding to at least one instance contained in the history room, and inputting the second three-dimensional subgraphs into a set convolutional neural network to obtain a second embedded vector set.
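The embedding network is described only at the level of its layer types; a toy dense stand-in could look like the following (channel widths, depth, and embedding size are all assumptions, and the sparse convolutional layer is replaced by an ordinary `Conv3d` for brevity):

```python
import torch
import torch.nn as nn

class EmbeddingNet(nn.Module):
    """Toy stand-in for the 'set convolutional neural network':
    a voxelized instance sub-graph passes through 3D convolutions,
    a max pooling layer, and a fully-connected layer that produces
    the embedded vector."""
    def __init__(self, embed_dim=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(1, 8, 3, padding=1), nn.ReLU(),
            nn.Conv3d(8, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveMaxPool3d(1),          # maximum pooling layer
        )
        self.fc = nn.Linear(16, embed_dim)    # fully-connected layer

    def forward(self, voxels):                # (B, 1, D, H, W)
        x = self.conv(voxels).flatten(1)
        return self.fc(x)
```

Each instance's three-dimensional sub-graph would be voxelized to a fixed grid before being fed to the network, and the resulting vectors form the first or second embedded vector set.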
Optionally, the set convolutional neural network includes a sparse convolutional layer, a common convolutional layer, a maximum pooling layer, and a fully-connected layer; and the set convolutional neural network is trained according to the following loss function:

L = max( d(f(P_i), f(P_i')) - d(f(P_i), f(P_j)) + m, 0 )

wherein d(·,·) is a Euclidean distance function, f(·) represents the set convolutional neural network, P_i and P_j are the point cloud data of the three-dimensional sub-graphs of two randomly selected instances, P_i' is the point cloud obtained by randomly rotating P_i along the z axis, and m is a boundary (margin) parameter.
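This loss can be read as a margin-based triplet loss (the pairing is an inference from the text: the z-rotated copy of P_i serves as the positive and P_j as the negative); a sketch on precomputed embeddings:

```python
import numpy as np

def triplet_loss(f_pi, f_pi_rot, f_pj, margin=1.0):
    """Margin-based triplet loss sketch: pull a sub-graph's embedding
    toward the embedding of its z-axis-rotated copy (positive pair)
    and push it away from a different instance's embedding (negative
    pair) by at least the margin m. Inputs are embedding vectors f(P)."""
    d_pos = np.linalg.norm(f_pi - f_pi_rot)   # d(f(P_i), f(P_i'))
    d_neg = np.linalg.norm(f_pi - f_pj)       # d(f(P_i), f(P_j))
    return max(0.0, d_pos - d_neg + margin)
```

The rotation augmentation encourages the embedding to be invariant to the unknown yaw of the camera trajectory within a room.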
Optionally, the loop detection module 240 is further configured to:
calculating the distance between each first embedded vector in the first embedded vector set and each second embedded vector in the second embedded vector set to obtain a plurality of distance values;
determining a first number of embedded vectors contained in the first set of embedded vectors and a second number of embedded vectors contained in the second set of embedded vectors;
calculating the product of the first number of embedded vectors and the second number of embedded vectors;
and dividing the sum of the distance values by the product to obtain the similarity between the new room and the historical room.
Optionally, the loop detection module 240 is further configured to:
calculating Euclidean distances between each first embedded vector in the first embedded vector set and each second embedded vector in the second embedded vector set;
calculating the distance according to the Euclidean distance according to the following formula:
D(x_i, y_j) = c, if d(x_i, y_j) < τ; 0, otherwise

wherein x_i is the i-th first embedded vector, y_j is the j-th second embedded vector, d is the Euclidean distance, τ is a first set value, and c is a constant.
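Combining the per-pair rule with the sum-over-pairs quotient described above, the room similarity could be sketched as follows (the threshold `tau` and constant `c` stand for the "first set value" and the constant, whose actual values the text does not give; the indicator-style reading, in which pairs closer than the threshold contribute `c` and all others contribute 0, is an assumption):

```python
import numpy as np

def room_similarity(first_set, second_set, tau=0.5, c=1.0):
    """Similarity of a new room to a historical room: for every pair of
    embedded vectors (one from each set), add c if their Euclidean
    distance is below tau, else 0; then divide the sum by the product
    of the two set sizes. Larger values mean more matching instances."""
    total = 0.0
    for x in first_set:
        for y in second_set:
            d = np.linalg.norm(np.asarray(x) - np.asarray(y))
            total += c if d < tau else 0.0
    return total / (len(first_set) * len(second_set))
```

Under this reading, sorting similarities in descending order and keeping the top rooms naturally selects the historical rooms sharing the most instances with the new room.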
Optionally, the loop detection module 240 is further configured to:
for each target room, forming an embedded vector pair from the first embedded vector and the second embedded vector that are closest in Euclidean distance, so as to obtain a plurality of embedded vector pairs;
acquiring the three-dimensional sub-graph pair corresponding to each embedded vector pair; the three-dimensional sub-graph pair consists of a first three-dimensional sub-graph and a second three-dimensional sub-graph;
extracting the centers of gravity of the first and second three-dimensional sub-graphs in each pair to obtain a plurality of center-of-gravity pairs;
fitting a transformation matrix to the plurality of center-of-gravity pairs according to a random sample consensus (RANSAC) algorithm;
and if the proportion of center-of-gravity pairs participating in the fitting (the inliers) exceeds a second set value, determining that the target room and the new room are the same room.
Optionally, the pose optimization module 250 is further configured to:
acquiring a sub-graph constraint term, a room constraint term, and a gravity constraint term;
and optimizing the poses of the plurality of sub-graphs based on the sub-graph constraint term, the room constraint term, and the gravity constraint term.
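The three constraint terms can be combined into a single objective; a highly simplified sketch follows (poses are reduced to 3D translations, weights are unity, and the residual forms are assumptions — a real system would optimise SE(3) poses with a nonlinear least-squares solver):

```python
import numpy as np

def total_cost(subgraph_poses, room_poses, ground_normals,
               subgraph_edges, room_edges,
               gravity=np.array([0.0, 0.0, 1.0])):
    """Joint objective sketch: a sub-graph term over relative-pose
    constraints between sub-graphs, a room term tying each sub-graph
    to its room's pose, and a gravity term aligning every estimated
    ground normal with the gravity direction."""
    cost = 0.0
    for i, j, rel in subgraph_edges:        # sub-graph constraint term
        cost += np.sum((subgraph_poses[j] - subgraph_poses[i] - rel) ** 2)
    for i, r, rel in room_edges:            # room constraint term
        cost += np.sum((subgraph_poses[i] - room_poses[r] - rel) ** 2)
    for n in ground_normals:                # gravity constraint term
        n = n / np.linalg.norm(n)
        cost += 1.0 - float(n @ gravity)    # zero when aligned
    return cost
```

A configuration that satisfies all three kinds of constraints drives the cost to zero; the optimizer adjusts the sub-graph (and room) poses to minimize it.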
The device can execute the methods provided by all the embodiments of the invention, and has corresponding functional modules and beneficial effects for executing the methods. For details not described in detail in this embodiment, reference may be made to the methods provided in all the foregoing embodiments of the present invention.
EXAMPLE III
Fig. 4 is a schematic structural diagram of a computer device according to a third embodiment of the present invention. FIG. 4 illustrates a block diagram of a computer device 312 suitable for implementing embodiments of the present invention. The computer device 312 shown in FIG. 4 is only an example and should not impose any limitation on the functionality or scope of use of embodiments of the present invention. Device 312 is typically a computing device that performs the function of three-dimensional scene reconstruction.
As shown in FIG. 4, computer device 312 is in the form of a general purpose computing device. The components of computer device 312 may include, but are not limited to: one or more processors 316, a storage device 328, and a bus 318 that couples the various system components including the storage device 328 and the processors 316.
Bus 318 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, an enhanced ISA bus, a Video Electronics Standards Association (VESA) local bus, and a Peripheral Component Interconnect (PCI) bus.
Computer device 312 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by computer device 312 and includes both volatile and nonvolatile media, removable and non-removable media.
Storage 328 may include computer system readable media in the form of volatile Memory, such as Random Access Memory (RAM) 330 and/or cache Memory 332. The computer device 312 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 334 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 4, and commonly referred to as a "hard drive"). Although not shown in FIG. 4, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a Compact disk-Read Only Memory (CD-ROM), a Digital Video disk (DVD-ROM), or other optical media) may be provided. In these cases, each drive may be connected to bus 318 by one or more data media interfaces. Storage 328 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
Program 336 having a set (at least one) of program modules 326 may be stored, for example, in storage 328, such program modules 326 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which may comprise an implementation of a network environment, or some combination thereof. Program modules 326 generally carry out the functions and/or methodologies of embodiments of the invention as described herein.
The computer device 312 may also communicate with one or more external devices 314 (e.g., keyboard, pointing device, camera, display 324, etc.), with one or more devices that enable a user to interact with the computer device 312, and/or with any devices (e.g., network card, modem, etc.) that enable the computer device 312 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 322. Also, computer device 312 may communicate with one or more networks (e.g., a Local Area Network (LAN), Wide Area Network (WAN), etc.) and/or a public Network, such as the internet, via Network adapter 320. As shown, network adapter 320 communicates with the other modules of computer device 312 via bus 318. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the computer device 312, including but not limited to: microcode, device drivers, Redundant processing units, external disk drive Arrays, disk array (RAID) systems, tape drives, and data backup storage systems, to name a few.
The processor 316 executes programs stored in the storage 328 to perform various functional applications and data processing, such as implementing the three-dimensional reconstruction method of a large scene provided by the above-described embodiments of the present invention.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (12)

1. A three-dimensional reconstruction method of a large scene is characterized by comprising the following steps:
constructing a plurality of sub-graphs according to the collected RGBD key frames, and constructing an initial three-dimensional model corresponding to a target scene based on the plurality of sub-graphs;
performing semantic segmentation and example segmentation on the initial three-dimensional model to obtain semantic tags and example tags contained in the initial three-dimensional model;
detecting a room contained in the initial three-dimensional model according to the semantic tag, and determining the room as a new room;
performing loop detection on the historical room according to the instance tag contained in the new room to obtain the historical room which belongs to the same room as the new room;
optimizing the poses of the sub-images, and constructing a target three-dimensional model based on the optimized sub-images.
2. The method of claim 1, wherein constructing multiple subgraphs from the collected RGBD keyframes comprises:
constructing a first set number of RGBD key frames into a sub-graph;
storing the subgraph into a set queue;
when the set queue is full, sending the earliest sub-graph stored in the set queue to an external memory for storage, simplifying that sub-graph, and storing the simplified sub-graph in a set cache;
correspondingly, constructing an initial three-dimensional model corresponding to the target scene based on the plurality of sub-images comprises:
and constructing an initial three-dimensional model corresponding to the target scene based on the subgraphs in the set queue and the simplified subgraphs.
3. The method of claim 1, wherein the semantic tags include floor tags and wall tags; detecting a room included in the initial three-dimensional model according to the semantic tags, and determining the room as a new room, wherein the method comprises the following steps:
determining a plane formed by three-dimensional points with semantic labels as the ground as a target plane;
projecting the three-dimensional points with semantic labels as walls to the target plane along the normal direction to obtain a two-dimensional structure chart;
performing connectivity detection on the two-dimensional structure diagram, wherein if a closed area is formed by three-dimensional points of a wall projected to the target plane, the initial three-dimensional model comprises a room;
and determining the three-dimensional space corresponding to the closed area as a new room.
4. The method of claim 1, wherein performing loop detection on the history room according to an instance tag included in the new room, and obtaining the history room belonging to the same room as the new room comprises:
acquiring a first embedded vector set of at least one instance contained in the new room and a second embedded vector set of at least one instance contained in a historical room;
calculating a similarity of the new room to a historical room based on the first set of embedded vectors and the second set of embedded vectors;
sorting the similarities in descending order, and determining the top second set number of historical rooms as target rooms;
and performing geometric verification on the new room and the target room to obtain a historical room which belongs to the same room as the new room.
5. The method of claim 4, wherein obtaining a first embedded vector of at least one instance contained in the new room and a second embedded vector of at least one instance contained in the history room comprises:
acquiring first three-dimensional subgraphs respectively corresponding to at least one instance contained in the new room, and inputting the first three-dimensional subgraphs into a set convolutional neural network to obtain a first embedded vector set;
and acquiring second three-dimensional subgraphs respectively corresponding to at least one instance contained in the history room, and inputting the second three-dimensional subgraphs into the set convolutional neural network to obtain a second embedded vector set.
6. The method of claim 5, wherein the set convolutional neural network comprises a sparse convolutional layer, a normal convolutional layer, a maximum pooling layer, and a fully-connected layer; and the set convolutional neural network is trained according to the following loss function:

L = max( d(f(P_i), f(P_i')) - d(f(P_i), f(P_j)) + m, 0 )

wherein d(·,·) is a Euclidean distance function, f(·) represents the set convolutional neural network, P_i and P_j are the point cloud data of the three-dimensional sub-graphs of two randomly selected instances, P_i' is the point cloud obtained by randomly rotating P_i along the z axis, and m is a boundary (margin) parameter.
7. The method of claim 4, wherein computing the similarity of the new room to a historical room based on the first set of embedded vectors and the second set of embedded vectors comprises:
calculating the distance between each first embedded vector in the first embedded vector set and each second embedded vector in the second embedded vector set to obtain a plurality of distance values;
determining a first number of embedded vectors contained in the first set of embedded vectors and a second number of embedded vectors contained in the second set of embedded vectors;
calculating a product of the first number of embedded vectors and the second number of embedded vectors;
and dividing the sum of the distance values by the product to obtain the similarity between the new room and the historical room.
8. The method of claim 7, wherein calculating a distance between each first embedded vector in the first set of embedded vectors and each second embedded vector in the second set of embedded vectors comprises:
calculating Euclidean distances between each first embedded vector in the first embedded vector set and each second embedded vector in the second embedded vector set respectively;
calculating the distance according to the Euclidean distance according to the following formula:
D(x_i, y_j) = c, if d(x_i, y_j) < τ; 0, otherwise

wherein x_i is the i-th first embedded vector, y_j is the j-th second embedded vector, d is the Euclidean distance between the first embedded vector and the second embedded vector, τ is a first set value, and c is a constant.
9. The method of claim 5, wherein geometrically validating the new room and the target room to obtain a historical room belonging to the same room as the new room comprises:
for each target room, forming an embedded vector pair from the first embedded vector and the second embedded vector that are closest in Euclidean distance, so as to obtain a plurality of embedded vector pairs;
acquiring a three-dimensional sub-image pair corresponding to the embedded vector pair; the three-dimensional subgraph pair consists of a first three-dimensional subgraph and a second three-dimensional subgraph;
extracting the gravity centers of a first three-dimensional subgraph and a second three-dimensional subgraph in the three-dimensional subgraph pair to obtain a plurality of gravity center pairs;
fitting a transformation matrix to the plurality of center-of-gravity pairs according to a random sample consensus (RANSAC) algorithm;
and if the proportion of the gravity center pair participating in the fitting exceeds a second set value, the target room and the new room are the same room.
10. The method of any of claims 1-9, wherein optimizing the pose of the plurality of sub-graphs comprises:
acquiring a sub-graph constraint term, a room constraint term, and a gravity constraint term;
optimizing the poses of the plurality of sub-graphs based on the sub-graph constraint term, the room constraint term, and the gravity constraint term.
11. An apparatus for three-dimensional reconstruction of large scenes, comprising:
the initial three-dimensional model building module is used for building a plurality of sub-graphs according to the collected RGBD key frames and building an initial three-dimensional model corresponding to the target scene based on the sub-graphs;
the semantic and example segmentation module is used for performing semantic segmentation and example segmentation on the initial three-dimensional model to obtain semantic tags and example tags contained in the initial three-dimensional model;
a new room determining module, configured to detect, according to the semantic tag, a room included in the initial three-dimensional model, and determine the room as a new room;
the loop detection module is used for carrying out loop detection on the historical room according to the example label contained in the new room to obtain the historical room which belongs to the same room as the new room;
and the pose optimization module is used for optimizing the poses of the sub-images and constructing a target three-dimensional model based on the optimized sub-images.
12. A computer device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the method for three-dimensional reconstruction of a large scene according to any one of claims 1 to 10.
CN202111529101.4A 2021-12-15 2021-12-15 Three-dimensional reconstruction method, device and equipment for large scene Pending CN113920256A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111529101.4A CN113920256A (en) 2021-12-15 2021-12-15 Three-dimensional reconstruction method, device and equipment for large scene


Publications (1)

Publication Number Publication Date
CN113920256A true CN113920256A (en) 2022-01-11

Family

ID=79248810

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111529101.4A Pending CN113920256A (en) 2021-12-15 2021-12-15 Three-dimensional reconstruction method, device and equipment for large scene

Country Status (1)

Country Link
CN (1) CN113920256A (en)


Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111815776A (en) * 2020-02-04 2020-10-23 山东水利技师学院 Three-dimensional building fine geometric reconstruction method integrating airborne and vehicle-mounted three-dimensional laser point clouds and streetscape images


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
TIAN ZHENG et al.: "Building Fusion: Semantic-aware Structural Building-scale 3D Reconstruction", IEEE Transactions on Pattern Analysis and Machine Intelligence (Early Access) *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114494610A (en) * 2022-04-14 2022-05-13 清华大学 Intelligent understanding system and device for real-time reconstruction of large scene light field
CN114494610B (en) * 2022-04-14 2022-08-02 清华大学 Intelligent understanding system and device for real-time reconstruction of large scene light field
CN117274536A (en) * 2023-11-22 2023-12-22 北京飞渡科技股份有限公司 Live-action three-dimensional model reconstruction method and device
CN117274536B (en) * 2023-11-22 2024-02-20 北京飞渡科技股份有限公司 Live-action three-dimensional model reconstruction method and device

Similar Documents

Publication Publication Date Title
Qi et al. Review of multi-view 3D object recognition methods based on deep learning
Arth et al. Wide area localization on mobile phones
JP6765487B2 (en) Computer implementation methods using artificial intelligence, AI systems, and programs
CN108509848B (en) The real-time detection method and system of three-dimension object
WO2021175050A1 (en) Three-dimensional reconstruction method and three-dimensional reconstruction device
Aubry et al. Painting-to-3D model alignment via discriminative visual elements
Sun et al. A dataset for benchmarking image-based localization
JP5722502B2 (en) Planar mapping and tracking for mobile devices
EP2770783B1 (en) A wearable information system having at least one camera
EP3408848A1 (en) Systems and methods for extracting information about objects from scene information
US20170013195A1 (en) Wearable information system having at least one camera
Pan et al. Rapid scene reconstruction on mobile phones from panoramic images
CN113920256A (en) Three-dimensional reconstruction method, device and equipment for large scene
CN110363077A (en) Sign Language Recognition Method, device, computer installation and storage medium
US20220222832A1 (en) Machine learning framework applied in a semi-supervised setting to perform instance tracking in a sequence of image frames
CN112927363A (en) Voxel map construction method and device, computer readable medium and electronic equipment
Hirzer et al. Smart hypothesis generation for efficient and robust room layout estimation
Shalaby et al. Algorithms and applications of structure from motion (SFM): A survey
Zhang et al. Research on 3D architectural scenes construction technology based on augmented reality
Alam et al. A review of recurrent neural network based camera localization for indoor environments
Ding et al. An efficient 3D model retrieval method based on convolutional neural network
CN113284237A (en) Three-dimensional reconstruction method, system, electronic equipment and storage medium
Geng et al. SANet: A novel segmented attention mechanism and multi-level information fusion network for 6D object pose estimation
US20220398283A1 (en) Method for fast and better tree search for reinforcement learning
Yang et al. MLFNet-point cloud semantic segmentation convolution network based on multi-scale feature fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20220111