CN115953535A - Three-dimensional reconstruction method and device, computing equipment and storage medium - Google Patents

Three-dimensional reconstruction method and device, computing equipment and storage medium

Info

Publication number: CN115953535A
Application number: CN202310001732.1A
Authority: CN (China)
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 姜博文, 邬书哲, 杨培麟
Current Assignee: Shenzhen Huawei Cloud Computing Technology Co., Ltd.
Prior art keywords: scene, sub-class, geometric transformation
Classifications

  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a three-dimensional reconstruction method, a three-dimensional reconstruction device, a computing device and a computer-readable storage medium. The method comprises the following steps: acquiring scene data collected by an acquisition device for an original scene; dividing the scene data into a plurality of sub-classes, wherein each sub-class comprises internal nodes exclusive to the sub-class and common nodes shared with adjacent sub-classes, and a sub-class is a set of images, voxels, point clouds and/or meshes; determining the relative geometric transformation relationship between the local coordinate systems of two adjacent sub-classes according to the common nodes between each pair of adjacent sub-classes; determining the absolute geometric transformation relationship of each sub-class's local coordinate system relative to a global coordinate system according to the relative geometric transformation relationships of the sub-classes; and constructing a three-dimensional model of the original scene according to the absolute geometric transformation relationships of the sub-classes. The method and the device can accurately restore the motion trajectory of the acquisition device and form an accurate three-dimensional model.

Description

Three-dimensional reconstruction method and device, computing equipment and storage medium
Technical Field
The present application relates to the field of machine vision technologies, and in particular, to a three-dimensional reconstruction method, a three-dimensional reconstruction apparatus, a computing device, and a computer-readable storage medium.
Background
Three-dimensional reconstruction techniques are known in the art. Existing three-dimensional reconstruction techniques struggle to balance reconstruction cost, speed and precision, suffering from excessive cost, low reconstruction speed, or insufficient precision of the reconstructed three-dimensional model. Some three-dimensional reconstruction techniques (e.g., partitioned offline reconstruction techniques) attempt to address these issues. However, when the sub-classes are aggregated, the acquisition-device trajectories of the sub-classes are not well aligned, so the reconstructed three-dimensional model still has low accuracy.
Disclosure of Invention
An object of the present application is to provide a three-dimensional reconstruction method, a three-dimensional reconstruction apparatus, a computing device, and a computer-readable storage medium that can balance reconstruction cost, speed, and accuracy, and can achieve good alignment of the acquisition-device trajectories among the sub-classes.
In one aspect, the present application provides a three-dimensional reconstruction method, including: acquiring scene data collected by an acquisition device for an original scene; dividing the scene data into a plurality of sub-classes, wherein each sub-class comprises internal nodes exclusive to the sub-class and common nodes shared with adjacent sub-classes, and a sub-class is a set of images, voxels, point clouds and/or meshes; determining the relative geometric transformation relationship between the local coordinate systems of two adjacent sub-classes according to the common nodes between each pair of adjacent sub-classes; determining the absolute geometric transformation relationship of each sub-class's local coordinate system relative to a global coordinate system according to the relative geometric transformation relationships of the sub-classes; and constructing a three-dimensional model of the original scene according to the absolute geometric transformation relationships of the sub-classes.
According to the method and the device, the absolute pose of each sub-class is solved and optimized through the relative geometric transformation relationships between the sub-classes, which avoids jointly optimizing the two strongly coupled variables of relative geometric transformation and camera pose, thereby achieving higher reconstruction efficiency and better alignment of the camera trajectories.
According to a particular embodiment of the present application, the dividing of the scene data into a plurality of sub-classes comprises: the scene data is divided into a plurality of sub-classes by a graph cut algorithm.
According to this embodiment, the idea of the graph cut algorithm is to divide the data into two parts such that the connection between the two parts is minimal while the connection inside each part is maximal. Classifying the scene data with the graph cut algorithm yields a plurality of sub-classes with a high degree of internal data association while keeping the connection nodes between sub-classes few, which facilitates accurate and efficient segmentation.
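As an illustration of this idea (a minimal sketch, not the segmentation algorithm prescribed by this application), the following recursively bisects an image match graph so that cut edges stay few; networkx, Kernighan-Lin bisection, the match-count edge weights, and the size threshold are all assumptions made for the example.

```python
import networkx as nx
from networkx.algorithms.community import kernighan_lin_bisection

def split_scene_graph(graph: nx.Graph, max_size: int = 100):
    """Recursively bisect the match graph into sub-classes of bounded size."""
    if graph.number_of_nodes() <= max_size:
        return [set(graph.nodes)]
    # Weighted bisection keeps strongly matched images together and cuts
    # where inter-part connections are weakest.
    part_a, part_b = kernighan_lin_bisection(graph, weight="weight")
    parts = []
    for part in (part_a, part_b):
        parts.extend(split_scene_graph(graph.subgraph(part).copy(), max_size))
    return parts

# Nodes are image ids; an edge weight is the number of feature matches.
g = nx.Graph()
g.add_weighted_edges_from([("img0", "img1", 250), ("img1", "img2", 180),
                           ("img2", "img3", 40), ("img3", "img4", 300)])
print(split_scene_graph(g, max_size=3))
```

In the scheme of this application, nodes incident to the cut edges would then be shared by both adjacent sub-classes as common nodes, while the remaining nodes stay internal.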
According to a particular embodiment of the present application, there are a plurality of common nodes between two adjacent sub-classes. Determining the relative geometric transformation relationship between the local coordinate systems of two adjacent sub-classes according to the common nodes between each pair of adjacent sub-classes comprises: determining a plurality of relative geometric transformation relationships between the local coordinate systems of the two adjacent sub-classes according to the plurality of common nodes between each pair of adjacent sub-classes, the plurality of relative geometric transformation relationships forming a group of relative geometric transformation relationships. Determining the absolute geometric transformation relationship of a sub-class's local coordinate system relative to the global coordinate system according to the relative geometric transformation relationships of the sub-classes comprises: determining the absolute geometric transformation relationship of the sub-class's local coordinate system relative to the global coordinate system according to the multiple groups of relative geometric transformation relationships of the sub-classes.
According to this embodiment, because a plurality of common nodes exist between sub-classes, a plurality of possible transformation relationships exist. Taking all of them into consideration and optimizing them together makes it possible to accurately determine the true transformation relationships between sub-classes, better connect the camera motion trajectories across sub-classes, and construct an accurate three-dimensional model.
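For concreteness, a minimal sketch of estimating one such relative transformation (scale s, rotation R, translation t) between two sub-class coordinate systems from the 3D positions their common nodes take in each system is given below; the closed-form Umeyama/Kabsch alignment used here is an assumption for illustration, not an estimator prescribed by the application.

```python
import numpy as np

def relative_similarity(p_a: np.ndarray, p_b: np.ndarray):
    """Find s, R, t with p_a ~ s * R @ p_b + t; p_a and p_b are (N, 3)
    arrays holding the same common nodes expressed in sub-classes A and B."""
    mu_a, mu_b = p_a.mean(axis=0), p_b.mean(axis=0)
    qa, qb = p_a - mu_a, p_b - mu_b
    # SVD of the cross-covariance gives the optimal rotation (Kabsch step).
    u, sing, vt = np.linalg.svd(qa.T @ qb)
    d = np.sign(np.linalg.det(u @ vt))          # guard against reflections
    rot = u @ np.diag([1.0, 1.0, d]) @ vt
    scale = (sing * [1.0, 1.0, d]).sum() / (qb ** 2).sum()
    trans = mu_a - scale * rot @ mu_b
    return scale, rot, trans

# Self-check against a synthetic similarity transform.
rng = np.random.default_rng(0)
b = rng.normal(size=(8, 3))
r_true = np.linalg.qr(rng.normal(size=(3, 3)))[0]
if np.linalg.det(r_true) < 0:
    r_true[:, 0] *= -1.0
a = 2.0 * b @ r_true.T + np.array([1.0, -2.0, 0.5])
s, r, t = relative_similarity(a, b)             # recovers s = 2.0, r, t
```

With several common nodes, different subsets of correspondences yield different candidate transformations, giving the group of relative transformations mentioned above.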
According to a particular embodiment of the present application, determining the absolute geometric transformation relationship of a sub-class's local coordinate system with respect to the global coordinate system based on the multiple groups of relative geometric transformation relationships of the sub-classes comprises: calculating the absolute geometric transformation relationship of the sub-class's local coordinate system relative to the global coordinate system through an L1 loss function according to the multiple groups of relative geometric transformation relationships of the sub-classes.
According to this embodiment, optimizing the multiple groups of relative geometric transformation relationships among all the sub-classes with an L1 loss function enables simple and efficient computation; setting the optimization target of minimum L1 loss makes the whole optimization process fast and controllable, achieving accurate determination of the pose of each sub-class.
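The following sketch shows one way such an L1 objective can be minimized, using the scale component as an example: in log space the relative scale constraints are linear, so least absolute deviations reduces to a small linear program. The concrete formulation and the scipy-based solver are assumptions for illustration; rotations and translations would be handled by analogous L1 problems.

```python
import numpy as np
from scipy.optimize import linprog

def absolute_log_scales(n_sub, measurements):
    """measurements: list of (i, j, m) with log S_j - log S_i ~ m.
    Returns log-scales with sub-class 0 fixed as the gauge (log S_0 = 0)."""
    m_cnt = len(measurements)
    n_free = n_sub - 1
    # Unknowns: [s_1..s_{n-1}, t_1..t_m]; minimise the sum of slacks t_k.
    c = np.concatenate([np.zeros(n_free), np.ones(m_cnt)])
    A_ub, b_ub = [], []
    for k, (i, j, meas) in enumerate(measurements):
        row = np.zeros(n_free + m_cnt)
        if j > 0:
            row[j - 1] += 1.0
        if i > 0:
            row[i - 1] -= 1.0
        row[n_free + k] = -1.0                    # subtract the slack t_k
        A_ub.append(row.copy()); b_ub.append(meas)    #  (s_j - s_i) - t <= m
        row[:n_free] *= -1.0
        A_ub.append(row); b_ub.append(-meas)          # -(s_j - s_i) - t <= -m
    bounds = [(None, None)] * n_free + [(0, None)] * m_cnt
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub), bounds=bounds)
    return np.concatenate([[0.0], res.x[:n_free]])

# Three sub-classes, one slightly inconsistent loop of relative scales.
log_s = absolute_log_scales(3, [(0, 1, 0.69), (1, 2, 0.41), (0, 2, 1.05)])
print(np.exp(log_s))   # absolute scales relative to sub-class 0
```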
According to a particular embodiment of the present application, constructing a three-dimensional model of the original scene from the absolute geometric transformation relationships of the plurality of sub-classes comprises: determining the pose of each sub-class's acquisition device and the three-dimensional points in the global coordinate system according to the sub-class's absolute geometric transformation relationship; and constructing the three-dimensional model of the original scene according to the poses of the acquisition devices and the three-dimensional points of the plurality of sub-classes.
According to this embodiment, the pose of the acquisition device (the camera pose) is computed from the absolute geometric transformation relationship, and the three-dimensional points are then computed. This saves computing resources and avoids the convergence difficulty caused by jointly optimizing the two mutually coupled variables of geometric transformation and camera pose.
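A minimal sketch of this step, mapping a sub-class's locally reconstructed 3D points and camera poses into the global frame with its absolute transformation (s, R, t), is shown below; the camera-from-world pose convention (as used in a projection matrix) is an assumption.

```python
import numpy as np

def to_global(points_local, cam_rot_cw, cam_center, s, R, t):
    """points_local: (N, 3) 3D points in the sub-class's local frame;
    cam_rot_cw: camera-from-world rotation of one camera in that frame;
    cam_center: that camera's centre in the local frame."""
    points_global = s * points_local @ R.T + t     # x_g = s * R @ x_l + t
    center_global = s * R @ cam_center + t         # centres transform like points
    rot_cw_global = cam_rot_cw @ R.T               # orientation composes with R
    return points_global, rot_cw_global, center_global
```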
According to a particular embodiment of the present application, the three-dimensional reconstruction method further comprises: obtaining a model area to be updated, sketched by a user on the three-dimensional model of the original scene; deleting the model area to be updated; merging the scene data of the remaining model areas into a pre-update sub-class; dividing the scene data of the updated scene into a plurality of update sub-classes; selecting common nodes between the pre-update sub-class and the update sub-classes on the boundary of the remaining model areas; merging the pre-update sub-class and the plurality of update sub-classes into a plurality of post-update sub-classes; and constructing a three-dimensional model of the updated scene according to the plurality of post-update sub-classes.
This embodiment provides an updating method for the three-dimensional model. Because the three-dimensional model of the original scene is obtained by the above three-dimensional reconstruction method, updating the three-dimensional model can be realized through the steps of this embodiment, making the operation simpler and quicker. Specifically, the user only needs to select the model area to be updated or replaced and select common nodes between the sub-classes of the updated scene and the sub-class of the original scene to update the model. The method and the device can support heterogeneous data and ensure that the updated scene and the original scene are well fused.
According to a particular embodiment of the present application, the three-dimensional reconstruction method further comprises: acquiring a boundary between the three-dimensional model of the extended scene and the three-dimensional model of the original scene; merging the scene data of the original scene into a pre-extension sub-class; dividing the scene data of the extended scene into a plurality of extension sub-classes; selecting common nodes between the pre-extension sub-class and the extension sub-classes on the boundary; merging the pre-extension sub-class and the plurality of extension sub-classes into a plurality of post-extension sub-classes; and constructing a three-dimensional model of the extended scene according to the plurality of post-extension sub-classes.
The embodiment provides an expansion method of a three-dimensional model. The expansion method is based on the three-dimensional reconstruction method, so that simpler and quicker scene expansion can be realized. Specifically, the user only needs to select a common node between the subclass of the expanded scene and the subclass of the original scene, and the expansion of the three-dimensional model can be realized. The expansion method can support heterogeneous data and can ensure that the expanded scene and the original scene are well fused.
According to a particular embodiment of the present application, the relative geometric transformation relationship includes a relative scale transformation relationship, a relative rotation transformation relationship, and a relative translation transformation relationship, and the absolute geometric transformation relationship includes an absolute scale transformation relationship, an absolute rotation transformation relationship, and an absolute translation transformation relationship.
According to this embodiment, specifying the geometric transformation as scale, rotation, and translation transformations further clarifies its meaning, simplifies its computation, and makes its implementation simpler and more efficient.
According to a particular embodiment of the application, the acquisition device comprises one or more of an inertial sensor, a lidar, an ultrasonic radar, a millimeter wave radar, a visible light camera and an infrared camera.
According to this embodiment, a user can employ one or more acquisition devices to collect data of a scene; no matter which combination is adopted, the three-dimensional reconstruction method of the application can construct the three-dimensional model, so the application scenarios of the application are broader.
In another aspect, the present application provides a three-dimensional reconstruction apparatus, comprising: a first acquisition module configured to acquire scene data collected by an acquisition device for an original scene; a first segmentation module configured to divide the scene data into a plurality of sub-classes, wherein each sub-class comprises internal nodes exclusive to the sub-class and common nodes shared with adjacent sub-classes; a first determining module configured to determine the relative geometric transformation relationship between the local coordinate systems of two adjacent sub-classes according to the common nodes between each pair of adjacent sub-classes; a second determining module configured to determine the absolute geometric transformation relationship of each sub-class's local coordinate system relative to the global coordinate system according to the relative geometric transformation relationships of the plurality of sub-classes; and a first construction module configured to construct a three-dimensional model of the original scene according to the absolute geometric transformation relationships of the sub-classes.
According to a particular embodiment of the present application, the first segmentation module is further configured to: the scene data is divided into a plurality of sub-classes by a graph cut algorithm.
According to a particular embodiment of the present application, there are a plurality of common nodes between two adjacent sub-classes, the first determining module is further configured to: determining a plurality of relative geometric transformation relations between the local coordinate systems of two adjacent sub-classes to form a group of relative geometric transformation relations according to a plurality of common nodes between every two adjacent sub-classes in the plurality of sub-classes. Wherein the second determination module is further configured to: and determining the absolute geometric transformation relation of the local coordinate system of the subclass relative to the global coordinate system according to the multiple groups of relative geometric transformation relations of the subclasses.
According to a particular embodiment of the present application, the second determination module is further configured to: and calculating the absolute geometric transformation relation of the local coordinate system of the subclass relative to the global coordinate system through an L1 loss function according to a plurality of groups of relative geometric transformation relations of the subclass.
According to a particular embodiment of the application, the first construction module is further configured to: determine the pose of each sub-class's acquisition device and the three-dimensional points in the global coordinate system according to the sub-class's absolute geometric transformation relationship; and construct the three-dimensional model of the original scene according to the poses of the acquisition devices and the three-dimensional points of the plurality of sub-classes.
According to a particular embodiment of the present application, the three-dimensional reconstruction apparatus further comprises: a second acquisition module configured to acquire a model area to be updated, sketched by a user on the three-dimensional model of the original scene; a deletion module configured to delete the model area to be updated; a first merging module configured to merge the scene data of the remaining model areas into a pre-update sub-class; a second segmentation module configured to divide the scene data of the updated scene into a plurality of update sub-classes; a first selection module configured to select common nodes between the pre-update sub-class and the update sub-classes on the boundary of the remaining model areas; a second merging module configured to merge the pre-update sub-class and the plurality of update sub-classes into a plurality of post-update sub-classes; and a second construction module configured to construct a three-dimensional model of the updated scene according to the plurality of post-update sub-classes.
According to a particular embodiment of the present application, the three-dimensional reconstruction apparatus further comprises: a third acquisition module configured to acquire a boundary between the three-dimensional model of the extended scene and the three-dimensional model of the original scene; a third merging module configured to merge the scene data of the original scene into a pre-extension sub-class; a third segmentation module configured to divide the scene data of the extended scene into a plurality of extension sub-classes; a second selection module configured to select common nodes between the pre-extension sub-class and the extension sub-classes on the boundary; a fourth merging module configured to merge the pre-extension sub-class and the plurality of extension sub-classes into a plurality of post-extension sub-classes; and a third construction module configured to construct a three-dimensional model of the extended scene according to the plurality of post-extension sub-classes.
In another aspect, the present application provides a computing device comprising a processor and a memory, the processor being configured to execute a computer program stored in the memory to implement the three-dimensional reconstruction method described above.
In another aspect, the present application provides a computer-readable storage medium storing a computer program for executing the above-described three-dimensional reconstruction method.
In another aspect, the present application provides a computer program product comprising program code which, when executed by a computer, causes the computer to implement the three-dimensional reconstruction method described above.
Any of the apparatuses, computer-readable storage media, or computer program products provided above is configured to execute the corresponding method provided above, so for the beneficial effects it can achieve, reference may be made to the beneficial effects of the corresponding schemes in the corresponding methods, which are not repeated here.
Drawings
Embodiments of the present application are described in detail below with reference to the attached drawing figures, wherein:
FIG. 1 illustrates a logical architecture diagram of various embodiments of the present application;
fig. 2 shows a schematic flow diagram of a three-dimensional reconstruction method according to an embodiment of the present application;
fig. 3 illustrates a reconstruction effect diagram of the three-dimensional reconstruction method according to the embodiment of fig. 2;
fig. 4 shows a schematic flow diagram of a three-dimensional reconstruction method according to another embodiment of the present application;
fig. 5 shows a schematic flow diagram of a three-dimensional reconstruction method according to another embodiment of the present application;
fig. 6 shows a schematic structural diagram of a three-dimensional reconstruction apparatus according to an embodiment of the present application;
fig. 7 shows a schematic structural diagram of a three-dimensional reconstruction apparatus according to another embodiment of the present application;
fig. 8 shows a schematic structural diagram of a three-dimensional reconstruction apparatus according to another embodiment of the present application;
FIG. 9 shows a schematic structural diagram of a computing device according to an embodiment of the present application.
Detailed Description
The present application is described in detail below with reference to specific embodiments in order to make the concept and idea of the present application more clearly understood by those skilled in the art. It is to be understood that the embodiments presented herein are only a few of all embodiments that the present application may have. Those skilled in the art who review this disclosure will readily appreciate that many modifications, variations, and alternatives are possible in part or in whole for the embodiments discussed below, and such modifications, variations, and alternatives are contemplated as being within the scope of the claimed invention.
As used herein, the terms "a," "an," and the like are not intended to mean that there is only one of the described items, but rather that the description is directed to only one of the described items, which may have one or more. As used herein, the terms "comprises," "comprising," and other similar words are intended to refer to logical interrelationships, and are not to be construed as referring to spatial structural relationships. For example, "a includes B" is intended to mean that logically B belongs to a, and not that spatially B is located inside a. Furthermore, the terms "comprising," "including," and other similar words are to be construed as open-ended, rather than closed-ended. For example, "a includes B" is intended to mean that B belongs to a, but B does not necessarily constitute all of a, and a may also include other elements such as C, D, E, and the like.
As used herein, the terms "first," "second," and the like are not intended to imply any order, quantity, or importance, but rather are used to distinguish one element from another. The terms "embodiment," "present embodiment," "an embodiment," "one embodiment," and "one embodiment" herein do not mean that the pertinent description applies to only one particular embodiment, but rather that the description may apply to yet another embodiment or embodiments. Those of skill in the art will understand that any of the descriptions given herein for one embodiment may be substituted, combined, or otherwise combined with the descriptions given herein for one or more other embodiments, as new embodiments may be created by those of skill in the art, and are intended to fall within the scope of the present application.
In the embodiments of the present application, three-dimensional reconstruction may refer to building a mathematical model of a three-dimensional object suitable for computer representation and processing; it is the basis for processing, operating on, and analyzing the properties of three-dimensional objects in a computer environment, and is also a key technology for building virtual reality that expresses the objective world in a computer. For example, three-dimensional reconstruction may refer to the mathematical processes and computer techniques that recover the three-dimensional information (shape, etc.) of an object from two-dimensional projections or images. In computer vision, three-dimensional reconstruction refers to the process of reconstructing three-dimensional information from single-view or multi-view images. Since the information in a single view is incomplete, three-dimensional reconstruction from it requires empirical knowledge. Multi-view three-dimensional reconstruction (similar to human binocular perception) is comparatively easy: the camera is first calibrated, i.e., the relationship between the camera's image coordinate system and the world coordinate system is computed, and the three-dimensional information is then reconstructed from the information in the multiple two-dimensional images.
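For reference, camera calibration ties the image and world coordinate systems together through the standard pinhole projection model (this general formula is textbook background, not a formula specific to this application):

```latex
% A world point (X, Y, Z) projects to pixel (u, v) through the intrinsic
% matrix K and the camera pose (extrinsics) [R | t]:
\[
  \lambda \begin{pmatrix} u \\ v \\ 1 \end{pmatrix}
  = K \, [\, R \mid t \,]
    \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix},
  \qquad
  K = \begin{pmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix},
\]
% where \lambda is the projective depth. Calibration estimates K (and lens
% distortion); [R | t] then relates world coordinates to image coordinates.
```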
Three-dimensional reconstruction is a hot topic in the fields of computer vision and computer graphics, and three-dimensional scene reconstruction is one of its most critical technologies. A three-dimensional map of a scene (mainly a three-dimensional scene representation containing point clouds, visual feature points and their descriptors, carrying position information and mainly used for positioning, navigation, and the like) and a three-dimensional model (usually a three-dimensional scene representation containing surfaces, textures and even materials, consistent with the appearance of the real scene and mainly used for visual display, virtual-real superposition, and the like) have wide practical application and extremely high application value. In particular, in the field of extended reality (XR), applications such as augmented reality (AR) inspection, mixed reality (MR) urban navigation, MR smart business districts, holographic conferences, and remote assistance all require a three-dimensional map and model of the scene as a basis. In the field of robotics, positioning, navigation, and route planning of mobile robots are inseparable from a three-dimensional map of the scene and depend strongly on the capability to construct large-scale, high-precision three-dimensional scene maps. With the rise of the metaverse concept, application scenarios such as shopping in virtual scenes, autonomous driving simulation, and open-world games all demand the capability to efficiently reconstruct large-scale three-dimensional scenes. Smart cities and similar scenarios also require modeling of large-scale scenes such as campuses and cities, which can serve as the visual base of digital twins for urban management, planning, measurement, simulation, and the like.
Three-dimensional scene reconstruction usually uses devices such as three-dimensional laser scanners, single-lens reflex cameras, and panoramic cameras to collect original scene data; other devices include inertial measurement units/inertial navigation units/inertial sensors (IMU), depth cameras, mobile phones, and the like, and multiple sensors may be used, simultaneously or over multiple sessions, to collect different kinds of data for three-dimensional scene reconstruction. With the collected data as input, three-dimensional scene reconstruction generally uses technologies such as laser simultaneous localization and mapping (SLAM), visual-inertial SLAM, and structure from motion (SFM) to construct the three-dimensional map and three-dimensional model.
Three-dimensional reconstruction based on the various SLAM technologies belongs to online reconstruction or online mapping: reconstruction efficiency is high and scanning (data collection) and reconstruction can proceed at the same time, but precision is limited, only temporally continuous data (such as video or continuous laser point-cloud frames) can be used, and the process is easily affected by factors such as rapid movement. Laser SLAM benefits from hardware advantages and can ensure higher geometric accuracy, but the equipment is expensive and not portable, making large-scale adoption difficult. SFM-based three-dimensional reconstruction belongs to offline reconstruction or offline mapping: the computation is heavy and cannot be performed in real time or near real time, and reconstruction starts only after data collection is complete, but reconstruction precision is high and unordered image data can be used. SFM mainly takes images as input, so a three-dimensional map based on visual features can naturally be constructed alongside the point cloud; the equipment requirements are low, and a portable panoramic camera, single-lens reflex camera, mobile phone, or the like can be used, in which case laser point-cloud data can additionally be used to enhance geometric accuracy. In summary, some three-dimensional scene reconstruction methods in the art rely on expensive and heavy hardware, or their accuracy is hard to guarantee, or their reconstruction efficiency is clearly insufficient, so many difficulties and problems often arise in practical applications.
After the three-dimensional map and model of a scene are reconstructed, because the real world changes constantly and the scene is likely to keep expanding, the problem of scene updating (local replacement and expansion) also arises: after the scene changes, how to locally replace the corresponding part of the three-dimensional map and model; when the scene needs to be expanded, how to merge the new three-dimensional map and model into the existing ones; and, after the three-dimensional scene is updated, how to ensure that the replaced and newly added parts remain well consistent with the original three-dimensional scene, without obvious fractures, dislocations, or ghosting of the 3D points in space.
To update an existing three-dimensional scene, the most direct method is to perform three-dimensional reconstruction again after new data is collected. Although this method maintains the consistency of the three-dimensional scene well, the computation is heavy, and the data collected before and after must be homogeneous, that is, collected by the same type of device and of the same kind. Another scheme reconstructs from the newly collected data and then replaces part of the existing three-dimensional scene with the new one or splices the new scene onto the existing one. This scheme can support heterogeneous data, but the consistency of seamless splicing of the three-dimensional scene is difficult to guarantee; with manual operation, deviations easily occur at the edges of the update, and automated methods have obvious shortcomings in efficiency and accuracy.
Based on the above analysis, some technical solutions in the art are significantly insufficient in one or more aspects of usability, efficiency, accuracy, and the like of three-dimensional scene reconstruction, and are difficult to effectively support updating and expansion of three-dimensional scenes, and especially under the condition of heterogeneous data, efficiency and accuracy are relatively limited.
Some three-dimensional scene reconstruction techniques in the art can be classified into two broad categories, namely online reconstruction and offline reconstruction, wherein offline reconstruction schemes can be further classified into two categories, namely global reconstruction and partitioned reconstruction. In addition, some techniques separately consider the problem of updating the three-dimensional scene.
Online reconstruction mainly refers to scene reconstruction using the various SLAM methods, in which scene reconstruction proceeds synchronously while a sensor captures the scene (collects scene data). In the robotics field, online reconstruction is the most common means of scene reconstruction: a robot carries one or more sensors, environmental data scanning and three-dimensional map construction proceed simultaneously, and the three-dimensional map is generated in real time while the robot operates; laser SLAM usually constructs a three-dimensional point-cloud map of the scene using laser equipment.
Such methods perform feature matching on data from adjacent frames within a time window and compute and optimize the pose (camera pose/image pose, i.e., the three-dimensional position and three-dimensional rotation of an image, or equivalently the transformation of the 3D points visible in the image into the world coordinate system) using methods such as iterative closest point (ICP), Kalman filtering, and graph optimization, so as to perform registration, obtain a local three-dimensional scene, and then gradually generate the whole three-dimensional scene. To reduce equipment cost while ensuring reconstruction accuracy, some techniques in the art, on the basis of a single-line lidar, extract features such as curvature uniformly within regions and remove problematic feature points.
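As one concrete instance of the registration step named above, a minimal point-to-point ICP sketch follows; the k-d-tree nearest-neighbour matching and the fixed iteration count are simplifying assumptions (practical systems add outlier rejection and convergence checks).

```python
import numpy as np
from scipy.spatial import cKDTree

def icp(src, dst, iters=20):
    """Align source point cloud (N, 3) to destination (M, 3); returns R, t."""
    R, t = np.eye(3), np.zeros(3)
    tree = cKDTree(dst)
    cur = src.copy()
    for _ in range(iters):
        _, idx = tree.query(cur)            # nearest-neighbour correspondences
        d = dst[idx]
        mu_s, mu_d = cur.mean(0), d.mean(0)
        # Kabsch/SVD step for the best incremental rigid transform.
        u, _, vt = np.linalg.svd((cur - mu_s).T @ (d - mu_d))
        fix = np.diag([1.0, 1.0, np.sign(np.linalg.det(vt.T @ u.T))])
        dR = vt.T @ fix @ u.T
        dt = mu_d - dR @ mu_s
        cur = cur @ dR.T + dt
        R, t = dR @ R, dR @ t + dt          # accumulate the total transform
    return R, t
```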
Offline reconstruction mainly refers to scene reconstruction based on the SFM method, where reconstruction is performed after data collection is complete. Such schemes usually take an image set as input, extract feature points on the images and the corresponding feature descriptors, perform image matching to obtain the matching relationships between images, and then employ algorithms such as perspective-n-point (PnP), triangulation, and bundle adjustment (BA) to estimate the camera poses and the 3D point coordinates of the scene. Pose estimation is the core problem of SFM and the basis for computing the (sparse or dense) three-dimensional model of the scene. Typical SFM methods include incremental SFM and global SFM.
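As a small concrete fragment of this pipeline, the sketch below performs feature matching, relative pose recovery, and triangulation for a single calibrated image pair using OpenCV; the intrinsic matrix K and the input images are assumed given, and a full SFM system would continue with PnP registration of further views and bundle adjustment. This is an illustration, not the method of this application.

```python
import cv2
import numpy as np

def two_view_points(img1, img2, K):
    """Reconstruct 3D points from one calibrated grayscale image pair."""
    orb = cv2.ORB_create(4000)
    k1, d1 = orb.detectAndCompute(img1, None)
    k2, d2 = orb.detectAndCompute(img2, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
    p1 = np.float32([k1[m.queryIdx].pt for m in matches])
    p2 = np.float32([k2[m.trainIdx].pt for m in matches])
    # Essential matrix with RANSAC, then the relative camera pose.
    E, mask = cv2.findEssentialMat(p1, p2, K, method=cv2.RANSAC, threshold=1.0)
    _, R, t, mask = cv2.recoverPose(E, p1, p2, K, mask=mask)
    # Triangulate with the first camera fixed at the origin.
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
    P2 = K @ np.hstack([R, t])
    X = cv2.triangulatePoints(P1, P2, p1.T, p2.T)      # homogeneous, 4 x N
    return (X[:3] / X[3]).T, R, t
```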
The incremental SFM method selects a pair of initial images to reconstruct, then gradually selects new images to register to the existing model, and performs global optimization through BA after registering one or more images; representative methods include Photo Tourism, VisualSFM, and adaptive structure from motion (ASMM). Incremental reconstruction is generally inefficient, because registration must proceed image by image and a global BA must be executed for each registered image to reduce error accumulation, and drift is still difficult to avoid.
The global SFM instead uses a motion averaging method to solve the global poses of all cameras at once and then executes a global BA; representative methods include one-dimensional structure from motion (1DSFM) and least unsquared deviations (LUD). Motion averaging is a key technology of global SFM and includes rotation averaging and translation averaging, e.g., efficient and robust large-scale rotation averaging (ERLS-RA). Compared with incremental SFM, global SFM considers all spatially adjacent relative constraints from a global perspective and can avoid the drift caused by error accumulation. A representative method is global structure from motion by similarity averaging (GSA-SFM), which solves the absolute pose of each star graph (and thus of each camera in it) in the global coordinate system. Specifically, locally adjacent cameras are formed into star graphs; the centers of spatially adjacent star graphs i and j are contained in both star graphs, so the relative scale, relative rotation, and relative translation between the coordinate systems of star graphs i and j can be computed from these shared cameras. The relative constraints between all spatially adjacent star graphs are then assembled into a system of equations, and by minimizing the L1 loss, the absolute pose of each star graph, and hence of each camera, in the global coordinate system is solved.
SFM is mostly geometry-based; there are also deep-learning-based methods, but their application range is limited and they remain at the exploratory stage. Deep-learning-based methods fall roughly into two types. One uses a network to predict the depth map and camera pose, separately or jointly, minimizing photometric consistency, and introduces geometric constraints to assist the optimization of the directly predicted depth map and camera pose; a representative method is BA-Net. The other directly predicts voxels, point clouds, or meshes, with methods such as the deep signed distance function DeepSDF, OctNet, the deformation network DeformNet, and the 3D recurrent reconstruction neural network 3D-R2N2.
Although the offline reconstruction scheme has high precision, for large-scale scenes the large data volume leads to low reconstruction efficiency and long reconstruction times. To address this, some methods adopt a partitioned reconstruction scheme whose main idea is to cluster the images, divide the scene into different local scenes, reconstruct the local scenes separately (usually with an incremental SFM method), and then fuse the local three-dimensional scenes (usually with a motion averaging algorithm) to obtain the final complete three-dimensional scene. Representative methods include graph-cut-based and spanning-tree-based methods. This scheme combines, to some extent, the advantages of incremental SFM and global SFM, and increases reconstruction speed while maintaining robustness.
Representative methods for partitioned offline reconstruction include graph-partition-based efficient large-scale structure from motion (DivConq-SFM) and very-large-scale structure from motion via distributed motion averaging (VLSG-SFM). Taking VLSG-SFM as an example, the scene is cut into multiple subgraphs corresponding to local scenes, local reconstruction is performed on each subgraph in a distributed manner (estimating the pose of each camera in the sub-class's local coordinate system), and global motion averaging is then performed alternately on the inter nodes (i.e., the common/boundary nodes shared between subgraphs of the overall scene) and the intra nodes (i.e., internal nodes) within the subgraphs, simultaneously solving the transformation from each local coordinate system to the global coordinate system and the global poses. The main steps are:
1) Extract feature points from all images of the whole scene, perform image matching by feature point matching, and construct an edge graph (EG), i.e., a graph built from the image matching results whose edges are the matching relationships between image pairs.
2) Partition the entire EG into multiple subgraphs using a graph cut algorithm.
3) Locally reconstruct each subgraph in a distributed manner, estimating the local poses of all cameras in each sub-class.
4) Use the DivConq-SFM algorithm to compute the scale and similarity transformations from each sub-class coordinate system to the global coordinate system, and aggregate the sub-classes into the whole as the initial value for the motion averaging (MA) loss optimization of step 5.
5) Construct local MA loss functions, i.e., local rotation averaging (LRA) and local translation averaging (LTA), for the intra-intra pairs and intra-inter pairs connected within each local sub-class; construct global MA loss functions, i.e., global rotation averaging (GRA) and global translation averaging (GTA), for the inter-inter pairs shared between globally spatially adjacent sub-classes and the inter-intra pairs within each sub-class. Alternately perform LRA/LTA and GRA/GTA optimization, optimizing the global poses of the inter cameras and the scale and similarity transformations from the local sub-class coordinate systems to the global coordinate system.
In practice, due to changes in the real scene or in the scene's range, a reconstructed three-dimensional scene often needs to be updated, including local replacement or expansion of the three-dimensional scene. The most direct update method is to perform three-dimensional reconstruction again after new data is collected; although this maintains the consistency of the three-dimensional scene well, the computation is heavy, and the data collected before and after must be homogeneous, i.e., collected by the same type of device and of the same kind.
Another method registers and fuses newly collected data with the existing three-dimensional scene: the updated parts are first determined by setting an image-frame age threshold, computing the point-cloud difference, or manual selection; then point-cloud or image registration is performed, and the new and old three-dimensional scenes are fused through a rollback mapping process, down-sampled merging, octree storage, and the like, so as to update the three-dimensional scene. Such schemes fuse the data directly and can only handle homogeneous data.
In another scheme, after reconstruction from the newly collected data, part of the existing three-dimensional scene is replaced by the new three-dimensional scene, or the new scene is spliced onto the existing one. This scheme can in theory support heterogeneous data, but lacks a corresponding design; moreover, the consistency of seamless splicing of the three-dimensional scene is difficult to guarantee (with manual operation, deviations easily occur at the edges of the update), and automated methods have obvious shortcomings in efficiency and accuracy.
The disadvantages of the online reconstruction schemes are: 1) Laser-SLAM-based schemes are usually applied to real-time robot positioning and mapping, or to MR spatial mapping and scene model construction where laser equipment is available; they depend on special equipment, are costly, and are not portable, so they cannot be adopted at scale. In addition, laser data is unfriendly to constructing maps based on visual features, and an extra camera is needed for calibration. 2) Visual-SLAM-based schemes (including visual-inertial SLAM and the like) emphasize online operation and real-time performance, and therefore adopt lightweight features and optimization, making reconstruction accuracy hard to guarantee and limiting their application to three-dimensional scene updating. Moreover, online reconstruction can only be used for temporally continuous data such as video or continuous laser point-cloud frames, so its application range is limited.
The drawbacks of the global offline reconstruction schemes are: 1) Incremental-SFM-based schemes are sensitive to the selection of the initial image pair, accumulate errors and drift during iterative image registration, and must execute BA repeatedly, so the computation is heavy and reconstruction takes long; for example, reconstructing a scene of 6000 images with the open-source software COLMAP can take more than two weeks. 2) Global-SFM-based schemes are easily affected by matching outliers; they are computationally efficient but less accurate, and when the three-dimensional scene needs to be updated, all data must be re-collected, while global re-optimization from zero is heavyweight and lacks agility. 3) Deep-learning-based schemes are currently at the exploratory stage; deep networks are mainly used for depth and pose estimation, methods that directly predict voxels usually depend on large-scale labeled data and suit only small indoor scenes, and when the scene must be expanded, the limited generalization of deep networks often forces collecting data from the expanded area and retraining the network.
The shortcomings of the partitioned offline reconstruction schemes are: although these schemes balance reconstruction speed and precision, they mainly support image-based reconstruction; the adopted optimization suffers from sensitivity to scale initialization and from strong coupling between the two types of variables being solved; heterogeneous data is not supported; and there is no scheme for three-dimensional scene updating.
For example, consider the representative method VLSG-SFM: it iteratively performs local and global motion-averaging optimization at camera granularity, so the scale of the optimized variables is large, and its loss function jointly optimizes two types of mutually influencing variables, namely the camera poses and the similarity transformations between sub-classes. The strong coupling of these variables makes the optimization process difficult to converge (an ill-posed problem), so reconstruction efficiency and accuracy cannot be guaranteed. Specifically:
1) When the global loss function constructed for merging the subgraphs is optimized, the ground-truth values in the translation loss depend on the relative scales between sub-classes and the relative translations between cameras within sub-classes computed during initialization. The initially computed relative scales between sub-classes are inaccurate (the relative scale between some sub-classes often deviates severely from the true value), and since an inter node belongs to two or more sub-classes, the choice of which sub-class's relative translation to use introduces deviation into the ground-truth translation used in the loss; the relative translations between cameras within a sub-class may also be inaccurate. The ground truth in the translation loss is therefore highly unreliable and can hardly guide the optimization of the global loss function correctly, so the camera trajectories are difficult to align after aggregating the sub-classes, fractures occur, and the 3D points exhibit ghosting.
2) The global motion-averaging loss function contains two types of mutually influencing variables, the camera poses and the similarity transformations between sub-classes, which are strongly coupled and optimized together. Because both types of variables are inaccurate after initialization and there is no definite, correct optimization direction during optimization, problems such as slow optimization, difficulty in converging, and falling into local extrema arise, and reconstruction fails.
Some three-dimensional scene update schemes in the art have the following disadvantages. The three schemes above are all general-purpose reconstruction methods with no consideration of or special design for three-dimensional scene updating, so they are unfriendly to updates. The scheme that re-reconstructs the whole scene with newly collected data has low efficiency and poor scalability, and its reconstruction reliability degrades as the scene grows. Fusing newly collected data into the original scene one by one through matching improves efficiency somewhat but still requires homogeneous data, limiting its application range. The scheme that reconstructs a local three-dimensional scene independently from new data must replace or splice it into the existing three-dimensional scene; some schemes in the art lack an effective overall optimization method, so the consistency of seamless splicing is hard to guarantee (manual operation easily produces deviations at the updated edges), and automated optimization in the style of VLSG-SFM has obvious shortcomings in efficiency and accuracy.
In large scenes, some three-dimensional scene reconstruction schemes in the art are difficult to balance in terms of reconstruction cost, speed, and precision: they rely on expensive laser equipment (laser-based online reconstruction), or can hardly guarantee reconstruction accuracy (image-based online reconstruction), or have low reconstruction efficiency (global offline reconstruction). When a reconstructed scene needs local updating or outward boundary expansion, some reconstruction schemes in the art lack a suitable three-dimensional scene update mechanism, and the from-zero update scheme (which re-collects data) suffers from low efficiency, poor scalability, dependence on homogeneous data, and the like.
Building on the partitioned offline reconstruction scheme, which requires only images as input and offers high reconstruction accuracy, the present application fundamentally solves the problem of camera trajectory alignment during sub-class aggregation by redesigning the optimization framework and the optimization functions while satisfying the relative constraints of globally spatially adjacent cameras, notably accelerating convergence, achieving more accurate large-scene reconstruction, and striking a balance among cost, speed, and accuracy. On the basis of partitioned reconstruction, a hierarchical, decoupled optimization scheme is designed: on the one hand, it is naturally suited to locally replacing or outwardly expanding an existing scene; on the other hand, it can support heterogeneous data while ensuring the accuracy and consistency of the updated three-dimensional scene.
Building on the partitioned offline reconstruction scheme, the embodiments of the present application design a new three-dimensional scene reconstruction system that reconstructs faster, solves the camera-trajectory fracture and 3D-point ghosting problems that easily occur when aggregating sub-regions after partitioned reconstruction, naturally provides a mechanism for updating (replacing and expanding) the three-dimensional scene, and also supports heterogeneous data.
As shown in fig. 1, the core of the embodiments of the present application is a new hierarchical decoupling optimization framework. Specifically, given all the data of a real scene, after matching, the data is divided into a plurality of data subsets, correspondingly yielding a plurality of local scenes; each local scene is called a subgraph, a data point (such as an image) shared between spatially adjacent subgraphs is an inter node, and a node inside a subgraph is an intra node. The hierarchical decoupling optimization framework has the following two main points:
1) Each subgraph is treated as a whole, and the transformation relationships between spatially adjacent subgraphs, such as scale transformation and similarity transformation, are computed via the multiple inter nodes between them. The computation considers the whole scene globally: constraint relationships are constructed and an optimization problem is established over the transformations (scale, similarity) between all spatially adjacent subgraphs, for example by building a linear equation system from the relationship between absolute and relative poses, and the absolute scale, absolute rotation, and origin coordinates of each subgraph's (local) coordinate system in the global coordinate system are then solved by minimizing an L1 loss.
2) The scale and similarity transformation from the subgraph coordinate system to the global coordinate system (i.e., the absolute pose of the subgraph coordinate system's origin in the global coordinate system) is fixed, the local poses of the subgraph's intra nodes are fixed, and the absolute poses of the subgraph's inter nodes in the global coordinate system are optimized; the constraint relationships can be constructed and the optimization problem established through the relative poses between inter and intra nodes within each subgraph.
From these two points, the framework constructs and solves optimization problems at the abstract subgraph level and inside the subgraphs (the fine, data-point level) respectively, and the optimizations at the two levels are decoupled from each other. Under this design, the optimization problem at each level is well defined, and the scales of the optimization terms and parameters are controllable, so the method converges better and faster than other methods in the art, improving both reconstruction speed and reconstruction precision. A subgraph can be a local scene divided from the overall scene, a local scene to be replaced, or a newly expanded local scene, so the framework suits the first reconstruction of a three-dimensional scene and naturally supports updating and expanding an existing one. Meanwhile, owing to its hierarchical and decoupled nature, in the case of heterogeneous data each subgraph can be reconstructed independently; the relative poses between local regions are then computed from the local absolute poses of the inter nodes shared between all spatially adjacent regions and serve as constraints for globally optimizing the subgraphs' local coordinate systems, whose global absolute poses are obtained through the global optimization, ensuring that each local scene is accurately fused into the overall scene. In the case of homogeneous data, the global poses at the data-point level can be further optimized, finely adjusting the global poses of the data points at the junctions of local scenes.
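A deliberately tiny, translation-only toy of the two decoupled levels is sketched below (an expository assumption: the application additionally solves scale and rotation and uses an L1 objective, whereas the toy uses a median and a mean).

```python
import numpy as np

# Local 1D coordinates inside two subgraphs; c1 and c2 are shared inter nodes.
sub_a = {"a1": 0.0, "c1": 2.0, "c2": 3.0}
sub_b = {"c1": 0.1, "c2": 1.0, "b1": 2.5}

# Level 1 (subgraph granularity): solve subgraph B's origin in the global
# frame from the offsets implied by each shared inter node; subgraph A is
# the gauge. The median stands in for a robust L1-style combination.
offsets = [sub_a[n] - sub_b[n] for n in ("c1", "c2")]   # [1.9, 2.0]
origin_b = float(np.median(offsets))                    # 1.95

# Level 2 (data-point granularity): freeze the origins and the intra nodes,
# and refine only the inter nodes against their estimates from every
# subgraph that contains them.
for n in ("c1", "c2"):
    estimates = [0.0 + sub_a[n], origin_b + sub_b[n]]
    print(n, "global position:", np.mean(estimates))
```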
The embodiments of the present application can be deployed as a cloud platform in the cloud, deployed offline, deployed as a software-hardware all-in-one machine, or provided as a set of mutually cooperating cloud service application program interfaces (APIs).
The embodiments of the application can be applied to the reconstruction of outdoor scenes (parks, cities, rural streets, buildings, and the like) and indoor scenes. A panoramic camera may be used to collect scene images (the red track in the figure above is the trajectory of the camera during collection), and other photographing equipment or laser equipment may also be used to collect data for reconstructing a three-dimensional scene.
Embodiments of the present application have many possible product forms, including but not limited to a cloud platform, desktop software, a software-hardware all-in-one machine, or a set of mutually cooperating cloud service APIs. In these product forms, the system mainly provides users with functions such as data uploading/importing, user delineation of the area to be updated, three-dimensional scene updating (replacement and expansion), from-scratch reconstruction of large-scale new areas, hierarchical optimization options (abstract sub-graph granularity and fine camera granularity), and visualization of the hierarchical optimization results (camera trajectories and 3D points).
The embodiments of the application are suitable for mapping with a panoramic-camera acquisition mode. They are also suitable for mapping with information obtained from different acquisition devices such as laser scanners, lidar, depth cameras, monocular cameras, and binocular cameras, and different feature matching algorithms can be selected for different acquisition modes. An optimization scheme that adds sensors on the basis of the embodiments of the present application is also conceivable: by fusing the data of multiple sensors on top of the reconstruction scheme of the embodiments, pose information is calculated and optimized to obtain a more accurate multi-sensor global pose.
A three-dimensional reconstruction method according to an embodiment of the present application is described below with reference to fig. 2 and 3. The three-dimensional reconstruction method may be performed by a computing device.
As shown in fig. 2, according to the present embodiment, the three-dimensional reconstruction method includes steps S210 to S250. Each step will be described in detail below.
S210, scene data acquired by the acquisition equipment aiming at the original scene are acquired.
The original scene may be an original scene that has not been updated, expanded or altered, and the purpose of three-dimensional reconstruction is to first model the original scene in three dimensions.
The capture device may be any device that can capture information about the shape, appearance, and location of the scene. For three-dimensional reconstruction, common acquisition devices include inertial sensors, lidar, ultrasonic radar, millimeter-wave radar, visible light cameras, and infrared cameras. The data collected by the inertial sensor are direction and position data about a scene or a part of the scene, the data collected by the laser radar are point clouds, the data collected by the visible light camera are images, and the data collected by the infrared camera are infrared images. The visible light camera may be further classified into a panoramic camera, a depth camera, a monocular camera, a binocular camera, and the like.
S220, dividing the scene data into a plurality of sub-classes, wherein each sub-class in the plurality of sub-classes comprises an internal node exclusive to the sub-class and a common node shared with an adjacent sub-class, and the sub-classes are a set of images, voxels, point clouds and/or grids.
A subclass (or subgraph) may refer to a collection of data that is composed of data that have some similarity or proximity in the scene data. When the data is a two-dimensional image, the subclass may be a set of a plurality of two-dimensional images. When the data is a laser point cloud, the subclass may be a collection of multiple three-dimensional points (or multiple point cloud regions). When the data is a three-dimensional image, the sub-class may be a set of multiple voxels. When the data is acquired by other acquisition devices such as millimeter wave radar, ultrasonic radar, and/or inertial sensor, the subclass may be a set of multiple grids.
An internal node (intra node) is a data node inside a certain subclass, which is specific to that subclass. The common node (inter node) is a data node shared between two adjacent subclasses. When the data is an image (visible light image, infrared image, etc.), the node may be an image. When the data is a point cloud, the node may be three-dimensional coordinate information of a plurality of points.
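For illustration, the relationship between a subclass, its intra nodes, and its inter nodes can be captured in a small data structure; the following Python sketch uses hypothetical class, field, and identifier names:

```python
from dataclasses import dataclass, field

@dataclass
class SubClass:
    """A subclass (sub-graph): a set of data nodes in one local coordinate frame."""
    sub_id: int
    intra_nodes: set = field(default_factory=set)    # nodes exclusive to this subclass
    inter_nodes: dict = field(default_factory=dict)  # neighbour sub_id -> shared node ids

    def shared_with(self, other_id: int) -> set:
        """Nodes this subclass shares with the neighbouring subclass `other_id`."""
        return self.inter_nodes.get(other_id, set())

# Example: subclasses 0 and 1 share images 7 and 8 (their inter nodes).
a = SubClass(0, intra_nodes={1, 2, 3}, inter_nodes={1: {7, 8}})
b = SubClass(1, intra_nodes={9, 10}, inter_nodes={0: {7, 8}})
assert a.shared_with(1) == b.shared_with(0) == {7, 8}
```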
The scene data may be divided into a plurality of sub-classes by a graph cut algorithm.
The graph cut algorithm associates an image segmentation problem with a minimum cut problem on a graph. The image is first mapped to a weighted undirected graph, where each node corresponds to a pixel in the image and each edge connects a pair of adjacent pixels; the weight of an edge represents a non-negative similarity between adjacent pixels in terms of gray level, color, or texture. A segmentation of the image is then a cut of the graph, where each segmented region corresponds to a sub-graph. For example, the basic idea of graph cut is to minimize the cut using a maximum flow algorithm, dividing the elements into two disjoint subsets; graph partitioning is an operation that partitions a weighted directed graph in a data structure, and segmentation may refer to separating an image into foreground and background using a min-cut/max-flow algorithm.
When the scene data is images, the graph cutting process may be as follows: feature points are extracted from all images of the whole scene, image matching is performed by means of feature point matching to construct an edge graph (EG), and the minimum cut is then solved by a maximum flow algorithm, dividing the whole EG into a plurality of sub-graphs.
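For illustration only, the sketch below bisects such an edge graph along a minimum cut; for brevity it uses the Stoer-Wagner global minimum cut from networkx in place of the seeded max-flow formulation described above, and assumes edge weights equal to feature-match counts (all names and values are hypothetical):

```python
import networkx as nx

def bisect_edge_graph(eg: nx.Graph):
    """Split an edge graph into two sub-graphs along a global minimum cut.

    Each node is an image; edge attribute 'weight' holds the match strength
    (e.g. the number of matched feature points), so the minimum cut severs
    the weakest total set of matches between the two resulting sub-graphs.
    """
    cut_value, (part_a, part_b) = nx.stoer_wagner(eg, weight="weight")
    return eg.subgraph(part_a).copy(), eg.subgraph(part_b).copy()

# Toy EG: images 0-2 match strongly among themselves, 3-5 likewise, and the
# two groups are linked only by one weak match between images 2 and 3.
eg = nx.Graph()
eg.add_edges_from([(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)], weight=100)
eg.add_edge(2, 3, weight=3)  # few matched feature points

sub_a, sub_b = bisect_edge_graph(eg)
print(sorted(sub_a.nodes), sorted(sub_b.nodes))  # [0, 1, 2] and [3, 4, 5]
```

Applying such a bisection recursively (with a suitable stop criterion) yields the plurality of sub-graphs; the inter nodes are then generated by expanding each part across the cut.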
And S230, determining the relative geometric transformation relation between the local coordinate systems of the two adjacent sub-classes according to the common node between each two adjacent sub-classes in the plurality of sub-classes.
A geometric transformation converts between two coordinate systems through geometric changes (scaling, rotation, translation, etc.). The relative geometric transformation relationship may represent the relative scale transformation relationship, relative rotation transformation relationship, and relative translation transformation relationship between two sub-classes. The absolute geometric transformation relationship may represent the absolute scale transformation relationship, absolute rotation transformation relationship, and absolute translation transformation relationship of one subclass with respect to the global coordinate system or the entire three-dimensional model.
The local coordinate system may refer to a coordinate system embodied by all data (images, point clouds, etc.) in one subclass. When the data of a subclass is an image, the local coordinate system of the subclass represents a coordinate system capable of determining the relative positions of pixel points in all (two-dimensional) images included in this subclass. For example, the local coordinate system may be a coordinate system in which the optical center of a camera that captures a certain image is the origin and the optical axis is the x-axis. When the sub-class is a point cloud, the local coordinate system of the sub-class may be a coordinate system that is capable of determining the relative positions of all three-dimensional points in this sub-class.
A specific implementation of determining the relative geometric transformation relationship between the local coordinate systems of two adjacent sub-classes according to a common node between each two adjacent sub-classes in the plurality of sub-classes is as follows: since the pose (or geometric transformation) of a common node relative to the local coordinate system of each of the two adjacent sub-classes is known, the relative geometric transformation relationship between the two sub-classes can be found by taking the common node as a bridge.
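For illustration, a minimal numpy sketch of this bridge computation under the world-to-camera convention x_cam = R (x_world - c); it assumes at least two shared cameras between the two subclasses (the second one is needed to recover the relative scale), and all names are hypothetical:

```python
import numpy as np

def relative_similarity(R_a, c_a, R_b, c_b):
    """Relative similarity transform between two subclass frames A and B.

    R_a[k], c_a[k]: orientation (world-to-camera) and center of shared
    camera k in subclass A's local frame; R_b, c_b likewise in B's frame.
    Returns (s, R, t) such that a point x in frame A maps to
    s * R @ x + t in frame B.
    """
    # Scale: ratio of distances between the two shared camera centers.
    s = np.linalg.norm(c_b[1] - c_b[0]) / np.linalg.norm(c_a[1] - c_a[0])
    # Rotation: the shared camera's physical orientation is frame-independent.
    R = R_b[0].T @ R_a[0]
    # Translation: the shared camera center must map onto itself.
    t = c_b[0] - s * R @ c_a[0]
    return s, R, t

# Sanity check with a synthetic ground-truth similarity (s = 2, 90-degree yaw).
s_gt = 2.0
R_gt = np.array([[0., -1., 0.], [1., 0., 0.], [0., 0., 1.]])
t_gt = np.array([1., 2., 3.])
c_a = [np.array([0., 0., 0.]), np.array([1., 0., 0.])]
R_a = [np.eye(3), np.eye(3)]
c_b = [s_gt * R_gt @ c + t_gt for c in c_a]
R_b = [R @ R_gt.T for R in R_a]
s, R, t = relative_similarity(R_a, c_a, R_b, c_b)
assert np.isclose(s, s_gt) and np.allclose(R, R_gt) and np.allclose(t, t_gt)
```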
There may be a plurality of common nodes between each two adjacent subclasses. In this case, a plurality of relative geometric transformation relations constituting a set of relative geometric transformation relations between the local coordinate systems of two adjacent sub-classes may be determined based on a plurality of common nodes between each two adjacent sub-classes of the plurality of sub-classes.
Usually a plurality of common nodes are shared between two adjacent sub-classes, so a plurality of relative geometric transformation relationships can be calculated from them, and taking any single one of them as the true relative geometric transformation relationship between the two sub-classes would be inaccurate. For convenience of description, the plurality of relative geometric transformation relationships calculated between two neighbors through the plurality of common nodes can be regarded as a set of relative geometric transformation relationships.
And S240, determining the absolute geometric transformation relation of the local coordinate system of the subclass relative to the global coordinate system according to the relative geometric transformation relations of the plurality of subclasses.
The relative geometric transformation relationships of the multiple sub-classes may refer to all the relative geometric transformation relationships existing in the sub-classes formed by all the acquired scene data, or may refer to the relative geometric transformation relationships existing between all the sub-classes formed by the data of the part of the scene to be modeled.
The determination of the global coordinate system may be performed as desired. For example, one sub-class of local coordinate system may be selected as the global coordinate system, and then the transformation relationship of all other local coordinate systems with respect to the global coordinate system is solved.
A specific implementation of determining the absolute geometric transformation relationship of each sub-class's local coordinate system relative to the global coordinate system according to the relative geometric transformation relationships of the plurality of sub-classes is as follows: all the relative geometric transformation relationships are substituted into an equation system, and the equation system is then optimized through a loss function, so that a uniquely determined relative geometric transformation relationship between every two adjacent sub-classes is obtained; the absolute geometric transformation relationships of all local coordinate systems are then derived from these uniquely determined relative geometric transformation relationships.
When there are a plurality of common nodes between two adjacent sub-classes, the absolute geometric transformation relationship of the local coordinate system of the sub-class with respect to the global coordinate system can be determined according to the sets of relative geometric transformation relationships of the plurality of sub-classes. The multiple sets of relative geometric transformation relationships of the sub-classes may refer to relative geometric transformation relationships between all adjacent two sub-classes in all the sub-classes, each two adjacent sub-classes having a set of relative geometric transformation relationships therebetween, and thus all the sub-classes (or all the sub-classes that need attention) have multiple sets of relative geometric transformation relationships.
The absolute geometric transformation relationship of the local coordinate system of each subclass with respect to the global coordinate system may be calculated from the sets of relative geometric transformation relationships of the plurality of subclasses through an L1 loss function. Specifically, the sets of relative geometric transformation relationships of all subclasses are substituted into an equation system, the solution minimizing the loss is found through the L1 loss function, and the uniquely determined relative geometric transformation relationship between any two adjacent subclasses is then determined from that solution.
In S240, each subclass is regarded as a node (i.e., the origin of the subclass coordinate system); considering the whole global scene comprehensively, the constraints (scale transformation and similarity transformation) between all subclasses calculated in the preceding step are added into an equation system, and the absolute scale, absolute rotation, and origin coordinates of each subclass coordinate system in the global coordinate system are solved through L1 loss optimization, i.e., the scale transformation and similarity transformation from the subclass pose to the global coordinate system are obtained.
The calculation process of S240 is detailed below.
According to the relative scales calculated from the inter nodes between subclasses, the following equation is obtained:

$$s_i / s_j = S_{ij}$$
Taking the logarithm of both sides gives:

$$\log(s_i) - \log(s_j) = \log(S_{ij})$$
Summarizing all the equations obtained from the relative scales between spatially adjacent subclasses and stacking them yields

$$A_s\, x_s = b_s$$

where $A_s$ is a sparse matrix in which each row has one element equal to $1$, one element equal to $-1$, and all other elements equal to $0$; $x_s$ is the vector formed by stacking the logarithms of the absolute scales of all subclasses; and $b_s$ is the vector formed by stacking the logarithms of the relative scales.
By setting the absolute scale of the first subclass to $1$, i.e., $\log(s_1) = 0$, the above system of equations is solved through the following convex L1 optimization:

$$\min_{x_s} \left\| A_s\, x_s - b_s \right\|_1$$
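For illustration, the convex L1 problem above can be solved with the standard linear-programming reformulation of the L1 norm. The following scipy sketch is hypothetical (a toy three-subclass example with illustrative relative scales), not the exact solver of this application:

```python
import numpy as np
from scipy.optimize import linprog

def solve_l1(A, b):
    """Minimize ||A x - b||_1 via the standard LP reformulation:
    min 1^T u  subject to  -u <= A x - b <= u,  u >= 0."""
    m, n = A.shape
    c = np.concatenate([np.zeros(n), np.ones(m)])         # objective: sum of u
    A_ub = np.block([[A, -np.eye(m)], [-A, -np.eye(m)]])  # |A x - b| <= u
    b_ub = np.concatenate([b, -b])
    bounds = [(None, None)] * n + [(0, None)] * m
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return res.x[:n]

# Toy example: three subclasses with relative scales S_01 = 2, S_12 = 2,
# S_02 = 4.4 (one noisy measurement). Each row encodes
# log(s_i) - log(s_j) = log(S_ij); a gauge row fixes log(s_0) = 0.
A = np.array([[1., -1., 0.],
              [0., 1., -1.],
              [1., 0., -1.],
              [1., 0., 0.]])              # gauge row: log(s_0) = 0
b = np.log(np.array([2., 2., 4.4, 1.]))
log_s = solve_l1(A, b)
print(np.exp(log_s))   # absolute scales, ~[1, 0.5, 0.25]
```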
Similarly, the relative rotations between spatially adjacent sub-classes are calculated, and the absolute rotation of each sub-class in the global coordinate system is estimated by the rotation averaging method.
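For illustration, the sketch below uses the common chordal least-squares relaxation as a stand-in for the rotation averaging step (the robust rotation averaging method actually used may differ); it assumes the convention R_j = R_ij @ R_i, and all names are hypothetical:

```python
import numpy as np

def project_so3(M):
    """Closest rotation matrix to M (Frobenius norm), via SVD."""
    U, _, Vt = np.linalg.svd(M)
    R = U @ Vt
    if np.linalg.det(R) < 0:          # keep a proper rotation (det = +1)
        R = U @ np.diag([1., 1., -1.]) @ Vt
    return R

def chordal_rotation_averaging(n, rel_rots, anchor=0):
    """Chordal (least-squares) relaxation of rotation averaging.

    rel_rots: {(i, j): R_ij} with convention R_j = R_ij @ R_i. Stacks the
    linear constraints R_j - R_ij @ R_i = 0 on the unknown 3x3 blocks,
    anchors R_anchor = I to fix the gauge, solves in the least-squares
    sense, and projects each block back onto SO(3).
    """
    m = len(rel_rots)
    A = np.zeros((3 * m + 3, 3 * n))
    B = np.zeros((3 * m + 3, 3))
    for k, ((i, j), R_ij) in enumerate(rel_rots.items()):
        A[3*k:3*k+3, 3*j:3*j+3] = np.eye(3)
        A[3*k:3*k+3, 3*i:3*i+3] = -R_ij
    A[3*m:, 3*anchor:3*anchor+3] = np.eye(3)   # gauge: R_anchor = I
    B[3*m:] = np.eye(3)
    X, *_ = np.linalg.lstsq(A, B, rcond=None)
    return [project_so3(X[3*i:3*i+3]) for i in range(n)]

# Example usage with known relative rotations R01, R12:
# rots = chordal_rotation_averaging(3, {(0, 1): R01, (1, 2): R12})
```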
Likewise, by means of the computed absolute scale of each subclass, the relative translation between spatially adjacent subclasses in the global coordinate system is computed through the inter nodes, giving the following equation:

$$R_j\,(c_i - c_j) = t_{ij}$$

where $R_j$ is the absolute rotation of subclass $j$ in the global coordinate system, $c_i$ and $c_j$ are the coordinates of the origins of the subclass $i$ and subclass $j$ coordinate systems in the global coordinate system, and $t_{ij}$ is the relative translation between the origins of the two subclass coordinate systems.
Summarizing all the equations obtained from the relative translations between spatially adjacent subclasses and stacking them yields:

$$A_c\, x_c = b_c$$

where $A_c$ is a sparse matrix in which every three consecutive rows contain $R_j$ and $-R_j$ with all other entries equal to $0$; $x_c$ is the vector formed by stacking the coordinates of the origins of all subclasses in the global coordinate system; and $b_c$ is the vector formed by stacking the relative translations between subclasses.
Similarly, by fixing the coordinate of the origin of the first subclass coordinate system in the global coordinate system at $0$, i.e., $c_1 = 0$, the above system of equations is solved through the following convex L1 optimization:

$$\min_{x_c} \left\| A_c\, x_c - b_c \right\|_1$$
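For illustration, the stacked translation system can be assembled with the same block pattern; the sketch below is hypothetical and reuses the solve_l1 helper from the scale example above:

```python
import numpy as np

def build_translation_system(n, constraints, anchor=0):
    """Stack R_j (c_i - c_j) = t_ij into A_c x_c = b_c.

    constraints: {(i, j): (R_j, t_ij)} for spatially adjacent subclasses;
    x_c stacks the subclass origins in the global frame. Three gauge rows
    pin the anchor subclass origin at 0 (c_anchor = 0).
    """
    m = len(constraints)
    A = np.zeros((3 * m + 3, 3 * n))
    b = np.zeros(3 * m + 3)
    for k, ((i, j), (R_j, t_ij)) in enumerate(constraints.items()):
        A[3*k:3*k+3, 3*i:3*i+3] = R_j     # +R_j acts on c_i
        A[3*k:3*k+3, 3*j:3*j+3] = -R_j    # -R_j acts on c_j
        b[3*k:3*k+3] = t_ij
    A[3*m:, 3*anchor:3*anchor+3] = np.eye(3)   # gauge: c_anchor = 0
    return A, b

# The resulting system can be passed to the same L1 solver sketched for
# the scale system, e.g.: x_c = solve_l1(A, b)
```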
according to the above calculation process, S240 has the following advantages:
1) Accuracy: each subclass is regarded as a node, and a plurality of relative scales, relative rotations, and relative translations between spatially adjacent subclasses are calculated through the inter nodes. Because a plurality of inter nodes exist between any two spatially adjacent subclasses, a plurality of relative scale and pose constraints are obtained; no single constraint is absolutely accurate, but summarizing them yields a more robust solution satisfying all the constraints. Meanwhile, global optimization summarizes the relative scale and pose constraints between all spatially adjacent subclasses of the whole scene, so a globally optimal solution satisfying all the constraints can be solved as a whole.
2) High efficiency: because the subclass is regarded as a node (the origin of the subclass coordinate system) for global optimization, the optimization is more efficient and faster.
Fig. 3 visually shows that in S240, each subclass is regarded as a node, the global absolute scale and similarity transformation of each subclass are optimized by means of the global L1 loss, and the subclasses are aggregated into a global result. As shown in fig. 3, the improvements of the present embodiment are: 1) the camera trajectories at the connections between subclasses are well aligned; 2) the panoramic camera poses used in acquisition are substantially restored in the trajectory (eight cameras facing outward circumferentially around a point); 3) the spatial 3D points estimated by each subclass are aligned, eliminating the ghosting problem.
For example, S240 may further include further optimizing the absolute geometric transformation relationships of the common nodes. Namely, the scale transformation and similarity transformation from each subclass to the global area are fixed, as are the local absolute poses of the intra nodes connected with the inter nodes within each subclass; the absolute poses of the inter nodes in the global coordinate system are then constrained by the relative poses between the local inter and intra nodes and globally optimized.
Specifically, the following two loss functions are optimized:

$$\mathcal{L}_R = \sum_{(i,j)} \left\| R_{ij}\; R_i^{A}\, \big(R_{A \to B}\big)^{-1} - R_j^{B} \right\|$$

where $R_{ij}$ is the relative rotation between the inter and intra cameras within the subclass, expressed in the global coordinate system; $R_j^{B}$ is the absolute rotation of camera $j$ in the global coordinate system; $R_i^{A}$ is the absolute rotation of camera $i$ in the local $A$ coordinate system; and $R_{A \to B}$ is the rotation transformation from the local $A$ coordinate system to the global $B$ coordinate system.

$$\mathcal{L}_t = \sum_{(i,j)} \left\| R_j^{B}\, \Big( \big( s_A\, R_{A \to B}\, c_i^{A} + c_A^{B} \big) - c_j^{B} \Big) - t_{ij} \right\|$$

where $t_{ij}$ is the relative translation between the inter and intra cameras of the local subclass after transformation to the global area (in fact equal to the local relative translation multiplied by the absolute scale $s_A$ of the subclass in the global coordinate system); $R_j^{B}$ is the absolute rotation of camera $j$ in the global coordinate system; $c_j^{B}$ is the coordinate of camera $j$ in the global coordinate system; $c_i^{A}$ is the coordinate of camera $i$ in the local $A$ coordinate system; $c_A^{B}$ is the coordinate of the origin of the local subclass $A$ coordinate system in the global $B$ coordinate system; and $R_{A \to B}$ is the rotation transformation from the local $A$ coordinate system to the global $B$ coordinate system.
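For illustration, a minimal scipy sketch of this second-level refinement, simplified to refining only the global position of a single inter camera with its rotation and all intra poses held fixed; the names are hypothetical and the robust loss is a stand-in:

```python
import numpy as np
from scipy.optimize import least_squares

def refine_inter_position(c0, constraints):
    """Refine one inter camera's global position, everything else fixed.

    constraints: list of (R_j_B, c_j_B, t_ij) tuples, one per intra camera j
    linked to this inter camera: the fixed global rotation and position of
    camera j, and the relative translation t_ij already scaled into the
    global frame. Only the inter camera's position is free, so the problem
    is tiny and converges quickly, mirroring the efficiency point below.
    """
    def residuals(c_i_B):
        return np.concatenate([
            R_j_B @ (c_i_B - c_j_B) - t_ij     # translation residual per link
            for R_j_B, c_j_B, t_ij in constraints
        ])

    return least_squares(residuals, x0=c0, loss="soft_l1").x
```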
Further optimizing the absolute geometric transformation relations of the common nodes has the following advantages:
1) High efficiency: because the scale transformation and similarity transformation from each subclass to the global area are fixed, and the local-coordinate-system absolute poses of the intra nodes are fixed, the absolute poses in the global coordinate system of the intra nodes connected with inter nodes in each subclass remain fixed. This is equivalent to fixing the global poses of most data points in each local area and optimizing only the global poses of the inter nodes of the subclasses, which greatly accelerates convergence and makes the optimization fast.
2) Accuracy: because the scale transformation and similarity transformation from each subclass to the global area remain unchanged, for the same inter node, the multiple inter-intra relative constraints in adjacent subclasses are transformed into the global coordinate system and added as constraints, and the pose of the inter node in the global coordinate system is optimized. This optimization therefore further improves the poses of the inter nodes while preserving the good global optimization result of the abstract level of the hierarchical decoupling.
And S250, constructing a three-dimensional model of the original scene according to the absolute geometric transformation relations of the subclasses.
Since the absolute geometric transformation relationship of each subclass's local coordinate system relative to the global coordinate system is known, the absolute position in the global coordinate system of each data point (e.g., pixel point in an image) in each subclass can be determined through this transformation relationship. Integrating the absolute positions of all data forms a point cloud model in three-dimensional space (such as a sparse point cloud), and the point cloud model is further optimized to obtain the three-dimensional model.
Specifically, the acquisition device poses and the three-dimensional points of the subclasses in the global coordinate system can be determined according to the absolute geometric transformation relationships of the subclasses, and the three-dimensional model of the original scene can then be constructed from the poses and three-dimensional points of the multiple subclasses.
The capture device pose may refer to the position and pose (including rotation, translation, etc.) of the capture device relative to a coordinate system when capturing a datum. For example, when the data is an image, the pose of the capturing device may indicate the position and rotation angle of the capturing device when capturing an image. The three-dimensional point may be a point restored from the data or image acquired by the acquisition device and having position information in the global coordinate system.
The acquisition device poses and three-dimensional points of the subclasses may refer to all the acquisition device poses and three-dimensional points obtained by solving all the subclasses (or all the subclasses of interest); a point cloud model can be restored from these poses and three-dimensional points, and a complete three-dimensional model can then be constructed from the point cloud model.
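For illustration, a minimal numpy sketch of merging per-subclass reconstructions through their absolute geometric transformation relations; the dictionary keys and names are hypothetical:

```python
import numpy as np

def assemble_global_cloud(subclasses):
    """Merge per-subclass local 3D points into one global point cloud.

    Each entry carries its absolute geometric transformation relation:
    's' (absolute scale), 'R' (absolute rotation), and 'c' (origin of the
    local frame in the global frame), plus 'points' (N x 3 local coords).
    A point maps to the global frame as x_global = s * R @ x_local + c.
    """
    return np.vstack([
        sc["s"] * (sc["points"] @ sc["R"].T) + sc["c"]
        for sc in subclasses
    ])
```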
A three-dimensional reconstruction method according to another embodiment of the present application is described below with reference to fig. 4. The embodiment provides a method for updating an original scene three-dimensional model constructed by the three-dimensional reconstruction method in the embodiment of fig. 2.
According to the embodiment, local areas of any scale (scene changes caused by road construction, four-season changes, site modification, indoor decoration and the like) in the reconstructed scene can be updated.
The operation flow of the updating method according to the present embodiment is first summarized in general.
First, the area to be updated is deleted from the reconstructed scene; the intra nodes adjacent to the inter nodes on the user-delineated boundary are taken as a subclass C0, and the global scale and global pose of the C0 coordinate system are fixed. Then, for the area to be updated, data (images and laser point clouds) are acquired again on site; an EG is constructed for the new data by means of feature matching, sub-clusters are obtained by graph cutting and expanded so that adjacent subclasses share inter nodes, and all subclasses are reconstructed in parallel. Then, each subclass (including C0) is regarded as a node, the scale and similarity transformations between adjacent subclasses are calculated through the inter nodes, and the absolute scale and pose of each subclass in the global coordinate system are solved through global optimization relying on the constraints between all spatially adjacent subclasses, so that the subclasses are aggregated into the reconstructed scene (in reconstruction from scratch, the subclasses are aggregated together in the same way). If the global poses of the data points shared between subclasses need to be finely tuned, they can be globally optimized through the second layer of the hierarchical decoupling framework.
As shown in fig. 4, the three-dimensional reconstruction method according to the present embodiment includes steps S401 to S412. Each step will be described in detail below.
S401, scene data acquired by the acquisition equipment aiming at the original scene are acquired.
S402, dividing the scene data into a plurality of subclasses, wherein each subclass in the subclasses comprises an internal node exclusive to the subclass and a common node shared with an adjacent subclass.
S403, determining a relative geometric transformation relation between the local coordinate systems of the two adjacent sub-classes according to the common node between each two adjacent sub-classes in the plurality of sub-classes.
S404, determining the absolute geometric transformation relation of the local coordinate system of the subclass relative to the global coordinate system according to the relative geometric transformation relations of the plurality of subclasses.
S405, constructing a three-dimensional model of the original scene according to the absolute geometric transformation relations of the subclasses.
Details regarding S401 to S405 are described above in detail with respect to the embodiment of fig. 2, and are not repeated here.
S406, obtaining a model area to be updated, which is sketched on the three-dimensional model of the original scene by the user.
The model region may refer to a portion of a three-dimensional model reconstructed for an original scene, which portion of the model needs to be updated.
When the model needs to be updated, the user can judge which part of the three-dimensional model needs to be updated or replaced, and then the user operates on the three-dimensional model to outline the part needing to be updated.
And S407, deleting the model area to be updated.
After the part of the model needing to be updated is outlined, that area can be directly deleted; the retained part is the remaining model area, and a boundary (boundary line) is formed between the deleted area and the remaining area.
And S408, merging the scene data of the remaining model area into a pre-update subclass.
The remaining model area is constructed from data previously acquired for the original scene, and the relative and absolute geometric transformation relationships of this part of the model and of the subclasses used to construct it have already been optimized and calculated. Therefore, all scene data used to construct the remaining model area can be directly merged into one subclass, namely the pre-update subclass, for subsequent calculation.
And S409, dividing the scene data of the updated scene into a plurality of update subclasses.
The updated scene may refer to a portion of the scene that has changed relative to the original scene, which portion of the scene needs to replace the deleted scene model.
Scene data acquired for the updated scene has not been optimized, so it needs to be segmented again into a plurality of subclasses so that an accurate three-dimensional model of the updated scene can be constructed from the scene data.
And S410, selecting common nodes between the pre-update subclasses and the update subclasses on the boundary of the rest model areas.
The boundary of the remaining model regions may refer to a boundary between the deleted model region and the model region that is not deleted. Since the model is built from data or nodes of the subclass, there must be nodes of the subclass on this boundary. By selecting these nodes, some of the nodes can be used as common nodes for connecting the pre-update subclass and one of the update subclasses.
S411, merging the pre-update subclasses and the plurality of update subclasses into a plurality of post-update subclasses.
After the common node is selected, the pre-update subclass and one of the update subclasses are connected together through the common node to become an adjacent subclass. The pre-update sub-classes and all update sub-classes are merged together to form a new sub-class cluster, i.e., a plurality of post-update sub-classes.
And S412, constructing a three-dimensional model of the updated scene according to the plurality of updated subclasses.
Building a three-dimensional model of the updated scene according to the plurality of updated sub-classes, which may be described with reference to the embodiment of fig. 1 or fig. 2. For example, first, a relative geometric transformation relationship between local coordinate systems of two adjacent post-update sub-classes is determined according to a common node between each two adjacent post-update sub-classes in the plurality of post-update sub-classes, then, an absolute geometric transformation relationship between the local coordinate systems of the post-update sub-classes relative to a global coordinate system is determined according to the relative geometric transformation relationship of the plurality of post-update sub-classes, and finally, a three-dimensional model of the post-update scene is constructed according to the absolute geometric transformation relationship of the plurality of post-update sub-classes.
A three-dimensional reconstruction method according to another embodiment of the present application is described below with reference to fig. 5. The embodiment provides a method for expanding an original scene three-dimensional model constructed by the three-dimensional reconstruction method in the embodiment of fig. 2.
According to the present embodiment, the reconstructed scene may be expanded outward to add a new scene.
The operational flow of the expanding method according to the present embodiment is first summarized in general.
Similar to the updating method, a partial area of the reconstructed scene can be updated, and the intra cameras adjacent to the boundary in the reconstructed scene are regarded as a subclass C0. New data of the area to be expanded is acquired, and the same process as in the updating scheme is followed: obtain subclasses through graph cutting and expand them to generate inter nodes -> calculate the scale and similarity transformations between adjacent subclasses -> solve, through global optimization, the absolute scale, absolute rotation, and absolute translation of each subclass coordinate-system origin in the global coordinate system -> aggregate the subclasses -> optimize the absolute poses of the inter nodes in the global coordinate system.
As shown in fig. 5, the three-dimensional reconstruction method of the present embodiment includes steps S501 to S511. Each step will be described in detail below.
S501, scene data acquired by the acquisition equipment aiming at the original scene are acquired.
S502, dividing the scene data into a plurality of subclasses, wherein each subclass in the subclasses comprises an internal node exclusive to the subclass and a common node shared with an adjacent subclass.
S503, determining a relative geometric transformation relation between the local coordinate systems of the two adjacent sub-classes according to the common node between each two adjacent sub-classes in the plurality of sub-classes.
And S504, determining the absolute geometric transformation relation of the local coordinate system of the subclass relative to the global coordinate system according to the relative geometric transformation relations of the plurality of subclasses.
And S505, constructing a three-dimensional model of the original scene according to the absolute geometric transformation relations of the subclasses.
Details regarding S501 to S505 are described above in detail with respect to the embodiment of fig. 2, and are not repeated here.
S506, obtaining a boundary between the three-dimensional model of the expanded scene and the three-dimensional model of the original scene.
The extended scene may refer to a scene that needs to be added on the basis of the original scene, and may be a scene of a geographical area adjacent to the original scene.
The extended scene is generally contiguous with the original scene, so that a boundary between the extended scene and the original scene can be determined, and the boundary is reflected on the three-dimensional model, namely the boundary between the three-dimensional model of the extended scene and the three-dimensional model of the original scene. This boundary may be determined by delineating or selecting a portion of the three-dimensional model of the original scene while the three-dimensional model of the extended scene is not yet constructed.
And S507, merging the scene data of the original scene into a pre-extension subclass.
The scene data of the original scene has been optimized and calculated to form a stable and reliable three-dimensional model, so the data of the original scene does not need to be segmented, and all scene data acquired for the original scene can be directly merged into one subclass, namely the pre-extension subclass.
And S508, dividing the scene data of the extended scene into a plurality of extended subclasses.
The scene data acquired for the extended scene is not yet calculated and optimized, and needs to be segmented (for example, by a graph segmentation algorithm) into a plurality of extended subclasses according to the previous ideas and algorithms.
S509, selecting a common node between the pre-extension subclass and the extension subclass on the boundary.
Since the position on the original three-dimensional model of the boundary between the extended scene and the original scene is known, selection can be performed on the three-dimensional model, choosing the data nodes that can serve as common nodes between the pre-extension subclass and one of the extension subclasses.
And S510, merging the pre-expansion subclass and the plurality of expansion subclasses into a plurality of post-expansion subclasses.
After one pre-extension subclass and one extension subclass are connected through the selected common node, the pre-extension subclass and all the extension subclasses can be combined together to form a new subclass cluster, namely a plurality of post-extension subclasses.
And S511, constructing a three-dimensional model of the expanded scene according to the plurality of expanded subclasses.
According to a plurality of extended subclasses, a three-dimensional model of the extended scene is constructed, and the specific manner of constructing the three-dimensional model can be described with reference to the embodiment of fig. 1 or fig. 2. For example, a relative geometric transformation relationship between local coordinate systems of two adjacent extended sub-classes is determined according to a common node between each two adjacent extended sub-classes in the plurality of extended sub-classes, an absolute geometric transformation relationship between the local coordinate systems of the extended sub-classes relative to a global coordinate system is determined according to the relative geometric transformation relationships of the plurality of extended sub-classes, and finally a three-dimensional model of the extended scene is constructed according to the absolute geometric transformation relationships of the plurality of extended sub-classes.
Based on the method embodiment described in fig. 2, an embodiment of the present application further provides a three-dimensional reconstruction apparatus, and a schematic structural diagram of the three-dimensional reconstruction apparatus is shown in fig. 6. The three-dimensional reconstruction apparatus is used to perform the steps of fig. 2.
According to the present embodiment, the three-dimensional reconstruction apparatus 600 includes: a first obtaining module 610, a first segmenting module 620, a first determining module 630, a second determining module 640, and a first constructing module 650.
The first obtaining module 610 is configured to obtain scene data that is collected by a collection device for an original scene.
The first segmentation module 620 is configured to segment the scene data into a plurality of sub-classes, where each sub-class of the plurality of sub-classes includes an internal node dedicated to the sub-class and a common node shared with an adjacent sub-class, and the sub-classes are a set of images, voxels, point clouds, and/or grids.
The first determining module 630 is configured to determine a relative geometric transformation relationship between local coordinate systems of two adjacent sub-classes according to a common node between each two adjacent sub-classes in the plurality of sub-classes.
The second determining module 640 is configured to determine an absolute geometric transformation relationship between the local coordinate system of the sub-class and the global coordinate system according to the relative geometric transformation relationships of the sub-classes.
The first building module 650 is configured to build a three-dimensional model of an original scene according to absolute geometric transformation relationships of a plurality of subclasses.
Based on the method embodiment described in fig. 4, an embodiment of the present application further provides a three-dimensional reconstruction apparatus, and a schematic structural diagram of the three-dimensional reconstruction apparatus is shown in fig. 7. The three-dimensional reconstruction apparatus is used to perform the steps of fig. 4.
According to the present embodiment, the three-dimensional reconstruction apparatus 700 includes: a first obtaining module 701, a first dividing module 702, a first determining module 703, a second determining module 704, a first constructing module 705, a second obtaining module 706, a deleting module 707, a first combining module 708, a second dividing module 709, a first selecting module 710, a second combining module 711, and a second constructing module 712.
The first obtaining module 701 is configured to obtain scene data collected by a collection device for an original scene.
The first segmentation module 702 is configured to segment the scene data into a plurality of sub-classes, where each sub-class of the plurality of sub-classes includes an internal node dedicated to the sub-class and a common node shared by adjacent sub-classes, and the sub-classes are a set of images, voxels, point clouds, and/or grids.
The first determining module 703 is configured to determine, according to a common node between each two adjacent sub-classes of the multiple sub-classes, a relative geometric transformation relationship between local coordinate systems of the two adjacent sub-classes.
The second determining module 704 is configured to determine an absolute geometric transformation relationship of the local coordinate system of the sub-class with respect to the global coordinate system according to the relative geometric transformation relationships of the plurality of sub-classes.
The first constructing module 705 is configured to construct a three-dimensional model of an original scene according to absolute geometric transformation relations of a plurality of subclasses.
The second obtaining module 706 is configured to obtain a model area to be updated, which is sketched on the three-dimensional model of the original scene by the user.
The deleting module 707 is configured to delete the model area to be updated.
The first merging module 708 is configured to merge the scene data of the remaining model region into the pre-update subclass.
The second partitioning module 709 is configured to partition the scene data of the update scene into a plurality of update subclasses.
The first selection module 710 is configured to select a common node between the pre-update sub-class and the update sub-class on the boundary of the remaining model area.
The second merging module 711 is configured to merge the pre-update sub-class and the multiple update sub-classes into multiple post-update sub-classes.
The second constructing module 712 is configured to construct a three-dimensional model of the updated scene according to the plurality of updated subclasses.
Based on the method embodiment described in fig. 5, an embodiment of the present application further provides a three-dimensional reconstruction apparatus, and a schematic structural diagram of the three-dimensional reconstruction apparatus is shown in fig. 8. The three-dimensional reconstruction apparatus is configured to perform the steps of fig. 5.
According to the present embodiment, the three-dimensional reconstruction apparatus 800 includes: a first obtaining module 801, a first segmentation module 802, a first determination module 803, a second determination module 804, a first construction module 805, a third obtaining module 806, a third merging module 807, a third segmentation module 808, a second selection module 809, a fourth merging module 810 and a third construction module 811.
The first obtaining module 801 is configured to obtain scene data collected by a collection device for an original scene.
The first segmentation module 802 is configured to divide the scene data into a plurality of sub-classes, where each sub-class of the plurality of sub-classes includes an internal node dedicated to the sub-class and a common node shared by adjacent sub-classes, and the sub-classes are a set of images, voxels, point clouds, and/or grids.
The first determining module 803 is configured to determine, according to a common node between each two adjacent sub-classes in the multiple sub-classes, a relative geometric transformation relationship between local coordinate systems of the two adjacent sub-classes.
The second determining module 804 is configured to determine an absolute geometric transformation relationship between the local coordinate system of the sub-class and the global coordinate system according to the relative geometric transformation relationships of the plurality of sub-classes.
The first building module 805 is configured to build a three-dimensional model of an original scene according to absolute geometric transformation relationships of multiple sub-classes.
The third obtaining module 806 is configured to obtain a boundary between the three-dimensional model of the extended scene and the three-dimensional model of the original scene.
The third merging module 807 is configured to merge the scene data of the original scene into a pre-extension subclass.
Wherein the third partitioning module 808 is configured to partition the scene data of the extended scene into a plurality of extended sub-classes.
The second selecting module 809 is configured to select, on the boundary, a common node between the pre-extension sub-class and the extension sub-class.
The fourth merging module 810 is configured to merge the pre-extension subclass and the plurality of extension subclasses into a plurality of post-extension subclasses.
The third constructing module 811 is configured to construct a three-dimensional model of the extended scene according to the multiple extended subclasses.
It should be noted that, when the three-dimensional reconstruction apparatuses 600, 700, and 800 provided in the embodiments shown in fig. 6, fig. 7, and fig. 8 execute the three-dimensional reconstruction method, only the division of the above functional modules is illustrated, and in practical applications, the above function distribution may be completed by different functional modules according to needs, that is, the internal structure of the apparatus is divided into different functional modules, so as to complete all or part of the above described functions. In addition, the three-dimensional reconstruction devices 600, 700, and 800 provided in the above embodiments and the three-dimensional reconstruction method embodiments shown in fig. 2, fig. 4, and fig. 5 respectively belong to the same concept, and specific implementation processes thereof are detailed in the method embodiments and are not described herein again.
Fig. 9 is a schematic hardware configuration diagram of a computing device 900 according to an embodiment of the present application.
Referring to fig. 9, the computing device 900 includes a processor 910, a memory 920, a communication interface 930, and a bus 940, and the processor 910, the memory 920, and the communication interface 930 are connected to each other by the bus 940. The processor 910, memory 920, and communication interface 930 may also be connected by connections other than the bus 940.
The memory 920 may be various types of storage media, such as Random Access Memory (RAM), read-only memory (ROM), non-volatile RAM (NVRAM), programmable ROM (PROM), erasable PROM (EPROM), electrically Erasable PROM (EEPROM), flash memory, optical memory, hard disk, and the like.
The processor 910 may be a general-purpose processor, that is, a processor that performs specific steps and/or operations by reading and executing content stored in a memory (e.g., the memory 920). For example, the general-purpose processor may be a central processing unit (CPU). The processor 910 may include at least one circuit to perform all or part of the steps of the three-dimensional reconstruction methods provided by the embodiments shown in fig. 2, fig. 4, and fig. 5.
The communication interface 930 includes input/output (I/O) interfaces, physical interfaces, logical interfaces, and the like that enable interconnection of devices within the computing device 900, as well as interfaces that enable interconnection of the computing device 900 with other devices, such as other computing devices or user devices. The physical interface may be an Ethernet interface, a fiber interface, an ATM interface, etc. The communication interface 930 may also connect to external input and output devices. For example, the input device may be a microphone or microphone array for capturing a speech input signal, or a communication network connector for receiving collected input signals from a cloud or another device, and may also include, for example, a keyboard and a mouse. The output device may output various information, including determined distance information, direction information, and the like, to the outside, and may include, for example, a display, speakers, a printer, and a communication network with its connected remote output devices.
The bus 940 may be any type of communication bus, such as a system bus, used to interconnect the processor 910, the memory 920, and the communication interface 930.
The above devices may be respectively disposed on separate chips, or at least a part or all of the devices may be disposed on the same chip. Whether each device is separately located on a different chip or integrated on one or more chips is often dependent on the needs of the product design. The embodiment of the present application does not limit the specific implementation form of the above device.
The computing device 900 shown in fig. 9 is merely exemplary, and in implementations, the computing device 900 may include other components, which are not listed here.
An embodiment of the present application may also provide a computer-readable storage medium having computer program instructions stored thereon which, when executed by a processor, cause the processor to perform the steps of the three-dimensional reconstruction method according to the various embodiments of the present application described above.
The computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The ideas, principles, and concepts of the present application have been described above in detail in connection with specific embodiments (including examples and illustrations). Those skilled in the art will appreciate that the embodiments of the present application are not limited to the above-described forms, and that any possible modifications, substitutions, and equivalents of the steps, methods, apparatuses, and components of the above-described embodiments may be made by those skilled in the art after reading the present specification; such modifications, substitutions, and equivalents are to be considered as falling within the scope of the present application. The scope of protection of this application is governed only by the claims.

Claims (20)

1. A method of three-dimensional reconstruction, the method comprising:
acquiring scene data acquired by acquisition equipment aiming at an original scene;
dividing the scene data into a plurality of sub-classes, each sub-class in the plurality of sub-classes comprising an internal node dedicated to the sub-class and a common node shared with an adjacent sub-class, the sub-classes being a set of images, voxels, point clouds and/or meshes;
determining a relative geometric transformation relation between local coordinate systems of each two adjacent sub-classes according to a common node between the two adjacent sub-classes in the plurality of sub-classes;
determining an absolute geometric transformation relation of a local coordinate system of the subclass relative to a global coordinate system according to the relative geometric transformation relations of the subclasses;
and constructing a three-dimensional model of the original scene according to the absolute geometric transformation relations of the subclasses.
2. The method of claim 1, wherein the dividing the scene data into a plurality of sub-classes comprises:
and dividing the scene data into a plurality of subclasses by a graph cutting algorithm.
3. The three-dimensional reconstruction method of claim 1, wherein there are a plurality of common nodes between two adjacent sub-classes, and the determining the relative geometric transformation relationship between the local coordinate systems of the two adjacent sub-classes according to the common nodes between each two adjacent sub-classes in the plurality of sub-classes comprises:
determining a plurality of relative geometric transformation relations between local coordinate systems of each two adjacent sub-classes according to a plurality of common nodes between each two adjacent sub-classes in the plurality of sub-classes, wherein the plurality of relative geometric transformation relations form a group of relative geometric transformation relations;
wherein, the determining the absolute geometric transformation relation of the local coordinate system of the subclass relative to the global coordinate system according to the relative geometric transformation relations of the subclasss comprises:
and determining the absolute geometric transformation relation of the local coordinate system of the subclass relative to the global coordinate system according to the multiple groups of relative geometric transformation relations of the subclasses.
4. The three-dimensional reconstruction method of claim 3, wherein said determining an absolute geometric transformation relationship of a local coordinate system of said sub-class with respect to a global coordinate system based on a plurality of sets of relative geometric transformation relationships of said plurality of sub-classes comprises:
and calculating the absolute geometric transformation relation of the local coordinate system of the subclass relative to the global coordinate system through an L1 loss function according to the multiple groups of relative geometric transformation relations of the subclasss.
5. The method according to claim 1, wherein said constructing a three-dimensional model of the original scene according to the absolute geometric transformation relationships of the sub-classes comprises:
determining the poses of the acquisition equipment and the three-dimensional points of the subclass in a global coordinate system according to the absolute geometric transformation relation of the subclass;
and constructing a three-dimensional model of the original scene according to the poses and the three-dimensional points of the acquisition equipment of the subclasses.
6. The three-dimensional reconstruction method of any one of claims 1 to 5, further comprising:
acquiring a model area to be updated, which is sketched on the three-dimensional model of the original scene by a user;
deleting the model area to be updated;
merging the scene data of the remaining model area into a pre-update sub-class;
dividing scene data of an update scene into a plurality of update subclasses;
selecting a common node between the pre-update sub-class and the update sub-class on the boundary of the remaining model region;
merging the pre-update sub-class and the plurality of update sub-classes into a plurality of post-update sub-classes;
and constructing a three-dimensional model of the updated scene according to the plurality of updated subclasses.
7. The three-dimensional reconstruction method according to any one of claims 1 to 5, characterized in that the method further comprises:
acquiring a boundary between the three-dimensional model of the extended scene and the three-dimensional model of the original scene;
merging the scene data of the original scene into a pre-extension sub-class;
dividing the scene data of the extended scene into a plurality of extended subclasses;
selecting a common node on the boundary between the pre-extension sub-class and the extension sub-class;
merging the pre-extension subclasses and the plurality of extension subclasses into a plurality of post-extension subclasses;
and constructing a three-dimensional model of the expanded scene according to the plurality of expanded sub-classes.
8. The three-dimensional reconstruction method according to any one of claims 1 to 5, wherein the relative geometric transformation relation comprises a relative scale transformation relation, a relative rotation transformation relation, and a relative translation transformation relation, and the absolute geometric transformation relation comprises an absolute scale transformation relation, an absolute rotation transformation relation, and an absolute translation transformation relation.
9. The three-dimensional reconstruction method according to any one of claims 1 to 5, wherein the acquisition device includes one or more of an inertial sensor, a lidar, an ultrasonic radar, a millimeter wave radar, a visible light camera, and an infrared camera.
10. A three-dimensional reconstruction apparatus, characterized in that the apparatus comprises:
a first acquisition module, configured to acquire scene data acquired by an acquisition device for an original scene;
a first segmentation module, configured to divide the scene data into a plurality of sub-classes, wherein each sub-class in the plurality of sub-classes comprises an internal node exclusive to the sub-class and a common node shared with an adjacent sub-class;
a first determining module, configured to determine, according to a common node between each two adjacent sub-classes in the plurality of sub-classes, a relative geometric transformation relationship between the local coordinate systems of the two adjacent sub-classes;
a second determining module, configured to determine, according to the relative geometric transformation relationships of the plurality of sub-classes, an absolute geometric transformation relationship of the local coordinate system of each sub-class with respect to the global coordinate system;
and a first construction module, configured to construct a three-dimensional model of the original scene according to the absolute geometric transformation relationships of the sub-classes.
11. The three-dimensional reconstruction apparatus of claim 10, wherein the first segmentation module is further configured to:
divide the scene data into a plurality of sub-classes by a graph cut algorithm.
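The claim does not name a particular graph cut method. As one plausible instantiation, the scene data can be organized as a covisibility graph (nodes are images, voxels, points, or mesh cells; edge weights count shared observations) and bisected with a balanced min-cut heuristic, duplicating the endpoints of cut edges as the common nodes; a sketch using networkx's Kernighan-Lin bisection, with made-up weights:

```python
import networkx as nx
from networkx.algorithms.community import kernighan_lin_bisection

# Covisibility graph over scene-data nodes; weights are hypothetical match counts.
G = nx.Graph()
G.add_weighted_edges_from([
    ("img0", "img1", 120), ("img1", "img2", 90),
    ("img2", "img3", 15),  ("img3", "img4", 110),
])

# One balanced bisection that tends to cut the weakest links; a real pipeline
# would recurse until each part fits in memory, then duplicate the endpoints
# of cut edges as the common nodes shared by the two resulting sub-classes.
part_a, part_b = kernighan_lin_bisection(G, weight="weight")
cut_edges = [(u, v) for u, v in G.edges if (u in part_a) != (v in part_a)]
```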
12. The three-dimensional reconstruction apparatus of claim 10, wherein a plurality of common nodes exist between the two adjacent sub-classes, and the first determining module is further configured to:
determine, according to the plurality of common nodes between each two adjacent sub-classes in the plurality of sub-classes, a plurality of relative geometric transformation relations between the local coordinate systems of the two adjacent sub-classes, the plurality of relative geometric transformation relations forming a group of relative geometric transformation relations;
wherein the second determining module is further configured to:
determine the absolute geometric transformation relations of the local coordinate systems of the sub-classes relative to the global coordinate system according to the plurality of groups of relative geometric transformation relations of the plurality of sub-classes.
13. The three-dimensional reconstruction apparatus of claim 12, wherein the second determining module is further configured to:
calculate the absolute geometric transformation relations of the local coordinate systems of the sub-classes relative to the global coordinate system through an L1 loss function according to the plurality of groups of relative geometric transformation relations of the plurality of sub-classes.
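Claim 13 does not spell out the minimization. For the translation component the L1 problem decouples per coordinate, and the minimizer of a sum of absolute deviations is the median, which is what makes an L1 loss robust to a bad group of relative transforms; rotations and scales would typically need an iteratively reweighted scheme instead. A toy sketch with hypothetical numbers:

```python
import numpy as np

# Candidate absolute translations for one sub-class, one per group of
# relative geometric transformation relations (made-up values).
candidates = np.array([
    [1.00, 2.01, 0.99],
    [1.02, 1.98, 1.01],
    [0.98, 2.00, 1.00],
    [1.01, 2.02, 0.98],
    [5.00, 9.00, 7.00],   # an outlier group of relative transforms
])

# argmin_t sum_i |t - t_i| is the coordinate-wise median, so the outlier
# row barely moves the estimate, unlike an L2 (mean) solution.
t_abs = np.median(candidates, axis=0)   # -> [1.01, 2.01, 1.00]
```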
14. The three-dimensional reconstruction apparatus of claim 10, wherein the first construction module is further configured to:
determine the pose of the acquisition device and the three-dimensional points of each sub-class in the global coordinate system according to the absolute geometric transformation relationship of the sub-class;
and construct a three-dimensional model of the original scene according to the poses of the acquisition devices and the three-dimensional points of the sub-classes.
15. The three-dimensional reconstruction apparatus of any one of claims 10 to 14, further comprising:
a second acquisition module, configured to acquire a model region to be updated that is outlined by a user on the three-dimensional model of the original scene;
a deleting module, configured to delete the model region to be updated;
a first merging module, configured to merge the scene data of the remaining model region into a pre-update sub-class;
a second segmentation module, configured to divide the scene data of the updated scene into a plurality of update sub-classes;
a first selection module, configured to select a common node between the pre-update sub-class and the update sub-classes on the boundary of the remaining model region;
a second merging module, configured to merge the pre-update sub-class and the plurality of update sub-classes into a plurality of post-update sub-classes;
and a second construction module, configured to construct a three-dimensional model of the updated scene according to the plurality of post-update sub-classes.
16. The three-dimensional reconstruction apparatus of any one of claims 10 to 14, further comprising:
a third obtaining module, configured to obtain a boundary between the three-dimensional model of the extended scene and the three-dimensional model of the original scene;
a third merging module, configured to merge the scene data of the original scene into a pre-extension sub-class;
a third segmentation module, configured to divide the scene data of the extended scene into a plurality of extension sub-classes;
a second selecting module, configured to select a common node between the pre-extension sub-class and the extension sub-classes on the boundary;
a fourth merging module, configured to merge the pre-extension sub-class and the plurality of extension sub-classes into a plurality of post-extension sub-classes;
and a third construction module, configured to construct a three-dimensional model of the extended scene according to the plurality of post-extension sub-classes.
17. The three-dimensional reconstruction apparatus according to any one of claims 10 to 14, wherein the relative geometric transformation relationship includes a relative scale transformation relationship, a relative rotation transformation relationship, and a relative translation transformation relationship, and the absolute geometric transformation relationship includes an absolute scale transformation relationship, an absolute rotation transformation relationship, and an absolute translation transformation relationship.
18. The three-dimensional reconstruction apparatus of any one of claims 10 to 14, wherein the acquisition device comprises one or more of an inertial sensor, a lidar, an ultrasonic radar, a millimeter wave radar, a visible light camera, and an infrared camera.
19. A computing device, characterized in that the computing device comprises a processor and a memory, the processor being configured to execute a computer program stored in the memory to implement the three-dimensional reconstruction method of any one of claims 1 to 9.
20. A computer-readable storage medium, characterized in that it stores a computer program for executing the three-dimensional reconstruction method according to any one of claims 1 to 9.
CN202310001732.1A 2023-01-03 2023-01-03 Three-dimensional reconstruction method and device, computing equipment and storage medium Pending CN115953535A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310001732.1A CN115953535A (en) 2023-01-03 2023-01-03 Three-dimensional reconstruction method and device, computing equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115953535A (en) 2023-04-11

Family

ID=87285870

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310001732.1A Pending CN115953535A (en) 2023-01-03 2023-01-03 Three-dimensional reconstruction method and device, computing equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115953535A (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116402956A (en) * 2023-06-02 2023-07-07 深圳大学 Intelligent driven three-dimensional object interactive reconstruction method, device, equipment and medium
CN116402956B (en) * 2023-06-02 2023-09-22 深圳大学 Intelligent driven three-dimensional object interactive reconstruction method, device, equipment and medium
CN116503705A (en) * 2023-06-28 2023-07-28 成都市数字城市运营管理有限公司 Fusion method of digital city multi-source data
CN116503705B (en) * 2023-06-28 2023-10-13 成都市数字城市运营管理有限公司 Fusion method of digital city multi-source data
CN116863099A (en) * 2023-06-29 2023-10-10 广州城市职业学院 Building automatic modeling method and system based on point cloud data
CN116863099B (en) * 2023-06-29 2023-12-26 广州城市职业学院 Building automatic modeling method and system based on point cloud data
CN116797744A (en) * 2023-08-29 2023-09-22 武汉大势智慧科技有限公司 Multi-time-phase live-action three-dimensional model construction method, system and terminal equipment
CN116797744B (en) * 2023-08-29 2023-11-07 武汉大势智慧科技有限公司 Multi-time-phase live-action three-dimensional model construction method, system and terminal equipment
CN116993924A (en) * 2023-09-25 2023-11-03 北京渲光科技有限公司 Three-dimensional scene modeling method and device, storage medium and computer equipment
CN116993924B (en) * 2023-09-25 2023-12-15 北京渲光科技有限公司 Three-dimensional scene modeling method and device, storage medium and computer equipment
CN117351170A (en) * 2023-10-09 2024-01-05 北京达美盛软件股份有限公司 Method and system for realizing regional three-dimensional model replacement

Similar Documents

Publication Publication Date Title
US20220028163A1 (en) Computer Vision Systems and Methods for Detecting and Modeling Features of Structures in Images
CN112435325B (en) VI-SLAM and depth estimation network-based unmanned aerial vehicle scene density reconstruction method
Rupnik et al. MicMac–a free, open-source solution for photogrammetry
Xia et al. Geometric primitives in LiDAR point clouds: A review
CN115953535A (en) Three-dimensional reconstruction method and device, computing equipment and storage medium
Crandall et al. SfM with MRFs: Discrete-continuous optimization for large-scale structure from motion
US8199977B2 (en) System and method for extraction of features from a 3-D point cloud
CN111968229A (en) High-precision map making method and device
WO2018061010A1 (en) Point cloud transforming in large-scale urban modelling
CN111340922A (en) Positioning and mapping method and electronic equipment
Li et al. Dense surface reconstruction from monocular vision and LiDAR
WO2023065657A1 (en) Map construction method and apparatus, and device, storage medium and program
GB2566443A (en) Cross-source point cloud registration
CN113379748B (en) Point cloud panorama segmentation method and device
Rothermel Development of a SGM-based multi-view reconstruction framework for aerial imagery
CN115239899B (en) Pose map generation method, high-precision map generation method and device
CN114758087B (en) Method and device for constructing urban information model
CN113487741B (en) Dense three-dimensional map updating method and device
CN111784579B (en) Drawing method and device
Guo et al. Improved marching tetrahedra algorithm based on hierarchical signed distance field and multi-scale depth map fusion for 3D reconstruction
CN111127474B (en) Airborne LiDAR point cloud assisted orthophoto mosaic line automatic selection method and system
Li et al. BDLoc: Global localization from 2.5D building map
Cui et al. A Review of Indoor Automation Modeling Based on Light Detection and Ranging Point Clouds.
Bai et al. Colmap-PCD: An Open-source Tool for Fine Image-to-point cloud Registration
Tanner et al. Keep geometry in context: Using contextual priors for very-large-scale 3d dense reconstructions

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination