CN114037804A - Indoor scene reconstruction method combining body-level feature constraints - Google Patents

Indoor scene reconstruction method combining body-level feature constraints

Info

Publication number
CN114037804A
CN114037804A (application CN202210030559.3A)
Authority
CN
China
Prior art keywords
point
local
constraint
information
plane
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210030559.3A
Other languages
Chinese (zh)
Inventor
Han Dong
Shi Xiaodong
Shi Lihui
Wang Chunlong
Sun Yicheng
Ding Yang
Le Yi
Liu Yanjie
Lu Zhongxiang
Lu Ping
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 28 Research Institute
Original Assignee
CETC 28 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 28 Research Institute
Priority to CN202210030559.3A
Publication of CN114037804A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 17/00 Three-dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an indoor scene reconstruction method combining body-level feature constraints, comprising the following steps: estimating the rotation matrix from texture image information according to a camera pose decoupling estimation algorithm, thereby separating the rotation matrix from depth noise; developing a new manifold-based plane parameterization method within a local subgraph fusion algorithm based on point and plane features, solving the over-parameterization and local-minimum problems; extending the available information to object-level features with a combined body-level feature constraint algorithm built on deep neural network detection; and, with ORB-SLAM2 as the point tracking module, establishing a joint observation constraint framework containing point features, object bounding-rectangle information, and plane information, which extends the traditional bundle adjustment model beyond minimizing only the reprojection error and integrates these elements into the capability to rapidly construct indoor scenes under combined body-level feature constraints.

Description

Indoor scene reconstruction method combining body-level feature constraints
Technical Field
The invention relates to indoor scene reconstruction methods, and in particular to an indoor scene reconstruction method combining body-level feature constraints.
Background
Depth-image filtering mainly involves removing artifacts, smoothing similar regions, deleting outliers, and improving the boundary precision of the depth image. Depth maps blur at depth discontinuities, and bilateral filtering is a common edge-preserving denoising method in image processing. Kopf et al. used joint bilateral filtering to reduce noise. Mueller et al. proposed an adaptive joint trilateral filtering algorithm that incorporates a confidence metric from the stereo matching algorithm into the filtering weights and filters the depth image jointly with the stereo image. These methods, however, consider neither missing depth data nor the temporal correlation of the depth maps acquired by the Kinect.
Depth-map hole filling is a very challenging task, especially in disparity maps generated by stereoscopic systems. Because of occlusion, regions of unknown disparity always arise, producing holes. To reduce the impact of occlusion, Deng et al. proposed a stereo matching algorithm under an energy-minimization framework. Wan et al. used two-view geometry to fill in missing depth information. A similar problem arises in depth-map rendering for 3DTV (three-dimensional television) content generation: multiple virtual views are generated from one depth map, which helps repair the resulting holes. Some image restoration algorithms recover the missing information from the complementary information of two disparity maps. A main drawback of these hole-repair methods is that they rely on stereoscopic or multi-view vision, inferring the lost information by combining images (visual information) of the same scene from different viewpoints; they cannot be adopted directly for hole repair in RGB-D images, where the depth sensor acquires both depth information and visual information. Recently published work on depth sensors has focused mainly on specific applications such as animation and motion-sensing games, with less research on noise removal. Matyunin et al. proposed a depth-map denoising method based on spatio-temporal median filtering driven by motion-vector information; hole repair is achieved by median filtering that considers only similar pixels in the color domain. Similarly, Patel et al. built a 3D object dataset for recognition purposes, applying a 5 x 5 recursive median filter at a single scale for hole repair. These methods share a problem: because no visual information is taken into account, the performance of median-filtering-based methods is rather limited, and the inserted depth values depend only on spatial position, yielding inaccurate depth maps. Research into more effective denoising and hole-repair algorithms is therefore key to efficient, high-quality three-dimensional reconstruction.
Currently, many scholars have conducted intensive research into increasing the resolution of depth maps captured by TOF cameras. For enhancing the resolution of depth maps produced by depth scanners or TOF cameras, the main approaches are interpolation-based or graph-based. Garcia proposed an MRF (Markov Random Field)-based method resting on the assumption that depth discontinuities in a scene vary with its intensity and brightness, i.e., within a neighborhood, similar intensities correspond to similar depths; this assumption does not always hold, however, and tends to over-smooth the solution. These sensors are, moreover, built on the TOF measurement principle. Patra et al. proposed a resolution-enhancement method that makes full use of the high-resolution information obtained from RGB images and HD camera video, but it does not take depth images into account and is essentially indistinguishable from three-dimensional reconstruction from several two-dimensional images.
In terms of camera calibration, multi-view geometry theory has attracted attention. It involves several key techniques such as camera self-calibration and camera pose determination; its contribution to computer vision is increasingly prominent, and it is practical and effective for real applications, so this direction may bring new ideas to camera calibration.
The most prominent approach in three-dimensional reconstruction uses hand-designed descriptors based on orientation histograms, such as SIFT and SURF, which are widely applied in BoW (bag-of-words) models and have been successfully applied to many different recognition problems. However, these descriptors tend to lose useful information: the initial data typically contain rich information such as color and texture, whereas in practice only grayscale information is used and the rest is discarded. This constraint limits the performance of practical applications built on these descriptors. To give these feature descriptors additional, more global information, researchers have proposed new algorithms such as color histograms. Besl and McKay proposed the iterative closest point (ICP) algorithm, opening a new direction for three-dimensional reconstruction; the classical algorithm has its own limitations, being more suitable for fine registration. Magnusson et al. sought three-dimensional reconstruction through coordinate transformations driven by geometric features, but the three-dimensional data are scattered and the triangular meshes describing them are hard to control, so the experimental results fell short of expectations. Recently, for the registration of RGB-D images, Henry et al. registered the extracted textured surface patches into the model using the ICP algorithm, while adopting graph optimization to improve mapping precision. Engelhard et al. matched SURF (Speeded-Up Robust Features) features between RGB-D image frames using the ICP algorithm and optimized the registration estimates. These methods, however, do not seamlessly integrate depth information with two-dimensional image information during data stitching. Domestic three-dimensional reconstruction research involving RGB-D mainly includes the following. Liu Xin et al. of the Chinese Academy of Sciences provided a robust, fast object-reconstruction method based on the Kinect depth sensor, solved the point-cloud registration error problem, stably obtained satisfactory three-dimensional reconstruction models, and achieved good reconstruction of occluded objects. For a robot to determine the position of an object accurately, six degrees of freedom must be resolved; researchers at Tsinghua University proposed an incremental parameterized model using a Kinect sensor, solving indoor robot positioning and achieving six-degree-of-freedom motion estimation. Researchers at Chongqing University of Posts and Telecommunications used the Kinect depth sensor for gesture recognition, converting gestures into control commands for various image operations. Researchers at the University of Science and Technology of China proposed a new virtual-real fusion framework based on depth information using Kinect sensor hardware, achieving high-precision visual consistency. Wang Ningbo et al. at Zhejiang University combined RGB-D images for pedestrian detection. Zhu Xiao et al. of Shanghai Jiao Tong University reconstructed maps of 3D indoor environments with a depth sensor, achieving good reconstruction results.
Disclosure of Invention
Purpose of the invention: to address the deficiencies of the prior art, the invention provides an indoor scene reconstruction method combining body-level feature constraints.
In order to solve this technical problem, the invention discloses an indoor scene reconstruction method combining body-level feature constraints, comprising the following steps:
step 1, estimating the rotation matrix from texture image information according to a camera pose decoupling estimation algorithm, thereby separating the rotation matrix from depth noise;
step 2, constructing an optimized plane parameterization method based on manifold space within a local subgraph fusion algorithm based on point and plane features;
step 3, extending the available information to object-level features using a deep neural network detection technique, according to a combined body-level feature constraint algorithm;
and step 4, with ORB-SLAM2 (reference: Raúl Mur-Artal and Juan D. Tardós, "ORB-SLAM2: an Open-Source SLAM System for Monocular, Stereo and RGB-D Cameras," arXiv preprint arXiv:1610.06475, 2016) as the point tracking module, establishing a joint observation constraint framework containing point features, object bounding-rectangle information, and plane information, extending the traditional bundle adjustment model beyond minimizing only the reprojection error, and integrating these elements into the capability to rapidly construct indoor scenes combined with body-level feature constraints.
The camera pose decoupling estimation algorithm in step 1 of the invention comprises the following steps:
step 1-1, establishing a multi-scale window Gaussian mixture uncertainty model;
step 1-2, establishing a new uncertainty model;
and step 1-3, establishing a pose decoupling estimation algorithm.
The multi-scale window Gaussian mixture uncertainty model starts from a quantitative analysis of the depth-observation uncertainty distribution and, combining per-pixel uncertainty with the error-distribution characteristics of the depth sensor, uses a Gaussian mixture to incorporate the depth-observation uncertainties of different window neighborhoods into the model.
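As an illustration of how such a model can be assembled (the patent does not give its closed form), the following sketch mixes per-pixel depth uncertainty over several window scales into a single Gaussian-mixture mean and variance. The quadratic depth-noise law, the window sizes, and the uniform mixture weights are assumptions for the example, not the patent's parameters.

```python
import numpy as np

def pixel_depth_sigma(z, k=0.0012):
    # Per-pixel depth standard deviation; variance growing quadratically
    # with depth is a common structured-light (Kinect-style) error model.
    # The constant k is an assumed value for illustration.
    return k * z ** 2

def mixture_depth_uncertainty(depth, u, v, window_sizes=(3, 5, 9), weights=None):
    """Mix depth-observation uncertainty over several window neighborhoods
    around pixel (u, v) into one Gaussian-mixture mean and variance.

    Each scale s contributes a Gaussian N(mu_s, var_s); the mixture moments
    are E[z] = sum(w_s * mu_s) and
    Var[z] = sum(w_s * (var_s + mu_s**2)) - E[z]**2.
    Assumes every window contains at least one valid (non-zero) depth.
    """
    if weights is None:
        weights = np.full(len(window_sizes), 1.0 / len(window_sizes))
    mus, variances = [], []
    for s in window_sizes:
        r = s // 2
        patch = depth[max(v - r, 0):v + r + 1, max(u - r, 0):u + r + 1]
        valid = patch[patch > 0]            # zero depth = missing measurement
        mu = valid.mean()
        # per-window variance: sensor noise at the mean depth + local spread
        variances.append(pixel_depth_sigma(mu) ** 2 + valid.var())
        mus.append(mu)
    mus, variances = np.array(mus), np.array(variances)
    mean = weights @ mus
    var = weights @ (variances + mus ** 2) - mean ** 2
    return mean, var
```

Larger windows capture neighborhood structure while small windows stay faithful to the individual measurement; mixing them yields a per-pixel weight that can be used directly in the optimization below.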
The pose decoupling estimation algorithm in step 1-3 of the invention comprises the following steps:
step 1-3-1, separating the estimation of the rotation matrix from the noise of the depth values using the visual feature-point information in the texture image, and estimating the rotation matrix on this basis;
step 1-3-2, estimating the translation vector: combining the observation information in the depth image, the optimization of the translation vector is converted into the corresponding energy function

$$E(t, \{P_i\}) = \sum_{i=1}^{m} \left\| \pi\left(K\,(R P_i + t)\right) - z_i \right\|_{\Sigma_{uv}}^{2}$$

where $R$ denotes the rotation matrix; the three-dimensional feature-point coordinates $P_i$ and the camera pose $t$ are the variables to be optimized; $z_i = (u_i, v_i)$ collects the pixel values of the feature point's projections on the left and right texture images; $m$ is the number of matched feature-point pairs between the two frames; $K$ is the camera intrinsic matrix; $u$ and $v$ denote the column and row numbers of the picture; $\pi(\cdot)$ is the perspective-division operator; and $\Sigma_{uv}$ is the covariance matrix of the $(u, v)$ observations.
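With the rotation fixed from the texture-image stage, the translation estimate amounts to a translation-only bundle adjustment on the energy above. The following sketch is a minimal Gauss-Newton solver for that sub-problem, assuming a pinhole camera and an identity observation covariance in place of the $\Sigma_{uv}$ weighting; the structure (a linear update of $t$ with $R$ held fixed) is what the decoupling buys.

```python
import numpy as np

def project(K, p_cam):
    # Pinhole projection of a camera-frame 3D point to pixel coordinates.
    uvw = K @ p_cam
    return uvw[:2] / uvw[2]

def estimate_translation(K, R, pts3d, pix_obs, t0=np.zeros(3), iters=10):
    """Gauss-Newton on E(t) = sum_i ||project(K, R @ P_i + t) - z_i||^2,
    i.e. translation-only adjustment with the rotation held fixed after it
    has been estimated from texture information alone."""
    fx, fy = K[0, 0], K[1, 1]
    t = np.asarray(t0, dtype=float).copy()
    for _ in range(iters):
        H = np.zeros((3, 3))
        b = np.zeros(3)
        for P, z in zip(pts3d, pix_obs):
            p = R @ P + t
            x, y, zc = p
            r = project(K, p) - z                  # 2-vector pixel residual
            # Projection Jacobian w.r.t. the camera-frame point; because
            # d(R @ P + t)/dt = I, it is also the Jacobian w.r.t. t.
            J = np.array([[fx / zc, 0.0, -fx * x / zc ** 2],
                          [0.0, fy / zc, -fy * y / zc ** 2]])
            H += J.T @ J
            b += J.T @ r
        t -= np.linalg.solve(H, b)                 # 3x3 normal-equations step
    return t
```

Because the rotation is not re-estimated here, each iteration reduces to one 3x3 linear solve, which is one reason the decoupled formulation is fast and stable.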
The local subgraph fusion algorithm based on point and plane features in step 2 of the invention comprises the following steps:
step 2-1, optimizing camera pose accuracy;
and step 2-2, generating local subgraphs.
The fusion objective function for local-subgraph generation in step 2-2 of the invention is:

$$E = \sum_{i=1}^{N} \left\| \left(n_i^{l}\right)^{\top} \left( \left(R_l\, a_i^{l} + t_l\right) - \left(R_{l+1}\, a_i^{l+1} + t_{l+1}\right) \right) \right\|^{2}$$

where $R_l$ and $R_{l+1}$ are rotation matrices, $a_i^{l}$ and $a_i^{l+1}$ are the anchor points of the image blocks corresponding to the $i$-th group, $n_i^{l}$ and $n_i^{l+1}$ are the normal-vector sets of the $i$-th group on frames $l$ and $l+1$, and $N$ is the number of corresponding image blocks between the two frames; the energy function is a measure of the distance of the anchor points along the normal vectors of the corresponding image blocks;
based on the plane parameterization method, local subgraphs are generated using points, image blocks, and planes as sources of observation information; the global structure of the scene is gradually recovered through the process of local-subgraph fusion, with optimal fusion adjustment achieved by minimizing the fusion objective function; and the incremental association scheme achieves three-dimensional online reconstruction by introducing a plane data-association method.
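A minimal sketch of the point-to-plane alignment underlying this fusion step is given below, assuming the anchor points and block normals have already been extracted and associated. The rotation is parameterized by an axis-angle vector, one common minimal (manifold) choice; the solver and variable names are illustrative.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def fuse_subgraphs(anchors_l, normals_l, anchors_l1):
    """Point-to-plane alignment between consecutive local subgraphs: find
    (R, t) for frame l+1 so that every transformed anchor of frame l+1 lies
    on the plane of its corresponding image block in frame l. The residual
    is the anchor's distance along the block's normal vector."""
    def residuals(x):
        R = Rotation.from_rotvec(x[:3]).as_matrix()   # minimal 3-dof rotation
        t = x[3:]
        moved = anchors_l1 @ R.T + t                  # (N, 3) transformed anchors
        return np.einsum('ij,ij->i', normals_l, moved - anchors_l)
    sol = least_squares(residuals, np.zeros(6))       # start at identity pose
    return Rotation.from_rotvec(sol.x[:3]).as_matrix(), sol.x[3:]
```

Because planes are far less sensitive to depth noise than individual points, residuals of this form tend to keep the fusion well conditioned even when point observations are sparse.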
The combined body-level feature constraint algorithm in step 3 of the invention comprises the following steps:
step 3-1, local adjustment optimization;
step 3-2, multi-constraint joint optimization;
and step 3-3, denoising and fusing the point cloud with a point-cloud fusion method based on the truncated signed distance function (TSDF), and extracting the three-dimensional model.
The local adjustment optimization in step 3-1 of the invention adopts two adjustment methods: the first, in the tracking thread, keeps the map points fixed and performs an adjustment that optimizes the camera pose variables; the second performs local-window adjustment when a new keyframe is added.
In step 3-2 of the invention, the multi-constraint joint optimization adopts two types of constraints: plane features and object-level boundary features. The object bounding rectangles detected in the texture image are extracted as object-level boundary constraints, and the object-level feature information and plane feature constraints are simultaneously included in the joint optimization, with the optimization equation:
$$\{R^{*}, \pi^{*}\} = \arg\min_{R \in \mathrm{SO}(3),\ \pi} \sum_{k} \left\| z_k - h\!\left(R, X_k\right) \right\|_{\Sigma_k}^{2}$$

where the camera rotation matrix $R$ and the plane vector $\pi$ are the unknown variables; $R$ belongs to the special orthogonal group $\mathrm{SO}(3)$, part of a Lie group; $\Sigma_k$ is the corresponding covariance matrix, associated with the feature-point scale; $X_k$ is a map point; $z_k$ is the observation of $X_k$; and $h(\cdot)$ is the observation function.
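The following sketch shows one way such heterogeneous constraints can be stacked into a single residual vector for a least-squares solver. The specific plane residual ($n \cdot X + d$), the bounding-rectangle residual (projected object centre against detected box centre), and the weights are illustrative assumptions; the patent specifies only that object-level and plane constraints enter the same joint optimization.

```python
import numpy as np

def joint_residuals(K, R, t, map_pts, pix_obs, sigma_px,
                    plane, plane_pts, box_center_px, obj_center_3d,
                    w_plane=1.0, w_box=1.0):
    """Stack point, plane, and object-level residuals into one vector for a
    joint least-squares solve (weights and residual forms are illustrative)."""
    res = []
    # 1) point reprojection, whitened by a per-feature scale-dependent sigma
    for P, z, s in zip(map_pts, pix_obs, sigma_px):
        p = K @ (R @ P + t)
        res.extend((p[:2] / p[2] - z) / s)
    # 2) plane constraint: map points assigned to the plane (n, d) should
    #    satisfy n . X + d = 0
    n, d = plane[:3], plane[3]
    for X in plane_pts:
        res.append(w_plane * (n @ X + d))
    # 3) object-level boundary constraint: the projected object centre should
    #    land at the centre of its detected bounding rectangle
    c = K @ (R @ obj_center_3d + t)
    res.extend(w_box * (c[:2] / c[2] - box_center_px))
    return np.asarray(res)
```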
The joint observation constraint framework based on object-level features in step 4 of the invention specifically comprises:
step 4-1, system construction;
step 4-2, local map construction;
and step 4-3, global map construction;
in step 4-2, the local map construction adopts an observation model with the scanning pose as a variable; the incremental optimization model SLAM++ is adopted to optimize the scanning pose and the spatial point coordinates, and the energy-equation constraint is defined as follows:
$$E = \sum_{k} w_1 \left\| e_{\mathrm{proj},k} \right\|^{2} + \sum_{j} w_2 \left\| e_{\mathrm{pose},j} \right\|^{2}$$

where $e_{\mathrm{proj}}$ is the pixel error of a global three-dimensional space point after transformation into the local coordinate system and projection onto the image; $e_{\mathrm{pose}}$ is the data-acquisition pose constraint; and $w_1$ and $w_2$ are the weights corresponding to the two;
in step 4-3, the global map construction first adopts similarity retrieval based on a visual model, then adopts the GMS feature-matching method to solve wide-baseline matching between local maps, and finally optimizes with a general graph optimization method to obtain the accurate position and attitude information of each local map in the global coordinate system.
The implementation of the invention is based on the following principles:
Research a camera pose decoupling estimation algorithm that decouples rotation estimation from absolute scale estimation. The algorithm principle: first, the rotation matrix is estimated from texture-image information, reducing the influence of depth-observation noise on the rotation matrix; second, estimation of the absolute translation component is separated from the rotation matrix, and linear and nonlinear factors are handled separately, improving the speed and stability of the algorithm; third, estimation of the absolute translation component uses all observed values, not only sparse feature points.
Research a local subgraph fusion algorithm based on point and plane features. The principle is to build on the traditional subgraph fusion algorithm based on point features and extend it to plane features. Plane features are less affected by depth-observation noise, and planes provide better structural information during recovery of the global three-dimensional structure, helping the algorithm converge to the global optimum. Since plane features lack effective descriptors, an incremental plane data-association method is studied using the recoverable covariance matrix of the local subgraph.
Research a combined body-level feature constraint algorithm, whose principle is to extend the available information to object-level features on each keyframe using deep neural network detection;
based on an ORB-SLAM2 as a point tracking module, a combined observation constraint framework containing point features, object outsourcing rectangle information and plane information is researched and established. The principle is that the situation that a traditional beam adjustment model only minimizes a reprojection error is expanded, and an iterative algorithm is rapidly guided to converge to an optimal solution, so that the indoor scene rapid construction capability combined with body-level feature constraint is better integrated and formed.
Beneficial effects: compared with the prior art, the invention has the following notable advantages: the rotation matrix is estimated from texture image information, separating the rotation matrix from depth noise; a new manifold-based plane parameterization method is developed, solving the over-parameterization and local-minimum problems of traditional point-feature three-dimensional reconstruction algorithms; the available information is extended to object-level features using deep neural network detection; and with ORB-SLAM2 as the point tracking module, the traditional bundle adjustment model is extended beyond minimizing only the reprojection error.
Drawings
The foregoing and/or other advantages of the invention will become further apparent from the following detailed description of the invention when taken in conjunction with the accompanying drawings.
Fig. 1 is a general technical schematic of the present invention.
FIG. 2 is a schematic diagram of a camera pose decoupling estimation algorithm framework of the present invention.
FIG. 3 is a frame diagram of a local subgraph fusion algorithm based on point and plane features according to the invention.
FIG. 4 is a block diagram of the combined body-level feature constraint algorithm framework of the present invention.
FIG. 5 is a schematic diagram of the concept of the joint observation constraint framework research based on object-level features.
Detailed Description
The following describes the embodiments of the present invention in further detail with reference to the accompanying drawings.
The invention relates to an indoor scene reconstruction method combining body-level feature constraints; as shown in fig. 1, the invention comprises the following steps:
1) research a camera pose decoupling estimation algorithm, estimating the rotation matrix from texture image information and separating the rotation matrix from depth noise;
this step comprises the RGBD feature preprocessing result, joint optimization, and the camera pose;
the joint optimization includes decoupled camera rotation and feature-point constraints and decoupled camera translation and feature-point constraints.
2) Research a local subgraph fusion algorithm based on point and plane features, developing a new manifold-based plane parameterization method to solve the over-parameterization and local-minimum problems;
this step comprises keyframe selection, local-subgraph generation, and a global model and accurate pose information forming the three-dimensional point cloud;
the local-subgraph generation comprises image-block extraction, plane parameterization, local pose solution, and keyframe selection;
the global model comprises feature-point association, plane data association, observation-model definition, and the fusion algorithm.
3) Research a combined body-level feature constraint algorithm, further extending the available information to object-level features using deep neural network detection;
this step comprises the image sequence, ORB-SLAM2 point tracking and local optimization, object detection, plane feature constraints, object-level boundary constraints, joint optimization, point-cloud fusion, and the high-precision point cloud.
4) With ORB-SLAM2 as the point tracking module, research and establish a joint observation constraint framework containing point features, object bounding-rectangle information, and plane information, extending the traditional bundle adjustment model beyond minimizing only the reprojection error, and integrating these elements into the capability to rapidly construct indoor scenes combined with body-level feature constraints.
The invention targets the rapid construction of indoor scenes combined with body-level feature constraints: based on the image source information, a virtual model of the indoor multi-dimensional space entities is created digitally, and all characteristics of the indoor space targets are accurately represented by the simulation model.
As shown in fig. 2, the camera pose decoupling estimation algorithm framework of the present invention specifically includes the following steps:
1) establish a multi-scale window Gaussian mixture uncertainty model. Starting from a quantitative analysis of the depth-observation uncertainty distribution, and combining per-pixel uncertainty with the error-distribution characteristics of the depth sensor, a multi-scale window Gaussian mixture uncertainty model is proposed that uses a Gaussian mixture to account for the depth-observation uncertainties in different window neighborhoods.
2) Establish a new uncertainty model. It generates a more faithful description of the depth-observation distribution in more complex environments and measures observation weights more finely during optimization.
3) Propose a pose decoupling estimation algorithm, addressing the problem that traditional feature points are strongly affected by depth noise;
this step comprises image acquisition by the camera, RGB feature matching, preprocessing, joint optimization, and acquisition of accurate pose information.
The preprocessing comprises keyframe selection, plane extraction, and association of planes with the current feature points;
the joint optimization comprises decoupled camera rotation and feature-point constraints, decoupled camera translation and feature-point constraints, and feature-point and plane constraints.
The estimation of the rotation matrix is separated from the noise of the depth values using the visual feature-point information in the texture image, and the rotation matrix is estimated on this basis; for translation-vector estimation, the optimization of the translation vector is converted into the corresponding energy function by combining the observation information in the depth image.
As shown in fig. 3, the local subgraph fusion algorithm framework based on point and plane features in the present invention specifically includes the following steps:
1) optimize camera pose accuracy, reasonably exploiting the wide presence of plane features in indoor environments to improve the accuracy of camera pose estimation;
2) generate local subgraphs;
this step comprises obtaining the RGBD image sequence from the camera, local pose solution, the global model, and accurate camera pose information;
the local pose solution comprises keyframe selection, image-block extraction, plane parameterization, and local pose solving;
the global model comprises plane data association, feature-point data association, observation-model definition, and the fusion algorithm.
A plane parameterization method is proposed based on the Lie group space; in the local-subgraph generation step, points, image blocks, and planes serve as sources of observation information. Image blocks and planes are more robust to depth noise, yielding more accurate and robust local subgraphs. The global structure of the scene is gradually recovered through the process of local-subgraph fusion, and optimal fusion adjustment of the local subgraphs is achieved by minimizing the following fusion objective function:
$$E = \sum_{i=1}^{N} \left\| \left(n_i^{l}\right)^{\top} \left( \left(R_l\, a_i^{l} + t_l\right) - \left(R_{l+1}\, a_i^{l+1} + t_{l+1}\right) \right) \right\|^{2}$$

where $R_l$ and $R_{l+1}$ are rotation matrices, $a_i^{l}$ and $a_i^{l+1}$ are the anchor points of the image blocks corresponding to the $i$-th group, $n_i^{l}$ and $n_i^{l+1}$ are the normal-vector sets of the $i$-th group on frames $l$ and $l+1$, and $N$ is the number of corresponding image blocks between the two frames; the energy function is a measure of the distance of the anchor points along the normal vectors of the corresponding image blocks.
The incremental association scheme achieves efficient three-dimensional online reconstruction by introducing a plane data-association method.
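The patent's exact plane chart is not reproduced here; as a stand-in, the sketch below uses one common minimal plane parameterization on a manifold, the closest-point vector $\eta = d\,n$, which packs the unit normal and offset into exactly three numbers, to show how over-parameterization (and the rank-deficient normal equations it causes) is avoided.

```python
import numpy as np

def plane_to_cp(n, d):
    # Encode the plane {x : n . x = d} (unit normal n, offset d > 0) as the
    # point on the plane closest to the origin: three parameters, no more.
    return n * d

def cp_to_plane(eta):
    # Recover (n, d) from the closest-point vector (degenerate for planes
    # through the origin, where d = 0).
    d = np.linalg.norm(eta)
    return eta / d, d

def point_to_plane_residuals(eta, pts):
    # Residuals of 3D points against the CP-parameterized plane. With exactly
    # three parameters there is no redundant degree of freedom, so the normal
    # equations stay full-rank and the over-parameterization problem is avoided.
    n, d = cp_to_plane(eta)
    return pts @ n - d
```

A four-parameter plane $(n, d)$ with a unit-norm side constraint, by contrast, yields a singular direction in the Hessian that solvers must regularize away; minimal charts like this one sidestep that entirely.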
As shown in fig. 4, the combined body-level feature constraint algorithm framework of the present invention specifically includes the following steps:
1) local adjustment optimization. The front end of ORB-SLAM2 supplies the feature points participating in local optimization. Two special adjustment techniques are used in local bundle adjustment, covering tracking and the local map:
a) in the tracking thread, perform an adjustment that keeps the map points fixed and optimizes the camera pose variables.
This comprises data preprocessing and pose estimation, followed by tracking the local map and handling newly associated frames.
b) Perform local-window adjustment when a new keyframe is added.
This comprises keyframe insertion, map-point management, new map construction, local bundle adjustment, and local keyframe management.
2) Multi-constraint joint optimization. Two types of constraints are adopted: plane features and object-level boundary features. The object bounding rectangles detected in the texture image are extracted as object-level boundary constraints.
This comprises object detection with the SSD object detector, plane feature constraints, object-level boundary constraints, and joint optimization.
The object-level feature information and the plane feature constraints are simultaneously included in the joint optimization, with the optimization equation:
$$\{R^{*}, \pi^{*}\} = \arg\min_{R \in \mathrm{SO}(3),\ \pi} \sum_{k} \left\| z_k - h\!\left(R, X_k\right) \right\|_{\Sigma_k}^{2}$$

where the camera rotation matrix $R$ and the plane vector $\pi$ are the unknown variables; $R$ belongs to the special orthogonal group $\mathrm{SO}(3)$, part of a Lie group; $\Sigma_k$ is the corresponding covariance matrix, associated with the feature-point scale; $X_k$ is a map point; $z_k$ is the observation of $X_k$; and $h(\cdot)$ is the observation function.
3) Denoise and fuse the point cloud with the TSDF (truncated signed distance function) point-cloud fusion method, and further extract a high-precision three-dimensional model.
This comprises two steps: TSDF point-cloud fusion and high-precision triangular mesh modeling.
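As an illustration of the TSDF fusion step, the sketch below implements the classic running weighted-average update over a voxel grid (Curless-Levoy style). The volume size, voxel pitch, truncation band, and placement of the volume origin are assumptions for the example.

```python
import numpy as np

class TSDFVolume:
    """Minimal truncated signed distance function (TSDF) volume using the
    classic running weighted-average update."""

    def __init__(self, size=128, voxel=0.02, trunc=0.06):
        self.tsdf = np.ones((size, size, size), dtype=np.float32)
        self.weight = np.zeros_like(self.tsdf)
        self.size, self.voxel, self.trunc = size, voxel, trunc

    def integrate(self, depth, K, cam_pose):
        """Fuse one depth frame: project every voxel centre into the frame,
        compare its camera depth with the measured depth, truncate, and blend
        into the running average; repeated over frames this both denoises
        the reconstruction and fills holes."""
        idx = np.indices((self.size,) * 3).reshape(3, -1).T
        world = (idx + 0.5) * self.voxel               # volume origin at (0,0,0)
        T = np.linalg.inv(cam_pose)                    # world -> camera
        cam = world @ T[:3, :3].T + T[:3, 3]
        z = cam[:, 2]
        z_safe = np.where(z > 1e-6, z, 1.0)            # avoid divide-by-zero
        uvw = cam @ K.T
        u = np.round(uvw[:, 0] / z_safe).astype(int)
        v = np.round(uvw[:, 1] / z_safe).astype(int)
        h, w = depth.shape
        ok = (z > 1e-6) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        meas = np.zeros_like(z)
        meas[ok] = depth[v[ok], u[ok]]
        ok &= meas > 0                                 # drop missing depths
        sdf = np.clip((meas - z) / self.trunc, -1.0, 1.0)
        ft, fw = self.tsdf.ravel(), self.weight.ravel()
        ft[ok] = (ft[ok] * fw[ok] + sdf[ok]) / (fw[ok] + 1.0)
        fw[ok] += 1.0
```

The high-precision triangular mesh is then typically extracted from the zero level set of the fused volume, e.g. with marching cubes.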
As shown in fig. 5, the research idea of the joint observation constraint framework based on object-level features in the present invention specifically includes the following steps:
1) hardware system construction. First the hardware of the system is built, including camera angle setup, camera calibration, rotating-gimbal control, and the design of an efficient mapping scheme based on the cooperation of an ARM development board and an ordinary PC; three cameras acquire data simultaneously to obtain the RGBD image sequence.
2) Local map construction: a local three-dimensional model is generated from each single acquisition point, computed on the ARM platform;
this comprises data preprocessing, observation-model definition, and incremental optimization.
For local map building, an observation model matching the hardware configuration is defined according to the specific multi-camera setup. The scanning pose replaces the single-camera pose, and the observation model with the scanning pose as a variable is defined in detail; the partial derivatives of the relevant variables are derived from the observation data. In the optimization stage, the incremental optimization model SLAM++ is adopted to optimize the scanning pose and the spatial point coordinates, and the energy-equation constraint is defined as follows:
$$E = \sum_{k} w_1 \left\| e_{\mathrm{proj},k} \right\|^{2} + \sum_{j} w_2 \left\| e_{\mathrm{pose},j} \right\|^{2}$$

where $e_{\mathrm{proj}}$ is the pixel error of a global three-dimensional space point after transformation into the local coordinate system and projection onto the image; $e_{\mathrm{pose}}$ is the data-acquisition pose constraint; and $w_1$ and $w_2$ are the weights corresponding to the two;
3) global map construction: the global three-dimensional model is generated on the PC by combining multiple acquisition points;
this comprises similarity retrieval, wide-baseline matching, and camera pose optimization.
Similarity retrieval based on a visual model is established, and the GMS feature-matching method is adopted to solve wide-baseline matching between local maps. After the observations between local maps are obtained, g2o is adopted for optimization, yielding the accurate position and attitude information of each local map in the global coordinate system; a simplified sketch of this pose-graph refinement is given after this list.
4) Panoramic roaming system construction: an indoor panoramic roaming system is built from the position and attitude information of the local and global maps;
this comprises panoramic-image generation, positioning information, and construction of a web-based indoor roaming system.
The novelty of the invention lies in estimating the rotation matrix from texture image information, thereby separating the rotation matrix from depth noise; developing a new manifold-based plane parameterization method, solving the over-parameterization and local-minimum problems of traditional point-feature three-dimensional reconstruction algorithms; extending the available information to object-level features using deep neural network detection; and, with ORB-SLAM2 as the point tracking module, extending the traditional bundle adjustment model beyond minimizing only the reprojection error.
The present invention provides a method and a concept for reconstructing an indoor scene combined with body-level feature constraints, and there are many ways to implement the technical solution; the above is only a preferred embodiment of the invention. It should be noted that those skilled in the art can make several improvements and embellishments without departing from the principle of the invention, and these should also be regarded as within the protection scope of the invention. All components not specified in this embodiment can be realized by the prior art.

Claims (10)

1. An indoor scene reconstruction method combining body-level feature constraints, characterized by comprising the following steps:
step 1, estimating the rotation matrix from texture image information according to a camera pose decoupling estimation algorithm, thereby separating the rotation matrix from depth noise;
step 2, constructing an optimized plane parameterization method based on manifold space within a local subgraph fusion algorithm based on point and plane features;
step 3, extending the available information to object-level features using a deep neural network detection technique, according to a combined body-level feature constraint algorithm;
and step 4, with ORB-SLAM2 as the point tracking module, establishing a joint observation constraint framework containing point features, object bounding-rectangle information, and plane information, extending the bundle adjustment model beyond minimizing only the reprojection error, and integrating these elements into indoor scene construction combined with body-level feature constraints.
2. The method of claim 1, wherein the camera pose decoupling estimation algorithm in step 1 comprises the following steps:
step 1-1, establishing a multi-scale window Gaussian mixture uncertainty model;
step 1-2, establishing a new uncertainty model;
and step 1-3, establishing a pose decoupling estimation algorithm.
3. The method of claim 2, wherein the multi-scale window Gaussian mixture uncertainty model starts from a quantitative analysis of the depth-observation uncertainty distribution and, combining per-pixel uncertainty with the error-distribution characteristics of the depth sensor, uses a Gaussian mixture to incorporate the depth-observation uncertainties of different window neighborhoods into the multi-scale window Gaussian mixture uncertainty model.
4. The method of claim 3, wherein the pose decoupling estimation algorithm in step 1-3 comprises the following steps:
step 1-3-1, separating the estimation of the rotation matrix from the noise of the depth values using the visual feature-point information in the texture image, and estimating the rotation matrix on this basis;
step 1-3-2, estimating the translation vector: combining the observation information in the depth image, the optimization of the translation vector is converted into the corresponding energy function

$$E(t, \{P_i\}) = \sum_{i=1}^{m} \left\| \pi\left(K\,(R P_i + t)\right) - z_i \right\|_{\Sigma_{uv}}^{2}$$

where $R$ denotes the rotation matrix; the three-dimensional feature-point coordinates $P_i$ and the camera pose $t$ are the variables to be optimized; $z_i = (u_i, v_i)$ collects the pixel values of the feature point's projections on the left and right texture images; $m$ is the number of matched feature-point pairs between the two frames; $K$ is the camera intrinsic matrix; $u$ and $v$ denote the column and row numbers of the picture; $\pi(\cdot)$ is the perspective-division operator; and $\Sigma_{uv}$ is the covariance matrix of the $(u, v)$ observations.
5. The method for reconstructing an indoor scene combining body-level feature constraints according to claim 4, wherein the local subgraph fusion algorithm based on point and plane features in step 2 comprises the following steps:
step 2-1, optimizing camera pose accuracy;
and step 2-2, generating local subgraphs.
6. The method for reconstructing an indoor scene combining body-level feature constraints according to claim 5, characterized in that the fusion objective function for local-subgraph generation in step 2-2 is:

$$E = \sum_{i=1}^{N} \left\| \left(n_i^{l}\right)^{\top} \left( \left(R_l\, a_i^{l} + t_l\right) - \left(R_{l+1}\, a_i^{l+1} + t_{l+1}\right) \right) \right\|^{2}$$

where $R_l$ and $R_{l+1}$ are rotation matrices, $a_i^{l}$ and $a_i^{l+1}$ are the anchor points of the image blocks corresponding to the $i$-th group, $n_i^{l}$ and $n_i^{l+1}$ are the normal-vector sets of the $i$-th group on frames $l$ and $l+1$, and $N$ is the number of corresponding image blocks between the two frames; the energy function is a measure of the distance of the anchor points along the normal vectors of the corresponding image blocks;
based on the plane parameterization method, local subgraphs are generated using points, image blocks, and planes as sources of observation information; the global structure of the scene is gradually recovered through the process of local-subgraph fusion, with optimal fusion adjustment achieved by minimizing the fusion objective function; and the incremental association scheme achieves three-dimensional online reconstruction by introducing a plane data-association method.
7. The method of claim 6, wherein the combined body-level feature constraint algorithm in step 3 comprises the following steps:
step 3-1, local adjustment optimization;
step 3-2, multi-constraint joint optimization;
and step 3-3, denoising and fusing the point cloud with a point-cloud fusion method based on the truncated signed distance function, and extracting the three-dimensional model.
8. The method for reconstructing an indoor scene combining body-level feature constraints according to claim 7, wherein the local adjustment optimization in step 3-1 adopts two adjustment methods: the first, in the tracking thread, keeps the map points fixed and performs an adjustment that optimizes the camera pose variables; the second performs local-window adjustment when a new keyframe is added.
9. The method for reconstructing an indoor scene combining body-level feature constraints according to claim 8, wherein the multi-constraint joint optimization in step 3-2 adopts two types of constraints, plane features and object-level boundary features; the object bounding rectangles detected in the texture image are extracted as object-level boundary constraints, and the object-level feature information and plane feature constraints are simultaneously included in the joint optimization, with the optimization equation:

$$\{R^{*}, \pi^{*}\} = \arg\min_{R \in \mathrm{SO}(3),\ \pi} \sum_{k} \left\| z_k - h\!\left(R, X_k\right) \right\|_{\Sigma_k}^{2}$$

where the camera rotation matrix $R$ and the plane vector $\pi$ are the unknown variables; $R$ belongs to the special orthogonal group $\mathrm{SO}(3)$, part of a Lie group; $\Sigma_k$ is the corresponding covariance matrix, associated with the feature-point scale; $X_k$ is a map point; $z_k$ is the observation of $X_k$; and $h(\cdot)$ is the observation function.
10. The method for reconstructing an indoor scene combining body-level feature constraints according to claim 9, wherein the joint observation constraint framework based on object-level features in step 4 specifically includes:
step 4-1, system construction;
step 4-2, local map construction;
and step 4-3, global map construction;
in step 4-2, the local map construction adopts an observation model with the scanning pose as a variable; the incremental optimization model SLAM++ is adopted to optimize the scanning pose and the spatial point coordinates, and the energy-equation constraint is defined as follows:

$$E = \sum_{k} w_1 \left\| e_{\mathrm{proj},k} \right\|^{2} + \sum_{j} w_2 \left\| e_{\mathrm{pose},j} \right\|^{2}$$

where $e_{\mathrm{proj}}$ is the pixel error of a global three-dimensional space point after transformation into the local coordinate system and projection onto the image; $e_{\mathrm{pose}}$ is the data-acquisition pose constraint; and $w_1$ and $w_2$ are the weights corresponding to the two;
in step 4-3, the global map construction first adopts similarity retrieval based on a visual model, then adopts the GMS feature-matching method to solve wide-baseline matching between local maps, and finally optimizes with a general graph optimization method to obtain the accurate position and attitude information of each local map in the global coordinate system.
CN202210030559.3A 2022-01-12 2022-01-12 Indoor scene reconstruction method combining body-level feature constraints Pending CN114037804A

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210030559.3A CN114037804A Indoor scene reconstruction method combining body-level feature constraints

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210030559.3A CN114037804A Indoor scene reconstruction method combining body-level feature constraints

Publications (1)

Publication Number Publication Date
CN114037804A 2022-02-11

Family

ID=80141575

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210030559.3A Pending CN114037804A Indoor scene reconstruction method combining body-level feature constraints

Country Status (1)

Country Link
CN (1) CN114037804A


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060191333A1 (en) * 2003-04-18 2006-08-31 Noe Stephen A Runoff rain gauge
CN108564616A (en) * 2018-03-15 2018-09-21 中国科学院自动化研究所 Method for reconstructing three-dimensional scene in the rooms RGB-D of fast robust

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WANG JUN: "Research on Indoor 3D Reconstruction Models and Methods Based on RGB-D Camera Data", China Doctoral Dissertations Full-text Database, Information Science and Technology *

Similar Documents

Publication Publication Date Title
CN112435325B (en) VI-SLAM and depth estimation network-based unmanned aerial vehicle scene density reconstruction method
US11748907B2 (en) Object pose estimation in visual data
CN106651938B (en) A kind of depth map Enhancement Method merging high-resolution colour picture
Liu et al. Continuous depth estimation for multi-view stereo
US7856125B2 (en) 3D face reconstruction from 2D images
CN118212141A (en) System and method for hybrid depth regularization
KR20180054487A (en) Method and device for processing dvs events
US20230419438A1 (en) Extraction of standardized images from a single-view or multi-view capture
CN111063021A (en) Method and device for establishing three-dimensional reconstruction model of space moving target
Yuan et al. SDV-LOAM: semi-direct visual–LiDAR Odometry and mapping
CN110517211B (en) Image fusion method based on gradient domain mapping
CN106791774A (en) Virtual visual point image generating method based on depth map
CN113538569A (en) Weak texture object pose estimation method and system
Ramirez et al. Open challenges in deep stereo: the booster dataset
Xu et al. Three dimentional reconstruction of large cultural heritage objects based on uav video and tls data
CN114782628A (en) Indoor real-time three-dimensional reconstruction method based on depth camera
CN112102504A (en) Three-dimensional scene and two-dimensional image mixing method based on mixed reality
Fu et al. Image stitching techniques applied to plane or 3-D models: a review
Jisen A study on target recognition algorithm based on 3D point cloud and feature fusion
CN114935316B (en) Standard depth image generation method based on optical tracking and monocular vision
Yagi et al. Diminished reality for privacy protection by hiding pedestrians in motion image sequences using structure from motion
CN113674407B (en) Three-dimensional terrain reconstruction method, device and storage medium based on binocular vision image
CN114037804A (en) Indoor scene reconstruction method combining body-level feature constraints
Jäger et al. A comparative Neural Radiance Field (NeRF) 3D analysis of camera poses from HoloLens trajectories and Structure from Motion
CN109089100B (en) Method for synthesizing binocular stereo video

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220211