CN114037804A - Indoor scene reconstruction method combining body-level feature constraints - Google Patents
- Publication number: CN114037804A (application number CN202210030559.3A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G: PHYSICS
  - G06: COMPUTING; CALCULATING OR COUNTING
    - G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
      - G06T17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
    - G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N3/00: Computing arrangements based on biological models
        - G06N3/02: Neural networks
          - G06N3/04: Architecture, e.g. interconnection topology
            - G06N3/045: Combinations of networks
Abstract
The invention provides an indoor scene reconstruction method combining body-level feature constraints, comprising the following steps: estimating the rotation matrix from texture image information with a camera pose decoupled estimation algorithm, so that the rotation matrix is separated from depth noise; developing, within a local subgraph fusion algorithm based on point and plane features, a new plane parameterization method on a manifold space that solves the over-parameterization and local-minimum problems; extending the available information to object-level features with a deep neural network detection technique in a combined body-level feature constraint algorithm; and, with ORB-SLAM2 as the point tracking module, establishing a joint observation constraint framework containing point features, object bounding-rectangle information and plane information, thereby extending the traditional bundle adjustment model beyond minimizing only the reprojection error and integrating these parts into a capability for fast construction of indoor scenes combined with body-level feature constraints.
Description
Technical Field
The invention relates to an indoor scene reconstruction method, and in particular to an indoor scene reconstruction method combining body-level feature constraints.
Background
Filtering of depth images mainly involves removing artifacts, smoothing homogeneous regions, deleting outliers and improving the precision of depth boundaries. Depth maps are blurred at depth discontinuities, and bilateral filtering is a common edge-preserving denoising method in image processing. Kopf et al. use joint bilateral filtering to reduce noise. Mueller et al. propose an adaptive joint trilateral filtering algorithm that incorporates a confidence metric from the stereo matching algorithm into the filtering weights and filters the depth image jointly with the stereo image pair. A shortcoming of these methods is that they consider neither missing depth data nor the temporal correlation of depth maps acquired by the Kinect.
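The joint bilateral filtering idea discussed above can be illustrated with a minimal sketch (this is not the exact filter of Kopf et al.; the parameter values and function name are illustrative): each depth value is replaced by an average of its valid neighbors, weighted both by spatial distance and by the similarity of a guidance (grayscale) image, so that edges present in the guidance image are preserved in the depth map.

```python
import numpy as np

def joint_bilateral_filter(depth, gray, radius=2, sigma_s=2.0, sigma_r=10.0):
    """Denoise a depth map with spatial and color-guided range weights.

    depth : 2-D array of depth values (0 marks missing data, skipped here)
    gray  : 2-D guidance image of the same shape, e.g. grayscale of the RGB frame
    """
    h, w = depth.shape
    out = depth.astype(np.float64).copy()
    for y in range(h):
        for x in range(w):
            num, den = 0.0, 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and depth[ny, nx] > 0:
                        # spatial weight: Gaussian in pixel distance
                        ws = np.exp(-(dy * dy + dx * dx) / (2.0 * sigma_s ** 2))
                        # range weight: Gaussian in guidance-image difference
                        dg = float(gray[ny, nx]) - float(gray[y, x])
                        wr = np.exp(-dg * dg / (2.0 * sigma_r ** 2))
                        num += ws * wr * depth[ny, nx]
                        den += ws * wr
            if den > 0.0:
                out[y, x] = num / den
    return out
```

With a uniform guidance image the filter reduces to a plain Gaussian blur; with strong edges in the guidance image, depth values are averaged only within regions of similar appearance.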
Depth map hole filling is a challenging task, especially in disparity maps generated by stereoscopic systems. Because of occlusion, regions of unknown disparity inevitably appear, producing holes. To reduce the impact of occlusion, Deng et al. propose a stereo matching algorithm within an energy minimization framework, and Wan et al. use two-view geometry to fill in the missing depth information. A similar problem arises in depth map rendering for 3DTV (three-dimensional television) content generation: multiple virtual views are generated from one depth map, and the resulting holes must be repaired. In some image restoration algorithms, missing information is recovered from complementary information generated by two disparity maps. A main limitation of these hole-repair methods is that they rely on stereoscopic or multi-view vision: the lost information is inferred by combining images (visual information) of the same scene from different viewpoints. For hole repair in RGB-D images, such prior methods cannot be adopted directly, because the depth sensor acquires both depth information and visual information. Recently published work based on depth sensors has focused mainly on specific applications such as animation and motion-sensing games, with little research on noise removal. Matyunine et al. propose a depth map denoising method based on spatio-temporal median filtering driven by motion vector information; hole repair is achieved by median filtering that considers only pixel similarity in the color domain. Similarly, Patel et al. construct a 3D object data set for recognition purposes and apply a 5 x 5 recursive median filter at a single scale for hole repair.
These methods therefore share a problem: since no visual information is taken into account, the performance of median-filtering-based methods is rather limited, and the inserted depth values depend only on spatial position, which produces inaccurate depth maps. Research into more effective denoising and hole-repair algorithms is therefore key to achieving efficient, high-quality three-dimensional reconstruction.
Currently, many scholars have conducted intensive research into increasing the resolution of depth maps captured by TOF cameras. Methods for enhancing the resolution of depth maps produced by depth scanners or TOF cameras are mainly interpolation-based or graph-based. Garcia proposes an MRF (Markov random field) based method that rests on the assumption that depth discontinuities in a scene follow its intensity and brightness, i.e., within a neighborhood, similar intensities correspond to similar depths; however, this interpolation tends to over-smooth the solution, and these sensors are built on the TOF measurement principle. Patra et al. propose a resolution-improvement method that makes full use of high-resolution information from RGB images and HD camera video, but it does not take the depth image into account and is essentially indistinguishable from three-dimensional reconstruction from several two-dimensional images.
In camera calibration, multi-view geometry theory has attracted attention. It involves several key technologies, such as camera self-calibration and camera pose determination; its contribution to computer vision is increasingly prominent, and it is practical and effective in solving real application problems, so this direction may bring new ideas to camera calibration.
The most prominent approach in three-dimensional reconstruction uses hand-designed descriptors based on orientation histograms, such as SIFT and SURF, which are widely used in BoW (bag-of-words) models and have been applied successfully to many different recognition problems. However, these descriptors tend to lose useful information: the input data typically contains rich information such as color and texture, whereas in practice only grayscale information is used and the rest is discarded, which limits the performance of practical applications built on these descriptors. To give feature descriptors additional, more global information, researchers have proposed new algorithms such as color histograms. Besl and McKay proposed the iterative closest point (ICP) algorithm, opening a new direction for three-dimensional reconstruction; the classical algorithm has its own limitations, however, and is in fact better suited to fine registration. Magnusson et al. sought to achieve three-dimensional reconstruction through coordinate transformations based on geometric features, but the three-dimensional data are scattered and the triangular meshes describing them are hard to control, so the experimental results were less than ideal. Recently, for the registration of RGB-D images, Henry et al. registered extracted textured surface patches into the model using the ICP algorithm, and adopted graph optimization to improve the mapping precision. Engelhard et al. matched SURF (Speeded-Up Robust Features) between RGB-D image frames using the ICP algorithm and optimized the registration estimates. However, these methods do not seamlessly integrate depth information and two-dimensional image information during data stitching.
Domestic research on RGB-D three-dimensional reconstruction mainly includes the following. Liu Xin et al. of the Chinese Academy of Sciences provide a robust, fast object reconstruction method based on the Kinect depth sensor that solves the point cloud registration error problem, stably obtains a good three-dimensional reconstruction model, and achieves good results on the reconstruction of occluded objects. For a robot to determine the position of an object accurately, six degrees of freedom must be resolved; researchers at Tsinghua University propose an incremental parameterized model using the Kinect sensor, solving the indoor robot localization problem and realizing six-degree-of-freedom motion estimation. Royuan et al. of Chongqing University of Posts and Telecommunications use the Kinect depth sensor for gesture recognition, converting gestures into control commands to implement various operations on images. Researchers at the University of Science and Technology of China propose a new virtual-real fusion framework based on depth information using Kinect sensor hardware, achieving high-precision visual consistency. Wang Ningbo et al. of Zhejiang University combine RGB-D images for pedestrian detection. Zhu Xiao et al. of Shanghai Jiao Tong University reconstruct maps of 3D indoor environments with a depth sensor, achieving good reconstruction results.
Disclosure of Invention
Purpose of the invention: the invention aims to address the deficiencies of the prior art by providing an indoor scene reconstruction method combining body-level feature constraints.
In order to solve the technical problem, the invention discloses an indoor scene reconstruction method combining body-level feature constraint, which comprises the following steps of:
step 1, estimating a rotation matrix by using texture image information according to a camera attitude decoupling estimation algorithm, so as to realize separation of the rotation matrix and depth noise;
step 2, constructing an optimized plane parameterization method based on manifold space according to a local subgraph fusion algorithm based on point and plane characteristics;
step 3, according to the combined body-level feature constraint algorithm, extending the available information to object-level features using a deep neural network detection technique;
and step 4, with ORB-SLAM2 (reference: Raúl Mur-Artal and Juan D. Tardós. ORB-SLAM2: an Open-Source SLAM System for Monocular, Stereo and RGB-D Cameras. arXiv preprint arXiv:1610.06475, 2016) as the point tracking module, establishing a joint observation constraint framework containing point features, object bounding-rectangle information and plane information, extending the traditional bundle adjustment model beyond minimizing only the reprojection error, and integrating these parts to form the capability for fast construction of indoor scenes combined with body-level feature constraints.
The camera attitude decoupling estimation algorithm in the step 1 of the invention comprises the following steps:
step 1-1, establishing a multi-scale window Gaussian mixture uncertain model;
step 1-2, establishing a new uncertainty model;
and 1-3, establishing an attitude decoupling estimation algorithm.
The multi-scale window Gaussian mixture uncertainty model is based on a quantitative analysis of the depth observation uncertainty distribution; it combines independent per-pixel uncertainty with the error distribution characteristics of the depth sensor, and uses a Gaussian mixture to incorporate the depth observation uncertainties of neighborhoods at different window scales into one model.
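The multi-scale idea can be sketched as follows. This is an illustrative simplification, not the model proposed by the invention: each window radius contributes one Gaussian component (the local depth mean, plus the local variance augmented by a hypothetical quadratic sensor-noise term with constant `base_sigma`), and equally weighted components are combined with the law of total variance.

```python
import numpy as np

def mixture_uncertainty(depth, center, scales=(1, 2, 3), base_sigma=0.0025):
    """Illustrative multi-scale-window uncertainty for one depth pixel.

    For each window radius in `scales`, the local depth variance is combined
    with a quadratic depth-noise model (noise growing with depth, as is
    typical for structured-light sensors) into one Gaussian component.
    """
    y, x = center
    comps = []
    for r in scales:
        win = depth[max(0, y - r):y + r + 1, max(0, x - r):x + r + 1]
        # hypothetical sensor model: sigma proportional to depth squared
        sensor_var = (base_sigma * depth[y, x] ** 2) ** 2
        comps.append((float(win.mean()), float(win.var()) + sensor_var))
    w = 1.0 / len(comps)
    mean = sum(w * m for m, _ in comps)
    # law of total variance for an equally weighted Gaussian mixture
    var = sum(w * (v + (m - mean) ** 2) for m, v in comps)
    return mean, var
```

In a textured or cluttered neighborhood the larger windows contribute larger variances, so the mixture automatically down-weights such observations during optimization.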
The attitude decoupling estimation algorithm in the steps 1-3 of the invention comprises the following steps:
step 1-3-1, separating the estimation of a rotation matrix from the noise interference of depth values by using visual characteristic point information in a texture image, and estimating the rotation matrix on the basis;
step 1-3-2, estimating the translation vector: combining the observation information in the depth image, the optimization of the translation vector is converted into the energy function

E(t) = Σ_{j=1}^{m} r_j^T Σ_{uv}^{-1} r_j, with r_j = z_j − π(K(R p_j + t)),

in the above formula, R is the rotation matrix estimated in step 1-3-1, p_j denotes the three-dimensional coordinates of the j-th feature point, and the camera pose t is the variable to be optimized; z_j is the pixel position of the projection of the feature point on the left or right texture image, m is the number of matched feature-point pairs between the two frames, K is the camera intrinsic matrix, π(·) denotes dehomogenization, u and v denote the column and row indices of a pixel, and Σ_{uv} is the covariance matrix of the (u, v) observations.
The local subgraph fusion algorithm based on the point and plane features in the step 2 of the invention comprises the following steps:
step 2-1, optimizing the attitude precision of the camera;
and 2-2, generating a local subgraph.
The fusion objective function for local subgraph generation in step 2-2 is as follows:

E(R_l, t_l, R_{l+1}, t_{l+1}) = Σ_{i=1}^{N} [ n_{l,i}^T ( (R_l a_{l,i} + t_l) − (R_{l+1} a_{l+1,i} + t_{l+1}) ) ]^2

where R_l and R_{l+1} are the rotation matrices, a_{l,i} and a_{l+1,i} are the anchor points of the i-th corresponding image-block group, n_{l,i} and n_{l+1,i} are the normal vectors of the i-th corresponding image blocks on frames l and l+1, and N is the number of corresponding image blocks between the two frames; the energy function measures the distance of the anchor points along the normal vectors of the corresponding image blocks;
a local subgraph is generated with points, image blocks and planes as sources of observation information, based on the plane parameterization method; the global structure of the scene is recovered gradually through the process of local subgraph fusion, and the optimal fusion adjustment of the local subgraphs is achieved by minimizing the fusion objective function; the incremental association mode achieves three-dimensional online reconstruction by introducing a plane data association method.
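One minimal plane parameterization in the spirit of step 2 can be sketched as follows. This is an illustrative choice, not necessarily the parameterization of the invention: the plane is represented by the 3-vector pointing to its closest point to the origin, instead of the over-parameterized 4-vector Hessian form (unit normal n plus offset d with a unit-norm constraint).

```python
import numpy as np

def plane_to_cp(n, d):
    """Hessian form (unit normal n, offset d, plane n.x = d) -> minimal 3-vector."""
    n = np.asarray(n, float)
    n = n / np.linalg.norm(n)
    return d * n                     # closest point on the plane to the origin

def cp_to_plane(q):
    """Minimal 3-vector back to (unit normal, offset).

    Note: singular for planes through the origin (d = 0), a known limitation
    of the closest-point form.
    """
    q = np.asarray(q, float)
    d = np.linalg.norm(q)
    return q / d, d

def point_plane_dist(n, d, x):
    """Signed distance of point x to the plane n.x = d."""
    return float(n @ np.asarray(x, float) - d)
```

With three unconstrained parameters, standard unconstrained least-squares machinery applies directly, avoiding the rank-deficient normal equations that the redundant 4-parameter form produces.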
The combination body-level feature constraint algorithm in the step 3 of the invention comprises the following steps:
step 3-1, local adjustment optimization;
step 3-2, performing multi-constraint condition joint optimization;
and step 3-3, denoising and fusing the point cloud with a point cloud fusion method based on a truncated distance function, and extracting the three-dimensional model.
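Truncated-distance fusion can be illustrated along a single camera ray; this is a 1-D didactic simplification of volumetric TSDF fusion, with illustrative truncation and equal per-observation weighting.

```python
import numpy as np

def tsdf_integrate(tsdf, weight, grid_pts, depth_of_pt, trunc=0.1):
    """Fuse one depth observation into a 1-D TSDF volume (sketch).

    grid_pts    : (N,) distances of voxel centers along a camera ray
    depth_of_pt : observed surface depth along that ray
    The signed distance is truncated to [-trunc, trunc], normalized, and
    averaged into the volume with a running weight.
    """
    sdf = depth_of_pt - grid_pts                 # positive in front of the surface
    valid = sdf > -trunc                         # skip voxels far behind the surface
    sdf = np.clip(sdf, -trunc, trunc) / trunc    # normalize to [-1, 1]
    new_w = weight + valid
    upd = np.where(valid, (tsdf * weight + sdf) / np.maximum(new_w, 1), tsdf)
    return upd, new_w
```

Averaging truncated distances from many frames denoises the implicit surface; the model mesh is then extracted at the zero crossing of the fused field (e.g. by marching cubes in the full 3-D case).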
The local adjustment optimization in step 3-1 adopts two adjustment methods: the first keeps map points fixed in the tracking thread and optimizes only the camera pose variables; the second performs local window adjustment when a new keyframe is added.
In step 3-2 of the invention, the multi-constraint joint optimization adopts two types of constraints: plane features and object-level boundary features. Object bounding rectangles detected in the texture image are extracted as object-level boundary constraints, and the object-level feature information and the plane feature constraints are included in the joint optimization together; the optimization equation is

min_{R ∈ SO(3), π} Σ_k ( z_k − h(R, π, p_k) )^T Σ_k^{-1} ( z_k − h(R, π, p_k) )

in the above formula, the camera rotation matrix R and the plane vector π are the unknown variables; SO(3) is the special orthogonal group, part of a Lie group; Σ_k is the corresponding covariance matrix, associated with the feature-point scale; p_k is a map point, z_k is its observation, and h(·) is the observation function.
The joint observation constraint framework based on the object-level features in the step 4 of the invention specifically comprises the following steps:
step 4-1, system construction;
step 4-2, constructing a local map;
4-3, constructing a global map;
in step 4-2, local map construction adopts an observation model with the scanning pose as a variable, and the incremental optimization model SLAM++ is used to optimize the scanning poses and spatial point coordinates; the constraints of the energy equation are defined as follows:

E = w_p Σ || e_proj ||^2 + w_g Σ || e_pose ||^2

where e_proj is the pixel error of a global three-dimensional point transformed into the local coordinate system and projected onto the image, e_pose is the data-acquisition pose constraint, and w_p and w_g are the weights corresponding to the two;
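The two-term weighted energy above can be evaluated as in this sketch; the residual values and weights below are illustrative only.

```python
import numpy as np

def joint_energy(proj_residuals, pose_residuals, w_proj=1.0, w_pose=0.5):
    """Weighted sum of squared pixel reprojection errors and pose-constraint
    errors, mirroring the two-term energy E = w_p * sum||e_proj||^2
    + w_g * sum||e_pose||^2 described in the text.
    """
    ep = sum(float(r @ r) for r in proj_residuals)   # sum of ||e_proj||^2
    eg = sum(float(r @ r) for r in pose_residuals)   # sum of ||e_pose||^2
    return w_proj * ep + w_pose * eg
```

The relative weights trade pixel-level accuracy against trust in the acquisition poses; in practice they would be derived from the respective observation covariances.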
in step 4-3, the global map is constructed by first performing similarity retrieval based on a visual model, then solving the wide-baseline matching between local maps with the GMS feature matching method, and finally optimizing with a general graph optimization method to obtain the accurate position and attitude of each local map in the global coordinate system.
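The final graph optimization step can be illustrated with a toy pose graph; this is a didactic 1-D linear analogue (positions along one axis, odometry edges plus one loop closure), not the general graph optimization used in practice, and all numbers are illustrative.

```python
import numpy as np

def optimize_pose_graph(n_poses, edges, anchor=0):
    """Linear 1-D pose-graph sketch.

    Each edge (i, j, z, w) encodes the measurement x_j - x_i ~ z with weight w;
    pose `anchor` is pinned at 0 to fix the gauge freedom.  Solves the
    weighted least-squares problem with a single linear solve.
    """
    A, b = [], []
    for i, j, z, w in edges:
        row = np.zeros(n_poses)
        row[j], row[i] = 1.0, -1.0
        A.append(np.sqrt(w) * row)
        b.append(np.sqrt(w) * z)
    row = np.zeros(n_poses)
    row[anchor] = 1.0
    A.append(1e6 * row)              # soft gauge-fixing prior on the anchor
    b.append(0.0)
    x, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return x
```

A loop-closure edge that disagrees with the accumulated odometry (here 2.7 against 3.0) is reconciled by distributing the discrepancy evenly over the chain, exactly the effect graph optimization has on drifted local maps.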
The implementation of the invention is based on the following principle:
and researching a camera attitude decoupling estimation algorithm, and decoupling the rotation estimation and the absolute scale estimation. The algorithm principle is as follows: firstly, a rotation matrix is estimated by means of texture image information, and the influence of depth observation value noise on the rotation matrix is reduced; secondly, the estimation of the absolute translation component and the rotation matrix is separated, and linear and nonlinear factors are separately processed, so that the speed and the stability of the algorithm are improved; and thirdly, the estimation of the absolute translation component uses all observed values, and not only sparse feature points.
Study a local subgraph fusion algorithm based on point and plane features. The principle is to extend the traditional subgraph fusion algorithm based on point features to plane features. Plane features are less affected by noise in the depth observations, and planes provide better structural information when recovering the global three-dimensional structure, helping the algorithm converge to the global optimum. Since plane features have no effective descriptors, an incremental plane data association method is studied using the covariance matrix recoverable from local subgraphs.
Study a combined body-level feature constraint algorithm, whose principle is to extend the available information on each keyframe to object-level features using a deep neural network detection technique;
With ORB-SLAM2 as the point tracking module, study and establish a joint observation constraint framework containing point features, object bounding-rectangle information and plane information. The principle is to extend the traditional bundle adjustment model beyond minimizing only the reprojection error, guiding the iterative algorithm to converge quickly to the optimal solution and thereby integrating the capability for fast construction of indoor scenes combined with body-level feature constraints.
Beneficial effects: compared with the prior art, the invention has the following notable advantages: the rotation matrix is estimated from texture image information, separating the rotation matrix from depth noise; a new plane parameterization method developed on a manifold space solves the over-parameterization and local-minimum problems of traditional point-feature three-dimensional reconstruction algorithms; available information is extended to object-level features using a deep neural network detection technique; and with ORB-SLAM2 as the point tracking module, the traditional bundle adjustment model is extended beyond minimizing only the reprojection error.
Drawings
The foregoing and/or other advantages of the invention will become further apparent from the following detailed description of the invention when taken in conjunction with the accompanying drawings.
Fig. 1 is a general technical schematic of the present invention.
FIG. 2 is a schematic diagram of a camera pose decoupling estimation algorithm framework of the present invention.
FIG. 3 is a frame diagram of a local subgraph fusion algorithm based on point and plane features according to the invention.
FIG. 4 is a block diagram of the combination body-level feature constraint algorithm framework of the present invention.
FIG. 5 is a schematic diagram of the concept of the joint observation constraint framework research based on object-level features.
Detailed Description
The following describes the embodiments of the present invention in further detail with reference to the accompanying drawings.
The invention relates to an indoor scene reconstruction method combined with body-level feature constraint, as shown in fig. 1, the invention comprises the following steps:
1) researching a camera attitude decoupling estimation algorithm, estimating a rotation matrix by using texture image information, and realizing the separation of the rotation matrix and depth noise;
the method comprises the steps of RGBD characteristic preprocessing result, joint optimization and camera attitude;
wherein the joint optimization includes decoupled camera rotation and feature point constraints and decoupled camera translation and feature point constraints.
2) Researching a local subgraph fusion algorithm based on point and plane features, developing a new plane parameterization method based on manifold space, and solving the problems of over-parameterization and local minimum;
the method comprises the steps of selecting a key frame, generating a local subgraph, forming a three-dimensional point cloud by a global model and accurate posture information;
the local sub-image generation comprises the steps of image block extraction, plane parameterization, local attitude solution and key frame selection;
the global model comprises feature point association, plane data association, observation model definition and fusion algorithm.
3) Researching a combination body-level feature constraint algorithm, and further expanding available information to object-level features by utilizing a deep neural network detection technology;
the method comprises the steps of image sequence, ORG-SLAM2 point tracking and local optimization, object detection, plane feature constraint, object-level boundary constraint, joint optimization, point cloud fusion and high-precision point cloud.
4) With ORB-SLAM2 as the point tracking module, study and establish a joint observation constraint framework containing point features, object bounding-rectangle information and plane information, extend the traditional bundle adjustment model beyond minimizing only the reprojection error, and integrate these parts to form the capability for fast construction of indoor scenes combined with body-level feature constraints.
The invention aims at the scene of fast construction of indoor scenes combined with body-level feature constraints. Based on the image source information, a virtual model is created for the indoor multi-dimensional space entity object in a digital mode, and all characteristics of the indoor space target are accurately shown through the simulation model.
As shown in fig. 2, the camera pose decoupling estimation algorithm framework of the present invention specifically includes the following steps:
1) Establish the multi-scale window Gaussian mixture uncertainty model. Based on a quantitative analysis of the depth observation uncertainty distribution, the model is proposed by combining independent per-pixel uncertainty with the error distribution characteristics of the depth sensor, using a Gaussian mixture to take into account the depth observation uncertainties of neighborhoods at different window scales.
2) Establish a new uncertainty model, which can produce a more realistic description of the depth observation distribution in more complex environments and measure the observation weights more finely during optimization.
3) Aiming at the problem that the traditional characteristic points are greatly influenced by depth noise, an attitude decoupling estimation algorithm is provided;
the method comprises the steps of image acquisition through a camera, RGB feature matching, preprocessing, joint optimization and accurate posture information acquisition.
The preprocessing comprises selecting key frames, extracting planes and associating the planes with current feature points;
the joint optimization comprises the steps of decoupling camera rotation and feature point constraint, decoupling camera translation and feature point constraint and feature point and plane constraint.
Separating the estimation of the rotation matrix from the noise interference of the depth value by using the visual characteristic point information in the texture image, and estimating the rotation matrix on the basis; and (4) translation vector estimation, namely, combining observation information in the depth image to convert the optimization of the translation vector into a corresponding energy function.
As shown in fig. 3, the local subgraph fusion algorithm framework based on point and plane features in the present invention specifically includes the following steps:
1) and optimizing the accuracy of the camera posture. The wide existence of plane characteristics in an indoor environment is reasonably utilized, and the camera attitude estimation precision is improved;
2) generating a local subgraph;
the method comprises the steps of obtaining an RGBD image sequence from a camera, solving a local attitude, obtaining a global model and obtaining accurate camera attitude information;
the local attitude solving comprises key frame selection, image block extraction, plane parameterization and local attitude solving;
the global model comprises plane data association, feature point data association, observation model definition and fusion algorithm.
A plane parameterization method is provided based on a lie group space, and in the step of local subgraph generation, points, image blocks and planes are used as observation information sources. The image blocks and planes are more robust to depth noise, so that more accurate and robust local subgraphs are generated. The global structure of the scene is gradually recovered through the process of local subgraph fusion, and the optimal fusion adjustment of the local subgraphs can be realized by minimizing the following fusion objective function.
E(R_l, R_{l+1}) = Σ_{i=1}^{m} ( n_{l,i}ᵀ ( R_l a_{l,i} − R_{l+1} a_{l+1,i} ) )²

wherein R_l and R_{l+1} are the rotation matrices, a_{l,i} and a_{l+1,i} are the anchor points of the i-th group of corresponding image blocks, n_{l,i} and n_{l+1,i} are the normal vectors of the i-th image block on frames l and l+1, m is the number of corresponding image blocks between the two frames, and the energy function measures the distance of the anchor point along the normal vector of the corresponding image block.
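The fusion energy above can be evaluated directly once anchors and normals are given. The sketch below is a toy, rotation-only version (translations omitted) under the conventional symbols used in the description; the function and variable names are assumptions for illustration, not the patent's notation.

```python
def dot(a, b):
    """Inner product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))

def mat_vec(R, p):
    """Multiply a 3x3 rotation matrix (list of rows) by a 3-vector."""
    return [sum(R[r][c] * p[c] for c in range(3)) for r in range(3)]

def fusion_energy(R_l, R_l1, anchors_l, anchors_l1, normals):
    """Sum of squared point-to-plane distances between corresponding anchor
    points rotated into a common frame, measured along each block's normal."""
    e = 0.0
    for a_l, a_l1, n in zip(anchors_l, anchors_l1, normals):
        diff = [x - y for x, y in zip(mat_vec(R_l, a_l), mat_vec(R_l1, a_l1))]
        e += dot(n, diff) ** 2
    return e

I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
anchors = [[1.0, 2.0, 0.5], [0.0, 1.0, 1.0]]
normals = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
e_zero = fusion_energy(I3, I3, anchors, anchors, normals)   # aligned subgraphs
shifted = [[a[0], a[1], a[2] + 1.0] for a in anchors]       # offset along z
e_shift = fusion_energy(I3, I3, anchors, shifted, normals)
```

Note that the second anchor pair contributes nothing to `e_shift` because its z-offset is orthogonal to its x-axis normal, which is exactly why the point-to-plane metric tolerates sliding within a plane.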
The incremental association mode realizes efficient three-dimensional online reconstruction by introducing a plane data association method.
As shown in fig. 4, the combined body-level feature constraint algorithm framework of the present invention specifically comprises the following steps:
1) Local bundle adjustment optimization. The front end of ORB-SLAM2 is used for feature-point tracking and participates in the local optimization. Two special adjustment techniques are proposed in local bundle adjustment, corresponding to the tracking and local mapping threads:
a) In the tracking thread, adjustment is executed with the map points held fixed, optimizing the camera pose variables.
This comprises data preprocessing and pose estimation, followed by the steps of tracking the local map and associating new frames.
b) Local window adjustment is performed when new key frames are added.
This comprises key frame insertion, map point management, new map point construction, local bundle adjustment and local key frame management.
2) Multi-constraint joint optimization. Two types of constraints are adopted: plane features and object-level boundary features. The object bounding rectangles detected in the texture image are extracted as object-level boundary constraints.
This comprises object detection with the SSD object detector, plane feature constraints, object-level boundary constraints and the joint optimization.
Simultaneously, the object-level feature information and the plane feature constraint are included in the joint optimization, and the optimization equation is as follows:
In the above formula, the camera rotation matrix R and the plane vector n are the unknown variables, with R ∈ SO(3), the special orthogonal group, which is part of the Lie group; Σ is the corresponding covariance matrix, associated with the feature-point scale; X is a map point, and z is its observation.
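One way such an object-level boundary constraint could enter the joint optimization is as a hinge residual that is zero while a map point of the object projects inside its detected rectangle and grows with the violation. The hinge form and all names below are assumptions for illustration, not the patent's exact formulation.

```python
def project(K, X):
    """Pinhole projection of a camera-frame point X with intrinsics K = (fx, fy, cx, cy)."""
    fx, fy, cx, cy = K
    u = fx * X[0] / X[2] + cx
    v = fy * X[1] / X[2] + cy
    return u, v

def box_residual(K, X, box):
    """Hinge-style residual: (0, 0) when the projected point lies inside the
    detected bounding rectangle (umin, vmin, umax, vmax); otherwise the pixel
    distance by which each coordinate violates the box."""
    u, v = project(K, X)
    umin, vmin, umax, vmax = box
    du = max(umin - u, 0.0, u - umax)
    dv = max(vmin - v, 0.0, v - vmax)
    return du, dv

K = (500.0, 500.0, 320.0, 240.0)
box = (300.0, 220.0, 360.0, 280.0)
inside = box_residual(K, (0.0, 0.0, 2.0), box)    # projects to the image center
outside = box_residual(K, (0.5, 0.0, 2.0), box)   # projects right of the box
```

A residual of this shape can be stacked alongside reprojection and plane terms in a least-squares solver, since it is zero exactly where the detection imposes no constraint.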
3) The point cloud is denoised and fused with a TSDF point-cloud fusion method, and a high-precision three-dimensional model is then extracted.
This comprises two steps: TSDF-Split point-cloud fusion and high-precision triangular mesh modeling.
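The core of TSDF fusion is a truncated, weighted running average per voxel (in the style of the classic Curless–Levoy update). The sketch below shows that per-voxel update in isolation; it is a generic illustration of TSDF fusion, not the TSDF-Split variant named above.

```python
def tsdf_update(d_voxel, w_voxel, sdf, trunc, w_obs=1.0):
    """Weighted running-average TSDF update for a single voxel.
    sdf: signed distance from the voxel to the observed surface along the ray.
    Voxels far behind the surface are not updated; others are clamped to the
    truncation band [-1, 1] and averaged into the stored value."""
    if sdf < -trunc:                     # far behind the surface: skip
        return d_voxel, w_voxel
    tsdf = max(-1.0, min(1.0, sdf / trunc))
    w_new = w_voxel + w_obs
    d_new = (d_voxel * w_voxel + tsdf * w_obs) / w_new
    return d_new, w_new

# Fusing two noisy observations of the same surface drives the stored value
# toward their truncated average, which is how depth noise gets smoothed out.
d, w = 0.0, 0.0
d, w = tsdf_update(d, w, 0.02, trunc=0.05)
d, w = tsdf_update(d, w, 0.04, trunc=0.05)
```

The surface itself is later extracted as the zero crossing of the fused field (e.g. with marching cubes), which is what enables the high-precision mesh step.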
As shown in fig. 5, the joint observation constraint framework based on object-level features in the present invention specifically comprises the following steps:
1) Hardware system construction. First, the hardware of the system is built, including camera angle setting, camera calibration, rotating pan-tilt control, and the design of an efficient mapping mode based on the cooperation of an ARM development board and an ordinary PC; three cameras acquire data simultaneously to obtain an RGBD image sequence.
2) Local map construction, namely generating a local three-dimensional model computed on the ARM platform from a single acquisition point.
This comprises data preprocessing, observation model definition and incremental optimization.
For local map building, an observation model conforming to the hardware configuration is defined according to the specific arrangement of the multiple cameras. The scanning pose is used in place of a single camera pose, and the observation model with the scanning pose as the variable is defined in detail. The partial derivatives with respect to the relevant variables are derived with example observation data. In the optimization stage, the incremental optimization model SLAM++ is adopted to optimize the scanning pose and the spatial point coordinates, and the constraint of the energy equation is defined as follows:
E = w_p Σ ‖e_p‖² + w_g Σ ‖e_g‖²

wherein e_p is the pixel error obtained by transforming a global three-dimensional point into the local coordinate system and projecting it onto the image, e_g is the data-acquisition pose constraint, and w_p and w_g are the weights corresponding to the two;
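The pixel-error term e_p can be sketched as follows: a global three-dimensional point is rigidly transformed into the local camera frame and projected with the pinhole model, and the difference to the observed pixel is the residual. The intrinsics layout and names are assumptions; the pose-constraint term e_g and the weights are omitted from this sketch.

```python
def transform(R, t, X):
    """Rigid transform of a global point into the local camera frame: R X + t."""
    return [sum(R[r][c] * X[c] for c in range(3)) + t[r] for r in range(3)]

def pixel_error(K, R, t, X_global, obs):
    """Residual e_p: project the transformed point with intrinsics
    K = (fx, fy, cx, cy) and subtract the observed pixel (u, v)."""
    fx, fy, cx, cy = K
    Xl = transform(R, t, X_global)
    u = fx * Xl[0] / Xl[2] + cx
    v = fy * Xl[1] / Xl[2] + cy
    return u - obs[0], v - obs[1]

K = (500.0, 500.0, 320.0, 240.0)
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
t = [0.0, 0.0, 1.0]                       # global point lands at depth 2 locally
err = pixel_error(K, I3, t, (0.5, 0.0, 1.0), (445.0, 240.0))
```

In an incremental solver such as SLAM++, residuals of this form and their Jacobians with respect to the scanning pose are what get stacked and relinearized as new observations arrive.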
3) Global map construction, namely generating a global three-dimensional model computed on the PC by combining multiple acquisition points.
This comprises similarity retrieval, wide-baseline matching and camera pose optimization.
Similarity retrieval based on a visual model is established, and the GMS feature matching method is adopted to solve the wide-baseline matching problem between local maps. After the observation information between the local maps is obtained, G2O is adopted for optimization to obtain accurate position and pose information of each local map in the global coordinate system.
4) Panoramic roaming system construction. An indoor panoramic roaming system is constructed based on the position and pose information of the local and global maps;
this comprises panoramic image generation, positioning information and the construction of a web-based indoor roaming system.
The novel points of the invention are as follows: the texture image information is used to estimate the rotation matrix, realizing the separation of the rotation matrix from the depth noise; a new plane parameterization method is developed based on the manifold space, addressing the over-parameterization and local-minimum problems of traditional point-feature three-dimensional reconstruction algorithms; the available information is extended to object-level features using deep neural network detection technology; and ORB-SLAM2 is used as the point tracking module, extending the traditional bundle adjustment model beyond minimizing only the reprojection error.
The present invention provides an indoor scene reconstruction method combining body-level feature constraints. There are many ways to implement this technical solution, and the above description is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several modifications and refinements without departing from the principle of the present invention, and these modifications and refinements should also be regarded as falling within the protection scope of the present invention. All components not specified in this embodiment can be realized by the prior art.
Claims (10)
1. An indoor scene reconstruction method combining body-level feature constraints, characterized by comprising the following steps:
step 1, estimating a rotation matrix using texture image information according to a camera pose decoupling estimation algorithm, thereby realizing the separation of the rotation matrix from depth noise;
step 2, constructing an optimized plane parameterization method based on a manifold space according to a local subgraph fusion algorithm based on point and plane features;
step 3, extending the available information to object-level features using deep neural network detection technology according to a combined object-level feature constraint algorithm;
step 4, using ORB-SLAM2 as the point tracking module and establishing a joint observation constraint framework containing point features, object bounding rectangle information and plane information, extending the bundle adjustment model beyond minimizing only the reprojection error, and integrating the above to form the indoor scene reconstruction combining body-level feature constraints.
2. The method of claim 1, wherein the camera pose decoupling estimation algorithm in step 1 comprises the following steps:
step 1-1, establishing a multi-scale window Gaussian mixture uncertainty model;
step 1-2, establishing a new uncertainty model;
step 1-3, establishing the pose decoupling estimation algorithm.
3. The method of claim 2, wherein the multi-scale window Gaussian mixture uncertainty model is based on quantitative analysis of the depth observation uncertainty distribution, and combines the independent pixel uncertainty with the error distribution characteristics of the depth sensor to incorporate the depth observation uncertainties in different window neighborhoods into the multi-scale window Gaussian mixture uncertainty model by Gaussian mixing.
4. The method of claim 3, wherein the pose decoupling estimation algorithm in step 1-3 comprises the following steps:
step 1-3-1, separating the estimation of the rotation matrix from the noise interference of the depth values by using the visual feature-point information in the texture image, and estimating the rotation matrix on that basis;
step 1-3-2, estimating the translation vector, converting its optimization into a corresponding energy function by combining the observation information in the depth image, as follows:
In the above formula, R denotes the rotation matrix, X_i the three-dimensional coordinates of a feature point, and t the camera pose, which are the variables to be optimized; u_i^l and u_i^r are the pixel values of the projected points of the feature point on the left and right texture images; m denotes the number of matched feature-point pairs between the two frames; K is the interior orientation (intrinsic) matrix of the camera; u and v denote the column and row numbers of the image; and Σ_uv is the covariance matrix of the (u, v) observations.
5. The method for reconstructing an indoor scene combining body-level feature constraints according to claim 4, wherein the local subgraph fusion algorithm based on point and plane features in step 2 comprises the following steps:
step 2-1, optimizing the camera pose accuracy;
step 2-2, generating a local subgraph.
6. The indoor scene reconstruction method combining body-level feature constraints according to claim 5, characterized in that the fusion objective function for the local subgraph generation in step 2-2 is:
E(R_l, R_{l+1}) = Σ_{i=1}^{m} ( n_{l,i}ᵀ ( R_l a_{l,i} − R_{l+1} a_{l+1,i} ) )²
wherein R_l and R_{l+1} are the rotation matrices, a_{l,i} and a_{l+1,i} are the anchor points of the i-th group of corresponding image blocks, n_{l,i} and n_{l+1,i} are the normal vectors of the i-th image block on frames l and l+1, m is the number of corresponding image blocks between the two frames, and the energy function measures the distance of the anchor point along the normal vector of the corresponding image block;
a local subgraph is generated using points, image blocks and planes as observation information sources based on the plane parameterization method; the global structure of the scene is gradually recovered through the process of local subgraph fusion, and the optimal fusion adjustment of the local subgraphs is realized by minimizing the fusion objective function; the incremental association mode realizes three-dimensional online reconstruction by introducing a plane data association method.
7. The method of claim 6, wherein the combined object-level feature constraint algorithm in step 3 comprises the following steps:
step 3-1, local bundle adjustment optimization;
step 3-2, multi-constraint joint optimization;
step 3-3, denoising and fusing the point cloud with a point-cloud fusion method based on a truncated signed distance function, and extracting a three-dimensional model.
8. The method for reconstructing an indoor scene combining body-level feature constraints according to claim 7, wherein the local adjustment optimization in step 3-1 adopts two adjustment methods: first, adjustment is executed in the tracking thread with the map points held fixed, optimizing the camera pose variables; second, local window adjustment is performed when new key frames are added.
9. The method for reconstructing an indoor scene combining body-level feature constraints according to claim 8, wherein the multi-constraint joint optimization in step 3-2 adopts two types of constraints: plane features and object-level boundary features; the object bounding rectangles detected in the texture image are extracted as object-level boundary constraints, and the object-level feature information and the plane feature constraints are simultaneously included in the joint optimization, with the optimization equation as follows:
In the above formula, the camera rotation matrix R and the plane vector n are the unknown variables, with R ∈ SO(3), the special orthogonal group, which is part of the Lie group; Σ is the corresponding covariance matrix, associated with the feature-point scale; X is a map point, and z is its observation.
10. The method for reconstructing an indoor scene combining body-level feature constraints according to claim 9, wherein the joint observation constraint framework based on object-level features in step 4 specifically comprises:
step 4-1, hardware system construction;
step 4-2, local map construction;
step 4-3, global map construction;
in step 4-2, the local map construction adopts an observation model with the scanning pose as the variable, the incremental optimization model SLAM++ is adopted to optimize the scanning pose and the spatial point coordinates, and the constraint of the energy equation is defined as follows:
E = w_p Σ ‖e_p‖² + w_g Σ ‖e_g‖²

wherein e_p is the pixel error obtained by transforming a global three-dimensional point into the local coordinate system and projecting it onto the image, e_g is the data-acquisition pose constraint, and w_p and w_g are the weights corresponding to the two;
in step 4-3, the global map construction first adopts similarity retrieval based on a visual model, then adopts the GMS feature matching method to solve the wide-baseline matching between local maps, and finally adopts the general graph optimization (G2O) method to obtain accurate position and pose information of the local maps in the global coordinate system.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210030559.3A CN114037804A (en) | 2022-01-12 | 2022-01-12 | Indoor scene reconstruction method combining body-level feature constraints |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114037804A true CN114037804A (en) | 2022-02-11 |
Family
ID=80141575
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060191333A1 (en) * | 2003-04-18 | 2006-08-31 | Noe Stephen A | Runoff rain gauge |
CN108564616A (en) * | 2018-03-15 | 2018-09-21 | 中国科学院自动化研究所 | Method for reconstructing three-dimensional scene in the rooms RGB-D of fast robust |
Non-Patent Citations (1)
Title |
---|
WANG JUN: "Research on Indoor Three-dimensional Reconstruction Models and Methods Based on RGB-D Camera Data", China Doctoral Dissertations Full-text Database, Information Science and Technology Series *
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20220211 |