CN114037804A - Indoor scene reconstruction method combining body-level feature constraints - Google Patents

Indoor scene reconstruction method combining body-level feature constraints

Info

Publication number
CN114037804A
CN114037804A (application CN202210030559.3A)
Authority
CN
China
Prior art keywords
point
local
constraint
information
plane
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210030559.3A
Other languages
Chinese (zh)
Inventor
Han Dong
Shi Xiaodong
Shi Lihui
Wang Chunlong
Sun Yicheng
Ding Yang
Le Yi
Liu Yanjie
Lu Zhongxiang
Lu Ping
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 28 Research Institute
Original Assignee
CETC 28 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 28 Research Institute
Priority to CN202210030559.3A
Publication of CN114037804A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 17/00 Three-dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides an indoor scene reconstruction method combining body-level feature constraints, comprising the following steps: estimating the rotation matrix from texture image information according to a camera pose decoupling estimation algorithm, thereby separating the rotation matrix from depth noise; developing a new manifold-based plane parameterization method within a local subgraph fusion algorithm based on point and plane features, solving the over-parameterization and local-minimum problems; extending the available information to object-level features with a combined body-level feature constraint algorithm built on deep neural network detection; and, with ORB-SLAM2 as the point tracking module, establishing a joint observation constraint framework containing point features, object bounding-rectangle information, and plane information, which extends the traditional bundle adjustment model beyond minimizing only the reprojection error and integrates these elements into the capability to rapidly construct indoor scenes under combined body-level feature constraints.

Description

Indoor scene reconstruction method combining body-level feature constraints
Technical Field
The invention relates to indoor scene reconstruction methods, and in particular to an indoor scene reconstruction method combining body-level feature constraints.
Background
Depth-image filtering mainly involves removing artifacts, smoothing similar regions, deleting outliers, and improving the boundary precision of the depth image. Depth maps blur at depth discontinuities, and bilateral filtering is a common edge-preserving denoising method in image processing. Kopf et al. used joint bilateral filtering to reduce noise. Mueller et al. proposed an adaptive joint trilateral filtering algorithm that incorporates a confidence metric from the stereo matching algorithm into the filtering weights and filters the depth image jointly with the stereo image. These methods, however, consider neither missing depth data nor the temporal correlation of the depth maps acquired by the Kinect.
Depth-map hole filling is a very challenging task, especially in disparity maps generated by stereoscopic systems. Because of occlusion, regions of unknown disparity always arise, producing holes. To reduce the impact of occlusion, Deng et al. proposed a stereo matching algorithm under an energy-minimization framework. Wan et al. used two-view geometry to fill in missing depth information. A similar problem arises in depth-map rendering for 3DTV (three-dimensional television) content generation: multiple virtual views are generated from one depth map, which helps repair the resulting holes. Some image restoration algorithms recover the missing information from the complementary information of two disparity maps. A main drawback of these hole-repair methods is that they rely on stereoscopic or multi-view vision, inferring the lost information by combining images (visual information) of the same scene from different viewpoints; they cannot be adopted directly for hole repair in RGB-D images, where the depth sensor acquires both depth information and visual information. Recently published work on depth sensors has focused mainly on specific applications such as animation and motion-sensing games, with less research on noise removal. Matyunin et al. proposed a depth-map denoising method based on spatio-temporal median filtering driven by motion-vector information; hole repair is achieved by median filtering that considers only similar pixels in the color domain. Similarly, Patel et al. built a 3D object dataset for recognition purposes, applying a 5 x 5 recursive median filter at a single scale for hole repair. These methods share a problem: because no visual information is taken into account, the performance of median-filtering-based methods is rather limited, and the inserted depth values depend only on spatial position, yielding inaccurate depth maps. Research into more effective denoising and hole-repair algorithms is therefore key to efficient, high-quality three-dimensional reconstruction.
Currently, many scholars have conducted intensive research into increasing the resolution of depth maps captured by TOF cameras. For enhancing the resolution of depth maps produced by depth scanners or TOF cameras, the main approaches are interpolation-based or graph-based. Garcia proposed an MRF (Markov Random Field)-based method resting on the assumption that depth discontinuities in a scene vary with its intensity and brightness, i.e., within a neighborhood, similar intensities correspond to similar depths; this assumption does not always hold, however, and tends to over-smooth the solution. These sensors are, moreover, built on the TOF measurement principle. Patra et al. proposed a resolution-enhancement method that makes full use of the high-resolution information obtained from RGB images and HD camera video, but it does not take depth images into account and is essentially indistinguishable from three-dimensional reconstruction from several two-dimensional images.
In terms of camera calibration, multi-view geometry theory has attracted attention. It involves several key techniques such as camera self-calibration and camera pose determination; its contribution to computer vision is increasingly prominent, and it is practical and effective for real applications, so this direction may bring new ideas to camera calibration.
The most prominent approach in three-dimensional reconstruction uses hand-designed descriptors based on orientation histograms, such as SIFT and SURF, which are widely applied in BoW (bag-of-words) models and have been successfully applied to many different recognition problems. However, these descriptors tend to lose useful information: the initial data typically contain rich information such as color and texture, whereas in practice only grayscale information is used and the rest is discarded. This constraint limits the performance of practical applications built on these descriptors. To give these feature descriptors additional, more global information, researchers have proposed new algorithms such as color histograms. Besl and McKay proposed the iterative closest point (ICP) algorithm, opening a new direction for three-dimensional reconstruction; the classical algorithm has its own limitations, being more suitable for fine registration. Magnusson et al. sought three-dimensional reconstruction through coordinate transformations driven by geometric features, but the three-dimensional data are scattered and the triangular meshes describing them are hard to control, so the experimental results fell short of expectations. Recently, for the registration of RGB-D images, Henry et al. registered the extracted textured surface patches into the model using the ICP algorithm, while adopting graph optimization to improve mapping precision. Engelhard et al. matched SURF (Speeded-Up Robust Features) features between RGB-D image frames using the ICP algorithm and optimized the registration estimates. These methods, however, do not seamlessly integrate depth information with two-dimensional image information during data stitching. Domestic three-dimensional reconstruction research involving RGB-D mainly includes the following. Liu Xin et al. of the Chinese Academy of Sciences provided a robust, fast object-reconstruction method based on the Kinect depth sensor, solved the point-cloud registration error problem, stably obtained satisfactory three-dimensional reconstruction models, and achieved good reconstruction of occluded objects. For a robot to determine the position of an object accurately, six degrees of freedom must be resolved; researchers at Tsinghua University proposed an incremental parameterized model using a Kinect sensor, solving indoor robot positioning and achieving six-degree-of-freedom motion estimation. Researchers at Chongqing University of Posts and Telecommunications used the Kinect depth sensor for gesture recognition, converting gestures into control commands for various image operations. Researchers at the University of Science and Technology of China proposed a new virtual-real fusion framework based on depth information using Kinect sensor hardware, achieving high-precision visual consistency. Wang Ningbo et al. at Zhejiang University combined RGB-D images for pedestrian detection. Zhu Xiao et al. of Shanghai Jiao Tong University reconstructed maps of 3D indoor environments with a depth sensor, achieving good reconstruction results.
Disclosure of Invention
Purpose of the invention: to address the deficiencies of the prior art, the invention provides an indoor scene reconstruction method combining body-level feature constraints.
In order to solve this technical problem, the invention discloses an indoor scene reconstruction method combining body-level feature constraints, comprising the following steps:
step 1, estimating the rotation matrix from texture image information according to a camera pose decoupling estimation algorithm, thereby separating the rotation matrix from depth noise;
step 2, constructing an optimized plane parameterization method based on manifold space within a local subgraph fusion algorithm based on point and plane features;
step 3, extending the available information to object-level features using a deep neural network detection technique, according to a combined body-level feature constraint algorithm;
and step 4, with ORB-SLAM2 (reference: Raúl Mur-Artal and Juan D. Tardós, "ORB-SLAM2: an Open-Source SLAM System for Monocular, Stereo and RGB-D Cameras," arXiv preprint arXiv:1610.06475, 2016) as the point tracking module, establishing a joint observation constraint framework containing point features, object bounding-rectangle information, and plane information, extending the traditional bundle adjustment model beyond minimizing only the reprojection error, and integrating these elements into the capability to rapidly construct indoor scenes combined with body-level feature constraints.
The camera pose decoupling estimation algorithm in step 1 of the invention comprises the following steps:
step 1-1, establishing a multi-scale window Gaussian mixture uncertainty model;
step 1-2, establishing a new uncertainty model;
and step 1-3, establishing a pose decoupling estimation algorithm.
The multi-scale window Gaussian mixture uncertainty model starts from a quantitative analysis of the depth-observation uncertainty distribution and, combining per-pixel uncertainty with the error-distribution characteristics of the depth sensor, uses a Gaussian mixture to incorporate the depth-observation uncertainties of different window neighborhoods into the model.
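As an illustration of how such a model can be assembled (the patent does not give its closed form), the following sketch mixes per-pixel depth uncertainty over several window scales into a single Gaussian-mixture mean and variance. The quadratic depth-noise law, the window sizes, and the uniform mixture weights are assumptions for the example, not the patent's parameters.

```python
import numpy as np

def pixel_depth_sigma(z, k=0.0012):
    # Per-pixel depth standard deviation; variance growing quadratically
    # with depth is a common structured-light (Kinect-style) error model.
    # The constant k is an assumed value for illustration.
    return k * z ** 2

def mixture_depth_uncertainty(depth, u, v, window_sizes=(3, 5, 9), weights=None):
    """Mix depth-observation uncertainty over several window neighborhoods
    around pixel (u, v) into one Gaussian-mixture mean and variance.

    Each scale s contributes a Gaussian N(mu_s, var_s); the mixture moments
    are E[z] = sum(w_s * mu_s) and
    Var[z] = sum(w_s * (var_s + mu_s**2)) - E[z]**2.
    Assumes every window contains at least one valid (non-zero) depth.
    """
    if weights is None:
        weights = np.full(len(window_sizes), 1.0 / len(window_sizes))
    mus, variances = [], []
    for s in window_sizes:
        r = s // 2
        patch = depth[max(v - r, 0):v + r + 1, max(u - r, 0):u + r + 1]
        valid = patch[patch > 0]            # zero depth = missing measurement
        mu = valid.mean()
        # per-window variance: sensor noise at the mean depth + local spread
        variances.append(pixel_depth_sigma(mu) ** 2 + valid.var())
        mus.append(mu)
    mus, variances = np.array(mus), np.array(variances)
    mean = weights @ mus
    var = weights @ (variances + mus ** 2) - mean ** 2
    return mean, var
```

Larger windows capture neighborhood structure while small windows stay faithful to the individual measurement; mixing them yields a per-pixel weight that can be used directly in the optimization below.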
The pose decoupling estimation algorithm in step 1-3 of the invention comprises the following steps:
step 1-3-1, separating the estimation of the rotation matrix from the noise of the depth values using the visual feature-point information in the texture image, and estimating the rotation matrix on this basis;
step 1-3-2, estimating the translation vector: combining the observation information in the depth image, the optimization of the translation vector is converted into the corresponding energy function

$$E(t, \{P_i\}) = \sum_{i=1}^{m} \left\| \pi\left(K\,(R P_i + t)\right) - z_i \right\|_{\Sigma_{uv}}^{2}$$

where $R$ denotes the rotation matrix; the three-dimensional feature-point coordinates $P_i$ and the camera pose $t$ are the variables to be optimized; $z_i = (u_i, v_i)$ collects the pixel values of the feature point's projections on the left and right texture images; $m$ is the number of matched feature-point pairs between the two frames; $K$ is the camera intrinsic matrix; $u$ and $v$ denote the column and row numbers of the picture; $\pi(\cdot)$ is the perspective-division operator; and $\Sigma_{uv}$ is the covariance matrix of the $(u, v)$ observations.
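With the rotation fixed from the texture-image stage, the translation estimate amounts to a translation-only bundle adjustment on the energy above. The following sketch is a minimal Gauss-Newton solver for that sub-problem, assuming a pinhole camera and an identity observation covariance in place of the $\Sigma_{uv}$ weighting; the structure (a linear update of $t$ with $R$ held fixed) is what the decoupling buys.

```python
import numpy as np

def project(K, p_cam):
    # Pinhole projection of a camera-frame 3D point to pixel coordinates.
    uvw = K @ p_cam
    return uvw[:2] / uvw[2]

def estimate_translation(K, R, pts3d, pix_obs, t0=np.zeros(3), iters=10):
    """Gauss-Newton on E(t) = sum_i ||project(K, R @ P_i + t) - z_i||^2,
    i.e. translation-only adjustment with the rotation held fixed after it
    has been estimated from texture information alone."""
    fx, fy = K[0, 0], K[1, 1]
    t = np.asarray(t0, dtype=float).copy()
    for _ in range(iters):
        H = np.zeros((3, 3))
        b = np.zeros(3)
        for P, z in zip(pts3d, pix_obs):
            p = R @ P + t
            x, y, zc = p
            r = project(K, p) - z                  # 2-vector pixel residual
            # Projection Jacobian w.r.t. the camera-frame point; because
            # d(R @ P + t)/dt = I, it is also the Jacobian w.r.t. t.
            J = np.array([[fx / zc, 0.0, -fx * x / zc ** 2],
                          [0.0, fy / zc, -fy * y / zc ** 2]])
            H += J.T @ J
            b += J.T @ r
        t -= np.linalg.solve(H, b)                 # 3x3 normal-equations step
    return t
```

Because the rotation is not re-estimated here, each iteration reduces to one 3x3 linear solve, which is one reason the decoupled formulation is fast and stable.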
The local subgraph fusion algorithm based on point and plane features in step 2 of the invention comprises the following steps:
step 2-1, optimizing camera pose accuracy;
and step 2-2, generating local subgraphs.
The fusion objective function for local-subgraph generation in step 2-2 of the invention is:

$$E = \sum_{i=1}^{N} \left\| \left(n_i^{l}\right)^{\top} \left( \left(R_l\, a_i^{l} + t_l\right) - \left(R_{l+1}\, a_i^{l+1} + t_{l+1}\right) \right) \right\|^{2}$$

where $R_l$ and $R_{l+1}$ are rotation matrices, $a_i^{l}$ and $a_i^{l+1}$ are the anchor points of the image blocks corresponding to the $i$-th group, $n_i^{l}$ and $n_i^{l+1}$ are the normal-vector sets of the $i$-th group on frames $l$ and $l+1$, and $N$ is the number of corresponding image blocks between the two frames; the energy function is a measure of the distance of the anchor points along the normal vectors of the corresponding image blocks;
based on the plane parameterization method, local subgraphs are generated using points, image blocks, and planes as sources of observation information; the global structure of the scene is gradually recovered through the process of local-subgraph fusion, with optimal fusion adjustment achieved by minimizing the fusion objective function; and the incremental association scheme achieves three-dimensional online reconstruction by introducing a plane data-association method.
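A minimal sketch of the point-to-plane alignment underlying this fusion step is given below, assuming the anchor points and block normals have already been extracted and associated. The rotation is parameterized by an axis-angle vector, one common minimal (manifold) choice; the solver and variable names are illustrative.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def fuse_subgraphs(anchors_l, normals_l, anchors_l1):
    """Point-to-plane alignment between consecutive local subgraphs: find
    (R, t) for frame l+1 so that every transformed anchor of frame l+1 lies
    on the plane of its corresponding image block in frame l. The residual
    is the anchor's distance along the block's normal vector."""
    def residuals(x):
        R = Rotation.from_rotvec(x[:3]).as_matrix()   # minimal 3-dof rotation
        t = x[3:]
        moved = anchors_l1 @ R.T + t                  # (N, 3) transformed anchors
        return np.einsum('ij,ij->i', normals_l, moved - anchors_l)
    sol = least_squares(residuals, np.zeros(6))       # start at identity pose
    return Rotation.from_rotvec(sol.x[:3]).as_matrix(), sol.x[3:]
```

Because planes are far less sensitive to depth noise than individual points, residuals of this form tend to keep the fusion well conditioned even when point observations are sparse.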
The combined body-level feature constraint algorithm in step 3 of the invention comprises the following steps:
step 3-1, local adjustment optimization;
step 3-2, multi-constraint joint optimization;
and step 3-3, denoising and fusing the point cloud with a point-cloud fusion method based on the truncated signed distance function (TSDF), and extracting the three-dimensional model.
The local adjustment optimization in step 3-1 of the invention adopts two adjustment methods: the first, in the tracking thread, keeps the map points fixed and performs an adjustment that optimizes the camera pose variables; the second performs local-window adjustment when a new keyframe is added.
In step 3-2 of the invention, the multi-constraint joint optimization adopts two types of constraints: plane features and object-level boundary features. The object bounding rectangles detected in the texture image are extracted as object-level boundary constraints, and the object-level feature information and plane feature constraints are simultaneously included in the joint optimization, with the optimization equation:
$$\{R^{*}, \pi^{*}\} = \arg\min_{R \in \mathrm{SO}(3),\ \pi} \sum_{k} \left\| z_k - h\!\left(R, X_k\right) \right\|_{\Sigma_k}^{2}$$

where the camera rotation matrix $R$ and the plane vector $\pi$ are the unknown variables; $R$ belongs to the special orthogonal group $\mathrm{SO}(3)$, part of a Lie group; $\Sigma_k$ is the corresponding covariance matrix, associated with the feature-point scale; $X_k$ is a map point; $z_k$ is the observation of $X_k$; and $h(\cdot)$ is the observation function.
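The following sketch shows one way such heterogeneous constraints can be stacked into a single residual vector for a least-squares solver. The specific plane residual ($n \cdot X + d$), the bounding-rectangle residual (projected object centre against detected box centre), and the weights are illustrative assumptions; the patent specifies only that object-level and plane constraints enter the same joint optimization.

```python
import numpy as np

def joint_residuals(K, R, t, map_pts, pix_obs, sigma_px,
                    plane, plane_pts, box_center_px, obj_center_3d,
                    w_plane=1.0, w_box=1.0):
    """Stack point, plane, and object-level residuals into one vector for a
    joint least-squares solve (weights and residual forms are illustrative)."""
    res = []
    # 1) point reprojection, whitened by a per-feature scale-dependent sigma
    for P, z, s in zip(map_pts, pix_obs, sigma_px):
        p = K @ (R @ P + t)
        res.extend((p[:2] / p[2] - z) / s)
    # 2) plane constraint: map points assigned to the plane (n, d) should
    #    satisfy n . X + d = 0
    n, d = plane[:3], plane[3]
    for X in plane_pts:
        res.append(w_plane * (n @ X + d))
    # 3) object-level boundary constraint: the projected object centre should
    #    land at the centre of its detected bounding rectangle
    c = K @ (R @ obj_center_3d + t)
    res.extend(w_box * (c[:2] / c[2] - box_center_px))
    return np.asarray(res)
```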
The joint observation constraint framework based on object-level features in step 4 of the invention specifically comprises:
step 4-1, system construction;
step 4-2, local map construction;
and step 4-3, global map construction;
in step 4-2, the local map construction adopts an observation model with the scanning pose as a variable; the incremental optimization model SLAM++ is adopted to optimize the scanning pose and the spatial point coordinates, and the energy-equation constraint is defined as follows:
$$E = \sum_{k} w_1 \left\| e_{\mathrm{proj},k} \right\|^{2} + \sum_{j} w_2 \left\| e_{\mathrm{pose},j} \right\|^{2}$$

where $e_{\mathrm{proj}}$ is the pixel error of a global three-dimensional space point after transformation into the local coordinate system and projection onto the image; $e_{\mathrm{pose}}$ is the data-acquisition pose constraint; and $w_1$ and $w_2$ are the weights corresponding to the two;
in step 4-3, the global map construction first adopts similarity retrieval based on a visual model, then adopts the GMS feature-matching method to solve wide-baseline matching between local maps, and finally optimizes with a general graph optimization method to obtain the accurate position and attitude information of each local map in the global coordinate system.
The implementation of the invention is based on the following principles:
Research a camera pose decoupling estimation algorithm that decouples rotation estimation from absolute scale estimation. The algorithm principle: first, the rotation matrix is estimated from texture-image information, reducing the influence of depth-observation noise on the rotation matrix; second, estimation of the absolute translation component is separated from the rotation matrix, and linear and nonlinear factors are handled separately, improving the speed and stability of the algorithm; third, estimation of the absolute translation component uses all observed values, not only sparse feature points.
Research a local subgraph fusion algorithm based on point and plane features. The principle is to build on the traditional subgraph fusion algorithm based on point features and extend it to plane features. Plane features are less affected by depth-observation noise, and planes provide better structural information during recovery of the global three-dimensional structure, helping the algorithm converge to the global optimum. Since plane features lack effective descriptors, an incremental plane data-association method is studied using the recoverable covariance matrix of the local subgraph.
Research a combined body-level feature constraint algorithm, whose principle is to extend the available information to object-level features on each keyframe using deep neural network detection;
based on an ORB-SLAM2 as a point tracking module, a combined observation constraint framework containing point features, object outsourcing rectangle information and plane information is researched and established. The principle is that the situation that a traditional beam adjustment model only minimizes a reprojection error is expanded, and an iterative algorithm is rapidly guided to converge to an optimal solution, so that the indoor scene rapid construction capability combined with body-level feature constraint is better integrated and formed.
Beneficial effects: compared with the prior art, the invention has the following notable advantages: the rotation matrix is estimated from texture image information, separating the rotation matrix from depth noise; a new manifold-based plane parameterization method is developed, solving the over-parameterization and local-minimum problems of traditional point-feature three-dimensional reconstruction algorithms; the available information is extended to object-level features using deep neural network detection; and with ORB-SLAM2 as the point tracking module, the traditional bundle adjustment model is extended beyond minimizing only the reprojection error.
Drawings
The foregoing and/or other advantages of the invention will become further apparent from the following detailed description of the invention when taken in conjunction with the accompanying drawings.
Fig. 1 is a general technical schematic of the present invention.
FIG. 2 is a schematic diagram of a camera pose decoupling estimation algorithm framework of the present invention.
FIG. 3 is a frame diagram of a local subgraph fusion algorithm based on point and plane features according to the invention.
FIG. 4 is a block diagram of the combined body-level feature constraint algorithm framework of the present invention.
FIG. 5 is a schematic diagram of the concept of the joint observation constraint framework research based on object-level features.
Detailed Description
The following describes the embodiments of the present invention in further detail with reference to the accompanying drawings.
The invention relates to an indoor scene reconstruction method combining body-level feature constraints; as shown in fig. 1, the invention comprises the following steps:
1) research a camera pose decoupling estimation algorithm, estimating the rotation matrix from texture image information and separating the rotation matrix from depth noise;
this step comprises the RGBD feature preprocessing result, joint optimization, and the camera pose;
the joint optimization includes decoupled camera rotation and feature-point constraints and decoupled camera translation and feature-point constraints.
2) Research a local subgraph fusion algorithm based on point and plane features, developing a new manifold-based plane parameterization method to solve the over-parameterization and local-minimum problems;
this step comprises keyframe selection, local-subgraph generation, and a global model and accurate pose information forming the three-dimensional point cloud;
the local-subgraph generation comprises image-block extraction, plane parameterization, local pose solution, and keyframe selection;
the global model comprises feature-point association, plane data association, observation-model definition, and the fusion algorithm.
3) Research a combined body-level feature constraint algorithm, further extending the available information to object-level features using deep neural network detection;
this step comprises the image sequence, ORB-SLAM2 point tracking and local optimization, object detection, plane feature constraints, object-level boundary constraints, joint optimization, point-cloud fusion, and the high-precision point cloud.
4) With ORB-SLAM2 as the point tracking module, research and establish a joint observation constraint framework containing point features, object bounding-rectangle information, and plane information, extending the traditional bundle adjustment model beyond minimizing only the reprojection error, and integrating these elements into the capability to rapidly construct indoor scenes combined with body-level feature constraints.
The invention targets the rapid construction of indoor scenes combined with body-level feature constraints: based on the image source information, a virtual model of the indoor multi-dimensional space entities is created digitally, and all characteristics of the indoor space targets are accurately represented by the simulation model.
As shown in fig. 2, the camera pose decoupling estimation algorithm framework of the present invention specifically includes the following steps:
1) establish a multi-scale window Gaussian mixture uncertainty model. Starting from a quantitative analysis of the depth-observation uncertainty distribution, and combining per-pixel uncertainty with the error-distribution characteristics of the depth sensor, a multi-scale window Gaussian mixture uncertainty model is proposed that uses a Gaussian mixture to account for the depth-observation uncertainties in different window neighborhoods.
2) Establish a new uncertainty model. It generates a more faithful description of the depth-observation distribution in more complex environments and measures observation weights more finely during optimization.
3) Propose a pose decoupling estimation algorithm, addressing the problem that traditional feature points are strongly affected by depth noise;
this step comprises image acquisition by the camera, RGB feature matching, preprocessing, joint optimization, and acquisition of accurate pose information.
The preprocessing comprises keyframe selection, plane extraction, and association of planes with the current feature points;
the joint optimization comprises decoupled camera rotation and feature-point constraints, decoupled camera translation and feature-point constraints, and feature-point and plane constraints.
The estimation of the rotation matrix is separated from the noise of the depth values using the visual feature-point information in the texture image, and the rotation matrix is estimated on this basis; for translation-vector estimation, the optimization of the translation vector is converted into the corresponding energy function by combining the observation information in the depth image.
As shown in fig. 3, the local subgraph fusion algorithm framework based on point and plane features in the present invention specifically includes the following steps:
1) optimize camera pose accuracy, reasonably exploiting the wide presence of plane features in indoor environments to improve the accuracy of camera pose estimation;
2) generate local subgraphs;
this step comprises obtaining the RGBD image sequence from the camera, local pose solution, the global model, and accurate camera pose information;
the local pose solution comprises keyframe selection, image-block extraction, plane parameterization, and local pose solving;
the global model comprises plane data association, feature-point data association, observation-model definition, and the fusion algorithm.
A plane parameterization method is proposed based on the Lie group space; in the local-subgraph generation step, points, image blocks, and planes serve as sources of observation information. Image blocks and planes are more robust to depth noise, yielding more accurate and robust local subgraphs. The global structure of the scene is gradually recovered through the process of local-subgraph fusion, and optimal fusion adjustment of the local subgraphs is achieved by minimizing the following fusion objective function:
$$E = \sum_{i=1}^{N} \left\| \left(n_i^{l}\right)^{\top} \left( \left(R_l\, a_i^{l} + t_l\right) - \left(R_{l+1}\, a_i^{l+1} + t_{l+1}\right) \right) \right\|^{2}$$

where $R_l$ and $R_{l+1}$ are rotation matrices, $a_i^{l}$ and $a_i^{l+1}$ are the anchor points of the image blocks corresponding to the $i$-th group, $n_i^{l}$ and $n_i^{l+1}$ are the normal-vector sets of the $i$-th group on frames $l$ and $l+1$, and $N$ is the number of corresponding image blocks between the two frames; the energy function is a measure of the distance of the anchor points along the normal vectors of the corresponding image blocks.
The incremental association scheme achieves efficient three-dimensional online reconstruction by introducing a plane data-association method.
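The patent's exact plane chart is not reproduced here; as a stand-in, the sketch below uses one common minimal plane parameterization on a manifold, the closest-point vector $\eta = d\,n$, which packs the unit normal and offset into exactly three numbers, to show how over-parameterization (and the rank-deficient normal equations it causes) is avoided.

```python
import numpy as np

def plane_to_cp(n, d):
    # Encode the plane {x : n . x = d} (unit normal n, offset d > 0) as the
    # point on the plane closest to the origin: three parameters, no more.
    return n * d

def cp_to_plane(eta):
    # Recover (n, d) from the closest-point vector (degenerate for planes
    # through the origin, where d = 0).
    d = np.linalg.norm(eta)
    return eta / d, d

def point_to_plane_residuals(eta, pts):
    # Residuals of 3D points against the CP-parameterized plane. With exactly
    # three parameters there is no redundant degree of freedom, so the normal
    # equations stay full-rank and the over-parameterization problem is avoided.
    n, d = cp_to_plane(eta)
    return pts @ n - d
```

A four-parameter plane $(n, d)$ with a unit-norm side constraint, by contrast, yields a singular direction in the Hessian that solvers must regularize away; minimal charts like this one sidestep that entirely.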
As shown in fig. 4, the combined body-level feature constraint algorithm framework of the present invention specifically includes the following steps:
1) local adjustment optimization. The front end of ORB-SLAM2 supplies the feature points participating in local optimization. Two special adjustment techniques are used in local bundle adjustment, covering tracking and the local map:
a) in the tracking thread, perform an adjustment that keeps the map points fixed and optimizes the camera pose variables.
This comprises data preprocessing and pose estimation, followed by tracking the local map and handling newly associated frames.
b) Perform local-window adjustment when a new keyframe is added.
This comprises keyframe insertion, map-point management, new map construction, local bundle adjustment, and local keyframe management.
2) Multi-constraint joint optimization. Two types of constraints are adopted: plane features and object-level boundary features. The object bounding rectangles detected in the texture image are extracted as object-level boundary constraints.
This comprises object detection with the SSD object detector, plane feature constraints, object-level boundary constraints, and joint optimization.
The object-level feature information and the plane feature constraints are simultaneously included in the joint optimization, with the optimization equation:
$$\{R^{*}, \pi^{*}\} = \arg\min_{R \in \mathrm{SO}(3),\ \pi} \sum_{k} \left\| z_k - h\!\left(R, X_k\right) \right\|_{\Sigma_k}^{2}$$

where the camera rotation matrix $R$ and the plane vector $\pi$ are the unknown variables; $R$ belongs to the special orthogonal group $\mathrm{SO}(3)$, part of a Lie group; $\Sigma_k$ is the corresponding covariance matrix, associated with the feature-point scale; $X_k$ is a map point; $z_k$ is the observation of $X_k$; and $h(\cdot)$ is the observation function.
3) Denoise and fuse the point cloud with the TSDF (truncated signed distance function) point-cloud fusion method, and further extract a high-precision three-dimensional model.
This comprises two steps: TSDF point-cloud fusion and high-precision triangular mesh modeling.
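As an illustration of the TSDF fusion step, the sketch below implements the classic running weighted-average update over a voxel grid (Curless-Levoy style). The volume size, voxel pitch, truncation band, and placement of the volume origin are assumptions for the example.

```python
import numpy as np

class TSDFVolume:
    """Minimal truncated signed distance function (TSDF) volume using the
    classic running weighted-average update."""

    def __init__(self, size=128, voxel=0.02, trunc=0.06):
        self.tsdf = np.ones((size, size, size), dtype=np.float32)
        self.weight = np.zeros_like(self.tsdf)
        self.size, self.voxel, self.trunc = size, voxel, trunc

    def integrate(self, depth, K, cam_pose):
        """Fuse one depth frame: project every voxel centre into the frame,
        compare its camera depth with the measured depth, truncate, and blend
        into the running average; repeated over frames this both denoises
        the reconstruction and fills holes."""
        idx = np.indices((self.size,) * 3).reshape(3, -1).T
        world = (idx + 0.5) * self.voxel               # volume origin at (0,0,0)
        T = np.linalg.inv(cam_pose)                    # world -> camera
        cam = world @ T[:3, :3].T + T[:3, 3]
        z = cam[:, 2]
        z_safe = np.where(z > 1e-6, z, 1.0)            # avoid divide-by-zero
        uvw = cam @ K.T
        u = np.round(uvw[:, 0] / z_safe).astype(int)
        v = np.round(uvw[:, 1] / z_safe).astype(int)
        h, w = depth.shape
        ok = (z > 1e-6) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        meas = np.zeros_like(z)
        meas[ok] = depth[v[ok], u[ok]]
        ok &= meas > 0                                 # drop missing depths
        sdf = np.clip((meas - z) / self.trunc, -1.0, 1.0)
        ft, fw = self.tsdf.ravel(), self.weight.ravel()
        ft[ok] = (ft[ok] * fw[ok] + sdf[ok]) / (fw[ok] + 1.0)
        fw[ok] += 1.0
```

The high-precision triangular mesh is then typically extracted from the zero level set of the fused volume, e.g. with marching cubes.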
As shown in fig. 5, the research idea of the joint observation constraint framework based on object-level features in the present invention specifically includes the following steps:
1) hardware system construction. First the hardware of the system is built, including camera angle setup, camera calibration, rotating-gimbal control, and the design of an efficient mapping scheme based on the cooperation of an ARM development board and an ordinary PC; three cameras acquire data simultaneously to obtain the RGBD image sequence.
2) Local map construction: a local three-dimensional model is generated from each single acquisition point, computed on the ARM platform;
this comprises data preprocessing, observation-model definition, and incremental optimization.
For local map building, an observation model matching the hardware configuration is defined according to the specific multi-camera setup. The scanning pose replaces the single-camera pose, and the observation model with the scanning pose as a variable is defined in detail; the partial derivatives of the relevant variables are derived from the observation data. In the optimization stage, the incremental optimization model SLAM++ is adopted to optimize the scanning pose and the spatial point coordinates, and the energy-equation constraint is defined as follows:
$$E = \sum_{k} w_1 \left\| e_{\mathrm{proj},k} \right\|^{2} + \sum_{j} w_2 \left\| e_{\mathrm{pose},j} \right\|^{2}$$

where $e_{\mathrm{proj}}$ is the pixel error of a global three-dimensional space point after transformation into the local coordinate system and projection onto the image; $e_{\mathrm{pose}}$ is the data-acquisition pose constraint; and $w_1$ and $w_2$ are the weights corresponding to the two;
3) global map construction: the global three-dimensional model is generated on the PC by combining multiple acquisition points;
this comprises similarity retrieval, wide-baseline matching, and camera pose optimization.
Similarity retrieval based on a visual model is established, and the GMS feature-matching method is adopted to solve wide-baseline matching between local maps. After the observations between local maps are obtained, g2o is adopted for optimization, yielding the accurate position and attitude information of each local map in the global coordinate system; a simplified sketch of this pose-graph refinement is given after this list.
4) Panoramic roaming system construction: an indoor panoramic roaming system is built from the position and attitude information of the local and global maps;
this comprises panoramic-image generation, positioning information, and construction of a web-based indoor roaming system.
The novelty of the invention lies in estimating the rotation matrix from texture image information, thereby separating the rotation matrix from depth noise; developing a new manifold-based plane parameterization method, solving the over-parameterization and local-minimum problems of traditional point-feature three-dimensional reconstruction algorithms; extending the available information to object-level features using deep neural network detection; and, with ORB-SLAM2 as the point tracking module, extending the traditional bundle adjustment model beyond minimizing only the reprojection error.
The present invention provides a method and a concept for reconstructing an indoor scene combined with body-level feature constraints, and there are many ways to implement the technical solution; the above is only a preferred embodiment of the invention. It should be noted that those skilled in the art can make several improvements and embellishments without departing from the principle of the invention, and these should also be regarded as within the protection scope of the invention. All components not specified in this embodiment can be realized by the prior art.

Claims (10)

1. An indoor scene reconstruction method combining body-level feature constraints, characterized by comprising the following steps:
step 1, estimating the rotation matrix from texture image information according to a camera pose decoupling estimation algorithm, thereby separating the rotation matrix from depth noise;
step 2, constructing an optimized plane parameterization method based on manifold space within a local subgraph fusion algorithm based on point and plane features;
step 3, extending the available information to object-level features using a deep neural network detection technique, according to a combined body-level feature constraint algorithm;
and step 4, with ORB-SLAM2 as the point tracking module, establishing a joint observation constraint framework containing point features, object bounding-rectangle information, and plane information, extending the bundle adjustment model beyond minimizing only the reprojection error, and integrating these elements into indoor scene construction combined with body-level feature constraints.
2. The method of claim 1, wherein the camera pose decoupling estimation algorithm in step 1 comprises the following steps:
step 1-1, establishing a multi-scale window Gaussian mixture uncertainty model;
step 1-2, establishing a new uncertainty model;
and step 1-3, establishing a pose decoupling estimation algorithm.
3. The method of claim 2, wherein the multi-scale window Gaussian mixture uncertainty model starts from a quantitative analysis of the depth-observation uncertainty distribution and, combining per-pixel uncertainty with the error-distribution characteristics of the depth sensor, uses a Gaussian mixture to incorporate the depth-observation uncertainties of different window neighborhoods into the multi-scale window Gaussian mixture uncertainty model.
4. The method of claim 3, wherein the pose decoupling estimation algorithm in step 1-3 comprises the following steps:
step 1-3-1, separating the estimation of the rotation matrix from the noise of the depth values using the visual feature-point information in the texture image, and estimating the rotation matrix on this basis;
step 1-3-2, estimating the translation vector: combining the observation information in the depth image, the optimization of the translation vector is converted into the corresponding energy function

$$E(t, \{P_i\}) = \sum_{i=1}^{m} \left\| \pi\left(K\,(R P_i + t)\right) - z_i \right\|_{\Sigma_{uv}}^{2}$$

where $R$ denotes the rotation matrix; the three-dimensional feature-point coordinates $P_i$ and the camera pose $t$ are the variables to be optimized; $z_i = (u_i, v_i)$ collects the pixel values of the feature point's projections on the left and right texture images; $m$ is the number of matched feature-point pairs between the two frames; $K$ is the camera intrinsic matrix; $u$ and $v$ denote the column and row numbers of the picture; $\pi(\cdot)$ is the perspective-division operator; and $\Sigma_{uv}$ is the covariance matrix of the $(u, v)$ observations.
5. The method for reconstructing an indoor scene combining body-level feature constraints according to claim 4, wherein the local subgraph fusion algorithm based on point and plane features in step 2 comprises the following steps:
step 2-1, optimizing camera pose accuracy;
and step 2-2, generating local subgraphs.
6. The method for reconstructing an indoor scene combining body-level feature constraints according to claim 5, characterized in that the fusion objective function for local-subgraph generation in step 2-2 is:

$$E = \sum_{i=1}^{N} \left\| \left(n_i^{l}\right)^{\top} \left( \left(R_l\, a_i^{l} + t_l\right) - \left(R_{l+1}\, a_i^{l+1} + t_{l+1}\right) \right) \right\|^{2}$$

where $R_l$ and $R_{l+1}$ are rotation matrices, $a_i^{l}$ and $a_i^{l+1}$ are the anchor points of the image blocks corresponding to the $i$-th group, $n_i^{l}$ and $n_i^{l+1}$ are the normal-vector sets of the $i$-th group on frames $l$ and $l+1$, and $N$ is the number of corresponding image blocks between the two frames; the energy function is a measure of the distance of the anchor points along the normal vectors of the corresponding image blocks;
based on the plane parameterization method, local subgraphs are generated using points, image blocks, and planes as sources of observation information; the global structure of the scene is gradually recovered through the process of local-subgraph fusion, with optimal fusion adjustment achieved by minimizing the fusion objective function; and the incremental association scheme achieves three-dimensional online reconstruction by introducing a plane data-association method.
7. The method of claim 6, wherein the combined body-level feature constraint algorithm in step 3 comprises the following steps:
step 3-1, local adjustment optimization;
step 3-2, multi-constraint joint optimization;
and step 3-3, denoising and fusing the point cloud with a point-cloud fusion method based on the truncated signed distance function, and extracting the three-dimensional model.
8. The method for reconstructing an indoor scene combining body-level feature constraints according to claim 7, wherein the local adjustment optimization in step 3-1 adopts two adjustment methods: the first, in the tracking thread, keeps the map points fixed and performs an adjustment that optimizes the camera pose variables; the second performs local-window adjustment when a new keyframe is added.
9. The method for reconstructing an indoor scene combining body-level feature constraints according to claim 8, wherein the multi-constraint joint optimization in step 3-2 adopts two types of constraints, plane features and object-level boundary features; the object bounding rectangles detected in the texture image are extracted as object-level boundary constraints, and the object-level feature information and plane feature constraints are simultaneously included in the joint optimization, with the optimization equation:

$$\{R^{*}, \pi^{*}\} = \arg\min_{R \in \mathrm{SO}(3),\ \pi} \sum_{k} \left\| z_k - h\!\left(R, X_k\right) \right\|_{\Sigma_k}^{2}$$

where the camera rotation matrix $R$ and the plane vector $\pi$ are the unknown variables; $R$ belongs to the special orthogonal group $\mathrm{SO}(3)$, part of a Lie group; $\Sigma_k$ is the corresponding covariance matrix, associated with the feature-point scale; $X_k$ is a map point; $z_k$ is the observation of $X_k$; and $h(\cdot)$ is the observation function.
10. The method for reconstructing an indoor scene combining body-level feature constraints according to claim 9, wherein the joint observation constraint framework based on object-level features in step 4 specifically includes:
step 4-1, system construction;
step 4-2, local map construction;
and step 4-3, global map construction;
in step 4-2, the local map construction adopts an observation model with the scanning pose as a variable; the incremental optimization model SLAM++ is adopted to optimize the scanning pose and the spatial point coordinates, and the energy-equation constraint is defined as follows:

$$E = \sum_{k} w_1 \left\| e_{\mathrm{proj},k} \right\|^{2} + \sum_{j} w_2 \left\| e_{\mathrm{pose},j} \right\|^{2}$$

where $e_{\mathrm{proj}}$ is the pixel error of a global three-dimensional space point after transformation into the local coordinate system and projection onto the image; $e_{\mathrm{pose}}$ is the data-acquisition pose constraint; and $w_1$ and $w_2$ are the weights corresponding to the two;
in step 4-3, the global map construction first adopts similarity retrieval based on a visual model, then adopts the GMS feature-matching method to solve wide-baseline matching between local maps, and finally optimizes with a general graph optimization method to obtain the accurate position and attitude information of each local map in the global coordinate system.
CN202210030559.3A 2022-01-12 2022-01-12 Indoor scene reconstruction method combining body-level feature constraints Pending CN114037804A

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210030559.3A CN114037804A Indoor scene reconstruction method combining body-level feature constraints

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210030559.3A CN114037804A Indoor scene reconstruction method combining body-level feature constraints

Publications (1)

Publication Number Publication Date
CN114037804A 2022-02-11

Family

ID=80141575

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210030559.3A Pending CN114037804A Indoor scene reconstruction method combining body-level feature constraints

Country Status (1)

Country Link
CN (1) CN114037804A


Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060191333A1 (en) * 2003-04-18 2006-08-31 Noe Stephen A Runoff rain gauge
CN108564616A (en) * 2018-03-15 2018-09-21 中国科学院自动化研究所 Method for reconstructing three-dimensional scene in the rooms RGB-D of fast robust

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WANG JUN: "Research on Indoor 3D Reconstruction Models and Methods Based on RGB-D Camera Data", China Doctoral Dissertations Full-text Database, Information Science and Technology *

Similar Documents

Publication Publication Date Title
CN112435325B (en) VI-SLAM and depth estimation network-based unmanned aerial vehicle scene density reconstruction method
US11748907B2 (en) Object pose estimation in visual data
CN106651938B (en) A kind of depth map Enhancement Method merging high-resolution colour picture
Liu et al. Continuous depth estimation for multi-view stereo
US7856125B2 (en) 3D face reconstruction from 2D images
CN118212141A (en) System and method for hybrid depth regularization
KR20180054487A (en) Method and device for processing dvs events
US20230419438A1 (en) Extraction of standardized images from a single-view or multi-view capture
CN111063021A (en) Method and device for establishing three-dimensional reconstruction model of space moving target
Yuan et al. SDV-LOAM: semi-direct visual–LiDAR Odometry and mapping
CN110517211B (en) Image fusion method based on gradient domain mapping
CN106791774A (en) Virtual visual point image generating method based on depth map
CN113538569A (en) Weak texture object pose estimation method and system
Ramirez et al. Open challenges in deep stereo: the booster dataset
Xu et al. Three dimentional reconstruction of large cultural heritage objects based on uav video and tls data
CN114782628A (en) Indoor real-time three-dimensional reconstruction method based on depth camera
CN112102504A (en) Three-dimensional scene and two-dimensional image mixing method based on mixed reality
Fu et al. Image stitching techniques applied to plane or 3-D models: a review
Jisen A study on target recognition algorithm based on 3D point cloud and feature fusion
CN114935316B (en) Standard depth image generation method based on optical tracking and monocular vision
Yagi et al. Diminished reality for privacy protection by hiding pedestrians in motion image sequences using structure from motion
CN113674407B (en) Three-dimensional terrain reconstruction method, device and storage medium based on binocular vision image
CN114037804A (en) Indoor scene reconstruction method combining body-level feature constraints
Jäger et al. A comparative Neural Radiance Field (NeRF) 3D analysis of camera poses from HoloLens trajectories and Structure from Motion
CN109089100B (en) Method for synthesizing binocular stereo video

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220211