Three-dimensional reconstruction device and method based on multi-view structure
Technical Field
The invention relates to the field of indoor three-dimensional reconstruction and multi-view structures, and in particular to a device and method for three-dimensional reconstruction of indoor scenes using a multi-view (multi-camera) structure.
Background
With the vigorous development of AI technology and the continuous emergence of new equipment, three-dimensional reconstruction has become a hot research topic in computer graphics. Its main task is to build three-dimensional models of the real physical world from data acquired by various sensors, using mathematical tools such as multi-view geometry, probability and statistics, and optimization theory, thereby establishing a bridge between the real world and the virtual world. Three-dimensional reconstruction therefore has wide application in many fields, including manufacturing, medical treatment, film and television production, cultural relic protection, augmented reality, virtual reality, and positioning and navigation. Indoor three-dimensional scene reconstruction is developing especially rapidly in augmented reality, with applications including indoor augmented reality games, robot navigation, and AR furniture and house viewing.
At present, three-dimensional scene reconstruction technology is developing particularly rapidly in augmented reality, especially in the field of indoor reconstruction. However, most conventional approaches rely on a single RGB-D camera; reconstructing an indoor three-dimensional scene with multiple RGB-D cameras through multi-panorama fusion is still novel. Modern scene understanding and navigation tasks require databases of well-recognized 3D scenes, mostly obtained by hand-held scanning or panoramic scanning. Hand-held scanning takes an RGB-D video stream as input and uses modern dense reconstruction systems or visual SLAM algorithms to track and integrate sequential frames. Panoramic scanning, on the other hand, organizes the scanning process as multiple in-situ rotations, building 3D panoramas at different viewpoints for progressive integration. Compared with hand-held scanning, which requires continuous attention to areas with sufficient geometric or photometric features for reliable tracking, panoramic scanning only needs to track an in-situ rotation, which is much easier, and it has become a practical alternative for industrial and commercial applications. Various techniques have been developed to construct 360-degree panoramas from panoramic scans; according to the input and output image types (i.e., whether depth information is included), they fall into three categories: 2D-to-2D, 2D-to-3D, and 3D-to-3D. While coarse depth information can be recovered with a 2D RGB camera for canonical stitching and VR/AR applications, the depth quality is generally not acceptable for high-definition 3D reconstruction. Current 3D-to-3D techniques based on a single RGB-D camera limit the freedom of the view direction while the sensor moves and therefore cannot cover most of the spherical panorama. This narrow field-of-view problem can be solved with multiple RGB-D cameras (e.g., arranged vertically for horizontal rotation), but this in turn introduces new camera calibration and synchronization problems.
In addition, traditional indoor three-dimensional reconstruction or panoramic scanning scans the indoor scene frame by frame with a hand-held single depth camera, which is very inefficient, and the difficulty of keeping the moving speed steady makes the scanning result noisy. This problem can be solved by introducing a dedicated hardware carrying device, but such a device introduces new sensors, and the exact camera pose must then be determined by solving a multi-sensor fusion problem to achieve accurate reconstruction.
Definitions of related nouns and symbols
1. Factor Graph
The factor graph is one of several probabilistic graphical models, the most common of which are Bayesian networks and Markov random fields. A common problem with probabilistic graphs is computing the marginal distribution of a given variable. Among the many solutions, one converts the Bayesian network or Markov random field into a factor graph and then solves it with the sum-product algorithm; on a factor graph, the sum-product algorithm can efficiently compute the marginal distribution of every variable. More precisely, a global function of multiple variables is factorized into a product of several local functions, and the bipartite graph obtained from this product is called a factor graph. The factor graph thus represents the factorization of a function and contains two kinds of nodes: variable nodes and function (factor) nodes. Once a global function is decomposed into a product of local functions, these local functions and their corresponding variables can be represented on the factor graph.
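As a minimal illustrative example (not from the original text): a global function over three variables might factor as below, after which the sum-product algorithm computes marginals by pushing sums inside the product of factors.

```latex
g(x_1, x_2, x_3) = f_1(x_1)\, f_2(x_1, x_2)\, f_3(x_2, x_3),
\qquad
p(x_2) \propto \sum_{x_1}\sum_{x_3} g(x_1, x_2, x_3)
             = \Big(\sum_{x_1} f_1(x_1)\, f_2(x_1, x_2)\Big)\Big(\sum_{x_3} f_3(x_2, x_3)\Big).
```

The corresponding factor graph has variable nodes x_1, x_2, x_3 and factor nodes f_1, f_2, f_3, with an edge wherever a factor depends on a variable.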
2. Hardware carrying device: the autonomously designed three-dimensional reconstruction device based on a multi-view structure described herein.
Disclosure of Invention
In order to combine 3D panorama construction with the three-dimensional reconstruction of large indoor scenes, the invention solves the problems of panoramic scanning and pose estimation among multiple cameras, so that the indoor three-dimensional scene can subsequently be reconstructed or panoramas generated. The invention therefore discloses a three-dimensional reconstruction device based on a multi-view structure, which obtains more stable and accurate input data by fixing the relative positions of the cameras and keeping the scanning speed constant. By fusing the information of multiple sensors (three depth cameras and a gear motor) and applying factor graph optimization, the invention provides a tightly coupled framework for high-precision, robust state estimation and reconstruction. The invention provides both a device and a method for three-dimensional reconstruction based on a multi-view structure.
In the method, three Azure Kinect depth cameras are mounted on an autonomously designed rotary scanning device: the rotary scanning mount fixes the three depth cameras at set angles, and closed-loop control of the motor makes the mounted cameras rotate in situ. The cameras therefore rotate about a common axis, which greatly simplifies camera pose computation, and because the cameras are mounted at fixed angles, the scene can be fully covered without requiring substantial overlap between their fields of view. Based on this key device, we propose a new way to track these cameras synchronously: the sensor pose degrees of freedom are reduced through corresponding regularization-term constraints. Because the device introduces a gear motor, the invention proposes a tightly coupled visual-wheel-odometer framework to achieve high-precision, robust state estimation and three-dimensional scene construction. The key idea is to perform noise-resistant three-dimensional reconstruction of large indoor scenes by simplifying the pose degrees of freedom among multiple cameras and by fusing the information of the depth cameras and the wheel odometer.
The primary problem of multi-camera panoramic scanning is recovering the relative poses of the RGB-D frames: most commercial depth sensors do not support shutter synchronization, and forcing frames into timestamp groups misaligns the final image because motion during the shutter interval is ignored. However, the recent Azure Kinect depth cameras used here support synchronization between devices at the hardware level, which fundamentally resolves this potential problem.
The invention realizes the following functions: first, the required data are acquired by the autonomously designed three-dimensional reconstruction device based on a multi-view structure; then, combined with the proposed visual-wheel-odometer tightly coupled framework, the three-dimensional reconstruction of a large indoor scene is completed through multi-panorama fusion. The reconstruction is divided into two steps: the first is the construction of single-point 3D panoramas, and the second is the fusion of multi-point panoramas; through these two steps the whole indoor scene is reconstructed in three dimensions.
The three-dimensional reconstruction device based on a multi-view structure comprises a depth camera carrying device, a rotating frame, a supporting platform, three depth cameras, a wheel odometer, a servo motor, and a pan-tilt head.
The depth camera carrying device is connected with the rotating frame; the rotating frame is connected with the pan-tilt head, whose rotation drives the depth camera carrying device to rotate stably. The pan-tilt head is mounted at the upper end of the supporting platform and is connected with the servo motor, which drives the pan-tilt head to rotate. The servo motor is mounted on the base of the supporting platform.
The wheel odometer is connected with the servo motor and provides real-time angle information of the three-dimensional reconstruction device.
The depth camera carrying device is 3D printed and divided into upper, middle, and lower parts, which respectively carry the three depth cameras at different angles.
the servo motor drives the rotating framework to rotate through the closed-loop control holder so as to stabilize the rotating speed of the depth camera carrying device.
The depth cameras are Azure Kinect depth cameras; the three cameras are spaced 120 degrees apart and together cover the whole field of view.
A three-dimensional reconstruction method based on a multi-view structure comprises the following steps:
Step 1: control the servo motor in closed loop using wheel-odometer feedback so that the three-dimensional reconstruction device rotates at a constant speed, and acquire real-time angle information;
Step 2: mount the depth cameras on the rotating device at fixed angles, and obtain the relative pose information between the cameras;
Step 3: acquire data images of the indoor scene with the multiple depth cameras, namely RGB maps and depth maps, and perform camera motion positioning, i.e. pose estimation, by fusing the real-time angle information with the inter-camera pose information;
Step 4: preprocess the RGB images of the local indoor scene acquired at a single positioning point, together with their corresponding depth images, to construct a single-point 3D panorama;
Step 5: acquire the data for, and construct, the multi-point 3D panoramas;
Step 6: fuse the multi-point 3D panoramas to obtain the fused panoramas;
Step 7: realize noise-resistant three-dimensional reconstruction of the indoor scene from the fused panoramas.
Step 2 specifically operates as follows:
First, the extrinsic parameters between the cameras are calibrated in advance according to the fixed positions of the depth cameras in the device. Second, because the device rotates about a fixed axis, the camera motion is reduced from the original six degrees of freedom to a single degree of freedom, the rotation angle (angular velocity) about the vertical axis; the other degrees of freedom are fixed by the hardware, and the sensor pose degrees of freedom are reduced through the corresponding regularization-term constraints.
The three-dimensional reconstruction device, with its fixedly mounted depth cameras and their hardware synchronization, solves the camera registration problem and simplifies camera pose computation; from the coaxiality of the camera motions, their states are derived jointly without relying on synchronicity or significant landmark co-occurrence. The regularization comprises four terms: a landmark observation factor, a pose regularization factor, a smoothness factor, and a wheel-odometer angle factor.
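As an illustration of this single-degree-of-freedom parameterization, the sketch below composes each camera's world pose from one shared azimuth angle and the pre-calibrated extrinsics. This is a minimal sketch, not the invention's implementation; the function names and the identity extrinsics are placeholders.

```python
import numpy as np

def rot_z(theta):
    """Rotation about the vertical (z) axis by angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def camera_pose(theta, T_axis_cam):
    """World pose (4x4) of a camera rigidly mounted on the rotating axis.

    theta:      the single free parameter, the azimuth of the rotator
    T_axis_cam: pre-calibrated 4x4 extrinsic from the axis frame to the camera
    """
    T_world_axis = np.eye(4)
    T_world_axis[:3, :3] = rot_z(theta)
    return T_world_axis @ T_axis_cam

# Example: three cameras mounted 120 degrees apart (extrinsics are placeholders).
extrinsics = [np.eye(4) for _ in range(3)]
poses = [camera_pose(np.deg2rad(30.0), T) for T in extrinsics]
```

Once the extrinsics are calibrated, estimating all three camera trajectories reduces to estimating the single angle theta per frame.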
Step 3 specifically operates as follows:
Data images of the indoor scene, namely RGB maps and depth maps, are acquired with the cameras of the three-dimensional reconstruction device. Combining the real-time angle information obtained in steps 1 and 2, the RGB-D images acquired by the depth cameras, and the extrinsics between the cameras, the camera pose is computed and the camera motion is estimated, i.e. pose estimation. The camera motion is jointly constrained and estimated by the landmark observation factor, the pose regularization factor, the smoothness factor, and the wheel-odometer angle factor; the camera pose is estimated accurately through these regularization constraints under a factor graph optimization framework.
Step 4 specifically operates as follows:
The single-point panorama is constructed by equirectangular projection: the original color and depth measurements are transformed into the equirectangular representation of the desired panorama so that sensor noise can be statistically modeled, and the initially acquired panorama is optimized by filtering or completion to preserve its geometric quality.
The specific method of step 5 is as follows:
Data images at different positioning points are acquired by moving the three-dimensional reconstruction device. During scanning, the device must perform in-situ rotation at several positioning points; the number and placement of the points are set according to the size and structure of the indoor scene.
Then, from the data images obtained at the different positioning points, the panorama corresponding to each point is constructed by the method of step 4.
The specific method of step 6 is as follows:
For the registration between two panoramas, a dense correspondence between their pixels is constructed, a geometric distance is established and iteratively minimized, and the relative transformation between the two panoramas is estimated with the ICP (Iterative Closest Point) algorithm; applying this pairwise registration across all panoramas finally achieves their fusion.
The invention has the following beneficial effects:
the method solves the problems of panoramic scanning and pose estimation among multiple cameras by designing the multi-view structure-based three-dimensional reconstruction device so as to reconstruct the subsequent indoor three-dimensional scene or generate a panoramic image. Moreover, the invention provides a visual-wheel speed meter tightly-coupled framework by fusing information of a plurality of sensors (three depth cameras and a speed reducing motor) and utilizing a factor graph optimization mode so as to realize high-precision and robust state estimation and reconstruction. The invention limits the part of motion consistency of the camera through the three-dimensional reconstruction device so as to restrain the degree of freedom of the camera, and adopts a factor graph optimization mode to more accurately estimate the motion of the camera through the proposed tight coupling frame, so that the reconstruction result is more accurate.
Through the construction and fusion of panoramas, the invention provides a flexible two-step three-dimensional reconstruction framework for indoor scenes. It combines the strengths of high-quality traditional SLAM algorithms with 3D panoramas, yields more accurate indoor scene reconstruction, and achieves higher-quality results. The designed device can also be mounted on an indoor mobile robot, opening new possibilities for the positioning and navigation of service robots in large indoor scenes.
Drawings
FIG. 1 is a flow chart of an embodiment of the present invention.
FIG. 2 is a schematic structural diagram of the three-dimensional reconstruction device (depth cameras not shown) according to an embodiment of the present invention.
FIG. 3 is a factor graph according to an embodiment of the present invention.
Detailed Description
The objects and effects of the present invention will become more apparent from the following detailed description taken in conjunction with the accompanying drawings.
In order to combine 3D panorama construction with the three-dimensional reconstruction of large indoor scenes, the invention solves the problems of panoramic scanning and pose estimation among multiple cameras, so that the indoor three-dimensional scene can subsequently be reconstructed or panoramas generated. To this end, the invention discloses a multi-camera panoramic scanning device that obtains more stable and accurate input data by fixing the relative positions of the cameras and keeping the scanning speed constant. By fusing the information of multiple sensors (three depth cameras and a gear motor) and applying factor graph optimization, the invention provides a tightly coupled framework for high-precision, robust state estimation and reconstruction. In summary, the present invention provides a three-dimensional reconstruction device and method based on a multi-view structure.
as shown in fig. 2, a three-dimensional reconstruction apparatus based on a multi-view structure includes a depth camera carrying apparatus 1, a rotating frame 2, a supporting platform 5, three depth cameras, a wheel speed meter, a servo motor 4, and a pan/tilt head 3.
The depth camera carrying device 1 is connected with the rotating frame 2; the rotating frame 2 is connected with the pan-tilt head 3, whose rotation drives the depth camera carrying device 1 to rotate stably. The pan-tilt head 3 is mounted at the upper end of the supporting platform 5 and is connected with the servo motor 4, which drives the pan-tilt head 3 to rotate. The servo motor 4 is mounted on the base of the supporting platform 5.
The wheel odometer is connected with the servo motor 4 and provides real-time angle information of the three-dimensional reconstruction device.
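As a hedged sketch of how such real-time angle information could be derived from encoder feedback on a gear motor (the tick resolution and gear ratio below are hypothetical placeholders, not values from this document):

```python
import math

def shaft_angle(ticks, ticks_per_rev=4096, gear_ratio=30.0):
    """Convert cumulative encoder ticks on the motor shaft to the
    output (pan-tilt) angle in radians, assuming a gear motor."""
    motor_rev = ticks / ticks_per_rev          # motor-shaft revolutions
    return (motor_rev / gear_ratio) * 2.0 * math.pi

# Example: 61440 ticks = 15 motor revolutions = 0.5 output revolution.
print(shaft_angle(61440))  # ~3.1416 rad
```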
The depth camera carrying device 1 is 3D printed and divided into upper, middle, and lower parts, which respectively carry the three depth cameras at different angles.
the servo motor 4 drives the rotating framework 2 to rotate through the closed-loop control holder 3 so as to stabilize the rotating speed of the depth camera carrying device 1.
The three depth cameras, spaced 120 degrees apart, cover the whole field of view and rotate in situ within the scene as the three-dimensional reconstruction device rotates.
As shown in FIG. 1, the three-dimensional reconstruction method based on a multi-view structure includes the following steps:
Step 1: control the servo motor 4 in closed loop using wheel-odometer feedback so that the three-dimensional reconstruction device rotates at a constant speed, and acquire real-time angle information.
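A minimal sketch of such a closed-loop constant-speed controller is given below, assuming a simple PID law; the gains, control period, target speed, and the hardware interface callbacks read_wheel_angle and set_motor_command are hypothetical, not specified in this document.

```python
import time

KP, KI, KD = 2.0, 0.5, 0.05   # placeholder PID gains (assumed)
DT = 0.01                      # control period in seconds (assumed)
TARGET_OMEGA = 0.2             # desired angular velocity, rad/s (assumed)

def control_loop(read_wheel_angle, set_motor_command, duration=10.0):
    """Drive the servo motor so the device rotates at a constant speed,
    using the wheel odometer's angle reading as feedback."""
    integral, prev_err = 0.0, 0.0
    prev_angle = read_wheel_angle()
    t_end = time.time() + duration
    while time.time() < t_end:
        time.sleep(DT)
        angle = read_wheel_angle()
        omega = (angle - prev_angle) / DT      # measured angular velocity
        prev_angle = angle
        err = TARGET_OMEGA - omega
        integral += err * DT
        derivative = (err - prev_err) / DT
        prev_err = err
        set_motor_command(KP * err + KI * integral + KD * derivative)
```

The same wheel-odometer angle used as feedback here also serves as the angle measurement fused in step 3.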
Step 2: mount the depth cameras on the rotating device at fixed angles, and obtain the relative pose information between the cameras.
firstly, external parameters between cameras are obtained by calibrating in advance according to the fixed position of the camera in the device. Secondly, the device is fixed and rotated around a shaft, so that the motion of the camera is changed from the original six degrees of freedom into a single degree of freedom, namely the angular velocity in the horizontal direction, other degrees of freedom are fixed through the constraint of hardware, and the pose degree of freedom of the sensor is simplified through the constraint of a corresponding regularization term.
The problem of non-registration of the cameras is solved and the complexity of camera pose calculation is simplified through a three-dimensional reconstruction device fixedly provided with a plurality of depth cameras and the hardware synchronism of the cameras; from the coaxiality of the camera motions, their states are jointly derived without relying on synchronicity or significant landmark co-occurrence. The regularization term comprises four terms: a landmark observation factor item (establishing a relationship between the frame posture and the mark point), a posture regularization factor item (adjusting the camera motion to be consistent with the horizontal rotation and estimating the posture of the rotating shaft), a smoothness factor item (limiting the angular velocity to be consistent between continuous frames so that the angular velocity is kept at a constant speed), and a wheel speed meter angle factor item (keeping the stable rotation of the whole device and adjusting the device motion to be consistent with the motor rotation and keeping the angle of the rotating shaft unchanged). Since all cameras and axes constitute a fixed object and move together during scanning, a unified physical model and external model can be used to describe their motion. By utilizing the characteristic of coaxial rotation, all the cameras and the shafts are changed into a mixture body and move together in the scanning process. Especially for in-situ rotation, once the external problems between the axis and the cameras are solved, the multiple poses of the camera, i.e. six axes, three-dimensional translation and three-dimensional rotation, which need to be considered originally can be represented by only one degree of freedom, i.e. the azimuth angle of the rotator.
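To make the four factor terms concrete, here is a hedged, simplified sketch that optimizes per-frame azimuth angles with a least-squares objective. It is a 1-DoF stand-in for the full factor graph, not the invention's implementation; the weights, synthetic measurements, and use of scipy are illustrative assumptions, and the pose regularization factor is implicit because the 1-DoF parameterization already restricts motion to horizontal rotation.

```python
import numpy as np
from scipy.optimize import least_squares

# Assumed toy measurements for T frames (placeholders, not real data):
T = 50
dt = 0.1
odo_angles = 0.2 * dt * np.arange(T)       # wheel-odometer angle readings (rad)
vis_rel = np.full(T - 1, 0.2 * dt)         # visual relative rotations per frame

W_LM, W_SMOOTH, W_ODO = 1.0, 10.0, 5.0     # factor weights (assumed)

def residuals(theta):
    # Landmark/visual observation term: relative rotation between frames
    r_lm = W_LM * ((theta[1:] - theta[:-1]) - vis_rel)
    # Smoothness term: change of angular velocity between consecutive frames
    r_smooth = W_SMOOTH * (np.diff(theta, 2) / dt)
    # Wheel-odometer angle term: stay consistent with the motor's rotation
    r_odo = W_ODO * (theta - odo_angles)
    return np.concatenate([r_lm, r_smooth, r_odo])

sol = least_squares(residuals, x0=np.zeros(T))
theta_hat = sol.x    # optimized azimuth angle per frame
```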
Step 3: acquire data images of the indoor scene with the multiple depth cameras, namely RGB maps and depth maps, and perform camera motion positioning, i.e. pose estimation, by fusing the real-time angle information with the inter-camera pose information.
acquiring data images of an indoor scene by a plurality of cameras of a three-dimensional reconstruction device: RGB map and depth map. And (3) calculating the pose of the camera by combining the real-time angle information obtained in the step (1) and the step (2), the RGB-D image acquired by the depth camera and the external parameters among the cameras, and performing camera motion positioning estimation on the camera motion, namely pose estimation. The camera motion is restrained and estimated through a landmark observation factor item, an attitude regularization factor item, a smoothness factor item and a wheel speed meter angle factor item, the camera motion pose is accurately estimated, the target is realized through regularization restraint under a factor graph optimization frame, and a factor graph is shown in figure 3.
Regarding the synchronous registration problem among multiple cameras, the hardware synchronization features of the Azure Kinect make it straightforward to trigger the shutters of different cameras synchronously. Regarding camera pose estimation, the camera pose is computed from the acquired RGB images, the depth images, and the angle information provided by the wheel odometer; the camera motion is estimated, and a final globally optimized value is obtained through the visual-wheel-odometer tightly coupled framework, so that the camera pose is estimated accurately.
Step 4: preprocess the RGB images of the local indoor scene acquired at a single positioning point, together with their corresponding depth images, to construct a single-point 3D panorama.
the construction of a single-point panorama is carried out through equal rectangular image projection, the original color and depth measurement values are transformed into an equal rectangular representation form of the required panorama, so that statistical modeling is carried out on sensor noise, and the initially acquired panorama is optimized through a filtering or completion method so as to keep the geometric quality of the panorama.
Original RGB-D pixels are uniformly re-projected to a target domain through equal rectangular image projection and adjacent relation of the original RGB-D pixels is kept, original color and depth measurement values are transformed into equal rectangular representation forms of a required panoramic image so as to carry out statistical modeling on sensor noise, and optimization (initially acquired panoramic images are optimized through a filtering or completion method, such as a GC filter) is carried out so as to keep geometric quality of the panoramic images. When processing in the panoramic image domain, instead of using a conventional data structure to generate a point cloud or patch mesh, an organized image is produced, which would be more favorable for statistics and optimization of the original depth measurements. Such a panorama can convey most of the valid measurements since almost all of the original image areas can be merged into the panorama with little occlusion, since each original frame pair to be integrated has little disparity to construct the panorama. For construction of panoramas, there are several alternative configurations, such as cube maps, stereographic projection images, and equirectangular image projections. Among them, equal rectangular image projection is the best method to uniformly re-project the original RGB-D pixels to the target domain and maintain their neighborhood.
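A minimal sketch of the equirectangular mapping for a single 3D point follows. This is the standard formulation offered for illustration; the image size and axis conventions are assumptions, not taken from the original.

```python
import numpy as np

def point_to_equirect(p, width=2048, height=1024):
    """Map a 3D point p = (x, y, z) in the panorama's frame to
    equirectangular pixel coordinates (u, v).

    Convention (assumed): z forward, x right, y down;
    longitude spans [-pi, pi), latitude spans [-pi/2, pi/2]."""
    x, y, z = p
    r = np.linalg.norm(p)
    lon = np.arctan2(x, z)            # azimuth around the vertical axis
    lat = np.arcsin(y / r)            # elevation
    u = (lon / (2.0 * np.pi) + 0.5) * width
    v = (lat / np.pi + 0.5) * height
    return u, v

# Example: a point straight ahead lands at the image center.
print(point_to_equirect(np.array([0.0, 0.0, 2.0])))  # (1024.0, 512.0)
```

Because neighboring rays map to neighboring pixels, depth values stored this way remain an organized image, which is what makes the statistical noise modeling described above convenient.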
Step 5: acquire the data for, and construct, the multi-point 3D panoramas.
and acquiring data images of different point positions by controlling the movement of the three-dimensional reconstruction device. During scanning, the three-dimensional reconstruction device is required to be controlled at a plurality of point positions to realize in-situ rotation, and the number and the positions of the positioning points are set according to the size and the structure of an indoor scene.
And then constructing a panoramic image corresponding to each positioning point by the method in the step (4) according to the obtained data images of different positioning points.
In order to realize the three-dimensional reconstruction of the whole large indoor scene, the data acquisition of one positioning point is difficult to cover the whole indoor scene, and the data acquisition of at least 2-3 positioning points is required to cover the whole indoor scene. In this case, the data images of different points can be obtained by controlling the movement of the scanning device. During scanning, the scanning device is required to control the depth camera device to realize in-situ rotation at a plurality of point positions, and the number and the positions of the positioning points are set according to the size and the structure of an indoor scene.
And then constructing a panoramic image corresponding to each positioning point by the method in the step (4) according to the obtained data images of different positioning points.
Step 6: fuse the multi-point 3D panoramas to obtain the fused panoramas.
and (4) fusion of the panoramas, namely consistent alignment and registration stitching between the multiple panoramas. And for the registration between the two panoramas, constructing a dense corresponding relation between pixels of the two panoramas, establishing and minimizing a geometric distance in an iterative mode, estimating relative transformation between the two panoramas by adopting an ICP (inductively coupled plasma) algorithm, and finally realizing the fusion of the plurality of panoramas.
Step 7: realize noise-resistant three-dimensional reconstruction of the indoor scene from the fused panoramas.