CN110148086B - Depth filling method and device for sparse depth map and three-dimensional reconstruction method and device

Info

Publication number: CN110148086B
Application number: CN201910353712.4A
Authority: CN (China)
Other versions: CN110148086A (Chinese)
Prior art keywords: depth map, depth, sparse, dense, information
Inventors: 陈崇雨, 卫彦智, 赵东, 林倞
Assignee: DMAI Guangzhou Co Ltd
Legal status: Active (granted)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00: Three-dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T3/00: Geometric image transformation in the plane of the image
    • G06T3/40: Scaling the whole image or part thereof
    • G06T3/4007: Interpolation-based scaling, e.g. bilinear interpolation

Abstract

The invention discloses a depth filling method and device for a sparse depth map, and a three-dimensional reconstruction method and device. The depth filling method for a sparse depth map comprises the following steps: processing the sparse depth map with at least two interpolation algorithms respectively to obtain corresponding processed sparse depth maps; respectively obtaining the depth value of each feature point of the sparse depth map in each processed sparse depth map; retaining the feature points whose depth-value difference is smaller than a predetermined threshold; and obtaining a dense depth map according to the retained feature points. The method provides a simple way to obtain a dense depth map by performing depth filling on a sparse depth map, and an accurate data basis for three-dimensional reconstruction using the dense depth map.

Description

Depth filling method and device for sparse depth map, and three-dimensional reconstruction method and device
Technical Field
The invention relates to the technical field of computer vision, and in particular to a depth filling method and device for a sparse depth map, and a three-dimensional reconstruction method and device.
Background
Three-dimensional reconstruction is a technology for establishing, for a three-dimensional object, a mathematical model suitable for computer processing. It is the basis for processing and analyzing three-dimensional objects in a computer environment, and a key technology for building virtual reality that expresses the objective world in a computer. Three-dimensional reconstruction refers to the mathematical process and computer technique of recovering the three-dimensional information (shape, etc.) of an object from its two-dimensional projections, and comprises steps such as data acquisition, preprocessing, point cloud registration (fusion), and feature analysis.
At present, one conventional three-dimensional reconstruction method is to capture images of a target scene at different positions with an image acquisition device to obtain sparse depth maps of the target scene, perform depth filling on the sparse depth maps to obtain dense depth maps, and construct a three-dimensional model of the target scene from the dense depth maps. The depth filling method applied to the sparse depth maps directly determines how well the dense depth maps match the real scene images, and thus directly influences the final reconstruction quality of the three-dimensional model.
Disclosure of Invention
In view of this, embodiments of the present invention provide a depth filling method and device for a sparse depth map, and a three-dimensional reconstruction method and device, so as to perform depth filling on a sparse depth map and obtain a dense depth map.
According to a first aspect, an embodiment of the present invention provides a depth filling method for a sparse depth map, including: processing the sparse depth map with at least two interpolation algorithms respectively to obtain corresponding processed sparse depth maps; respectively obtaining the depth value of each feature point of the sparse depth map in each processed sparse depth map; retaining the feature points whose depth-value difference is smaller than a predetermined threshold; and obtaining a dense depth map according to the retained feature points.
Optionally, the difference in the depth values comprises: a difference in depth values and/or a squared difference in depth values.
Optionally, the obtaining a dense depth map according to the retained feature points includes: forming a complete depth map from the retained feature points; and acquiring an original image corresponding to the sparse depth map, and performing depth filling on the complete depth map according to the original image to obtain the dense depth map.
Optionally, the acquiring an original image corresponding to the sparse depth map and performing depth filling on the complete depth map according to the original image to obtain a dense depth map includes: processing the original image according to the resolution of the sparse depth map; filtering the complete depth map according to the processed original image; and performing iterative filtering on the filtered complete depth map with a preset filtering algorithm according to the processed original image and the filtered complete depth map to obtain the dense depth map.
Optionally, the depth filling method for a sparse depth map further includes: judging whether the resolution of the dense depth map is lower than that of the original image; and when the resolution of the dense depth map is lower than that of the original image, updating the sparse depth map to the dense depth map, and returning to the step of processing the sparse depth map with at least two interpolation algorithms respectively to obtain corresponding processed sparse depth maps, until the resolution of the dense depth map is the same as that of the original image.
According to a second aspect, an embodiment of the present invention further provides a three-dimensional reconstruction method, including: acquiring at least one group of equivalent binocular image information containing a scene target; establishing a corresponding sparse depth map according to each group of equivalent binocular image information; performing depth filling on each sparse depth map by using the above depth filling method for a sparse depth map to obtain each dense depth map; and performing depth fusion on each dense depth map according to the equivalent binocular image information to obtain a three-dimensional model of the scene target.
Optionally, each set of equivalent binocular image information includes a first image and a corresponding first pose acquired at a first preset position, and a second image and a corresponding second pose acquired at a second preset position; the method for acquiring each group of equivalent binocular image information includes: obtaining an equivalent baseline of the current group according to the first preset position and the second preset position of the scene target; determining the first pose and the second pose of the scene target according to the equivalent baseline; controlling an image acquisition device to acquire the first image at the first preset position according to the first pose; and controlling the image acquisition device to acquire the second image at the second preset position according to the second pose, wherein the first image and the second image form a group of equivalent binocular images.
Optionally, the establishing of the corresponding sparse depth map according to each set of equivalent binocular image information includes: performing feature matching on the first image and the second image in each group of equivalent binocular image information to obtain a plurality of groups of feature point information; respectively calculating parallax information of the feature points according to the feature point information; obtaining depth information of each feature point according to the parallax information and the equivalent base line; and establishing the corresponding sparse depth map according to the depth information of each feature point.
Optionally, after obtaining the depth information of each feature point according to the disparity information and the equivalent baseline, before establishing the corresponding sparse depth map according to the depth information of each feature point, the three-dimensional reconstruction method further includes: obtaining a plurality of characteristic point areas according to a preset coordinate range and the depth information of the characteristic points of the current group; respectively calculating the average depth information of each feature point contained in the feature point region; and each feature point region corresponds to one feature point, and the average depth information is determined as the depth information of the feature point corresponding to the feature point region.
According to a third aspect, an embodiment of the present invention further provides a depth filling apparatus for a sparse depth map, including: a first processing module, configured to process the sparse depth map with at least two interpolation algorithms respectively to obtain corresponding processed sparse depth maps; a second processing module, configured to respectively obtain the depth value of each feature point in each processed sparse depth map; a third processing module, configured to retain the feature points whose depth-value difference is smaller than a preset threshold; and a fourth processing module, configured to obtain a dense depth map according to the retained feature points.
According to a fourth aspect, an embodiment of the present invention further provides a three-dimensional reconstruction apparatus, including: a fifth processing module, configured to acquire at least one group of equivalent binocular image information containing a scene target; a sixth processing module, configured to establish a corresponding sparse depth map according to each group of equivalent binocular image information; a seventh processing module, configured to perform depth filling on each sparse depth map by using the depth filling apparatus for a sparse depth map according to the third aspect to obtain each dense depth map; and an eighth processing module, configured to perform depth fusion on each dense depth map according to the equivalent binocular image information to obtain a three-dimensional model of the scene target.
According to a fifth aspect, an embodiment of the present invention provides an electronic device, comprising: a memory and a processor communicatively connected to each other, the memory storing computer instructions, and the processor executing the computer instructions to perform the depth filling method for a sparse depth map described in the first aspect or any one of its optional embodiments, or to perform the three-dimensional reconstruction method described in the second aspect or any one of its optional embodiments.
According to a sixth aspect, an embodiment of the present invention provides a computer-readable storage medium storing computer instructions for causing a computer to perform the depth filling method for a sparse depth map described in the first aspect or any one of its optional implementations, or to perform the three-dimensional reconstruction method described in the second aspect or any one of its optional implementations.
The technical scheme of the invention has the following beneficial effects:
according to the depth filling method for a sparse depth map provided by the embodiment of the invention, the sparse depth map is processed with different interpolation algorithms, the depth value of each feature point in each processed sparse depth map is obtained, the feature points whose depth-value difference is smaller than a preset threshold are retained, and a dense depth map is obtained from the retained feature points. A simple method for obtaining a dense depth map by performing depth filling on a sparse depth map is thus provided, together with an accurate data basis for three-dimensional reconstruction using the dense depth map.
The three-dimensional reconstruction method provided by the embodiment of the invention acquires at least one group of equivalent binocular image information containing a scene target and establishes a corresponding sparse depth map from each group. Each sparse depth map is processed with different interpolation algorithms, the depth value of each feature point in each processed sparse depth map is obtained, the feature points whose depth-value difference is smaller than a preset threshold are retained, and a dense depth map is obtained from the retained feature points; finally, the dense depth maps are depth-fused to obtain a three-dimensional model of the scene target. A simple three-dimensional reconstruction method for a scene target is thus provided; it can be realized without relying on complex image acquisition equipment or computing equipment, and is simple and efficient.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a flow chart of a depth filling method for a sparse depth map according to an embodiment of the present invention;
FIG. 2 is a flow chart of a three-dimensional reconstruction method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an equivalent baseline and a shooting direction according to an embodiment of the present invention;
FIG. 4 is another schematic diagram of an equivalent baseline and a shooting direction according to an embodiment of the present invention;
FIG. 5 is another schematic diagram of an equivalent baseline and a shooting direction according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a depth filling device for a sparse depth map according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a three-dimensional reconstruction apparatus according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without inventive step based on the embodiments of the present invention, are within the scope of protection of the present invention.
The technical features mentioned in the different embodiments of the invention described below can be combined with each other as long as they do not conflict with each other.
The embodiment of the invention provides a depth filling method for a sparse depth map. As shown in FIG. 1, the method comprises the following steps:
step S1: and processing the sparse depth map by adopting at least two interpolation algorithms respectively to obtain a corresponding processed sparse depth map. Specifically, the sparse depth map is a sparse depth map corresponding to a scene target image, and the difference algorithm adopted includes: nearest neighbor interpolation algorithms, bilinear interpolation algorithms, cubic convolution interpolation algorithms, and the like.
Step S2: respectively obtain the depth value of each feature point in each processed sparse depth map. Specifically, after the sparse depth map is up-sampled by different interpolation algorithms, the depth value of a given feature point may differ among the processed sparse depth maps. A large difference among the maps indicates that the depth value of the feature point deviates substantially from the true depth of the point it represents, i.e., the depth value is unreliable; to prevent inaccurate feature-point depth values from influencing the final result, such feature points are deleted.
Step S3: retain the feature points whose depth-value difference is smaller than a predetermined threshold. Specifically, only the feature points whose depth-value difference is smaller than the predetermined threshold are retained, and a dense depth map composed of these feature points maintains a high degree of consistency with the depth information of the original image.
Step S4: obtain a dense depth map according to the retained feature points. In practical application, the original image is used to perform depth filling on the depth map formed by the retained feature points, so as to obtain the dense depth map.
Through the above steps S1 to S4, the depth filling method for a sparse depth map provided in the embodiment of the present invention processes the sparse depth map with different interpolation algorithms, obtains the depth value of each feature point in each processed sparse depth map, retains the feature points whose depth-value difference is smaller than a preset threshold, and obtains a dense depth map from the retained feature points. The method is simple and fast, yields a dense depth map that accurately reflects the pixel depths of the original image, and provides an accurate data basis for three-dimensional reconstruction using the dense depth map.
Specifically, in an embodiment, in the step S1, the sparse depth map is processed with at least two interpolation algorithms respectively to obtain corresponding processed sparse depth maps. In practical applications, for example, given a sparse depth map D with a resolution of H × W and only N valid depth values, the sparse depth map D may be up-sampled by nearest-neighbor interpolation, bilinear interpolation, and cubic convolution interpolation to obtain the processed depth maps D1, D2, and D3, respectively.
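The following is a minimal Python sketch of this step using OpenCV. It assumes the sparse depth map is stored as a dense float32 array with zeros at pixels carrying no measurement; the function name and the scale factor are illustrative, not taken from the patent.

```python
import cv2
import numpy as np

def upsample_three_ways(sparse_depth: np.ndarray, scale: int = 2):
    """Up-sample a sparse depth map with three interpolation algorithms.

    sparse_depth: H x W float32 array; pixels without a measurement are
    assumed to hold 0 (a storage convention of this sketch, not the patent).
    Returns the processed depth maps D1, D2, D3.
    """
    h, w = sparse_depth.shape
    size = (w * scale, h * scale)  # cv2.resize takes (width, height)
    d1 = cv2.resize(sparse_depth, size, interpolation=cv2.INTER_NEAREST)
    d2 = cv2.resize(sparse_depth, size, interpolation=cv2.INTER_LINEAR)
    d3 = cv2.resize(sparse_depth, size, interpolation=cv2.INTER_CUBIC)
    return d1, d2, d3
```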
Specifically, in an embodiment, in the step S2, the depth value of each feature point in each processed sparse depth map is obtained. In practical application, the depth values of the N feature points in D1, D2, and D3 are obtained.
Specifically, in an embodiment, in the step S3, the feature points whose depth-value difference is smaller than the predetermined threshold are retained. In practical applications, the depth values of the same feature point in D1, D2, and D3 are compared numerically; for example, the difference and/or the squared difference between the depth values obtained by any two interpolation algorithms is calculated to represent the depth-value difference of the feature point. If this difference is smaller than the predetermined threshold, the feature point is considered a stable feature point that is consistent with the depth information of the original image, and it is retained.
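A sketch of this consistency test, continuing the maps D1, D2, D3 from above; the squared difference is used as the comparison measure, and the threshold value is an illustrative assumption:

```python
def keep_consistent_points(points, d1, d2, d3, threshold=0.05):
    """Keep feature points whose depths agree across the three maps.

    points: iterable of (row, col) feature-point coordinates in the
    up-sampled maps; threshold is a hypothetical value in depth units.
    """
    kept = []
    for r, c in points:
        depths = (d1[r, c], d2[r, c], d3[r, c])
        # squared difference between the depths from any two interpolation
        # algorithms, as the embodiment describes
        diffs = [(a - b) ** 2 for i, a in enumerate(depths)
                 for b in depths[i + 1:]]
        if max(diffs) < threshold:
            kept.append((r, c))
    return kept
```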
In an optional embodiment, the step S4 of obtaining a dense depth map according to the retained feature points specifically includes the following steps:
step S41: each remaining feature point constitutes a complete depth map. And (4) forming a complete depth map D (depth map) by using all the feature points reserved in the step (S3) according to the depth information of each feature point.
Step S42: acquire an original image corresponding to the sparse depth map, and perform depth filling on the complete depth map according to the original image to obtain a dense depth map. Specifically, step S42 includes the following steps:
step S421: and processing the original image according to the resolution of the sparse depth map. In practical applications, the resolution of the sparse depth map may be much lower than that of the original image, and therefore, a downsampling operation needs to be performed on the original image to obtain the original image with the same resolution as that of the sparse image.
Step S422: filter the complete depth map according to the processed original image. In practical application, the original image at the same resolution is used as a guide image to filter the complete depth map D; for example, joint bilateral filtering or guided filtering can be used to guide the depth map filling and obtain an image D'.
Step S423: perform iterative filtering on the filtered complete depth map with a preset filtering algorithm according to the processed original image and the filtered complete depth map, to obtain a dense depth map. In practical application, the original image at the same resolution and D' are used as guide images, and D' is iteratively filtered with joint trilateral filtering to obtain a smoother dense depth map.
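As an illustration of steps S421 to S423, the sketch below assumes opencv-contrib (the cv2.ximgproc module) is available and that the depth map is float32. The patent names joint bilateral or guided filtering for step S422 and joint trilateral filtering for step S423; OpenCV ships no trilateral filter, so this sketch reuses the joint bilateral filter for the iterative pass, and all filter parameters are illustrative.

```python
import cv2

def refine_with_guidance(complete_depth, original_image, iterations=3):
    """Guide-filter the complete depth map D using the original image.

    A sketch, not the patent's exact algorithm: the iterative pass
    substitutes joint bilateral filtering for joint trilateral filtering.
    """
    h, w = complete_depth.shape[:2]
    # step S421: bring the original image to the resolution of the depth map
    guide = cv2.resize(original_image, (w, h))
    # step S422: filter the complete depth map D guided by the image
    d_prime = cv2.ximgproc.jointBilateralFilter(
        guide, complete_depth, d=9, sigmaColor=25, sigmaSpace=7)
    # step S423: iterative guided refinement of D'
    for _ in range(iterations):
        d_prime = cv2.ximgproc.jointBilateralFilter(
            guide, d_prime, d=9, sigmaColor=25, sigmaSpace=7)
    return d_prime
```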
Step S424: it is determined whether the resolution of the dense depth map is lower than the resolution of the original image. In practical applications, in order to obtain a dense depth map that restores the original image as much as possible, it is necessary to obtain a dense depth map with the same resolution as that of the original image, and therefore it is necessary to determine whether the dense depth map obtained at present satisfies the resolution of the original image, and if not, step S425 is performed.
Step S425: when the resolution of the dense depth map is lower than that of the original image, update the sparse depth map to the dense depth map, and return to the step of processing the sparse depth map with at least two interpolation algorithms respectively to obtain corresponding processed sparse depth maps, until the resolution of the dense depth map is the same as that of the original image. In practical application, if the resolution of the current dense depth map is lower than that of the original image, the current dense depth map is used as a new sparse depth map and step S1 is executed again, until the resolution of the obtained dense depth map equals that of the original image. This guarantees the accuracy of the dense depth map and improves the accuracy of the subsequent three-dimensional reconstruction from the dense depth map.
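Structurally, steps S424 and S425 wrap the whole pipeline in a coarse-to-fine loop, as in the following sketch; `complete_step` and `refine_step` are placeholders for the routines sketched above (steps S1 to S41 and step S42, respectively), passed in as callables so the sketch stays self-contained:

```python
def multiscale_depth_fill(sparse_depth, original_image,
                          complete_step, refine_step):
    """Coarse-to-fine loop of the embodiment (steps S424-S425), a sketch.

    complete_step: callable mapping a sparse depth map to a complete
    depth map (steps S1-S41); refine_step: callable implementing the
    guided refinement of step S42. Both are placeholder names.
    """
    dense = sparse_depth
    target_h, target_w = original_image.shape[:2]
    while dense.shape[0] < target_h or dense.shape[1] < target_w:
        # step S425: treat the current dense map as the new sparse input
        complete = complete_step(dense)
        dense = refine_step(complete, original_image)
    return dense
```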
An embodiment of the present invention further provides a three-dimensional reconstruction method, as shown in fig. 2, the three-dimensional reconstruction method includes:
step S101: at least one set of equivalent binocular image information including a scene target is acquired. In practical application, each set of equivalent binocular image information including scene objects can be obtained by shooting images at two known positions through a camera.
Step S102: and establishing a corresponding sparse depth map according to each group of equivalent binocular image information. Specifically, the sparse depth map is obtained by projecting feature points containing depth information obtained by binocular matching in equivalent binocular image information to each adjacent shooting view angle.
Step S103: by adopting the depth filling method of the sparse depth maps in the embodiment, the depth filling is performed on each sparse depth map to obtain each dense depth map. Specifically, for the process of performing depth filling on the sparse depth map, reference is made to the related description of the depth filling method embodiment of the sparse depth map, and details are not repeated here.
Step S104: and carrying out depth fusion on each dense depth map according to the equivalent binocular image information to obtain a three-dimensional model of the scene target. In practical application, the complete three-dimensional scene related to the scene target can be obtained by converting each dense depth map into a three-dimensional model respectively and then putting each three-dimensional model into the same coordinate system for fusion.
Through the above steps S101 to S104, the three-dimensional reconstruction method provided in the embodiment of the present invention acquires at least one group of equivalent binocular image information containing the scene target, establishes a corresponding sparse depth map from each group, processes each sparse depth map with different interpolation algorithms, obtains the depth value of each feature point in each processed sparse depth map, retains the feature points whose depth-value difference is smaller than a preset threshold, obtains a dense depth map from the retained feature points, and finally performs depth fusion on the dense depth maps to obtain the three-dimensional model of the scene target. A simple three-dimensional reconstruction method for a scene target is thus provided; it can be realized without relying on complex image acquisition equipment or computing equipment, and is simple and efficient.
Specifically, in an embodiment, in the step S101, at least one set of equivalent binocular image information containing a scene target is acquired. In practical application, each set of equivalent binocular image information comprises a first image and a corresponding first pose acquired at a first preset position, and a second image and a corresponding second pose acquired at a second preset position. The method for acquiring each group of equivalent binocular image information specifically comprises the following steps:
step S201: and obtaining the equivalent base line of the current group according to the first preset position and the second preset position of the scene target. In practical applications, the first preset position and the second preset position are respectively represented by a and B, and the equivalent baseline is shown in fig. 3, and the specific process is as follows: the two given positions are respectively a point A and a point B, and specifically, the position information of the point A and the point B can be obtained through calculation according to the size of a camera body and a steering engine control command. The camera can move from the point A to the point B (or from the point B to the point A) in any moving track, the equivalent base line is a connecting line of the point A and the point B, and the length of the equivalent base line can be calculated by the positions of the point A and the point B. The shooting directions of the cameras at the two points A and B are vertical to the equivalent base line, and the shooting directions at the two points A and B are parallel to each other, so that an equivalent binocular system is constructed, and pictures obtained by the cameras at the two points A and B form a group of equivalent binocular images.
Step S202: determine the first pose and the second pose of the scene target according to the equivalent baseline. In practical application, the poses of the camera at positions A and B can be obtained from the camera body size and the servo control commands, but absolute stillness of the camera cannot be guaranteed during shooting, so the camera poses also need to be estimated from the obtained feature point information. Feature point extraction and feature matching are performed on the captured images, from which the disparity and depth values at the feature points can be calculated; this information is then fused with the camera body size and the servo control commands to obtain more accurate camera poses at the moments the equivalent binocular images were shot, namely the first pose at point A and the second pose at point B.
Step S203: control the image acquisition device to acquire the first image at the first preset position according to the first pose.
Step S204: control the image acquisition device to acquire the second image at the second preset position according to the second pose; the first image and the second image form a group of equivalent binocular images. In practical applications, the camera may be any camera capable of capturing images; the lens focal length, the photosensitive unit, and the lens type are not limited, and may be, for example, those of a standard camera, a fisheye camera, or another lens. Taking a fisheye camera as an example, the fisheye camera can be calibrated with a camera calibration tool and the images it captures corrected before being used for acquisition; alternatively, an equivalent fisheye binocular system can be constructed, feature point extraction and matching performed with a fisheye binocular matching algorithm, and the disparity and depth values calculated.
In practical application, as shown in FIG. 4, taking three positions as an example, the specific process of the method for acquiring each set of equivalent binocular image information is as follows: given three positions A, B, and C, one shooting direction is designated for positions A and C and two shooting directions for position B, and the equivalent binocular system constructed as described above is used to acquire the three-dimensional coordinates of the feature points and the camera poses at A, B, and C. Two groups of three-dimensional feature points, together with the camera poses at A, B, and C, are thus obtained. It should be noted that the actual moving track of the camera need not be a straight line and may be any track, as shown by the solid arrows.
As an alternative embodiment, as shown in FIG. 5, taking positions that form a circular trajectory as an example, the specific process of the method for acquiring each set of equivalent binocular image information is as follows: the camera passes through positions A, B, C, ..., H in sequence during its movement and then returns to position A, and at each position an equivalent binocular image is shot in the direction perpendicular to the equivalent baseline, thereby obtaining a group of camera poses and, at each pose, some feature points with real depth information obtained by binocular matching. It should be noted that the actual moving track of the camera need not be a straight line and may be any track, as shown by the solid arrows. Because the positions form a closed loop, a pose graph can be established and the poses optimized by graph optimization or similar methods, reducing the accumulated error and yielding a more accurate set of poses.
Specifically, in an embodiment, the step S102 of establishing a corresponding sparse depth map according to each set of equivalent binocular image information specifically includes the following steps:
step S301: and performing feature matching on the first image and the second image in each group of equivalent binocular image information to obtain a plurality of groups of feature point information. In practical application, the two images shot at the position of the point A and the point B are subjected to binocular feature matching to obtain the information of the feature points of each group of equivalent binocular images.
Step S302: and respectively calculating the parallax information of the characteristic points according to the characteristic point information. Specifically, the step may be calculated by using a method for calculating a parallax in the prior art, and details are not repeated herein.
Step S303: and obtaining the depth information of each characteristic point according to the parallax information and the equivalent base line. Specifically, the depth information of the feature point may be calculated according to the disparity information of each feature point and the equivalent baseline constructed by the two points a and B, and the calculation process of the specific depth value refers to the calculation process of the prior art, which is not described herein again.
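Although the embodiment defers this computation to the prior art, the standard pinhole stereo relation it relies on is Z = f · B / d, with focal length f in pixels, baseline length B, and disparity d in pixels; a one-line sketch:

```python
def depth_from_disparity(disparity_px: float, focal_px: float,
                         baseline_m: float) -> float:
    """Standard stereo depth from disparity: Z = f * B / d.

    A sketch of the usual pinhole relation (prior art), with the focal
    length in pixels, the baseline in metres and the disparity in pixels.
    """
    return focal_px * baseline_m / disparity_px
```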
Step S304: and obtaining a plurality of characteristic point areas according to the preset coordinate range and the depth information of the current group of characteristic points.
Step S305: the average depth information of each feature point included in the feature point region is calculated.
Step S306: and each feature point region corresponds to one feature point, and the average depth information is determined as the depth information of the feature point corresponding to the feature point region.
Step S307: establish the corresponding sparse depth map according to the depth information of each feature point. Each group of feature points is projected onto the adjacent shooting viewing angles to obtain multiple sparse depth maps at different viewing angles.
In practical application, in order to improve the accuracy of the sparse depth maps, time-domain stabilization needs to be performed on the obtained groups of feature points to obtain the feature points that best represent the depth information of the original image. The specific process is as follows: the feature points of each group of images are projected to the adjacent shooting poses (viewing angles), and the two groups of feature points at adjacent viewing angles often intersect. Because of errors in the depth measurements, the feature points in these intersections appear at different locations even though they are actually the same physical point. We therefore propose to perform time-domain stabilization on neighboring points in three-dimensional space. Concretely, a small preset region is defined; the feature points falling in this region are, from the perspectives of adjacent sparse depth maps, observations of the same physical point whose positions deviate under different viewpoints, and the feature point region is the projection region of these observations (actually a single point in physical space). The points in the region are then processed to find the most representative point: the feature points within the small region are regarded as the same feature point, and the average of their three-dimensional coordinates is taken as the estimated value of that feature point. For example, following the feature-point stabilization method in "Embedding temporal relationship Recovery for Real-time" (IROS 2018), a small region is defined around neighboring points, and the coordinates of the feature points inside it are arranged into a matrix in which each column represents the coordinates of one point and each row holds the values of the different feature points on the same coordinate axis (for example, the first row holds the x coordinates, the second row the y coordinates, and the third row the z coordinates). Low-rank sparse decomposition (LRSD) is applied to this matrix to screen out the feature points with larger errors, and the three-dimensional coordinates of the feature points with smaller errors are averaged to obtain the estimated value of the feature point. By performing the above processing on all the feature points in the intersections, a set of feature points with stable three-dimensional coordinates is obtained.
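A sketch of the stabilization of a single feature-point region. The embodiment screens outliers with low-rank sparse decomposition (LRSD); since a full LRSD implementation is beyond a short example, this sketch substitutes a simple median-distance test before averaging the inlier coordinates, and the deviation threshold is an illustrative value:

```python
import numpy as np

def stabilize_region(points_3d: np.ndarray, max_dev: float = 0.02):
    """Temporal stabilization of one feature-point region, a sketch.

    points_3d: 3 x K matrix, one column per observation of the same
    physical point seen from adjacent sparse depth map viewpoints.
    A median-distance test stands in for the LRSD screening here.
    """
    median = np.median(points_3d, axis=1, keepdims=True)
    dist = np.linalg.norm(points_3d - median, axis=0)
    inliers = points_3d[:, dist < max_dev]
    if inliers.size == 0:   # every observation rejected: fall back to all
        inliers = points_3d
    return inliers.mean(axis=1)  # estimated coordinate of the feature point
```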
Specifically, in an embodiment, in the step S103, the depth filling method of the sparse depth maps in the embodiment is adopted to perform depth filling on each sparse depth map, so as to obtain each dense depth map. Specifically, the process of performing depth filling on the sparse depth map refers to the related description of the depth filling method embodiment of the sparse depth map, and is not repeated herein.
Specifically, in an embodiment, in the step S104, the dense depth maps are subjected to depth fusion according to the equivalent binocular image information to obtain a three-dimensional model of the scene target. In practical application, the dense depth maps from multiple positions can be placed in a global coordinate system and fused according to the poses of the viewing angles contained in the equivalent binocular image information, yielding a complete three-dimensional scene. The depth fusion mode is determined by the representation of the three-dimensional scene, which may be a point cloud, a mesh model, voxels, or a volumetric representation, each corresponding to a different fusion mode; the invention is not limited in this respect. Two representative fusion modes are listed below:
point cloud based fusion: firstly, converting a dense depth map into three-dimensional point cloud according to camera internal parameters, then placing the point cloud under the same coordinate system according to the pose, drawing a small range for each three-dimensional point of the overlapped part of the point cloud, fusing a plurality of three-dimensional points in the range into a point, and taking the fused point coordinate as the average value of the point coordinate in the range; after all the point clouds are fused, a three-dimensional model represented by the point clouds can be obtained.
Volume-based fusion: first, a volumetric region is defined for the whole scene target and divided into voxels, and a Truncated Signed Distance Function (TSDF) is defined over the whole volume. Then, the corresponding voxel values are updated according to the depth values of each depth map using the KinectFusion method; after updating with all the depth maps, a complete three-dimensional model in volumetric representation is obtained.
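A sketch of a single KinectFusion-style TSDF update from one dense depth map, in the spirit of the volume-based fusion described above; the truncation distance and the simple weight increment are illustrative choices, not values from the patent:

```python
import numpy as np

def update_tsdf(tsdf, weight, vox_pts_world, world_to_cam, K, depth,
                trunc=0.04):
    """One KinectFusion-style TSDF update from a dense depth map, a sketch.

    tsdf, weight: flat arrays with one entry per voxel; vox_pts_world:
    N x 3 voxel centres; world_to_cam: 4x4 extrinsics; K: 3x3 intrinsics.
    """
    # voxel centres into camera coordinates
    pts = (world_to_cam[:3, :3] @ vox_pts_world.T + world_to_cam[:3, 3:4]).T
    z = pts[:, 2]
    uv = (K @ pts.T).T
    h, w = depth.shape
    valid = z > 0
    u = np.zeros_like(z, dtype=int)
    v = np.zeros_like(z, dtype=int)
    u[valid] = np.round(uv[valid, 0] / z[valid]).astype(int)
    v[valid] = np.round(uv[valid, 1] / z[valid]).astype(int)
    valid &= (u >= 0) & (u < w) & (v >= 0) & (v < h)
    d = np.zeros_like(z)
    d[valid] = depth[v[valid], u[valid]]
    valid &= d > 0
    sdf = d - z                   # signed distance along the viewing ray
    valid &= sdf > -trunc         # ignore voxels far behind the surface
    tsdf_new = np.minimum(1.0, sdf / trunc)
    # weighted running average of the truncated signed distance
    tsdf[valid] = ((tsdf[valid] * weight[valid] + tsdf_new[valid])
                   / (weight[valid] + 1))
    weight[valid] += 1
```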
The three-dimensional reconstruction method provided by the embodiment of the invention will be illustrated with reference to specific application examples.
Aiming at a desktop human-computer interaction device shaped like a desk lamp, whose camera can be moved according to control signals, the camera's preset shooting positions adopt the layout shown in FIG. 5 to acquire image information, and three-dimensional reconstruction is then performed by a computer processor. The specific implementation process is as follows:
step 1), solving control parameters of each mechanical part of the camera according to a given camera position and a given shooting orientation sequence;
step 2), controlling the mechanical parts to move according to the control parameters, and calculating the camera pose and the equivalent baseline length;
step 3), shooting images at the given positions with the camera according to the specified orientations, and performing ORB (Oriented FAST and Rotated BRIEF) feature-based binocular matching on the two images that form an equivalent binocular image over the equivalent baseline, to obtain a sparse depth map;
step 4), calculating the pose of the camera when each image is shot according to the matching points, the size of the camera body and the control parameters;
step 5), repeating the steps 2) -4) until the images at all the positions are used for binocular matching and depth recovery;
step 6), establishing a pose graph according to the loop information in the position sequence, and optimizing the estimated camera poses with the g2o solver package;
step 7), placing the three-dimensional feature points under a global coordinate system according to the optimized pose, and performing time domain stabilization;
step 8), projecting the characteristic points to an image coordinate system of an adjacent shooting angle to generate a multi-view sparse depth map;
step 9), carrying out multi-scale recovery on the depth map by taking the image as a guide to obtain a multi-view dense depth map;
and step 10), carrying out depth fusion based on TSDF on the dense depth map to obtain a dense scene three-dimensional model.
The three-dimensional reconstruction method provided by the embodiment of the invention uses only a single camera as its sensor, so the equipment requirements and manufacturing cost are low and miniaturization is easy. The three-dimensional reconstruction scale is determined by the servo angles and the body size; sparse depth maps of the scene at multiple viewing angles are obtained through a series of equivalent binocular measurements, and a dense three-dimensional model of the photographed scene is obtained after depth filling and depth fusion. The method can run in real time on a single thread of a computer CPU, has low computation cost, and performs well in terms of measurement precision, equipment cost, and computing resources.
An embodiment of the present invention further provides a depth filling device for a sparse depth map. As shown in FIG. 6, the depth filling device for a sparse depth map includes:
the first processing module 1 is configured to process the sparse depth map by using at least two interpolation algorithms respectively to obtain a corresponding processed sparse depth map. For details, refer to the related description of step S1 in the above embodiment.
The second processing module 2 is configured to respectively obtain the depth value of each feature point in each processed sparse depth map. For details, refer to the related description of step S2 in the above embodiment.
And the third processing module 3 is used for reserving the feature points of which the difference of the depth values is smaller than a preset threshold value. For details, refer to the related description of step S3 in the above embodiment.
And the fourth processing module 4 is used for acquiring a dense depth map according to the reserved feature points. For details, refer to the related description of step S4 in the above embodiment.
Through the cooperation of the above components, the depth filling device for a sparse depth map provided by the embodiment of the present invention processes the sparse depth map with different interpolation algorithms, obtains the depth value of each feature point in each processed sparse depth map, retains the feature points whose depth-value difference is smaller than a preset threshold, and obtains a dense depth map from the retained feature points. The device is simple and fast, can obtain a dense depth map that accurately reflects the pixel depths of the original image, and provides an accurate data basis for three-dimensional reconstruction using the dense depth map.
An embodiment of the present invention further provides a three-dimensional reconstruction apparatus, as shown in fig. 7, the three-dimensional reconstruction apparatus includes:
and the fifth processing module 101 is configured to acquire at least one set of equivalent binocular image information including a scene target. For details, refer to the related description of step S101 in the above embodiment.
And the sixth processing module 102 is configured to establish a corresponding sparse depth map according to each set of equivalent binocular image information. For details, refer to the related description of step S102 in the above embodiment.
A seventh processing module 103, configured to perform depth filling on each sparse depth map by using the depth filling device for a sparse depth map described above, to obtain each dense depth map. For details, refer to the related description of step S103 in the above embodiment.
And the eighth processing module 104 is configured to perform depth fusion on each dense depth map according to the equivalent binocular image information to obtain a three-dimensional model of the scene target. For details, refer to the related description of step S104 in the above embodiment.
Through the cooperation of the above components, the three-dimensional reconstruction device provided in the embodiment of the present invention acquires at least one group of equivalent binocular image information containing a scene target, establishes a corresponding sparse depth map from each group, processes each sparse depth map with different interpolation algorithms, obtains the depth value of each feature point in each processed sparse depth map, retains the feature points whose depth-value difference is smaller than a preset threshold, obtains a dense depth map from the retained feature points, and finally performs depth fusion on the dense depth maps to obtain a three-dimensional model of the scene target. A simple three-dimensional reconstruction scheme for a scene target is thus provided; it can be realized without relying on complex image acquisition equipment or computing equipment, and is simple and efficient.
An embodiment of the present invention further provides an electronic device, as shown in fig. 8, the electronic device may include a processor 901 and a memory 902, where the processor 901 and the memory 902 may be connected through a bus or in another manner, and fig. 8 takes the connection through the bus as an example.
Processor 901 may be a Central Processing Unit (CPU). Processor 901 may also be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or a combination thereof.
The memory 902, as a non-transitory computer-readable storage medium, can be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the methods in the embodiments of the present invention. The processor 901 executes the various functional applications and data processing of the processor by running the non-transitory software programs, instructions, and modules stored in the memory 902, that is, implements the depth filling method for a sparse depth map or the three-dimensional reconstruction method in the above method embodiments.
The memory 902 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created by the processor 901, and the like. Further, the memory 902 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 902 may optionally include memory located remotely from the processor 901, which may be connected to the processor 901 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
One or more modules are stored in the memory 902, and when executed by the processor 901, perform the depth filling method of the sparse depth map in the above method embodiment, or perform the three-dimensional reconstruction method in the above method embodiment.
The specific details of the electronic device may be understood by referring to the corresponding related descriptions and effects in the above method embodiments, and are not described herein again.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by hardware related to instructions of a computer program, and the program can be stored in a computer readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic Disk, an optical Disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a Flash Memory (Flash Memory), a Hard Disk (Hard Disk Drive, abbreviated as HDD), a Solid State Drive (SSD), or the like; the storage medium may also comprise a combination of memories of the kind described above.
Although the embodiments of the present invention have been described in conjunction with the accompanying drawings, those skilled in the art may make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope defined by the appended claims.

Claims (13)

1. A depth filling method for a sparse depth map, comprising:
processing the sparse depth map by respectively adopting at least two interpolation algorithms to obtain a corresponding processed sparse depth map;
respectively obtaining the depth value of each feature point in the sparse depth map in each processed sparse depth map;
respectively carrying out numerical comparison on depth values of the same feature point in different processed sparse depth maps, and reserving feature points of which the difference of the depth values is smaller than a preset threshold value;
and acquiring a dense depth map according to the reserved feature points.
2. The depth filling method for a sparse depth map of claim 1, wherein the difference of the depth values comprises: a difference of depth values and/or a squared difference of depth values.
3. The depth filling method for a sparse depth map of claim 1, wherein the obtaining a dense depth map according to the retained feature points comprises:
forming a complete depth map from the retained feature points;
and acquiring an original image corresponding to the sparse depth map, and performing depth filling on the complete depth map according to the original image to obtain a dense depth map.
4. The depth filling method for a sparse depth map of claim 3, wherein the obtaining an original image corresponding to the sparse depth map and performing depth filling on the complete depth map according to the original image to obtain a dense depth map comprises:
processing the original image according to the resolution of the sparse depth map;
filtering the complete depth map according to the processed original image;
and carrying out iterative filtering on the filtered complete depth map by adopting a preset filtering algorithm according to the processed original image and the filtered complete depth map to obtain the dense depth map.
5. The depth filling method for a sparse depth map of claim 3, further comprising:
judging whether the resolution of the dense depth map is lower than that of the original image;
and when the resolution of the dense depth map is lower than that of the original image, updating the sparse depth map into the dense depth map, and returning to the step of processing the sparse depth map by adopting at least two interpolation algorithms respectively to obtain a corresponding processed sparse depth map until the resolution of the dense depth map is the same as that of the original image.
6. A method of three-dimensional reconstruction, comprising:
acquiring at least one group of equivalent binocular image information containing a scene target;
establishing a corresponding sparse depth map according to each group of equivalent binocular image information;
performing depth filling on each sparse depth map by using the depth filling method for a sparse depth map as claimed in any one of claims 1 to 3 to obtain each dense depth map;
and performing depth fusion on each dense depth map according to the equivalent binocular image information to obtain a three-dimensional model of the scene target.
7. The three-dimensional reconstruction method of claim 6, wherein each group of equivalent binocular image information comprises a first image with a corresponding first pose acquired at a first preset position, and a second image with a corresponding second pose acquired at a second preset position;
and wherein acquiring each group of equivalent binocular image information comprises:
obtaining an equivalent baseline of the current group from the first preset position and the second preset position relative to the scene target;
determining the first pose and the second pose relative to the scene target according to the equivalent baseline;
controlling an image acquisition device to acquire the first image at the first preset position according to the first pose;
and controlling the image acquisition device to acquire the second image at the second preset position according to the second pose, the first image and the second image forming one group of equivalent binocular images.
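The claim does not fix a formula for the equivalent baseline; a natural reading, sketched below, takes it as the distance between the two preset camera positions, with the two poses then oriented so the optical axes are parallel (a standard rectified-stereo assumption, not stated in the claim):

```python
import numpy as np

def equivalent_baseline(pos1, pos2):
    # Baseline length as the Euclidean distance between the two
    # preset acquisition positions (assumed interpretation).
    return float(np.linalg.norm(np.asarray(pos2) - np.asarray(pos1)))
```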
8. The three-dimensional reconstruction method of claim 7, wherein establishing the corresponding sparse depth map from each group of equivalent binocular image information comprises:
performing feature matching between the first image and the second image of each group to obtain multiple groups of feature point information;
calculating disparity information for each feature point from the feature point information;
obtaining depth information for each feature point from the disparity information and the equivalent baseline;
and establishing the corresponding sparse depth map from the depth information of the feature points.
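A minimal sketch of this step using ORB features and a brute-force matcher (assumed choices; the claim names no detector), with depth recovered from the standard rectified-stereo relation Z = f * B / d:

```python
import cv2
import numpy as np

def sparse_depth_from_pair(img1, img2, focal_px, baseline):
    # Feature matching between the two images of one equivalent pair.
    orb = cv2.ORB_create()
    k1, d1 = orb.detectAndCompute(img1, None)
    k2, d2 = orb.detectAndCompute(img2, None)
    matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(d1, d2)
    points, depths = [], []
    for m in matches:
        (x1, y1) = k1[m.queryIdx].pt
        (x2, y2) = k2[m.trainIdx].pt
        disparity = x1 - x2                 # valid for a rectified pair
        if disparity > 0:
            points.append((x1, y1))
            depths.append(focal_px * baseline / disparity)  # Z = f*B/d
    return np.array(points), np.array(depths)
```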
9. The three-dimensional reconstruction method of claim 8, wherein after obtaining the depth information of each feature point from the disparity information and the equivalent baseline, and before establishing the corresponding sparse depth map from that depth information, the method further comprises:
dividing the feature points of the current group into a plurality of feature point regions according to a preset coordinate range and their depth information;
calculating, for each feature point region, the average depth information over the feature points it contains;
and, with each feature point region corresponding to one feature point, determining that average depth information as the depth information of the feature point corresponding to the region.
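The sketch below gives one possible implementation, assuming the "preset coordinate range" is a square pixel cell; each cell of matched points collapses to a single representative point carrying the cell's mean depth:

```python
import numpy as np

def average_by_region(points, depths, cell=16):
    buckets = {}
    # Group feature points into fixed-size coordinate cells.
    for (x, y), d in zip(points, depths):
        buckets.setdefault((int(x) // cell, int(y) // cell), []).append((x, y, d))
    out_pts, out_depths = [], []
    for members in buckets.values():
        arr = np.array(members)
        out_pts.append(arr[:, :2].mean(axis=0))   # one point per region
        out_depths.append(arr[:, 2].mean())       # its depth = region average
    return np.array(out_pts), np.array(out_depths)
```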
10. A depth completion apparatus for a sparse depth map, comprising:
a first processing module, configured to process the sparse depth map with each of at least two interpolation algorithms to obtain corresponding processed sparse depth maps;
a second processing module, configured to obtain, for each feature point in the sparse depth map, its depth value in each of the processed sparse depth maps;
a third processing module, configured to compare the depth values of the same feature point across the different processed sparse depth maps and retain the feature points whose difference in depth values is smaller than a preset threshold;
and a fourth processing module, configured to obtain a dense depth map from the retained feature points.
11. A three-dimensional reconstruction apparatus, comprising:
a fifth processing module, configured to acquire at least one group of equivalent binocular image information containing a scene target;
a sixth processing module, configured to establish a corresponding sparse depth map from each group of equivalent binocular image information;
a seventh processing module, configured to perform depth completion on each sparse depth map using the depth completion apparatus for a sparse depth map of claim 10, to obtain a corresponding dense depth map;
and an eighth processing module, configured to perform depth fusion on the dense depth maps according to the equivalent binocular image information to obtain a three-dimensional model of the scene target.
12. An electronic device, comprising:
a memory and a processor communicatively connected to each other, the memory storing computer instructions, and the processor executing the computer instructions to perform the depth completion method of a sparse depth map according to any one of claims 1 to 5, or the three-dimensional reconstruction method according to any one of claims 6 to 9.
13. A computer-readable storage medium storing computer instructions for causing a computer to perform the depth completion method of a sparse depth map according to any one of claims 1 to 5, or the three-dimensional reconstruction method according to any one of claims 6 to 9.
CN201910353712.4A 2019-04-28 2019-04-28 Depth filling method and device for sparse depth map and three-dimensional reconstruction method and device Active CN110148086B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910353712.4A CN110148086B (en) 2019-04-28 2019-04-28 Depth filling method and device for sparse depth map and three-dimensional reconstruction method and device

Publications (2)

Publication Number Publication Date
CN110148086A CN110148086A (en) 2019-08-20
CN110148086B true CN110148086B (en) 2023-02-17

Family

ID=67594762

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910353712.4A Active CN110148086B (en) 2019-04-28 2019-04-28 Depth filling method and device for sparse depth map and three-dimensional reconstruction method and device

Country Status (1)

Country Link
CN (1) CN110148086B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113780349B (en) * 2021-08-09 2023-07-11 深圳奥锐达科技有限公司 Training sample set acquisition method, model training method and related device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9373190B2 (en) * 2014-07-09 2016-06-21 Google Inc. High-quality stereo reconstruction featuring depth map alignment and outlier identification
US10425629B2 (en) * 2017-06-28 2019-09-24 Siemens Healthcare Gmbh System for dense registration of two-dimensional depth images

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106651938A (en) * 2017-01-17 2017-05-10 湖南优象科技有限公司 Depth map enhancement method blending high-resolution color image
CN108062769A (en) * 2017-12-22 2018-05-22 中山大学 A kind of fast deep restoration methods for three-dimensional reconstruction

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Three-dimensional reconstruction algorithm based on fusion of sparse laser point cloud data and a single-frame image; He Bing'an et al.; Metrology & Measurement Technology (计测技术); 2017-06-28 (No. 03); full text *

Also Published As

Publication number Publication date
CN110148086A (en) 2019-08-20

Similar Documents

Publication Publication Date Title
CN110135455B (en) Image matching method, device and computer readable storage medium
CN110176032B (en) Three-dimensional reconstruction method and device
CN110568447B (en) Visual positioning method, device and computer readable medium
CN107862744B (en) Three-dimensional modeling method for aerial image and related product
WO2019127445A1 (en) Three-dimensional mapping method, apparatus and system, cloud platform, electronic device, and computer program product
CN112444242B (en) Pose optimization method and device
WO2019164498A1 (en) Methods, devices and computer program products for global bundle adjustment of 3d images
CN112465970B (en) Navigation map construction method, device, system, electronic device and storage medium
CN112686877B (en) Binocular camera-based three-dimensional house damage model construction and measurement method and system
CN111563921A (en) Underwater point cloud acquisition method based on binocular camera
CN111340922A (en) Positioning and mapping method and electronic equipment
CN110458952B (en) Three-dimensional reconstruction method and device based on trinocular vision
CN109685879B (en) Method, device, equipment and storage medium for determining multi-view image texture distribution
CN115035235A (en) Three-dimensional reconstruction method and device
Bethmann et al. Semi-global matching in object space
CN114782636A (en) Three-dimensional reconstruction method, device and system
CN111882655B (en) Method, device, system, computer equipment and storage medium for three-dimensional reconstruction
CN110825079A (en) Map construction method and device
Harvent et al. Multi-view dense 3D modelling of untextured objects from a moving projector-cameras system
CN110148086B (en) Depth filling method and device for sparse depth map and three-dimensional reconstruction method and device
CN112270701B (en) Parallax prediction method, system and storage medium based on packet distance network
GB2569609A (en) Method and device for digital 3D reconstruction
CN112802171A (en) Three-dimensional face reconstruction method, device, system and storage medium
CN115937002A (en) Method, apparatus, electronic device and storage medium for estimating video rotation
CN114399553A (en) Virtual viewpoint generation method and device based on camera posture

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant