CN106780592B - Kinect depth reconstruction method based on camera motion and image shading - Google Patents
- Publication number
- CN106780592B (application CN201611061364.6A)
- Authority
- CN
- China
- Prior art keywords
- depth, pixel, point, camera, value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
- G06T2207/10021—Stereoscopic video; Stereoscopic image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10024—Color image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10028—Range image; Depth image; 3D point clouds
Abstract
The invention discloses a Kinect depth reconstruction method based on camera motion and image shading, which comprises the following steps: 1) with the Kinect depth camera and RGB camera calibrated and aligned, uploading the data collected by the Kinect to a computer through a third-party interface; 2) recovering the three-dimensional scene structure and the motion track of the Kinect RGB camera from the RGB video sequence to obtain a point cloud and the camera motion relation; 3) reconstructing the image depth by combining the point cloud and camera motion relation obtained in step 2) with the shading information of the image. The method requires no physical modification of the depth camera and no complex combination of devices, and it avoids the complicated and demanding illumination calibration steps of traditional depth reconstruction methods, which confine those methods to laboratory conditions and deprive them of practical value; compared with the traditional methods it therefore has greater practical application value and significance.
Description
Technical Field
The invention relates to the field of depth reconstruction in computer image processing, in particular to a Kinect depth reconstruction method based on camera motion and image shading.
Background
With the advent and popularization in recent years of relatively inexpensive consumer depth cameras such as the Microsoft Kinect and the ASUS Xtion Pro, depth information has been widely applied in fields such as motion-sensing games, real-time three-dimensional reconstruction, augmented reality and virtual reality, and has become an important support for the development of novel human-computer interaction modes. However, most consumer depth cameras currently popular on the market suffer from insufficient depth detection precision and excessive interference noise, which seriously affects the quality of application products based on depth information. Therefore, how to acquire more accurate depth information is of great significance to applications developed on top of it.
Because of these requirements, depth reconstruction algorithms are receiving more and more attention from academia and industry. A novel direction is to reconstruct the depth map with the help of ideas from three-dimensional reconstruction in computer graphics, and this patent follows that direction. The main three-dimensional reconstruction methods at present include recovering the three-dimensional scene structure from motion information, reconstructing object shape from the shading of an image, photometric stereo, and so on. This patent mainly uses the two methods of recovering the three-dimensional scene structure from motion information and reconstructing object shape from the shading of an image.
The method for recovering the three-dimensional scene structure from the motion information mainly utilizes the motion process of a camera to dynamically generate and correct a three-dimensional point cloud, and the typical representation of the method is a monocular camera-based SLAM system. The method for reconstructing the shape of the object from the brightness of the image is to establish an effective illumination model by using the brightness of the image and solve the illumination model by using an optimization method, so that the surface shape information of the target can be acquired.
By utilizing and improving the ideas of the two methods and utilizing the close relation between the depth map and the point cloud, the depth map can be effectively optimized and reconstructed, and a more accurate depth result is obtained.
Disclosure of Invention
The invention aims to overcome the defect of insufficient depth detection precision of the existing civil depth camera, and provides a Kinect depth reconstruction method based on camera motion and image shading.
In order to achieve the purpose, the technical scheme provided by the invention is as follows: the Kinect depth reconstruction method based on camera motion and image shading comprises the following steps:
1) under the condition that the Kinect depth camera and the RGB camera are calibrated and aligned, uploading data collected by the Kinect to a computer through a third-party interface;
2) recovering a three-dimensional scene structure and the motion track of the Kinect RGB camera from an RGB video sequence to obtain a point cloud and the camera motion relation;
3) and (3) reconstructing the image depth by combining the point cloud obtained in the step 2) and the camera motion relation and utilizing the light and shade condition information of the image.
The step 2) comprises the following steps:
2.1) reading an RGB picture as a key frame when a system is initialized, binding a depth map to the key frame, traversing the depth map, and assigning a random value to each pixel position, wherein the depth map and the gray map have the same dimensionality;
2.2) for each newly read RGB picture, constructing the following cost function:

E(ζ_ji) = Σ_p ‖ r_p²(p, ζ_ji) / σ²_{r_p} ‖_δ

where ‖·‖_δ is the Huber norm, r_p represents the photometric error, and σ²_{r_p} represents the variance of the error;
the definition of the Huber norm is as follows:

‖r²‖_δ = r² / (2δ)   if |r| ≤ δ
‖r²‖_δ = |r| − δ/2   otherwise

where δ is the parameter of the Huber norm;
the error function r_p is defined as follows:

r_p(p, ζ_ji) = I_i(p) − I_j(w(p, D_i(p), ζ_ji))

where I_i(p) represents the gray value at the position of pixel p in the reference frame i, ζ_ji represents the Lie algebra of the rigid-body transformation that rotationally translates a three-dimensional point from the coordinate frame i to the coordinate frame j, D_i(p) denotes the depth value of the position corresponding to pixel p in the depth map of the reference frame, and w(p, D_i(p), ζ_ji) transforms the three-dimensional point corresponding to the position of pixel p in reference frame i into current frame j through the rotation-translation rigid-body transformation, the transformation formula being:

[X', Y', Z']ᵀ = R(ζ_ji)·[X, Y, Z]ᵀ + t(ζ_ji),  u' = f_x·X'/Z' + c_x,  v' = f_y·Y'/Z' + c_y

where X, Y and Z respectively represent the coordinates of the three-dimensional point in the XYZ directions of the camera coordinate system; u, v represent pixel coordinates; f_x, f_y respectively represent the focal lengths in the X and Y directions, and c_x, c_y the coordinates of the principal point;
where the variance of the error is computed as

σ²_{r_p} = 2σ_I² + (∂r_p/∂D_i(p))²·V_i(p)

in which σ_I² represents the variance of the picture gray values and V_i(p) represents the depth variance of pixel p in the reference-frame depth map;
2.3) solving the ζ_ji that minimizes the cost function of step 2.2) by the Gauss-Newton iteration method to obtain the rotation-translation relation between the reference frame and the current frame;
2.4) computing the gradient of all points on the reference-frame gray image and selecting the points whose gradient is greater than a threshold; then screening these points; traversing all points that meet the requirements, and searching for their corresponding points along the epipolar line of the current frame determined by epipolar geometry; computing the space coordinates of these points according to monocular-vision three-dimensional reconstruction geometry;
2.5) fusing the newly obtained depth value with the depth value in the depth map of the reference frame by using a Kalman filter.
The step 3) comprises the following steps:
3.1) aligning the depth image collected by the depth camera under the current frame with the color image collected by the monocular color camera; because the difference of the field of view ranges exists between the color camera and the depth camera, only the overlapped part of the field of view of the color camera and the depth camera has effective depth values, and an incomplete depth map is obtained after alignment;
3.2) generating a three-dimensional point cloud according to the incomplete depth map according to a model of a pinhole camera in the depth camera; the pinhole camera model is briefly described as follows: the relationship between the spatial coordinates [ x, y, z ] of a spatial point and its pixel coordinates [ u, v, d ] in the image is expressed as:
z=d/s
x=(u-cx)·z/fx
y=(v-cy)·z/fy
where d is the depth value of each pixel in the depth map, s is the scaling factor of the depth map, c_x and c_y are the abscissa and ordinate of the principal point, and f_x and f_y are the focal-length components in the abscissa and ordinate directions;
converting the pixel coordinate of each pixel into a corresponding space coordinate by using the formula, and then completing the conversion from the depth map to the three-dimensional point cloud;
3.3) registering the point cloud generated in the monocular algorithm and the point cloud generated by the incomplete depth map by using a point-to-point iterative nearest neighbor point ICP algorithm to obtain a rotation matrix R and a translation matrix T between the point cloud generated in the monocular algorithm and the point cloud generated by the incomplete depth map;
3.4) converting the point cloud obtained by the monocular algorithm to a coordinate system of the point cloud generated by the incomplete depth map according to the obtained rotation matrix and translation matrix, and splicing the point cloud and the point cloud into a large point cloud;
3.5) for the depth-value-invalid region of the depth map formed by the non-overlapping fields of view, calculating the spatial position of each pixel in the invalid region; if the space point corresponding to the pixel exactly coincides with a point of the large point cloud, the z coordinate of that point, i.e. its depth, is directly assigned to the pixel as the depth value; if the space point corresponding to the pixel coincides with no point of the point cloud, calculating the average distance between the space point and its neighbouring points in the large point cloud, and if this value is greater than a certain threshold taking it as the depth value of the pixel;
3.6) detecting whether pixel points without effective values exist in the depth map; if the pixel points which do not have the effective values exist, the depth value filling is carried out on the depth map by using the combined bilateral filter, and each pixel point in the depth map is ensured to have a depth value;
3.7) using the extended intrinsic image decomposition model function, with the normal vector of each pixel point as a variable, as the illumination model function of each pixel point; the extended intrinsic image decomposition model used is:

I(p) = ρ(p)·S(n(p)) + β(p)

where I(p) is the observed intensity at pixel p, ρ(p) its albedo, S(n(p)) the shading as a function of the surface normal n(p), and β(p) a local illumination difference term;
3.8) calculating shading information for each pixel in the image; the shading function is expressed in matrix form as a linear polynomial of the zero- and first-order spherical harmonic coefficients and the point-cloud surface normal vector, namely:

S(n) = lᵀ·[1, n_x, n_y, n_z]ᵀ

where l = [l_0, l_1, l_2, l_3]ᵀ is the vector of spherical harmonic coefficients;
firstly, calculating the normal vector of each point, and then solving the parameter vector l of the shading function through an objective that minimizes the difference between the shading function and the real illumination intensity, so as to determine a shading value for each pixel point;
3.9) calculating the albedo for each pixel in the image; since the shading function considers only distant light sources and ambient light, it is a preliminary prediction of the illumination, and a different albedo therefore needs to be introduced for each pixel in order to account for the effects caused by specular reflection, shadows and near light sources;
the minimization objective function is constructed as:

E(ρ) = Σ_p ( ρ(p)·S(p) − I(p) )² + λ_ρ Σ_p Σ_{k∈N} ( ρ(p) − ρ_k )²

where ρ is the albedo of each pixel, I is the illumination intensity of each pixel, N is a neighbourhood of the pixel being operated on in the full-image iteration operation, ρ_k is the albedo of a pixel point in the neighbourhood, and λ_ρ is a weighting parameter;
3.10) calculating, for each pixel in the image, the illumination difference value caused by the local illumination difference;
the minimization objective function is constructed as:

E(β) = Σ_p ( ρ(p)·S(p) + β(p) − I(p) )² + λ_β Σ_p Σ_{k∈N} ( β(p) − β_k )²

where β is the illumination difference value of each pixel point, β_k is the value at a pixel point in a neighbourhood of the pixel being operated on in the full-image iteration operation, and λ_β is a weighting parameter;
3.11) constructing an objective function between the light and shade model and the actually measured illumination intensity, and minimizing the objective function by using an improved depth enhancement acceleration algorithm so as to obtain an optimized depth map;
the normal of the point on the point cloud corresponding to each pixel is first represented through the gradient of the depth map: back-projecting the depth map D(u, v) to space points X(u, v), the normal is

n(u, v) = (∂X/∂u × ∂X/∂v) / ‖∂X/∂u × ∂X/∂v‖;

then, an objective function of depth optimization is established as follows:

E(D) = Σ_p ( I(p) − ρ(p)·S(n(D, p)) − β(p) )² + λ_1 Σ_p ( D(p) − D_0(p) )² + λ_2 Σ_p ‖ΔD(p)‖²

where D_0 is the depth map before optimization, ΔD is the Laplacian of the depth map, and λ_1, λ_2 are weighting parameters balancing fidelity to the measured depth and smoothness;
the depth is then iteratively optimized using the depth-enhancement acceleration algorithm, as follows:
① input the initial depth map, the spherical harmonic coefficient vector l, the vectorized albedo vector ρ, and the vector of illumination difference values β;
② while the value of the depth-optimization objective function keeps decreasing, execute steps ③ to ⑤ in a loop;
⑤ update z_k so that f(z_k) becomes smaller;
after this step is finished, return to steps 2.1) to 2.5) of the monocular algorithm; the program keeps executing until it is detected that the user performs the operation of stopping the method.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The method can well refine the depth map obtained by the Kinect depth sensor, solving the problem that depth values are inaccurate when only the depth sensor collects data.
2. The constraint of the Kinect device on the depth range of the measured object is relaxed, widening the range of measurable depth.
3. The invention uses a specially designed monocular ranging point-screening method, so that the depth precision is far higher than that of common methods.
4. The Kinect can adapt to environments with complex illumination changes, and the problem that the Kinect is unsuitable for outdoor use is alleviated.
5. The invention uses the illumination model of the surface normal vector combined with the three-dimensional representation when calculating the global brightness of the object, and better describes the global illumination.
6. The invention considers and utilizes the global illumination effect and the local illumination effect when the light irradiates on the object under the real illumination environment, and has robustness and practical significance for processing the depth reconstruction under different illumination.
7. The invention uses the depth enhancement acceleration algorithm to carry out optimization on the depth optimization step, thereby greatly reducing the calculated amount of the optimization step and the running time of the method.
Detailed Description
The present invention will be further described with reference to the following specific examples.
The Kinect depth reconstruction method based on camera motion and image shading comprises the following steps:
1) under the condition that the Kinect depth camera and the RGB camera are aligned in a calibration mode, data collected by the Kinect are uploaded to a computer through a third-party interface.
2.1) reading an RGB picture as a key frame when a system is initialized, binding a depth map to the key frame, traversing the depth map, and assigning a random value to each pixel position, wherein the depth map and the gray map have the same dimensionality;
2.2) For each newly read RGB picture, the following cost function is constructed:

E(ζ_ji) = Σ_p ‖ r_p²(p, ζ_ji) / σ²_{r_p} ‖_δ

where ‖·‖_δ is the Huber norm, r_p represents the photometric error, and σ²_{r_p} represents the variance of the error.
The definition of the Huber norm is as follows:

‖r²‖_δ = r² / (2δ)   if |r| ≤ δ
‖r²‖_δ = |r| − δ/2   otherwise

where δ is the parameter of the Huber norm.
The error function r_p is defined as follows:

r_p(p, ζ_ji) = I_i(p) − I_j(w(p, D_i(p), ζ_ji))

I_i(p) represents the gray value at the position of pixel p in the reference frame i, ζ_ji represents the Lie algebra of the rigid-body transformation that rotationally translates a three-dimensional point from the i coordinate frame to the j coordinate frame, D_i(p) denotes the depth value of the position corresponding to pixel p in the depth map of the reference frame, and w(p, D_i(p), ζ_ji) transforms the three-dimensional point corresponding to the position of pixel p in reference frame i into current frame j through the rotation-translation rigid-body transformation:

[X', Y', Z']ᵀ = R(ζ_ji)·[X, Y, Z]ᵀ + t(ζ_ji),  u' = f_x·X'/Z' + c_x,  v' = f_y·Y'/Z' + c_y

where X, Y and Z respectively represent the coordinates of the three-dimensional point in the XYZ directions of the camera coordinate system, u and v represent pixel coordinates, f_x and f_y respectively represent the focal lengths in the X and Y directions, and c_x, c_y are the coordinates of the principal point.
The variance of the error is computed as

σ²_{r_p} = 2σ_I² + (∂r_p/∂D_i(p))²·V_i(p)

where σ_I² represents the variance of the picture gray values and V_i(p) represents the depth variance of pixel p in the reference-frame depth map.
2.3) The ζ_ji that minimizes the cost function of step 2.2) is solved by the Gauss-Newton iteration method, giving the rotation-translation relation between the reference frame and the current frame.
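As an illustration of the robust cost used above, the Huber norm and the per-residual weight it induces in a Gauss-Newton / iteratively-reweighted least-squares solver can be sketched as follows (function names are ours, not the patent's):

```python
import numpy as np

def huber_norm(r, delta):
    """Huber norm of a residual: quadratic near zero, linear in the tails,
    so single outlier pixels cannot dominate the photometric cost."""
    r = np.abs(r)
    return np.where(r <= delta, r * r / (2 * delta), r - delta / 2)

def huber_weight(r, delta):
    """Normalized per-residual IRLS weight for the Huber cost:
    1 inside the quadratic region, down-weighted as delta/|r| outside."""
    r = np.abs(r)
    return np.where(r <= delta, 1.0, delta / np.maximum(r, 1e-12))
```

In a Gauss-Newton step the squared residuals would simply be multiplied by these weights before solving the normal equations.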
2.4) The gradient of all points on the gray image of the reference frame is calculated, and the points whose gradient is greater than a threshold are selected. These points are then screened: the unit gradient direction g and the gradient magnitude g0 of each candidate point are computed, and the inner product of g and the epipolar line l is calculated; candidates whose image gradient is nearly perpendicular to the epipolar line (small inner product) are discarded, since their depth cannot be estimated reliably along that line.
Traverse all the points that meet the requirements, and according to epipolar geometry:

E = [t]_x·R
F = K₁⁻ᵀ·E·K₁⁻¹
l = F·x

calculate the epipolar line on which the corresponding point of each depth-solving pixel lies in the current frame. Here K₁ is the intrinsic matrix of the RGB camera, t and R are the rotation-translation form obtained by decomposing the Lie algebra ζ of step 2.2), l is the corresponding epipolar line in the current frame, and x is the homogeneous coordinate of the pixel whose depth is sought in the reference frame.
The corresponding points of these pixels are searched for on the epipolar line of the current frame, and their space coordinates are calculated according to monocular-vision three-dimensional reconstruction geometry: from

x·(p³ᵀX) − (p¹ᵀX) = 0
y·(p³ᵀX) − (p²ᵀX) = 0
x·(p²ᵀX) − y·(p¹ᵀX) = 0

a linear system A·X = 0 is formed, where P is the product of the rotation-translation matrix and the intrinsic matrix, pⁱᵀ is the i-th row of the P matrix, and X is the three-dimensional coordinate of the point being solved; x and y are the horizontal- and vertical-axis coordinates of the corresponding point in the current frame. Applying SVD to the matrix A, the singular vector associated with the smallest singular value gives the space coordinates of the point.
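The linear (DLT) triangulation just described can be sketched with numpy; here A stacks the constraints from both views, and the camera matrices and test point are illustrative values of our choosing:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """DLT triangulation: stack x*(p3.X) - (p1.X) = 0 and
    y*(p3.X) - (p2.X) = 0 from both views into A X = 0 and take the
    right singular vector of A with the smallest singular value."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # de-homogenize
```

With noiseless projections this recovers the space point exactly; with noisy matches it gives the algebraic least-squares solution.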
2.5) fusing the newly obtained depth value with the depth value in the depth map of the reference frame by using a Kalman filter.
The updated depth value is:

d = (σ_p²·d_o + σ_o²·d_p) / (σ_o² + σ_p²)

The variance of the updated depth is:

σ² = (σ_o²·σ_p²) / (σ_o² + σ_p²)

where d_o is the original depth in the depth map, d_p is the newly calculated depth, σ_o² is the depth variance maintained in the depth map, and σ_p² is the variance of the newly calculated depth.
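This fusion rule is inverse-variance (Kalman-style) weighting; a minimal sketch, with names of our choosing:

```python
def fuse_depth(d_o, var_o, d_p, var_p):
    """Fuse the stored depth d_o (variance var_o) with a newly
    triangulated depth d_p (variance var_p). The fused variance is
    always smaller than either input, so the estimate only sharpens."""
    d = (var_p * d_o + var_o * d_p) / (var_o + var_p)
    var = var_o * var_p / (var_o + var_p)
    return d, var
```

For example, fusing two equally uncertain measurements lands halfway between them and halves the variance.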
And 3.1) aligning the depth image collected by the depth camera under the current frame with the color image collected by the monocular color camera. Because the field of view range difference exists between the color camera and the depth camera, only the overlapped part of the two fields of view has effective depth value, and therefore, an incomplete depth map is obtained after alignment.
And 3.2) generating a three-dimensional point cloud from the incomplete depth map according to a model of a pinhole camera in the depth camera. The pinhole camera model can be briefly described as follows, and the relationship between the spatial coordinates [ x, y, z ] of a spatial point and its pixel coordinates [ u, v, d ] in the image can be expressed as:
z=d/s
x=(u-cx)·z/fx
y=(v-cy)·z/fy
where d is the depth value of each pixel in the depth map, s is the scaling factor of the depth map, c_x and c_y are the abscissa and ordinate of the principal point, and f_x and f_y are the focal-length components in the abscissa and ordinate directions.
The pixel coordinates of each pixel are converted into corresponding space coordinates by using the formula, and the conversion from the depth map to the three-dimensional point cloud can be completed.
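The conversion above can be sketched as follows; the scaling factor s = 1000 (millimetres to metres) and the convention that zero depth marks an invalid pixel are our assumptions, not specified by the patent:

```python
import numpy as np

def depth_to_cloud(depth, fx, fy, cx, cy, s=1000.0):
    """Back-project a depth map to a 3-D point cloud with the pinhole
    model: z = d/s, x = (u - cx) z / fx, y = (v - cy) z / fy."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth / s
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]  # drop invalid (zero-depth) pixels
```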
3.3) registering the point cloud generated in the monocular algorithm and the point cloud generated by the incomplete depth map by using a point-to-point iterative nearest point (ICP) algorithm to obtain a rotation matrix R and a translation matrix T between the point cloud generated in the monocular algorithm and the point cloud generated in the incomplete depth map.
And 3.4) converting the point cloud obtained by the monocular algorithm to a coordinate system of the point cloud generated by the incomplete depth map according to the obtained rotation matrix and translation matrix, and splicing the point cloud and the point cloud into a large point cloud.
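Steps 3.3)-3.4) can be sketched as a generic point-to-point ICP with the SVD-based (Kabsch) closed-form alignment; this is an illustrative implementation with brute-force nearest-neighbour search, not the patent's exact one (a k-d tree would normally replace the brute-force step):

```python
import numpy as np

def best_rigid(src, dst):
    """Least-squares rigid transform (R, t) mapping paired src -> dst
    (Kabsch algorithm via SVD of the cross-covariance)."""
    cs, cd = src.mean(0), dst.mean(0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T          # guard against reflections
    return R, cd - R @ cs

def icp(src, dst, iters=50):
    """Point-to-point ICP: alternate nearest-neighbour matching and
    Kabsch alignment, accumulating the composed (R, t)."""
    R, t = np.eye(3), np.zeros(3)
    cur = src.copy()
    for _ in range(iters):
        idx = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(-1).argmin(1)
        R_i, t_i = best_rigid(cur, dst[idx])
        cur = cur @ R_i.T + t_i
        R, t = R_i @ R, R_i @ t + t_i
    return R, t  # maps src into dst's coordinate system
```

The returned R, t play the role of the rotation matrix and translation used in step 3.4) to carry the monocular point cloud into the depth-map cloud's frame.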
3.5) For the depth-value-invalid regions in the depth map formed by the non-overlapping fields of view, the spatial position of each pixel in the invalid region is calculated. If the space point corresponding to the pixel exactly coincides with a point of the large point cloud, the z coordinate of that point, i.e. its depth, is directly assigned to the pixel as the depth value. If the space point corresponding to the pixel coincides with no point of the point cloud, the average distance between the space point and its neighbouring points in the large point cloud is calculated, and if this value is greater than a certain threshold it is taken as the depth value of the pixel.
3.6) detecting whether pixel points without valid values exist in the depth map. If the pixel points without effective values still exist, the depth value filling is carried out on the depth map by using the combined bilateral filter, and each pixel point in the depth map is ensured to have a depth value.
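A minimal sketch of a joint (cross) bilateral fill as used in step 3.6), under two assumptions of ours: zero marks an invalid depth pixel, and a registered gray image guides the range weight:

```python
import numpy as np

def joint_bilateral_fill(depth, gray, radius=3, sigma_s=2.0, sigma_r=0.1):
    """Fill zero (invalid) depth pixels with a weighted average of valid
    neighbours; weights combine spatial distance with gray-level
    similarity taken from the registered intensity image."""
    out = depth.copy()
    h, w = depth.shape
    for y, x in zip(*np.nonzero(depth == 0)):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        x0, x1 = max(0, x - radius), min(w, x + radius + 1)
        patch = depth[y0:y1, x0:x1]
        valid = patch > 0
        if not valid.any():
            continue  # no valid neighbour in the window; leave the hole
        gy, gx = np.mgrid[y0:y1, x0:x1]
        w_s = np.exp(-((gy - y) ** 2 + (gx - x) ** 2) / (2 * sigma_s ** 2))
        w_r = np.exp(-((gray[y0:y1, x0:x1] - gray[y, x]) ** 2)
                     / (2 * sigma_r ** 2))
        wgt = w_s * w_r * valid
        out[y, x] = (wgt * patch).sum() / wgt.sum()
    return out
```

Because the range weight comes from the gray image, filled depths do not bleed across intensity edges, which is the point of using a *joint* bilateral filter here.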
3.7) The extended intrinsic image decomposition model function, with the normal vector of each pixel point as the variable, is used as the illumination model function of each pixel point. The extended intrinsic image decomposition model used is:

I(p) = ρ(p)·S(n(p)) + β(p)

where I(p) is the observed intensity at pixel p, ρ(p) its albedo, S(n(p)) the shading as a function of the surface normal n(p), and β(p) a local illumination difference term.
3.8) Calculate shading information for each pixel in the image. The shading function is expressed in matrix form as a linear polynomial of the zero- and first-order spherical-harmonic coefficients and the point-cloud surface normal vector, that is:

S(n) = lᵀ·[1, n_x, n_y, n_z]ᵀ

where l = [l_0, l_1, l_2, l_3]ᵀ is the vector of spherical-harmonic coefficients.
Firstly, the normal vector of each point is calculated; then the parameter vector l of the shading function is solved through an objective that minimizes the difference between the shading function and the real illumination intensity, so that a shading value is determined for each pixel.
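The shading model of steps 3.7)-3.8) can be sketched as follows; fitting the coefficient vector l by ordinary least squares against the observed intensities is our simplification of the minimization described above:

```python
import numpy as np

def sh_basis(normals):
    """Zero- and first-order spherical-harmonic basis [1, nx, ny, nz]
    for each (unnormalized) surface normal."""
    n = normals / np.linalg.norm(normals, axis=-1, keepdims=True)
    return np.concatenate([np.ones(n.shape[:-1] + (1,)), n], axis=-1)

def fit_shading(normals, intensity):
    """Solve min_l || B l - I ||^2 for the 4-vector of SH coefficients."""
    B = sh_basis(normals)
    l, *_ = np.linalg.lstsq(B, intensity, rcond=None)
    return l

def shading(normals, l):
    """Evaluate S(n) = l^T [1, n] for each normal."""
    return sh_basis(normals) @ l
```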
3.9) Calculate the albedo for each pixel in the image. Since the shading function considers only distant light sources and ambient light, it is a preliminary prediction of the illumination, and a different albedo therefore needs to be introduced for each pixel in order to account for the effects caused by specular reflection, shadows and near light sources.
The minimization objective function is constructed as:

E(ρ) = Σ_p ( ρ(p)·S(p) − I(p) )² + λ_ρ Σ_p Σ_{k∈N} ( ρ(p) − ρ_k )²

where ρ is the albedo of each pixel, I is the illumination intensity of each pixel, N is a neighbourhood of the pixel being operated on in the full-image iteration operation, ρ_k is the albedo of a pixel point in the neighbourhood, and λ_ρ is a weighting parameter.
3.10) Calculate, for each pixel in the image, the illumination difference value caused by the local illumination difference.
The minimization objective function is constructed as:

E(β) = Σ_p ( ρ(p)·S(p) + β(p) − I(p) )² + λ_β Σ_p Σ_{k∈N} ( β(p) − β_k )²

where β is the illumination difference value of each pixel point, β_k is the value at a pixel point in a neighbourhood of the pixel being operated on in the full-image iteration operation, and λ_β is a weighting parameter.
3.11) constructing an objective function between the light and shade model and the actually measured illumination intensity, and minimizing the objective function by using an improved depth enhancement acceleration algorithm, thereby obtaining an optimized depth map.
The normal of the point on the point cloud corresponding to each pixel is first represented through the gradient of the depth map: back-projecting the depth map D(u, v) to space points X(u, v), the normal is

n(u, v) = (∂X/∂u × ∂X/∂v) / ‖∂X/∂u × ∂X/∂v‖

Then, an objective function for depth optimization is established as follows:

E(D) = Σ_p ( I(p) − ρ(p)·S(n(D, p)) − β(p) )² + λ_1 Σ_p ( D(p) − D_0(p) )² + λ_2 Σ_p ‖ΔD(p)‖²

where D_0 is the depth map before optimization, ΔD is the Laplacian of the depth map, and λ_1, λ_2 are weighting parameters balancing fidelity to the measured depth and smoothness of the result.
Then, the depth is optimized iteratively using the depth-enhancement acceleration algorithm:
① Input the initial depth map, the spherical-harmonic coefficient vector l, the vectorized albedo vector ρ, and the vector of illumination difference values β.
② While the value of the depth-optimization objective function keeps decreasing, execute steps ③ to ⑤ in a loop.
⑤ Update z_k so that f(z_k) becomes smaller.
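The patent text does not spell out the inner steps ③ and ④ of its depth-enhancement acceleration algorithm; as a purely illustrative stand-in, step ⑤'s requirement that each update makes f(z_k) smaller can be met with a Nesterov-style accelerated descent (our choice of scheme, not the patent's):

```python
import numpy as np

def accelerated_descent(grad, z0, step=0.1, iters=200):
    """Nesterov/FISTA-style momentum descent on a smooth objective:
    gradient step at an extrapolated point y, then momentum update."""
    z, y, t = z0.copy(), z0.copy(), 1.0
    for _ in range(iters):
        z_new = y - step * grad(y)
        t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2
        y = z_new + ((t - 1) / t_new) * (z_new - z)  # momentum extrapolation
        z, t = z_new, t_new
    return z

# toy objective f(z) = ||z - 3||^2, so grad f = 2 (z - 3)
z_star = accelerated_descent(lambda z: 2 * (z - 3.0), np.zeros(4))
```

On the depth objective E(D) above, `grad` would be the gradient of E with respect to the vectorized depth map.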
After this step is finished, the method returns to steps 2.1) to 2.5) of the monocular algorithm, and the program keeps executing until it is detected that the user performs the operation of stopping the method.
The above-mentioned embodiments are merely preferred embodiments of the present invention, and the scope of the present invention is not limited thereto, so that variations based on the shape and principle of the present invention should be covered within the scope of the present invention.
Claims (2)
1. The Kinect depth reconstruction method based on camera motion and image shading is characterized by comprising the following steps of:
1) under the condition that the Kinect depth camera and the RGB camera are calibrated and aligned, uploading data collected by the Kinect to a computer through a third-party interface;
2) recovering a three-dimensional scene structure and the motion track of the Kinect RGB camera from an RGB video sequence to obtain a point cloud and the camera motion relation; the method comprises the following steps:
2.1) reading an RGB picture as a key frame when a system is initialized, binding a depth map to the key frame, traversing the depth map, and assigning a random value to each pixel position, wherein the depth map and the gray map have the same dimensionality;
2.2) for each newly read RGB picture, constructing the following cost function:

E(ζ_ji) = Σ_p ‖ r_p²(p, ζ_ji) / σ²_{r_p} ‖_δ

wherein ‖·‖_δ is the Huber norm, r_p represents the photometric error, and σ²_{r_p} represents the variance of the error;
the definition of the Huber norm is as follows:

‖r²‖_δ = r² / (2δ)   if |r| ≤ δ
‖r²‖_δ = |r| − δ/2   otherwise

wherein δ is the parameter of the Huber norm;
the error function r_p is defined as follows:

r_p(p, ζ_ji) = I_i(p) − I_j(w(p, D_i(p), ζ_ji))

wherein I_i(p) represents the gray value at the position of pixel p in the reference frame i, ζ_ji represents the Lie algebra of the rigid-body transformation that rotationally translates a three-dimensional point from the i coordinate frame to the j coordinate frame, D_i(p) denotes the depth value of the position corresponding to pixel p in the depth map of the reference frame, and w(p, D_i(p), ζ_ji) transforms the three-dimensional point corresponding to the position of pixel p in reference frame i into current frame j through the rotation-translation rigid-body transformation, the transformation formula being:

[X', Y', Z']ᵀ = R(ζ_ji)·[X, Y, Z]ᵀ + t(ζ_ji),  u' = f_x·X'/Z' + c_x,  v' = f_y·Y'/Z' + c_y

wherein X, Y and Z respectively represent the coordinates of the three-dimensional point in the XYZ directions of the camera coordinate system; u, v represent pixel coordinates; f_x, f_y respectively represent the focal lengths in the X and Y directions, and c_x, c_y the coordinates of the principal point;
wherein the variance of the error is computed as

σ²_{r_p} = 2σ_I² + (∂r_p/∂D_i(p))²·V_i(p)

in which σ_I² represents the variance of the picture gray values and V_i(p) represents the depth variance of pixel p in the reference-frame depth map;
2.3) solving the ζ_ji that minimizes the cost function of step 2.2) by the Gauss-Newton iteration method to obtain the rotation-translation relation between the reference frame and the current frame;
2.4) computing the gradient of all points on the reference-frame gray image and selecting the points whose gradient is greater than a threshold; then screening these points; traversing all points that meet the requirements, and searching for their corresponding points along the epipolar line of the current frame determined by epipolar geometry; computing the space coordinates of these points according to monocular-vision three-dimensional reconstruction geometry;
2.5) fusing the newly obtained depth value with the depth value in the depth image of the reference frame by using a Kalman filter;
3) and (3) reconstructing the image depth by combining the point cloud obtained in the step 2) and the camera motion relation and utilizing the light and shade condition information of the image.
2. The Kinect depth reconstruction method based on camera motion and image shading as claimed in claim 1, wherein the step 3) comprises the steps of:
3.1) aligning the depth image collected by the depth camera in the current frame with the color image collected by the monocular color camera; because the color camera and the depth camera have different fields of view, only the overlapping part of the two fields of view has valid depth values, so an incomplete depth map is obtained after alignment;
3.2) generating a three-dimensional point cloud from the incomplete depth map according to the pinhole camera model of the depth camera; the pinhole camera model is briefly described as follows: the relationship between the spatial coordinates [x, y, z] of a spatial point and its pixel coordinates [u, v, d] in the image is expressed as:
z=d/s
x=(u-cx)·z/fx
y=(v-cy)·z/fy
where d is the depth value of each pixel in the depth map, s is the scaling factor of the depth map, c_x and c_y are the abscissa and ordinate of the principal point, and f_x and f_y are the focal length components in the abscissa and ordinate directions;
converting the pixel coordinates of each pixel into the corresponding space coordinates using the above formula completes the conversion from the depth map to the three-dimensional point cloud;
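The depth-to-point-cloud conversion of step 3.2) can be sketched directly from the three formulas above (vectorized with NumPy; the scaling factor s = 1000 assumes millimetre depth units, an assumption not stated in the claim):

```python
import numpy as np

def depth_to_pointcloud(depth, fx, fy, cx, cy, s=1000.0):
    """Back-project a depth map with the pinhole model:
    z = d/s, x = (u-cx)*z/fx, y = (v-cy)*z/fy.
    Pixels with d == 0 (no valid depth) are dropped."""
    v, u = np.indices(depth.shape)          # row (v) and column (u) grids
    z = depth / s
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[depth.reshape(-1) > 0]       # keep only valid pixels

# Tiny 2x2 depth map in millimetres; one pixel (d = 0) is invalid.
depth = np.array([[1000.0, 0.0], [2000.0, 1500.0]])
cloud = depth_to_pointcloud(depth, fx=500.0, fy=500.0, cx=0.5, cy=0.5)
```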
3.3) registering the point cloud generated by the monocular algorithm and the point cloud generated from the incomplete depth map using the point-to-point iterative closest point (ICP) algorithm, obtaining the rotation matrix R and translation matrix T between the two point clouds;
3.4) converting the point cloud obtained by the monocular algorithm into the coordinate system of the point cloud generated from the incomplete depth map according to the obtained rotation and translation matrices, and splicing the two into one large point cloud;
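Steps 3.3) and 3.4) can be sketched as a minimal point-to-point ICP: alternate brute-force nearest-neighbour matching with the closed-form SVD (Procrustes) fit for R and T, then apply the accumulated transform (a simplified stand-in for a production ICP, verified here on synthetic data):

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Closed-form SVD fit of R, T minimising ||R src + T - dst||^2."""
    cs, cd = src.mean(0), dst.mean(0)
    H = (src - cs).T @ (dst - cd)          # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:               # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cd - R @ cs

def icp(src, dst, iters=10):
    """Point-to-point ICP with brute-force nearest-neighbour matching."""
    cur = src.copy()
    R_tot, T_tot = np.eye(3), np.zeros(3)
    for _ in range(iters):
        d2 = ((cur[:, None, :] - dst[None, :, :]) ** 2).sum(-1)
        matched = dst[d2.argmin(1)]        # closest dst point per src point
        R, T = best_rigid_transform(cur, matched)
        cur = cur @ R.T + T
        R_tot, T_tot = R @ R_tot, R @ T_tot + T   # accumulate transform
    return R_tot, T_tot

# Synthetic check: dst is src rotated by R_true and shifted by T_true.
rng = np.random.default_rng(0)
src = rng.normal(size=(40, 3))
ang = 0.05
R_true = np.array([[np.cos(ang), -np.sin(ang), 0.0],
                   [np.sin(ang),  np.cos(ang), 0.0],
                   [0.0, 0.0, 1.0]])
T_true = np.array([0.02, 0.0, 0.01])
dst = src @ R_true.T + T_true
R_est, T_est = icp(src, dst)
```

For the small displacement above the nearest-neighbour matches are correct from the first iteration, so the closed-form fit recovers R and T exactly.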
3.5) for the invalid-depth area of the depth map formed by the non-overlapping fields of view, calculating the spatial position of each pixel in the invalid area; if the spatial point corresponding to the pixel exactly coincides with a point in the large point cloud, directly assigning the z coordinate of that point, namely its depth value, to the pixel; if the spatial point corresponding to the pixel does not coincide with any point in the point cloud, calculating the average distance between the spatial point and its neighboring points in the large point cloud, and if this value is greater than a certain threshold, taking it as the depth value of the pixel;
3.6) detecting whether pixels without valid values remain in the depth map; if such pixels exist, filling their depth values using a joint bilateral filter, ensuring that every pixel in the depth map has a depth value;
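Step 3.6)'s hole filling can be sketched as a joint bilateral filter in which only valid depth neighbours vote, weighted by spatial distance and by gray-level similarity in the registered color image (window size and sigmas are illustrative choices):

```python
import numpy as np

def joint_bilateral_fill(depth, gray, radius=2, sigma_s=2.0, sigma_r=10.0):
    """Fill zero (invalid) depth pixels: weights combine spatial distance
    and gray-level similarity from the guidance image, so filled depths
    do not bleed across intensity edges."""
    out = depth.astype(float).copy()
    h, w = depth.shape
    for y in range(h):
        for x in range(w):
            if depth[y, x] > 0:
                continue                    # pixel already has valid depth
            wsum = dsum = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    if not (0 <= ny < h and 0 <= nx < w):
                        continue
                    if depth[ny, nx] <= 0:
                        continue            # only valid neighbours vote
                    ws = np.exp(-(dy * dy + dx * dx) / (2 * sigma_s ** 2))
                    wr = np.exp(-(float(gray[y, x]) - gray[ny, nx]) ** 2
                                / (2 * sigma_r ** 2))
                    wsum += ws * wr
                    dsum += ws * wr * depth[ny, nx]
            if wsum > 0:
                out[y, x] = dsum / wsum     # normalized weighted average
    return out

# A single hole surrounded by constant depth is filled with that depth.
depth = np.full((5, 5), 5.0)
depth[2, 2] = 0.0
gray = np.full((5, 5), 100.0)
out = joint_bilateral_fill(depth, gray)
```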
3.7) using the extended intrinsic image decomposition model function, which takes the normal vector of each pixel point as a variable, as the illumination model function of each pixel point; the extended intrinsic image decomposition model function used is:
3.8) calculating shading information for each pixel in the image; the shading information function is expressed by using a matrix form of a linear polynomial of zero-order and first-order spherical harmonic coefficients and a point cloud surface normal vector, namely:
firstly calculating the normal vector of each point, and then solving the parameter vector of the light and shade function through an objective function minimizing the difference between the light and shade function and the real illumination intensity, so as to determine a light and shade function value for each pixel point;
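Since the zero- and first-order spherical-harmonic shading model of steps 3.7)-3.8) is linear in the surface normal, the coefficient vector can be recovered by linear least squares against the observed intensities; the sketch below uses the common basis [1, n_x, n_y, n_z] as an assumption about the elided formula:

```python
import numpy as np

def sh_basis(normals):
    """Zero- and first-order spherical-harmonic basis [1, nx, ny, nz]."""
    return np.hstack([np.ones((len(normals), 1)), normals])

def fit_shading(normals, intensity):
    """Least-squares fit of the 4-vector of SH coefficients minimising
    || B c - I ||^2 over all pixels."""
    B = sh_basis(normals)
    c, *_ = np.linalg.lstsq(B, intensity, rcond=None)
    return c

# Synthetic check: intensities generated by a known coefficient vector.
rng = np.random.default_rng(1)
n = rng.normal(size=(100, 3))
n /= np.linalg.norm(n, axis=1, keepdims=True)   # unit surface normals
c_true = np.array([0.8, 0.3, -0.2, 0.5])
I = sh_basis(n) @ c_true
c_est = fit_shading(n, I)
```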
3.9) calculating the albedo for each pixel in the image; since the shading information function only considers distant light sources and ambient light, it is a preliminary prediction of the illumination; therefore a separate albedo must be introduced for each pixel in order to account for the effects caused by specular reflection, shadows and nearby light sources;
the minimization objective function is constructed as:
where ρ is the albedo of each pixel, I is the illumination intensity of each pixel, N is a neighborhood of the pixel being operated on in the full-image iteration, ρ_k is the albedo at a pixel in that neighborhood, and λ_ρ is a parameter; and, moreover,
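A hedged sketch of step 3.9)'s minimization, assuming a quadratic data term (shading times albedo versus measured intensity) plus a quadratic 4-neighbour smoothness term weighted by λ_ρ; the elided formula may differ, and Jacobi iteration here stands in for whatever solver the patent uses:

```python
import numpy as np

def solve_albedo(shading, intensity, lam=0.5, iters=200):
    """Minimise a quadratic stand-in for the albedo objective:
    sum (s*rho - I)^2 + lam * sum over 4-neighbours (rho - rho_k)^2,
    by pointwise Jacobi updates (each update solves dE/drho_p = 0
    with the neighbours held fixed)."""
    rho = np.ones_like(intensity, dtype=float)
    h, w = intensity.shape
    for _ in range(iters):
        new = rho.copy()
        for y in range(h):
            for x in range(w):
                nbrs = [rho[ny, nx]
                        for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1))
                        if 0 <= ny < h and 0 <= nx < w]
                s = shading[y, x]
                # dE/drho = 2 s (s rho - I) + 2 lam sum(rho - nbr) = 0
                new[y, x] = ((s * intensity[y, x] + lam * sum(nbrs))
                             / (s * s + lam * len(nbrs)))
        rho = new
    return rho

# Constant shading 2 and intensity 1 -> albedo converges to 0.5.
shading = np.full((3, 3), 2.0)
intensity = np.full((3, 3), 1.0)
rho = solve_albedo(shading, intensity)
```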
3.10) calculating a value of the illumination difference for each pixel in the image due to the local illumination difference;
the minimization objective function is constructed as:
wherein β is the illumination difference value of each pixel point, β_k is the value at a pixel in a neighborhood of the pixel being operated on in the full-image iteration, and λ_β is a parameter;
3.11) constructing an objective function between the light and shade model and the actually measured illumination intensity, and minimizing the objective function by using an improved depth enhancement acceleration algorithm so as to obtain an optimized depth map;
the normal of the point on the point cloud corresponding to each pixel is first represented in the form of the gradient of the depth map, i.e.:
wherein,
then, an objective function of depth optimization is established as follows:
wherein,
the depth is then iteratively optimized using a depth-enhanced acceleration algorithm, as follows:
① input the initial depth map, the spherical harmonic coefficient vector, the vectorized albedo vector ρ, and the vector of illumination difference values β;
② while the depth optimization objective function value keeps decreasing, execute steps ③ to ⑤ in a loop;
⑤ update z_k so that f(z_k) becomes smaller;
after this step is finished, steps 2.1) to 2.5) of the monocular algorithm are run again, and the program executes until it is detected that the user has performed the operation of stopping the method.
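The loop of steps ① to ⑤ reads like an accelerated first-order scheme; below is a generic sketch with momentum-extrapolated gradient steps on a small quadratic stand-in for the depth objective (step size, momentum, and the objective itself are illustrative assumptions, not the patent's algorithm):

```python
import numpy as np

def accelerated_descent(grad, z0, step, iters=500, momentum=0.9):
    """Accelerated gradient descent: repeatedly update z_k from a
    momentum-extrapolated point so that f(z_k) keeps decreasing
    (a stand-in for the depth-enhancement acceleration loop)."""
    z = np.asarray(z0, dtype=float)
    z_prev = z.copy()
    for _ in range(iters):
        y = z + momentum * (z - z_prev)   # extrapolation step
        z_prev = z
        z = y - step * grad(y)            # gradient step at the look-ahead
    return z

# Stand-in objective f(z) = 0.5 * ||A z - b||^2 on a tiny system.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 0.0])
grad = lambda z: A.T @ (A @ z - b)
z = accelerated_descent(grad, [0.0, 0.0], step=0.05)
```

The minimizer of this quadratic is A⁻¹b = [0.4, -0.2], which the accelerated iteration reaches well within the iteration budget.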
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2016105115439 | 2016-06-30 | ||
CN201610511543 | 2016-06-30 |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106780592A CN106780592A (en) | 2017-05-31 |
CN106780592B true CN106780592B (en) | 2020-05-22 |
Family
ID=58910978
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611061364.6A Active CN106780592B (en) | 2016-06-30 | 2016-11-28 | Kinect depth reconstruction method based on camera motion and image shading |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106780592B (en) |
Families Citing this family (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107169475B (en) * | 2017-06-19 | 2019-11-19 | 电子科技大学 | A kind of face three-dimensional point cloud optimized treatment method based on kinect camera |
CN109708636B (en) * | 2017-10-26 | 2021-05-14 | 广州极飞科技股份有限公司 | Navigation chart configuration method, obstacle avoidance method and device, terminal and unmanned aerial vehicle |
CN107845134B (en) * | 2017-11-10 | 2020-12-29 | 浙江大学 | Three-dimensional reconstruction method of single object based on color depth camera |
US10529086B2 (en) * | 2017-11-22 | 2020-01-07 | Futurewei Technologies, Inc. | Three-dimensional (3D) reconstructions of dynamic scenes using a reconfigurable hybrid imaging system |
CN108151728A (en) * | 2017-12-06 | 2018-06-12 | 华南理工大学 | A kind of half dense cognitive map creation method for binocular SLAM |
CN108053445A (en) * | 2017-12-08 | 2018-05-18 | 中南大学 | The RGB-D camera motion methods of estimation of Fusion Features |
CN108230381B (en) * | 2018-01-17 | 2020-05-19 | 华中科技大学 | Multi-view stereoscopic vision method combining space propagation and pixel level optimization |
CN108447060B (en) * | 2018-01-29 | 2021-07-09 | 上海数迹智能科技有限公司 | Foreground and background separation method based on RGB-D image and foreground and background separation device thereof |
CN108335325A (en) * | 2018-01-30 | 2018-07-27 | 上海数迹智能科技有限公司 | A kind of cube method for fast measuring based on depth camera data |
CN108470323B (en) * | 2018-03-13 | 2020-07-31 | 京东方科技集团股份有限公司 | Image splicing method, computer equipment and display device |
CN108830925B (en) * | 2018-05-08 | 2020-09-15 | 中德(珠海)人工智能研究院有限公司 | Three-dimensional digital modeling method based on spherical screen video stream |
CN109255819B (en) * | 2018-08-14 | 2020-10-13 | 清华大学 | Kinect calibration method and device based on plane mirror |
CN109579731B (en) * | 2018-11-28 | 2019-12-24 | 华中科技大学 | Method for performing three-dimensional surface topography measurement based on image fusion |
CN109657580B (en) * | 2018-12-07 | 2023-06-16 | 南京高美吉交通科技有限公司 | Urban rail transit gate traffic control method |
CN109727282A (en) * | 2018-12-27 | 2019-05-07 | 南京埃克里得视觉技术有限公司 | A kind of Scale invariant depth map mapping method of 3-D image |
CN109872355B (en) * | 2019-01-25 | 2022-12-02 | 合肥哈工仞极智能科技有限公司 | Shortest distance acquisition method and device based on depth camera |
CN109949397A (en) * | 2019-03-29 | 2019-06-28 | 哈尔滨理工大学 | A kind of depth map reconstruction method of combination laser point and average drifting |
US10510155B1 (en) * | 2019-06-11 | 2019-12-17 | Mujin, Inc. | Method and processing system for updating a first image generated by a first camera based on a second image generated by a second camera |
CN110455815B (en) * | 2019-09-05 | 2023-03-24 | 西安多维机器视觉检测技术有限公司 | Method and system for detecting appearance defects of electronic components |
CN111223053A (en) * | 2019-11-18 | 2020-06-02 | 北京邮电大学 | Data enhancement method based on depth image |
CN111402392A (en) * | 2020-01-06 | 2020-07-10 | 香港光云科技有限公司 | Illumination model calculation method, material parameter processing method and material parameter processing device |
CN111400869B (en) * | 2020-02-25 | 2022-07-26 | 华南理工大学 | Reactor core neutron flux space-time evolution prediction method, device, medium and equipment |
CN113085896B (en) * | 2021-04-19 | 2022-10-04 | 暨南大学 | Auxiliary automatic driving system and method for modern rail cleaning vehicle |
CN114067206B (en) * | 2021-11-16 | 2024-04-26 | 哈尔滨理工大学 | Spherical fruit identification positioning method based on depth image |
CN114612534B (en) * | 2022-03-17 | 2023-04-07 | 南京航空航天大学 | Whole-machine point cloud registration method of multi-view aircraft based on spherical harmonic characteristics |
CN115375827B (en) * | 2022-07-21 | 2023-09-15 | 荣耀终端有限公司 | Illumination estimation method and electronic equipment |
CN115049813B (en) * | 2022-08-17 | 2022-11-01 | 南京航空航天大学 | Coarse registration method, device and system based on first-order spherical harmonics |
CN116758170B (en) * | 2023-08-15 | 2023-12-22 | 北京市农林科学院智能装备技术研究中心 | Multi-camera rapid calibration method for livestock and poultry phenotype 3D reconstruction and storage medium |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103400409A (en) * | 2013-08-27 | 2013-11-20 | 华中师范大学 | 3D (three-dimensional) visualization method for coverage range based on quick estimation of attitude of camera |
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103400409A (en) * | 2013-08-27 | 2013-11-20 | 华中师范大学 | 3D (three-dimensional) visualization method for coverage range based on quick estimation of attitude of camera |
Non-Patent Citations (1)
Title |
---|
"基于Kinect深度图像的三维重建";李务军等;《微型机与应用》;20160331;第35卷(第5期);第55-57页 * |
Also Published As
Publication number | Publication date |
---|---|
CN106780592A (en) | 2017-05-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106780592B (en) | Kinect depth reconstruction method based on camera motion and image shading | |
CN107767442B (en) | Foot type three-dimensional reconstruction and measurement method based on Kinect and binocular vision | |
Delaunoy et al. | Photometric bundle adjustment for dense multi-view 3d modeling | |
KR102647351B1 (en) | Modeling method and modeling apparatus using 3d point cloud | |
US9245344B2 (en) | Method and apparatus for acquiring geometry of specular object based on depth sensor | |
CN107240129A (en) | Object and indoor small scene based on RGB D camera datas recover and modeling method | |
González-Aguilera et al. | An automatic procedure for co-registration of terrestrial laser scanners and digital cameras | |
US11568601B2 (en) | Real-time hand modeling and tracking using convolution models | |
Matsuki et al. | Codemapping: Real-time dense mapping for sparse slam using compact scene representations | |
Pan et al. | Dense 3D reconstruction combining depth and RGB information | |
JP5484133B2 (en) | Method for estimating the 3D pose of a specular object | |
CN115345822A (en) | Automatic three-dimensional detection method for surface structure light of aviation complex part | |
TW200907826A (en) | System and method for locating a three-dimensional object using machine vision | |
Xu et al. | Survey of 3D modeling using depth cameras | |
Zhang et al. | A new high resolution depth map estimation system using stereo vision and kinect depth sensing | |
US10559085B2 (en) | Devices, systems, and methods for reconstructing the three-dimensional shapes of objects | |
Wang et al. | Plane-based optimization of geometry and texture for RGB-D reconstruction of indoor scenes | |
Wan et al. | A study in 3d-reconstruction using kinect sensor | |
Guo et al. | Patch-based uncalibrated photometric stereo under natural illumination | |
Sang et al. | High-quality rgb-d reconstruction via multi-view uncalibrated photometric stereo and gradient-sdf | |
Mi et al. | 3D reconstruction based on the depth image: A review | |
CN110032927A (en) | A kind of face identification method | |
Corsini et al. | Stereo light probe | |
JP6579659B2 (en) | Light source estimation apparatus and program | |
Cui et al. | ACLC: Automatic Calibration for non-repetitive scanning LiDAR-Camera system based on point cloud noise optimization |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||