Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a global optimization method of a depth map which filters and denoises the depth map and fills large holes. The method converts the left-view and right-view disparity data into the RGB camera viewing angle and makes full use of RGB image edge information, and it is simple and efficient.
In order to achieve the purpose, the invention adopts the specific scheme that: a global optimization method of a depth map comprises the following steps:
Step one: perform region filtering on the initial left-view disparity data and the initial right-view disparity data, respectively, based on a region growing method, removing the erroneous disparity of isolated block regions to obtain optimized left-view disparity data and optimized right-view disparity data. The specific process of removing block regions with erroneous disparity based on the region growing method is as follows:
S1: create two images, Buff and Dst, of the same size as the initial left-view and right-view disparity data and initialized to zero, wherein Buff records the pixels that have already been grown and Dst marks the image block regions that satisfy the removal condition;
S2: set a first threshold and a second threshold, wherein the first threshold is a disparity difference value and the second threshold is the area below which a block region is regarded as erroneous disparity;
S3: traverse the image; for each pixel that has not yet been grown, take the current point as a seed point and pass it into the region growing function;
S4: create two stacks, vectorGrowPoints and resultPoints, and push the seed point into both. Take the tail point out of vectorGrowPoints and examine its eight neighbors in the directions {-1, -1}, {0, -1}, {1, -1}, {1, 0}, {1, 1}, {0, 1}, {-1, 1}, {-1, 0}. For each neighboring pixel that has not yet been grown, compare its disparity value with that of the seed point; if the difference is smaller than the first threshold, the condition is satisfied: push the pixel into vectorGrowPoints and resultPoints, and mark it as grown in Buff. Repeat this process until vectorGrowPoints is empty. If the number of points in resultPoints is smaller than the second threshold, mark the region in Dst;
S5: repeat steps S3 and S4 until all pixels have been traversed, and remove the regions marked in Dst from the disparity data to obtain the optimized left-view disparity data and optimized right-view disparity data;
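A minimal Python sketch of steps S1 to S5, assuming the preferred thresholds mentioned later in the description (a disparity difference of 10 and an area of 60) and treating zero disparity as invalid; the function name and array layout are illustrative, not from the original:

```python
import numpy as np

# Eight growth directions from step S4.
DIRS = [(-1, -1), (0, -1), (1, -1), (1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0)]

def remove_small_regions(disp, diff_thresh=10, area_thresh=60):
    """Region-growing filter (S1-S5): zero out isolated blocks whose pixels
    match the seed disparity but whose total area is below area_thresh."""
    h, w = disp.shape
    buff = np.zeros((h, w), dtype=bool)   # Buff: records grown pixels
    dst = np.zeros((h, w), dtype=bool)    # Dst: marks regions to remove
    for y in range(h):
        for x in range(w):
            if buff[y, x] or disp[y, x] == 0:
                continue
            seed = disp[y, x]
            grow = [(x, y)]               # stack vectorGrowPoints
            result = [(x, y)]             # stack resultPoints
            buff[y, x] = True
            while grow:
                cx, cy = grow.pop()       # take the tail point
                for dx, dy in DIRS:
                    nx, ny = cx + dx, cy + dy
                    if (0 <= nx < w and 0 <= ny < h and not buff[ny, nx]
                            and disp[ny, nx] != 0
                            and abs(float(disp[ny, nx]) - float(seed)) < diff_thresh):
                        buff[ny, nx] = True
                        grow.append((nx, ny))
                        result.append((nx, ny))
            if len(result) < area_thresh:  # small block: mark it in Dst
                for px, py in result:
                    dst[py, px] = True
    out = disp.copy()
    out[dst] = 0                           # remove marked regions
    return out
```

Note that, as in step S4, every neighbor is compared against the seed's disparity rather than its immediate neighbor's, so a region cannot drift away from the seed value.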
Step two: calculate left-view confidence coefficient data from the left-view disparity data and the right-view disparity data optimized in step one. The left-view confidence coefficient data is calculated as O_p = e^(-|ld - rd|), wherein ld is the left-view disparity data optimized in step one, rd is the corresponding right-view disparity data optimized in step one, and O_p is the left-view confidence coefficient data;
Step three: calculate left-view depth data from the left-view disparity data optimized in step one and the camera parameters; then apply a perspective projection transformation to the left-view depth data and the left-view confidence coefficient data obtained in step two to obtain the initial depth data and confidence coefficient data under the viewing angle of the RGB camera;
Step four: calculate edge constraint coefficient data using the RGB image edge information, and then generate optimized depth data through a global optimization objective function from the edge constraint coefficient data and the initial depth data and confidence coefficient data under the viewing angle of the RGB camera obtained in step three.
Preferably, an acquisition device is used in the process of acquiring the depth image, and the acquisition device comprises two near-infrared cameras and an RGB camera.
Preferably, in step three, the specific calculation process of the initial depth data under the viewing angle of the RGB camera is as follows:
T1: traverse the image pixels; with the baseline and focal length of the left and right near-infrared cameras known, convert each disparity value into left-view depth data;
T2: from the left-view depth data and the intrinsic parameters of the left or right near-infrared camera, calculate the three-dimensional coordinates of the corresponding space point in that camera's coordinate system;
T3: from the relative pose between the left or right near-infrared camera coordinate system and the RGB camera coordinate system, together with the stereo rectification matrix between the left and right near-infrared cameras, calculate the three-dimensional coordinates of the corresponding space point in the RGB camera coordinate system;
T4: from the intrinsic parameters of the RGB camera, calculate the projection and the depth value of the corresponding space point on the RGB image plane, obtaining the initial depth data under the viewing angle of the RGB camera.
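Steps T1 to T4 can be sketched as follows. This is a minimal NumPy sketch under assumed pinhole intrinsics; the function name, the rotation/translation pair (R, t) standing in for the relative pose and rectification, and the keep-the-nearest-point rule for pixels receiving multiple projections are illustrative assumptions, not details from the original:

```python
import numpy as np

def disparity_to_rgb_depth(disp, baseline, focal, K_nir, K_rgb, R, t, h_rgb, w_rgb):
    """T1-T4 sketch: disparity -> left-view depth -> 3D point in the NIR
    frame -> 3D point in the RGB frame -> depth on the RGB image plane."""
    h, w = disp.shape
    depth_rgb = np.zeros((h_rgb, w_rgb))
    fx, fy = K_nir[0, 0], K_nir[1, 1]
    cx, cy = K_nir[0, 2], K_nir[1, 2]
    for v in range(h):
        for u in range(w):
            d = disp[v, u]
            if d <= 0:                          # invalid disparity
                continue
            z = baseline * focal / d            # T1: depth from disparity
            X = np.array([(u - cx) * z / fx,    # T2: back-project to 3D
                          (v - cy) * z / fy,
                          z])
            Xr = R @ X + t                      # T3: into the RGB camera frame
            if Xr[2] <= 0:
                continue
            p = K_rgb @ Xr                      # T4: project onto RGB plane
            ur = int(round(p[0] / p[2]))
            vr = int(round(p[1] / p[2]))
            if 0 <= ur < w_rgb and 0 <= vr < h_rgb:
                # Keep the nearest point when several map to the same pixel.
                if depth_rgb[vr, ur] == 0 or Xr[2] < depth_rgb[vr, ur]:
                    depth_rgb[vr, ur] = Xr[2]
    return depth_rgb
```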
Preferably, the global optimization objective function adopted in step four is:

E(D) = Σ_p α_p (D_p − D̂_p)² + Σ_{(p,q)∈E} ω_qp (D_p − D_q)²

wherein D̂_p is the initial depth data of pixel point p on the image, D_p is the depth data to be solved, α_p is the confidence coefficient data of pixel point p under the viewing angle of the RGB camera, ω_qp is the edge constraint coefficient data, and q ranges over the four-neighborhood pixel points of p. The optimization ends when E(D) is minimal. Assuming the image has n pixel points, in order to minimize E(D), the right-hand side of the objective function is differentiated with respect to each D_p and set equal to zero, giving n equations; rearranging yields a linear system AX = B, wherein A is an n × n coefficient matrix related to α_p and ω_qp, B is an n × 1 constant matrix related to α_p and D̂_p, and X is the column vector of depth data to be solved, [D_1, D_2, …, D_n]^T. The optimized depth data is obtained through iterative calculation.
Preferably, for any pixel point p, the p-th row of AX = B is:

(α_p + Σ_{(p,q)∈E} (ω_pq + ω_qp)) D_p − Σ_{(p,q)∈E} (ω_pq + ω_qp) D_q = α_p D̂_p

from which the coefficient matrix A and the constant matrix B are calculated.
Preferably, the specific calculation process of the coefficient matrix A and the constant matrix B is as follows:
(1) first, take the gradient of the RGB image: ∇I_qp = I_q − I_p is the gray-scale difference between pixel points q and p; then the edge constraint coefficient is ω_qp = e^(−|∇I_qp|/β), whose value lies in the range [0, 1], wherein β is a tuning parameter and β = 20;
(2) from α_p and ω_qp, calculate the coefficient matrix A, whose p-th row reads (α_p + Σ_{(p,q)∈E} (ω_pq + ω_qp)) D_p − Σ_{(p,q)∈E} (ω_pq + ω_qp) D_q; the row has 5 non-zero values, namely the elements corresponding to pixel point p and its four-neighborhood pixel points: the element corresponding to pixel point p is α_p + Σ_{(p,q)∈E} (ω_pq + ω_qp), and the element corresponding to each four-neighborhood pixel point q of p is −(ω_pq + ω_qp);
(3) from α_p and the initial depth data D̂_p, calculate the constant matrix B, whose p-th row is α_p D̂_p.
Preferably, the linear system is solved by a successive over-relaxation (SOR) iteration method to obtain the optimized depth data.
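A generic SOR iteration is sketched below for a small dense system; the patent applies the same per-row update to the sparse five-point system AX = B. The relaxation factor ω = 1.5 and the iteration count are illustrative choices, not values from the original:

```python
import numpy as np

def sor_solve(A, b, omega=1.5, iters=200):
    """Successive over-relaxation for AX = B. For 0 < omega < 2 this
    converges on symmetric positive-definite systems such as the
    confidence-plus-smoothness system assembled in the patent."""
    n = len(b)
    x = np.zeros(n)
    for _ in range(iters):
        for i in range(n):
            # Sum of off-diagonal terms using the freshest values of x.
            sigma = A[i] @ x - A[i, i] * x[i]
            x[i] = (1 - omega) * x[i] + omega * (b[i] - sigma) / A[i, i]
    return x
```

Because each row of the patent's A has only five non-zero entries, a production version would store A sparsely and visit only those five entries per update, which is what makes a real-time GPU implementation feasible.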
Advantageous effects:
(1) The invention provides a global optimization method of a depth map based on an acquisition device comprising two near-infrared (NIR) cameras and a visible-light (RGB) camera; the near-infrared cameras form a binocular stereo vision system that acquires the depth map in real time and registers it with the RGB image acquired by the visible-light camera. The method makes full use of the global information of the left-view and right-view disparity data and of the edge constraint of the color data to globally optimize the depth map, converting the left-view and right-view disparity data into the RGB camera viewing angle and utilizing RGB image edge information. When calculating the confidence coefficient data, an e^(−x) model is applied directly to the left-view and right-view disparity data, and experiments prove this simple and effective. Simple in the following sense: in existing methods, the confidence coefficient is determined by fitting a quadratic curve to the matching costs of three adjacent integer disparity values of a pixel, which requires recomputing the disparity matching cost, fitting the three matching-cost values quadratically, and determining α_p by judging the curve orientation; the present method is simpler by comparison. Effective in the following sense: the optimized depth map is smooth, edges are preserved, and large holes are filled well;
(2) The invention provides a global optimization method of a depth map in which a region growing method performs region filtering on the initial left-view disparity data and the initial right-view disparity data separately. Experiments prove that the method completes the marking after a single traversal of the image and effectively removes the erroneous disparity of small isolated regions whose disparity values are similar internally but obviously different from the surrounding disparity values.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to the flowchart of FIG. 1, the intrinsic and extrinsic parameters of all cameras of the invention are known, and the initial left-view disparity data and initial right-view disparity data are calculated by the prior art, which is not described herein again. A global optimization method of a depth map is used in the process of acquiring a depth image based on an acquisition device, wherein the acquisition device comprises two near-infrared cameras and an RGB camera, and the method comprises the following steps:
Step one: perform region filtering on the initial left-view disparity data and the initial right-view disparity data, respectively, based on a region growing method, removing the erroneous disparity of isolated block regions to obtain optimized left-view disparity data and optimized right-view disparity data. Generated disparity data is generally subjected to a left-right consistency check, which removes a large number of mismatched point disparities, but mismatched disparities forming small regions still remain. The method therefore first performs region filtering on the left-view and right-view disparity data separately and removes small isolated regions with similar disparity values, further improving disparity quality. The specific process of removing block regions with mismatched disparity based on the region growing method is as follows:
S1: create two images, Buff and Dst, of the same size as the initial left-view and right-view disparity data and initialized to zero, wherein Buff records the pixels that have already been grown and Dst marks the image block regions that satisfy the removal condition;
S2: set a first threshold and a second threshold, wherein the first threshold is a disparity difference value and the second threshold is the area below which a block region is regarded as erroneous disparity; preferably, the first threshold is 10 and the second threshold is 60;
S3: traverse the image; for each pixel that has not yet been grown, take the current point as a seed point and pass it into the region growing function;
S4: create two stacks, vectorGrowPoints and resultPoints, and push the seed point into both. Take the tail point out of vectorGrowPoints and examine its eight neighbors in the directions {-1, -1}, {0, -1}, {1, -1}, {1, 0}, {1, 1}, {0, 1}, {-1, 1}, {-1, 0}. For each neighboring pixel that has not yet been grown, compare its disparity value with that of the seed point; if the difference is smaller than the first threshold, the condition is satisfied: push the pixel into vectorGrowPoints and resultPoints, and mark it as grown in Buff. Repeat this process until vectorGrowPoints is empty. If the number of points in resultPoints is smaller than the second threshold, mark the region in Dst;
S5: repeat steps S3 and S4 until all pixels have been traversed, and remove the regions marked in Dst from the disparity data to obtain the optimized left-view disparity data and optimized right-view disparity data;
Step two: calculate left-view confidence coefficient data from the left-view disparity data and the right-view disparity data optimized in step one. The left-view confidence coefficient data is calculated as O_p = e^(-|ld - rd|), wherein ld is the left-view disparity data optimized in step one, rd is the corresponding right-view disparity data optimized in step one, and O_p is the left-view confidence coefficient data. Existing methods determine per-point disparity confidence by fitting a matching-cost curve, which is cumbersome to implement; the present method of calculating confidence coefficient data is simple and efficient. The left-view confidence coefficient data is decisive for the optimization effect, and the reliability of the O_p values is closely related to the accuracy of the disparity data: small blocks of erroneous disparity lead, after optimization, to large blocks of erroneous depth data in the corresponding region. The invention therefore removes block-shaped disparity errors with the region growing method to improve disparity quality;
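The confidence formula O_p = e^(-|ld - rd|) vectorizes directly. A minimal NumPy sketch, assuming rd has already been resampled to the left-view pixel grid so that ld and rd correspond pixel for pixel:

```python
import numpy as np

def confidence(ld, rd):
    """Left-view confidence O_p = exp(-|ld - rd|) per pixel.
    ld, rd: optimized left-view and corresponding right-view disparity maps
    (rd assumed warped into left-view coordinates)."""
    return np.exp(-np.abs(ld.astype(np.float64) - rd.astype(np.float64)))
```

Pixels where the two views agree get confidence 1, and the confidence decays exponentially as the left-right disparity disagreement grows.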
Step three: calculate left-view depth data from the left-view disparity data optimized in step one and the camera parameters; then apply a perspective projection transformation to the left-view depth data and the left-view confidence coefficient data obtained in step two to obtain the initial depth data and confidence coefficient data under the viewing angle of the RGB camera. The specific calculation process of the initial depth data under the viewing angle of the RGB camera is as follows:
T1: traverse the image pixels; with the baseline and focal length of the left and right near-infrared cameras known, convert each disparity value into left-view depth data;
T2: from the left-view depth data and the intrinsic parameters of the left or right near-infrared camera, calculate the three-dimensional coordinates of the corresponding space point in that camera's coordinate system;
T3: from the relative pose between the left or right near-infrared camera coordinate system and the RGB camera coordinate system, together with the stereo rectification matrix between the left and right near-infrared cameras, calculate the three-dimensional coordinates of the corresponding space point in the RGB camera coordinate system;
T4: from the intrinsic parameters of the RGB camera, calculate the projection and the depth value of the corresponding space point on the RGB image plane, obtaining the initial depth data under the viewing angle of the RGB camera;
Step four: calculate edge constraint coefficient data using the RGB image edge information, and then generate optimized depth data through a global optimization objective function from the edge constraint coefficient data and the initial depth data and confidence coefficient data under the viewing angle of the RGB camera obtained in step three. The adopted global optimization objective function is:

E(D) = Σ_p α_p (D_p − D̂_p)² + Σ_{(p,q)∈E} ω_qp (D_p − D_q)²

wherein D̂_p is the initial depth data of pixel point p on the image, D_p is the depth data to be solved, α_p is the confidence coefficient data of pixel point p under the viewing angle of the RGB camera, ω_qp is the edge constraint coefficient data, and q ranges over the four-neighborhood pixel points of p. The optimization ends when E(D) is minimal. Assuming the image has n pixel points, in order to minimize E(D), the right-hand side of the objective function is differentiated with respect to each D_p and set equal to zero, giving n equations; rearranging yields a linear system AX = B, wherein A is an n × n coefficient matrix related to α_p and ω_qp, B is an n × 1 constant matrix related to α_p and D̂_p, and X is the column vector of depth data to be solved, [D_1, D_2, …, D_n]^T. The optimized depth data is obtained through iterative calculation.

For any pixel point p, the p-th row of AX = B is:

(α_p + Σ_{(p,q)∈E} (ω_pq + ω_qp)) D_p − Σ_{(p,q)∈E} (ω_pq + ω_qp) D_q = α_p D̂_p

from which the coefficient matrix A and the constant matrix B are calculated.
After the initial depth data is acquired in step three, the coefficient matrix and the constant matrix are calculated as described below. For an image of megapixel resolution, the depth data volume reaches the millions and the coefficient matrix data volume is of squared order; to meet the requirement of real-time implementation on the GPU, the invention solves the linear system with the successive over-relaxation (SOR) iteration method to complete the depth data optimization, as shown in FIG. 2 and FIG. 3: FIG. 2 is a depth map with large holes before optimization, and FIG. 3 is the depth map optimized using the global optimization method of the invention. The specific calculation process of the coefficient matrix A and the constant matrix B is as follows:
(1) First, take the gradient of the RGB image: ∇I_qp = I_q − I_p is the gray-scale difference between pixel points q and p; then the edge constraint coefficient is ω_qp = e^(−|∇I_qp|/β), whose value lies in the range [0, 1], wherein β is a tuning parameter and β = 20. This step solves for ω_qp; its effect on the depth result is to keep depth edges from being overly smoothed;
(2) from α_p and ω_qp, calculate the coefficient matrix A, whose p-th row reads (α_p + Σ_{(p,q)∈E} (ω_pq + ω_qp)) D_p − Σ_{(p,q)∈E} (ω_pq + ω_qp) D_q; the row has 5 non-zero values, namely the elements corresponding to pixel point p and its four-neighborhood pixel points: the element corresponding to pixel point p is α_p + Σ_{(p,q)∈E} (ω_pq + ω_qp), and the element corresponding to each four-neighborhood pixel point q of p is −(ω_pq + ω_qp);
(3) from α_p and the initial depth data D̂_p, calculate the constant matrix B, whose p-th row is α_p D̂_p.
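The assembly of A and B from steps (1) to (3) can be sketched as follows. This is a dense NumPy sketch for clarity (a real implementation would store the five non-zero entries per row sparsely), and it assumes the concrete edge-weight form ω_qp = exp(−|I_q − I_p|/β), which is one plausible reading of step (1); the function name and symmetric-weight simplification are illustrative:

```python
import numpy as np

def build_system(alpha, depth0, gray, beta=20.0):
    """Assemble AX = B for an h x w grid. Row p:
    (alpha_p + sum(w_pq + w_qp)) D_p - sum(w_pq + w_qp) D_q = alpha_p * D0_p.
    Here w_qp = exp(-|I_q - I_p| / beta) is symmetric, so w_pq + w_qp = 2 w_qp."""
    h, w = alpha.shape
    n = h * w
    A = np.zeros((n, n))
    B = np.zeros(n)

    def idx(y, x):
        return y * w + x

    for y in range(h):
        for x in range(w):
            p = idx(y, x)
            A[p, p] = alpha[y, x]                  # confidence (data) term
            B[p] = alpha[y, x] * depth0[y, x]      # p-th row of B: alpha_p * D0_p
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):  # four-neighborhood
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    wq = np.exp(-abs(float(gray[ny, nx]) - float(gray[y, x])) / beta)
                    A[p, p] += 2.0 * wq            # + (w_pq + w_qp) on the diagonal
                    A[p, idx(ny, nx)] -= 2.0 * wq  # - (w_pq + w_qp) for neighbor q
    return A, B
```

Each interior row then has exactly the 5 non-zero values described in step (2), and the resulting system is symmetric positive definite, so SOR converges on it.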
(4) Solve the linear system by the SOR method to obtain the optimized depth data.
The invention provides a global optimization method of a depth map which performs global optimization on the initial depth of a scene and realizes real-time, high-precision depth acquisition. It mainly solves the problem that the calculated disparity data contains a large number of holes where scene texture is lacking or repetitive, such as hair, whose texture is uniform and which easily absorbs projected light and lacks features even when an active light source projects structured light. The method can be used in cases such as three-dimensional reconstruction and somatosensory interaction. In three-dimensional reconstruction, it provides high-quality depth data under each viewing angle for real-time high-precision reconstruction and can simplify subsequent optimization processing operations. In somatosensory interaction, a realistic picture is presented to the other party by establishing different interactor models.
It is further noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.