Global optimization method of depth map
Technical Field
The invention relates to the technical field of computer vision and image processing, in particular to a global optimization method of a depth map.
Background
Two images of a scene are acquired from different viewpoints, and the depth of the scene can be estimated from the positional offset of scene points between the two images. This positional offset corresponds to the parallax of an image pixel point, can be converted directly into scene depth, and is generally represented by a depth map. However, when a scene contains regions with missing or repeated texture, large holes may appear in the corresponding regions of the calculated depth map. On the one hand, existing methods enrich the texture by artificially modifying the scene (for example, pasting marker points or projecting light spots), but such operations are inconvenient and in some scenes infeasible or ineffective; on the other hand, methods that optimize the depth map directly tend to be complicated, to over-smooth the result, or to fail under practical conditions.
Disclosure of Invention
In order to overcome these defects of the prior art, the invention provides a global optimization method of a depth map, which filters and denoises the depth map and fills large holes; it converts the left- and right-view parallax data into the RGB camera view, makes full use of RGB image edge information, and is simple and efficient.
In order to achieve this purpose, the invention adopts the following specific scheme: a global optimization method of a depth map comprises the following steps:
Step one, respectively performing regional filtering on the initial left-view parallax data and the initial right-view parallax data based on a region growing method, removing the erroneous parallax of isolated block regions, and obtaining optimized left-view parallax data and optimized right-view parallax data; the specific process of removing block regions with erroneous parallax based on the region growing method is as follows:
S1, create two images Buff and Dst of the same size as the original parallax image, initialized to zero; Buff records the pixel points that have already been grown, and Dst marks the image block regions that satisfy the removal condition;
S2, set a first threshold and a second threshold; the first threshold is a parallax difference, and the second threshold is the area of a block region regarded as having erroneous parallax;
S3, traverse each pixel point that has not yet been grown, take the current point as a seed point, and pass it into the region growing function;
S4, create two stacks, vectorGrowPoints and vectorResultPoints, and push the seed point into both; take the last point out of vectorGrowPoints, then examine its eight neighbours in the directions {-1,-1}, {0,-1}, {1,-1}, {1,0}, {1,1}, {0,1}, {-1,1}, {-1,0}: for each neighbouring pixel that has not yet been grown, compare its parallax value with that of the seed point; if the difference is smaller than the first threshold, the condition is satisfied, and the pixel is pushed into vectorGrowPoints and vectorResultPoints and marked as grown in Buff; repeat this process until no point remains in vectorGrowPoints; if the number of points in vectorResultPoints is smaller than the second threshold, mark the region in Dst;
S5, repeat steps S3 and S4 over the whole image, then remove the regions marked in Dst from the parallax data to obtain the optimized left-view parallax data and optimized right-view parallax data.
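A minimal sketch of steps S1–S5, using the stack names from the text; the eight-connected growth order, the use of zero to mark invalid parallax, and all other details are illustrative assumptions:

```python
import numpy as np

def remove_small_regions(disp, diff_thresh=10.0, area_thresh=60, invalid=0):
    """Region-growing filter (S1-S5): grow eight-connected regions of similar
    parallax from each un-grown seed; regions smaller than area_thresh are
    marked in Dst and removed (set to `invalid`)."""
    h, w = disp.shape
    buff = np.zeros((h, w), dtype=bool)   # Buff: pixels already grown
    dst = np.zeros((h, w), dtype=bool)    # Dst: small regions to remove
    dirs = [(-1, -1), (0, -1), (1, -1), (1, 0),
            (1, 1), (0, 1), (-1, 1), (-1, 0)]
    for y in range(h):
        for x in range(w):
            if buff[y, x] or disp[y, x] == invalid:
                continue
            seed = float(disp[y, x])
            grow = [(x, y)]               # vectorGrowPoints
            result = [(x, y)]             # vectorResultPoints
            buff[y, x] = True
            while grow:
                cx, cy = grow.pop()       # take the last (tail) point
                for dx, dy in dirs:
                    nx, ny = cx + dx, cy + dy
                    if (0 <= nx < w and 0 <= ny < h and not buff[ny, nx]
                            and disp[ny, nx] != invalid
                            and abs(float(disp[ny, nx]) - seed) < diff_thresh):
                        buff[ny, nx] = True
                        grow.append((nx, ny))
                        result.append((nx, ny))
            if len(result) < area_thresh:
                for px, py in result:
                    dst[py, px] = True
    out = disp.copy()
    out[dst] = invalid
    return out
```

A 4-pixel blob whose parallax differs sharply from its surroundings is removed, while the large consistent region survives.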
Step two, calculate left-view confidence coefficient data from the left-view parallax data and right-view parallax data optimized in step one; the specific calculation is α_p = e^(−|ld − rd|), where ld is the left-view parallax data optimized in step one, rd is the corresponding right-view parallax data optimized in step one, and α_p is the left-view confidence coefficient data;
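The confidence of step two can be sketched as below; aligning rd to the left view (i.e. fetching the right-view parallax at the matched pixel) is assumed to have been done already:

```python
import numpy as np

def left_confidence(ld, rd):
    """Step-two confidence: alpha_p = exp(-|ld - rd|).
    ld: optimized left-view parallax; rd: the corresponding optimized
    right-view parallax, already warped/aligned to the left view."""
    return np.exp(-np.abs(ld.astype(np.float64) - rd.astype(np.float64)))
```

Consistent left/right parallax gives confidence 1; the confidence decays exponentially as the left-right disagreement grows.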
Step three, calculate left-view depth data from the left-view parallax data optimized in step one and the camera parameters; then perform perspective projection conversion of the left-view depth data and the left-view confidence coefficient data obtained in step two to obtain the initial depth data and confidence coefficient data under the RGB camera view;
Step four, calculate edge constraint coefficient data from the RGB image edge information, and then generate the optimized depth data from the edge constraint coefficient data and the initial depth data and confidence coefficient data under the RGB camera view obtained in step three, through a global optimization objective function.
Preferably, an acquisition device is used in the process of acquiring the depth image, and the acquisition device comprises two near-infrared cameras and an RGB camera.
Preferably, in step three, the specific calculation process of the initial depth data under the viewing angle of the RGB camera is as follows:
T1, traverse the image pixels and, with the baseline and focal length of the left and right near-infrared cameras known, convert each parallax value into a depth value;
T2, calculate the three-dimensional coordinates of the corresponding space point in the camera coordinate system from the depth value and the internal parameters of the left or right near-infrared camera;
T3, calculate the three-dimensional coordinates of the corresponding space point in the RGB camera coordinate system from the relative position relation between the left (or right) near-infrared camera coordinate system and the RGB camera coordinate system, and the stereo rectification matrix between the left and right near-infrared cameras; T4, calculate the projection of the space point on the RGB image plane and its depth value from the internal parameters of the RGB camera, obtaining the initial depth data under the RGB camera view.
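A sketch of T1–T4 under standard pinhole assumptions: K_nir and K_rgb are the intrinsic matrices, and (R, t) the left-NIR-to-RGB extrinsics; rectification is assumed to have been applied already, so the parallax-to-depth conversion is simply Z = f·b/d:

```python
import numpy as np

def disparity_to_rgb_depth(disp, f, baseline, K_nir, K_rgb, R, t):
    """T1: depth from disparity; T2: back-project with K_nir;
    T3: transform into the RGB camera frame with (R, t);
    T4: project with K_rgb.  Returns the depth map rendered at the RGB
    view, with zeros where no point projects (holes)."""
    h, w = disp.shape
    depth_rgb = np.zeros((h, w))
    fx, fy = K_nir[0, 0], K_nir[1, 1]
    cx, cy = K_nir[0, 2], K_nir[1, 2]
    for v in range(h):
        for u in range(w):
            d = disp[v, u]
            if d <= 0:                                 # invalid parallax
                continue
            Z = f * baseline / d                       # T1
            X = (u - cx) * Z / fx                      # T2
            Y = (v - cy) * Z / fy
            P = R @ np.array([X, Y, Z]) + t            # T3
            if P[2] <= 0:
                continue
            p = K_rgb @ P                              # T4
            ur = int(round(p[0] / p[2]))
            vr = int(round(p[1] / p[2]))
            if 0 <= ur < w and 0 <= vr < h:
                # keep the nearest surface if several points hit one pixel
                if depth_rgb[vr, ur] == 0 or P[2] < depth_rgb[vr, ur]:
                    depth_rgb[vr, ur] = P[2]
    return depth_rgb
```

With identity extrinsics and identical intrinsics the point re-projects to its own pixel, which makes the round trip easy to check.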
Preferably, the global optimization objective function adopted in step four is:
ε(D) = Σ_p [ α_p (D_p − D̂_p)² + Σ_{(p,q)∈E} ω_qp (D_p − D_q)² ]
where D̂_p is the initial depth data of pixel point p on the image, D_p is the depth data to be solved, α_p is the left-view confidence coefficient data of pixel point p, q is a four-neighbourhood pixel point of p, and ω_qp is the edge constraint coefficient data; the optimization is finished when ε(D) is minimal. Assuming the image has n pixel points, to minimize ε(D) the right-hand side of the objective function is differentiated with respect to each D_p and set equal to zero, giving n equations and hence a linear system AX = B, where A is an n × n coefficient matrix depending only on α_p and ω_qp, B is an n × 1 constant matrix depending only on α_p and D̂_p, and X is the column vector [D_1, D_2, …, D_n]^T of depth data to be solved; the optimized depth data are obtained by iterative calculation.
Preferably, for any pixel point p, the p-th row of AX = B is:
(α_p + Σ_{(p,q)∈E} (ω_pq + ω_qp)) D_p − Σ_{(p,q)∈E} (ω_pq + ω_qp) D_q = α_p D̂_p
from which the coefficient matrix A and the constant matrix B are calculated.
Preferably, the specific calculation process of the coefficient matrix a and the constant matrix B is as follows:
(1) First take the gradient of the RGB image: ∇I_qp is the gray-level difference between pixel points q and p, and then ω_qp = e^(−|∇I_qp| / β), whose value range is [0, 1], where β is a tuning parameter and β = 20;
(2) From α_p and ω_qp calculate the coefficient matrix A, where the p-th row of A is given by (α_p + Σ_{(p,q)∈E} (ω_pq + ω_qp)) D_p − Σ_{(p,q)∈E} (ω_pq + ω_qp) D_q; the row has 5 non-zero values, namely the elements corresponding to pixel point p and its four-neighbourhood pixel points: the element corresponding to pixel point p is α_p + Σ_{(p,q)∈E} (ω_pq + ω_qp), and the element corresponding to each four-neighbourhood pixel point q of p is −(ω_pq + ω_qp);
(3) From α_p and the initial depth value D̂_p calculate the constant matrix B, where the p-th row of B is α_p D̂_p.
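Rows (2) and (3) can be sketched as follows. The edge weight ω_qp = e^(−|I_q − I_p|/β) is a reconstruction (the exact formula is not fully legible in the text), and a dense matrix is used purely for clarity; a real implementation would use a sparse 5-diagonal matrix:

```python
import numpy as np

def build_system(alpha, d0, gray, beta=20.0):
    """Assemble the n x n linear system A X = B of step four (one row per
    pixel, 5 non-zeros per row).  alpha: confidence; d0: initial RGB-view
    depth; gray: RGB image converted to gray levels.  omega is symmetric
    here (it depends only on |I_q - I_p|), so omega_pq + omega_qp = 2*omega."""
    h, w = gray.shape
    n = h * w
    A = np.zeros((n, n))
    B = np.zeros(n)
    idx = lambda y, x: y * w + x
    for y in range(h):
        for x in range(w):
            p = idx(y, x)
            diag = alpha[y, x]
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):  # 4-neighbours
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    wqp = np.exp(-abs(float(gray[ny, nx]) - float(gray[y, x]))
                                 / beta)
                    diag += 2.0 * wqp            # omega_pq + omega_qp
                    A[p, idx(ny, nx)] -= 2.0 * wqp
            A[p, p] = diag
            B[p] = alpha[y, x] * d0[y, x]        # p-th row of B
    return A, B
```

A quick sanity check: with constant gray levels and a constant initial depth, the initial depth already satisfies the normal equations, i.e. A·d0 = B.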
Preferably, the linear system of equations is solved by the successive over-relaxation (SOR) iteration method to obtain the optimized depth data.
The invention has the following beneficial effects:
(1) The invention provides a global optimization method of a depth map based on an acquisition device comprising two near-infrared (NIR) cameras and one visible-light (RGB) camera; the near-infrared cameras form a binocular stereo vision system that acquires the depth map in real time and registers it with the RGB image acquired by the visible-light camera. The method makes full use of the global information of the left- and right-view parallax data and the edge constraint of the color data to globally optimize the depth map, converting the left- and right-view parallax data into the RGB camera view and exploiting the RGB image edge information. When calculating the confidence coefficient data, an e^(−x) function of the left- and right-view parallax data is used to determine the confidence directly, which experiments show to be simple and effective. In the prior art, the confidence is determined by fitting a quadratic curve to the matching costs of three adjacent integer parallax values of a pixel point; that method must recalculate the parallax matching cost, fit the three matching cost values of the pixel point, and determine α_p from the orientation of the curve, so the present method is simpler by comparison. The effect is that the optimized depth map is smooth, edges are preserved, and large holes are filled well;
(2) The invention performs regional filtering on the initial left-view parallax data and initial right-view parallax data with a region growing method; experiments show that the marking is completed after traversing the image only once, and that the method effectively removes the erroneous parallax of small isolated regions whose internal parallax values are similar but clearly different from the surrounding parallax.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a depth map of a head with large holes before optimization;
FIG. 3 is a depth map optimized by the global optimization method of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to the flowchart of FIG. 1, the internal and external parameters of all cameras are known, and the initial left-view parallax data and initial right-view parallax data are calculated by the prior art, which is not described again here. The global optimization method of a depth map is used in the process of acquiring a depth image with an acquisition device comprising two near-infrared cameras and one RGB camera, and comprises the following steps:
Step one, respectively perform regional filtering on the initial left-view parallax data and the initial right-view parallax data based on a region growing method, remove the erroneous parallax of isolated block regions, and obtain optimized left-view parallax data and optimized right-view parallax data. Generated parallax data are generally subjected to a left-right consistency check, which removes a large number of mismatched point parallaxes, but mismatched parallaxes forming small connected regions still remain. The method therefore first performs regional filtering on the left- and right-view parallax data separately, removing small isolated regions with similar parallax values and further improving the parallax quality. The specific process of removing the mismatched parallax of block regions based on the region growing method is as follows:
S1, create two images Buff and Dst of the same size as the original parallax image, initialized to zero; Buff records the pixel points that have already been grown, and Dst marks the image block regions that satisfy the removal condition;
S2, set a first threshold and a second threshold; the first threshold is a parallax difference, and the second threshold is the area of a block region regarded as having erroneous parallax; preferably, the first threshold is 10 and the second threshold is 60;
S3, traverse each pixel point that has not yet been grown, take the current point as a seed point, and pass it into the region growing function;
S4, create two stacks, vectorGrowPoints and vectorResultPoints, and push the seed point into both; take the last point out of vectorGrowPoints, then examine its eight neighbours in the directions {-1,-1}, {0,-1}, {1,-1}, {1,0}, {1,1}, {0,1}, {-1,1}, {-1,0}: for each neighbouring pixel that has not yet been grown, compare its parallax value with that of the seed point; if the difference is smaller than the first threshold, the condition is satisfied, and the pixel is pushed into vectorGrowPoints and vectorResultPoints and marked as grown in Buff; repeat this process until no point remains in vectorGrowPoints; if the number of points in vectorResultPoints is smaller than the second threshold, mark the region in Dst;
S5, repeat steps S3 and S4 over the whole image, then remove the regions marked in Dst from the parallax data to obtain the optimized left-view parallax data and optimized right-view parallax data.
Step two, calculate left-view confidence coefficient data from the left-view parallax data and right-view parallax data optimized in step one; the specific calculation is α_p = e^(−|ld − rd|), where ld is the left-view parallax data optimized in step one, rd is the corresponding right-view parallax data optimized in step one, and α_p is the left-view confidence coefficient data. The confidence coefficient data play a decisive role in the optimization effect. In the prior art, the point parallax confidence is determined by fitting a matching cost curve, whose implementation is complicated; the present method of calculating the confidence coefficient data is simple and efficient. The reliability of α_p is closely related to the accuracy of the parallax data: small blocks of erroneous parallax in the parallax data lead, after optimization, to large regions of erroneous depth data in the corresponding area, which is why the invention first removes block-shaped parallax errors with the region growing method to improve the parallax quality;
Step three, calculate left-view depth data from the left-view parallax data optimized in step one and the camera parameters; then perform perspective projection conversion of the left-view depth data and the left-view confidence coefficient data obtained in step two to obtain the initial depth data and confidence coefficient data under the RGB camera view. The specific calculation of the initial depth data under the RGB camera view is as follows:
T1, traverse the image pixels and, with the baseline and focal length of the left and right near-infrared cameras known, convert each parallax value into a depth value;
T2, calculate the three-dimensional coordinates of the corresponding space point in the camera coordinate system from the depth value and the internal parameters of the left or right near-infrared camera;
T3, calculate the three-dimensional coordinates of the corresponding space point in the RGB camera coordinate system from the relative position relation between the left (or right) near-infrared camera coordinate system and the RGB camera coordinate system, and the stereo rectification matrix between the left and right near-infrared cameras; T4, calculate the projection of the space point on the RGB image plane and its depth value from the internal parameters of the RGB camera, obtaining the initial depth data under the RGB camera view.
Step four, calculate edge constraint coefficient data from the RGB image edge information, and then generate the optimized depth data from the edge constraint coefficient data and the initial depth data and confidence coefficient data under the RGB camera view obtained in step three, through a global optimization objective function. The adopted global optimization objective function is:
ε(D) = Σ_p [ α_p (D_p − D̂_p)² + Σ_{(p,q)∈E} ω_qp (D_p − D_q)² ]
where D̂_p is the initial depth data of pixel point p on the image, D_p is the depth data to be solved, α_p is the left-view confidence coefficient data of pixel point p, q is a four-neighbourhood pixel point of p, and ω_qp is the edge constraint coefficient data; the optimization is finished when ε(D) is minimal. Assuming the image has n pixel points, to minimize ε(D) the right-hand side of the objective function is differentiated with respect to each D_p and set equal to zero, giving n equations and hence a linear system AX = B, where A is an n × n coefficient matrix depending only on α_p and ω_qp, B is an n × 1 constant matrix depending only on α_p and D̂_p, and X is the column vector [D_1, D_2, …, D_n]^T of depth data to be solved; the optimized depth data are obtained by iterative calculation.
For any pixel point p, the p-th row of AX = B is:
(α_p + Σ_{(p,q)∈E} (ω_pq + ω_qp)) D_p − Σ_{(p,q)∈E} (ω_pq + ω_qp) D_q = α_p D̂_p
from which the coefficient matrix A and the constant matrix B are calculated.
After the initial depth data are acquired in step three, the coefficient matrix and constant matrix are calculated as follows. For an image of megapixel resolution the amount of depth data reaches the millions, and the coefficient matrix data volume is of squared order; to meet the requirement of real-time GPU implementation, the invention solves the linear system with the successive over-relaxation (SOR) iteration method to complete the depth data optimization, as shown in FIG. 2 and FIG. 3: FIG. 2 is a depth map with large holes before optimization; FIG. 3 is a depth map optimized by the global optimization method of the invention. The specific calculation of the coefficient matrix A and constant matrix B is as follows:
(1) First take the gradient of the RGB image: ∇I_qp is the gray-level difference between pixel points q and p, and then ω_qp = e^(−|∇I_qp| / β), whose value range is [0, 1], where β is a tuning parameter and β = 20. This step solves ω_qp, whose effect on the depth result is to keep depth edges from being over-smoothed;
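The edge-preserving behaviour of ω_qp can be seen in a two-line sketch; the exponential form e^(−|∇I|/β) is a reconstruction of the not-fully-legible formula in the text:

```python
import numpy as np

def edge_weight(gq, gp, beta=20.0):
    """omega_qp = exp(-|I_q - I_p| / beta): close to 1 inside flat regions
    (strong smoothing between neighbours), close to 0 across strong RGB
    edges (neighbouring depths are decoupled, so edges are preserved)."""
    return np.exp(-abs(float(gq) - float(gp)) / beta)
```

Across a 200-gray-level edge the weight is e^(−10), i.e. essentially zero, while identical gray levels give weight 1.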
(2) From α_p and ω_qp calculate the coefficient matrix A, where the p-th row of A is given by (α_p + Σ_{(p,q)∈E} (ω_pq + ω_qp)) D_p − Σ_{(p,q)∈E} (ω_pq + ω_qp) D_q; the row has 5 non-zero values, namely the elements corresponding to pixel point p and its four-neighbourhood pixel points: the element corresponding to pixel point p is α_p + Σ_{(p,q)∈E} (ω_pq + ω_qp), and the element corresponding to each four-neighbourhood pixel point q of p is −(ω_pq + ω_qp);
(3) From α_p and the initial depth value D̂_p calculate the constant matrix B, where the p-th row of B is α_p D̂_p;
(4) Solve the linear system by the SOR method to obtain the optimized depth data.
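A minimal SOR solver for step (4); the relaxation factor, iteration count, and convergence tolerance are illustrative choices, not values from the text:

```python
import numpy as np

def sor_solve(A, B, omega=1.5, iters=200, tol=1e-10):
    """Successive over-relaxation for A X = B.  omega in (1, 2)
    over-relaxes the Gauss-Seidel update; the diagonally dominant system
    assembled in step four converges for such omega.  Dense A for clarity;
    a GPU implementation would use the 5-non-zero row structure directly."""
    n = len(B)
    x = np.zeros(n)
    for _ in range(iters):
        delta = 0.0
        for i in range(n):
            # sum of the off-diagonal terms of row i with current estimates
            sigma = A[i] @ x - A[i, i] * x[i]
            xi = (1 - omega) * x[i] + omega * (B[i] - sigma) / A[i, i]
            delta = max(delta, abs(xi - x[i]))
            x[i] = xi
        if delta < tol:
            break
    return x
```

On a small symmetric diagonally dominant system the iterate converges to the exact solution.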
The invention provides a global optimization method of a depth map which globally optimizes the initial depth of a scene and achieves real-time, high-precision depth acquisition. It mainly solves the problem that, when scene texture is missing or repeated, a large number of holes appear in the calculated parallax data; hair, for example, has little texture and, even when an active light source projects structured light, easily absorbs it and lacks features. The method can be used in cases such as three-dimensional reconstruction and somatosensory interaction. In three-dimensional reconstruction it provides high-quality depth data at each view for real-time high-precision reconstruction and can simplify subsequent optimization processing. In somatosensory interaction, a realistic picture is presented to the other party by building models of the different interacting users.
It is further noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.