Disclosure of Invention
The invention aims to provide a region-adaptive matching depth reconstruction method for light field data, which provides high-precision depth information for light-field-based digital refocusing and three-dimensional scene reconstruction.
In order to achieve the above object, the present invention provides a method for reconstructing the depth of light field data by region-adaptive matching, which comprises the following steps:
step 110, defining a distance measure function between pixel points in the central view and pixel points to be matched in a view to be matched;
step 120, selecting different matching windows for the pixel points to be matched in different regions, the regions comprising texture regions, smooth regions, and edge occlusion regions;
step 130, counting the number of correctly matched pixel points in the matching window as the distance measure of the window, adding a smoothing factor for the smooth regions; and
step 140, optimizing the matching disparity and calculating the scene depth.
Further, the distance measure function in step 110 is expressed as the following formula (2):
E_{i,j}(x, y, s) = ||L_{u0,v0}(x, y) − L_{ui,vj}(x + i·s, y + j·s)||  (2)
In formula (2), L_{u0,v0}(x, y) denotes the pixel value at the pixel coordinate (x, y) in the central view; L_{ui,vj}(x + i·s, y + j·s) denotes the pixel value at the pixel coordinate (x + i·s, y + j·s) in the view to be matched; the central view L_{u0,v0}(x, y) is the image formed at the central viewpoint (u0, v0); a view to be matched L_{ui,vj}(x, y) is the image formed at any remaining viewpoint (ui, vj) other than the central viewpoint (u0, v0); E_{i,j}(x, y, s) denotes the distance measure of the pixel coordinate (x, y) under the disparity s.
Further, step 120 specifically includes:
step 121, in the texture regions and smooth regions of the central view and the view to be matched, selecting a rectangular region centered on the pixel point to be matched as the matching window; and
step 122, in the edge occlusion regions of the central view and the view to be matched, selecting a window with the pixel point to be matched as a vertex as the matching window.
Further, in step 121, the size of the matching window is chosen as a rectangular region in which the gray values of the central point and its neighborhood differ markedly and which lies within the boundary of the occlusion region, as expressed by the following formula (3):
In formula (3), (s, t) denotes a neighborhood coordinate of the pixel coordinate (x, y) in the central view, f(s, t) is the neighborhood pixel value, # denotes the number of pixels, f(x, y) denotes the pixel value at the pixel coordinate (x, y), and N(x, y) denotes the rectangular region centered on the pixel coordinate (x, y).
Further, the matching window selected in step 121 is divided into 8 directions; the pixel points in the selected matching window lie in the same chrominance block, and the window in the matching direction in which the gray-value difference between the pixel point to be matched and the pixel points in the window is large, expressed by the following formula (4), is selected as the matching module:
In formula (4), l denotes the matching direction, W_l denotes the window in direction l with the pixel point to be matched as vertex, f(x, y) denotes the pixel value at the pixel coordinate (x, y) in the central view, and f(x′, y′) denotes the pixel value at a neighborhood coordinate (x′, y′) within W_l of the pixel coordinate (x, y) in the central view.
Further, step 130 specifically includes:
step 131, in the non-smooth regions of the central view, counting the number of correctly matched pixel points in the matching window and taking it as the distance measure function of the window; and
step 132, in the smooth regions of the central view, counting the number of correctly matched pixel points in the matching window together with a smoothing term and taking them as the distance measure function of the window.
Further, the distance measure CONF(x, y, s) of the non-smooth-region matching window in step 131 is expressed as the following formula (7):
CONF(x, y, s) = conf(x, y, s), (x, y) ∈ W_{R×R}  (7)
In formula (7), W_{R×R} denotes the matching window, a rectangular region of side length R centered on the pixel point (x, y) to be matched; CONF(x, y, s) denotes the distance measure of the pixel point (x, y) to be matched in the non-smooth-region matching window under disparity s; conf(x, y, s) denotes the distance measure of the pixel point (x, y) to be matched in the matching window under disparity s and is obtained from the following formulas (5) and (6):
In formulas (5) and (6), temp denotes the counted number of correctly matched pixel points in the matching window, E(x, y, s) denotes the distance measure of the pixel point (x, y) to be matched in the matching window under disparity s, W denotes the matching window, #[W] denotes the number of pixel points in the window, and conf(x, y, s) denotes the distance measure of the pixel point (x, y) to be matched in the matching window under disparity s.
Further, the distance measure CONF (x, y, s) of the smoothed region matching window in step 132 is expressed as the following formula (8):
CONF(x, y, s) = β·conf(x, y, s) + (β − 1)·α(x, y), (x, y) ∈ W_{R×R}  (8)
In formula (8), CONF(x, y, s) denotes the distance measure of the smooth-region matching window, α(x, y) denotes the smoothing term, i.e. the difference between the mean disparity of the correctly matched pixel points and the current disparity value in the matching process, and β is the weight coefficient of the smoothing term.
Further, step 140 specifically includes:
step 141, solving the window distance measure functions of formula (7) and formula (8) to obtain a preliminary disparity map by the following formula (9):
s(x, y) = argmax_s(CONF(x, y, s))  (9)
step 142, optimizing a smooth area by adopting a TV model; and
step 143, calculating the depth map Z from the optimized disparity map s(x, y) of formula (9).
With the method provided by the invention, high-precision depth reconstruction can be achieved under the four-dimensional light field theory.
Detailed Description
The invention is described in detail below with reference to the figures and examples.
As shown in fig. 1, the region-adaptive matching method for reconstructing the depth of light field data provided in this embodiment comprises the following steps:
and step 110, defining a distance measure function between the central view and the pixel points to be matched in the view to be matched.
And step 120, selecting different matching windows for the pixel points to be matched in different areas.
Step 130, counting the number of correctly matched pixel points of the matching window as a distance measurement value of the window, and adding a smoothing factor to the smooth area.
Step 140, the matching disparity is optimized and the scene depth is calculated.
In one embodiment, step 110 specifically includes:
Considering that the edge of the lens is prone to aberration, the viewpoint (u0, v0) at the lens center (hereinafter simply the central viewpoint (u0, v0)) is generally selected, and the image formed there is taken as the central view L_{u0,v0}(x, y); the images formed at the remaining viewpoints (ui, vj) other than the central viewpoint (u0, v0) are taken as the views to be matched L_{ui,vj}(x, y).
The correspondence between two pixel points is described by the difference of their gray values. The correlation between the coordinate (xi, yj) of any pixel point in a view to be matched and the coordinate (x, y) of any pixel point in the central view can be measured by the distance measure function E(x, y) expressed in formula (1):
E(x, y) = ||L_{u0,v0}(x, y) − L_{ui,vj}(xi, yj)||  (1)
In formula (1), E(x, y) denotes the distance measure between the pixel coordinate (xi, yj) in the view to be matched and the pixel coordinate (x, y) in the central view; L_{u0,v0}(x, y) = L(u0, v0, x, y) denotes the pixel value at the pixel coordinate (x, y) in the central view; L_{ui,vj}(xi, yj) = L(ui, vj, xi, yj) denotes the pixel value at the pixel coordinate (xi, yj) in the view to be matched; || · || is the operator describing the distance, for which the L2 norm is chosen.
Epipolar constraint is then introduced: pixel points that correspond between the central view and a view to be matched lie on the same epipolar line. This yields the distance measure function of a pixel point under any viewpoint with respect to the disparity s, expressed as formula (2):
E_{i,j}(x, y, s) = ||L_{u0,v0}(x, y) − L_{ui,vj}(x + i·s, y + j·s)||  (2)
In formula (2), E_{i,j}(x, y, s) denotes the distance measure of the pixel coordinate (x, y) under disparity s; L_{u0,v0}(x, y) denotes the pixel value at the pixel coordinate (x, y) in the central view; L_{ui,vj}(x + i·s, y + j·s) denotes the pixel value at the pixel coordinate (x + i·s, y + j·s) in the view to be matched; i = ui − u0 denotes the viewpoint offset in the u direction and j = vj − v0 the offset in the v direction, i.e. i = 0, j = 0 denotes the central viewpoint (u0, v0), and by analogy i = 1, j = 1 denotes the viewpoint (u1, v1).
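The per-pixel distance measure of formula (2) can be sketched as follows (a minimal illustration assuming grayscale or RGB numpy arrays and integer disparities; sub-pixel disparities would require the linear interpolation mentioned later):

```python
import numpy as np

def distance_measure(center_view, view_ij, i, j, s):
    """Per-pixel distance E_{i,j}(x, y, s) of formula (2): norm between the
    central view and the (i, j) view sampled at (x + i*s, y + j*s).
    Integer disparities only in this sketch; coordinates are clipped at the
    image border."""
    h, w = center_view.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    xs2 = np.clip(xs + i * s, 0, w - 1)   # shift x by i*s
    ys2 = np.clip(ys + j * s, 0, h - 1)   # shift y by j*s
    diff = center_view.astype(float) - view_ij[ys2, xs2].astype(float)
    # L2 norm over the color channels, giving one scalar per pixel
    return np.linalg.norm(np.atleast_3d(diff), axis=2)
```

Identical views under zero disparity give a zero measure everywhere, as expected from formula (2).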
In one embodiment, step 120 specifically includes:
step 121, in the texture regions and smooth regions of the central view and the view to be matched, selecting a rectangular region centered on the pixel point to be matched as the matching window; and
step 122, in the edge occlusion regions of the central view and the view to be matched, selecting a window with the pixel point to be matched as a vertex as the matching window.
In one embodiment, step 121 specifically includes:
In the texture regions and smooth regions of the central view, this embodiment selects a rectangular region centered on the pixel point to be matched as the matching window. The size of the matching window is chosen as a rectangular region in which the gray values of the central point and its neighborhood differ markedly, and the region must not cross the boundary of an occlusion region. Here, "smooth region" means a region without texture in which the pixel values are essentially uniform; "occlusion region" refers to a region whose texture differs between views.
The discrimination formula of the difference between the central point and the neighborhood gray value is shown as formula (3):
In formula (3), (s, t) denotes a neighborhood coordinate of the pixel coordinate (x, y) in the central view, f(s, t) is the neighborhood pixel value, f(x, y) denotes the pixel value at the pixel coordinate (x, y), and N(x, y) denotes the rectangular region centered on the pixel coordinate (x, y), whose size is determined by the value of Tn. # denotes the number of pixels counted, and Tn is the threshold judging whether the gray-value differences in the neighborhood meet the matching requirement. Tn is chosen by weighing algorithm efficiency against scene differences; experiments show that Tn = 0.65 generally gives good matching precision.
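Formula (3) itself is not reproduced in this text. The sketch below is one hypothetical reading of the criterion described: grow the rectangular window N(x, y) until the fraction of neighborhood pixels whose gray value differs from the center exceeds Tn, i.e. until the window contains enough gray-value variation to be discriminative (the growth strategy and the difference test are assumptions, not the source's formula):

```python
import numpy as np

def adaptive_window_radius(img, x, y, tn=0.65, max_radius=7):
    """Hypothetical window-size criterion around formula (3): grow the
    window centered on (x, y) until the fraction of neighborhood pixels
    differing from the center reaches the threshold Tn."""
    h, w = img.shape
    center = float(img[y, x])
    for r in range(1, max_radius + 1):
        y0, y1 = max(0, y - r), min(h, y + r + 1)
        x0, x1 = max(0, x - r), min(w, x + r + 1)
        patch = img[y0:y1, x0:x1].astype(float)
        # fraction of neighborhood pixels differing from the center value
        frac = np.mean(np.abs(patch - center) > 1e-6)
        if frac >= tn:        # enough variation: window is discriminative
            return r
    return max_radius
```

On a textureless patch the window grows to its maximum, while on a gradient the smallest window already satisfies the criterion.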
In one embodiment, step 122 specifically includes:
The matching window selected in step 121 is divided into 8 directions (as shown in fig. 4). In general, among the 8 directions there is at least one window with the pixel point to be matched as a vertex whose matching features, such as textures and gradients, also exist in the other views to be matched. The window in the matching direction in which the pixel points lie in the same chrominance block and the gray-value difference between the pixel point to be matched and the pixel points in the window (with the pixel point to be matched as vertex) is large is selected as the matching module, expressed by the following formula (4):
In formula (4), l denotes the matching direction, W_l denotes the window in direction l with the pixel point to be matched as vertex (the light gray, nearly white area of fig. 4), (x′, y′) denotes a neighborhood coordinate within W_l of the pixel coordinate (x, y) in the central view, f(x, y) denotes the pixel value at the pixel coordinate (x, y) in the central view, and f(x′, y′) denotes the pixel value at the neighborhood coordinate (x′, y′) within W_l.
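The exact geometry of the 8 directional windows in fig. 4 is not reproduced here; the sketch below assumes four quadrant windows plus four edge-centered windows, each containing the pixel to be matched as a corner or edge midpoint (layout and window size are assumptions):

```python
import numpy as np

def vertex_windows(img, x, y, size=5):
    """Hypothetical layout of the 8 directional windows W_l of step 122:
    eight size x size windows that all touch (x, y) as a corner or edge
    midpoint, one per direction l = 0..7; windows falling outside the
    image are skipped."""
    h, w = img.shape[:2]
    n = size - 1
    # offsets of each window's top-left corner relative to (x, y)
    corners = [(0, 0), (-n, 0), (0, -n), (-n, -n),   # four quadrants
               (-n // 2, 0), (-n // 2, -n),          # right/left strips
               (0, -n // 2), (-n, -n // 2)]          # bottom/top strips
    wins = []
    for dy, dx in corners:
        y0, x0 = y + dy, x + dx
        if 0 <= y0 and y0 + size <= h and 0 <= x0 and x0 + size <= w:
            wins.append(img[y0:y0 + size, x0:x0 + size])
    return wins
```

For an interior pixel all 8 candidate windows exist; formula (4) would then score them and keep the most discriminative one.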
In the above embodiments, the chrominance block refers to a chrominance block obtained by converting an RGB image of a scene into a YCbCr image, where a CbCr channel of the scene is a chrominance channel, and performing cluster segmentation by using multiple cbcrs.
In one embodiment, step 130 specifically includes:
and 131, counting the number of correct matching pixel points in the matching window in a non-smooth area in the central view as a distance measure function of the window. The non-smooth area comprises a shielding area and a texture area, and the distance measure of the window is the ratio of the number of correctly matched pixel points in the window to the total number of pixel points in the window.
Step 132, counting the number of correct matching pixel points in the matching window and the smoothing term in the smoothing area in the central view, and using the number and the smoothing term as a distance measure function of the window. The distance measure of the window is the ratio of the number of correctly matched pixel points in the window to the total number of pixel points in the window plus a smoothing term.
In one embodiment, step 131 is specifically as follows:
In the matching process, a region W of the central view is selected as the matching window, and in the view to be matched at viewpoint (ui, vj) the region corresponding to W is selected. The distance measure between pixel points at corresponding positions in the central-view matching window and the window in the selected view to be matched is calculated with formula (2), and a pixel point is considered correctly matched when its distance measure is smaller than a threshold ζ. The threshold ζ is determined according to the error of the linear interpolation used for exact matching when the disparity value is not an integer multiple of the pixel distance.
Counting the number of correctly matched pixel points in the matching window as a distance measurement value of the matching window, calculating the distance measurement value of the matching window by adopting the following formula (5), and normalizing the following formula (5) to obtain the following formula (6):
In formulas (5) and (6), (x, y) denotes the coordinate of the pixel point to be matched in the matching window, temp denotes the counted number of correctly matched pixel points in the matching window, E(x, y, s) denotes the distance measure of the pixel point (x, y) to be matched in the matching window under disparity s, W denotes the matching window, #[W] denotes the number of pixel points in the window, and conf(x, y, s) denotes the distance measure of the pixel point (x, y) to be matched in the matching window under disparity s.
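Formulas (5) and (6) together reduce to counting the pixels of the window whose distance measure falls below the threshold ζ and normalizing by the window size; a minimal sketch:

```python
import numpy as np

def window_confidence(E_win, zeta):
    """Formulas (5)-(6): count the correctly matched pixels in the window
    (distance measure below the threshold zeta) and normalise by the
    number of pixels in the window, giving conf(x, y, s) in [0, 1]."""
    temp = np.count_nonzero(E_win < zeta)   # formula (5): correct matches
    return temp / E_win.size                # formula (6): normalisation
```

`E_win` here is the per-pixel distance measure of formula (2) restricted to the matching window W.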
The distance measure CONF(x, y, s) of a non-smooth-region matching window is expressed by the following formula (7):
CONF(x, y, s) = conf(x, y, s), (x, y) ∈ W_{R×R}  (7)
In formula (7), W_{R×R} denotes the matching window, a rectangular region of side length R centered on the pixel point (x, y) to be matched in the matching window; CONF(x, y, s) denotes the distance measure of the pixel point (x, y) to be matched in the non-smooth-region matching window under disparity s; conf(x, y, s) denotes the distance measure of the pixel point (x, y) to be matched in the matching window under disparity s, calculated by formula (5) and formula (6).
In one embodiment, step 132 specifically includes:
To address mismatching in smooth regions, the difference between the mean disparity of the disparity-map pixel points corresponding to the correctly matched pixel points in the matching window and the current disparity value is introduced as a smoothing term α(x, y); the distance measure CONF(x, y, s) of the smooth-region matching window is then expressed as the following formula (8):
CONF(x, y, s) = β·conf(x, y, s) + (β − 1)·α(x, y), (x, y) ∈ W_{R×R}  (8)
In formula (8), CONF(x, y, s) denotes the distance measure of the smooth-region matching window. β is the weight coefficient of the smoothing term and depends on the scene: if the scene contains many smooth regions, β is best close to 0; if the smooth regions are few, β is best close to 1. β is therefore set in the range between 0 and 1, preferably 0.2.
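Formula (8) is a one-line combination; the sketch below follows the source's (β − 1) weighting of the smoothing term literally:

```python
def smooth_region_confidence(conf, alpha, beta=0.2):
    """Formula (8): distance measure of a smooth-region matching window.
    conf is the normalised match count of formula (6), alpha the smoothing
    term (mean disparity of correct matches minus the current disparity),
    beta the scene-dependent weight, 0.2 by default as in the text.
    Note: the source writes the weight of alpha as (beta - 1)."""
    return beta * conf + (beta - 1) * alpha
```

With α = 0 the measure reduces to β·conf, so a smooth pixel whose window matches perfectly scores β.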
In one embodiment, step 140 specifically includes:
and step 141, solving the window distance measure function to obtain a preliminary disparity map.
And 142, optimizing the smooth region by adopting a full variation TV model.
Step 143, calculate the depth map.
In one embodiment, step 141 specifically includes:
according to the formula (7) and the formula (8), solving the optimization problem to obtain a preliminary disparity map s (x, y) by using the following formula (9):
s(x, y) = argmax_s(CONF(x, y, s))  (9)
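Formula (9) is a per-pixel winner-take-all over the candidate disparities; assuming the CONF values are stacked into a volume, it can be sketched as:

```python
import numpy as np

def winner_take_all(conf_volume, disparities):
    """Formula (9): pick, per pixel, the disparity s maximising
    CONF(x, y, s).  conf_volume has shape (n_disparities, H, W) and
    disparities lists the candidate s value for each slice."""
    return np.asarray(disparities)[conf_volume.argmax(axis=0)]
```

The result is the preliminary disparity map s(x, y) that step 142 then refines.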
in one embodiment, step 142 specifically includes:
A TV model is adopted to optimize the smooth regions; it acts as a smoothing filter while preserving the edge structure well.
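The source does not specify which TV model or solver is used; one standard choice is gradient descent on the ROF energy |∇u| + (λ/2)(u − d)², sketched below with assumed parameter values:

```python
import numpy as np

def tv_smooth(d, lam=0.1, step=0.1, iters=50, eps=1e-6):
    """Sketch of TV regularisation for the disparity map (step 142):
    gradient descent on the ROF energy |grad u| + (lam/2)(u - d)^2.
    The exact model and solver of the source are not specified; eps
    regularises the gradient magnitude."""
    u = d.astype(float).copy()
    for _ in range(iters):
        ux = np.diff(u, axis=1, append=u[:, -1:])   # forward differences
        uy = np.diff(u, axis=0, append=u[-1:, :])
        mag = np.sqrt(ux ** 2 + uy ** 2 + eps)
        px, py = ux / mag, uy / mag                  # normalised gradient
        # divergence of (px, py) via backward differences
        div = (px - np.roll(px, 1, axis=1)) + (py - np.roll(py, 1, axis=0))
        u -= step * (-div + lam * (u - d))
    return u
```

A constant disparity map is a fixed point of the iteration, consistent with TV smoothing leaving flat regions untouched.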
In one embodiment, as shown in fig. 2, fig. 2 shows the relation between parallax and depth in a four-dimensional light field. In fig. 2, u1 and u2 are viewpoints on the (u, v) plane (with v = 0), B is the viewpoint distance, A1 is the image point corresponding to the object point A, and s is the disparity of the object point A between the viewpoints u1 and u2 on the (x, y) imaging plane.
Step 143 specifically includes: converting the disparity map s(x, y) optimized from formula (9) into the depth map Z by formula (10). In a single exposure of the light field camera, the distance F between the microlens array and the main lens, the depth Z0 of the focal plane, and the distance B between adjacent viewpoints are fixed values, so the depth Z can be calculated from the disparity s obtained by formula (9):
In formula (10), Z0 denotes the depth of the focal plane, F denotes the distance between the main lens and the imaging plane, B denotes the distance between adjacent viewpoints, Z denotes the depth of the spatial scene, and s has the same meaning as s(x, y) in the above formulas.
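Formula (10) itself is not reproduced in this text. Assuming the common light-field geometry in which the disparity s vanishes on the focal plane, i.e. s = F·B·(1/Z − 1/Z0), inverting for Z gives one plausible form (the sign convention may differ from the source's formula (10)):

```python
def disparity_to_depth(s, z0, f, b):
    """Hypothetical form of formula (10), assuming s = F*B*(1/Z - 1/Z0):
    Z = F*B*Z0 / (s*Z0 + F*B).  z0 is the focal-plane depth, f the
    lens-to-imaging-plane distance, b the adjacent-viewpoint distance."""
    return f * b * z0 / (s * z0 + f * b)
```

Under this convention zero disparity maps to the focal-plane depth Z0, and positive disparity maps to points nearer than the focal plane.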
Fig. 3a and 3b are schematic diagrams illustrating an occlusion region selection matching window, where fig. 3a is an occlusion region and fig. 3b is an occluded region. Fig. 5a to 5c show an example of disparity solution and corresponding error analysis performed according to the method provided by the present invention, where fig. 5a is a central view, fig. 5b is a reconstructed disparity map, and fig. 5c is a local mean square error map. Fig. 6a to 6d illustrate examples of depth reconstruction according to the method provided by the present invention, wherein fig. 6a and 6c are both central views, and fig. 6b and 6d are both accurate disparity maps.
Finally, it should be pointed out that: the above examples are only for illustrating the technical solutions of the present invention, and are not limited thereto. Those of ordinary skill in the art will understand that: modifications can be made to the technical solutions described in the foregoing embodiments, or some technical features may be equivalently replaced; such modifications or substitutions do not depart from the spirit and scope of the present invention, which is defined by the appended claims.