Disclosure of Invention
The invention aims to provide a region-adaptive matching depth reconstruction method for light field data, which provides high-precision depth information for light-field-based digital refocusing and three-dimensional scene reconstruction.
In order to achieve the above object, the present invention provides a method for reconstructing the depth of light field data by region-adaptive matching, which comprises the following steps:
step 110, defining a distance measure function between pixel points in the central view and pixel points to be matched in a view to be matched;
step 120, selecting different matching windows for the pixel points to be matched in different regions, the regions comprising texture regions, smooth regions, and edge occlusion regions;
step 130, counting the number of correctly matched pixel points in the matching window as the distance measure of the window, adding a smoothing factor for the smooth regions; and
step 140, optimizing the matching disparity and calculating the scene depth.
Further, the distance measure function in step 110 is expressed as the following formula (2):
E_{i,j}(x, y, s) = ||L_{u0,v0}(x, y) − L_{ui,vj}(x + i·s, y + j·s)||  (2)
In formula (2), L_{u0,v0}(x, y) denotes the pixel value at the pixel coordinate (x, y) in the central view; L_{ui,vj}(x + i·s, y + j·s) denotes the pixel value at the pixel coordinate (x + i·s, y + j·s) in the view to be matched; the central view L_{u0,v0}(x, y) is the image formed at the central viewpoint (u0, v0); a view to be matched L_{ui,vj}(x, y) is the image formed at any remaining viewpoint (ui, vj) other than the central viewpoint (u0, v0); E_{i,j}(x, y, s) denotes the distance measure of the pixel coordinate (x, y) under the disparity s.
Further, step 120 specifically includes:
step 121, in the texture regions and smooth regions of the central view and the view to be matched, selecting a rectangular region centered on the pixel point to be matched as the matching window; and
step 122, in the edge occlusion regions of the central view and the view to be matched, selecting a window with the pixel point to be matched as a vertex as the matching window.
Further, in step 121, the size of the matching window is chosen as a rectangular region in which the gray values of the central point and its neighborhood differ markedly and which lies within the boundary of the occlusion region, as expressed by the following formula (3):
In formula (3), (s, t) denotes a neighborhood coordinate of the pixel coordinate (x, y) in the central view, f(s, t) is the neighborhood pixel value, # denotes the number of pixels, f(x, y) denotes the pixel value at the pixel coordinate (x, y), and N(x, y) denotes the rectangular region centered on the pixel coordinate (x, y).
Further, the matching window selected in step 121 is divided into 8 directions; the pixel points in the selected matching window lie in the same chrominance block, and the window in the matching direction in which the gray-value difference between the pixel point to be matched and the pixel points in the window is large, expressed by the following formula (4), is selected as the matching module:
In formula (4), l denotes the matching direction, W_l denotes the window in direction l with the pixel point to be matched as vertex, f(x, y) denotes the pixel value at the pixel coordinate (x, y) in the central view, and f(x′, y′) denotes the pixel value at a neighborhood coordinate (x′, y′) within W_l of the pixel coordinate (x, y) in the central view.
Further, step 130 specifically includes:
step 131, in the non-smooth regions of the central view, counting the number of correctly matched pixel points in the matching window and taking it as the distance measure function of the window; and
step 132, in the smooth regions of the central view, counting the number of correctly matched pixel points in the matching window together with a smoothing term and taking them as the distance measure function of the window.
Further, the distance measure CONF(x, y, s) of the non-smooth-region matching window in step 131 is expressed as the following formula (7):
CONF(x, y, s) = conf(x, y, s), (x, y) ∈ W_{R×R}  (7)
In formula (7), W_{R×R} denotes the matching window, a rectangular region of side length R centered on the pixel point (x, y) to be matched; CONF(x, y, s) denotes the distance measure of the pixel point (x, y) to be matched in the non-smooth-region matching window under disparity s; conf(x, y, s) denotes the distance measure of the pixel point (x, y) to be matched in the matching window under disparity s and is obtained from the following formulas (5) and (6):
In formulas (5) and (6), temp denotes the counted number of correctly matched pixel points in the matching window, E(x, y, s) denotes the distance measure of the pixel point (x, y) to be matched in the matching window under disparity s, W denotes the matching window, #[W] denotes the number of pixel points in the window, and conf(x, y, s) denotes the distance measure of the pixel point (x, y) to be matched in the matching window under disparity s.
Further, the distance measure CONF (x, y, s) of the smoothed region matching window in step 132 is expressed as the following formula (8):
CONF(x, y, s) = β·conf(x, y, s) + (β − 1)·α(x, y), (x, y) ∈ W_{R×R}  (8)
In formula (8), CONF(x, y, s) denotes the distance measure of the smooth-region matching window, α(x, y) denotes the smoothing term, i.e. the difference between the mean disparity of the correctly matched pixel points and the current disparity value in the matching process, and β is the weight coefficient of the smoothing term.
Further, step 140 specifically includes:
step 141, solving the window distance measure functions of formula (7) and formula (8) to obtain a preliminary disparity map by the following formula (9):
s(x, y) = argmax_s(CONF(x, y, s))  (9)
step 142, optimizing a smooth area by adopting a TV model; and
step 143, calculating the depth map Z from the optimized disparity map s(x, y) of formula (9).
With the method provided by the invention, high-precision depth reconstruction can be achieved under the four-dimensional light field theory.
Detailed Description
The invention is described in detail below with reference to the figures and examples.
As shown in fig. 1, the region-adaptive matching method for reconstructing the depth of light field data provided in this embodiment comprises the following steps:
and step 110, defining a distance measure function between the central view and the pixel points to be matched in the view to be matched.
And step 120, selecting different matching windows for the pixel points to be matched in different areas.
Step 130, counting the number of correctly matched pixel points of the matching window as a distance measurement value of the window, and adding a smoothing factor to the smooth area.
Step 140, the matching disparity is optimized and the scene depth is calculated.
In one embodiment, step 110 specifically includes:
Considering that the edge of the lens is prone to aberration, the viewpoint (u0, v0) at the lens center (hereinafter simply the central viewpoint (u0, v0)) is generally selected, and the image formed there is taken as the central view L_{u0,v0}(x, y); the images formed at the remaining viewpoints (ui, vj) other than the central viewpoint (u0, v0) are taken as the views to be matched L_{ui,vj}(x, y).
The correspondence between two pixel points is described by the difference of their gray values. The correlation between the coordinate (xi, yj) of any pixel point in a view to be matched and the coordinate (x, y) of any pixel point in the central view can be measured by the distance measure function E(x, y) expressed in formula (1):
E(x, y) = ||L_{u0,v0}(x, y) − L_{ui,vj}(xi, yj)||  (1)
In formula (1), E(x, y) denotes the distance measure between the pixel coordinate (xi, yj) in the view to be matched and the pixel coordinate (x, y) in the central view; L_{u0,v0}(x, y) = L(u0, v0, x, y) denotes the pixel value at the pixel coordinate (x, y) in the central view; L_{ui,vj}(xi, yj) = L(ui, vj, xi, yj) denotes the pixel value at the pixel coordinate (xi, yj) in the view to be matched; || · || is the operator describing the distance, for which the L2 norm is chosen.
Epipolar constraint is then introduced: pixel points that correspond between the central view and a view to be matched lie on the same epipolar line. This yields the distance measure function of a pixel point under any viewpoint with respect to the disparity s, expressed as formula (2):
E_{i,j}(x, y, s) = ||L_{u0,v0}(x, y) − L_{ui,vj}(x + i·s, y + j·s)||  (2)
In formula (2), E_{i,j}(x, y, s) denotes the distance measure of the pixel coordinate (x, y) under disparity s; L_{u0,v0}(x, y) denotes the pixel value at the pixel coordinate (x, y) in the central view; L_{ui,vj}(x + i·s, y + j·s) denotes the pixel value at the pixel coordinate (x + i·s, y + j·s) in the view to be matched; i = ui − u0 denotes the viewpoint offset in the u direction and j = vj − v0 the offset in the v direction, i.e. i = 0, j = 0 denotes the central viewpoint (u0, v0), and by analogy i = 1, j = 1 denotes the viewpoint (u1, v1).
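The per-pixel distance measure of formula (2) can be sketched as follows (a minimal illustration assuming grayscale or RGB numpy arrays and integer disparities; sub-pixel disparities would require the linear interpolation mentioned later):

```python
import numpy as np

def distance_measure(center_view, view_ij, i, j, s):
    """Per-pixel distance E_{i,j}(x, y, s) of formula (2): norm between the
    central view and the (i, j) view sampled at (x + i*s, y + j*s).
    Integer disparities only in this sketch; coordinates are clipped at the
    image border."""
    h, w = center_view.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    xs2 = np.clip(xs + i * s, 0, w - 1)   # shift x by i*s
    ys2 = np.clip(ys + j * s, 0, h - 1)   # shift y by j*s
    diff = center_view.astype(float) - view_ij[ys2, xs2].astype(float)
    # L2 norm over the color channels, giving one scalar per pixel
    return np.linalg.norm(np.atleast_3d(diff), axis=2)
```

Identical views under zero disparity give a zero measure everywhere, as expected from formula (2).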
In one embodiment, step 120 specifically includes:
step 121, in the texture regions and smooth regions of the central view and the view to be matched, selecting a rectangular region centered on the pixel point to be matched as the matching window; and
step 122, in the edge occlusion regions of the central view and the view to be matched, selecting a window with the pixel point to be matched as a vertex as the matching window.
In one embodiment, step 121 specifically includes:
In the texture regions and smooth regions of the central view, this embodiment selects a rectangular region centered on the pixel point to be matched as the matching window. The size of the matching window is chosen as a rectangular region in which the gray values of the central point and its neighborhood differ markedly, and the region must not cross the boundary of an occlusion region. Here, "smooth region" means a region without texture in which the pixel values are essentially uniform; "occlusion region" refers to a region whose texture differs between views.
The discrimination formula of the difference between the central point and the neighborhood gray value is shown as formula (3):
In formula (3), (s, t) denotes a neighborhood coordinate of the pixel coordinate (x, y) in the central view, f(s, t) is the neighborhood pixel value, f(x, y) denotes the pixel value at the pixel coordinate (x, y), and N(x, y) denotes the rectangular region centered on the pixel coordinate (x, y), whose size is determined by the value of Tn. # denotes the number of pixels counted, and Tn is the threshold judging whether the gray-value differences in the neighborhood meet the matching requirement. Tn is chosen by weighing algorithm efficiency against scene differences; experiments show that Tn = 0.65 generally gives good matching precision.
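Formula (3) itself is not reproduced in this text. The sketch below is one hypothetical reading of the criterion described: grow the rectangular window N(x, y) until the fraction of neighborhood pixels whose gray value differs from the center exceeds Tn, i.e. until the window contains enough gray-value variation to be discriminative (the growth strategy and the difference test are assumptions, not the source's formula):

```python
import numpy as np

def adaptive_window_radius(img, x, y, tn=0.65, max_radius=7):
    """Hypothetical window-size criterion around formula (3): grow the
    window centered on (x, y) until the fraction of neighborhood pixels
    differing from the center reaches the threshold Tn."""
    h, w = img.shape
    center = float(img[y, x])
    for r in range(1, max_radius + 1):
        y0, y1 = max(0, y - r), min(h, y + r + 1)
        x0, x1 = max(0, x - r), min(w, x + r + 1)
        patch = img[y0:y1, x0:x1].astype(float)
        # fraction of neighborhood pixels differing from the center value
        frac = np.mean(np.abs(patch - center) > 1e-6)
        if frac >= tn:        # enough variation: window is discriminative
            return r
    return max_radius
```

On a textureless patch the window grows to its maximum, while on a gradient the smallest window already satisfies the criterion.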
In one embodiment, step 122 specifically includes:
The matching window selected in step 121 is divided into 8 directions (as shown in fig. 4). In general, among the 8 directions there is at least one window with the pixel point to be matched as a vertex whose matching features, such as textures and gradients, also exist in the other views to be matched. The window in the matching direction in which the pixel points lie in the same chrominance block and the gray-value difference between the pixel point to be matched and the pixel points in the window (with the pixel point to be matched as vertex) is large is selected as the matching module, expressed by the following formula (4):
In formula (4), l denotes the matching direction, W_l denotes the window in direction l with the pixel point to be matched as vertex (the light gray, nearly white area of fig. 4), (x′, y′) denotes a neighborhood coordinate within W_l of the pixel coordinate (x, y) in the central view, f(x, y) denotes the pixel value at the pixel coordinate (x, y) in the central view, and f(x′, y′) denotes the pixel value at the neighborhood coordinate (x′, y′) within W_l.
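The exact geometry of the 8 directional windows in fig. 4 is not reproduced here; the sketch below assumes four quadrant windows plus four edge-centered windows, each containing the pixel to be matched as a corner or edge midpoint (layout and window size are assumptions):

```python
import numpy as np

def vertex_windows(img, x, y, size=5):
    """Hypothetical layout of the 8 directional windows W_l of step 122:
    eight size x size windows that all touch (x, y) as a corner or edge
    midpoint, one per direction l = 0..7; windows falling outside the
    image are skipped."""
    h, w = img.shape[:2]
    n = size - 1
    # offsets of each window's top-left corner relative to (x, y)
    corners = [(0, 0), (-n, 0), (0, -n), (-n, -n),   # four quadrants
               (-n // 2, 0), (-n // 2, -n),          # right/left strips
               (0, -n // 2), (-n, -n // 2)]          # bottom/top strips
    wins = []
    for dy, dx in corners:
        y0, x0 = y + dy, x + dx
        if 0 <= y0 and y0 + size <= h and 0 <= x0 and x0 + size <= w:
            wins.append(img[y0:y0 + size, x0:x0 + size])
    return wins
```

For an interior pixel all 8 candidate windows exist; formula (4) would then score them and keep the most discriminative one.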
In the above embodiments, the chrominance block refers to a chrominance block obtained by converting an RGB image of a scene into a YCbCr image, where a CbCr channel of the scene is a chrominance channel, and performing cluster segmentation by using multiple cbcrs.
In one embodiment, step 130 specifically includes:
and 131, counting the number of correct matching pixel points in the matching window in a non-smooth area in the central view as a distance measure function of the window. The non-smooth area comprises a shielding area and a texture area, and the distance measure of the window is the ratio of the number of correctly matched pixel points in the window to the total number of pixel points in the window.
Step 132, counting the number of correct matching pixel points in the matching window and the smoothing term in the smoothing area in the central view, and using the number and the smoothing term as a distance measure function of the window. The distance measure of the window is the ratio of the number of correctly matched pixel points in the window to the total number of pixel points in the window plus a smoothing term.
In one embodiment, step 131 is specifically as follows:
In the matching process, a region W of the central view is selected as the matching window, and in the view to be matched at viewpoint (ui, vj) the region corresponding to W is selected. The distance measure between pixel points at corresponding positions in the central-view matching window and the window in the selected view to be matched is calculated with formula (2), and a pixel point is considered correctly matched when its distance measure is smaller than a threshold ζ. The threshold ζ is determined according to the error of the linear interpolation used for exact matching when the disparity value is not an integer multiple of the pixel distance.
Counting the number of correctly matched pixel points in the matching window as a distance measurement value of the matching window, calculating the distance measurement value of the matching window by adopting the following formula (5), and normalizing the following formula (5) to obtain the following formula (6):
In formulas (5) and (6), (x, y) denotes the coordinate of the pixel point to be matched in the matching window, temp denotes the counted number of correctly matched pixel points in the matching window, E(x, y, s) denotes the distance measure of the pixel point (x, y) to be matched in the matching window under disparity s, W denotes the matching window, #[W] denotes the number of pixel points in the window, and conf(x, y, s) denotes the distance measure of the pixel point (x, y) to be matched in the matching window under disparity s.
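Formulas (5) and (6) together reduce to counting the pixels of the window whose distance measure falls below the threshold ζ and normalizing by the window size; a minimal sketch:

```python
import numpy as np

def window_confidence(E_win, zeta):
    """Formulas (5)-(6): count the correctly matched pixels in the window
    (distance measure below the threshold zeta) and normalise by the
    number of pixels in the window, giving conf(x, y, s) in [0, 1]."""
    temp = np.count_nonzero(E_win < zeta)   # formula (5): correct matches
    return temp / E_win.size                # formula (6): normalisation
```

`E_win` here is the per-pixel distance measure of formula (2) restricted to the matching window W.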
The distance measure CONF(x, y, s) of a non-smooth-region matching window is expressed by the following formula (7):
CONF(x, y, s) = conf(x, y, s), (x, y) ∈ W_{R×R}  (7)
In formula (7), W_{R×R} denotes the matching window, a rectangular region of side length R centered on the pixel point (x, y) to be matched in the matching window; CONF(x, y, s) denotes the distance measure of the pixel point (x, y) to be matched in the non-smooth-region matching window under disparity s; conf(x, y, s) denotes the distance measure of the pixel point (x, y) to be matched in the matching window under disparity s, calculated by formula (5) and formula (6).
In one embodiment, step 132 specifically includes:
To address mismatching in smooth regions, the difference between the mean disparity of the disparity-map pixel points corresponding to the correctly matched pixel points in the matching window and the current disparity value is introduced as a smoothing term α(x, y); the distance measure CONF(x, y, s) of the smooth-region matching window is then expressed as the following formula (8):
CONF(x, y, s) = β·conf(x, y, s) + (β − 1)·α(x, y), (x, y) ∈ W_{R×R}  (8)
In formula (8), CONF(x, y, s) denotes the distance measure of the smooth-region matching window. β is the weight coefficient of the smoothing term and depends on the scene: if the scene contains many smooth regions, β is best close to 0; if the smooth regions are few, β is best close to 1. β is therefore set in the range between 0 and 1, preferably 0.2.
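Formula (8) is a one-line combination; the sketch below follows the source's (β − 1) weighting of the smoothing term literally:

```python
def smooth_region_confidence(conf, alpha, beta=0.2):
    """Formula (8): distance measure of a smooth-region matching window.
    conf is the normalised match count of formula (6), alpha the smoothing
    term (mean disparity of correct matches minus the current disparity),
    beta the scene-dependent weight, 0.2 by default as in the text.
    Note: the source writes the weight of alpha as (beta - 1)."""
    return beta * conf + (beta - 1) * alpha
```

With α = 0 the measure reduces to β·conf, so a smooth pixel whose window matches perfectly scores β.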
In one embodiment, step 140 specifically includes:
and step 141, solving the window distance measure function to obtain a preliminary disparity map.
And 142, optimizing the smooth region by adopting a full variation TV model.
Step 143, calculate the depth map.
In one embodiment, step 141 specifically includes:
according to the formula (7) and the formula (8), solving the optimization problem to obtain a preliminary disparity map s (x, y) by using the following formula (9):
s(x, y) = argmax_s(CONF(x, y, s))  (9)
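Formula (9) is a per-pixel winner-take-all over the candidate disparities; assuming the CONF values are stacked into a volume, it can be sketched as:

```python
import numpy as np

def winner_take_all(conf_volume, disparities):
    """Formula (9): pick, per pixel, the disparity s maximising
    CONF(x, y, s).  conf_volume has shape (n_disparities, H, W) and
    disparities lists the candidate s value for each slice."""
    return np.asarray(disparities)[conf_volume.argmax(axis=0)]
```

The result is the preliminary disparity map s(x, y) that step 142 then refines.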
in one embodiment, step 142 specifically includes:
A TV model is adopted to optimize the smooth regions; it acts as a smoothing filter while preserving the edge structure well.
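The source does not specify which TV model or solver is used; one standard choice is gradient descent on the ROF energy |∇u| + (λ/2)(u − d)², sketched below with assumed parameter values:

```python
import numpy as np

def tv_smooth(d, lam=0.1, step=0.1, iters=50, eps=1e-6):
    """Sketch of TV regularisation for the disparity map (step 142):
    gradient descent on the ROF energy |grad u| + (lam/2)(u - d)^2.
    The exact model and solver of the source are not specified; eps
    regularises the gradient magnitude."""
    u = d.astype(float).copy()
    for _ in range(iters):
        ux = np.diff(u, axis=1, append=u[:, -1:])   # forward differences
        uy = np.diff(u, axis=0, append=u[-1:, :])
        mag = np.sqrt(ux ** 2 + uy ** 2 + eps)
        px, py = ux / mag, uy / mag                  # normalised gradient
        # divergence of (px, py) via backward differences
        div = (px - np.roll(px, 1, axis=1)) + (py - np.roll(py, 1, axis=0))
        u -= step * (-div + lam * (u - d))
    return u
```

A constant disparity map is a fixed point of the iteration, consistent with TV smoothing leaving flat regions untouched.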
In one embodiment, as shown in fig. 2, fig. 2 shows the relation between parallax and depth in a four-dimensional light field. In fig. 2, u1 and u2 are viewpoints on the (u, v) plane (with v = 0), B is the viewpoint distance, A1 is the image point corresponding to the object point A, and s is the disparity of the object point A between the viewpoints u1 and u2 on the (x, y) imaging plane.
Step 143 specifically includes: converting the disparity map s(x, y) optimized from formula (9) into the depth map Z by formula (10). In a single exposure of the light field camera, the distance F between the microlens array and the main lens, the depth Z0 of the focal plane, and the distance B between adjacent viewpoints are fixed values, so the depth Z can be calculated from the disparity s obtained by formula (9):
In formula (10), Z0 denotes the depth of the focal plane, F denotes the distance between the main lens and the imaging plane, B denotes the distance between adjacent viewpoints, Z denotes the depth of the spatial scene, and s has the same meaning as s(x, y) in the above formulas.
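Formula (10) itself is not reproduced in this text. Assuming the common light-field geometry in which the disparity s vanishes on the focal plane, i.e. s = F·B·(1/Z − 1/Z0), inverting for Z gives one plausible form (the sign convention may differ from the source's formula (10)):

```python
def disparity_to_depth(s, z0, f, b):
    """Hypothetical form of formula (10), assuming s = F*B*(1/Z - 1/Z0):
    Z = F*B*Z0 / (s*Z0 + F*B).  z0 is the focal-plane depth, f the
    lens-to-imaging-plane distance, b the adjacent-viewpoint distance."""
    return f * b * z0 / (s * z0 + f * b)
```

Under this convention zero disparity maps to the focal-plane depth Z0, and positive disparity maps to points nearer than the focal plane.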
Fig. 3a and 3b are schematic diagrams illustrating an occlusion region selection matching window, where fig. 3a is an occlusion region and fig. 3b is an occluded region. Fig. 5a to 5c show an example of disparity solution and corresponding error analysis performed according to the method provided by the present invention, where fig. 5a is a central view, fig. 5b is a reconstructed disparity map, and fig. 5c is a local mean square error map. Fig. 6a to 6d illustrate examples of depth reconstruction according to the method provided by the present invention, wherein fig. 6a and 6c are both central views, and fig. 6b and 6d are both accurate disparity maps.
Finally, it should be pointed out that: the above examples are only for illustrating the technical solutions of the present invention, and are not limited thereto. Those of ordinary skill in the art will understand that: modifications can be made to the technical solutions described in the foregoing embodiments, or some technical features may be equivalently replaced; such modifications or substitutions do not depart from the spirit and scope of the present invention, which is defined by the appended claims.