CN109360235B - Hybrid depth estimation method based on light field data - Google Patents


Info

Publication number
CN109360235B
CN109360235B · CN201811151940.5A · CN201811151940A
Authority
CN
China
Prior art keywords
depth
image
pixel
parallax
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811151940.5A
Other languages
Chinese (zh)
Other versions
CN109360235A (en
Inventor
李贺建
黄建民
沙龙
张镇军
刘鹏程
冉广奎
王贺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
AVIC Shanghai Aeronautical Measurement Controlling Research Institute
Original Assignee
AVIC Shanghai Aeronautical Measurement Controlling Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by AVIC Shanghai Aeronautical Measurement Controlling Research Institute filed Critical AVIC Shanghai Aeronautical Measurement Controlling Research Institute
Priority to CN201811151940.5A priority Critical patent/CN109360235B/en
Publication of CN109360235A publication Critical patent/CN109360235A/en
Application granted granted Critical
Publication of CN109360235B publication Critical patent/CN109360235B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images

Abstract

The invention discloses a hybrid depth estimation method based on light field data. The method comprises the following steps: hybrid depth estimation based on parallax and blur detection is performed on the focus stack images, sub-images and full-focus image obtained from the data acquired by a light field camera; a fusion matching method performs parallax analysis on the sub-images, which differ in viewing angle; focus detection is performed on the simultaneously acquired focus stack images to obtain a depth map; and the depth map obtained from parallax and the depth map obtained from focusing are fused by a classification and adaptive mixing method, with the sub-image depth map up-sampled before fusion. The method can obtain scene depth information using only one light field camera, and the algorithm is simple and highly accurate.

Description

Hybrid depth estimation method based on light field data
Technical Field
The invention relates to the technical field of stereo image processing, and in particular to a hybrid depth estimation method based on light field data.
Background
Depth estimation is a key technology for three-dimensional reconstruction and a key intermediate step in three-dimensional display, machine vision, virtual reality, intelligent vision and related fields. Depth measurement initially relied on manual methods; more accurate laser measurement appeared later, and with the development of image processing technology, image-based depth estimation has become a research hotspot, driving progress in non-contact measurement and three-dimensional reconstruction. Image-based depth estimation requires only simple acquisition equipment and is easy to integrate; it extends the two-dimensional limitation of images shot by conventional cameras, gives mobile devices three-dimensional capability, and has become a key technology in applications such as stereoscopic television and virtual reality, attracting wide attention and research. However, image-based depth estimation algorithms still face difficulties such as high algorithmic complexity and accuracy that needs further improvement.
Existing depth estimation methods include active approaches such as laser ranging, TOF cameras and structured-light depth acquisition, passive multi-view image-based methods, and the more recent light-field-camera-based methods. Active methods acquire depth quickly and accurately, but the measurement equipment is complex; structured-light depth acquisition places high demands on the ambient illumination; TOF cameras measure reflected light and cannot return correct depth for objects with low reflectivity. Passive methods analyze visual cues: human depth perception integrates cues such as blur, parallax, motion and color, and blur and parallax are the cues most commonly used in computer vision.
A light field camera is a new type of camera that incorporates a microlens array and records both the two-dimensional spatial information and the angular information of light rays. Light field imaging can achieve a series of special imaging effects such as full-focus image synthesis, depth-of-field extension and digital refocusing. For blur cues, images of the same viewpoint focused at different depths are computed, a focus detection function evaluates and measures the degree of focus of each pixel in the image, and the in-focus pixels in the image are computed. Parallax-based depth estimation rests on the binocular parallax principle; it simplifies the video acquisition equipment and requires no additional hardware, but its algorithmic complexity is high and its accuracy still needs to be improved.
Disclosure of Invention
The invention aims to provide a low-complexity hybrid depth estimation method based on light field data that can accurately obtain scene depth information using only one light field camera.
The technical solution for realizing the purpose of the invention is as follows: a hybrid depth estimation method based on light field data comprises the following steps:
step 1, acquiring light field data: acquiring four-dimensional light field image data through a light field camera, and processing to obtain a scene focus stack image, a sub-image, a full focus image and a focus parameter, wherein the focus parameter comprises a white image calibration parameter and a sub-image calibration parameter;
step 2, parallax matching with the enhanced CT-SAD algorithm: selecting sub-images at horizontal positions to simulate horizontally placed cameras, and performing stereo matching of the left and right viewpoints, including enhanced CT transformation, SAD calculation, CT-SAD matching cost fusion, cost accumulation, parallax estimation, parallax post-processing and depth conversion, to obtain an initial-resolution depth image;
and 3, upsampling based on bilateral filtering: combining the full-focus image and the initial resolution depth image obtained in the parallax matching in the step 2, and performing an up-sampling process based on bilateral filtering to obtain a first depth image;
and 4, depth estimation based on focus detection: analyzing the focal stack image by adopting a gradient algorithm, obtaining a focusing pixel layer by adopting a two-dimensional gradient algorithm, and performing depth-of-field conversion by utilizing a focusing parameter to obtain the depth of the focusing pixel layer; repeatedly carrying out focusing detection until all focusing layers are detected to obtain a second depth map;
step 5, depth self-adaptive mixing: firstly, equalizing the depth grading of a first depth map obtained in a parallax matching mode and a second depth map obtained in a focusing detection mode, obtaining a substitution value according to a median approach principle, then performing cost fusion in a self-adaptive parameter mode, minimizing the cost to obtain a depth grading map, and performing depth conversion on the depth grading map by using grading equalization parameters to obtain a final depth map;
step 6, visualization of the depth map: and quantizing the final depth map obtained by the depth grading, and adjusting the depth display range to be in the range of 0-255.
Further, the enhanced CT transform in step 2 is specifically as follows:
The CT transformation process is formulated as follows:

L(u,v) = ⊗_{(m,n)∈M×N} ξ( l̄(u,v), l(u+m, v+n) )

R(u,v) = ⊗_{(m,n)∈M×N} ξ( r̄(u,v), r(u+m, v+n) )

ξ(p1, p2) = 0 if p1 ≤ p2, 1 if p1 > p2

where L, R are the transform results of the left and right images, the transform window size is M×N, and the variables m, n are integers that traverse the whole M×N window; r(u,v), l(u,v) are the right- and left-image pixel values at (u,v), respectively; ξ(·) is the comparison relation, computed as shown in the ξ(p1, p2) formula; p1, p2 are the original pixel values; l̄(u,v), r̄(u,v) are the pixel averages over the window area, (u,v) are the pixel coordinates; and ⊗ denotes that the obtained comparison values are concatenated in order.

The Census transform compares the center pixel with the surrounding pixels within the M×N window to obtain a corresponding array that represents the transformed center pixel, thus yielding two images R and L represented by Census transform values.
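As an illustration of the enhanced CT transform described above, the following Python sketch compares each pixel of the M×N window against the window mean (the reading of the "enhanced" variant suggested by the formulas) and keeps the ordered comparison bits; the window size, bit layout and helper names are illustrative choices, not taken from the patent.

```python
import numpy as np

def enhanced_census(img, win=9):
    """Enhanced CT (Census) transform: compare every pixel of an M x N window
    against the window mean and keep the ordered comparison bits."""
    h, w = img.shape
    r = win // 2
    pad = np.pad(img.astype(np.float32), r, mode='edge')
    bits = np.zeros((h, w, win * win), dtype=bool)
    for i in range(h):
        for j in range(w):
            patch = pad[i:i + win, j:j + win]
            # xi(mean, neighbor): 1 where the neighbor exceeds the window mean
            bits[i, j] = (patch > patch.mean()).ravel()
    return bits

def hamming_cost(bits_a, bits_b):
    """Per-pixel Hamming distance between two Census bit strings."""
    return np.count_nonzero(bits_a != bits_b, axis=-1)
```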
Further, the enhanced CT-SAD algorithm in step 2 specifically comprises the following steps:
The luminance-based SAD method is combined with the Census transform method. C_SAD denotes the SAD matching cost, obtained by computing luminance differences:

C_SAD(u,v,d) = Σ_{(i,j)∈L×K} | I_r(u+i, v+j) − I_l(u+i+d, v+j) |

where C_SAD(u,v,d) is the accumulated SAD matching cost of the pixel at (u,v) for disparity d, L×K is the accumulation window constant, and I_r, I_l are the pixel values of the right and left views corresponding to disparity d.

The Census-based matching cost C_CT is obtained by comparing the Hamming distance between the Census transform values of corresponding pixels at the same position in the right and left images; the two initial matching costs are then fused through the parameters (ρ_SAD, ρ_CT) to obtain the matching cost C_CT-SAD(p,d) of the pixel:

C_CT-SAD(p,d) = ρ_SAD·C_SAD(p,d) + ρ_CT·C_CT(p,d)

where C_CT-SAD is the matching cost at disparity d obtained from the sub-image parallax principle and p denotes the pixel at (u,v); ρ_SAD, ρ_CT are the cost fusion parameters; C_SAD(p,d), C_CT(p,d) are the SAD matching cost and the CT matching cost of pixel p at disparity d.

The matching cost is accumulated over the neighborhood of each pixel; after accumulation the new matching cost C_agg is obtained, and the matching cost C_b based on the sub-image parallax principle is:

C_b(u,v,d) = C_agg(u,v,d) = Σ_{(i,j)∈m×n} C_CT-SAD(u+i, v+j, d)

where C_agg(u,v,d) is the accumulated cost of the pixel at (u,v) for disparity d and m×n is the accumulation window size.

The matching cost is minimized with the WTA optimization method to obtain the disparity match: for each pixel the position with the minimum matching cost is found, giving the corresponding disparity map.
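A sketch of the CT-SAD cost fusion and WTA disparity selection, reusing the enhanced_census and hamming_cost helpers from the previous sketch; the disparity range, window sizes and fusion weights (ρ_SAD, ρ_CT) are illustrative, the cost volume is built brute-force for clarity, and border wrap-around from np.roll is ignored.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def ctsad_disparity(left, right, d_max=16, win=9, rho_sad=1.0, rho_ct=1.0):
    """Fuse SAD and Census (Hamming) matching costs, accumulate them over a
    window, and select the per-pixel disparity by WTA (minimum cost)."""
    h, w = right.shape
    bits_l, bits_r = enhanced_census(left, win), enhanced_census(right, win)
    cost = np.empty((h, w, d_max + 1), dtype=np.float32)
    for d in range(d_max + 1):
        # align the left view with the right view for candidate disparity d
        shifted = np.roll(left.astype(np.float32), -d, axis=1)
        shifted_bits = np.roll(bits_l, -d, axis=1)
        c_sad = np.abs(right.astype(np.float32) - shifted)            # C_SAD(p, d)
        c_ct = hamming_cost(bits_r, shifted_bits).astype(np.float32)  # C_CT(p, d)
        fused = rho_sad * c_sad + rho_ct * c_ct                       # C_CT-SAD(p, d)
        cost[:, :, d] = uniform_filter(fused, size=win)               # C_agg = C_b(p, d)
    return np.argmin(cost, axis=2)  # WTA: disparity with the minimum accumulated cost
```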
Further, the disparity post-processing in step 2 is to perform denoising processing on the disparity map, which is specifically as follows:
Median filtering with a k×l window is applied to obtain the disparity value Di;
the disparity map is represented in pixels, and the disparity is converted into the initial-resolution depth image Z'_b using the camera calibration parameters and the disparity-to-depth relation.
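A brief sketch of the disparity post-processing, assuming the usual stereo relation Z = f·B/d for the disparity-to-depth conversion; the focal length f_px and baseline B stand in for the camera calibration parameters, and the window size is illustrative.

```python
import numpy as np
from scipy.ndimage import median_filter

def postprocess_disparity(disp, f_px, baseline, win=(5, 5)):
    """Median-filter the disparity map, then convert disparity to depth."""
    d = median_filter(disp.astype(np.float32), size=win)  # k x l median filter
    d = np.maximum(d, 1e-6)                               # avoid division by zero
    return f_px * baseline / d                            # Z'_b: initial-resolution depth
```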
Further, the upsampling based on bilateral filtering in step 3 specifically includes the following steps:
Joint bilateral upsampling combined with the original texture map is adopted; the expression of the joint bilateral upsampling is:

p̃(X) = (1/k_X) Σ_{Y_s∈Ω} p_s(Y_s) · r(X_s, Y_s) · d( Ĩ(X), Ĩ(Y) )

where p̃(X) is the up-sampled pixel value; p_s(Y_s) is a pixel of the small-size image, i.e. the value assigned to the corresponding new high-resolution position after upsampling; Ĩ is the original large-size pixel value; X_s, Y_s are the spatial positions of two pixels in the small-size image; X, Y are the spatial positions in the large-size image corresponding to X_s, Y_s; Ω is the kernel support domain; k_X, the normalization parameter, is the sum of the products of the distance weight r and the pixel-difference weight d:

k_X = Σ_{Y_s∈Ω} r(X_s, Y_s) · d( Ĩ(X), Ĩ(Y) )

The pixel-difference weight d is, in physical terms, a luminance factor; it is separated into luminance factors d_h, d_v in the horizontal and vertical directions:

d_h = exp( −| Ĩ(x,y) − Ĩ(x_h, y) |² / (2σ_d²) )

d_v = exp( −| Ĩ(x,y) − Ĩ(x, y_v) |² / (2σ_d²) )

where Ĩ(x,y) is the pixel value at point (x,y), and Ĩ(x_h, y), Ĩ(x, y_v) are the pixel values horizontally and vertically aligned with point (x,y), respectively.

The pixel values of the original texture image are used as the parameters for generating the pixel-difference weight, with the operation separated by direction. The distance weight r, i.e. the scale factor, operates in the window of the initial low-resolution depth image; the upsampling yields the high-resolution image, with the initial low-resolution distance used as the distance measure:

r(X_s, Y_s) = exp( −δ(X_s, Y_s)² / (2σ_r²) )

where r(X_s, Y_s) is the distance weight, δ(X_s, Y_s) is the distance, and σ_r is the distance weight parameter.

After upsampling, the first depth image Z_b based on the sub-image parallax is obtained.
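A compact (unoptimized) sketch of joint bilateral upsampling guided by the full-focus texture image, assuming Gaussian range and spatial kernels as in a standard bilateral filter; the scale factor, window radius and σ values are illustrative parameters.

```python
import numpy as np

def joint_bilateral_upsample(depth_lo, guide_hi, scale, radius=2, sigma_r=1.0, sigma_d=10.0):
    """Upsample a low-resolution depth map using a high-resolution guide image."""
    H, W = guide_hi.shape
    h, w = depth_lo.shape
    out = np.zeros((H, W), dtype=np.float32)
    for Y in range(H):
        for X in range(W):
            ys, xs = Y / scale, X / scale            # position on the low-resolution grid
            acc, norm = 0.0, 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    qy, qx = int(round(ys)) + dy, int(round(xs)) + dx
                    if not (0 <= qy < h and 0 <= qx < w):
                        continue
                    gy, gx = min(int(qy * scale), H - 1), min(int(qx * scale), W - 1)
                    r = np.exp(-((qy - ys) ** 2 + (qx - xs) ** 2) / (2 * sigma_r ** 2))
                    d = np.exp(-(float(guide_hi[Y, X]) - float(guide_hi[gy, gx])) ** 2
                               / (2 * sigma_d ** 2))
                    acc += depth_lo[qy, qx] * r * d   # p_s(Y_s) * r * d
                    norm += r * d                     # accumulates k_X
            out[Y, X] = acc / norm if norm > 0 else depth_lo[int(ys), int(xs)]
    return out
```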
Further, the depth estimation based on focus detection described in step 4 specifically includes the following steps:
The differently focused images obtained by the light field camera, i.e. the focus stack images, are used as the input data for depth estimation. A focus detection function evaluates and measures the degree of focus of each pixel in the images, the in-focus pixels in the intermediate images are computed, the sharp pixels in each fixed-focus image are computed, and the depth corresponding to each pixel is obtained from the lens imaging principle.

The lens imaging principle is:

1/f = 1/v + 1/Z

Z = f·v / (v − f)

where f is the focal length, v is the image distance, and the object distance Z is the depth data corresponding to the pixel.

The degree of focus F(p, L) in focus detection is expressed with a gradient and a threshold:

F(p, L) = (1/A) Σ_{q∈W} max( t_L − |∇I_L(q)|, 0 )

where ∇I_L(p) is the gradient of the candidate image L at point p; L indexes the images at the different focus positions, the depth of field having L levels; t_L is a gradient threshold obtained automatically with the Roberts edge-extraction algorithm; A is a normalization parameter and W is an N×N accumulation window.

Through the minimization

Z'_f(p) = argmin_L F(p, L)

the L value corresponding to the minimum F value is obtained and taken as the pre-conversion depth value Z'_f(p) of point p; the in-focus image is then converted into the depth map Z_f using the image-acquisition transformation parameters.
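A minimal sketch of the depth-from-focus step over the focal stack, using a plain gradient-energy focus measure accumulated over a small window (equivalently maximizing sharpness rather than minimizing the thresholded measure F) and the thin-lens relation above; focal_depths is an assumed per-layer depth parameter.

```python
import numpy as np
from scipy.ndimage import sobel, uniform_filter

def thin_lens_depth(f, v):
    """Object distance Z from focal length f and image distance v (1/f = 1/v + 1/Z)."""
    return f * v / (v - f)

def depth_from_focus(stack, focal_depths, win=8):
    """stack: (L, H, W) focal stack; focal_depths: depth assigned to each layer.
    Pick, per pixel, the layer whose local gradient energy is highest."""
    sharpness = []
    for img in stack:
        a = img.astype(np.float32)
        g = np.hypot(sobel(a, 0), sobel(a, 1))
        sharpness.append(uniform_filter(g, size=win))     # accumulate over an N x N window
    best_layer = np.argmax(np.stack(sharpness), axis=0)   # layer in best focus per pixel
    return np.asarray(focal_depths, dtype=np.float32)[best_layer]   # Z_f
```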
Further, the depth adaptive mixing described in step 5 is specifically as follows:
Based on the different characteristics of the first depth map obtained by parallax matching and the second depth map obtained by focus detection, a feature analysis of the smooth regions and edge regions of the full-focus image is introduced and values are assigned separately; the parallax range and the focus levels are unified into an L-level representation, adaptive weights are assigned using the matching cost and the blur measurement parameter corresponding to the same depth level, and finally the smooth regions, the edges and the remaining regions are each filled with their respective depth values.

Smooth regions in the full-focus image are estimated with a gradient mean-square-error method. With a neighborhood window of size W = N×N in the gradient image, the mean square error S is:

S = (1/N²) Σ_{p∈W} ( |∇I(p)| − |∇I(p_center)| )²

where p_center is the center pixel position of the smoothing window and ∇I(p) is the gradient of the pixel value I(p) at point p.

Smooth regions in the image are labeled with the variance S and a threshold η:

I_smooth(p) = 1 if S(p) < η·S_max, otherwise 0

where I_smooth denotes the smooth regions, represented by 1 (other regions by 0), and S_max is the maximum value of the variance.

Image edge pixels are obtained with the Sobel edge operator, giving the edge map I_edge.

For the remaining regions other than smooth and edge, the adaptive fusion parameter λ_p is used as the focus weight:

[Equation images: σ_f and σ_b are computed over the L candidate depths from F(p,i) − F_i,min and C(p,i) − C_i,min respectively, and the adaptive weight λ_p is formed from σ_f and σ_b.]

where there are L candidate depths; F_i,min and C_i,min respectively denote the minimum cost of the focus cue and of the parallax cue of point p over the different candidate depths; σ_f and σ_b are the corresponding cue parameters; F(p,i) and C(p,i) are the degree of focus of point p at depth i and the cost value of the parallax cue.

Depth correction then combines the smooth, edge and remaining regions: smooth regions take the depth obtained from parallax, edge regions take the depth obtained from focus detection, and the remaining regions are determined with the cost-adaptive weighting method:

Z'(p) = Z_b(p) for p in I_smooth; Z'(p) = Z_f(p) for p in I_edge; Z'(p) = λ_p·Z_f(p) + (1 − λ_p)·Z_b(p) otherwise

where Z'(p) is the fused depth value; Z_b(p) and Z_f(p) are the depth values obtained from the parallax cue and the focus cue; λ_p is the adaptive parameter; I_smooth, I_edge and "others" denote the smooth regions, the edges and the remaining regions, respectively.

The depth map is then further filtered to denoise the obtained depth map.
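A sketch of the region-aware fusion: smooth regions are detected from the gradient variance of the full-focus image, edges with a Sobel operator, and the remaining pixels are blended with an adaptive weight. The exact form of λ_p is not recoverable from the text, so a simple confidence ratio built from the per-pixel cost minima F_min and C_min (assumed inputs) is used as a stand-in; the thresholds are illustrative.

```python
import numpy as np
from scipy.ndimage import sobel, uniform_filter

def fuse_depths(Z_b, Z_f, full_focus, F_min, C_min, eta=0.3, edge_thresh=0.2, win=8):
    """Fuse parallax-based depth Z_b and focus-based depth Z_f using the
    full-focus image to classify smooth / edge / other regions."""
    a = full_focus.astype(np.float32)
    g = np.hypot(sobel(a, 0), sobel(a, 1))                     # gradient magnitude
    var = uniform_filter(g ** 2, win) - uniform_filter(g, win) ** 2
    smooth = var < eta * var.max()                             # I_smooth
    edge = g > edge_thresh * g.max()                           # I_edge (Sobel-based)
    # stand-in adaptive weight: lean on the focus cue where the parallax cost minimum is high
    lam = C_min / (F_min + C_min + 1e-6)
    Z = lam * Z_f + (1.0 - lam) * Z_b                          # other regions: adaptive blend
    Z[smooth] = Z_b[smooth]                                    # smooth regions: parallax depth
    Z[edge & ~smooth] = Z_f[edge & ~smooth]                    # edge regions: focus depth
    return Z
```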
Further, the visualization of the depth map in step 6 is specifically as follows:
The final depth map obtained from the depth levels is quantized and the depth display range is adjusted to 0–255; the depth inverse-quantization formula is:

I_d = round( 255 · (z_p − z_min) / (z_max − z_min) )

where z_max and z_min are respectively the maximum and minimum depth of field in the scene, z_p is the depth value of the pixel obtained from the parallax, and I_d is the 8-bit gray-level representation corresponding to the depth value.
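A one-function sketch of the visualization step, assuming a linear mapping of the scene depth range onto 8-bit gray levels (the direction of the mapping is an assumption).

```python
import numpy as np

def depth_to_gray(Z):
    """Linearly quantize a depth map to the 0-255 display range."""
    z_min, z_max = float(Z.min()), float(Z.max())
    if z_max == z_min:
        return np.zeros_like(Z, dtype=np.uint8)
    return np.round(255.0 * (Z - z_min) / (z_max - z_min)).astype(np.uint8)
```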
Compared with the prior art, the invention has the notable advantages that: (1) scene depth information can be obtained with only one light field camera, so the equipment is simple and the cost is low; (2) an optimized depth map is obtained by mixing the two cues of parallax and focus with a simplified algorithm suitable for hardware design, so the algorithm is simple and the accuracy is high.
Drawings
Fig. 1 is a flow chart of a hybrid depth estimation method based on light field data according to the present invention.
Detailed Description
The invention relates to a hybrid depth estimation method based on light field data, which comprises the following steps:
step 1, acquiring light field data: acquiring four-dimensional light field image data through a light field camera, and processing to obtain a scene focus stack image, a sub-image, a full focus image and a focus parameter, wherein the focus parameter comprises a white image calibration parameter and a sub-image calibration parameter;
step 2, enhancing parallax matching of the CT-SAD algorithm: selecting a sub-image of a horizontal position to simulate a horizontally placed camera based on the sub-image, and performing stereo matching of left and right viewpoints, including enhanced CT transformation, SAD calculation, CT-SAD matching cost fusion, cost accumulation, parallax estimation, parallax post-processing and depth conversion to obtain an initial resolution depth image;
and 3, upsampling based on bilateral filtering: combining the full-focus image and the initial resolution depth image obtained in the parallax matching in the step 2, and performing an up-sampling process based on bilateral filtering to obtain a first depth image;
and 4, depth estimation based on focus detection: analyzing the focal stack image by adopting a gradient algorithm, obtaining a focusing pixel layer by adopting a two-dimensional gradient algorithm, and performing depth-of-field conversion by utilizing a focusing parameter to obtain the depth of the focusing pixel layer; repeatedly carrying out focusing detection until all focusing layers are detected to obtain a second depth map;
step 5, depth self-adaptive mixing: firstly, equalizing the depth grading of a first depth map obtained in a parallax matching mode and a second depth map obtained in a focusing detection mode, obtaining a substitution value according to a median approach principle, then performing cost fusion in a self-adaptive parameter mode, minimizing the cost to obtain a depth grading map, and performing depth conversion on the depth grading map by using grading equalization parameters to obtain a final depth map;
step 6, visualization of the depth map: the final depth map resulting from the depth hierarchy is quantized and the depth display range is adjusted to be in the range of 0-255.
Further, the enhanced CT transform in step 2 is specifically as follows:

The CT transformation process is formulated as follows:

L(u,v) = ⊗_{(m,n)∈M×N} ξ( l̄(u,v), l(u+m, v+n) )

R(u,v) = ⊗_{(m,n)∈M×N} ξ( r̄(u,v), r(u+m, v+n) )

ξ(p1, p2) = 0 if p1 ≤ p2, 1 if p1 > p2

where L, R are the transform results of the left and right images, the transform window size is M×N, and the variables m, n are integers that traverse the whole M×N window; r(u,v), l(u,v) are the right- and left-image pixel values at (u,v), respectively; ξ(·) is the comparison relation, computed as shown in the ξ(p1, p2) formula; p1, p2 are the original pixel values; l̄(u,v), r̄(u,v) are the pixel averages over the window area, (u,v) are the pixel coordinates; and ⊗ denotes that the obtained comparison values are concatenated in order.

The Census transform compares the center pixel with the surrounding pixels within the M×N window to obtain a corresponding array that represents the transformed center pixel, thus yielding two images R and L represented by Census transform values.

Further, the enhanced CT-SAD algorithm in step 2 specifically comprises the following steps:

The luminance-based SAD method is combined with the Census transform method. C_SAD denotes the SAD matching cost, obtained by computing luminance differences:

C_SAD(u,v,d) = Σ_{(i,j)∈L×K} | I_r(u+i, v+j) − I_l(u+i+d, v+j) |

where C_SAD(u,v,d) is the accumulated SAD matching cost of the pixel at (u,v) for disparity d, L×K is the accumulation window constant, and I_r, I_l are the pixel values of the right and left views corresponding to disparity d.

The Census-based matching cost C_CT is obtained by comparing the Hamming distance between the Census transform values of corresponding pixels at the same position in the right and left images; the two initial matching costs are then fused through the parameters (ρ_SAD, ρ_CT) to obtain the matching cost C_CT-SAD(p,d) of the pixel:

C_CT-SAD(p,d) = ρ_SAD·C_SAD(p,d) + ρ_CT·C_CT(p,d)

where C_CT-SAD is the matching cost at disparity d obtained from the sub-image parallax principle and p denotes the pixel at (u,v); ρ_SAD, ρ_CT are the cost fusion parameters; C_SAD(p,d), C_CT(p,d) are the SAD matching cost and the CT matching cost of pixel p at disparity d.

The matching cost is accumulated over the neighborhood of each pixel; after accumulation the new matching cost C_agg is obtained, and the matching cost C_b based on the sub-image parallax principle is:

C_b(u,v,d) = C_agg(u,v,d) = Σ_{(i,j)∈m×n} C_CT-SAD(u+i, v+j, d)

where C_agg(u,v,d) is the accumulated cost of the pixel at (u,v) for disparity d and m×n is the accumulation window size.

The matching cost is minimized with the WTA optimization method to obtain the disparity match: for each pixel the position with the minimum matching cost is found, giving the corresponding disparity map.

Further, the disparity post-processing in step 2 is the denoising of the disparity map, specifically as follows:

Median filtering with a k×l window is applied to obtain the disparity value Di;

the disparity map is represented in pixels, and the disparity is converted into the initial-resolution depth image Z'_b using the camera calibration parameters and the disparity-to-depth relation.

Further, the upsampling based on bilateral filtering in step 3 specifically includes the following steps:

Joint bilateral upsampling combined with the original texture map is adopted; the expression of the joint bilateral upsampling is:

p̃(X) = (1/k_X) Σ_{Y_s∈Ω} p_s(Y_s) · r(X_s, Y_s) · d( Ĩ(X), Ĩ(Y) )

where p̃(X) is the up-sampled pixel value; p_s(Y_s) is a pixel of the small-size image, i.e. the value assigned to the corresponding new high-resolution position after upsampling; Ĩ is the original large-size pixel value; X_s, Y_s are the spatial positions of two pixels in the small-size image; X, Y are the spatial positions in the large-size image corresponding to X_s, Y_s; Ω is the kernel support domain; k_X, the normalization parameter, is the sum of the products of the distance weight r and the pixel-difference weight d:

k_X = Σ_{Y_s∈Ω} r(X_s, Y_s) · d( Ĩ(X), Ĩ(Y) )

The pixel-difference weight d is, in physical terms, a luminance factor; it is separated into luminance factors d_h, d_v in the horizontal and vertical directions:

d_h = exp( −| Ĩ(x,y) − Ĩ(x_h, y) |² / (2σ_d²) )

d_v = exp( −| Ĩ(x,y) − Ĩ(x, y_v) |² / (2σ_d²) )

where Ĩ(x,y) is the pixel value at point (x,y), and Ĩ(x_h, y), Ĩ(x, y_v) are the pixel values horizontally and vertically aligned with point (x,y), respectively.

The pixel values of the original texture image are used as the parameters for generating the pixel-difference weight, with the operation separated by direction. The distance weight r, i.e. the scale factor, operates in the window of the initial low-resolution depth image; the upsampling yields the high-resolution image, with the initial low-resolution distance used as the distance measure:

r(X_s, Y_s) = exp( −δ(X_s, Y_s)² / (2σ_r²) )

where r(X_s, Y_s) is the distance weight, δ(X_s, Y_s) is the distance, and σ_r is the distance weight parameter.

After upsampling, the first depth image Z_b based on the sub-image parallax is obtained.

Further, the depth estimation based on focus detection described in step 4 specifically includes the following steps:

The differently focused images obtained by the light field camera, i.e. the focus stack images, are used as the input data for depth estimation. A focus detection function evaluates and measures the degree of focus of each pixel in the images, the in-focus pixels in the intermediate images are computed, the sharp pixels in each fixed-focus image are computed, and the depth corresponding to each pixel is obtained from the lens imaging principle.

The lens imaging principle is:

1/f = 1/v + 1/Z

Z = f·v / (v − f)

where f is the focal length, v is the image distance, and the object distance Z is the depth data corresponding to the pixel.

The degree of focus F(p, L) in focus detection is expressed with a gradient and a threshold:

F(p, L) = (1/A) Σ_{q∈W} max( t_L − |∇I_L(q)|, 0 )

where ∇I_L(p) is the gradient of the candidate image L at point p; L indexes the images at the different focus positions, the depth of field having L levels; t_L is a gradient threshold obtained automatically with the Roberts edge-extraction algorithm; A is a normalization parameter and W is an N×N accumulation window.

Through the minimization

Z'_f(p) = argmin_L F(p, L)

the L value corresponding to the minimum F value is obtained and taken as the pre-conversion depth value Z'_f(p) of point p; the in-focus image is then converted into the depth map Z_f using the image-acquisition transformation parameters.

Further, the depth adaptive mixing described in step 5 is specifically as follows:

Based on the different characteristics of the first depth map obtained by parallax matching and the second depth map obtained by focus detection, a feature analysis of the smooth regions and edge regions of the full-focus image is introduced and values are assigned separately; the parallax range and the focus levels are unified into an L-level representation, adaptive weights are assigned using the matching cost and the blur measurement parameter corresponding to the same depth level, and finally the smooth regions, the edges and the remaining regions are each filled with their respective depth values.

Smooth regions in the full-focus image are estimated with a gradient mean-square-error method. With a neighborhood window of size W = N×N in the gradient image, the mean square error S is:

S = (1/N²) Σ_{p∈W} ( |∇I(p)| − |∇I(p_center)| )²

where p_center is the center pixel position of the smoothing window and ∇I(p) is the gradient of the pixel value I(p) at point p.

Smooth regions in the image are labeled with the variance S and a threshold η:

I_smooth(p) = 1 if S(p) < η·S_max, otherwise 0

where I_smooth denotes the smooth regions, represented by 1 (other regions by 0), and S_max is the maximum value of the variance.

Image edge pixels are obtained with the Sobel edge operator, giving the edge map I_edge.

For the remaining regions other than smooth and edge, the adaptive fusion parameter λ_p is used as the focus weight:

[Equation images: σ_f and σ_b are computed over the L candidate depths from F(p,i) − F_i,min and C(p,i) − C_i,min respectively, and the adaptive weight λ_p is formed from σ_f and σ_b.]

where there are L candidate depths; F_i,min and C_i,min respectively denote the minimum cost of the focus cue and of the parallax cue of point p over the different candidate depths; σ_f and σ_b are the corresponding cue parameters; F(p,i) and C(p,i) are the degree of focus of point p at depth i and the cost value of the parallax cue.

Depth correction then combines the smooth, edge and remaining regions: smooth regions take the depth obtained from parallax, edge regions take the depth obtained from focus detection, and the remaining regions are determined with the cost-adaptive weighting method:

Z'(p) = Z_b(p) for p in I_smooth; Z'(p) = Z_f(p) for p in I_edge; Z'(p) = λ_p·Z_f(p) + (1 − λ_p)·Z_b(p) otherwise

where Z'(p) is the fused depth value; Z_b(p) and Z_f(p) are the depth values obtained from the parallax cue and the focus cue; λ_p is the adaptive parameter; I_smooth, I_edge and "others" denote the smooth regions, the edges and the remaining regions, respectively.

The depth map is then further filtered to denoise the obtained depth map.

Further, the visualization of the depth map in step 6 is specifically as follows:

The final depth map obtained from the depth levels is quantized and the depth display range is adjusted to 0–255; the depth inverse-quantization formula is:

I_d = round( 255 · (z_p − z_min) / (z_max − z_min) )

where z_max and z_min are respectively the maximum and minimum depth of field in the scene, z_p is the depth value of the pixel obtained from the parallax, and I_d is the 8-bit gray-level representation corresponding to the depth value.
The invention is described in further detail below with reference to the figures and specific examples.
Example 1
In the hybrid depth estimation method based on light field data, the underlying light field camera data are acquired with existing technology: a light field image can be shot with a Lytro camera, and the focus stack images and the full-focus image are extracted with the Lytro Desktop software, using the focus depth as the parameter with a step length of 0.2 and extracting the focus stack over the maximum depth-of-field range from foreground to background. The sub-images are obtained with the MATLAB toolbox LFToolbox 0.4.
With reference to fig. 1, a method for estimating hybrid depth based on light field data includes the following steps:
step 1: acquiring light field data: acquiring four-dimensional light field image data through a light field camera, and processing to obtain a scene focus stack image, a sub-image, a full focus image and a focus parameter, wherein the focus parameter comprises a white image calibration parameter and a sub-image calibration parameter;
Sub-image calibration is based on the fact that each microlens can be regarded as capturing a different viewing angle, so the sub-images can be used for parallax-based depth estimation; the focus parameters are used for the depth conversion in focus detection.
Step 2: parallax matching with the enhanced CT-SAD algorithm: selecting sub-images at horizontal positions to simulate horizontally placed cameras, and performing stereo matching of the left and right viewpoints, including enhanced CT transformation, SAD calculation, CT-SAD matching cost fusion, cost accumulation, parallax estimation, parallax post-processing and depth conversion;
the sub-image pairs are read and the video format is converted: if the data are RGB, the RGB-to-gray function is executed; if the data are YUV, the Y-component extraction function is executed;
the RGB to gray formula is:
Gray=R*0.299+G*0.587+B*0.114
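A vectorized one-liner corresponding to the gray-conversion formula above (for an H×W×3 RGB array):

```python
import numpy as np

def rgb_to_gray(rgb):
    """Gray = 0.299*R + 0.587*G + 0.114*B, applied per pixel."""
    return rgb[..., 0] * 0.299 + rgb[..., 1] * 0.587 + rgb[..., 2] * 0.114
```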
CT conversion is carried out on the original image data, and the formula is as follows:
L(u,v) = ⊗_{(m,n)∈M×N} ξ( l̄(u,v), l(u+m, v+n) )

R(u,v) = ⊗_{(m,n)∈M×N} ξ( r̄(u,v), r(u+m, v+n) )

ξ(p1, p2) = 0 if p1 ≤ p2, 1 if p1 > p2

where L, R are the transform results of the left and right images respectively, the transform window size is M×N, p1, p2 are the original pixel values, l̄(u,v), r̄(u,v) are the pixel averages over the window area, u and v are the pixel coordinates, and ⊗ denotes that the obtained comparison values are concatenated in order.

The Census transform compares the center pixel with the surrounding pixels within the M×N window to obtain a corresponding array that represents the transformed center pixel.
The window sizes are M = 9 and N = 9. The CT-transformed data are matched using the Hamming distance to obtain the CT-based matching cost C_CT.
The luminance-based SAD method is combined with the Census transform method; SAD compensates some deficiencies of the Census transform, such as errors in regions with repeated texture. C_SAD denotes the SAD matching cost, obtained by computing luminance differences:

C_SAD(u,v,d) = Σ_{(i,j)∈L×K} | I_r(u+i, v+j) − I_l(u+i+d, v+j) |

The Census-based matching cost C_CT is obtained by comparing the Hamming distance between the Census transform values of corresponding pixels; the two initial matching costs are then fused through the parameters (ρ_SAD, ρ_CT) to obtain the matching cost of the pixel; in this example (ρ_SAD, ρ_CT) = (1, 1):

C_CT-SAD(p,d) = ρ_SAD·C_SAD(p,d) + ρ_CT·C_CT(p,d)

where C_b is the matching cost obtained from the sub-image parallax principle and p denotes the pixel at (u,v).

The matching cost is accumulated over the neighborhood of each pixel to further improve its matching performance; after accumulation the new matching cost C_agg is obtained, and the matching cost C_b based on the sub-image parallax principle is:

C_b(u,v,d) = C_agg(u,v,d) = Σ_{(i,j)∈m×n} C_CT-SAD(u+i, v+j, d)

where C_b is the matching cost obtained from the sub-image parallax principle and the accumulation window is m = n = 8.
And performing minimum optimization on the matching cost by adopting a WTA optimization method to obtain a parallax matching value, and searching a corresponding pixel position with the minimum matching cost to obtain a corresponding parallax map.
Further, the disparity post-processing, that is, denoising the disparity map, is specifically implemented as follows:
Median filtering is applied with an m×n window:

D_n(u,v) = median_{(i,j)∈m×n} D(u+i, v+j)

with m = n = 8, giving the disparity value D_n; the disparity map is represented in pixels, and the disparity is converted into the low-resolution depth map Z'_b using the camera calibration parameters and the disparity-to-depth relation.
Step 3: upsampling based on bilateral filtering. In obtaining the sub-images from the light field data the pixel resolution is reduced, so an upsampling step is required. The light field full-focus image has sharp texture detail and can improve the quality of the depth map. Combining the full-focus color image with the small-resolution depth map obtained from the parallax estimation, an upsampling process based on bilateral filtering yields a high-definition depth map; using the full-focus texture map as the reference for the bilateral-filter upsampling of the depth map effectively preserves information such as edges. X, Y are pixel positions in the full-focus image and X_s, Y_s are depth-map pixel positions; the upsampling is completed with:

p̃(X) = (1/k_X) Σ_{Y_s∈Ω} p_s(Y_s) · r(X_s, Y_s) · d( Ĩ(X), Ĩ(Y) )

where p̃(X) is the up-sampled pixel value, p_s(Y_s) is a pixel of the depth map, and Ĩ(X) is the pixel value of the full-focus image at the corresponding new position after upsampling. Ĩ are the pixel values of the full-focus image, X_s, Y_s are the spatial positions of two pixels in the depth map, X, Y are the spatial positions in the full-focus image corresponding to X_s, Y_s, and Ω is the kernel support domain. k_X, the normalization parameter, is the sum of the products of the distance weight r and the pixel-difference weight d:

k_X = Σ_{Y_s∈Ω} r(X_s, Y_s) · d( Ĩ(X), Ĩ(Y) )

where d is a luminance factor, separated into luminance factors in the horizontal and vertical directions:

d_h = exp( −| Ĩ(x,y) − Ĩ(x_h, y) |² / (2σ_d²) )

d_v = exp( −| Ĩ(x,y) − Ĩ(x, y_v) |² / (2σ_d²) )

The pixel values of the high-resolution original texture image are used as the parameters for generating the pixel-difference weight, with the operation separated by direction.

The scale factor r operates on the small-resolution depth image, using the small-resolution pixel distance as the distance measure:

r(X_s, Y_s) = exp( −δ(X_s, Y_s)² / (2σ_r²) )

After upsampling, the high-resolution depth map Z_b based on the sub-image parallax is obtained.

For controlling the upsampling quality, the upsampling equation has three parameters: the support domain Ω, i.e. the radial window n of the filter, and the parameters σ_r, σ_d. The larger the window radius and the σ values, the stronger the smoothing effect, and vice versa. To suppress noise during upsampling, the σ value can be chosen adaptively: a larger luminance parameter σ_d is taken at noisy pixels, while in smooth regions and edge regions the luminance parameter σ_d is kept as close to zero as possible. The luminance factor is therefore set adaptively to suppress noise, using the difference between the pixel and the mean of the filter window as σ_d:

I_mean(x,y) = (1/n²) Σ_{(i,j)∈n×n} I(x+i, y+j)

σ_d = D(x,y) = | I(x,y) − I_mean(x,y) |
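A short sketch of the adaptive luminance parameter described above: σ_d is set per pixel to the absolute difference between the pixel and its local window mean, so noisy pixels receive stronger range smoothing; the window size and the small floor value are illustrative.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def adaptive_sigma_d(image, win=5, floor=1e-3):
    """Per-pixel range parameter: sigma_d = |I(x,y) - I_mean(x,y)|."""
    mean = uniform_filter(image.astype(np.float32), size=win)
    return np.maximum(np.abs(image - mean), floor)  # floor avoids a zero sigma
```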
and 4, step 4: depth estimation based on focus detection: analyzing the focal stack image by adopting a gradient algorithm, obtaining a focusing pixel layer by adopting a two-dimensional gradient algorithm, and performing depth-of-field conversion by utilizing a focusing parameter to obtain the depth of the focusing pixel layer; repeating the focusing detection until all focusing layers are detected; the focus parameters are used for depth conversion of the focus stack hierarchy.
The degree of focus F(p, L) in focus detection is expressed with a gradient and a threshold:

F(p, L) = (1/A) Σ_{q∈W} max( t_L − |∇I_L(q)|, 0 )

where ∇I_L(p) is the gradient at point p for the candidate depth L, t_L is a gradient threshold obtained automatically with the Roberts edge-extraction algorithm, A is a normalization parameter and W is an N×N accumulation window with N = 8.

According to the formula, the F value is small at in-focus points; through the minimization

Z'_f(p) = argmin_L F(p, L)

the L value corresponding to the minimum F value is obtained as the pre-conversion depth value Z'_f(p) of point p. The in-focus image is then converted to the depth map Z_f using the image-acquisition transformation parameters. The depth conversion uses the focus parameter and the image distance: the image distance is a constant, and the focus parameter is the correspondence between the focus parameter and the focal length obtained by manual measurement and curve fitting, used for the depth conversion of the in-focus pixels.
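Since the focus-parameter-to-focal-length correspondence is stated to come from manual measurement and curve fitting, the following sketch shows one possible fit (a quadratic polynomial) and the subsequent thin-lens depth conversion; the measured pairs and the image distance v are placeholder values, not data from the patent.

```python
import numpy as np

# placeholder (focus parameter, focal length) measurements used only to illustrate the fit
focus_params = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])
focal_lengths = np.array([9.5, 9.8, 10.2, 10.7, 11.3, 12.0])
fit = np.polynomial.Polynomial.fit(focus_params, focal_lengths, deg=2)

def focus_param_to_depth(alpha, v=20.0):
    """Map a focus parameter to object depth via the fitted focal length and
    the thin-lens relation Z = f*v / (v - f), with image distance v held constant."""
    f = fit(alpha)
    return f * v / (v - f)
```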
And 5: depth adaptive mixing: the method comprises the steps of firstly balancing depth grading in a parallax matching mode and a focusing detection mode, obtaining a substitution value according to a median approach principle, then performing cost fusion in a self-adaptive parameter mode to obtain a depth grading graph with minimized cost, and performing depth conversion on the depth grading graph by using grading balance parameters to obtain the depth graph.
Since the parallax-based depth map and the focus-based depth map have different characteristics, a feature analysis of the smooth and edge regions of the full-focus image is introduced and values are assigned separately; the parallax range and the focus levels are unified into an L-level representation, adaptive weights are assigned using the corresponding matching cost and blur measurement parameters, and finally the smooth regions, edges and remaining regions are each filled with their corresponding depth values, improving the accuracy of the whole depth map.
Smooth regions in the full-focus image are estimated with a gradient mean-square-error method. With a neighborhood window of size W = N×N in the gradient image, the mean square error S is:

S = (1/N²) Σ_{p∈W} ( |∇I(p)| − |∇I(p_center)| )²

Smooth regions in the image are labeled with the variance S and the threshold η, with η = 0.3:

I_smooth(p) = 1 if S(p) < η·S_max, otherwise 0

Image edge pixels are obtained with the Sobel edge operator, giving the edge map I_edge. For the remaining regions other than smooth and edge, such as texture-rich regions, the adaptive fusion parameter λ_p is used as the focus weight:

[Equation images: σ_f and σ_b are computed over the L candidate depths from F(p,i) − F_i,min and C(p,i) − C_i,min respectively, and the adaptive weight λ_p is formed from σ_f and σ_b.]

Depth correction then combines the smooth, edge and remaining regions: smooth regions take the depth obtained from parallax, edge regions take the depth obtained from focus detection, and the remaining regions, such as texture-rich regions, are determined with the cost-adaptive weighting method:

Z'(p) = Z_b(p) for p in I_smooth; Z'(p) = Z_f(p) for p in I_edge; Z'(p) = λ_p·Z_f(p) + (1 − λ_p)·Z_b(p) otherwise

The obtained depth map is then further filtered to denoise it.
Step 6: visualization of the depth map. Because of the limits of the parallax levels and the focus levels, the range of the obtained depth-level map is generally small; to obtain a visualizable depth, the depth map obtained from the depth levels is quantized and the depth display range is adjusted to 0–255, i.e. the depth levels are mapped to the 0–255 gray-scale range. The inverse-quantization formula for depth is:

I_d = round( 255 · (Z_p − Z_min) / (Z_max − Z_min) )

where Z_max and Z_min are respectively the maximum and minimum depth of field in the scene, Z_p is the depth value of the pixel obtained from the parallax, and I_d is the 8-bit gray-level representation corresponding to the depth value.
In conclusion, scene depth information can be obtained with only one light field camera, so the equipment is simple and the cost is low; in addition, the optimized depth map is obtained by mixing the two cues of parallax and focus with a simplified algorithm suitable for hardware design, so the algorithm is simple and the accuracy is high.

Claims (2)

1. A hybrid depth estimation method based on light field data, characterized by comprising the following steps:
step 1, acquiring light field data: acquiring four-dimensional light field image data through a light field camera, and processing the four-dimensional light field image data to obtain a scene focus stack image, a sub-image, a full-focus image and a focus parameter, wherein the focus parameter comprises a white image calibration parameter and a sub-image calibration parameter;
step 2, enhancing parallax matching of the CT-SAD algorithm: selecting a sub-image of a horizontal position to simulate a horizontally placed camera based on the sub-image, and performing stereo matching of left and right viewpoints, including enhanced CT transformation, SAD calculation, CT-SAD matching cost fusion, cost accumulation, parallax estimation, parallax post-processing and depth conversion to obtain an initial resolution depth image;
and 3, upsampling based on bilateral filtering: combining the full-focus image and the initial resolution depth image obtained in the parallax matching in the step 2, and performing an up-sampling process based on bilateral filtering to obtain a first depth image;
and 4, depth estimation based on focus detection: analyzing the focal stack image by adopting a gradient algorithm, obtaining a focusing pixel layer by adopting a two-dimensional gradient algorithm, and performing depth-of-field conversion by utilizing a focusing parameter to obtain the depth of the focusing pixel layer; repeatedly carrying out focusing detection until all focusing layers are detected to obtain a second depth map;
step 5, depth self-adaptive mixing: firstly, equalizing the depth grading of a first depth map obtained in a parallax matching mode and a second depth map obtained in a focusing detection mode, obtaining a substitution value according to a median approach principle, then performing cost fusion in a self-adaptive parameter mode, minimizing the cost to obtain a depth grading map, and performing depth conversion on the depth grading map by using grading equalization parameters to obtain a final depth map;
step 6, visualization of the depth map: quantizing the final depth map obtained by depth grading, and adjusting the depth display range to be within the range of 0-255;
the enhanced CT transform described in step 2 is specifically as follows:
the CT transformation process is formulated as follows:
Figure FDA0003482743780000011
Figure FDA0003482743780000012
Figure FDA0003482743780000013
l, R is the conversion result of left image and right image, the conversion window size is M N, the variables M, N are integer and cover the whole window M N; r (u, v), l (u, v) are the right and left image pixel values at (u, v), respectively; xi (·) is a comparison relation, and the calculation method is like xi (p)1,p2) The formula is shown; p is a radical of1、p2Is the value of the original pixel(s),
Figure FDA0003482743780000014
is the average value of the pixels in the window area, (u, v) is the pixel coordinate,
Figure FDA0003482743780000021
indicating that each of the obtained comparison values is ordered;
census conversion is carried out in an MxN window, a corresponding array is obtained by comparing a central pixel with surrounding pixels and is used for representing the converted central pixel, and thus two images R and L represented by the Census conversion value are obtained;
the enhanced CT-SAD algorithm described in step 2 specifically includes the following steps:
combining the SAD method based on luminance with the Census transform method, using CSADRepresenting the matching cost of SAD, CSADObtaining by calculating a difference value of brightness:
Figure FDA0003482743780000022
wherein, formula CSAD(μ, v, d) represents the cumulative SAD matching cost for a pixel with disparity d at (u, v), L × K is the cumulative window constant, IrAnd IlThe disparity is d pixel values corresponding to a right view and a left view;
obtaining a matching cost C based on Census transformation by comparing the Hamming distance of Census transformation values of corresponding pixels at the same positions in the right image and the left imageCTThen the matching cost is passed through the parameter (rho) by two initializationsSADCT) Obtaining the matching cost C of the pixel by fusionCT-SAD(p,d):
CCT-SAD(p,d)=ρSADCSAD(p,d)+ρCTCCT(p,d)
Wherein CCT-SADP represents a (u, v) point pixel for the matching cost of the parallax d position obtained based on the parallax principle of the subimage; rhoSAD、ρCTFusing parameters by respective costs; cSAD(p,d)、CCT(p, d) are SAD matching cost and CT matching cost of the pixel p in the parallax d respectively;
carrying out matching cost accumulation in a pixel related region, and obtaining a new matching cost C after the cost accumulation is finishedaggThen, the matching cost C is obtained based on the principle of the parallax of the sub-imagebComprises the following steps:
Figure FDA0003482743780000023
wherein, Cagg(μ, v, d) is an accumulated cost of the (μ, v) point pixel when the parallax is d, and m, n is an accumulated window size;
performing minimum optimization on the matching cost by adopting a WTA optimization method to obtain a parallax matching value, and searching a corresponding pixel position with the minimum matching cost to obtain a corresponding parallax map;
the parallax post-processing in step 2 is to perform denoising processing on the parallax image, and specifically includes:
median filtering by adopting a k multiplied by l window to obtain a disparity value Di;
the parallax image is represented by pixels, and parallax is converted into an initial resolution depth image Z' b by utilizing a camera calibration parameter and a parallax conversion relation;
step 3, the upsampling based on the bilateral filtering is specifically as follows:
by adopting bilateral upsampling combined with an original texture map, the expression of the joint bilateral upsampling is as follows:
Figure FDA0003482743780000031
wherein
Figure FDA0003482743780000032
Is the up-sampled pixel value; p is a radical ofs(Y) is the pixel of the small-size image, namely the pixel value of the large resolution corresponding to the new position of the small-size image after up-sampling;
Figure FDA0003482743780000033
for original large-sized pixel values, Xs,YsThe spatial positions of two pixel points in the small-size image are obtained; x, Y is X in large-size images、YsA corresponding spatial position; omega is a kernel function support domain; k is a radical ofXThe sum of the products of the distance weight r and the pixel difference weight d, i.e. the normalization parameter, is given by the following formula:
Figure FDA0003482743780000034
wherein the pixel difference weight d is a brightness factor in physical sense, and is separated into a brightness factor d in horizontal and vertical directionsh、dv
Figure FDA0003482743780000035
Figure FDA0003482743780000036
Wherein the content of the first and second substances,
Figure FDA0003482743780000037
is the (x, y) point pixel value,
Figure FDA0003482743780000038
pixel values in the horizontal and vertical directions aligned with the (x, y) point, respectively;
adopting the pixel value of the original texture image as a parameter for generating pixel difference weight, and performing directional separation operation; the distance weight r, i.e. the scale factor, is an operation in the window of the initial low-resolution depth image, and is used for obtaining a high-resolution image by upsampling, and the distance of the initial low-resolution is used as a ranging mode, and the formula is as follows:
Figure FDA0003482743780000039
wherein r (X)s,Ys) Is the distance weight, δ (X)s,Ys) Is a distance, σrIs a distance weight parameter;
obtaining a first depth image Z based on the parallax of the sub-image after up-samplingb
The depth estimation based on focus detection described in step 4 specifically includes the following steps:
taking differently focused images obtained by a light field camera, namely focal point stack images, as input data of depth estimation, evaluating and measuring the focusing degree of each pixel point in the images by using a focusing detection function, calculating intermediate focusing pixels in the images, calculating clear pixels in fixed focus images, and obtaining the depth corresponding to the pixels by using a lens imaging principle;
the lens imaging principle formula is as follows:
Figure FDA0003482743780000041
Figure FDA0003482743780000042
wherein, f is the focal length, v is the image distance, and the object distance Z is the depth data corresponding to the pixel;
the degree of focus F (p, L) in focus detection is expressed by a gradient and a threshold value, and the formula is as follows:
Figure FDA0003482743780000043
wherein
Figure FDA0003482743780000044
The gradient of the candidate L image at the point p; l represents images on different focus points, and the depth of field is L in level; t is tLAutomatically acquiring a gradient threshold value through a roberts edge extraction algorithm; a is a normalization parameter, and W is a cumulative window of N × N;
by a minimization process
Figure FDA0003482743780000045
Obtaining an L value corresponding to the minimum F value, and taking the L value as the depth value Z 'before conversion of the point p'f(p); the focused image is then converted to a depth map Z using image acquisition transformation parametersf
The depth adaptive mixing in step 5 is specifically as follows:
according to the difference of the characteristics of a first depth map obtained in a parallax matching mode and a second depth map obtained in a focusing detection mode, introducing characteristic analysis of a smooth region and an edge region of a full-focusing image, respectively assigning values, unifying a parallax range and a focusing level to an L-dimension representation, performing self-adaptive weight distribution by using matching cost and fuzzy measurement parameters corresponding to the same depth in the L-dimension, and finally respectively filling the smooth region, the edge and other regions with respective depth values;
the smooth regions in the full-focus image are estimated with a gradient mean-square-error method; with a neighborhood window W of size N × N in the gradient image, the mean square error S is:
[formula image FDA0003482743780000046: definition of S]
wherein p_center is the position of the center pixel of the smoothing window, and [formula image FDA0003482743780000051] is the gradient of the pixel value I(p) at the point p;
the smooth regions in the image are labeled using the variance S and the threshold η:
[formula image FDA0003482743780000052: labeling rule for I_smooth]
wherein I_smooth denotes a smooth region, smooth regions being marked 1 and the other regions 0, and S_max is the maximum value of the variance;
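A minimal sketch of this smooth-region labeling is shown below, assuming (since the exact formulas are in the claim images) that S is the windowed mean-square deviation of the gradient magnitude from the window-center value and that a pixel is labeled smooth where S falls below η·S_max; the gradient operator, window size and η value are illustrative choices:

```python
import numpy as np
from scipy.ndimage import sobel, uniform_filter

def smooth_region_mask(full_focus, window=7, eta=0.05):
    """Label smooth regions of the full-focus image: 1 = smooth, 0 = other."""
    img = full_focus.astype(np.float64)
    grad = np.hypot(sobel(img, axis=1), sobel(img, axis=0))
    # Windowed mean square deviation of the gradient from the window-centre value:
    # E[(g - g_center)^2] = E[g^2] - 2*g_center*E[g] + g_center^2 over each window.
    mean = uniform_filter(grad, size=window)
    mean_sq = uniform_filter(grad ** 2, size=window)
    S = np.maximum(mean_sq - 2.0 * grad * mean + grad ** 2, 0.0)
    return (S < eta * S.max()).astype(np.uint8)
```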
for the acquisition of image edge pixels, a sobel edge operator is adopted to obtain an edge Iedge
for the regions other than the smooth regions and the edges, an adaptive fusion parameter λ_p is used as the weight of the focus cue, with the formulas as follows:
[formula images FDA0003482743780000053, FDA0003482743780000054 and FDA0003482743780000055: definition of λ_p]
wherein there are L candidate depths; F_i,min and C_i,min respectively denote the minimum cost of the focus cue and of the parallax cue of the point p over the different candidate depths; σ_f and σ_b are the corresponding parameters of the focus cue and of the parallax cue; F(p, i) and C(p, i) are respectively the degree of focus of the point p at depth i and the cost value of the parallax cue;
and depth correction is carried out comprehensively over the smooth regions, the edge regions and the other regions: the depth obtained from parallax is selected in the smooth regions, the depth obtained by focus detection is selected in the edge regions, and in the other regions the depth is determined by the cost-adaptive weighting method, i.e.:
Z'(p) = Z_b(p),  p ∈ I_smooth
Z'(p) = Z_f(p),  p ∈ I_edge
Z'(p) = λ_p·Z_f(p) + (1 − λ_p)·Z_b(p),  p ∈ Others
wherein Z'(p) is the fused depth value; Z_b(p) and Z_f(p) respectively denote the depth values obtained from the parallax cue and the focus cue; λ_p is the adaptive parameter; I_smooth, I_edge and Others respectively denote the smooth regions, the edges and the remaining regions;
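For illustration, the region-wise fusion rule can be sketched as below. Because the λ_p formulas are only available as claim images, λ_p is assumed here to be a Gaussian-weighted confidence ratio built from the per-pixel minimum costs of the two cues; the function name, the F_min/C_min inputs and the σ values are hypothetical stand-ins, while the region rule itself (parallax depth in smooth regions, focus depth at edges, adaptive blend elsewhere) follows the claim text:

```python
import numpy as np

def fuse_depths(Z_b, Z_f, F_min, C_min, smooth_mask, edge_mask,
                sigma_f=1.0, sigma_b=1.0):
    """Fuse the parallax-based depth Z_b and the focus-based depth Z_f.

    F_min, C_min: per-pixel minimum focus-cue / parallax-cue costs over the
                  candidate depths (assumed inputs for deriving lambda_p).
    smooth_mask, edge_mask: binary masks of the smooth regions and the edges.
    """
    # Confidence of each cue: a lower minimum cost means a more reliable cue
    # (assumed Gaussian weighting; the exact lambda_p formula is not reproduced).
    conf_f = np.exp(-(F_min ** 2) / (2 * sigma_f ** 2))
    conf_b = np.exp(-(C_min ** 2) / (2 * sigma_b ** 2))
    lam = conf_f / (conf_f + conf_b + 1e-12)        # weight of the focus cue

    fused = lam * Z_f + (1.0 - lam) * Z_b           # other regions: adaptive blend
    fused = np.where(smooth_mask == 1, Z_b, fused)  # smooth regions: parallax depth
    fused = np.where(edge_mask == 1, Z_f, fused)    # edge regions: focus depth
    return fused
```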
the depth map is further filtered using the formula
[formula image FDA0003482743780000061]
to denoise the obtained depth map.
2. The light-field-data-based hybrid depth estimation method according to claim 1, wherein the visualization of the depth map in step 6 is specifically as follows:
the final depth map obtained by depth grading is quantized, and the depth display range is adjusted into the range 0–255; the depth inverse quantization formula is as follows:
[formula image FDA0003482743780000062: inverse quantization of depth to 8-bit gray levels]
wherein z_max and z_min are respectively the maximum and minimum depth of field in the scene, z_p is the depth value of the pixel obtained from the parallax, and I_d is the corresponding 8-bit grayscale representation of the depth value.
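As an illustration of this inverse quantization, the sketch below maps the fused depth map linearly into 8-bit gray levels; the linear form and the choice of rendering nearer points brighter are assumptions, since the exact formula exists only as a claim image:

```python
import numpy as np

def depth_to_gray(depth):
    """Linearly map a metric depth map into the 0-255 display range
    (assumed mapping: nearer points appear brighter)."""
    z_min, z_max = float(depth.min()), float(depth.max())
    if z_max == z_min:
        return np.zeros_like(depth, dtype=np.uint8)
    I_d = 255.0 * (z_max - depth) / (z_max - z_min)
    return np.round(I_d).astype(np.uint8)
```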
CN201811151940.5A 2018-09-29 2018-09-29 Hybrid depth estimation method based on light field data Active CN109360235B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811151940.5A CN109360235B (en) 2018-09-29 2018-09-29 Hybrid depth estimation method based on light field data


Publications (2)

Publication Number Publication Date
CN109360235A CN109360235A (en) 2019-02-19
CN109360235B true CN109360235B (en) 2022-07-19

Family

ID=65348332

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811151940.5A Active CN109360235B (en) 2018-09-29 2018-09-29 Hybrid depth estimation method based on light field data

Country Status (1)

Country Link
CN (1) CN109360235B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110120071B (en) * 2019-05-15 2023-03-24 南京工程学院 Depth estimation method for light field image
CN110246172A (en) * 2019-06-18 2019-09-17 首都师范大学 A kind of the light field total focus image extraction method and system of the fusion of two kinds of Depth cues
CN110246162A (en) * 2019-06-20 2019-09-17 首都师范大学 A kind of total focus light field image composing method and system
CN111340715B (en) * 2019-09-19 2024-02-06 杭州海康慧影科技有限公司 Grid pattern weakening method and device of image and electronic equipment
CN110738628B (en) * 2019-10-15 2023-09-05 湖北工业大学 Adaptive focus detection multi-focus image fusion method based on WIML comparison graph
CN113034568B (en) * 2019-12-25 2024-03-29 杭州海康机器人股份有限公司 Machine vision depth estimation method, device and system
CN111652817B (en) * 2020-05-28 2023-08-22 大连海事大学 Underwater image sharpening method based on human eye visual perception mechanism
CN112288791A (en) * 2020-11-06 2021-01-29 浙江中控技术股份有限公司 Disparity map obtaining method, and fish-eye camera-based three-dimensional model obtaining method and device
CN112669355B (en) * 2021-01-05 2023-07-25 北京信息科技大学 Method and system for splicing and fusing focusing stack data based on RGB-D super pixel segmentation
CN113052886A (en) * 2021-04-09 2021-06-29 同济大学 Method for acquiring depth information of double TOF cameras by adopting binocular principle
CN113706466A (en) * 2021-07-26 2021-11-26 桂林电子科技大学 Shadow clue and parallax method image depth measuring method
CN113610908B (en) * 2021-07-29 2023-08-18 中山大学 Depth estimation method for multi-baseline fusion in monocular endoscopic surgery
CN114757985A (en) * 2022-04-15 2022-07-15 湖南工程学院 Binocular depth sensing device based on ZYNQ improved algorithm and image processing method
CN114881907B (en) * 2022-06-30 2022-09-23 江苏集萃苏科思科技有限公司 Optical microscopic image multi-depth-of-field focus synthesis method and system and image processing method


Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102065313B (en) * 2010-11-16 2012-10-31 上海大学 Uncalibrated multi-viewpoint image correction method for parallel camera array
US20130070060A1 (en) * 2011-09-19 2013-03-21 Pelican Imaging Corporation Systems and methods for determining depth from multiple views of a scene that include aliasing using hypothesized fusion
CN104469183B (en) * 2014-12-02 2015-10-28 东南大学 A kind of light field of X-ray scintillation body imaging system catches and post-processing approach
EP3220351A1 (en) * 2016-03-14 2017-09-20 Thomson Licensing Method and device for processing lightfield data
CN106340041B (en) * 2016-09-18 2018-12-25 杭州电子科技大学 It is a kind of to block the light-field camera depth estimation method for filtering out filter based on cascade
CN107452031B (en) * 2017-03-09 2020-06-26 叠境数字科技(上海)有限公司 Virtual ray tracking method and light field dynamic refocusing display system
CN108564620B (en) * 2018-03-27 2020-09-04 中国人民解放军国防科技大学 Scene depth estimation method for light field array camera
CN108406731B (en) * 2018-06-06 2023-06-13 珠海一微半导体股份有限公司 Positioning device, method and robot based on depth vision

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103220545A (en) * 2013-04-28 2013-07-24 上海大学 Hardware implementation method of stereoscopic video real-time depth estimation system
CN106257537A (en) * 2016-07-18 2016-12-28 浙江大学 A kind of spatial depth extracting method based on field information
CN107135388A (en) * 2017-05-27 2017-09-05 东南大学 A kind of depth extraction method of light field image
CN108596965A (en) * 2018-03-16 2018-09-28 天津大学 A kind of light field image depth estimation method

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
"Depth from Combining Defocus and Correspondence Using Light-Field Cameras";Michael W. Tao等;《2013 IEEE International Conference on Computer Vision》;20131231;第673-680页 *
"三维视频中基于FPGA的实时深度估计研究与应用";李贺建;《中国优秀博士学位论文全文数据集》;20160315(第3期);正文第63-65页第4.2节 *
"基于光场分析的多线索融合深度估计方法";杨德刚等;《计算机学报》;20151231;第38卷(第12期);第2442-2445页第4节 *
"聚焦性检测与彩色信息引导的光场图像深度提取";胡良梅等;《中国图象图形学报》;20160229;第21卷(第2期);第159页第2.2节 *

Also Published As

Publication number Publication date
CN109360235A (en) 2019-02-19

Similar Documents

Publication Publication Date Title
CN109360235B (en) Hybrid depth estimation method based on light field data
JP6561216B2 (en) Generating intermediate views using optical flow
US10699476B2 (en) Generating a merged, fused three-dimensional point cloud based on captured images of a scene
CN107430782B (en) Method for full parallax compressed light field synthesis using depth information
TWI524734B (en) Method and device for generating a depth map
RU2382406C1 (en) Method of improving disparity map and device for realising said method
KR102464523B1 (en) Method and apparatus for processing image property maps
JP2015188234A (en) Depth estimation based on global motion
CN110352592B (en) Image forming apparatus and image forming method
WO2011014419A1 (en) Methods, systems, and computer-readable storage media for creating three-dimensional (3d) images of a scene
US8867826B2 (en) Disparity estimation for misaligned stereo image pairs
EP2382791A1 (en) Depth and video co-processing
Lee et al. Generation of multi-view video using a fusion camera system for 3D displays
WO2019122205A1 (en) Method and apparatus for generating a three-dimensional model
KR20130112311A (en) Apparatus and method for reconstructing dense three dimension image
CN107750370A (en) For the method and apparatus for the depth map for determining image
Sharma et al. A flexible architecture for multi-view 3DTV based on uncalibrated cameras
Jung A modified model of the just noticeable depth difference and its application to depth sensation enhancement
CN112102347B (en) Step detection and single-stage step height estimation method based on binocular vision
Seitner et al. Trifocal system for high-quality inter-camera mapping and virtual view synthesis
Orozco et al. HDR multiview image sequence generation: Toward 3D HDR video
JP2013200840A (en) Video processing device, video processing method, video processing program, and video display device
Jung et al. All-in-focus and multi-focus color image reconstruction from a database of color and depth image pairs
Kang et al. Generation of multi-view images using stereo and time-of-flight depth cameras
Liu Improving forward mapping and disocclusion inpainting algorithms for depth-image-based rendering and geomatics applications

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant