CN105551050B - An image depth estimation method based on a light field
- Publication number
- CN105551050B · CN105551050A · Application CN201511019609.4A
- Authority
- CN
- China
- Prior art keywords
- viewpoint
- depth estimation
- pixel
- light
- light field
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Image Processing (AREA)
Abstract
The present invention relates to a new image depth estimation method based on a light field. The method consists of three main parts: a light field raw data viewpoint extraction method, a depth estimation algorithm based on block matching, and a depth optimization algorithm based on salient feature constraints. The proposed light field raw data viewpoint extraction method performs viewpoint separation on light field data that has not been demosaiced. The block-matching based depth estimation algorithm measures similarity only between corresponding blocks, recording the same light color, of viewpoint pairs in the same row or the same column as the central viewpoint. To refine the depth estimation result, the invention proposes a depth optimization algorithm based on salient feature constraints, which extracts and matches salient feature points and uses their parallax as a strong constraint. The method avoids the viewpoint confusion caused by the interpolation process and improves depth estimation accuracy.
Description
Technical Field
The invention relates to an image depth extraction method, in particular to an image depth estimation method based on a light field.
Background
Image depth estimation is an essential topic in the field of computer vision. Depth refers to the distance from a point in the scene to the camera plane. If scene depth could be accurately recovered from a captured image, many computer vision applications would benefit. In recent years, the advent of light field cameras has brought a new solution for image depth estimation. Compared with a traditional camera, a light field camera places a microlens array in front of the sensor, so that the position and angle of every light ray reaching the imaging surface can be recorded in a single exposure. This fully describes the four-dimensional light field and enables applications such as depth estimation, scene refocusing, and viewpoint changing in subsequent processing.
At present, several light-field-based depth estimation methods have been proposed and achieve good results, but problems remain. For example, mainstream light field depth estimation methods focus on depth estimation from viewpoint-separated light field data, and the viewpoint separation is performed on demosaiced light field data; however, the confusion between viewpoints introduced during demosaicing cannot be removed in subsequent processing, which greatly restricts depth estimation accuracy. In addition, some algorithms compute confidences for the depth estimates empirically, introducing human influence into the optimization process.
Disclosure of Invention
The invention aims to provide an image depth estimation method based on a light field, which avoids viewpoint confusion caused by an interpolation process and improves depth estimation accuracy.
To this end, the light-field-based image depth estimation method of the present invention comprises the following steps. S1, light field raw data viewpoint extraction: acquire images of a scene under different viewing angles, perform viewpoint separation on light field data that has not been demosaiced according to the estimated center positions of the sub-images, and interpolate the information of missing pixels. S2, depth estimation based on block matching: perform similarity measurement on viewpoint pairs that lie in the same row or column as the central viewpoint and record the same light color, and find the depth that maximizes the similarity.
Preferably, the light-field-based image depth estimation method of the present invention further comprises the following step. S3, depth optimization based on salient feature constraints: extract and match salient feature points, and optimize the depth estimate using the estimated parallax as a strong constraint condition.
The method has the advantage that, since no demosaicing is performed before viewpoint separation, viewpoint confusion is avoided and depth estimation accuracy is therefore improved.
Furthermore, by adopting a depth optimization algorithm based on salient feature constraints, the matches of the extracted salient feature points across different viewpoints are added to the optimization objective function as strong constraints, further improving depth estimation accuracy.
Drawings
Fig. 1 is a flowchart illustrating an image depth estimation method based on a light field according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of extracting viewpoints from original data according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating filling of blank pixel locations by one-dimensional interpolation according to an embodiment of the present invention.
Detailed Description
FIG. 1 shows the framework of an embodiment of the method of the present invention, which comprises three parts. First, the light field raw data viewpoint extraction method acquires images of a scene under different viewing angles, performs viewpoint separation on light field data that has not been demosaiced according to the estimated center positions of the sub-images, and then interpolates the information of missing pixels. Second, the block-matching based depth estimation algorithm measures the similarity of corresponding blocks that lie in viewpoints in the same row or column as the central viewpoint and record the same light color, and finds the depth that maximizes the similarity. Third, the depth optimization algorithm based on salient feature constraints extracts and matches salient feature points, and optimizes the depth estimate using the estimated parallax as a strong constraint condition.
The basic principle of the embodiment of the present invention is illustrated below by mathematically modeling the light field raw data viewpoint extraction method, the block-matching based depth estimation algorithm, and the depth optimization algorithm based on salient feature constraints.
1. Light field original data viewpoint extraction
Viewpoint extraction selects and rearranges pixel positions in the light field raw image, forming a viewpoint map from pixels that belong to the same shooting angle and thereby producing multiple viewpoints. Each viewpoint map is an image of the scene viewed from a different angle. The image covered by a single microlens on the sensor is called a sub-image. Each pixel position within a sub-image corresponds to a different angular resolution. The image composed of the pixels located at the same position relative to the sub-image center in all sub-images is selected and referred to as a viewpoint.
Estimating the center of the sub-image:
To cover the sensor plane effectively, microlenses of different shapes are arranged differently and are not necessarily aligned with a rectangular coordinate system. The origin of the coordinate system formed by the sub-image centers tends to deviate from the origin of the sensor coordinate system by a certain amount, and the area covered by a sub-image does not necessarily span a whole number of pixels. Therefore, in order to extract the viewpoints effectively, the position of each sub-image center in the sensor coordinate system needs to be estimated in advance.
Sensor pixels are generally arranged on a rectangular grid, denoted the C coordinate system, with its origin at the top-left pixel position; the two-dimensional coordinate system of the microlens array is denoted the M coordinate system, with its origin at the top-left microlens position. Let x and m be coordinates in the C and M coordinate systems, respectively; the relationship between the two is:
x=Tm+o (1)
where o denotes the translation vector between the origins of the two coordinate systems, and T denotes a transformation matrix obtained as the product of a shear transformation matrix T_1, a scaling transformation matrix T_2, and a rotation transformation matrix T_3:
T = T_1 T_2 T_3 (2)
In practice, the invention uses the light field camera to photograph a pure white diffuse-reflection object; the image is convolved with a circular spot mask, and the local maxima of the result give the coordinates x_i of the sub-image centers in the C coordinate system. Given x_i and the microlens coordinates m_i, multiple groups of equations x_i = T m_i + o are obtained, from which T and o can be estimated by least squares. The final sub-image centers are then obtained by:
c_i = round(T m_i + o) (3)
where round(·) denotes rounding to the nearest integer.
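As an illustration of this step, the following is a minimal NumPy sketch of the least-squares fit of T and o from detected sub-image centers and microlens indices. The function names (`fit_lens_grid`, `subimage_centers`) and the example coordinates are illustrative assumptions, not part of the patent; the detection of centers from the white image is omitted.

```python
import numpy as np

def fit_lens_grid(m, x):
    """Least-squares fit of x_i = T @ m_i + o from corresponding pairs.

    m : (N, 2) microlens indices in the M coordinate system
    x : (N, 2) detected sub-image centers in the sensor (C) coordinate system
    Returns (T, o) with T a 2x2 matrix and o a 2-vector.
    """
    A = np.hstack([m, np.ones((len(m), 1))])     # design matrix [m | 1], shape (N, 3)
    P, *_ = np.linalg.lstsq(A, x, rcond=None)    # P is (3, 2): first two rows are T^T, last row is o
    return P[:2].T, P[2]

def subimage_centers(m, T, o):
    """Rounded sub-image centers c_i = round(T @ m_i + o), as in equation (3)."""
    return np.rint(m @ T.T + o).astype(int)

# Usage sketch with made-up detections (centers would come from correlating a
# white image with a circular spot mask and taking local maxima).
m = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
x = np.array([[5.1, 4.9], [5.0, 14.8], [15.2, 5.1], [15.1, 15.0]])
T, o = fit_lens_grid(m, x)
print(subimage_centers(m, T, o))
```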
Pixel rearrangement:
To achieve viewpoint separation, pixels in the light field raw image need to be selected and rearranged. Specifically, the pixels at the same position (angular resolution) in each sub-image are selected while maintaining their relative positional relationship (if no pixel exists at a certain position, it is replaced by a blank), forming a viewpoint image. Fig. 2 shows the extraction of two viewpoints, 1 and 2, from raw data, taking a Lytro camera, a small consumer light field camera, as an example.
Because a color filter array is used in the light field camera, each pixel actually records light of only one color (one of red, green, and blue). Normally, the two color components missing at each pixel are recovered by demosaicing, i.e., interpolation reconstruction, to produce a full-color image; however, demosaicing the light field raw data introduces viewpoint confusion. If demosaicing were instead performed after separating the viewpoints, the information of different viewpoints would not be mixed, but the color filter pattern of each viewpoint may differ and existing demosaicing algorithms are not suitable, especially in high-frequency regions. Therefore, the invention extracts viewpoints directly from the light field raw data and fills the blank pixel positions in the separated viewpoints by interpolation. Taking one-dimensional linear interpolation as an example: if the two positions adjacent to a blank pixel in the same row record light of the same color, the blank pixel records that color and its response value is the average of the two neighbors; if the two adjacent positions record different light colors, then for simplicity the blank pixel takes the same color information and response value as the left pixel, see fig. 3.
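The following is a simplified sketch of this extraction and blank-filling procedure, assuming the sub-image centers have already been estimated and are stored as (row, column) sensor coordinates. The names `extract_viewpoint` and `fill_blanks_1d` are illustrative, not the patent's implementation.

```python
import numpy as np

def extract_viewpoint(raw, cfa, centers, offset):
    """Collect, for every microlens, the raw pixel at `offset` from its center.

    raw     : 2-D mosaicked sensor image (no demosaicing applied)
    cfa     : array of the same shape giving each pixel's color (0=R, 1=G, 2=B)
    centers : (rows, cols, 2) integer sub-image centers, stored as (row, column)
    offset  : (dy, dx) angular position inside each sub-image
    Returns the viewpoint image and its per-pixel color map (NaN / -1 where the
    sensor position falls outside the raw image).
    """
    rows, cols, _ = centers.shape
    view = np.full((rows, cols), np.nan)
    color = np.full((rows, cols), -1, dtype=int)
    for i in range(rows):
        for j in range(cols):
            y, x = centers[i, j] + np.asarray(offset)
            if 0 <= y < raw.shape[0] and 0 <= x < raw.shape[1]:
                view[i, j] = raw[y, x]
                color[i, j] = cfa[y, x]
    return view, color

def fill_blanks_1d(view, color):
    """One-dimensional filling of blank positions, row by row: average the two
    horizontal neighbours when they record the same color, otherwise copy the
    left neighbour's value and color."""
    out, col = view.copy(), color.copy()
    for i in range(out.shape[0]):
        for j in range(1, out.shape[1] - 1):
            if np.isnan(out[i, j]):
                left, right = out[i, j - 1], out[i, j + 1]
                if np.isnan(left) or np.isnan(right):
                    continue
                if col[i, j - 1] == col[i, j + 1]:
                    out[i, j], col[i, j] = 0.5 * (left + right), col[i, j - 1]
                else:
                    out[i, j], col[i, j] = left, col[i, j - 1]
    return out, col
```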
2. Depth estimation based on block matching
The viewpoint images are combined into a viewpoint matrix. Since viewpoints in the same row or column exhibit parallax only in the horizontal or vertical direction, the epipolar constraint is satisfied. The baseline between two adjacent microlenses of the light field camera is very small, so occlusion between viewpoints is negligible. Therefore, for a pixel under the central viewpoint, its corresponding pixel position in a same-row viewpoint is only shifted horizontally by a certain distance from its position in the central viewpoint; similarly, its corresponding pixel position in a same-column viewpoint is only shifted vertically by a certain distance.
Since the color filter pattern of each viewpoint may differ, the block-matching based depth estimation proposed by the present invention performs similarity measurement only on blocks that record the same light color. Suppose I denotes the central viewpoint, I_p and I_q denote viewpoints belonging to the same row as the central viewpoint, and b_p and b_q denote the baseline distances between the microlenses of I_p and I_q, respectively, and the microlens of the central viewpoint. For the block B_0 centered on pixel (x_0, y_0) of the central viewpoint, the similarity of its corresponding blocks in viewpoints I_p and I_q can be measured by a weighted sum of squared differences (WSSD), written as a cost function of the disparity d:

cost_p,q(x_0, y_0, d) = Σ_{(x,y)∈B_0} w(x_p, x_q, y) · ||I_p(x_p, y) − I_q(x_q, y)||² (4)

where

x_p := x + b_p d
x_q := x + b_q d (5)

||I_p(x_p, y) − I_q(x_q, y)||² denotes the squared Euclidean distance in RGB space between the corresponding pixels under the two viewpoints, and w(·) denotes a weighting factor:

w(x_p, x_q, y) = G_0(x, y) · S(x_p, x_q, y) (6)

G_0(x, y) denotes a Gaussian function centered on (x_0, y_0), and S(x_p, x_q, y) indicates whether the two pixels record the same light color (one of R, G, B).

In the same way, the cost functions for viewpoint pairs I_p, I_q belonging to the same column as the central viewpoint can be obtained.
The invention estimates the parallax of each pixel of the central viewpoint under different viewpoint pairs, using the multiple viewpoints extracted from the light field camera. Let Π denote the set of viewpoint pairs lying in the same row or the same column as the central viewpoint. Since the edge area of the main lens of the light field camera receives less light and is darker, only viewpoint pairs drawn from the 7×7 viewpoint matrix close to the center are considered.
Thus, the parallax of pixel (x_0, y_0) under the central viewpoint is:

d(x_0, y_0) = Med_{(p,q)∈Π} ( argmin_d cost_p,q(x_0, y_0, d) ) (7)

where Med denotes the median filter. The median filter removes noise and yields a more stable parallax estimation result.
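The following is a minimal sketch of equations (4)-(7) for a single pixel, assuming the viewpoints have already been extracted as single-channel images with per-pixel color maps (as in the extraction sketch above). The function names, block size, and Gaussian width are illustrative assumptions.

```python
import numpy as np

def wssd_cost(Ip, Iq, Cp, Cq, x0, y0, d, bp, bq, half=3, sigma=2.0):
    """Weighted SSD between corresponding blocks (in the spirit of eqs. (4)-(6)).

    Ip, Iq : single-channel viewpoint images extracted from the raw data
    Cp, Cq : per-pixel color maps (0=R, 1=G, 2=B) of the two viewpoints
    bp, bq : baselines of the two viewpoints relative to the central one
    Only pixel pairs recording the same light color contribute (indicator S),
    weighted by a Gaussian centered on the block center (x0, y0).
    """
    h, w = Ip.shape
    cost = 0.0
    for y in range(y0 - half, y0 + half + 1):
        for x in range(x0 - half, x0 + half + 1):
            xp = int(round(x + bp * d))
            xq = int(round(x + bq * d))
            if not (0 <= y < h and 0 <= xp < w and 0 <= xq < w):
                continue
            if Cp[y, xp] != Cq[y, xq]:          # indicator S: only same-color pixels are compared
                continue
            g = np.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2 * sigma ** 2))  # Gaussian weight G_0
            diff = float(Ip[y, xp]) - float(Iq[y, xq])
            cost += g * diff * diff
    return cost

def disparity_at(views, colors, baselines, pairs, x0, y0, d_candidates):
    """Eq. (7): median, over viewpoint pairs, of the cost-minimizing disparity."""
    per_pair = []
    for p, q in pairs:                          # same-row / same-column viewpoint index pairs
        costs = [wssd_cost(views[p], views[q], colors[p], colors[q],
                           x0, y0, d, baselines[p], baselines[q]) for d in d_candidates]
        per_pair.append(d_candidates[int(np.argmin(costs))])
    return float(np.median(per_pair))
```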
3. Depth optimization based on significant feature constraints
In order to obtain a more accurate depth estimation result, the method adds a salient-feature-point matching constraint to the cost function. The SIFT algorithm is used to extract and match features between the central viewpoint I and a viewpoint I_p. If a feature point at position (x_0, y_0) under the central viewpoint has its corresponding point in I_p at position (x_p, y_p), the positional deviation is Δ = (x_p − x_0, y_p − y_0). Knowing that the angular coordinate of viewpoint I is (0, 0), the angular coordinate of I_p is s_p = (u_p, v_p), the positional deviation Δ, and the microlens diameter k, the parallax d_c at this position can be calculated from:

Δ = d_c · k · s_p (9)
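A possible sketch of this feature-matching step is given below, using OpenCV's SIFT implementation (available as `cv2.SIFT_create` in recent opencv-python builds). The function name, the Lowe ratio test, and the scalar least-squares inversion of equation (9) are assumptions made for illustration, not the patent's own implementation.

```python
import cv2
import numpy as np

def feature_disparities(center_view, other_view, s_p, k, ratio=0.75):
    """Match SIFT features between the central viewpoint and a viewpoint I_p and
    derive, for each match, the parallax d_c from Delta = d_c * k * s_p (eq. (9)).

    center_view, other_view : 8-bit grayscale viewpoint images
    s_p : (u_p, v_p), angular coordinate of I_p relative to the central viewpoint
    k   : microlens diameter in pixels
    Returns a list of ((x0, y0), d_c) tuples usable as strong constraints.
    """
    sift = cv2.SIFT_create()
    kp0, des0 = sift.detectAndCompute(center_view, None)
    kp1, des1 = sift.detectAndCompute(other_view, None)
    matches = cv2.BFMatcher(cv2.NORM_L2).knnMatch(des0, des1, k=2)

    s = np.asarray(s_p, dtype=float)
    constraints = []
    for pair in matches:
        if len(pair) < 2:
            continue
        m, n = pair
        if m.distance >= ratio * n.distance:    # Lowe's ratio test to keep only reliable matches
            continue
        p0 = np.array(kp0[m.queryIdx].pt)       # feature position (x0, y0) in the central viewpoint
        p1 = np.array(kp1[m.trainIdx].pt)       # corresponding position (xp, yp) in I_p
        delta = p1 - p0                         # positional deviation Delta
        # Scalar least-squares solution of delta = d_c * k * s for d_c.
        d_c = float(delta @ s) / (k * float(s @ s))
        constraints.append((tuple(p0), d_c))
    return constraints
```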
The invention uses the viewpoint pair set Π obtained in the block-matching based depth estimation. For any viewpoint pair I_p and I_q in Π, the salient feature constraint term is conf_p,q(x_0, y_0, d), where d_p^c and d_q^c respectively denote the parallax between the feature point (x_0, y_0) under the central viewpoint and its corresponding feature points under viewpoints I_p and I_q, M denotes the set of points detected by the SIFT operator as salient features in the central viewpoint for which corresponding salient features can be found under viewpoint I_p, and N denotes the set of points detected by the SIFT operator as salient features in the central viewpoint for which corresponding salient features can be found under viewpoint I_q.
The final parallax optimization function is obtained as:

d(x_0, y_0) = Med_{(p,q)∈Π} ( argmin_d [ cost_p,q(x_0, y_0, d) + conf_p,q(x_0, y_0, d) ] ) (10)

From the relation between disparity and depth, the final disparity can be converted to depth, i.e.:

Depth(x_0, y_0) = r / d(x_0, y_0) (11)

where r denotes a constant related to the camera parameters.
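A trivial sketch of this final conversion, assuming the inverse relation Depth = r / d indicated above; the function name and the zero-guard are illustrative.

```python
import numpy as np

def disparity_to_depth(disparity, r, eps=1e-6):
    """Convert a disparity map to depth under the inverse relation Depth = r / d.

    r is the camera-dependent constant from equation (11), obtained from
    calibration; eps guards against division by (near-)zero disparities.
    """
    d = np.asarray(disparity, dtype=float)
    return r / np.where(np.abs(d) < eps, eps, d)
```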
The embodiment of the invention has the following beneficial effects:
1. A light field raw data viewpoint extraction algorithm is provided; by estimating the sub-image centers and rearranging pixels without demosaicing before viewpoint separation, viewpoint confusion is avoided.
2. A depth optimization algorithm based on salient feature constraints is provided; the matches of the extracted salient feature points across different viewpoints are added to the optimization objective function as strong constraints, improving depth estimation accuracy.
The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications, equivalents, and alternatives made by using the contents of the present invention and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.
Claims (5)
1. An image depth estimation method based on a light field, characterized by comprising the following steps: S1, light field raw data viewpoint extraction: a light field camera acquires images of a scene under different viewing angles through a microlens array; the image covered on the sensor by a single microlens of the light field camera's microlens array is called a sub-image; according to the estimated center positions of the sub-images, viewpoints are extracted directly from the light field raw data that has not been demosaiced to perform viewpoint separation, and the information of missing pixels is then interpolated; S2, depth estimation based on block matching: for viewpoint pairs consisting of viewpoints in the same row or the same column as the central viewpoint, similarity measurement is performed on corresponding pixel blocks that record the same light color, and the depth that maximizes the similarity is found;
wherein the step S1 includes: the light field original data viewpoint extraction comprises the following steps: s11, estimating the center of the sub-image: estimating the corresponding position of the center of each sub-image on a sensor coordinate system; s12, pixel rearrangement: selecting pixels at the same position or the same angular resolution from the sub-images, keeping the relative position relationship among the pixels, if no pixel exists at a certain position, replacing the pixels with blanks to form a viewpoint image, and filling the blank pixel positions in the separated viewpoints by using an interpolation method.
2. The light-field-based image depth estimation method of claim 1, further comprising the steps of: s3, depth optimization based on the significant feature constraint: extracting and matching the significant feature points, and optimizing depth estimation by taking the estimated parallax as a strong constraint condition;
in step S3, the final parallax optimization function is obtained as:

d(x_0, y_0) = Med_{(p,q)∈Π} ( argmin_d [ cost_p,q(x_0, y_0, d) + conf_p,q(x_0, y_0, d) ] )

wherein d denotes the parallax, d(x_0, y_0) denotes the parallax of pixel (x_0, y_0) under the central viewpoint, Π denotes the set of viewpoint pairs in the same row or the same column as the central viewpoint, p and q belong to a viewpoint pair in Π, conf_p,q(x_0, y_0, d) is the salient feature constraint term, and cost_p,q(x_0, y_0, d) is the cost function of the parallax d;
wherein the salient feature constraint term is determined as follows: using the viewpoint pair set Π obtained in the block-matching based depth estimation, for any pair p and q in Π the salient feature constraint term conf_p,q(x_0, y_0, d) is defined, wherein d_p^c and d_q^c respectively denote the parallax between the feature point (x_0, y_0) under the central viewpoint and the corresponding feature points under the p and q viewpoints, M denotes the set of points detected by the SIFT operator as salient features in the central viewpoint for which corresponding salient features can be found under the p viewpoint, and N denotes the set of points detected by the SIFT operator as salient features in the central viewpoint for which corresponding salient features can be found under the q viewpoint.
3. The light-field-based image depth estimation method according to claim 1 or 2, characterized in that: in step S11, the center of each sub-image is obtained by:

c_i = round(T m_i + o)

wherein the sensor pixels are arranged on a rectangular grid denoted the C coordinate system, the two-dimensional coordinate system of the microlens array is denoted the M coordinate system, o denotes the translation vector between the origins of the C and M coordinate systems, T denotes a transformation matrix, m_i denotes the microlens coordinates, and round(·) denotes rounding.
4. The light-field-based image depth estimation method of claim 1, wherein: the filling of the blank pixel position by using the interpolation method comprises the following steps: if two adjacent positions in the same row of the blank pixel are pixels for recording the same color light, the blank pixel records the same color light, and the response value is the average value of the two adjacent positions; if the color of the light recorded at two adjacent positions in the same row of the blank pixel is different, the color information and the response value recorded by the blank pixel are the same as those recorded by the left pixel.
5. The light-field-based image depth estimation method according to claim 1 or 2, characterized in that: in step S2, the depth value is given by:

Depth(x_0, y_0) = r / d(x_0, y_0)

wherein d(x_0, y_0) denotes the parallax of pixel (x_0, y_0) under the central viewpoint, depth and parallax d are inversely related, and r denotes a constant related to the camera parameters,

wherein

d(x_0, y_0) = Med_{(p,q)∈Π} ( argmin_d cost_p,q(x_0, y_0, d) )

Med denotes the median filter, (x_0, y_0) denotes the pixel coordinates, cost_p,q(x_0, y_0, d) is the cost function of the parallax d, Π denotes the set of viewpoint pairs in the same row or the same column as the central viewpoint, and p and q belong to a viewpoint pair in Π.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201511019609.4A CN105551050B (en) | 2015-12-29 | 2015-12-29 | A kind of image depth estimation method based on light field |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105551050A CN105551050A (en) | 2016-05-04 |
CN105551050B true CN105551050B (en) | 2018-07-17 |
Family
ID=55830226
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201511019609.4A Active CN105551050B (en) | 2015-12-29 | 2015-12-29 | A kind of image depth estimation method based on light field |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105551050B (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107454377B (en) * | 2016-05-31 | 2019-08-02 | 深圳市微付充科技有限公司 | A kind of algorithm and system carrying out three-dimensional imaging using camera |
CN106384338B (en) * | 2016-09-13 | 2019-03-15 | 清华大学深圳研究生院 | A kind of Enhancement Method based on morphologic light field depth image |
CN106373152B (en) | 2016-09-18 | 2019-02-01 | 清华大学深圳研究生院 | A kind of method for estimating distance based on hand-held light-field camera |
CN107135388A (en) * | 2017-05-27 | 2017-09-05 | 东南大学 | A kind of depth extraction method of light field image |
CN107330930B (en) * | 2017-06-27 | 2020-11-03 | 晋江市潮波光电科技有限公司 | Three-dimensional image depth information extraction method |
CN108090920B (en) * | 2017-12-14 | 2021-11-30 | 浙江工商大学 | Light field image depth stream estimation method |
CN108074218B (en) * | 2017-12-29 | 2021-02-23 | 清华大学 | Image super-resolution method and device based on light field acquisition device |
CN109993764B (en) * | 2019-04-03 | 2021-02-19 | 清华大学深圳研究生院 | Light field depth estimation method based on frequency domain energy distribution |
CN117522939B (en) * | 2024-01-04 | 2024-03-19 | 电子科技大学 | Monocular list Zhang Mohu image depth calculation method |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102982545A (en) * | 2012-11-22 | 2013-03-20 | 清华大学深圳研究生院 | Image depth estimation method |
CN104598744A (en) * | 2015-01-27 | 2015-05-06 | 北京工业大学 | Depth estimation method based on optical field |
CN104966289A (en) * | 2015-06-12 | 2015-10-07 | 北京工业大学 | Depth estimation method based on 4D light field |
Non-Patent Citations (1)
Title |
---|
- LIGHT FIELD DEPTH ESTIMATION EXPLOITING LINEAR STRUCTURE IN EPI; Huijin Lv et al.; Multimedia & Expo Workshops (ICMEW), 2015 IEEE International Conference on; 2015-07-30; pp. 1-6 *
Also Published As
Publication number | Publication date |
---|---|
CN105551050A (en) | 2016-05-04 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |