CN112465796A

CN112465796A - Light field feature extraction method fusing focus stack and full-focus image

Info

Publication number: CN112465796A
Application number: CN202011432055.1A
Authority: CN
Inventors: 金欣; 周思瑶
Original assignee: Shenzhen International Graduate School of Tsinghua University
Current assignee: Shenzhen International Graduate School of Tsinghua University
Priority date: 2020-12-07
Filing date: 2020-12-07
Publication date: 2021-03-09
Anticipated expiration: 2040-12-07
Also published as: CN112465796B

Abstract

The invention discloses a light field characteristic extraction method for fusing a focus stack and a full focus image, which comprises the following steps of: a1: inputting light field data, decoding and preprocessing the light field data to obtain a light field sub-view image array, and obtaining a focus stack and a full focus image at a plurality of view positions according to the light field sub-view image array; a2: respectively cascading the focus stacks and the full-focus images at a plurality of view angle positions to obtain an image set, and generating a Gaussian difference pyramid according to the image set; a3: and searching local extreme points in the Gaussian difference pyramid as feature point positions, and generating corresponding feature point descriptors. The light field feature extraction method provided by the invention can extract the four-dimensional light field feature points with robust depth and scale.

Description

Light field feature extraction method fusing focus stack and full-focus image

Technical Field

The invention relates to the field of computer vision and digital image processing, in particular to a light field characteristic extraction method for fusing a focus stack and a full-focus image.

Background

The point feature of the image is a sparse vector combination, can represent the self feature of the image which is different from other images, and is the beginning of the computer to identify and understand the image. The point features are also used for finding corresponding positions in different images, so that the method is applied to the fields of image splicing, three-dimensional reconstruction, SLAM and the like.

A micro-lens array is arranged in front of an image sensor of the handheld light field camera, and the spatial position and direction information of scene light rays can be recorded simultaneously in one-time shooting, so that assistance can be provided for research in the fields of image refocusing, depth map estimation, virtual reality and the like. A light field is typically parameterized as a combination of two-dimensional spatial information and two-dimensional angular information, described as a four-dimensional light field. Because the light field has four dimensions, and the common image only comprises two dimensions, the four-dimensional light field is difficult to be directly described through the two-dimensional image, and the traditional two-dimensional image feature extraction method does not consider image angle information and cannot completely represent high-dimensional information of the light field.

The above background disclosure is only for the purpose of assisting understanding of the concept and technical solution of the present invention and does not necessarily belong to the prior art of the present patent application, and should not be used for evaluating the novelty and inventive step of the present application in the case that there is no clear evidence that the above content is disclosed at the filing date of the present patent application.

Disclosure of Invention

In order to solve the technical problems, the invention provides a light field feature extraction method for fusing a focus stack and a full-focus image, which can extract four-dimensional light field feature points with robust depth and scale.

In order to achieve the purpose, the invention adopts the following technical scheme:

one embodiment of the invention discloses a light field feature extraction method for fusing a focus stack and a full focus image, which comprises the following steps:

a1: inputting light field data, decoding and preprocessing the light field data to obtain a light field sub-view image array, and obtaining a focus stack and a full focus image at a plurality of view positions according to the light field sub-view image array;

a2: respectively cascading the focus stacks and the full-focus images at a plurality of view angle positions to obtain an image set, and generating a Gaussian difference pyramid according to the image set;

a3: and searching local extreme points in the Gaussian difference pyramid as feature point positions, and generating corresponding feature point descriptors.

Preferably, the step a1 of obtaining the focal stack and the fully focused image at the multiple viewing angle positions according to the optical field sub-viewing angle image array specifically includes: and extracting the light field self-viewing angle image at the diagonal viewing angle in the light field sub-viewing angle image array, and generating the focus stack and the full focus image at the diagonal viewing angle according to the light field self-viewing angle image at the diagonal viewing angle.

Preferably, the focal stack at the diagonal view angle generated is:

wherein the content of the first and second substances,

is a coordinate in the angular domain of (u)₀,v₀) λ is a refocusing coefficient, L (u, v, s, t) represents the decoded lightfield from the lightfield data, (u, v) is an angular domain coordinate, (s, t) is a spatial domain coordinate, and U, V is the number of rows and columns of the lightfield sub-view image array;

preferably, the generated fully focused image is:

wherein the content of the first and second substances,

is a coordinate in the angular domain of (u)₀,v₀) Depth is the light field Depth map index.

Preferably, the step a2 of respectively cascading the focal stack and the fully focused image at a plurality of view angle positions to obtain an image set specifically includes: cascading the focal stack and the fully focused image at a plurality of view positions, respectively, using:

wherein the content of the first and second substances,

is a coordinate in the angular domain of (u)₀,v₀) The set of images at the viewing angle of (c),

is a coordinate in the angular domain of (u)₀,v₀) The focal stack at the viewing angle of (a),

is a coordinate in the angular domain of (u)₀,v₀) Is used to obtain a fully focused image at the viewing angle.

Preferably, the step a2 of generating the gaussian difference pyramid according to the image set specifically includes: constructing a gaussian difference pyramid using:

wherein the content of the first and second substances,

is a coordinate in the angular domain of (u)₀,v₀) The gaussian difference pyramid at the view angle of (a),

is a coordinate in the angular domain of (u)₀,v₀) The scale space of the image set at the view angle of (a);

further, the image set is subjected to fuzzy and downsampling processing by adopting a Gaussian function, and an angle domain coordinate (u) is obtained₀,v₀) Of the image set at the viewing angle

Expressed as:

wherein G (s, t, σ)_i) Is a scale-variable gaussian function and is,

is a coordinate in the angular domain of (u)₀,v₀) A set of images at a viewing angle.

Preferably, the scale-variable Gaussian function G (s, t, σ)_i) Expressed as:

where (s, t) is the spatial domain coordinate, σ_iIs a scale;

further, the scale σ_iExpressed as: sigma_i+1＝kσ_iI is more than or equal to 1 and less than or equal to N, N is the number of scales in the Gaussian scale space, and k is a parameter more than 0.

Preferably, the searching for the local extreme point in the gaussian difference pyramid as the feature point position in step a3 specifically includes:

calculating a local Hessian matrix of the Gaussian difference pyramid:

wherein the content of the first and second substances,

(s, t) is a space domain coordinate, and D is the Gaussian difference pyramid;

and when the local Hessian matrix takes the local maximum value, judging that the current point is the characteristic point, and taking the position of the current point as the position of the characteristic point.

Preferably, the step a3 of generating the corresponding feature point descriptor specifically includes: and partitioning the surrounding space region of the positions of the feature points, and calculating gradient histograms in 8 directions in a 4 x 4 window to obtain a 128-dimensional vector characterization descriptor.

Another embodiment of the present invention discloses a computer-readable storage medium storing computer-executable instructions that, when invoked and executed by a processor, cause the processor to implement the steps of the light field feature extraction method described above.

Compared with the prior art, the invention has the beneficial effects that: according to the light field characteristic extraction method for fusing the focus stack and the full-focus image, provided by the invention, the characteristic that a light field camera takes pictures first and then focuses is fully utilized, and the spatial information and the angle information which are acquired simultaneously and the relation between the spatial information and the angle information are combined to generate a group of sparse focus stack and full-focus image and calculate the extreme point in the Gaussian difference pyramid space corresponding to the cascade image set of the focus stack and the full-focus image, so that the four-dimensional light field characteristic point with robust depth and scale and high accuracy is obtained, and the application of the light field in the fields of panoramic stitching, SLAM and the like is greatly enriched.

Drawings

FIG. 1 is a flow chart of a light field feature extraction method fusing a focal stack and a fully focused image according to a preferred embodiment of the invention.

Detailed Description

In particular embodiments, when performing the above steps, the following may be followed. It should be noted that the specific methods employed in the practice are merely illustrative, and the scope of the present invention includes, but is not limited to, the following methods. The invention will be further described with reference to the accompanying drawings and preferred embodiments.

Scene information contained in the four-dimensional light field is far larger than that of a common image, and if the feature points are directly extracted, the calculation amount is huge. The three-dimensional focal stack is obtained by reducing the dimension of the light field, is a series of images focused on different planes, and can represent the angle information and the spatial information of the light field. From the focal stack, a fully focused image can be generated by stitching, clearly characterizing the texture information of the light field. The preferred embodiment of the invention is to fully utilize the information of the light field space domain and the angle domain to accurately extract the light field characteristics, and utilize the focal stack and the full-focus image to extract the light field characteristic points.

As shown in fig. 1, the preferred embodiment of the present invention discloses a light field feature extraction method for fusing a focal stack and a full-focus image, which makes full use of information of a light field spatial domain and an angular domain to accurately extract a four-dimensional light field feature point with a robust view angle, and includes the following steps:

a1: inputting light field data, decoding and preprocessing the light field data to obtain a light field sub-view image array, and obtaining a focus stack and a full-focus image at a plurality of view positions according to the light field sub-view image array;

specifically, step a1 includes the following steps:

a11: inputting light field data, and decoding and preprocessing the light field data to obtain a light field sub-view image array;

in this embodiment, L (u, v, s, t) is the input light field, where (u, v) is the angular domain coordinate and (s, t) is the spatial domain coordinate. The method for decoding and preprocessing the input light field to obtain the sub-view image of the light field comprises the following steps:

SAI(u₀,v₀)＝{L(u,v,s,t)|u＝u₀,v＝v₀}. (1)

wherein SAI (u)₀,v₀) Is the light field at the viewing angle (u)₀,v₀) A light field sub-view image of (a); obtaining a light field sub-view image array { SAI (u) after decoding the light field L₀,v₀)|u₀∈[1,U],v₀∈[1,V]}, U, V are the number of rows and columns of the light-field sub-view image array.

And after the light field sub-view image array is obtained through decoding, carrying out denoising and color correction operation pretreatment on the array, and outputting the light field sub-view image array for extracting the characteristic points.

A12: for sparse light fields at diagonal view angles, multiple three-dimensional focal stacks corresponding to these view angle positions are generated.

The light field has multiple views, and objects at different depths in the scene have different corresponding parallaxes in the multi-view sub-images. After the plurality of light field sub-visual angle images are translated and overlapped, synthetic aperture imaging can be realized, and a series of light field images F (s, t, lambda) with focusing planes changing from near to far, namely focus stacks, are generated. The focal stacks can describe depth information of scene points, the focal stack corresponding to each sub-visual angle is extracted, and then the characteristic points are extracted, so that the light field characteristic points with robust depth can be obtained. The direct parallax of each sub-view image of the handheld light field camera is small, light field data contain a large amount of redundancy, and a light field at a diagonal view angle can represent the information of the whole light field. Therefore, in the embodiment, only a sparse group of focus stacks corresponding to the diagonal view angles are generated, and then the light field feature points are extracted, so that on one hand, the calculation amount can be greatly reduced, and on the other hand, the speed of extracting the feature points can be improved.

The focal stack is obtained by translating the light field in the spatial dimension and then integrating the light field in the angular dimension, and the calculation process is as follows:

wherein the content of the first and second substances,

is the angle of view (u)₀,v₀) And λ is the refocusing coefficient. U, V is the number of rows and columns of the light-field sub-view image array.

The above formula (2) can also be expressed as:

when in use

The method comprises the following steps:

when in use

The method comprises the following steps:

when in use

The method comprises the following steps:

when in use

The method comprises the following steps:

the number of sub-view images used for generating the focal stack at different view angle positions is different, and the closer to the central view angle, the more the number of the used peripheral sub-view images is, the wider the focal plane range contained in the generated focal stack is, and therefore, the more complete the contained light field depth information is.

The full focus image is formed by pixel stitching extracting the in-focus position in the focal stack:

wherein the content of the first and second substances,

is the angle of view (u)₀,v₀) The full focus image at, Depth is the light field Depth map index. In this embodiment, the pixels in focus in the focus stack are indexed by the lightfield depth map, and a fully focused image at the view angle position is spliced.

A2: respectively cascading focus stacks and full-focus images at a plurality of visual angle positions to obtain an image set, and generating a Gaussian difference pyramid according to the image set;

specifically, the focal stack and the full-focus image are cascaded in an RGB channel to obtain an image set:

wherein the content of the first and second substances,

Then, for the cascaded image set, carrying out fuzzy and down-sampling processing by adopting a Gaussian function to generate a four-dimensional Gaussian difference pyramid so as to further effectively detect an extreme point of the robustness in a Gaussian scale space;

specifically, the gaussian difference pyramid D constructed in this embodiment is:

wherein the content of the first and second substances,

in order to simulate the multi-scale characteristics of light field data, a Gaussian function is used for carrying out fuzzy and down-sampling processing on the three-dimensional image set after cascading, and the scale space of the obtained image set is as follows:

wherein G (s, t, σ)_i) Is a scale-variable gaussian function, expressed as:

this example constructs a gaussian scale space (DoG) with N scales, i.e. using N scales k times different (k > 0):

σ_i+1＝kσ_i,1≤i≤N (8)

dimension σ_iThe size of the three-dimensional image set determines the smoothness degree of the three-dimensional image set, the overall characteristics of the image are corresponding to the large-scale space, and the local characteristics of the image are corresponding to the small-scale space.

A3: and searching local extreme points in the Gaussian difference pyramid space as feature point positions and generating corresponding feature point descriptors.

Specifically, step a3 includes the following steps:

a31: searching local extreme points in a Gaussian difference pyramid space as feature point positions;

in the gaussian difference pyramid space, its local Hessian matrix is calculated:

wherein the content of the first and second substances,

when the Hessian matrix discriminant takes the local four-dimensional (space s, t + gradient lambda + scale sigma) maximum value, the current point is determined to be an extreme point, and the position of the characteristic point is positioned. The position of the feature point is expressed by four-dimensional coordinates defined by a biplane coordinate system, and the position of the view angle (u)₀,v₀) When searching the Gaussian difference pyramid, the obtained feature point position angle domain coordinate is (u)₀,v₀) The spatial domain coordinate is an extreme value(s)^*,t^*) (ii) a That is, the feature point position of the image set is described as { (u)₀,v₀,s^*,t^*)}。

A32: and for each feature point, calculating a feature point neighborhood gradient histogram and generating a corresponding feature point descriptor.

The feature point descriptor is a group of vectors, and the descriptor comprises key points and points which are around the key points and contribute to the key points, and is used as a basis for target matching, so that the key points can have more invariant characteristics. The invention divides the space area around the feature point into blocks, then calculates the gradient histogram of 8 directions in the window of 4 multiplied by 4, and obtains the 128-dimensional vector characterization descriptor.

Since the feature descriptors in this embodiment are constructed from the focal stack and the fully focused image, the features are robust to interfering objects at different depths, including partial occlusions and objects reflected from smooth surfaces.

The preferred embodiment of the invention provides a light field characteristic extraction method for fusing a focus stack and a full-focus image, which comprises the steps of converting a light field into a three-dimensional focus stack and a two-dimensional full-focus image on a group of sparse diagonal view angle image positions, and extracting an extreme point in a focus stack Gaussian difference pyramid space as a light field four-dimensional characteristic point position. Firstly, decoding and preprocessing light field data to obtain a light field sub-view image array, and then generating a plurality of three-dimensional focus stacks and two-dimensional full-focusing images corresponding to the view angle positions for a sparse light field at a diagonal view angle; cascading a focus stack and a full focus image, and performing fuzzy and down-sampling processing on the image set after cascading by adopting a Gaussian function to generate a four-dimensional Gaussian difference pyramid; and searching local extreme points in the Gaussian difference pyramid space as feature point positions and generating corresponding feature point descriptors. Experimental results show that compared with the existing algorithm, the method can extract the four-dimensional light field characteristic points with robust depth and scale.

Embodiments of the present invention further provide a computer-readable storage medium, where the computer-readable storage medium stores computer-executable instructions, and when the computer-executable instructions are called and executed by a processor, the computer-executable instructions cause the processor to implement the light field feature extraction method described above, and specific implementation may refer to method embodiments, and is not described herein again.

The foregoing is a more detailed description of the invention in connection with specific preferred embodiments and it is not intended that the invention be limited to these specific details. For those skilled in the art to which the invention pertains, several equivalent substitutions or obvious modifications can be made without departing from the spirit of the invention, and all the properties or uses are considered to be within the scope of the invention.

Claims

1. A light field feature extraction method fusing a focus stack and a full focus image is characterized by comprising the following steps:

2. The light field feature extraction method according to claim 1, wherein the obtaining of the focal stack and the fully focused image at the plurality of view positions from the light field sub-view image array in step a1 specifically includes: and extracting the light field self-viewing angle image at the diagonal viewing angle in the light field sub-viewing angle image array, and generating the focus stack and the full focus image at the diagonal viewing angle according to the light field self-viewing angle image at the diagonal viewing angle.

3. The light field feature extraction method according to claim 2, wherein the focal stack at the diagonal view angle generated is:

wherein the content of the first and second substances,

is a coordinate in the angular domain of (u)₀,v₀) λ is a refocusing coefficient, L (u, v, s, t) represents the decoded lightfield from the lightfield data, (u, v) is an angular domain coordinate, (s, t) is a spatial domain coordinate, and U, V is the number of rows and columns of the lightfield sub-view image array.

4. The light field feature extraction method according to claim 3, wherein the generated fully focused image is:

wherein the content of the first and second substances,

5. The light field feature extraction method according to claim 1, wherein the step a2 of respectively cascading the focus stack and the fully focused image at a plurality of view positions to obtain an image set specifically includes: cascading the focal stack and the fully focused image at a plurality of view positions, respectively, using:

wherein the content of the first and second substances,

is a coordinate in the angular domain of (u)₀,v₀) To seeThe set of images at the corners of the image,

6. The light field feature extraction method according to claim 1, wherein the step a2 of generating the gaussian difference pyramid from the image set specifically comprises: constructing a gaussian difference pyramid using:

wherein the content of the first and second substances,

Expressed as:

wherein G (s, t, σ)_i) Is a scale-variable gaussian function and is,

7. The light field feature extraction method according to claim 6, wherein the scale-variable Gaussian function G (s, t, σ)_i) Expressed as:

where (s, t) is the spatial domain coordinate, σ_iIs a scale;

8. The light field feature extraction method according to claim 1, wherein the step a3 of searching the local extreme points in the gaussian difference pyramid as feature point positions specifically comprises:

calculating a local Hessian matrix of the Gaussian difference pyramid:

wherein the content of the first and second substances,

(s, t) is a space domain coordinate, and D is the Gaussian difference pyramid;

9. The light field feature extraction method according to claim 1, wherein the generating of the corresponding feature point descriptor in step a3 specifically includes: and partitioning the surrounding space region of the positions of the feature points, and calculating gradient histograms in 8 directions in a 4 x 4 window to obtain a 128-dimensional vector characterization descriptor.

10. A computer-readable storage medium storing computer-executable instructions that, when invoked and executed by a processor, cause the processor to carry out the steps of the light field feature extraction method of any one of claims 1 to 9.