CN116912138A - Dynamic multi-exposure light field image fusion method based on structure consistency detection - Google Patents

Dynamic multi-exposure light field image fusion method based on structure consistency detection

Info

Publication number
CN116912138A
Authority
CN
China
Prior art date
Legal status
Pending
Application number
CN202310707326.7A
Other languages
Chinese (zh)
Inventor
金佳锋
蒋刚毅
陈晔曜
郁梅
Current Assignee
Ningbo University
Original Assignee
Ningbo University
Priority date
Filing date
Publication date
Application filed by Ningbo University
Priority to CN202310707326.7A
Publication of CN116912138A
Legal status: Pending

Classifications

    • G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G06T7/11 Region-based segmentation
    • G06T7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06T7/90 Determination of colour characteristics
    • G06T2207/20221 Image fusion; Image merging


Abstract

The invention discloses a dynamic multi-exposure light field image fusion method based on structure consistency detection. High-order singular value decomposition is performed on the angular dimensions of the registered light field image corresponding to each light field image in a dynamic multi-exposure light field image sequence, yielding a multi-exposure main base sequence and multi-exposure non-main base sequences. The multi-exposure main base sequence is input into a main base fusion module based on structure consistency detection to obtain a fused main base; each multi-exposure non-main base sequence is input into a remaining-baseband fusion module based on mask division to obtain a fused non-main base. A fused baseband tensor is obtained from the fused main base and all fused non-main bases; it is then combined with the orthogonal factor matrices of the two angular dimensions from the high-order singular value decomposition to reconstruct and recover the light field structure, and finally the angular information is recovered to obtain a fused light field image containing complete angular information. The method has the advantage of obtaining a fused light field image with good spatial quality, good angular quality and a high dynamic range.

Description

Dynamic multi-exposure light field image fusion method based on structure consistency detection
Technical Field
The invention relates to a multi-exposure light field image fusion technology, in particular to a dynamic multi-exposure light field image fusion method based on structural consistency detection.
Background
The light field describes the amount of light flowing in every direction through each point in free space and is a complete representation of the set of rays in the three-dimensional world. Light field imaging records the position and direction information of rays in space simultaneously, and has effective applications in digital refocusing, depth estimation, multi-view imaging and other areas. The advent of light field cameras has promoted the development of light field imaging and raised further demands for capturing and perceiving the physical world. However, limited by its hardware, a light field camera captures single images whose dynamic range is far smaller than that of a natural scene, so partial areas may be overexposed or underexposed.
To address this limitation of light field cameras, a multi-exposure fusion method fuses a group of images containing scene information at different exposure levels into a single fused image. However, real scenes often contain moving objects, which cause the fused image to suffer from distortions such as ghosting artifacts and blurring. At present, research on multi-exposure light field image fusion in dynamic scenes is relatively scarce; it is therefore very necessary to study a dynamic multi-exposure light field image fusion method.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a dynamic multi-exposure light field image fusion method based on structure consistency detection, by which a fused light field image with good spatial quality, good angular quality and a high dynamic range can be obtained.
The technical scheme adopted to solve the above technical problem is as follows: a dynamic multi-exposure light field image fusion method based on structure consistency detection, comprising the following steps:
Step 1: acquire M light field images with different exposure levels shot at the same spatial position, and form them into a dynamic multi-exposure light field image sequence, denoted {L_i ∈ R^(U×V×H×W×C) | 1 ≤ i ≤ M}; where M is greater than 1, L_i denotes the light field image with the i-th exposure level, i.e. the i-th light field image, R denotes the set of real numbers, the spatial resolution of L_i is W×H, its angular resolution is V×U, and its number of color channels is C;
Step 2: align the sub-aperture images of each light field image in {L_i ∈ R^(U×V×H×W×C) | 1 ≤ i ≤ M} to the central view to obtain the corresponding registered light field image, and denote the registered light field image corresponding to L_i as L_A(i); where L_A(i) ∈ R^(U×V×H×W×C), the spatial resolution of L_A(i) is W×H, its angular resolution is V×U, and its number of color channels is C;
Step 3: perform high-order singular value decomposition on the angular dimensions of each registered light field image to obtain the corresponding baseband tensor; take the first component of the baseband tensor as the main base and the remaining basebands other than the first component as non-main bases; denote the main base in the baseband tensor corresponding to L_A(i) as P_B(i), and take the set {P_B(i) | 1 ≤ i ≤ M} of the main bases of all registered light field images as the multi-exposure main base sequence; denote the k-th non-main base in the baseband tensor corresponding to L_A(i) as P_O(i),k, and take the set {P_O(i),k | 1 ≤ i ≤ M} of the k-th non-main bases at the same position in the baseband tensors of all registered light field images as the k-th multi-exposure non-main base sequence, which therefore contains M non-main bases; where 1 ≤ k ≤ K, K denotes the number of basebands other than the first component in the baseband tensor of each registered light field image, and K = V×U−1;
Step 4: input the multi-exposure main base sequence into the main base fusion module based on structure consistency detection to obtain the fused main base, denoted P_BB; obtaining P_BB requires acquiring the structure consistency detection binary map of each main base in the multi-exposure main base sequence;
Step 5: input the k-th multi-exposure non-main base sequence into the remaining-baseband fusion module based on mask division to obtain the k-th fused non-main base, denoted P_OB,k; the mask division uses the structure consistency detection binary map corresponding to the main base as the mask;
Step 6: obtain the fused baseband tensor from P_BB and the set {P_OB,k | 1 ≤ k ≤ K} of all fused non-main bases; then combine the fused baseband tensor with the orthogonal factor matrices of the two angular dimensions from the high-order singular value decomposition of the registered light field images to reconstruct and recover the light field structure, obtaining a fused light field image;
Step 7: recover the angular information of the fused light field image obtained in step 6 to obtain a fused light field image containing complete angular information, denoted L_H.
The specific process of step 2 is as follows: use an optical flow estimation method to compute the parallax between the central sub-aperture image and each non-central sub-aperture image of every light field image in {L_i ∈ R^(U×V×H×W×C) | 1 ≤ i ≤ M}; then, using backward warping, map each non-central sub-aperture image of each light field image onto the central sub-aperture image according to the corresponding parallax to achieve central view alignment, obtaining the corresponding registered light field image; here the non-central sub-aperture images are all the sub-aperture images other than the central sub-aperture image.
Specifically, the optical flow estimation method is the SIFT flow method.
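As an illustration of the central view alignment in step 2, the following NumPy sketch performs backward warping ("reverse drawing") of one grayscale sub-aperture image, assuming a per-pixel disparity field has already been estimated (e.g. by SIFT flow). The function name, the flow convention (offsets from the central view into the source view) and the bilinear sampling are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

def backward_warp(src, flow):
    """Resample `src` toward the central view: each output pixel (y, x)
    is read from src at (y + dy, x + dx) with bilinear interpolation.
    src  : (H, W) grayscale sub-aperture image.
    flow : (H, W, 2) per-pixel disparity (dx, dy), assumed precomputed."""
    H, W = src.shape
    ys, xs = np.mgrid[0:H, 0:W].astype(np.float64)
    sx = np.clip(xs + flow[..., 0], 0, W - 1)
    sy = np.clip(ys + flow[..., 1], 0, H - 1)
    x0 = np.floor(sx).astype(int); x1 = np.minimum(x0 + 1, W - 1)
    y0 = np.floor(sy).astype(int); y1 = np.minimum(y0 + 1, H - 1)
    wx, wy = sx - x0, sy - y0
    top = src[y0, x0] * (1 - wx) + src[y0, x1] * wx
    bot = src[y1, x0] * (1 - wx) + src[y1, x1] * wx
    return top * (1 - wy) + bot * wy

# A constant disparity of (+1, 0) samples each pixel from one column to
# the right (with clamping at the image border).
img = np.arange(16, dtype=np.float64).reshape(4, 4)
flow = np.zeros((4, 4, 2)); flow[..., 0] = 1.0
out = backward_warp(img, flow)
```

In the method itself this warp would be applied channel-wise to every non-central sub-aperture image of every exposure.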
In step 3, P_B(i) and P_O(i),k are obtained as follows:
Step 3.1: perform high-order singular value decomposition on L_A(i): L_A(i) = S_i ×1 O^(1) ×2 O^(2) ×3 O^(3) ×4 O^(4) ×5 O^(5); where S_i denotes the core tensor obtained from L_A(i) by the high-order singular value decomposition, S_i ∈ R^(U'×V'×H'×W'×C'), U', V', H', W', C' are the five dimensions of S_i; O^(1) and O^(2) are the orthogonal factor matrices of the two angular dimensions, O^(1) ∈ R^(U×U'), O^(2) ∈ R^(V×V'); O^(3) and O^(4) are the orthogonal factor matrices of the two spatial dimensions, O^(3) ∈ R^(H×H'), O^(4) ∈ R^(W×W'); O^(5) is the orthogonal factor matrix of the color channel dimension, O^(5) ∈ R^(C×C');
Step 3.2: rewrite L_A(i) = S_i ×1 O^(1) ×2 O^(2) ×3 O^(3) ×4 O^(4) ×5 O^(5) as L_A(i) ×1 (O^(1))^T ×2 (O^(2))^T = S_i ×3 O^(3) ×4 O^(4) ×5 O^(5); where the superscript "T" denotes the transpose of a vector or matrix;
Step 3.3: on this basis, compute the baseband tensor obtained from the angular dimensions of L_A(i) by the high-order singular value decomposition, denoted P_i, P_i = S_i ×3 O^(3) ×4 O^(4) ×5 O^(5) = L_A(i) ×1 (O^(1))^T ×2 (O^(2))^T; take the first component of P_i as the main base P_B(i), and the k-th remaining baseband other than the first component of P_i as the k-th non-main base P_O(i),k.
In step 4, the main base fusion module based on structure consistency detection processes the multi-exposure main base sequence to obtain the fused main base P_BB as follows:
Step 4.1: divide each main base in the multi-exposure main base sequence into a number of overlapping color blocks in RGB space; the segmentation is realized by sliding a window over the main base with a step of 2 pixels, and the size of each color block is 24×24;
Step 4.2: select a reference main base from the multi-exposure main base sequence, denoted P_r; then use an intensity mapping function together with the reference main base P_r to generate a multi-exposure virtual main base sequence, and denote the i-th virtual main base in it as P_X(i); traverse each main base in the multi-exposure main base sequence, defining the currently traversed main base as the current main base; the reference main base is the main base in the baseband tensor corresponding to the registered light field image of the light field image with the optimal exposure level in {L_i ∈ R^(U×V×H×W×C) | 1 ≤ i ≤ M};
Step 4.3: let the current main base be P_B(i); denote the j-th color block in P_B(i) as x_i,j, and express x_i,j as three parts — a signal strength component c_i,j, a signal structure component s_i,j and an average intensity component l_i,j — i.e. x_i,j = c_i,j·s_i,j + l_i,j·1, where 1 is an all-ones vector, c_i,j = ||x_i,j − l_i,j·1||, s_i,j = (x_i,j − l_i,j·1)/||x_i,j − l_i,j·1||, and l_i,j is the average of the pixel values of all pixels in x_i,j; here 1 ≤ j ≤ J, J denotes the total number of color blocks in P_B(i), c_i,j and l_i,j are scalars, s_i,j is a vector, and "||·||" denotes the L2 norm;
Step 4.4: partition P_B(i) into an overexposed region and a non-overexposed region; then, according to the overexposed and non-overexposed regions of P_B(i), obtain the overexposure binary map corresponding to P_B(i);
Step 4.5: use a perceptual hash algorithm to obtain the first structure consistency region detection binary map corresponding to P_B(i). The acquisition process is: compute the Hamming distance between the hash values of each color block of the reference main base P_r and the corresponding color block of P_B(i), denoting the Hamming distance between the hash values of the j-th color block x_r,j of P_r and the j-th color block x_i,j of P_B(i) as δ_i,j, δ_i,j = F_PHA(x_r,j, x_i,j); then, from the Hamming distances of all corresponding color-block pairs, obtain the first binary map, whose number of pixels per row equals the number of color blocks per row of P_B(i) and whose number of pixels per column equals the number of color blocks per column of P_B(i): if the Hamming distance for a color-block pair is less than or equal to the threshold th1, the pixel value at the corresponding position is 0; if it is greater than th1, the pixel value is 1; where 1 ≤ j ≤ J, J denotes the total number of color blocks in P_B(i), δ_i,j ∈ [0, 64], F_PHA() denotes the perceptual hash algorithm operation, and th1 = 5;
Step 4.6: use a normalized cross-correlation algorithm to obtain the second structure consistency region detection binary map corresponding to P_B(i). The acquisition process is: compute the inner product of the signal structure components of each color block of P_r and the corresponding color block of P_B(i), denoting the inner product of the signal structure component s_r,j of x_r,j and the signal structure component s_i,j of x_i,j as θ_i,j, θ_i,j = ((x_r,j − l_r,j·1)^T(x_i,j − l_i,j·1)) / (||x_r,j − l_r,j·1||·||x_i,j − l_i,j·1|| + ε); then, from the inner products of all corresponding color-block pairs, obtain the second binary map, with the same size correspondence to the color blocks of P_B(i) as above: if the inner product for a color-block pair is greater than or equal to the threshold constant th2, the pixel value at the corresponding position is 0; if it is smaller than th2, the pixel value is 1; where θ_i,j ∈ [−1, 1], l_r,j denotes the average intensity component of the j-th color block x_r,j of P_r, ε is an infinitesimal number preventing the denominator from being 0, and th2 = 0.8;
Step 4.7: compute the average intensity difference between the average intensity component of each color block of P_r and that of the corresponding color block of P_X(i), denoting the average intensity difference between the average intensity component l_r,j of x_r,j and the average intensity component l_X(i),j of the j-th color block x_X(i),j of P_X(i) as ζ_i,j, ζ_i,j = ||l_r,j − l_X(i),j||; then, from the average intensity differences of all corresponding color-block pairs, obtain the third structure consistency region detection binary map, with the same size correspondence to the color blocks of P_B(i) as above: if the average intensity difference for a color-block pair is greater than or equal to the threshold constant th3, the pixel value at the corresponding position is 0; if it is smaller than th3, the pixel value is 1; where th3 = 0.1;
Step 4.8: obtain the structure consistency detection binary map of P_B(i), denoted Z_i, by combining the overexposure binary map of P_B(i) with the first, second and third structure consistency region detection binary maps obtained in steps 4.5 to 4.7;
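Steps 4.5 to 4.7 each reduce to one scalar test per block position. The sketch below computes all three decisions for a single block pair; the 64-bit average hash is only a stand-in for the perceptual hash F_PHA (whose internals are not specified here), and the polarity of the third test follows the text of step 4.7 as written.

```python
import numpy as np

def block_consistency(x_r, x_i, x_v, th1=5, th2=0.8, th3=0.1):
    """Per-block decisions for one position j (0 = consistent by that
    test, 1 = inconsistent), with the thresholds of steps 4.5-4.7:
    delta - Hamming distance between 64-bit hashes of reference block
            x_r and current block x_i (average hash stands in for F_PHA);
    theta - normalized cross-correlation of the mean-removed blocks;
    zeta  - average-intensity difference between x_r and virtual block x_v."""
    eps = 1e-12
    def ahash(p):                     # 64-bit average-hash stand-in
        q = np.resize(np.ravel(p).astype(float), 64)
        return q > q.mean()
    delta = int(np.count_nonzero(ahash(x_r) != ahash(x_i)))
    dr = np.ravel(x_r) - np.mean(x_r)
    di = np.ravel(x_i) - np.mean(x_i)
    theta = float(dr @ di / (np.linalg.norm(dr) * np.linalg.norm(di) + eps))
    zeta = abs(float(np.mean(x_r)) - float(np.mean(x_v)))
    return (0 if delta <= th1 else 1,
            0 if theta >= th2 else 1,
            0 if zeta >= th3 else 1)  # polarity as stated in step 4.7

x = np.linspace(0, 1, 64)
b_same = block_consistency(x, x, x)            # identical blocks
b_diff = block_consistency(x, 1 - x, x + 0.5)  # inverted structure, shifted mean
```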
Step 4.9: judge whether each color block in P_B(i) belongs to the structure consistency region: for the j-th color block x_i,j of P_B(i), if the pixel value of the pixel in Z_i at the position corresponding to x_i,j is 0, x_i,j is considered to belong to the structure consistency region; if it is 1, x_i,j is considered to belong to the structure inconsistency region;
Step 4.10: for the color blocks of P_B(i) belonging to the structure inconsistency region, replace them with the corresponding color blocks of P_X(i) to achieve compensation, i.e.: if x_i,j belongs to the structure inconsistency region, replace x_i,j with the j-th color block x_X(i),j of P_X(i); denote the compensated main base obtained after compensating all color blocks of P_B(i) belonging to the structure inconsistency region as P'_B(i);
Step 4.11: take the next main base in the multi-exposure main base sequence as the current main base and return to step 4.3, until the compensated main base corresponding to every main base in the multi-exposure main base sequence is obtained;
Step 4.12: reconstruct the fused image block at each position from the color blocks at the same position in the compensated main bases corresponding to all main bases in the multi-exposure main base sequence, i.e. from the M color blocks {x'_i,j | 1 ≤ i ≤ M}, where x'_i,j denotes the j-th color block of P'_B(i). The fused image block is likewise expressed as three parts, a signal strength component, a signal structure component and an average intensity component. Its signal strength component is the maximum of the signal strength components of the M blocks, max{c'_i,j | 1 ≤ i ≤ M}, where max() is the maximum function; its signal structure component is the weighted combination of the signal structure components s'_i,j, normalized to unit length, with weighting function ω1() expressed by an exponential function with exponent parameter β, β = 4; its average intensity component is the weighted average of the average intensity components l'_i,j with weighting function ω2(), given by the two-dimensional normal distribution ω2(μ'_i, l'_i,j) = exp(−(μ'_i − μ_c)²/(2σ_x²) − (l'_i,j − l_c)²/(2σ_y²)), where μ'_i denotes the global mean of P'_B(i), exp() denotes the exponential function with natural base e, e = 2.71…, and μ_c, l_c, σ_x, σ_y are parameters of the two-dimensional normal distribution, σ_x = 0.2, σ_y = 0.5, μ_c = 0.5, l_c = 0.5; c'_i,j, s'_i,j and l'_i,j denote the signal strength component, signal structure component and average intensity component of x'_i,j, and l'_i,j is the average of the pixel values of all pixels in x'_i,j;
step 4.13: according to P B(i) The inverse of the segmentation into overlapping color blocks willIs recombined into a master group as a fusion master group P BB The method comprises the steps of carrying out a first treatment on the surface of the The average value of the pixel values of the pixel points at the same position in the overlapping area during recombination is used as the final pixel value after recombination.
The specific process of step 4.4 is as follows:
step 4.4.1: will P B(i) Is converted from RGB space into HSV model; then P is carried out under HSV model B(i) Dividing into a plurality of overlapping blocks; re-comparing P B(i) The size of the region division index and the set division threshold value of each overlapping block, and if the region division index is larger than or equal to the set division threshold value, the overlapping block is considered to belong to the overexposure region; if the area division index is smaller than the set division threshold, the overlapped block is considered to belong to the non-overexposure area; wherein, when dividing the overlapped block, a sliding window is adopted to obtain a step length of 2 pixel points at P B(i) The size of the overlapped block is 24 multiplied by 24, the area division index is equal to the number of pixel points with the V component value exceeding 240 in the overlapped block divided by the total number of pixel points contained in the overlapped block, and the set division threshold is equal to the product of 0.8 and the total number of pixel points contained in the overlapped block;
step 4.4.2: order theRepresenting P B(i) Corresponding overexposure binary pattern, +.>The number of pixel points in each row is equal to P B(i) The number of overlapping blocks per line is equal, < > >The number of pixel points in each column and P B(i) The number of overlapped blocks in each column is equal, if P B(i) One of the overlapping blocks belonging to the overexposed region +.>The pixel value of the pixel point at the corresponding position in the pixel array is 1; if P B(i) One of the overlapping blocks belonging to the non-overexposed region>The pixel value of the pixel point at the corresponding position in (a) is 0.
In step 5, the k-th multi-exposure non-main base sequence is processed by the remaining-baseband fusion module based on mask division to obtain the k-th fused non-main base P_OB,k as follows:
step 5.1: dividing each non-master base in a kth multi-exposure non-master base sequence into a plurality of overlapping color blocks in RGB space; when the overlapped color blocks are segmented, a sliding window is adopted to slide in a non-main base with the step length of 2 pixel points to realize segmentation, and the size of the color blocks is 24 multiplied by 24;
step 5.2: classifying all color blocks in each non-master base in the kth multi-exposure non-master base sequence into masked regions and unmasked regions for P O(i),k Will Z i As a mask pair P O(i),k The process of classifying each color block is as follows: for P O(i),k The j-th color block x in (1) i,k,j If Z i Intermediate and x i,k,j If the pixel value of the pixel point at the same position is 1, then consider x i,k,j Belonging to a mask region; if Z i Intermediate and x i,k,j If the pixel value of the pixel point at the same position is 0, then consider x i,k,j Belongs to a non-mask area;
step 5.3: obtaining the kth fusion non-master P OB,k Is denoted as P Os,kWherein, max () is a maximum function, ">Representation pair Z i Performing inverse operation to obtain a complementary graph;
step 5.4: obtaining the kth fusion non-master P OB,k Is denoted as P Od,k The acquisition process comprises the following steps:
step 5.4.1: for the color blocks belonging to the mask region in each non-master in the kth multi-exposure non-master sequence, P is used X(i) The corresponding color blocks in (a) are replaced to realize compensation, namely: for P O(i),k Let x be i,k,j Belonging to the mask region, then P is used X(i) The j-th color block x in (1) X(i ) ,j Substitution x i,k,j The method comprises the steps of carrying out a first treatment on the surface of the Will pair P O(i),k The compensation non-main base obtained after compensating all the color blocks belonging to the mask area is P' O(i),k
Step 5.4.2: according to the color blocks corresponding to all non-main bases in the kth multi-exposure non-main base sequence and compensating the same position in the non-main base, namely M color blocks in total, reconstructing to obtain a corresponding fusion image block, and reconstructing the fusion image block according to { x' i,k,j M color blocks with i 1 being less than or equal to i being less than or equal to M are reconstructed to obtain a fusion image block which is recorded as Expressed as signal intensity component- >Signal structure component->And average intensity component->Three parts (I)>Wherein x' i,k,j Representing P' O(i),k The j-th color block of (a), a->Obtained as in step 4.12;
step 5.4.3: according to P O(i),k The inverse of the segmentation into overlapping color blocks willIs recombined into a mask region P Od,k The method comprises the steps of carrying out a first treatment on the surface of the The average value of the pixel values of the pixel points at the same position in the overlapping area during recombination is used as a final pixel value after recombination;
step 5.5: obtaining the kth fusion non-master P OB,k ,P OB,k =P Os,k +P Od,k
In step 6 above, the fused baseband tensor is assembled by taking the fused main base P_BB as its first component and the K fused non-main bases P_OB,k (1 ≤ k ≤ K) as the remaining components, each placed at the position its counterpart occupied in the baseband tensor P_i.
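Step 6 then mainly rearranges data. Below is a sketch of assembling the fused baseband tensor from the fused main base and the K fused non-main bases and re-applying the two angular factor matrices; all array shapes and the stand-in orthogonal factors are illustrative assumptions.

```python
import numpy as np

U = V = 2; H, W, C = 4, 4, 3
rng = np.random.default_rng(2)
P_BB = rng.random((H, W, C))                              # fused main base
P_OB = [rng.random((H, W, C)) for _ in range(U * V - 1)]  # K = U*V - 1 bases
# First component = fused main base, then the K fused non-main bases,
# reshaped into the (U', V', H, W, C) baseband-tensor layout.
P_fused = np.stack([P_BB] + P_OB).reshape(U, V, H, W, C)
# Stand-in orthogonal angular factor matrices (O1, O2 of step 3.1).
O1 = np.linalg.qr(rng.random((U, U)))[0]
O2 = np.linalg.qr(rng.random((V, V)))[0]
# Recover the light field structure: L = P x1 O1 x2 O2.
L_fused = np.einsum('abhwc,ua,vb->uvhwc', P_fused, O1, O2)
```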
the specific process of the step 7 is as follows: according to { L i ∈R U×V×H×W×C Parallax between a central sub-aperture image of a light field image with the optimal exposure degree in I1 is less than or equal to i is less than or equal to M and each non-central sub-aperture image is achieved by forward drawingThe non-central sub-aperture images aligned with the central views are mapped to the corresponding positions to restore the angle information, so as to obtain L H
Compared with the prior art, the invention has the following advantages:
1) The method of the invention takes into account that moving objects in a dynamic scene cause the fused light field to suffer from distortions such as artifacts and blurring; it performs structure consistency detection with a perceptual hash algorithm and a normalized cross-correlation algorithm, and applies the detection to the fusion of the multi-exposure main base sequence and the multi-exposure non-main base (remaining baseband) sequences respectively, thereby obtaining a high dynamic range light field with good spatial quality and angular quality.
2) Considering that light field data contain considerable redundant information, the method applies high-order singular value decomposition to reduce the dimensionality of the high-dimensional 4D light field data, realizing the conversion from the 4D light field to 2D images and reducing the computational complexity to a certain extent; meanwhile, central view alignment of the light field is achieved by optical flow estimation and parallax-based warping, which effectively alleviates the baseband blurring after dimension reduction caused by viewpoint differences among the sub-aperture images of the light field image.
3) The method takes into account the importance of the multi-exposure non-principal-basis (remaining baseband) sequences in the light field information. Unlike methods that reconstruct the light field by directly taking the remaining basebands of the higher-order singular value decomposition of an arbitrarily exposed light field as the remaining baseband information of the reconstructed light field, it adopts a mask-based remaining-baseband fusion method to effectively fuse the high-frequency information of the multi-exposure light field image sequence, thereby effectively improving the imaging quality of the fused light field.
Drawings
FIG. 1 is a block diagram of the overall implementation of the method of the present invention;
FIG. 2 is a schematic diagram of the processing procedure of the principal-baseband fusion module based on structure consistency detection and the remaining-baseband fusion module based on mask division in the method of the present invention;
FIG. 3a is an underexposed light field image of a 4 th sequence of test light field images, shown here with a central sub-aperture image (including a partial magnified view) and its corresponding magnified EPI view;
FIG. 3b is a mid-exposure light field image in the 4 th test light field image sequence, shown here with a central sub-aperture image (including a partial magnified view) and its corresponding magnified EPI view;
FIG. 3c is an overexposed light field image in the 4 th test light field image sequence, shown here with a central sub-aperture image (including a partial magnified view) and its corresponding EPI magnified view;
FIG. 3d is a fused light field image obtained by processing the 4 th sequence of test light field images using the method of Mertens et al, where the fusion result of the central sub-aperture image (including the partial magnified image) and its corresponding magnified EPI image are taken for display;
FIG. 3e is a fused light field image obtained by processing the 4 th test light field image sequence using the method of Li et al (GFF), where the fusion result of the central sub-aperture image (including the partial magnified image) and its corresponding EPI magnified image are taken for illustration;
FIG. 3f is a fused light field image obtained by processing the 4 th sequence of test light field images using the method of Liu et al (DSIFT-MEF), where the fusion result of the central sub-aperture image (including the partial magnified view) and its corresponding EPI magnified view are shown;
FIG. 3g is a fused light field image obtained by processing the 4 th test light field image sequence by the method of Hayat et al, where the fusion result of the central sub-aperture image (including the partial magnified image) and its corresponding EPI magnified image are taken for display;
FIG. 3h is a fused light field image obtained by processing the 4 th test light field image sequence using the method of Li et al (FMSSPD-MEF), where the fusion result of the central sub-aperture image (including the partial magnified image) and its corresponding EPI magnified image are shown;
FIG. 3i is a fused light field image obtained by processing the 4 th test light field image sequence by the method of Li et al (MESPD), where the fusion result of the central sub-aperture image (including the partial magnified image) and its corresponding EPI magnified image are taken for display;
FIG. 3j is a fused light field image obtained by processing the 4 th test light field image sequence using the method of the present invention, where the fusion result of the central sub-aperture image (including the partial magnified image) and the corresponding EPI magnified image are shown;
FIG. 4a is a central sub-aperture image of an underexposed light field image in the 22 nd test light field image sequence;
FIG. 4b is a central sub-aperture image of the mid-exposure light field image in the 22 nd test light field image sequence;
FIG. 4c is a central sub-aperture image of an overexposed light field image in the 22 nd test light field image sequence;
FIG. 4d is a depth map of an underexposed light field image in the 22 nd test light field image sequence;
FIG. 4e is a depth map of an intermediate exposure light field image in the 22 nd test light field image sequence;
FIG. 4f is a depth map of an overexposed light field image in the 22 nd test light field image sequence;
FIG. 4g is a depth map of a fused light field image obtained by processing the 22 nd test light field image sequence using the method of Mertens et al;
FIG. 4h is a depth map of a fused light field image obtained by processing the 22 nd test light field image sequence using the method of Li et al (GFF);
FIG. 4i is a depth map of a fused light field image obtained by processing the 22 nd test light field image sequence using the method of Liu et al (DSIFT-MEF);
FIG. 4j is a depth map of a fused light field image obtained by processing the 22 nd test light field image sequence using the method of Hayat et al;
FIG. 4k is a depth map of a fused light field image obtained by processing the 22 nd test light field image sequence using the method of Li et al (FMSSPD-MEF);
FIG. 4l is a depth map of a fused light field image obtained by processing the 22 nd test light field image sequence using the method of Li et al (MESPD);
FIG. 4m is a depth map of a fused light field image obtained by processing the 22 nd test light field image sequence using the method of the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and embodiments.
With the continuous development of computer technology and optoelectronic technology, light field imaging has gradually become one of the research hotspots in the fields of computer vision and optical imaging. To a certain extent, light field imaging solves the problem that traditional imaging neglects the angular information of light rays; it has renewed imaging technology and provided the theoretical basis for the birth of the light field camera. The advent of light field cameras has promoted the development of light field imaging and raised further demands on the capture and perception of the physical world. Since the invention of the light field camera, improving its imaging capability has always been a goal pursued by researchers. Limited by factors such as the characteristics of light and the physical parameters of the sensor, light field camera imaging suffers from an insufficient dynamic range, and it is difficult to develop new light field cameras to improve imaging quality. Currently, there are two types of methods to extend the dynamic range of light field camera imaging. The first is to design special imaging devices, improving imaging quality from the hardware side, but most such devices are complex to design and difficult to carry. Therefore, the second type of method is generally adopted: designing a suitable imaging method from the viewpoint of computational photography to improve the imaging quality of the light field camera.
Aiming at this problem, from the perspective of the high-dimensional characteristics of light field data and the detection of motion regions, the invention proposes a dynamic multi-exposure light field image fusion method based on structure consistency detection. The method is mainly divided into three parts. First, each light field image in the multi-exposure sequence is registered to its central view to eliminate the differences between viewpoints, and a multi-exposure principal-basis sequence and multi-exposure non-principal-basis sequences are then obtained by higher-order singular value decomposition. Second, for the multi-exposure principal-basis sequence, after dividing it into blocks, regions of inconsistent structure are detected by combining a perceptual hash algorithm and a normalized cross-correlation method, and a fused principal basis with artifacts removed is obtained with an image block decomposition method. Third, for each multi-exposure non-principal-basis sequence, the structure consistency detection binary map obtained for the multi-exposure principal-basis sequence is used as a mask to divide the non-principal-basis sequence into a mask region and a non-mask region, which are fused in different ways to obtain a fused non-principal basis. Finally, tensor reconstruction is performed with the fused principal basis and the fused non-principal bases, and the artifact-free high dynamic range light field is obtained through angle recovery.
The invention provides a dynamic multi-exposure light field image fusion method based on structure consistency detection, whose overall implementation is shown in the block diagram of FIG. 1; it comprises the following steps:
step 1: acquire M light field images of different exposure levels shot at the same spatial position, form them into a dynamic multi-exposure light field image sequence, and record it as {L_i ∈ R^(U×V×H×W×C) | 1 ≤ i ≤ M}; here M > 1 (M = 3 in this embodiment), L_i denotes the light field image of the ith exposure level, also called the ith light field image, R denotes the set of real numbers, the spatial resolution of L_i is W×H, its angular resolution is V×U, and its number of color channels is C; in this embodiment W×H is 600×400, V×U is 9×9, and C is 3 (the three RGB color channels), 1 ≤ i ≤ M.
Step 2: perform center-view alignment on the sub-aperture images of each light field image in {L_i ∈ R^(U×V×H×W×C) | 1 ≤ i ≤ M} to obtain the corresponding registered light field image, and denote the registered light field image corresponding to L_i as L_A(i); here L_A(i) ∈ R^(U×V×H×W×C), the spatial resolution of L_A(i) is W×H, its angular resolution is V×U, and its number of color channels is C.
In this embodiment, the specific process of step 2 is: compute, using an optical flow estimation method, the disparity between the central sub-aperture image and each non-central sub-aperture image of every light field image in {L_i ∈ R^(U×V×H×W×C) | 1 ≤ i ≤ M}; then, by backward warping, map each non-central sub-aperture image of each light field image onto its central sub-aperture image according to the corresponding disparity relation to achieve center-view alignment, obtaining the corresponding registered light field image. Here the non-central sub-aperture images are all the remaining sub-aperture images except the central one. The optical flow estimation method is specifically the SIFT Flow method.
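As a concrete illustration of the backward-warping part of this step (the SIFT Flow estimator itself is not reproduced here; the flow field below is a synthetic constant shift, and all function and variable names are illustrative, not the invention's implementation), each pixel of the aligned view samples the source sub-aperture image at a position offset by the estimated flow:

```python
import numpy as np
from scipy.ndimage import map_coordinates

def backward_warp(view, flow_y, flow_x):
    """Backward-warp `view` (H x W) toward the reference view:
    output(y, x) = view(y + flow_y[y, x], x + flow_x[y, x]),
    with bilinear interpolation and edge clamping at the borders."""
    h, w = view.shape
    gy, gx = np.mgrid[0:h, 0:w].astype(np.float64)
    coords = np.stack([gy + flow_y, gx + flow_x])
    return map_coordinates(view, coords, order=1, mode='nearest')

# demo: a horizontal ramp shifted one pixel by a constant flow
img = np.tile(np.arange(5, dtype=np.float64), (3, 1))
warped = backward_warp(img, np.zeros_like(img), np.ones_like(img))
```

With a per-pixel disparity map in place of the constant shift, the same routine maps a non-central sub-aperture image onto the central view.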
Step 3: perform higher-order singular value decomposition on the angular dimensions of each registered light field image to obtain the corresponding baseband tensor; take the first component of the baseband tensor as the principal basis and the remaining basebands (all except the first component) as the non-principal bases. Denote the principal basis of the baseband tensor corresponding to L_A(i) as P_B(i), and take the set {P_B(i) | 1 ≤ i ≤ M} of the principal bases of all registered light field images as the multi-exposure principal-basis sequence. Denote the kth non-principal basis of the baseband tensor corresponding to L_A(i) as P_O(i),k, and take the set {P_O(i),k | 1 ≤ i ≤ M} of the kth non-principal bases at the same position in all baseband tensors as the kth multi-exposure non-principal-basis sequence, which thus contains M non-principal bases. Here 1 ≤ k ≤ K, where K denotes the number of remaining basebands (excluding the first component) in the baseband tensor of each registered light field image, K = V×U − 1.
In this embodiment, P_B(i) and P_O(i),k are obtained in step 3 as follows:
step 3.1: perform higher-order singular value decomposition on L_A(i), with the decomposition formula L_A(i) = S_i ×_1 O^(1) ×_2 O^(2) ×_3 O^(3) ×_4 O^(4) ×_5 O^(5); here S_i denotes the core tensor of L_A(i) obtained by the decomposition, S_i ∈ R^(U'×V'×H'×W'×C'), with U', V', H', W', C' the five dimensions of S_i; O^(1) and O^(2) are the orthogonal factor matrices of the two angular dimensions, O^(1) ∈ R^(U×U'), O^(2) ∈ R^(V×V'); O^(3) and O^(4) are the orthogonal factor matrices of the two spatial dimensions, O^(3) ∈ R^(H×H'), O^(4) ∈ R^(W×W'); O^(5) is the orthogonal factor matrix of the color channel dimension, O^(5) ∈ R^(C×C').
Step 3.2: rewrite L_A(i) = S_i ×_1 O^(1) ×_2 O^(2) ×_3 O^(3) ×_4 O^(4) ×_5 O^(5) as L_A(i) = (S_i ×_3 O^(3) ×_4 O^(4) ×_5 O^(5)) ×_1 O^(1) ×_2 O^(2), equivalently S_i ×_3 O^(3) ×_4 O^(4) ×_5 O^(5) = L_A(i) ×_1 (O^(1))^T ×_2 (O^(2))^T, where the superscript "T" denotes the transpose of a vector or matrix.
Step 3.3: on this basis, to reduce the dimensionality of the light field data it suffices to perform singular value decomposition on the angular dimensions of the light field only; denote the baseband tensor obtained by higher-order singular value decomposition of the angular dimensions of L_A(i) as P_i, P_i = L_A(i) ×_1 (O^(1))^T ×_2 (O^(2))^T. Take the first component of P_i as the principal basis P_B(i), and the kth remaining baseband of P_i (excluding the first component) as the kth non-principal basis P_O(i),k. Here P_i contains all the baseband information: the principal baseband contains the spatial information that is consistent across all viewpoints, while the remaining basebands except the first component, i.e., the non-principal basebands, mainly capture the angular high-frequency information among the different sub-aperture images.
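The angular-dimension decomposition of step 3.3 can be sketched in a few lines of numpy: unfolding the two angular dimensions of the registered light field into rows turns the angular-mode HOSVD into an ordinary matrix SVD, and each row of the resulting baseband tensor is one 2D baseband. The array shapes and variable names below are illustrative, not the invention's implementation:

```python
import numpy as np

U, V, H, W, C = 3, 3, 8, 8, 3
rng = np.random.default_rng(0)
L = rng.random((U, V, H, W, C))          # stands in for a registered light field L_A(i)

# unfold the two angular dimensions into rows: (U*V) x (H*W*C)
L_mat = L.reshape(U * V, H * W * C)

# SVD over the angular mode only (the x_1 O(1) x_2 O(2) part of the HOSVD)
O_ang, sing, Vt = np.linalg.svd(L_mat, full_matrices=False)

# baseband tensor P = O_ang^T . L_mat: one 2D baseband per row
P = O_ang.T @ L_mat
principal = P[0].reshape(H, W, C)        # principal basis (first component)
non_principal = P[1:]                    # K = U*V - 1 non-principal bases

# lossless round trip: L_mat = O_ang . P
L_rec = (O_ang @ P).reshape(U, V, H, W, C)
```

The round trip also illustrates step 6: once the basebands have been fused, multiplying back by the angular factor matrix restores the light field structure.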
Step 4: input the multi-exposure principal-basis sequence into the principal-basis fusion module based on structure consistency detection to obtain the fused principal basis, denoted P_BB; in the process of obtaining the fused principal basis P_BB, the structure consistency detection binary map of each principal basis in the multi-exposure principal-basis sequence is also obtained.
In this embodiment, as shown in FIG. 2, the principal-basis fusion module based on structure consistency detection processes the multi-exposure principal-basis sequence in step 4 to obtain the fused principal basis P_BB as follows:
step 4.1: divide each principal basis in the multi-exposure principal-basis sequence into a number of overlapping color blocks in RGB space; when dividing the overlapping color blocks, a sliding window slides over the principal basis with a step of 2 pixels, and the color block size is 24×24.
Step 4.2: select a reference principal basis from the multi-exposure principal-basis sequence and denote it P_r; then use an intensity mapping function together with the reference principal basis P_r to generate a multi-exposure virtual principal-basis sequence, and denote the ith virtual principal basis in it as P_X(i); traverse every principal basis in the multi-exposure principal-basis sequence, defining the currently traversed one as the current principal basis. The reference principal basis is the principal basis of the baseband tensor corresponding to the registered light field image of the best-exposed light field image in {L_i ∈ R^(U×V×H×W×C) | 1 ≤ i ≤ M}; in this embodiment, an under-exposed, a middle-exposed and an over-exposed light field image are shot at the same spatial position, and the principal basis corresponding to the middle-exposed light field image is selected as the reference principal basis.
Step 4.3: let the current principal basis be P_B(i); denote the jth color block of P_B(i) as x_i,j, and express x_i,j as three parts, a signal strength component c_i,j, a signal structure component s_i,j and a mean intensity component l_i,j: x_i,j = c_i,j · s_i,j + l_i,j, where c_i,j = ||x_i,j − l_i,j||, s_i,j = (x_i,j − l_i,j) / ||x_i,j − l_i,j||, and l_i,j denotes the average of the pixel values of all pixels of x_i,j. When dividing the overlapping color blocks, a sliding window slides over P_B(i) with a step of 2 pixels and the color block size is 24×24; 1 ≤ j ≤ J, where J denotes the total number of color blocks of P_B(i); c_i,j and l_i,j are scalars, s_i,j is a vector, and "|| ||" denotes the L2 norm.
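The patch decomposition of step 4.3 can be written directly from its definition. This sketch (illustrative names; a small epsilon is added to guard the flat-patch case) verifies that the three components reassemble the original block:

```python
import numpy as np

def decompose_patch(x):
    """Split a color patch into mean intensity l (scalar), signal strength
    c = ||x - l|| (scalar), and unit signal structure s = (x - l) / ||x - l||,
    so that x = c * s + l."""
    l = x.mean()
    d = x - l
    c = float(np.linalg.norm(d))
    s = d / (c + 1e-12)      # epsilon guards a perfectly flat patch
    return c, s, l

rng = np.random.default_rng(1)
x = rng.random((24, 24, 3))
c, s, l = decompose_patch(x)
x_back = c * s + l
```

Because s has unit L2 norm, c carries the contrast of the block and l its brightness, which is what the fusion rules in step 4.12 operate on separately.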
Step 4.4: partition P_B(i), dividing it into an overexposed region and a non-overexposed region; then, according to the overexposed and non-overexposed regions of P_B(i), obtain the overexposure binary map corresponding to P_B(i), denoted M_i^over.
In this embodiment, the specific process of step 4.4 is:
step 4.4.1: convert P_B(i) from the RGB space to the HSV model; then divide P_B(i) into a number of overlapping blocks under the HSV model; then compare the region-division index of each overlapping block of P_B(i) with a set division threshold: if the region-division index is greater than or equal to the set division threshold, the overlapping block is considered to belong to the overexposed region; if the region-division index is smaller than the set division threshold, the overlapping block is considered to belong to the non-overexposed region. When dividing the overlapping blocks, a sliding window slides over P_B(i) with a step of 2 pixels and the block size is 24×24; the region-division index of a block equals the number of its pixels whose V component exceeds 240, and the set division threshold equals 0.8 times the total number of pixels contained in the block.
Step 4.4.2: let M_i^over denote the overexposure binary map corresponding to P_B(i); the number of pixels per row of M_i^over equals the number of overlapping blocks per row of P_B(i), and the number of pixels per column of M_i^over equals the number of overlapping blocks per column of P_B(i). If an overlapping block of P_B(i) belongs to the overexposed region, the pixel at the corresponding position of M_i^over has value 1; if it belongs to the non-overexposed region, the pixel at the corresponding position of M_i^over has value 0.
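Steps 4.4.1–4.4.2 amount to a sliding-window count of near-saturated pixels. A minimal sketch (for RGB input the HSV V channel is the per-pixel maximum over the three channels; the synthetic test images and all names are illustrative):

```python
import numpy as np

def overexposure_map(img, block=24, stride=2, v_thresh=240, ratio=0.8):
    """One binary value per sliding block: 1 if the number of pixels whose
    V component exceeds v_thresh reaches ratio * (block area), else 0."""
    v = img.max(axis=2)                  # HSV value channel of an RGB image
    h, w = v.shape
    rows = (h - block) // stride + 1
    cols = (w - block) // stride + 1
    out = np.zeros((rows, cols), dtype=np.uint8)
    for r in range(rows):
        for c in range(cols):
            blk = v[r*stride:r*stride+block, c*stride:c*stride+block]
            out[r, c] = 1 if (blk > v_thresh).sum() >= ratio * blk.size else 0
    return out

bright = np.full((32, 32, 3), 250, dtype=np.uint8)   # saturated frame
m_over = overexposure_map(bright)
m_dark = overexposure_map(np.zeros((32, 32, 3), dtype=np.uint8))
```

For a 32×32 image with 24×24 blocks and stride 2 this yields a 5×5 binary map, one entry per block position.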
Step 4.5: obtain the first structure-consistency-region detection binary map of P_B(i), denoted Z_i^(1), with a perceptual hash algorithm. The acquisition process is: compute the Hamming distance between the hash values of each color block of the reference principal basis P_r and of the corresponding color block of P_B(i); denote the Hamming distance between the hash values of the jth color block x_r,j of P_r and the jth color block x_i,j of P_B(i) as δ_i,j, δ_i,j = F_PHA(x_r,j, x_i,j). Then, from the Hamming distances of all corresponding color blocks, obtain Z_i^(1): the number of pixels per row of Z_i^(1) equals the number of color blocks per row of P_B(i), and the number of pixels per column of Z_i^(1) equals the number of color blocks per column of P_B(i). If the Hamming distance between the hash values of a pair of corresponding color blocks is less than or equal to the threshold th1, the pixel at the corresponding position of Z_i^(1) has value 0; if it is greater than th1, that pixel has value 1. Here 1 ≤ j ≤ J, J denotes the total number of color blocks of P_B(i), δ_i,j ∈ [0, 64], F_PHA() denotes the perceptual hash operation, and th1 = 5.
Here the perceptual hash algorithm is an existing algorithm; the general procedure for computing δ_i,j with it is as follows: reduce the jth color block of each of P_r and P_B(i) to 8×8 and convert it to a grayscale block; compute the DCT (discrete cosine transform) coefficients of each grayscale block to obtain its DCT coefficient matrix; compute the mean of the DCT coefficient matrix of each grayscale block; then obtain δ_i,j on the basis of the DCT coefficient matrices and their means for the grayscale blocks corresponding to the jth color blocks of P_r and P_B(i).
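The comparison described here can be sketched as follows. This is an assumed pHash variant (shrinking by block averaging, full 8×8 DCT, thresholding each coefficient against the coefficient mean); the invention's exact implementation may differ:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix of size n x n."""
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    m[0] /= np.sqrt(2.0)
    return m

def phash64(patch):
    """64-bit perceptual hash: shrink to 8x8 grayscale by block averaging,
    take the 2D DCT, and threshold each coefficient against the mean."""
    g = patch.mean(axis=2) if patch.ndim == 3 else patch
    h, w = g.shape
    small = g[:h - h % 8, :w - w % 8].reshape(8, h // 8, 8, w // 8).mean(axis=(1, 3))
    d = dct_matrix(8) @ small @ dct_matrix(8).T
    return (d > d.mean()).astype(np.uint8).ravel()

def hamming(h1, h2):
    return int((h1 != h2).sum())

rng = np.random.default_rng(2)
a = rng.random((24, 24, 3))
b = a + rng.normal(0.0, 0.02, a.shape)   # mild photometric perturbation
dist_same = hamming(phash64(a), phash64(a))
dist_near = hamming(phash64(a), phash64(b))
```

In step 4.5 a distance above th1 = 5 marks the block pair as structurally inconsistent; identical blocks always give distance 0.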
Step 4.6: obtain the second structure-consistency-region detection binary map of P_B(i), denoted Z_i^(2), with a normalized cross-correlation algorithm. The acquisition process is: compute the inner product of the signal structure component of each color block of the reference principal basis P_r and that of the corresponding color block of P_B(i); denote the inner product of the signal structure component s_r,j of the jth color block x_r,j of P_r and the signal structure component s_i,j of the jth color block x_i,j of P_B(i) as θ_i,j, θ_i,j = ((x_r,j − l_r,j) · (x_i,j − l_i,j)) / (||x_r,j − l_r,j|| · ||x_i,j − l_i,j|| + ε). Then, from the inner products of all corresponding color blocks, obtain Z_i^(2): the number of pixels per row of Z_i^(2) equals the number of color blocks per row of P_B(i), and the number of pixels per column of Z_i^(2) equals the number of color blocks per column of P_B(i). If the inner product for a pair of corresponding color blocks is greater than or equal to the threshold constant th2, the pixel at the corresponding position of Z_i^(2) has value 0; if it is smaller than th2, that pixel has value 1. Here θ_i,j ∈ [−1, 1], l_r,j denotes the mean intensity component of the jth color block x_r,j of P_r, ε is an infinitesimal quantity that prevents the denominator from being 0, and th2 = 0.8.
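The inner product θ_i,j of step 4.6 is a normalized cross-correlation of the mean-removed patches, so it is invariant to the affine brightness changes that different exposures induce. A minimal sketch (illustrative names; th2 = 0.8 as in the text):

```python
import numpy as np

def structure_ncc(x_ref, x_cur, eps=1e-12):
    """theta = <s_ref, s_cur>: normalized cross-correlation of the
    mean-removed patches, in [-1, 1]."""
    a = (x_ref - x_ref.mean()).ravel()
    b = (x_cur - x_cur.mean()).ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

rng = np.random.default_rng(3)
x = rng.random((24, 24, 3))
theta_same = structure_ncc(x, 0.5 * x + 0.2)      # exposure-like affine change
theta_diff = structure_ncc(x, rng.random((24, 24, 3)))
consistent = theta_same >= 0.8                     # th2 = 0.8 as in step 4.6
```

An affine re-exposure of the same content scores θ ≈ 1 and is kept as consistent, while unrelated content (e.g. a moving object) scores near 0 and is flagged.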
Step 4.7: compute the mean intensity difference between the mean intensity component of each color block of the reference principal basis P_r and that of the corresponding color block of P_X(i); denote the mean intensity difference between the mean intensity component l_r,j of the jth color block x_r,j of P_r and the mean intensity component l_X(i),j of the jth color block x_X(i),j of P_X(i) as ζ_i,j, ζ_i,j = ||l_r,j − l_X(i),j||. Then, from the mean intensity differences of all corresponding color blocks, obtain the third detection binary map, denoted Z_i^(3): the number of pixels per row of Z_i^(3) equals the number of color blocks per row of P_B(i), and the number of pixels per column of Z_i^(3) equals the number of color blocks per column of P_B(i). If the mean intensity difference for a pair of corresponding color blocks is greater than or equal to the threshold constant th3, the pixel at the corresponding position of Z_i^(3) has value 0; if it is smaller than th3, that pixel has value 1. Here l_X(i),j is obtained as in step 4.3, and th3 = 0.1.
Step 4.8: according to the overexposure binary map obtained in step 4.4 and the three detection binary maps obtained in steps 4.5, 4.6 and 4.7, obtain the structure consistency detection binary map of P_B(i), denoted Z_i.
Step 4.9: judge whether each color block of P_B(i) belongs to the structure-consistent region. For the jth color block x_i,j of P_B(i): if the pixel of Z_i at the position corresponding to x_i,j has value 0, x_i,j is considered to belong to the structure-consistent region; if that pixel has value 1, x_i,j is considered to belong to the structure-inconsistent region.
Step 4.10: for the color blocks of P_B(i) that belong to the structure-inconsistent region, replace them with the corresponding color blocks of P_X(i) to achieve compensation, namely: if x_i,j belongs to the structure-inconsistent region, replace x_i,j with the jth color block x_X(i),j of P_X(i). The compensated principal basis obtained after compensating all color blocks of P_B(i) belonging to the structure-inconsistent region is denoted P'_B(i).
Step 4.11: take the next principal basis traversed in the multi-exposure principal-basis sequence as the current principal basis, and return to step 4.3 to continue until the compensated principal basis corresponding to every principal basis in the multi-exposure principal-basis sequence has been obtained.
Step 4.12: from the color blocks at the same position in the compensated principal bases of all principal bases in the multi-exposure principal-basis sequence, i.e., the M color blocks {x'_i,j | 1 ≤ i ≤ M}, reconstruct the corresponding fused image block, denoted x̂_B,j, where x'_i,j denotes the jth color block of P'_B(i) and c'_i,j, s'_i,j, l'_i,j denote its signal strength, signal structure and mean intensity components. Express x̂_B,j as three parts, a signal strength component ĉ_j, a signal structure component ŝ_j and a mean intensity component l̂_j: x̂_B,j = ĉ_j · ŝ_j + l̂_j. The components are obtained as follows: ĉ_j = max_{1≤i≤M} c'_i,j, where max() is the maximum-value function; ŝ_j = s̄_j / ||s̄_j|| with s̄_j = Σ_{i=1}^{M} ω_1(x̃'_i,j) · s'_i,j / Σ_{i=1}^{M} ω_1(x̃'_i,j), where the weighting function ω_1() is expressed by an exponential (power) function, ω_1(x̃'_i,j) = ||x̃'_i,j||^β, x̃'_i,j = x'_i,j − l'_i,j is the mean-removed color block, and β is the exponent parameter, β = 4; l̂_j = Σ_{i=1}^{M} ω_2(μ'_i, l'_i,j) · l'_i,j / Σ_{i=1}^{M} ω_2(μ'_i, l'_i,j), where μ'_i denotes the global mean of P'_B(i) and the weighting function ω_2() is given by a two-dimensional normal distribution, ω_2(μ'_i, l'_i,j) = exp(−(μ'_i − μ_c)² / (2σ_x²) − (l'_i,j − l_c)² / (2σ_y²)), with exp() the exponential function with natural base e, e = 2.71…, and μ_c, l_c, σ_x, σ_y the parameters of the two-dimensional normal distribution, σ_x = 0.2, σ_y = 0.5, μ_c = 0.5, l_c = 0.5.
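A sketch of the patch fusion rule of step 4.12, assuming the power-function weighting ω_1 and the two-dimensional Gaussian weighting ω_2 described above; parameter values follow the text (β = 4, μ_c = l_c = 0.5, σ_x = 0.2, σ_y = 0.5), while function and variable names are illustrative:

```python
import numpy as np

def fuse_patches(patches, global_means, beta=4.0,
                 mu_c=0.5, l_c=0.5, sigma_x=0.2, sigma_y=0.5):
    """Fuse M co-located color patches (values in [0, 1]):
    strength = max of c_i; structure = ||x - l||^beta-weighted sum of s_i,
    renormalized; mean intensity = 2D-Gaussian-weighted sum of l_i."""
    cs, ss, ls = [], [], []
    for x in patches:
        l = x.mean()
        d = x - l
        c = float(np.linalg.norm(d))
        cs.append(c); ss.append(d / (c + 1e-12)); ls.append(l)
    c_hat = max(cs)
    w1 = np.array([c ** beta for c in cs]) + 1e-12        # omega_1 = ||x - l||^beta
    s_bar = sum(w * s for w, s in zip(w1, ss)) / w1.sum()
    s_hat = s_bar / (np.linalg.norm(s_bar) + 1e-12)
    w2 = np.array([np.exp(-(mu - mu_c) ** 2 / (2 * sigma_x ** 2)
                          - (l - l_c) ** 2 / (2 * sigma_y ** 2))
                   for mu, l in zip(global_means, ls)])   # omega_2
    l_hat = float((w2 * np.array(ls)).sum() / w2.sum())
    return c_hat * s_hat + l_hat

rng = np.random.default_rng(4)
x = rng.random((24, 24, 3))
fused_same = fuse_patches([x, x, x], [0.5, 0.5, 0.5])
```

Fusing three identical copies of a block returns the block itself, a quick sanity check that the three fused components recombine consistently.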
Step 4.13: following the inverse of the process by which P_B(i) was divided into overlapping color blocks, recombine the fused image blocks obtained in step 4.12 into a principal basis, which is the fused principal basis P_BB; during recombination, the pixel value at each position inside an overlap region is set to the average of the pixel values of the co-located pixels, and this average is used as the final pixel value.
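The overlap-averaging recombination of step 4.13 (and likewise step 5.4.3) can be sketched with an accumulator image and a counter image; when the patches are unchanged, the round trip reproduces the input. Block size and stride are shrunk here for brevity; names are illustrative:

```python
import numpy as np

def extract_patches(img, block=4, stride=2):
    """Overlapping patches with their top-left positions."""
    h, w = img.shape[:2]
    return [((r, c), img[r:r+block, c:c+block])
            for r in range(0, h - block + 1, stride)
            for c in range(0, w - block + 1, stride)]

def recombine(patches, shape, block=4):
    """Place patches back; pixels covered by several overlapping patches
    receive the average of the overlapping values."""
    acc = np.zeros(shape, dtype=np.float64)
    cnt = np.zeros(shape, dtype=np.float64)
    for (r, c), p in patches:
        acc[r:r+block, c:c+block] += p
        cnt[r:r+block, c:c+block] += 1
    return acc / np.maximum(cnt, 1)

rng = np.random.default_rng(5)
img = rng.random((12, 12))
out = recombine(extract_patches(img), img.shape)
```

In the method itself the patches fed to `recombine` are the fused blocks of step 4.12, with block 24 and stride 2 as in step 4.1.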
Step 5: input the kth multi-exposure non-principal-basis sequence into the remaining-baseband fusion module based on mask division to obtain the kth fused non-principal basis, denoted P_OB,k; when performing mask division, the structure consistency detection binary map corresponding to the principal basis is used as the mask.
In this embodiment, as shown in FIG. 2, the remaining-baseband fusion module based on mask division processes the kth multi-exposure non-principal-basis sequence in step 5 to obtain the kth fused non-principal basis P_OB,k as follows:
step 5.1: divide each non-principal basis in the kth multi-exposure non-principal-basis sequence into a number of overlapping color blocks in RGB space; when dividing the overlapping color blocks, a sliding window slides over the non-principal basis with a step of 2 pixels, and the color block size is 24×24.
Step 5.2: classify all color blocks of each non-principal basis in the kth multi-exposure non-principal-basis sequence into a mask region and a non-mask region. For P_O(i),k, the classification of each color block uses Z_i as the mask: for the jth color block x_i,k,j of P_O(i),k, if the pixel of Z_i at the same position has value 1, x_i,k,j is considered to belong to the mask region; if that pixel has value 0, x_i,k,j is considered to belong to the non-mask region.
Step 5.3: obtain the non-mask region of the kth fused non-principal basis P_OB,k, denoted P_Os,k, by taking the element-wise maximum of the M non-principal bases over their non-mask regions, P_Os,k = max_{1≤i≤M}(P_O(i),k · Z̄_i), where max() is the maximum-value function and Z̄_i denotes the complement map obtained by inverting Z_i.
Step 5.4: obtain the mask region of the kth fused non-principal basis P_OB,k, denoted P_Od,k; the acquisition process is as follows:
step 5.4.1: for the color blocks belonging to the mask region of each non-principal basis in the kth multi-exposure non-principal-basis sequence, replace them with the corresponding color blocks of P_X(i) to achieve compensation, namely: for P_O(i),k, if x_i,k,j belongs to the mask region, replace x_i,k,j with the jth color block x_X(i),j of P_X(i). The compensated non-principal basis obtained after compensating all mask-region color blocks of P_O(i),k is denoted P'_O(i),k.
Step 5.4.2: from the color blocks at the same position in the compensated non-principal bases of all non-principal bases in the kth multi-exposure non-principal-basis sequence, i.e., the M color blocks {x'_i,k,j | 1 ≤ i ≤ M}, reconstruct the corresponding fused image block, denoted x̂_O,k,j. Express x̂_O,k,j as a signal strength component, a signal structure component and a mean intensity component, x̂_O,k,j = ĉ_k,j · ŝ_k,j + l̂_k,j, where x'_i,k,j denotes the jth color block of P'_O(i),k and the three components are obtained as in step 4.12.
Step 5.4.3: according to P O(i),k The inverse of the segmentation into overlapping color blocks willIs recombined into a mask region P Od,k The method comprises the steps of carrying out a first treatment on the surface of the The average value of the pixel values of the pixel points at the same position in the overlapping area during recombination is used as the final pixel value after recombination.
Step 5.5: obtaining the kth fusion non-master P OB,k ,P OB,k =P Os,k +P Od,k
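The mask-split fusion of steps 5.3 and 5.5 can be sketched in a few lines of NumPy. This is an illustrative reading, not the patented implementation: the array names, shapes, and the interpretation of the product with the complement mask as element-wise are assumptions.

```python
import numpy as np

def fuse_nonprimary_static(P_O, Z):
    """Non-mask-region component of step 5.3: gate each of the M non-primary
    bases by the complement of its mask Z_i, then take the element-wise max.
    P_O: (M, H, W) non-primary bases; Z: (M, H, W) binary masks (1 = masked)."""
    gated = (1 - Z) * P_O      # complement mask suppresses masked (dynamic) pixels
    return gated.max(axis=0)   # the max() of the step 5.3 formula

# toy example with M = 2 non-primary bases of spatial size 1x2
P_O = np.array([[[0.2, 0.8]],
                [[0.5, 0.1]]])
Z   = np.array([[[0, 1]],      # pixel (0, 1) of base 0 is masked out
                [[0, 0]]])
P_Os = fuse_nonprimary_static(P_O, Z)
```

Per step 5.5, the final fused non-primary base would then be this component plus a mask-region component built from the patch-wise fusion of step 5.4.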
Step 6: according to P BB and the set {P OB,k | 1 ≤ k ≤ K} of all fused non-primary bases, obtain a fused baseband tensor; then combine the fused baseband tensor with the orthogonal factor matrices of the two angular dimensions obtained when the angular dimensions of the registered light field images were decomposed by higher-order singular value decomposition, recovering it into a light field structure to obtain the fused light field image.
Step 7: recover the angle information of the fused light field image obtained in step 6 to obtain a fused light field image containing complete angle information, denoted L H .
In this embodiment, the specific process of step 7 is: according to the parallax between the central sub-aperture image of the light field image with the optimal exposure degree in {L i ∈ R U×V×H×W×C | 1 ≤ i ≤ M} and each of its non-central sub-aperture images, forward warping maps the non-central sub-aperture images that were aligned with the central view back to their corresponding positions to restore the angle information, obtaining L H .
To further illustrate the feasibility and effectiveness of the method of the invention, experimental verification of the method of the invention was performed.
To verify the performance of the method of the invention, a multi-exposure light field image dataset of dynamic scenes was used as the test dataset. From the dataset, 27 sets of dynamic multi-exposure light field image sequences with exposure values from -2 EV to +2 EV were selected, where each scene contains 3 light field images of different exposure levels, each differing by 2 EV. The scenes include indoor and outdoor settings, rigid and dynamic objects, motion of near and far objects, motion of large and small magnitude, and motion of single and multiple people.
To quantitatively evaluate the performance of the method of the invention, it is compared with existing two-dimensional image multi-exposure fusion methods: the simple and practical alternative high dynamic range photography method proposed by Mertens et al., the guided-filter-based image fusion method (GFF) proposed by Li et al., the ghost-free multi-exposure fusion method based on dense SIFT (DSIFT-MEF) proposed by Liu et al., the ghost-free multi-exposure image fusion method based on dense SIFT descriptors and guided filtering proposed by Hayat et al., the fast multi-scale multi-exposure fusion method based on structural patch decomposition (FMSSPD-MEF) proposed by Li et al., and the detail-protecting and edge-preserving structural patch decomposition multi-exposure fusion method (MESPD) proposed by Li et al.
Note that the multi-exposure light field image dataset has no ground-truth image to serve as a reference; moreover, the dataset contains multi-exposure light field image sequences of dynamic scenes, so the objective evaluation indices of most multi-exposure image fusion methods are not applicable here. Therefore, MEF-SSIM d , a quality model suited to dynamic-scene Multi-Exposure Fusion (MEF), is adopted as the objective evaluation index to evaluate the performance of the high dynamic range light fields obtained by the different MEF methods. MEF-SSIM d is a quality evaluation index based on the structural-similarity principle that considers the influence of both static and dynamic regions on fusion quality: it measures the structural similarity between corresponding sequences in the two kinds of regions and combines the two regional quality measures into an overall quality score, where a larger value indicates better performance. Table 1 shows the MEF-SSIM d scores of the different MEF methods on the multi-exposure light field image dataset, where bold indicates the best-performing method and underline the second best.
TABLE 1. Comparison of the method of the invention with existing two-dimensional image multi-exposure fusion methods on the MEF-SSIM d objective evaluation index
Sequence(s) Mertens GFF DSIFT-MEF Hayat19 FMSSPD-MEF MESPD The method of the invention
1 0.9058 0.9262 0.9050 0.9339 0.9180 0.9222 0.8996
2 0.8127 0.7883 0.8166 0.8260 0.8048 0.8142 0.9010
3 0.8920 0.8874 0.8655 0.8945 0.9310 0.9289 0.8453
4 0.7907 0.7396 0.7964 0.7779 0.8421 0.8440 0.8594
5 0.7736 0.7610 0.8766 0.8337 0.8494 0.8469 0.8676
6 0.9079 0.8885 0.8775 0.8810 0.9225 0.9264 0.8331
7 0.8486 0.8501 0.8638 0.8449 0.8920 0.8929 0.8868
8 0.7893 0.7014 0.7782 0.7987 0.8091 0.8157 0.9270
9 0.7952 0.7850 0.8034 0.8012 0.8173 0.8204 0.8837
10 0.8866 0.8765 0.8768 0.8953 0.9120 0.9151 0.8679
11 0.8793 0.8839 0.8044 0.8723 0.9144 0.9178 0.8424
12 0.8293 0.8168 0.8424 0.8404 0.8477 0.8524 0.8017
13 0.7780 0.7737 0.7927 0.8759 0.8218 0.8251 0.8774
14 0.8404 0.8354 0.8511 0.8974 0.8764 0.8784 0.9021
15 0.8322 0.8281 0.7926 0.7752 0.8543 0.8574 0.8833
16 0.8036 0.8010 0.7785 0.8038 0.8424 0.8483 0.8768
17 0.8743 0.8415 0.8725 0.8687 0.8869 0.8852 0.8887
18 0.8850 0.8830 0.8856 0.8378 0.9082 0.9128 0.8836
19 0.8261 0.8263 0.7901 0.8124 0.8677 0.8711 0.8856
20 0.8219 0.7754 0.8505 0.8924 0.8372 0.8370 0.9178
21 0.8723 0.8241 0.8148 0.7697 0.8937 0.8970 0.8378
22 0.8425 0.8550 0.8862 0.9000 0.8741 0.8850 0.9091
23 0.9125 0.8999 0.9204 0.9127 0.9300 0.9332 0.9052
24 0.8415 0.8498 0.8431 0.8314 0.8727 0.8778 0.9036
25 0.8306 0.8494 0.8762 0.8278 0.8640 0.8731 0.9047
26 0.8036 0.8079 0.8275 0.8118 0.8348 0.8394 0.8481
27 0.8658 0.8973 0.8434 0.8463 0.9106 0.9160 0.9073
Average 0.8423 0.8316 0.8419 0.8468 0.8717 0.8753 0.8795
The MEF-SSIM d scores in Table 1 show that, over the fused light fields of the 27 scenes, the method of the invention performs comparably to the FMSSPD-MEF and MESPD methods and is clearly superior to the method of Mertens et al., the GFF method, the DSIFT-MEF method, and the method of Hayat et al. Compared with existing two-dimensional image multi-exposure fusion methods, the method of the invention can effectively combine the low dynamic range images of the underexposed, middle-exposed and overexposed light fields, effectively raise the dynamic range of the light field image, and obtain an artifact-free high dynamic range light field of better quality.
Fig. 3a, 3b and 3c show the central sub-aperture images of the underexposed, middle-exposed and overexposed light field images in the 4th test light field image sequence, and fig. 3d to 3j show the central sub-aperture images of the fused light field images obtained by processing this sequence with the method of Mertens et al., the GFF method of Li et al., the DSIFT-MEF method of Liu et al., the method of Hayat et al., the FMSSPD-MEF method of Li et al., the MESPD method of Li et al., and the method of the invention, respectively. Comparing fig. 3d to 3i with fig. 3a, 3b and 3c, the method of Mertens et al. and the GFF method do not give good fusion results in dynamic scenes: there are pronounced artifacts on the doll in the foreground and some color distortion in the glass region of the background. The DSIFT-MEF method of Liu et al. resolves that color distortion and alleviates the artifacts to some extent through SIFT descriptor matching, but cannot fully handle objects with larger motion amplitude. The FMSSPD-MEF and MESPD methods of Li et al. enhance the content information and color of the fused region, as the partial enlargements in fig. 3h and 3i show, but do not completely remove the artifacts. The method of Hayat et al. detects the motion region well, but cannot compute a good weight map for fusing the spot and the doll. As fig. 3j shows, the method of the invention removes the artifacts in the motion region better.
From the enlarged EPI views, the EPI slanted lines of the Mertens, GFF and DSIFT-MEF methods show a certain distortion, and the color information of the Hayat and FMSSPD-MEF methods is not rich enough, whereas the MESPD method and the method of the invention maintain angular consistency and color detail well.
Besides comparing the fusion results of the central sub-aperture images and the preservation of straight lines in the EPI, and to further visualize the imaging quality of the fused light fields obtained by the different methods, the light field image depth estimation algorithm proposed by Chen et al. is adopted to measure how well the different methods preserve the geometric characteristics of the light field: the more accurate the obtained depth map, the better the angular consistency is maintained. Fig. 4a, 4b and 4c show the central sub-aperture images of the underexposed, middle-exposed and overexposed light field images in the 22nd test light field image sequence; fig. 4d, 4e and 4f show the depth maps estimated from these original light field images, and fig. 4g to 4m show the depth maps of the fused light field images obtained by processing the 22nd test sequence with the method of Mertens et al., the GFF method of Li et al., the DSIFT-MEF method of Liu et al., the method of Hayat et al., the FMSSPD-MEF method of Li et al., the MESPD method of Li et al., and the method of the invention, respectively. The depth maps of the original exposures in fig. 4d, 4e and 4f fail to reveal complete scene objects, such as the table lamp and part of the book on the desktop. The depth maps in fig. 4g to 4m show that the depth map of the fused light field image obtained by every multi-exposure image fusion method is significantly improved over those obtained from the original multi-exposure light field image sequence. As fig. 4m shows, the method of the invention estimates depth well in the wall area and the book portion.
In summary, the method of the invention better maintains the geometric structure and angular consistency of the light field, which in turn improves the depth map generated from the fused light field image.

Claims (9)

1. A dynamic multi-exposure light field image fusion method based on structure consistency detection, characterized by comprising the following steps:
Step 1: acquire M light field images of different exposure degrees shot at the same spatial position, and form the M light field images of different exposure degrees into a dynamic multi-exposure light field image sequence, denoted {L i ∈ R U×V×H×W×C | 1 ≤ i ≤ M}; where M is greater than 1, L i denotes the light field image of the ith exposure degree, i.e., the ith light field image, R denotes the set of real numbers, the spatial resolution of L i is W×H, its angular resolution is V×U, its number of color channels is C, and 1 ≤ i ≤ M;
Step 2: align the sub-aperture images of each light field image in {L i ∈ R U×V×H×W×C | 1 ≤ i ≤ M} to the central view to obtain a corresponding registered light field image, the registered light field image corresponding to L i being denoted L A(i) ; where L A(i) ∈ R U×V×H×W×C , the spatial resolution of L A(i) is W×H, its angular resolution is V×U, and its number of color channels is C;
Step 3: perform higher-order singular value decomposition on the angular dimensions of each registered light field image to obtain a corresponding baseband tensor, take the first component of the baseband tensor as the primary base, and take the remaining basebands of the baseband tensor other than the first component as non-primary bases; the primary base in the baseband tensor corresponding to L A(i) is denoted P B(i) , and the set {P B(i) | 1 ≤ i ≤ M} of the primary bases in the baseband tensors corresponding to all registered light field images is taken as the multi-exposure primary base sequence; the kth non-primary base in the baseband tensor corresponding to L A(i) is denoted P O(i),k , and the set {P O(i),k | 1 ≤ i ≤ M} of the kth non-primary bases at the same position in the baseband tensors corresponding to all registered light field images is taken as the kth multi-exposure non-primary base sequence, which contains M non-primary bases; where 1 ≤ k ≤ K, K denotes the number of basebands other than the first component in the baseband tensor corresponding to each registered light field image, and K = V×U − 1;
Step 4: input the multi-exposure primary base sequence into a primary base fusion module based on structure consistency detection to obtain the fused primary base, denoted P BB ; in obtaining the fused primary base P BB , a structure consistency detection binary map of each primary base in the multi-exposure primary base sequence is acquired;
Step 5: input the kth multi-exposure non-primary base sequence into a remaining-baseband fusion module based on mask division to obtain the kth fused non-primary base, denoted P OB,k ; the mask division takes the structure consistency detection binary map corresponding to the primary base as the mask;
Step 6: according to P BB and the set {P OB,k | 1 ≤ k ≤ K} of all fused non-primary bases, obtain a fused baseband tensor; then combine the fused baseband tensor with the orthogonal factor matrices of the two angular dimensions obtained when the angular dimensions of the registered light field images were decomposed by higher-order singular value decomposition, recovering it into a light field structure to obtain the fused light field image;
Step 7: recover the angle information of the fused light field image obtained in step 6 to obtain a fused light field image containing complete angle information, denoted L H .
2. The dynamic multi-exposure light field image fusion method based on structure consistency detection according to claim 1, characterized in that the specific process of step 2 is: calculate the parallax between the central sub-aperture image of each light field image in {L i ∈ R U×V×H×W×C | 1 ≤ i ≤ M} and each of its non-central sub-aperture images using an optical flow estimation method; then, using backward warping, map each non-central sub-aperture image of each light field image onto the central sub-aperture image according to the corresponding parallax relation to realize central view alignment, obtaining the corresponding registered light field image; where the non-central sub-aperture images are the sub-aperture images other than the central sub-aperture image.
3. The dynamic multi-exposure light field image fusion method based on structure consistency detection according to claim 2, characterized in that the optical flow estimation method is specifically the SIFT Flow optical flow estimation method.
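The central-view alignment of claim 2 amounts to backward warping each non-central sub-aperture view by its estimated disparity. Below is a minimal nearest-neighbor sketch in NumPy; the function name, the rounded integer sampling, and the border clamping are simplifying assumptions, and a real implementation would interpolate sub-pixel flow.

```python
import numpy as np

def backward_warp(view, dy, dx):
    """Sample `view` at positions displaced by the per-pixel flow (dy, dx),
    producing an image aligned with the reference (central) view.
    Nearest-neighbor sampling, clamped at the borders."""
    H, W = view.shape
    yy, xx = np.mgrid[0:H, 0:W]
    ys = np.clip(np.rint(yy + dy).astype(int), 0, H - 1)
    xs = np.clip(np.rint(xx + dx).astype(int), 0, W - 1)
    return view[ys, xs]

view = np.arange(12, dtype=float).reshape(3, 4)
# a uniform disparity of +1 pixel in x: each output pixel reads its right neighbor
warped = backward_warp(view, np.zeros((3, 4)), np.ones((3, 4)))
```

The same machinery run in the opposite direction (scattering aligned pixels back to their original positions) corresponds to the forward warping that restores angle information in step 7.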
4. The dynamic multi-exposure light field image fusion method based on structure consistency detection according to any of claims 1-3, characterized in that in step 3, P B(i) and P O(i),k are obtained as follows:
Step 3.1: perform higher-order singular value decomposition on L A(i) , with the decomposition formula: L A(i) = S i ×1 O (1) ×2 O (2) ×3 O (3) ×4 O (4) ×5 O (5) ; where S i denotes the core tensor obtained from the higher-order singular value decomposition of L A(i) , S i ∈ R U'×V'×H'×W'×C' , U', V', H', W' and C' are the five dimensions of S i , O (1) and O (2) are the orthogonal factor matrices of the two angular dimensions, O (1) ∈ R U×U' , O (2) ∈ R V×V' , O (3) and O (4) are the orthogonal factor matrices of the two spatial dimensions, O (3) ∈ R H×H' , O (4) ∈ R W×W' , and O (5) is the orthogonal factor matrix of the color channel dimension, O (5) ∈ R C×C' ;
Step 3.2: convert L A(i) = S i ×1 O (1) ×2 O (2) ×3 O (3) ×4 O (4) ×5 O (5) into L A(i) ×1 (O (1) ) T ×2 (O (2) ) T = S i ×3 O (3) ×4 O (4) ×5 O (5) ; where the superscript "T" denotes the transpose of a vector or matrix;
Step 3.3: on this basis, calculate the baseband tensor obtained from the higher-order singular value decomposition of the angular dimensions of L A(i) , denoted P i , P i = L A(i) ×1 (O (1) ) T ×2 (O (2) ) T ; take the first component of P i as the primary base P B(i) , and take the kth remaining baseband of P i other than the first component as the kth non-primary base P O(i),k .
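Steps 3.1-3.3 project the light field onto orthogonal factor matrices of its two angular modes and keep the projection as the baseband tensor. A compact NumPy sketch, under the assumption that O(1) and O(2) are taken from SVDs of the mode-1 and mode-2 unfoldings (all names here are illustrative, not the patent's):

```python
import numpy as np

def unfold(T, mode):
    """Mode-n unfolding: move `mode` to the front and flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def fold(M, mode, shape):
    """Inverse of unfold for a tensor of the given full shape."""
    full = [shape[mode]] + [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(M.reshape(full), 0, mode)

def angular_baseband(L):
    """Baseband tensor of a (U, V, H, W, C) light field: project the two
    angular modes onto their left singular vectors."""
    O1, _, _ = np.linalg.svd(unfold(L, 0), full_matrices=False)
    O2, _, _ = np.linalg.svd(unfold(L, 1), full_matrices=False)
    P = fold(O1.T @ unfold(L, 0), 0, L.shape)   # L x1 (O1)^T
    P = fold(O2.T @ unfold(P, 1), 1, P.shape)   # ... x2 (O2)^T
    return P, O1, O2

L = np.random.RandomState(0).rand(3, 3, 4, 4, 2)
P, O1, O2 = angular_baseband(L)
# step 6 direction: re-apply O1, O2 to recover the light field structure
R = fold(O1 @ unfold(P, 0), 0, P.shape)
R = fold(O2 @ unfold(R, 1), 1, R.shape)
```

Because O1 and O2 have orthonormal columns, re-applying them reconstructs the original tensor exactly, which is what step 6 relies on after the basebands have been fused.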
5. The dynamic multi-exposure light field image fusion method based on structure consistency detection according to claim 4, characterized in that in step 4, the multi-exposure primary base sequence is processed by the primary base fusion module based on structure consistency detection to obtain the fused primary base P BB as follows:
Step 4.1: divide each primary base in the multi-exposure primary base sequence into a number of overlapping color blocks in RGB space; the division is realized by sliding a window over the primary base with a step of 2 pixel points, and the color block size is 24×24;
Step 4.2: select a reference primary base from the multi-exposure primary base sequence, denoted P r ; then generate a multi-exposure virtual primary base sequence using an intensity mapping function and the reference primary base P r , the ith virtual primary base of which is denoted P X(i) ; traverse each primary base in the multi-exposure primary base sequence, defining the currently traversed one as the current primary base; where the reference primary base is the primary base in the baseband tensor corresponding to the registered light field image of the light field image with the optimal exposure degree in {L i ∈ R U×V×H×W×C | 1 ≤ i ≤ M};
Step 4.3: let the current primary base be P B(i) , denote the j-th color block in P B(i) as x i,j , and express x i,j by a signal strength component c i,j , a signal structure component s i,j and an average intensity component l i,j as x i,j = c i,j · s i,j + l i,j , with c i,j = ||x i,j − μ x i,j ||, s i,j = (x i,j − μ x i,j ) / ||x i,j − μ x i,j || and l i,j = μ x i,j ; where the overlapping color blocks are obtained by sliding a window over P B(i) with a step of 2 pixel points and the color block size is 24×24, 1 ≤ j ≤ J, J denotes the total number of color blocks in P B(i) , c i,j and l i,j are scalars, s i,j is a vector, "|| ||" is the L2-norm operator, and μ x i,j denotes the mean of the pixel values of all pixel points in x i,j ;
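Step 4.3's three-part patch decomposition can be sketched as follows; this is an illustration of the stated formulas, and the ε guard against an all-constant patch is an added assumption:

```python
import numpy as np

def patch_decompose(x):
    """Decompose a patch into signal strength c, unit signal structure s,
    and mean intensity l, so that x = c*s + l."""
    l = x.mean()                    # average intensity component
    tilde = x - l                   # mean-removed patch
    c = np.linalg.norm(tilde)       # signal strength (L2 norm)
    s = tilde / (c + 1e-12)         # unit-norm structure (eps for flat patches)
    return c, s, l

x = np.array([0.1, 0.5, 0.9, 0.5])
c, s, l = patch_decompose(x)
```

The decomposition is exactly invertible, which is what lets steps 4.12 and 5.4.2 fuse each component separately and then rebuild the fused patch as ĉ·ŝ + l̂.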
Step 4.4: partition P B(i) into an overexposed region and a non-overexposed region; then, according to the overexposed and non-overexposed regions of P B(i) , obtain the overexposure binary map corresponding to P B(i) ;
Step 4.5: obtaining P by using a perceptual hash algorithm B(i) The corresponding first structure consistency region detection binary image is recorded asThe acquisition process comprises the following steps: calculating a reference principal point P r Each color block of (a) and P B(i) Hamming distance of hash values of corresponding color blocks in the color image will be referred to the principal base P r The j-th color block x in (1) r,j And P B(i) The j-th color block x in (1) i,j The Hamming distance of the hash value of (2) is noted as delta i,j ,δ i,j =F PHA (x r,j ,x i,j ) The method comprises the steps of carrying out a first treatment on the surface of the Then according to the reference principal point P r All color blocks and P in (1) B(i) Hamming distance of hash value of corresponding color block, get +.> The number of pixel points in each row is equal to P B(i) The number of color blocks per line is equal, < >>The number of pixel points in each column and P B(i) The number of color blocks in each column is equal, if the primary base P is referred to r One of the color blocks and P B(i) The Hamming distance of the hash value of the corresponding color block in is smaller than or equal to the threshold th1, then +. >The pixel value of the pixel point at the corresponding position in the pixel array is 0; if reference is made to principal group P r One of the color blocks and P B(i) The Hamming distance of the hash value of the corresponding color block in is larger than the threshold th1, then +.>Corresponding to (a)The pixel value of the pixel point of the position is 1; wherein J is more than or equal to 1 and less than or equal to J, J represents P B(i) The total number of color blocks, delta i,j ∈[0,64],F PHA () Representing performing a perceptual hash algorithm operation, th1=5;
Step 4.6: obtain the second structure consistency region detection binary map corresponding to P B(i) using a normalized cross-correlation algorithm, as follows: compute the inner product of the signal structure component of each color block of the reference primary base P r and that of the corresponding color block of P B(i) , the inner product of the signal structure component s r,j of the j-th color block x r,j of P r and the signal structure component s i,j of the j-th color block x i,j of P B(i) being denoted θ i,j , θ i,j = ((x r,j − l r,j ) · (x i,j − l i,j )) / (||x r,j − l r,j || ||x i,j − l i,j || + ε); then obtain the second structure consistency region detection binary map from the inner products of the signal structure components of all color blocks of P r and the corresponding color blocks of P B(i) : the number of pixel points in each of its rows equals the number of color blocks per row of P B(i) , and the number of pixel points in each of its columns equals the number of color blocks per column of P B(i) ; if the inner product of the signal structure components of a color block of P r and the corresponding color block of P B(i) is greater than or equal to the threshold constant th2, the pixel value of the pixel point at the corresponding position in the map is 0; if that inner product is smaller than th2, the pixel value at the corresponding position is 1; where θ i,j ∈ [−1, 1], l r,j denotes the average intensity component of the j-th color block x r,j of P r , ε is an infinitesimal quantity that prevents the denominator from being 0, and th2 = 0.8;
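The structure-consistency inner product of step 4.6 can be sketched directly on mean-removed, normalized patches (the function and variable names are illustrative; ε guards a zero denominator as in the claim):

```python
import numpy as np

def structure_consistency(x_r, x_i, th2=0.8, eps=1e-12):
    """Inner product of the unit structure components of two patches:
    close to 1 when the structures agree, lower (down to -1) otherwise.
    Returns (theta, flagged_inconsistent)."""
    def s(x):
        t = x - x.mean()
        return t / (np.linalg.norm(t) + eps)
    theta = float(np.dot(s(x_r), s(x_i)))
    return theta, theta < th2   # True -> pixel of the binary map set to 1

base = np.array([0.1, 0.4, 0.7, 1.0])
theta_same, flag_same = structure_consistency(base, 2 * base + 0.1)
theta_diff, flag_diff = structure_consistency(base, base[::-1].copy())
```

Note that a patch that only differs in gain and offset (2·base + 0.1) still scores θ = 1: the measure isolates structure, leaving exposure differences to the intensity test of step 4.7.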
Step 4.7: compute the average intensity difference between the average intensity component of each color block of the reference primary base P r and that of the corresponding color block of P X(i) , the difference between the average intensity component l r,j of the j-th color block x r,j of P r and the average intensity component l X(i),j of the j-th color block x X(i),j of P X(i) being denoted ζ i,j , ζ i,j = ||l r,j − l X(i),j ||; then obtain a third binary map from the average intensity differences of all color blocks of P r and the corresponding color blocks of P X(i) : the number of pixel points in each of its rows equals the number of color blocks per row of P B(i) , and the number of pixel points in each of its columns equals the number of color blocks per column of P B(i) ; if the average intensity difference of a color block of P r and the corresponding color block of P X(i) is greater than or equal to the threshold constant th3, the pixel value of the pixel point at the corresponding position in the map is 0; if that difference is smaller than th3, the pixel value at the corresponding position is 1; where th3 = 0.1;
Step 4.8: according to the overexposure binary map of step 4.4 and the binary maps obtained in steps 4.5 to 4.7, obtain the structure consistency detection binary map of P B(i) , denoted Z i ;
Step 4.9: judging P B(i) Whether or not each color block in (1) belongs to a structure consistency region, for P B(i) The j-th color block x in (1) i,j If Z i Intermediate and x i,j If the pixel value of the pixel point at the corresponding position is 0, then consider x i,j Belongs to a structure consistency region; if Z i Intermediate and x i,j If the pixel value of the pixel point at the corresponding position is 1, then consider x i,j Belongs to a structure inconsistency area;
step 4.10: for P B(i) Color blocks belonging to the region of structural inconsistency, P is used X(i) The corresponding color blocks in (a) are replaced to realize compensation, namely: let x be i,j Belonging to the region of structural inconsistency, then P is used X(i) The j-th color block x in (1) X(i),j Substitution x i,j The method comprises the steps of carrying out a first treatment on the surface of the Will pair P B(i) In areas of structural inconsistencyThe compensation main base obtained after all the color blocks are compensated is P' B(i)
Step 4.11: taking the next traversed main base in the multi-exposure main base sequence as the current main base, and returning to the step 4.3 to continue execution until a compensation main base corresponding to each main base in the multi-exposure main base sequence is obtained;
Step 4.12: reconstruct a corresponding fused image block from the M color blocks at the same position in the compensated primary bases of all primary bases in the multi-exposure primary base sequence, i.e., reconstruct the M color blocks {x' i,j | 1 ≤ i ≤ M} into a fused image block denoted x̂ j , expressed by a signal strength component ĉ j , a signal structure component ŝ j and an average intensity component l̂ j as x̂ j = ĉ j · ŝ j + l̂ j ; where x' i,j denotes the j-th color block of P' B(i) , ĉ j and l̂ j are scalars and ŝ j is a vector; ĉ j = max 1≤i≤M (c' i,j ), where max() is the maximum-value function; ŝ j = s̄ j / ||s̄ j ||, where s̄ j = Σ 1≤i≤M ω 1 (x' i,j ) s' i,j / Σ 1≤i≤M ω 1 (x' i,j ), ω 1 () is a weighting function expressed by an exponential function, ω 1 (x' i,j ) = ||x' i,j − μ' i || β , β is an exponential parameter, β = 4, and μ' i denotes the global mean of P' B(i) ; l̂ j = Σ 1≤i≤M ω 2 (μ' i , l' i,j ) l' i,j / Σ 1≤i≤M ω 2 (μ' i , l' i,j ), where ω 2 () is a weighting function expressed by a two-dimensional normal distribution, ω 2 (μ' i , l' i,j ) = exp(−(μ' i − μ c )²/(2σ x ²) − (l' i,j − l c )²/(2σ y ²)), exp() denotes the exponential function with natural base e, e = 2.71…, and μ c , l c , σ x , σ y are parameters of the two-dimensional normal distribution, σ x = 0.2, σ y = 0.5, μ c = 0.5, l c = 0.5; c' i,j denotes the signal strength component of x' i,j , s' i,j denotes its signal structure component, l' i,j denotes its average intensity component, and μ x' i,j denotes the mean of the pixel values of all pixel points in x' i,j ;
Step 4.13: according to the inverse of the process by which P B(i) was divided into overlapping color blocks, recombine the fused image blocks x̂ j into a primary base, which is the fused primary base P BB ; during recombination, the average of the pixel values of the pixel points at the same position in overlapping areas is taken as the final pixel value.
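The overlap-averaging recombination of step 4.13 (also used in step 5.4.3) can be sketched as follows; single-channel patches and explicit top-left patch positions are simplifying assumptions:

```python
import numpy as np

def recombine(patches, positions, out_shape):
    """Reassemble overlapping patches into an image; pixels covered by
    several patches take the mean of all contributions."""
    acc = np.zeros(out_shape)   # sum of contributions per pixel
    cnt = np.zeros(out_shape)   # number of contributions per pixel
    ph, pw = patches[0].shape
    for p, (y, x) in zip(patches, positions):
        acc[y:y + ph, x:x + pw] += p
        cnt[y:y + ph, x:x + pw] += 1
    return acc / np.maximum(cnt, 1)

# two 2x2 patches overlapping in the middle column of a 2x3 image
a = np.full((2, 2), 1.0)
b = np.full((2, 2), 3.0)
out = recombine([a, b], [(0, 0), (0, 1)], (2, 3))
```

With a stride of 2 and 24×24 blocks, as in step 4.1, every interior pixel is covered by many patches, so this averaging also suppresses blocking seams between fused patches.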
6. The dynamic multi-exposure light field image fusion method based on the structural consistency detection according to claim 5, wherein the specific process of the step 4.4 is as follows:
Step 4.4.1: convert P B(i) from RGB space to the HSV model; then divide P B(i) into a number of overlapping blocks under the HSV model; compare the region division index of each overlapping block of P B(i) with the set division threshold: if the region division index is greater than or equal to the set division threshold, the overlapping block is considered to belong to the overexposed region; if the region division index is smaller than the set division threshold, the overlapping block is considered to belong to the non-overexposed region; where the overlapping blocks are obtained by sliding a window over P B(i) with a step of 2 pixel points, the block size is 24×24, the region division index equals the number of pixel points in the overlapping block whose V component exceeds 240, and the set division threshold equals 0.8 times the total number of pixel points contained in the overlapping block;
Step 4.4.2: the overexposure binary map corresponding to P B(i) is constructed so that the number of pixel points in each of its rows equals the number of overlapping blocks per row of P B(i) and the number of pixel points in each of its columns equals the number of overlapping blocks per column of P B(i) ; if an overlapping block of P B(i) belongs to the overexposed region, the pixel value of the pixel point at the corresponding position in the map is 1; if the overlapping block belongs to the non-overexposed region, the pixel value at the corresponding position is 0.
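The per-block overexposure test of step 4.4.1 reduces to counting bright V-component pixels; the sketch below assumes V on a 0-255 scale, as the threshold of 240 in the claim suggests:

```python
import numpy as np

def overexposed_block(v_block, v_max=240, ratio=0.8):
    """Return 1 when the number of pixels whose V component exceeds 240
    reaches 80% of the block's pixels, else 0 (step 4.4.1)."""
    return int(np.count_nonzero(v_block > v_max) >= ratio * v_block.size)

bright = np.full((24, 24), 250)   # nearly saturated block
dark = np.full((24, 24), 100)     # well-exposed block
```

Applying this test to every 24×24 sliding block and writing the 0/1 outcomes block-by-block yields the overexposure binary map of step 4.4.2.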
7. The method for dynamic multi-exposure light field image fusion based on structure consistency detection according to claim 5, wherein in said step 5, the k multi-exposure non-primary base sequence is processed by using the rest of baseband fusion modules based on mask division to obtain the k fusion non-primary base P OB,k The process of (1) is as follows:
step 5.1: in the RGB color space, divide each non-primary base in the k-th multi-exposure non-primary-base sequence into a number of overlapping color blocks; when dividing the overlapping color blocks, a sliding window with a step of 2 pixels is slid over the non-primary base, and the size of each color block is 24×24;
step 5.2: classify all the color blocks in each non-primary base in the k-th multi-exposure non-primary-base sequence into a mask region and a non-mask region; for P_O(i),k, Z_i is used as a mask to classify each color block of P_O(i),k as follows: for the j-th color block x_i,k,j in P_O(i),k, if the pixel value of the pixel in Z_i at the position corresponding to x_i,k,j is 1, x_i,k,j is considered to belong to the mask region; if the pixel value of the pixel in Z_i at the position corresponding to x_i,k,j is 0, x_i,k,j is considered to belong to the non-mask region;
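Steps 5.1–5.2 can be sketched as follows, assuming the non-primary base is an H×W×3 array and that Z_i holds one pixel per overlapping block in row-major order; the function names are illustrative.

```python
import numpy as np

def split_blocks(img, block=24, stride=2):
    """Step 5.1: overlapping color blocks taken with a stride-2 sliding window."""
    h, w = img.shape[:2]
    blocks = []
    for r in range(0, h - block + 1, stride):
        for c in range(0, w - block + 1, stride):
            blocks.append(img[r:r + block, c:c + block])
    return blocks

def classify_blocks(blocks, z):
    """Step 5.2: block j belongs to the mask region iff the pixel of Z_i
    at the corresponding position is 1 (True = mask region)."""
    flags = z.reshape(-1)  # one Z_i pixel per overlapping block, row-major
    return [bool(flags[j]) for j in range(len(blocks))]
```

Because both the block split and Z_i are produced with the same 24×24 window and stride 2, the j-th block and the j-th Z_i pixel line up one-to-one.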
step 5.3: obtain one component of the k-th fused non-primary base P_OB,k, denoted P_Os,k; wherein max() is a maximum function, and Z̄_i denotes the complement map obtained by performing the inverse operation on Z_i;
step 5.4: obtain the other component of the k-th fused non-primary base P_OB,k, denoted P_Od,k; the acquisition process is as follows:
step 5.4.1: for the color blocks belonging to the mask region in each non-primary base in the k-th multi-exposure non-primary-base sequence, compensation is achieved by replacing them with the corresponding color blocks in P_X(i), namely: for P_O(i),k, if x_i,k,j belongs to the mask region, the j-th color block x_X(i),j in P_X(i) is used to replace x_i,k,j; the compensated non-primary base obtained after compensating all the color blocks of P_O(i),k that belong to the mask region is denoted P′_O(i),k;
step 5.4.2: from the color blocks at the same position in all the non-primary bases of the k-th multi-exposure non-primary-base sequence and in the compensated non-primary bases, i.e., M color blocks in total, reconstruct the corresponding fused image block; the fused image block reconstructed from the M color blocks {x′_i,k,j | 1 ≤ i ≤ M} is denoted x̂_k,j, and is expressed as three parts, a signal strength component ĉ_k,j, a signal structure component ŝ_k,j and a mean intensity component l̂_k,j, i.e., x̂_k,j = ĉ_k,j · ŝ_k,j + l̂_k,j; wherein x′_i,k,j denotes the j-th color block in P′_O(i),k, and the three components are obtained in the same way as in step 4.12;
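The three-part expression in step 5.4.2 is the standard structural patch decomposition x = c·s + l, with signal strength c = ||x − mean(x)||, signal structure s = (x − mean(x))/c, and mean intensity l = mean(x). Step 4.12 (the actual component-weighting rule) is not reproduced in this excerpt, so the fusion rule below (maximum strength, strength-weighted then renormalized structure, averaged mean intensity) is an illustrative assumption, not the patent's rule.

```python
import numpy as np

def decompose(x):
    """Structural patch decomposition: x = c * s + l."""
    l = x.mean()                         # mean intensity component
    d = x - l
    c = np.linalg.norm(d)                # signal strength component
    s = d / c if c > 0 else np.zeros_like(d)  # unit-norm signal structure
    return c, s, l

def fuse_patches(patches):
    """Fuse M co-located color blocks into one block x_hat = c_hat*s_hat + l_hat.
    The fusion weights here are illustrative, not the patent's step-4.12 rule."""
    cs, ss, ls = zip(*(decompose(p.astype(np.float64)) for p in patches))
    c_hat = max(cs)                      # keep the strongest signal strength
    s_sum = sum(c * s for c, s in zip(cs, ss))
    n = np.linalg.norm(s_sum)
    s_hat = s_sum / n if n > 0 else s_sum  # renormalize the fused structure
    l_hat = float(np.mean(ls))           # simple average of mean intensities
    return c_hat * s_hat + l_hat
```

A useful sanity check on any such rule: fusing M identical patches must return the patch itself, since c_hat = c, s_hat = s and l_hat = l.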
step 5.4.3: according to the inverse of the process of dividing P_O(i),k into overlapping color blocks, recombine the fused image blocks x̂_k,j into P_Od,k; during recombination, the average of the pixel values of the pixels at the same position in overlapping areas is taken as the final pixel value;
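The overlap-averaging recombination of step 5.4.3 can be sketched as follows, assuming the same 24×24 block size and stride of 2 used for the split; the border handling is illustrative.

```python
import numpy as np

def recombine(blocks, shape, block=24, stride=2):
    """Step 5.4.3: place fused blocks back at their source positions and
    average the pixel values wherever blocks overlap."""
    acc = np.zeros(shape, dtype=np.float64)  # running sum of block pixels
    cnt = np.zeros(shape, dtype=np.float64)  # how many blocks cover each pixel
    k = 0
    for r in range(0, shape[0] - block + 1, stride):
        for c in range(0, shape[1] - block + 1, stride):
            acc[r:r + block, c:c + block] += blocks[k]
            cnt[r:r + block, c:c + block] += 1.0
            k += 1
    return acc / np.maximum(cnt, 1.0)  # avoid division by zero at uncovered borders
```

Iterating the block positions in the same row-major order as the split guarantees that block k lands back where it was cut out.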
step 5.5: obtain the k-th fused non-primary base P_OB,k as P_OB,k = P_Os,k + P_Od,k.
8. The dynamic multi-exposure light field image fusion method based on structure consistency detection according to claim 4, wherein in said step 6,
9. The dynamic multi-exposure light field image fusion method based on structure consistency detection according to claim 2, wherein the specific process of said step 7 is as follows: according to the disparity between the central sub-aperture image of the light field image with the optimal exposure in {L_i ∈ R^(U×V×H×W×C) | 1 ≤ i ≤ M} and each non-central sub-aperture image, forward warping is used to map the non-central sub-aperture images aligned with the central view back to their corresponding positions so as to restore the angular information, thereby obtaining L_H.
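The forward warping in step 7 can be illustrated under a strong simplification: one constant disparity per view instead of a per-pixel disparity map. The function, its rounding, and its border clipping are assumptions for illustration, not the patent's warping scheme.

```python
import numpy as np

def forward_warp(center, du, dv, disparity):
    """Forward-map a central sub-aperture image to the view at angular
    offset (du, dv), assuming one constant disparity for the whole view."""
    h, w = center.shape[:2]
    out = np.zeros_like(center)
    dx = int(round(du * disparity))   # horizontal shift for this view
    dy = int(round(dv * disparity))   # vertical shift for this view
    ys = np.clip(np.arange(h) + dy, 0, h - 1)
    xs = np.clip(np.arange(w) + dx, 0, w - 1)
    out[ys[:, None], xs[None, :]] = center  # scatter source pixels to targets
    return out
```

With disparity 0 the view is returned unchanged; repeating this for every (du, dv) in the U×V angular grid rebuilds the full light field L_H.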
CN202310707326.7A 2023-06-15 2023-06-15 Dynamic multi-exposure light field image fusion method based on structure consistency detection Pending CN116912138A (en)

Publications (1)

Publication Number Publication Date
CN116912138A true CN116912138A (en) 2023-10-20

Family

ID=88363753

Country Status (1)

Country Link
CN (1) CN116912138A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117319807A (en) * 2023-11-30 2023-12-29 南昌菱形信息技术有限公司 Light and shadow imaging method and system for karst cave dome
CN117319807B (en) * 2023-11-30 2024-02-02 南昌菱形信息技术有限公司 Light and shadow imaging method and system for karst cave dome


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination