WO2014206503A1 - Automatic noise modeling for ghost-free image reconstruction - Google Patents
Automatic noise modeling for ghost-free image reconstruction
- Publication number
- WO2014206503A1 (PCT/EP2013/066942)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- combined image
- images
- individual images
- pixel
- Prior art date
Links
- 238000000034 method Methods 0.000 claims abstract description 87
- 238000009826 distribution Methods 0.000 claims description 20
- 238000005286 illumination Methods 0.000 claims description 7
- 238000012545 processing Methods 0.000 claims description 3
- 238000004891 communication Methods 0.000 claims description 2
- 238000009877 rendering Methods 0.000 claims description 2
- 238000003384 imaging method Methods 0.000 abstract description 5
- 238000013179 statistical model Methods 0.000 abstract description 3
- 238000006073 displacement reaction Methods 0.000 description 14
- 239000003086 colorant Substances 0.000 description 11
- 238000012360 testing method Methods 0.000 description 11
- 238000001514 detection method Methods 0.000 description 10
- 230000035945 sensitivity Effects 0.000 description 10
- 238000002372 labelling Methods 0.000 description 7
- 230000033001 locomotion Effects 0.000 description 7
- 238000001914 filtration Methods 0.000 description 6
- 230000000694 effects Effects 0.000 description 5
- 230000006870 function Effects 0.000 description 5
- 238000012935 Averaging Methods 0.000 description 4
- 238000013459 approach Methods 0.000 description 4
- 238000005457 optimization Methods 0.000 description 4
- 238000005316 response function Methods 0.000 description 4
- 238000010276 construction Methods 0.000 description 3
- 238000013461 design Methods 0.000 description 3
- 238000002474 experimental method Methods 0.000 description 3
- 238000005259 measurement Methods 0.000 description 3
- 238000011156 evaluation Methods 0.000 description 2
- 230000003287 optical effect Effects 0.000 description 2
- 230000003068 static effect Effects 0.000 description 2
- 230000000007 visual effect Effects 0.000 description 2
- 238000007476 Maximum Likelihood Methods 0.000 description 1
- XUIMIQQOPSSXEZ-UHFFFAOYSA-N Silicon Chemical compound [Si] XUIMIQQOPSSXEZ-UHFFFAOYSA-N 0.000 description 1
- 241000272534 Struthio camelus Species 0.000 description 1
- 230000003044 adaptive effect Effects 0.000 description 1
- 230000002146 bilateral effect Effects 0.000 description 1
- 239000002131 composite material Substances 0.000 description 1
- 238000012937 correction Methods 0.000 description 1
- 230000000593 degrading effect Effects 0.000 description 1
- 230000001419 dependent effect Effects 0.000 description 1
- 238000010586 diagram Methods 0.000 description 1
- 206010016256 fatigue Diseases 0.000 description 1
- 238000009499 grossing Methods 0.000 description 1
- 238000010191 image analysis Methods 0.000 description 1
- 230000010354 integration Effects 0.000 description 1
- 230000002452 interceptive effect Effects 0.000 description 1
- 238000013507 mapping Methods 0.000 description 1
- 230000000877 morphologic effect Effects 0.000 description 1
- 229920006395 saturated elastomer Polymers 0.000 description 1
- 230000011218 segmentation Effects 0.000 description 1
- 229910052710 silicon Inorganic materials 0.000 description 1
- 239000010703 silicon Substances 0.000 description 1
- 230000002123 temporal effect Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10141—Special mode during image acquisition
- G06T2207/10144—Varying exposure
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20172—Image enhancement details
- G06T2207/20208—High dynamic range [HDR] image processing
Definitions
- the present invention relates to a computer-implemented method and a device for constructing combined images from a set of individual images that are ghost-free, for example the construction of high dynamic range images or panoramic images of a dynamic scene.
- HDR high dynamic range
- panoramic images of dynamic scenes without introducing ghosting is difficult.
- the inter-frame capture time between input images can be long enough to cause significant object displacement between images of cluttered or dynamic scenes, e.g. in cities or at popular tourist destinations, or in scenes with fast motion.
- ghosting artifacts are introduced.
- strategies for avoiding such artifacts include aligning the images before color averaging, performing joint alignment and reconstruction using one reference image from the LDR set, or detecting regions with moving objects and excluding them from the average.
- although optical flow methods can correct short displacements caused by camera shake and moving objects, they typically fail to estimate large displacements and have difficulties with disocclusions occurring in highly cluttered and highly dynamic scenes.
- Joint alignment and reconstruction methods define a reference image to which all other images are patch-wise aligned. Ill-exposed regions in the reference are filled using an adaptation of the bi-directional similarity function between the remaining input images and the HDR result.
- a single reference image might not correspond to the desired output, and a better result could be composited using parts from different images. For example, people in any chosen reference image may be occluded in other input images. In such cases, the dynamic ranges of reference image objects cannot be completed.
- Most HDR construction methods try to detect image regions that could produce ghosting artifacts and exclude them from the average. In general, these methods assume that the images are already aligned, and rely on an ability to test if the colors observed for the same pixel in different images are consistent.
- Consistency is tested with criteria such as pair-wise irradiance difference, irradiance difference to a background model, distance to the intensity mapping function, variance of the irradiance estimates, average ratio between images, probability of the distance to a background model, correlation with a reference image, difference of the entropy on local image patches and difference between gradient orientations.
- each of these consistency tests requires setting fixed thresholds that are unlikely to generalize well to the noise properties of different cameras and exposure settings. All of these strategies fail under challenging conditions that occur in practice. There is no single best method, and the selection of an adequate approach depends on the user's goal. Similar problems occur when constructing panoramic images from a set of images.
- a method for constructing a combined image from a set of individual images comprising the steps of determining an irradiance of a pixel of the combined image, based on the set of individual images; and outputting the combined image.
- the step of determining the irradiance of a pixel of the combined image may comprise determining, for the pixel, a subset of individual images from the set; and estimating the irradiance of the pixel, based on the selected subset.
- the subset may be determined based on a statistical model of color values in the set of individual images.
- the invention considers the noise distributions of the color values measured by the camera.
- Noise distributions depend on the camera and exposure settings, and can be modeled using Gaussian distributions. Distribution variance is proportional to the light intensity and is inversely proportional to the squared exposure time, and depends on camera parameters such as the gain factor and the readout noise parameters. Given that the noise depends on the scene irradiance and the camera parameters, no fixed threshold can be reliably set to detect image differences across camera models and scenes.
- the noise distribution may be predicted from the input images and used to normalize the color consistency tests. This automatic noise modeling approach improves the discriminative power of ghosting detection.
- the statistical model may be based on a gain factor and/or read-out noise parameters of an image acquisition device, e.g. a camera.
- the gain factor may be estimated based on regions of essentially constant illumination in the input images.
- the subset may be determined based on a measure of spatial coherence.
- the subset may be determined based on a variance of the subset.
- the combined image may be a high dynamic range image.
- the combined image may also be a panoramic stitching of the individual images.
- Outputting the combined image may comprise at least one step of storing the combined image in a computer-readable medium, communicating the combined image over an electronic communications network, including radio, and/or rendering the combined image on a display.
- the method may be implemented in special purpose hardware, e.g. for inclusion in a camera, or on a general purpose computer.
- a final image may be chosen such that each pixel has a high signal-to-noise ratio (SNR) and is spatially compatible with its neighbors in other images.
- SNR signal-to-noise
- This optimization directly produces results with lower noise than existing methods, and is especially useful for images acquired in low light, e.g., night shots.
- the invention comprises a simple method for estimating the camera gain factor from arbitrary images, enabling an automatic prediction of a camera noise range.
- the method is characterized in that the gain factor is estimated based on regions of essentially constant illumination in the input images.
- the HDR imaging method according to the invention fully automatically takes advantage of a camera noise model for performing reliable ghost-free reconstruction across different cameras and scenes. It obtains the irradiance of every pixel with lower noise and fewer artifacts than existing state-of-the-art approaches, even for very challenging scenes including crowded places with small and large object displacements and low-light shots. All these scenes are computed with no parameter tuning.
- the invention further comprises a device for constructing a combined image from a set of individual images, comprising an image acquisition module for acquiring the set of individual images, such as a scanner, a camera, a computer-readable memory, a network interface or the like; an image processing module for determining an irradiance of a pixel of the combined image, based on the set of individual images; and an output module for outputting the combined image, such as a display and / or a network and / or a storage interface.
- an image acquisition module for acquiring the set of individual images, such as a scanner, a camera, a computer-readable memory, a network interface or the like
- an image processing module for determining an irradiance of a pixel of the combined image, based on the set of individual images
- an output module for outputting the combined image, such as a display and / or a network and / or a storage interface.
- Fig. 1 shows a schematic flow diagram of a method for constructing an HDR image according to an embodiment of the invention;
- Fig. 2 is a one-dimensional illustration of HDR reconstruction based on consistent subsets of individual images;
- Fig. 3 shows the effects of varying the parameters λ and β in equation 7;
- Fig. 4 illustrates a process of gain calibration according to an embodiment of the invention
- Fig. 5 shows confidences of camera gain estimation
- Fig. 6 illustrates the sensitivity of a deghosting method according to an embodiment of the invention to gain calibration; Fig. 7 shows examples of hand-held capture with both small displacements (trees, people shifting their weight) and large displacements with fast motion (acrobat, cars), and, at the bottom, hand-held capture via in-camera bracketing.
- the dynamic car motions are reconstructed ghost free.
- the third image was selected as reference for the method of Sen et al.
- the inventive method automatically selects well-exposed sources for every region. Figure 10 shows a comparison with the method of Sen et al. on the Square at night sequence (top). The second exposure was selected as reference for Sen et al.'s method. Due to noise, their method finds few similar patches in other exposures, which implies that the dynamic range cannot be effectively extended using the other input images (middle). The inventive method selects consistent sources with as low a variance as possible, preventing the appearance of noise in the result (bottom). Figure 11 shows a comparison with the method of Zimmer et al.
- Figure 1 shows a method for constructing an HDR image from a set of individual images according to an embodiment of the invention.
- the input is a set of images taken with a static or hand held camera at different exposure times, where pixel values in the images are the raw output of the camera, i.e., before any of the camera's internal processing.
- in step 110, if captured hand-held, the images are robustly registered using a global homography computed with RANSAC from sparse SURF keypoint matches.
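The global-homography registration step above can be sketched with a plain least-squares fit. The following is a minimal, illustrative direct linear transform (DLT) estimator; the SURF keypoint matching and the RANSAC outlier rejection named in the description are omitted, and all function names are placeholders rather than the patent's implementation:

```python
import numpy as np

def estimate_homography(src, dst):
    """Estimate a 3x3 homography H from matched points via DLT.
    src, dst: (N, 2) arrays of corresponding points, N >= 4."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    # Solution is the right singular vector of the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    h = vt[-1].reshape(3, 3)
    return h / h[2, 2]          # normalize so H[2,2] = 1

def apply_homography(h, pts):
    """Map (N, 2) points through the homography (projective division)."""
    p = np.hstack([pts, np.ones((len(pts), 1))]) @ h.T
    return p[:, :2] / p[:, 2:3]
```

In the full pipeline one would feed RANSAC-filtered keypoint matches into `estimate_homography` and warp every image into a common frame before the per-pixel consistency tests.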
- the method estimates an irradiance image where each pixel is constructed as a weighted average of colors of the corresponding pixels across the input images in steps 120, 130, 140 and 150.
- ghosting artifacts would be generated by averaging a set of pixels which includes an inconsistent subset.
- the method identifies a consistent subset of images per pixel location and reconstructs the final irradiance value as an average of consistent pixel colors. This avoids having to select a reference image, which might hinder the capabilities for dynamic range extension, or having to build a background model, which requires that the background be more likely to be observed at every image location; this is not necessarily true for cluttered scenes.
- shot noise is introduced by the process of light emission, which follows a Poisson distribution where the variance is equal to the mean.
- Readout noise comprises several other signal-independent sources affecting the acquisition process of digital cameras; it is modeled well by a Gaussian distribution with zero mean.
- the number of photon-electrons collected by the camera at every pixel is linearly proportional to the incident irradiance. This derives from the properties of the photo-electric effect on silicon-based sensors for visible wavelengths.
- the raw camera output is also linearly proportional to the number of collected photon- electrons. This relation is known as the camera response function f.
- the slope of this function corresponds to the camera's gain factor g. This factor is proportional to the ISO setting, e.g., the gain at ISO400 is four times the gain at ISO100.
- since the response function f is linear for raw output, it is possible to recover the number of photon-electrons collected by the camera and thereby approximate the probability distribution of each pixel measurement.
- the inverse of the response function, i.e. the number of collected photon-electrons, is estimated by e_i(p) = (v_i(p) - b_i(p)) / g; dividing by the exposure time t_i yields the irradiance estimate x_i(p) = (v_i(p) - b_i(p)) / (g t_i).
- the dark frame b_i is an image acquired with the same exposure time as v_i but without incoming light (e.g., with the lens cap on).
- the product t_i x(p) between the image's exposure time t_i and the incident irradiance x(p) is known as the exposure, which is proportional to the number of photon-electrons collected by the camera.
- Dark frames measure the camera output induced by thermal energy only (not by light). In the present embodiment, it is assumed that the values in the dark frame are negligible or, equivalently, that dark frame subtraction is performed in-camera, which is common in modern digital cameras. Thus, the dark frame b_i(p) is replaced with the black level L_0 of the camera.
- the exposure t_i x(p) follows a Poisson distribution, and the uncertainty in its measurement corresponds to the shot noise. This distribution is approximated using a Gaussian to model the variance of the irradiance estimate x_i(p).
- the variance of x_i(p) in image i can be derived as σ_i²(p) = x(p)/t_i + σ_R²/(g t_i)² (equation 2), where σ_R² is the variance of the readout noise, which is also modeled using a Gaussian.
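Under the stated assumptions (linear raw response, dark frame replaced by the black level L_0), the irradiance estimate and its predicted variance can be sketched as follows; the function and variable names are illustrative, not the patent's:

```python
def irradiance_estimate(v, t, gain, black_level):
    """x_i(p) = (v_i(p) - L_0) / (g * t_i): inverse of the linear
    camera response, in irradiance units."""
    return (v - black_level) / (gain * t)

def irradiance_variance(x_hat, t, gain, sigma_read):
    """Predicted variance of the estimate: a shot-noise term x/t
    (Poisson approximated as Gaussian) plus a readout-noise term
    sigma_R^2 / (g * t)^2, as in equation 2."""
    return x_hat / t + sigma_read**2 / (gain * t) ** 2
```

Note how the prediction depends on the scene irradiance, the exposure time, and the camera parameters, which is exactly why no single fixed threshold on color differences can work across cameras and scenes.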
- the parameters g, L_0, σ_R², and t_i need to be estimated.
- the exposure time t i can be read directly from the digital image file.
- the black level L_0 and the readout variance σ_R² are calibrated using the method described in [Janesick 2001; Granados et al. 2010].
- This method estimates L_0 and σ_R² as the mean and variance, respectively, of the pixel values of a black frame, i.e., an image taken with no incident light and no integration time (in practice, a very short exposure time). In principle, this data could be obtained for every camera model from the manufacturer.
- Figure 2 shows a one-dimensional illustration of HDR reconstruction.
- An HDR image can be reconstructed by averaging the irradiance estimates derived from the colors of corresponding pixel locations in the input images. Ghosting artifacts appear whenever sets of inconsistent colors are included in the average.
- the problem of HDR deghosting can be defined as selecting consistent subsets of colors for every pixel.
- two pixels at corresponding locations in different images are consistent if the corresponding color difference follows the predicted color difference distribution, and a group of pixels is self-consistent if all the pixels are pair-wise consistent.
- the probability that the difference d_ij(p) = x_i(p) - x_j(p) is consistent may be estimated: since x_i(p) and x_j(p) are Gaussian, d_ij(p) is also Gaussian, which for consistent pairs has zero mean and variance σ_ij²(p) = σ_i²(p) + σ_j²(p), where σ_i and σ_j are obtained from equation 2. Given the variance σ_ij²(p), the probability that the observations at pixel p in images i, j are consistent may be estimated by comparing the corresponding irradiance difference with the expected noise distribution of the images on every color channel: Pr(p | {i, j}) = Pr(|N| ≥ |d_ij(p)| / σ_ij(p)). (4)
- N is the standard Gaussian random variable with mean zero and variance one.
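The per-channel pairwise test can be sketched as the two-sided Gaussian tail of the normalized difference, which the standard complementary error function gives directly. This is a minimal reading of the test above, with illustrative names, not the patent's exact implementation:

```python
import math

def pairwise_consistency(d, sigma):
    """Pr(|N(0,1)| >= |d| / sigma) for a normalized irradiance
    difference d with predicted standard deviation sigma.
    Equals erfc(|d| / (sigma * sqrt(2)))."""
    z = abs(d) / sigma
    return math.erfc(z / math.sqrt(2.0))
```

A zero difference gives probability 1 (perfectly consistent); a difference of about two predicted standard deviations gives roughly 0.05, i.e. the pair is unlikely to be explained by noise alone.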
- the estimate Pr(/? I ⁇ i, j ⁇ ) can be noisy (e.g., when the image is taken under low-light or when the camera has a high readout noise).
- the difference image d_ij(p) is smoothed using bilateral filtering. This step may be referred to as noise-adaptive difference filtering (DF).
- DF noise-adaptive difference filtering
- This filtering introduces dependencies between the distributions of neighboring pixels. However, this dependency occurs mostly between pixels that have already similar distributions. Given this similarity, the net effect of the filtering is an attenuation of the tails of the difference distribution. This allows obtaining higher detection sensitivity for the same specificity level.
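As an illustration of the smoothing step, a minimal brute-force bilateral filter is sketched below. It uses fixed spatial and range kernels; the noise-adaptive variant described above would additionally scale the range kernel by the predicted per-pixel noise, which is omitted here as an assumption-free simplification:

```python
import numpy as np

def bilateral_filter(img, radius=2, sigma_s=1.5, sigma_r=0.1):
    """Edge-preserving smoothing of a 2D difference image: each output
    pixel is a mean weighted by spatial proximity and value similarity."""
    h, w = img.shape
    out = np.empty((h, w), dtype=float)
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            patch = img[y0:y1, x0:x1]
            yy, xx = np.mgrid[y0:y1, x0:x1]
            # Spatial kernel: nearby pixels weigh more.
            spatial = np.exp(-((yy - y) ** 2 + (xx - x) ** 2) / (2 * sigma_s**2))
            # Range kernel: similar values weigh more (preserves edges).
            rng = np.exp(-((patch - img[y, x]) ** 2) / (2 * sigma_r**2))
            weights = spatial * rng
            out[y, x] = (weights * patch).sum() / weights.sum()
    return out
```

Because the range kernel suppresses contributions from dissimilar values, pixels whose difference distributions are already similar are the ones that get averaged, which matches the attenuation-of-tails argument above.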
- the variance of the difference function d_ij(p) also varies for every pixel and image pair.
- let V = {v_i}, i ∈ T, be the set of images in the exposure sequence.
- the probability that a given subset L_l ∈ 2^V is consistent at a pixel p is defined as the minimum of the pair-wise consistencies:
- Pr(p | L_l) = min { Pr(p | {i, j}) : (i, j) ∈ L_l × L_l }. (5)
- for single-image subsets, Pr(p | {i}) = 1 - max { max_k Pr_ue(v_i^k(p)), max_k Pr_oe(v_i^k(p)) } (6), with k ∈ {R, G, B}.
- Pr_ue and Pr_oe correspond to the under- and over-exposure probability, respectively, of an observation according to the distribution of the (Gaussian) readout noise, centered at the black level and the saturation level, respectively.
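A minimal sketch of equations 5 and 6, assuming the pair-wise probabilities have already been computed and a Gaussian readout-noise model centered at the black and saturation levels; all names and the exact form of the under/over-exposure probabilities are illustrative assumptions:

```python
import math

def subset_consistency(pair_probs):
    """Equation 5 sketch: the consistency of a subset at a pixel is the
    minimum over the pair-wise consistency probabilities of its members.
    pair_probs maps frozenset({i, j}) -> Pr(p | {i, j})."""
    return min(pair_probs.values())

def single_image_consistency(channels, black, sat, sigma_read):
    """Equation 6 sketch: a single well-exposed image is consistent.
    Pr(p | {i}) = 1 - max(under-exposure prob., over-exposure prob.)
    over the R, G, B channel values."""
    def phi(z):  # standard normal CDF
        return 0.5 * math.erfc(-z / math.sqrt(2.0))
    p_ue = max(phi((black - v) / sigma_read) for v in channels)
    p_oe = max(phi((v - sat) / sigma_read) for v in channels)
    return 1.0 - max(p_ue, p_oe)
```

A mid-range observation thus scores near 1, while a value sitting exactly at the black level scores 0.5, reflecting even odds of being under-exposed.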
- the choice of a particular subset to be averaged is ill-posed. This choice may be regularized by requiring that the selected subsets be spatially color-consistent.
- the pixel-wise consistency test and the spatial consistency test cast the HDR deghosting problem as a Markov random field (MRF)-type global energy minimization.
- MRF Markov random field
- a consistent subset of irradiance estimates is selected for every pixel to reconstruct the final pixel value.
- Arbitrarily selecting any one subset may introduce unnatural color discontinuities in the final image (see yellow arrows in figure 6).
- This problem is resolved by introducing a spatial continuity measure as a regularizer and finding a solution by minimizing a global energy that takes into account the consistency at every pixel location as well as its spatial coherence.
- This labeling is obtained by minimizing the energy functional E(L) = Σ_p ( C(p, L_p) + λ V(L_p) ) + β Σ_{(p,q) ∈ N} P(L_p, L_q) (7), comprising terms for the consistency potential C, the variance potential V and the prior potential P, where L_p corresponds to the index of the subset selected at pixel p and N denotes the set of pairs of neighboring pixels.
- the role of the consistency potential is to penalize image sets that do not have a high consistency probability, whereas the variance potential ensures that the final reconstruction has low noise by penalizing groups with larger variance. Additionally, the prior potential encourages the final reconstruction to agree with its spatial neighbors at every pixel.
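The three potentials can be combined into a toy energy evaluator as below. This is a hedged sketch: a Potts-style label-disagreement penalty stands in for the patent's prior potential, and the consistency costs C and variance costs V are assumed to be precomputed arrays:

```python
import numpy as np

def labeling_energy(labels, C, V, lam=0.1, beta=20.0):
    """Evaluate a global energy of the form of equation 7 on a 2D grid.
    labels: (H, W) int array of subset indices per pixel.
    C: (H, W, K) consistency cost of assigning subset k at each pixel.
    V: (K,) variance cost per subset, weighted by lam.
    A Potts penalty (beta per disagreeing 4-neighbor pair) stands in
    for the prior potential."""
    h, w = labels.shape
    idx = np.indices((h, w))
    unary = C[idx[0], idx[1], labels].sum() + lam * V[labels].sum()
    # Count label disagreements between horizontal and vertical neighbors.
    pair = (labels[:, 1:] != labels[:, :-1]).sum() \
         + (labels[1:, :] != labels[:-1, :]).sum()
    return unary + beta * pair
```

An actual minimizer (graph cuts or similar MRF solvers) would search over labelings; this function only scores a candidate, which is enough to see how the λ and β weights trade consistency, noise, and spatial coherence against each other.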
- a confidence value is set to determine whether a set of images L_l is consistent or not. This encodes an important design choice: we want to select any consistent group, not the most consistent one. This design gives more freedom to the optimization algorithm to construct the final composite.
- the variance potential prevents the generation of trivial solutions.
- Well-exposed observations from a single image are defined as consistent. Under this definition, selecting a single well-exposed image for reconstructing the whole image would create a labeling with minimum energy. This selection is undesired since the information contained in other consistent images is left out of the average, thus degrading the SNR of the resulting irradiance estimates. Instead, whenever two distinct sets are consistent, the set that produces lower-variance estimates regardless of the set size is preferable.
- the variance potential V(L_l) encodes this preference by assigning higher costs to groups that provide higher-variance estimates, based on the relative variance of each estimate.
- Parameter selection There are three hyper-parameters to be tuned in equation 7: The weight ⁇ for the variance potential, the confidence value a of the consistency tests, and the weight ⁇ of the prior potential.
- the parameter λ is set to 0.1 to ensure that the variance potential in equation 7 produces order-of-magnitude lower costs than the consistency potential. This design instructs the algorithm to prefer consistent subsets, but when presented with several consistent options, it will prefer the one with the least noise.
- the other two parameters were determined based on a performance evaluation using the challenging busy square sequence (figure 8).
- the confidence value α was set to 0.98, which provides a good trade-off between sensitivity and specificity of ghost detection when compared to a manual annotation of the scene (see Sec. 3 for details). In preliminary experiments, variations of α did not affect the results significantly.
- Parameter β is set to 20, which is the lowest value that did not introduce visual discontinuities on the test sequence (see figure 6). Once determined, the parameters λ, α, and β were fixed for all experiments.
- Figure 3 shows the effects of varying the parameters λ and β in equation 7.
- the right-hand side colors correspond to the estimated labeling, which is proportional to the noise of the selected subset (blue: higher SNR, red: lower SNR).
- Parameter β is set to 20 and λ is set to 0.1 (outlined in red), since these produce a good trade-off between low noise and spatial consistency.
- the algorithm mostly selects a single image as source except for ill-exposed regions (white arrows), as only such regions are considered inconsistent. This behavior holds regardless of the weight β given to the prior potential.
- the remaining subsets of larger SNR are preferred provided they are consistent, resulting in labelings that adapt more to the scene.
- visual discontinuities marked by yellow arrows
- the camera gain g can be calibrated by a method according to the invention that works directly from an input image set, using regions of constant illumination in the input images. More specifically, an input image, e.g. the best exposure of the input set, is divided into super pixels (VEKSLER, O., BOYKOV, Y., AND MEHRANI, P. 2010. Superpixels and supervoxels in an energy optimization framework. In Proc. ECCV, vol. 6315, 211-224) and the mean and variance of their color values are estimated. From the resulting mean-variance scatter plot (figure 4, top), the minimum variance is selected for each digital value, and RANSAC (FISCHLER, M. A., AND BOLLES, R. C. 1981. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24, 6, 381-395) is used to fit a line that passes through (L_0, σ_R²), i.e., through the expected variance at the black level.
- FIG. 4 illustrates this process.
- the top row shows the mean and variance color value of each super pixel (yellow and red dots).
- Red dots at the top and the bottom correspond to low-variance super pixels that are used for calibration.
- Yellow dots represent the remaining super pixels.
- Green lines show the predicted noise by image-based calibration, blue dashed lines show the prediction by flat-field calibration.
- the super pixels with minimum variance are selected as proxies for images exposed with a constant illumination at every pixel, such that every pixel color can be assumed to be a sample of the same random variable (shown in red). This selection is justified as only shot noise and readout noise contribute to the variance of image regions with constant illumination and, therefore, these noise sources determine the lower bound of the color variance.
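Under the linear noise model, the variance of a constant-illumination region in digital units is Var[v] = σ_R² + g · (mean − L_0), so the gain is the slope of the mean-variance line through (L_0, σ_R²). A minimal constrained least-squares version of that fit (the superpixel segmentation and the RANSAC robustification are omitted, and the names are illustrative) might look like:

```python
import numpy as np

def estimate_gain(means, variances, black_level, var_read):
    """Fit variance = var_read + g * (mean - black_level) to the
    (mean, minimum-variance) points of near-constant regions, with the
    intercept pinned at (black_level, var_read); return the slope g."""
    dm = np.asarray(means, dtype=float) - black_level
    dv = np.asarray(variances, dtype=float) - var_read
    # Least-squares slope of a line through the origin in (dm, dv).
    return float((dm * dv).sum() / (dm * dm).sum())
```

In practice one would feed in the per-superpixel statistics selected as described above; RANSAC would guard the slope against superpixels whose variance is inflated by texture rather than noise.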
- Using super pixels to estimate the lower bound of image variance was proposed in [Liu et al. 2008] for image denoising, but has not been applied for HDR reconstruction.
- the deghosting method according to the invention is robust to calibration errors, so even in cases where the gain is overestimated (b), the final images are still free of ghosting artifacts.
- Figure 5 shows box plots of the 1st, 25th, 50th, 75th and 99th percentiles of the distribution of gain factors obtained from a flat-field calibration (JANESICK, J. 2001. Scientific charge-coupled devices. SPIE Press; GRANADOS, M., AJDIN, B., WAND, M., THEOBALT, C., SEIDEL, H.-P., AND LENSCH, H. P. A. 2010. In Proc. CVPR, 215-222).
- the gray line denotes the true gain of the camera.
- the expected gain for both methods is very close, but the variance of image -based calibration is higher. Despite this, the gain estimate can still be used to reconstruct ghostfree HDR images (see figure 6).
- the red curve illustrates the dependency between the gain factor and the image variance prediction.
- the image-based calibration is sufficiently accurate. Importantly, since a wide range of scenes contain locally flat regions, this gain calibration approach allows applying the deghosting algorithm directly without requiring users to capture flat field images. However, its accuracy is content dependent, and figure 4b shows an example image from which the camera gain could not be correctly estimated.
- the flat regions of this image cover a limited color band, which misleads the slope estimation (figure 4b, top). That said, ghosting artifacts typically appear only when the variance within super pixels (and thus the gain) is under-estimated (i.e. below the true gain, see figure 6), which is a highly unlikely scenario in practice. In general, when the camera gain is over-estimated, the predicted noise for the input images is under-estimated.
- FIG. 6 shows the sensitivity of the inventive deghosting method to gain calibration accuracy.
- ḡ and σ_g denote the mean and standard deviation of the flat-field gain estimates.
- the method is robust to slight under-estimation (b) and large over-estimation (d) of the camera gain: When it is under-estimated (which occurs seldom, see figure 5), ghosting artifacts can appear (a, magenta arrow). Conversely, when the gain is over- estimated, it leads to low SNR (d), but it does not introduce ghosting artifacts.
- Table 1: Summary of test sequences. HH: hand-held, SC: scene clutter, SD: small object displacements, LD: large object displacements, LL: low light. Gain factor for the ISO100 setting.
- the gain factor was estimated independently for every sequence using image-based calibration as described above. Although the gain needs to be estimated only once for any given camera model, it was calibrated on each sequence in order to validate the robustness of the inventive method.
- For each scene, three or five images were captured in RAW mode at steps of one or two stops, respectively.
- the input color image is constructed from the green, red, and blue observations found in each 2 x 2 block of pixels in the un-demosaiced raw image. One of the four observations in each block is not used. If captured hand-held, the images are registered using a global homography computed with RANSAC from sparse SURF keypoint matches. After HDR reconstruction, the images were white balanced and tone mapped.
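The 2 x 2 block extraction can be sketched as below, assuming an RGGB Bayer layout; the actual layout depends on the camera, and dropping one of the two green samples matches "one of the four observations in each block is not used":

```python
import numpy as np

def bayer_to_rgb(raw):
    """Build a half-resolution (H, W, 3) color image from a (2H, 2W)
    un-demosaiced raw frame with an assumed RGGB pattern."""
    r = raw[0::2, 0::2]   # top-left of each 2x2 block
    g = raw[0::2, 1::2]   # first green; second green raw[1::2, 0::2] is dropped
    b = raw[1::2, 1::2]   # bottom-right of each 2x2 block
    return np.stack([r, g, b], axis=-1)
```

Working on these raw samples, before demosaicing and in-camera processing, is what keeps the response linear and the noise model of equation 2 valid.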
- Figure 7 shows examples of hand-held capture with both small displacements (trees, people shifting their weight) and large displacements with fast motion (acrobat, cars). Focusing on small-displacement quality, figure 7 shows that the inventive method produces convincing results.
- the flower shop and busy square (figure 8) sequences show how strong scene clutter can cause severe ghosting artifacts in an HDR reconstruction that includes every image in the irradiance average.
- the square at night (figure 10) sequence shows that the inventive method is robust to high image noise.
- the cafe terrace sequence (figure 9) contains relatively small object displacements for which previous reference-image-based methods, such as that of Sen et al., are designed.
- the method of Sen et al. finds patch-wise correspondences between the reference and the remaining input images.
- since the reference image is of low dynamic range, regions that are ill-exposed or contain high noise might not be matched correctly to other exposures. This is demonstrated in figure 9, where the dynamic range of over-exposed regions could not be enhanced (indicated by arrows).
- figure 10 shows that strong noise in the reference may restrict correspondence finding in other images for range enhancement, leading to a noisy HDR image.
- the inventive method is designed to select sets of images that are both consistent and have low noise, resulting in HDR images with comparatively less noise.
- the inventive method could also generate noisy image regions (see figure 8, right) if this guarantees consistency, as this is weighted more than achieving low noise (see equation 7).
- Zim- mer et al. establish correspondences using optical flow, which will fail on objects that undergo large displacements or disocclusions. This failure case is shown on the person in figure 11, where ghosting artifacts are introduced after two instances of a person undergoing local motion cannot be properly aligned. In contrast, the inventive method selects a single self-consistent image, thus preventing the introduction of ghosting artifacts.
- DF noise-adaptive difference filtering
- the inven- tion had higher results than previous methods (46.7-58.3% vs. 43.6% for Grosch).
- the adaptive DF the specificity was comparable to that of other methods, including those methods based on invariants.
- the method achieves the best sensitivity, which is crucial for removing ghosts, without compromis- ing the specificity, which is crucial for producing low-noise HDR images.
Abstract
In order to obtain images from a combination of individual images, e.g. in high dynamic range imaging or panoramic stitching, the invention proposes a computer- implemented method, comprising the steps of determining an irradiance of a pixel of the combined image, based on the set of individual images; and outputting the combined image. Preferably, the irradiance is determined based on a subset of the set of individual images, selected according to a statistical model.
Description
Automatic Noise Modeling for Ghost-free Image Reconstruction
The present invention relates to a computer-implemented method and a device for constructing ghost-free combined images from a set of individual images, for example the construction of high dynamic range images or panoramic images of a dynamic scene.
Introduction
The acquisition of high dynamic range (HDR) or panoramic images of dynamic scenes without introducing ghosting is difficult. Even when using modern cameras with automatic exposure bracketing, the inter-frame capture time between input images can be long enough to cause significant object displacement between images of cluttered or dynamic scenes, e.g. in cities or at popular tourist destinations, or in scenes with fast motion. When pixel colors from different images are averaged to construct an HDR image, ghosting artifacts are introduced.
In the prior art, strategies for avoiding such artifacts include aligning the scene before color averaging, performing joint alignment and reconstruction using one reference image from the LDR set or detecting regions with moving objects and excluding their images from the average.
Although optical flow methods can correct short displacements caused by camera shake and moving objects, they typically fail to estimate large displacements, and have difficulties with disocclusions occurring in highly cluttered and highly dynamic scenes.
Joint alignment and reconstruction methods define a reference image to which all other images are patch-wise aligned. Ill-exposed regions in the reference are filled using an adaption of the bi-directional similarity function between the remaining input images and the HDR result. However, a single reference image might not correspond to the desired output, and a better result could be composited using parts from different images.
For example, people in any chosen reference image may be occluded in other input images. In such cases, the dynamic ranges of reference image objects cannot be completed.

Most HDR construction methods try to detect image regions that could produce ghosting artifacts and exclude them from the average. In general, these methods assume that the images are already aligned, and rely on an ability to test if the colors observed for the same pixel in different images are consistent. Consistency is tested with criteria such as pair-wise irradiance difference, irradiance difference to a background model, distance to the intensity mapping function, variance of the irradiance estimates, average ratio between images, probability of the distance to a background model, correlation with a reference image, difference of the entropy on local image patches, and difference between gradient orientations. However, each of these consistency tests requires setting fixed thresholds that are unlikely to generalize well to the noise properties of different cameras and exposure settings. All of these strategies fail under challenging conditions that occur in reality. There is no single best method, and the selection of an adequate approach depends on the user's goal. Similar problems occur when constructing panoramic images from a set of images.

It is therefore an object of the present invention to provide an improved method and a device for constructing a combined image, e.g. an HDR or a panorama image, from a set of individual images in a wide variety of situations, including dynamic scenes with strong clutter and dynamics, with a reduced likelihood of ghosting artifacts. These objects are achieved by a method for constructing a combined image from a set of individual images, comprising the steps of determining an irradiance of a pixel of the combined image, based on the set of individual images; and outputting the combined image.
The step of determining the irradiance of a pixel of the combined image may comprise determining, for the pixel, a subset of individual images from the set; and estimating the irradiance of the pixel, based on the selected subset. The subset may be determined based on a statistical model of color values in the set of individual images.
Colors are observed at the same pixel location across different exposures in an LDR set. To test whether two colors correspond to the same irradiance and so correspond to the same object, the invention considers the noise distributions of the color values measured by the camera. Noise distributions depend on the camera and exposure settings, and can be modeled using Gaussian distributions.
Distribution variance is proportional to the light intensity and is inversely proportional to the squared exposure time, and depends on camera parameters such as the gain factor and the readout noise parameters. Given that the noise depends on the scene irradiance and the camera parameters, no fixed threshold can be reliably set to detect image differences across camera models and scenes. According to the invention, the noise distribution may be predicted from the input images and used to normalize the color consistency tests. This automatic noise modeling approach improves the discriminative power of ghosting detection.
The statistical model may be based on a gain factor and/or read-out noise parameters of an image acquisition device, e.g. a camera. The gain factor may be estimated based on regions of essentially constant illumination in the input images. The subset may be determined based on a measure of spatial coherence. The subset may be determined based on a variance of the subset.
The combined image may be a high dynamic range image. The combined image may also be a panoramic stitching of the individual images. Outputting the combined image may comprise at least one step of storing the combined image in a computer-readable medium, communicating the combined image over an electronic communications network, including radio, and/or rendering the combined image on a display. The method may be implemented in special-purpose hardware, e.g. for inclusion in a camera, or on a general-purpose computer.
In general, there can be multiple ghost-free images that are consistent with a set of input images. According to a further aspect of the invention, a final image may be chosen such that each pixel has high signal-to-noise (SNR) ratio and is spatially compatible with its neighbors in other images.
This optimization directly produces results with lower noise than existing methods, and is especially useful for images acquired in low light, e.g., night shots.
In addition, the invention comprises a simple method for estimating the camera gain factor from arbitrary images, enabling an automatic prediction of a camera noise range. The method is characterized in that the gain factor is estimated based on regions of essentially constant illumination in the input images.
In summary, the HDR imaging method according to the invention fully automatically takes advantage of a camera noise model for performing reliable ghost-free reconstruction across different cameras and scenes. It obtains the irradiance of every pixel with lower noise and fewer artifacts than existing state-of-the-art approaches, even for very challenging scenes including crowded places with small and large object displacements and low-light shots. All these scenes are computed with no parameter tuning.
The invention further comprises a device for constructing a combined image from a set of individual images, comprising an image acquisition module for acquiring the set of individual images, such as a scanner, a camera, a computer-readable memory, a network interface or the like; an image processing module for determining an irradiance of a pixel of the combined image, based on the set of individual images; and an output module for outputting the combined image, such as a display and / or a network and / or a storage interface.
These and other aspects of the present invention will now be explained in connection with an embodiment of the invention and using the annexed figure, in which
Fig. 1 shows a schematic flow diagram of a method for constructing an HDR image according to an embodiment of the invention;
Fig. 2 is a one-dimensional illustration of HDR reconstruction based on consistent subsets of individual images;

Fig. 3 shows the effects of varying parameters β and λ in equation 7;

Fig. 4 illustrates a process of gain calibration according to an embodiment of the invention;

Fig. 5 shows confidences of camera gain estimation;

Fig. 6 illustrates the sensitivity of a deghosting method according to an embodiment of the invention to gain calibration;

Fig. 7 shows examples of hand-held capture with both small displacements (trees, people shifting their weight) and large displacements with fast motion (acrobat, cars);

Fig. 8 shows on top: hand-held capture via in-camera bracketing, where the dynamic car motions are reconstructed ghost-free; and on the bottom: the cluttered busy square sequence, where naive averaging produces severe artifacts (left-hand side) and the inventor's result is ghost-free (right-hand side);

Fig. 9 shows a comparison to Sen et al. on the cafe terrace sequence (top). The third image was selected as reference for the method of Sen et al. Here, their method encounters difficulties extending the dynamic range of ill-exposed regions, which results in a washed-out appearance (indicated by arrows). In contrast, the inventive method automatically selects well-exposed sources for every region;

Fig. 10 shows a comparison with the method of Sen et al. on the square at night sequence (top). The second exposure was selected as reference for Sen et al.'s method. Due to noise, their method finds few similar patches in other exposures. This implies that the dynamic range cannot be effectively extended using other input images (middle). The inventive method selects consistent sources with as low variance as possible, preventing the appearance of noise in the result (bottom);

Fig. 11 shows a comparison with the method of Zimmer et al. on the busy square sequence: (a) reference image, (b) optical-flow alignment of an additional input image to the reference, (c) result after HDR reconstruction using (a) and (b), and (d) our result;

Fig. 12 shows a comparison of the inventive consistency detector with other state-of-the-art ghosting-detection methods. Here, the differences between a pair of images of the busy square (Fig. 8, right) are shown in red on top of their average color; and

Fig. 13 shows semantic inconsistencies and interactive correction: The inventive algorithm may produce semantic inconsistencies (a). These can appear when the color difference falls below the noise level (top), when all objects in a given image region are partially ill-exposed (middle), or when objects are partially occluded (bottom).
These inconsistencies can be corrected interactively by editing the labels (b). The results after editing are shown in (c).
Detailed description of an embodiment
In the following, a detailed embodiment of a method for a ghost-free combination of images is explained in relation to methods for constructing an HDR image from a set of individual images.
Figure 1 shows a method for constructing an HDR image from a set of individual images according to an embodiment of the invention. The input is a set of images taken with a static or hand held camera at different exposure times, where pixel values in the images are the raw output of the camera, i.e., before any of the camera's internal processing.
In step 110, if captured hand-held, the images are robustly registered using a global homography computed with RANSAC from sparse SURF keypoint matches.
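As a concrete illustration of step 110, the registration can be sketched with a minimal RANSAC loop around a direct-linear-transform homography fit. The function names are illustrative, the keypoint matches (e.g., from SURF as in the text, or any other detector) are assumed given as two N x 2 arrays of corresponding coordinates, and a production system would typically call a library routine such as OpenCV's findHomography instead:

```python
import numpy as np

def fit_homography(src, dst):
    """Direct linear transform: H maps src points to dst points (N x 2, N >= 4)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # null space of the constraint matrix is the homography up to scale
    _, _, vt = np.linalg.svd(np.asarray(rows))
    h = vt[-1]
    return (h / h[-1]).reshape(3, 3)

def ransac_homography(src, dst, iters=500, thresh=2.0, rng=None):
    """Robustly estimate a global homography from noisy keypoint matches."""
    if rng is None:
        rng = np.random.default_rng(0)
    best_h, best_inliers = None, 0
    src_h = np.hstack([src, np.ones((len(src), 1))])
    for _ in range(iters):
        idx = rng.choice(len(src), 4, replace=False)  # minimal sample
        h = fit_homography(src[idx], dst[idx])
        proj = src_h @ h.T
        with np.errstate(divide="ignore", invalid="ignore"):
            proj = proj[:, :2] / proj[:, 2:3]
        err = np.linalg.norm(proj - dst, axis=1)
        inliers = int(np.sum(err < thresh))
        if inliers > best_inliers:
            best_h, best_inliers = h, inliers
    return best_h
```

With exact inlier correspondences and a modest outlier fraction, a few hundred iterations virtually guarantee that one minimal sample is outlier-free, so the true homography is recovered.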
With an aligned image set, the method estimates an irradiance image where each pixel is constructed as a weighted average of colors of the corresponding pixels across the input images in steps 120, 130, 140 and 150. Ghosting artifacts would be generated by averaging a set of pixels which includes an inconsistent subset. Instead, the method identifies a consistent subset of images per pixel location and reconstructs the final irradiance value as an average of consistent pixel colors. This avoids having to select a reference image, which might hinder the capabilities for dynamic range extension, or having to build a background model, which requires that the background be more likely to be observed at every image location; this is not necessarily true for cluttered scenes.
Image noise estimation
Even when assuming a static scene and constant camera parameters, input image noise varies by exposure time. The two main temporal noise sources are known as shot noise and readout noise. Shot noise is introduced by the process of light emission, which follows a Poisson distribution where the variance is equal to the mean. Readout noise comprises several other signal-independent sources affecting the acquisition process of digital cameras; it is modeled well by a Gaussian distribution with zero mean.
In CCD/CMOS sensors, the number of photon-electrons collected by the camera at every pixel is linearly proportional to the incident irradiance. This derives from the properties of the photo-electric effect on silicon-based sensors for visible wavelengths.
The raw camera output is also linearly proportional to the number of collected photon- electrons. This relation is known as the camera response function f. The slope of this function corresponds to the camera's gain factor g. This factor is proportional to the ISO setting, e.g., the gain at ISO400 is four times the gain at ISO100.
Since the response function f is linear for raw output, it is possible to recover the number of photon-electrons collected by the camera and thereby approximate the probability distribution of each pixel measurement. For a non-saturated raw camera output v_i(p) on image i and pixel p, the irradiance is estimated by inverting the response function:

x̂_i(p) = (v_i(p) − b_i(p)) / (g t_i),    (1)

where the dark frame b_i is an image acquired with the same exposure time as v_i but without incoming light (e.g., with the lens cap on). The product t_i x(p) between the image's exposure time t_i and the incident irradiance x(p) is known as the exposure, which is proportional to the number of photon-electrons collected by the camera. Dark frames measure the camera output induced by thermal energy only (not by light). In the present embodiment, it is assumed that the values in the dark frame are negligible or, equivalently, that dark frame subtraction is performed in-camera, which is common in modern digital cameras. Thus, the dark frame b_i(p) is replaced with the black level L_0 of the camera.
The exposure t_i x(p) follows a Poisson distribution, and the uncertainty in its measurement corresponds to the shot noise. This distribution is approximated using a Gaussian to model the variance of the irradiance estimate x̂_i(p). From equation 1, the variance of x̂_i(p) in image i can be derived as

σ²_{x̂_i}(p) = (g (v_i(p) − L_0) + σ_R²) / (g² t_i²),    (2)

where σ_R² is the variance of the readout noise, which is also modeled using a Gaussian. To evaluate equation 2, the parameters g, L_0, and σ_R² need to be estimated. The exposure time t_i can be read directly from the digital image file.
The black level L_0 and the readout variance σ_R² are calibrated using the method described in [Janesick 2001; Granados et al. 2010]. This method estimates L_0 and σ_R² as the mean and variance, respectively, of the pixel values of a black frame, i.e., an image taken with no incident light and no integration time (practically, a very short exposure time). In principle, this data could be obtained for every camera model from the manufacturer.
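Equations 1 and 2 can be sketched as a small helper. The variance expression below is one consistent reading of the described shot-plus-readout model (shot variance g·(v_i(p) − L_0) in raw units, propagated through equation 1); the function and argument names are illustrative:

```python
import numpy as np

def irradiance_and_variance(raw, t, gain, black_level, readout_var):
    """Per-pixel irradiance estimate (eq. 1) and its predicted variance (eq. 2).

    Assumes dark-frame subtraction is replaced by the camera's black level,
    as described in the text.
    """
    raw = np.asarray(raw, dtype=np.float64)
    x_hat = (raw - black_level) / (gain * t)
    # shot noise in raw units: gain * (raw - black_level); readout noise adds
    # a signal-independent term; both are propagated through equation 1
    var_x = (gain * (raw - black_level) + readout_var) / (gain * t) ** 2
    return x_hat, var_x
```

For saturated pixels the estimate is invalid; such observations are instead handled through the over-exposure probability discussed below.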
Figure 2 shows a one-dimensional illustration of HDR reconstruction. An HDR image can be reconstructed by averaging the irradiance estimates derived from the color of corresponding pixel locations in the input images. Ghosting artifacts appear whenever sets of inconsistent colors are included in the average. The problem of HDR deghosting can be defined as selecting consistent subsets of colors for every pixel.
In the present embodiment, two pixels at corresponding locations in different images are consistent if the corresponding color difference follows the predicted color difference distribution, and a group of pixels is self-consistent if all the pixels are pair-wise consistent. Given two irradiance observations x̂_i^k(p), x̂_j^k(p) at pixel p and color channel k, which are derived from the pixel colors v_i(p), v_j(p) on images i, j, respectively, using the inverse of the camera response function (equation 1), detecting ghosting artifacts requires testing whether these irradiance observations are consistent, i.e. whether they correspond to measurements of the same incident light. Existing algorithms solve this problem by relying on pre-determined thresholds, which are difficult to set. This requirement can be avoided by exploiting the noise model according to the present embodiment of the invention. To this end, the distribution of the difference d_ij^k(p) = x̂_i^k(p) − x̂_j^k(p) may be estimated: Since each observation is approximately Gaussian, the difference is also Gaussian, which for consistent pairs has zero mean and variance

σ²_{d_ij^k}(p) = σ²_{x̂_i^k}(p) + σ²_{x̂_j^k}(p),    (3)

where σ²_{x̂_i^k}(p) and σ²_{x̂_j^k}(p) are obtained from equation 2. Given the variance σ²_{d_ij^k}(p), the probability that the observations at pixel p on images i, j are consistent may be estimated by comparing the corresponding irradiance differences with the expected noise distribution of the images on every color channel:

Pr(p | {i, j}) = min_k Pr( |N| ≥ |d_ij^k(p)| / σ_{d_ij^k}(p) ),    (4)

where N is the standard Gaussian random variable with mean zero and variance one. In practice, the estimate Pr(p | {i, j}) can be noisy (e.g., when the image is taken under low light or when the camera has a high readout noise). For this reason, prior to estimating the probabilities, the difference image d_ij^k(p) is smoothed using bilateral filtering. This step may be referred to as noise-adaptive difference filtering (DF). A distance kernel of large bandwidth is used, together with a range kernel with variable bandwidth σ_r = 2 σ_{d_ij^k}(p) that is proportional to the predicted image noise. This filtering introduces dependencies between the distributions of neighboring pixels. However, this dependency occurs mostly between pixels that already have similar distributions. Given this similarity, the net effect of the filtering is an attenuation of the tails of the difference distribution. This allows obtaining higher detection sensitivity for the same specificity level.
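For a single color channel, the resulting pair-wise test reduces to a two-sided Gaussian tail probability, which might be sketched as follows (the function name is illustrative, and the inputs are assumed to be the smoothed difference terms):

```python
import math

def pair_consistency(x_i, x_j, var_i, var_j):
    """Probability that two irradiance observations measure the same light.

    The difference of consistent observations is zero-mean Gaussian with
    variance var_i + var_j, so the test is a two-sided tail probability.
    """
    sigma_d = math.sqrt(var_i + var_j)
    z = abs(x_i - x_j) / sigma_d
    return math.erfc(z / math.sqrt(2.0))  # Pr(|N(0, 1)| >= z)
```

Identical observations give probability 1, while a difference of about two standard deviations gives roughly 0.05, matching the familiar Gaussian two-sided tail.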
Since the noise variance σ²_{x̂_i}(p) is different at every pixel and image in the sequence, the variance of the difference d_ij(p) also varies for every pixel and image pair.
This observation is integral to the inventive technique: As other reconstruction and deghosting methods do not automatically model noise, they are not likely to generalize well to the noise properties of different cameras and exposure settings.
Consistency test for sets of images
Let V = {v_i}, i ∈ I, be the set of images in the exposure sequence. Based on the pair-wise consistency measure (equation 4), the probability that a given subset L_l ∈ 2^V is consistent at a pixel p is defined as the minimum of the pair-wise consistencies:

Pr(p | L_l) = min{ Pr(p | {i, j}) : (i, j) ∈ L_l × L_l }.    (5)

For the case of a singleton L_l (i.e., |L_l| = 1), the corresponding consistency probability is given as the probability that the corresponding observation is well exposed:

Pr(p | {i}) = 1 − max{ min_k Pr_ue(v_i^k(p)), max_k Pr_oe(v_i^k(p)) },  k ∈ {R, G, B},    (6)

where Pr_ue and Pr_oe correspond to the under- and over-exposure probability, respectively, of an observation according to the distribution of the (Gaussian) readout noise, when centered at the black level and saturation level, respectively.

In this definition, all color channels need to be under-exposed to consider an observation v_i(p) inconsistent, whereas if any color channel is over-exposed, v_i(p) is considered inconsistent.
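Equations 5 and 6 can then be combined into a single lookup per candidate subset, assuming the pair-wise probabilities and the singleton well-exposedness probabilities have been precomputed for the pixel (both dictionaries and the function name are illustrative):

```python
from itertools import combinations

def subset_consistency(pair_probs, single_probs, subset):
    """Consistency probability of an image subset at one pixel.

    Equation 5: minimum pair-wise consistency over all pairs in the subset.
    Equation 6 (singleton case): the well-exposedness probability.
    """
    if len(subset) == 1:
        return single_probs[subset[0]]
    return min(pair_probs[frozenset(pair)] for pair in combinations(subset, 2))
```

A single weakly consistent pair is thus enough to disqualify an entire subset, which is what makes the minimum a conservative choice for ghost removal.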
Compositing of consistent sets
Since more than one subset of images can be consistent for a given pixel location, the choice of a particular subset to be averaged is ill-posed. This choice may be regularized by requiring that the selected subsets be spatially color-consistent. Together, the pixel-wise consistency test and the spatial consistency test cast the HDR deghosting problem as a Markov random field (MRF)-type global energy minimization.
To obtain a ghost-free HDR image, a consistent subset of irradiance estimates is selected for every pixel to reconstruct the final pixel value. However, given the presence of moving objects, there can be more than one consistent subset. Arbitrarily selecting any one subset may introduce unnatural color discontinuities in the final image (see yellow arrows in figure 6). This problem is resolved by introducing a spatial continuity measure as a regularizer, and a solution is found by minimizing a global energy that takes into account the consistency at every pixel location as well as its spatial coherence. The final result is represented as a labeling F := F(p) that assigns to each pixel p the index of an element in 2^V. This labeling is obtained by minimizing an energy functional of the form

E(F) = Σ_{p∈Ω} [ C(p, L_F(p)) + λ V(L_F(p)) ] + β Σ_{(p,q)∈N} P(p, q, L_pq),    (7)

comprising terms for the consistency potential C, the variance potential V, and the prior potential P, where L_pq corresponds to the index of the subset L_F(p) ∪ L_F(q) ∈ L, N denotes the 4-neighborhood system on the set of pixel locations Ω, the confidence value α (see below) enters the consistency and prior potentials, and β and λ are weighting hyper-parameters.
In equation 7, the role of the consistency potential is to penalize image sets that do not have a high consistency probability, whereas the variance potential ensures that the final reconstruction has low noise by penalizing groups with larger variance. Additionally, the prior potential encourages the final reconstruction to agree with its spatial neighbors at every pixel.
In the consistency and prior potentials, instead of penalizing the consistency probability directly, a confidence value is set to determine whether a set of images L_F is consistent or not. This encodes an important design choice: We want to select any consistent group, not the most consistent one. This design gives more freedom to the optimization algorithm to construct the final composite.
The variance potential prevents the generation of trivial solutions. Well-exposed observations from a single image are defined as consistent. Under this definition, selecting a single well-exposed image for reconstructing the whole image would create a labeling with minimum energy. This selection is undesired since the information contained in other consistent images is left out of the average, thus degrading the SNR of the resulting irradiance estimates. Instead, whenever two distinct sets are consistent, the set that produces lower-variance estimates is preferable, regardless of the set size. The variance potential V(L_i) encodes this preference by assigning higher costs to groups that provide higher-variance estimates, based on the relative variance of each estimate, where the variance of each group follows from the noise model of equation 2.
Parameter selection

There are three hyper-parameters to be tuned in equation 7: the weight λ for the variance potential, the confidence value α of the consistency tests, and the weight β of the prior potential. The parameter λ is set to 0.1 to ensure that the variance potential in equation 7 produces order-of-magnitude lower costs than the consistency potential. This design instructs the algorithm to prefer consistent subsets, but when presented with several consistent options, it will prefer the one with the least noise. The other two parameters were determined based on a performance evaluation using the challenging busy square sequence (figure 8). The confidence value α was set to 0.98, which provides a good trade-off between sensitivity and specificity of ghost detection when compared to a manual annotation of the scene (see Sec. 3 for details). In preliminary experiments, variations of α did not affect the results significantly. Parameter β is set to 20, which is the lowest value that did not introduce visual discontinuities on the test sequence (see figure 6). Once determined, the parameters α, β, and λ were fixed for all experiments.
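The exact potentials of equation 7 are not reproduced in the text, but with the parameter choices above, the unary part can be read as a confidence-gated consistency cost plus a mildly weighted variance cost. The following is therefore an illustrative assumption rather than the patented formula:

```python
def unary_cost(consistency_prob, relative_variance, alpha=0.98, lam=0.1):
    """Illustrative per-pixel cost of assigning one image subset.

    The hard gate at alpha selects *any* consistent subset rather than the
    most consistent one; among consistent subsets, lower variance costs less.
    """
    consistent = consistency_prob >= alpha
    return (0.0 if consistent else 1.0) + lam * relative_variance
```

Because λ = 0.1 keeps the variance term an order of magnitude below the consistency penalty, an inconsistent subset can never undercut a consistent one on noise alone.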
Figure 3 shows the effects of varying parameters β and λ in equation 7. The right-hand side colors correspond to the estimated labeling, which is proportional to the noise of the selected subset (blue: higher SNR, red: lower SNR). Parameter β is set to 20 and λ is set to 0.1 (outlined in red), since these produce a good trade-off between low noise and spatial consistency. These parameters were kept fixed during all experiments.

When noisy subsets are not penalized (λ = 0; top row), the algorithm mostly selects a single image as source except for ill-exposed regions (white arrows), as only such regions are considered inconsistent. This behavior holds regardless of the weight β given to the prior potential. If noisy subsets are penalized mildly, i.e., less than inconsistent subsets (λ = 0.1; middle row), the remaining subsets of larger SNR (shaded in blue and green colors) are preferred provided they are consistent, resulting in labelings that adapt more to the scene. In this configuration, as the weight β of the prior potential increases, visual discontinuities (marked by yellow arrows) are eliminated from the deghosted image (e.g., at β = 10 or 20). When noisy subsets are penalized as much as inconsistent ones (λ = 1; bottom row), it becomes affordable to include objects that are partially ill-exposed (pointed to by purple arrows) if they appear in the longest (least noisy) exposure. These results support the inventor's choice of λ.
Optimization and final reconstruction

To obtain a minimum-cost labeling F*, the expansion-move algorithm (BOYKOV, Y., VEKSLER, O., AND ZABIH, R. 2001. Fast approximate energy minimization via graph cuts. IEEE TPAMI 23, 11, 1222-1239) is applied. With the resulting labeling, the final irradiance map is estimated as a weighted average:
x̂(p) = Σ_{i ∈ L_F*(p)} w_i(p) x̂_i(p) / Σ_{i ∈ L_F*(p)} w_i(p),

where w_i(p) corresponds to the probability that v_i(p) is well exposed, weighted by the inverse of the variance σ²_{x̂_i}(p). This weighting function leads to a result close to the maximum likelihood solution, and it is constrained to apply identical weights to every color channel in a given pixel.
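The final per-pixel reconstruction can be sketched as a weighted average over the subset selected by the labeling F*. The exact weighting function is not fully legible in the text, so combining the well-exposedness probability with inverse-variance weights is an assumption here:

```python
import numpy as np

def reconstruct_pixel(x_hats, variances, well_exposed_probs, selected):
    """Weighted average of the irradiance estimates in the selected subset.

    Assumed weights: well-exposedness probability divided by the predicted
    variance (an approximation to the maximum-likelihood combination).
    """
    w = np.array([well_exposed_probs[i] / variances[i] for i in selected])
    x = np.array([x_hats[i] for i in selected])
    return float(np.sum(w * x) / np.sum(w))
```

With inverse-variance weights, the longer (less noisy) exposures dominate the average, which is what pulls the result toward the maximum-likelihood estimate.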
If not provided by the manufacturer, the camera gain g can be calibrated by a method according to the invention that works directly from an input image set using regions of constant illumination in the input images. More specifically, an input image, e.g. the best exposure of the input set, is divided into super pixels (VEKSLER, O., BOYKOV, Y., AND MEHRANI, P. 2010. Superpixels and supervoxels in an energy optimization framework. In Proc. ECCV, vol. 6315, 211-224) and then the mean and variance of their color values are estimated. From the resulting mean-variance scatter plot (figure 4-top), the minimum variance is selected for each digital value, and RANSAC (FISCHLER, M. A., AND BOLLES, R. C. 1981. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24, 6, 381-395) is used to fit a line that passes through (L_0, σ_R²), i.e., through the expected variance at the black level.
Figure 4 illustrates this process. The top row shows the mean and variance color value of each super pixel (yellow and red dots). Red dots at the top and the bottom correspond to low-variance super pixels that are used for calibration. Yellow dots represent
the remaining super pixels. Green lines show the predicted noise by image-based calibration, blue dashed lines show the prediction by flat-field calibration. The super pixels with minimum variance are selected as proxies for images exposed with a constant illumination at every pixel, such that every pixel color can be assumed to be a sample of the same random variable (shown in red). This selection is justified as only shot noise and readout noise contribute to the variance of image regions with constant illumination and, therefore, these noise sources determine the lower bound of the color variance. Using super pixels to estimate the lower bound of image variance was proposed in [Liu et al. 2008] for image denoising, but has not been applied for HDR reconstruction.
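The image-based gain calibration can be sketched as follows. Superpixel means and variances are assumed given; for brevity, a least-squares slope through the fixed point (L_0, σ_R²) replaces the RANSAC line fit described in the text, and the function name is illustrative:

```python
import numpy as np

def estimate_gain(means, variances, black_level, readout_var):
    """Estimate the camera gain g from superpixel statistics.

    Fits variance = g * (mean - black_level) + readout_var to the lower
    envelope of the (mean, variance) samples. Simplified sketch: a direct
    least-squares slope through the fixed point (black_level, readout_var)
    replaces the RANSAC line fit described in the text.
    """
    # lower envelope: keep the minimum-variance sample per digital value
    env = {}
    for m, v in zip(means, variances):
        key = int(round(m))
        if key not in env or v < env[key][1]:
            env[key] = (m, v)
    a = np.array([m for m, _ in env.values()]) - black_level
    b = np.array([v for _, v in env.values()]) - readout_var
    # slope of a line constrained to pass through (black_level, readout_var)
    return float(np.sum(a * b) / np.sum(a * a))
```

On synthetic samples that follow the noise model of equation 2 the slope recovers the gain exactly; on real images the robust RANSAC fit is preferable, since textured superpixels can contaminate the envelope.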
The deghosting method according to the invention is robust to calibration errors, so even in cases where the gain is overestimated (b), the final images are still free of ghosting artifacts.
Figure 5 shows box plots of the 1st, 25th, 50th, 75th and 99th percentiles of the distribution of gain factors obtained from flat-field calibration (JANESICK, J. 2001. Scientific charge-coupled devices. SPIE Press; GRANADOS, M., AJDIN, B., WAND, M., THEOBALT, C., SEIDEL, H.-P., AND LENSCH, H. P. A. 2010. Optimal HDR reconstruction with linear digital cameras. In Proc. CVPR, 215-222), comprising 36 samples of flat-field images, and the distribution of factors obtained from image-based calibration (a sample of seven images, each from a different scene; two shown in figure 4). The gray line denotes the true gain of the camera. The expected gain for both methods is very close, but the variance of image-based calibration is higher. Despite this, the gain estimate can still be used to reconstruct ghost-free HDR images (see figure 6). The red curve illustrates the dependency between the gain factor and the image variance prediction.
The image-based calibration is sufficiently accurate. Importantly, since a wide range of scenes contain locally flat regions, this gain calibration approach allows applying the deghosting algorithm directly without requiring users to capture flat-field images. However, its accuracy is content dependent, and figure 4b shows an example image from which the camera gain could not be correctly estimated. The image's flat regions cover a limited color band, which misleads the slope estimation (figure 4b-top). That said, ghosting artifacts typically only appear when the variance within super pixels (and thus the gain) is underestimated (i.e. estimated below the true gain, see figure 6), which is a highly unlikely scenario in practice. In general, when the camera gain is over-estimated, the predicted noise for the input images is under-estimated. This makes ghost detection stricter, thus reducing the SNR of the final HDR image because smaller consistent subsets will be found. As such, no ghosting artifacts are introduced by this error (see figure 6).

Figure 6 shows the sensitivity of the inventive deghosting method to gain calibration accuracy. Here, ḡ and σ_g denote the mean and standard deviation of the flat-field gain estimates. The method is robust to slight under-estimation (b) and large over-estimation (d) of the camera gain: When the gain is strongly under-estimated (which occurs seldom, see figure 5), ghosting artifacts can appear (a, magenta arrow). Conversely, when the gain is over-estimated, it leads to low SNR (d), but it does not introduce ghosting artifacts.
The following table 1 shows the results of an experimental evaluation:
| Sequence | HH | SC | SD | LD | LL | Camera | Est. gain factor |
|---|---|---|---|---|---|---|---|
| Acrobat (Fig. 1) | x | | x | x | | Canon 550D | 0.6597 |
| Street traffic (Fig. 8) | x | | | x | | Canon 550D | 0.3753 |
| Flower shop (Fig. 1) | | x | x | | | Canon S5 | 0.2390 |
| Busy square (Fig. 8) | | x | x | x | | Canon S5 | 0.2417 |
| Cafe terrace (Fig. 9) | | | x | | | Canon S5 | 0.2250 |
| Square at night (Fig. 10) | | x | x | x | x | Canon S5 | 0.4125 |

Table 1: Summary of test sequences. HH: hand-held, SC: scene clutter, SD: small object displacements, LD: large object displacements, LL: low light. Gain factor for ISO 100 setting.
Several exposure sequences were obtained using a compact digital camera (Canon Powershot S5 IS, 10-bit ADC) and a digital SLR (Canon EOS 550D, 14-bit ADC). The cameras' black levels (L0 = 32 and L0 = 2048, respectively) and readout variances (σ_R² = 2.655 and σ_R² = 61.01, respectively) were estimated from a black frame. The gain factor was estimated independently for every sequence using image-based calibration as described above. Although the gain needs to be estimated only once for any given camera model, it was calibrated on each sequence in order to validate the robustness of the inventive method. For reference, the gain factors obtained from flat-field calibration were g = 0.2394 and g = 0.4795, respectively.
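The image-based calibration described above can be sketched as follows, under the assumption of a Poisson-Gaussian noise model in which the sample variance of a locally flat patch grows linearly with its mean (slope equal to the gain): partition the image into small patches, keep the flattest ones, and fit a line through their (mean, variance) pairs. The function name, patch size, and flat-region selection heuristic are illustrative assumptions, not the patented procedure.

```python
import numpy as np

def estimate_gain(raw, black_level=0.0, patch=8, keep_frac=0.2):
    """Estimate the camera gain factor from a single raw image.

    Assumes locally flat regions, where the sample variance of a patch is
    approximately gain * (mean - black_level) + readout_variance
    (Poisson photon noise plus constant readout noise).
    """
    h, w = raw.shape
    h -= h % patch
    w -= w % patch
    blocks = raw[:h, :w].reshape(h // patch, patch, w // patch, patch)
    blocks = blocks.transpose(0, 2, 1, 3).reshape(-1, patch * patch)

    means = blocks.mean(axis=1) - black_level
    varis = blocks.var(axis=1, ddof=1)

    # Keep the patches with the lowest variance-to-mean ratio, a crude
    # stand-in for the flat-region selection used by the method.
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(means > 0, varis / means, np.inf)
    idx = np.argsort(ratio)[: max(2, int(keep_frac * len(ratio)))]

    # Least-squares line: variance = gain * mean + readout_variance.
    gain, readout_var = np.polyfit(means[idx], varis[idx], 1)
    return gain, readout_var
```

On a simulated Poisson-Gaussian image with flat bands at several exposure levels, the fitted slope recovers the simulated gain to within a few percent, mirroring the behavior reported in figure 5.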
Per scene, three or five images were captured in RAW mode at steps of one or two stops, respectively. The input color image is constructed from the green, red, and blue observations found in each 2×2 block of pixels of the un-demosaiced raw image; one of the four observations in each block is not used. If captured hand-held, the images are registered using a global homography computed with RANSAC from sparse SURF keypoint matches. After HDR reconstruction, the images were white balanced and tone mapped.
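The 2×2 block construction can be sketched as follows; the assumption of an RGGB mosaic and the choice of which green observation to discard are illustrative (the source only states that one of the four observations per block is unused):

```python
import numpy as np

def bayer_blocks_to_color(raw, pattern="RGGB"):
    """Build one RGB observation per 2x2 Bayer block of a raw image.

    For an assumed RGGB mosaic, the red, first green, and blue samples of
    each block are kept and the second green is discarded. No interpolation
    is performed, so the output has half the resolution of the raw frame.
    """
    assert pattern == "RGGB"
    r  = raw[0::2, 0::2]   # top-left of each block: red
    g1 = raw[0::2, 1::2]   # top-right: first green (kept)
    # raw[1::2, 0::2] is the second green (the unused observation here)
    b  = raw[1::2, 1::2]   # bottom-right: blue
    return np.stack([r, g1, b], axis=-1)
```

For a 4×4 raw frame this yields a 2×2×3 color image, each output pixel taking its channels from one block.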
Figure 7 shows examples of hand-held capture with both small displacements (trees, people shifting their weight) and large displacements with fast motion (acrobat, cars). The results in figure 7 show that the inventive method produces convincing results even for small displacements. The flower shop and busy square (figure 8) sequences show how strong scene clutter can cause severe ghosting artifacts in an HDR reconstruction that includes every image in the irradiance average. In addition, the square at night (figure 10) sequence shows that the inventive method is robust to high image noise. The café terrace sequence (figure 9) contains relatively small object displacements, for which previous reference-image-based methods are designed (SEN, P., KALANTARI, N. K., YAESOUBI, M., DARABI, S., GOLDMAN, D., AND SHECHTMAN, E. 2012. Robust patch-based HDR reconstruction of dynamic scenes. ACM TOG 31, 6). Even under small displacements, which are well handled by reference-image-based methods, the inventive method produces results with less washed-out regions and lower noise.
The method of Sen et al. finds patch-wise correspondences between the reference and the remaining input images. As the reference image is of low dynamic range, regions that are ill-exposed or contain high noise might not be matched correctly to other exposures. This is demonstrated in figure 9, where the dynamic range of over-exposed regions could not be enhanced (indicated by arrows). Additionally, figure 10 shows that strong noise in the reference may restrict correspondence finding in other images for range enhancement, leading to a noisy HDR image. In contrast, the inventive method is designed to select sets of images that are both consistent and have low noise, resulting in HDR images with comparatively less noise. In general, the inventive method can also generate noisy image regions (see figure 8, right) if this guarantees consistency, as consistency is weighted more than achieving low noise (see equation 7). Zimmer et al. establish correspondences using optical flow, which will fail on objects that undergo large displacements or disocclusions. This failure case is shown on the person in figure 11, where ghosting artifacts are introduced after two instances of a person undergoing local motion cannot be properly aligned. In contrast, the inventive method selects a single self-consistent image, thus preventing the introduction of ghosting artifacts.
Comparison with detect-and-exclude methods. The inventive method was compared against the top four performing methods reported by Sidibé et al. (SIDIBÉ, D., PUECH, W., AND STRAUSS, O. 2009. Ghost detection and removal in high dynamic range images. In Proc. EUSIPCO), according to their sensitivity score: the method of Sidibé et al. itself, Grosch (GROSCH, T. 2006. Fast and robust high dynamic range image generation with camera and object movement. In Proc. VMV), Heo et al. (HEO, Y. S., LEE, K. M., LEE, S. U., MOON, Y., AND CHA, J. 2010. Ghost-free high dynamic range imaging. In Proc. ACCV, vol. 4, 486-500), and Pece and Kautz (PECE, F., AND KAUTZ, J. 2010. Bitmap movement detection: HDR for dynamic scenes. In Proc. CVMP, 1-8). The inventors used their own implementations of these methods, using the specified parameters whenever available. All detect-and-exclude methods, including the inventive one, work in two stages: detect inconsistent regions, and reconstruct the HDR image using consistent parts only. Since the inconsistency detection is often noisy, the methods apply different regularization techniques before the reconstruction stage (e.g., Gaussian smoothing, morphological operations, or MRF priors; the inventive method applies the latter). Therefore, to exclude the effect of different regularization strategies (i.e., of different image priors), only the detection stage of every method is compared (see figure 12). For the comparison, the first two input images of the busy square sequence were used. As ground truth, a manual segmentation of their differences was constructed (figure 12a).
Table 2 summarizes the sensitivity and specificity achieved by each method in classifying pixels as consistent or inconsistent with respect to the ground truth:
| Detection strategy | Sensitivity | Specificity |
|---|---|---|
| Proposed method (-DF), α = 95.0% | 0.583 | 0.750 |
| Proposed method (-DF), α = 98.0% | 0.542 | 0.881 |
| Proposed method (-DF), α = 99.9% | 0.480 | 0.979 |
| Proposed method (+DF), α = 95.0% | 0.536 | 0.899 |
| Proposed method (+DF), α = 98.0% | 0.513 | 0.947 |
| Proposed method (+DF), α = 99.9% | 0.467 | 0.987 |
| Absolute difference [Grosch 2006] | 0.436 | 0.926 |
| IMF probability [Heo et al. 2010] | 0.200 | 0.963 |
| Monotone ordering [Sidibé et al. 2009] | 0.246 | 0.994 |
| Median threshold [Pece and Kautz 2010] | 0.158 | 0.999 |

Table 2: Sensitivity and specificity of each detection strategy.
To facilitate a fair comparison, results are presented with and without applying the difference filtering (DF) step of the inventive method. Among previous methods, the Grosch method, which thresholds the absolute irradiance difference between the images (figure 12g), achieved the best sensitivity (43.6%). The methods of Sidibé et al. (figure 12f) and Pece and Kautz (figure 12h) achieve the highest specificity (99.4% and 99.9%) but the lowest sensitivity (24.6% and 15.8%). This occurs because both methods are based on invariants that are satisfied whenever two pixels correspond to the same light intensity, but these invariants are not always violated by moving objects.
The method was tested with confidence values α = {0.95, 0.98, 0.999}, and with and without applying noise-adaptive difference filtering (DF). In all cases, the invention achieved higher sensitivity than previous methods (46.7-58.3% vs. 43.6% for Grosch). With the adaptive DF, the specificity was comparable to that of the other methods, including those based on invariants. The best trade-off was obtained at α = 0.98, with sensitivity and specificity of 51% and 95%, respectively (figure 12c). The method achieves the best sensitivity, which is crucial for removing ghosts, without compromising the specificity, which is crucial for producing low-noise HDR images.
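The sensitivity and specificity scores reported in table 2 can be reproduced from a predicted inconsistency mask and the ground-truth segmentation; a minimal sketch follows (the function name and the boolean-mask convention are assumptions):

```python
import numpy as np

def sensitivity_specificity(predicted, truth):
    """Sensitivity and specificity of a binary inconsistency labeling.

    Both inputs are boolean arrays where True marks a pixel labeled
    inconsistent. Sensitivity = TP / (TP + FN), computed on the truly
    inconsistent pixels; specificity = TN / (TN + FP), computed on the
    truly consistent ones.
    """
    predicted = np.asarray(predicted, bool)
    truth = np.asarray(truth, bool)
    tp = np.sum(predicted & truth)
    fn = np.sum(~predicted & truth)
    tn = np.sum(~predicted & ~truth)
    fp = np.sum(predicted & ~truth)
    return tp / (tp + fn), tn / (tn + fp)
```

A detector that flags only half of the truly inconsistent pixels but never mislabels a consistent one would score sensitivity 0.5 and specificity 1.0, the regime the invariant-based methods in table 2 approach.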
Claims
1. Computer-implemented method for constructing a combined image from a set of individual images, comprising the steps of: determining an irradiance of a pixel of the combined image, based on the set of individual images; and
- outputting the combined image.
2. The method of claim 1, wherein the step of determining the irradiance of a pixel of the combined image comprises: determining, for the pixel, a subset of individual images from the set; and estimating the irradiance of the pixel, based on the selected subset.
3. The method of claim 2, wherein the subset is determined based on a probability distribution of color values in the set of individual images.
4. The method according to claim 3, wherein the probability distribution is based on a gain factor and/or read-out noise parameters of an image acquisition device, e.g. a camera.
5. The method according to claim 4, wherein the gain factor is estimated based on regions of essentially constant illumination in the input images.
6. The method according to claims 2 or 3, wherein the subset is determined based on a measure of spatial coherence.
7. The method according to claims 2, 3 or 6, wherein the subset is determined based on a variance of the subset.
8. The method of claim 1, wherein outputting comprises at least one step of storing the combined image in a computer-readable medium, communicating the combined image over an electronic communications network, including radio, and/or rendering the combined image on a display.
9. The method of claim 1, wherein the combined image is a high dynamic range image.
10. The method of claim 1, wherein the combined image is a panoramic stitching of the individual images.
11. Method for estimating a camera gain factor from a set of images, characterized in that the gain factor is estimated based on regions of essentially constant illumination in the input images.
12. Device for constructing a combined image from a set of individual images, comprising: - an image acquisition module for acquiring the set of individual images, such as a scanner, a camera, a computer-readable memory, a network interface or the like;
an image processing module for determining an irradiance of a pixel of the combined image, based on the set of individual images; and
- an output module for outputting the combined image, such as a display and
/ or a network and / or a storage interface.
13. Method for constructing a combined image from a set of individual images, comprising the steps of: automatically determining parameters of an image sensor used for acquiring the individual images;
determining an irradiance of a pixel of the combined image, based on the set of individual images and the parameters; and
- outputting the combined image.
14. The method of claim 13, wherein the parameters comprise a gain factor of the image sensor.
15. The method of claim 14, wherein the combined image is a high dynamic range image.
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201361840002P | 2013-06-27 | 2013-06-27 | |
US61/840,002 | 2013-06-27 | ||
EP13174136.5 | 2013-06-27 | ||
EP13174136 | 2013-06-27 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2014206503A1 true WO2014206503A1 (en) | 2014-12-31 |
Family
ID=48740899
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/EP2013/066942 WO2014206503A1 (en) | 2013-06-27 | 2013-08-13 | Automatic noise modeling for ghost-free image reconstruction |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2014206503A1 (en) |
- 2013-08-13: WO PCT/EP2013/066942 patent/WO2014206503A1/en active Application Filing
Non-Patent Citations (14)
Title |
---|
ABHILASH SRIKANTHA ET AL: "Ghost detection and removal for high dynamic range images: Recent advances", SIGNAL PROCESSING: IMAGE COMMUNICATION, vol. 27, no. 6, 1 July 2012 (2012-07-01), pages 650 - 662, XP055036838, ISSN: 0923-5965, DOI: 10.1016/j.image.2012.02.001 * |
BOYKOV, Y.; VEKSLER, O.; ZABIH, R: "Fast approximate energy minimization via graph cuts", IEEE TPAMI, vol. 23, no. 11, 2001, pages 1222 - 1239 |
FISCHLER, M. A.; BOLLES, R. C: "Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography", COMMUN. ACM, vol. 24, no. 6, 1981, pages 381 - 395 |
GRANADOS, M.; AJDIN, B.; WAND, M.; THEOBALT, C.; SEIDEL, H.-P.; LENSCH, H. P. A: "Optimal HDR reconstruction with linear digital cameras", PROC. CVPR, 2010, pages 215 - 222 |
GROSCH, T.: "Fast and robust high dynamic range image generation with camera and object movement", PROC. VMV, 2006 |
HEO, Y. S.; LEE, K. M.; LEE, S. U.; MOON, Y.; CHA, J: "Ghost-free high dynamic range imaging", PROC. ACCV, vol. 4, 2010, pages 486 - 500 |
JANESICK, J.: "Scientific charge-coupled devices", 2001, SPIE PRESS |
MIGUEL GRANADOS ET AL: "Automatic noise modeling for ghost-free HDR reconstruction", ACM TRANSACTIONS ON GRAPHICS (TOG), ACM, US, vol. 32, no. 6, 1 November 2013 (2013-11-01), pages 1 - 10, XP058033905, ISSN: 0730-0301, DOI: 10.1145/2508363.2508410 * |
MIGUEL GRANADOS ET AL: "Background Estimation from Non-Time Sequence Images", PROCEEDINGS OF GRAPHICS INTERFACE 2008, 30 May 2008 (2008-05-30), Toronto, Ont., Canada, pages 33 - 40, XP055101699, ISBN: 978-1-56-881423-0 * |
MIGUEL GRANADOS ET AL: "Optimal HDR reconstruction with linear digital cameras", 2010 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 13-18 JUNE 2010, SAN FRANCISCO, CA, USA, IEEE, PISCATAWAY, NJ, USA, 13 June 2010 (2010-06-13), pages 215 - 222, XP031726033, ISBN: 978-1-4244-6984-0 * |
PECE, F.; KAUTZ, J: "Bitmap movement detection: HDR for dynamic scenes", PROC. CVMP, vol. 1-8, 2010 |
SEN, P.; KALANTARI, N. K.; YAESOUBI, M.; DARABI, S.; GOLDMAN, D.; SHECHTMAN, E: "Robust patch based hdr reconstruction of dynamic scenes", ACM TOG, vol. 31, 2012, pages 6 |
SIDIBE, D.; PUECH, W.; STRAUSS, O: "Ghost detection and removal in high dynamic range images", PROC.EUSIPCO, 2009 |
VEKSLER, O.; BOYKOV, Y.; MEHRANI, P: "Superpixels and supervoxels in an energy optimization framework", PROC. ECCV, vol. 6315, 2010, pages 211 - 224 |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10019848B2 (en) | 2015-07-31 | 2018-07-10 | Adobe Systems Incorporated | Edge preserving color smoothing of 3D models |
US10319083B2 (en) | 2016-07-15 | 2019-06-11 | Samsung Electronics Co., Ltd. | Image artifact detection and correction in scenes obtained from multiple visual images |
US9955085B2 (en) | 2016-09-22 | 2018-04-24 | Apple Inc. | Adaptive bracketing techniques |
CN112085803A (en) * | 2020-07-27 | 2020-12-15 | 北京空间机电研究所 | Multi-lens multi-detector splicing type camera color consistency processing method |
CN112085803B (en) * | 2020-07-27 | 2023-11-14 | 北京空间机电研究所 | Multi-lens multi-detector spliced camera color consistency processing method |
CN116051449A (en) * | 2022-08-11 | 2023-05-02 | 荣耀终端有限公司 | Image noise estimation method and device |
CN116051449B (en) * | 2022-08-11 | 2023-10-24 | 荣耀终端有限公司 | Image noise estimation method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Xu et al. | Real-world noisy image denoising: A new benchmark | |
Jacobs et al. | Automatic high-dynamic range image generation for dynamic scenes | |
Chakrabarti et al. | Depth and deblurring from a spectrally-varying depth-of-field | |
JP4593449B2 (en) | Detection device and energy field detection method | |
Jinno et al. | Multiple exposure fusion for high dynamic range image acquisition | |
Srikantha et al. | Ghost detection and removal for high dynamic range images: Recent advances | |
CN102970464B (en) | Image processing apparatus and image processing method | |
KR101442153B1 (en) | Method and system for processing for low light level image. | |
Tico et al. | Motion-blur-free exposure fusion | |
US20130287296A1 (en) | Method and device for image processing | |
CN108694705A (en) | A kind of method multiple image registration and merge denoising | |
WO2014206503A1 (en) | Automatic noise modeling for ghost-free image reconstruction | |
Lamba et al. | Harnessing multi-view perspective of light fields for low-light imaging | |
Cho et al. | Single‐shot High Dynamic Range Imaging Using Coded Electronic Shutter | |
Lv et al. | An integrated enhancement solution for 24-hour colorful imaging | |
Aguerrebere et al. | Simultaneous HDR image reconstruction and denoising for dynamic scenes | |
KR101921608B1 (en) | Apparatus and method for generating depth information | |
Tallon et al. | Space-variant blur deconvolution and denoising in the dual exposure problem | |
van Beek | Improved image selection for stack-based hdr imaging | |
Wang et al. | Rethinking noise modeling in extreme low-light environments | |
Kakarala et al. | A method for fusing a pair of images in the JPEG domain | |
Gallo et al. | Stack-based algorithms for HDR capture and reconstruction | |
Lelégard et al. | Detecting and correcting motion blur from images shot with channel-dependent exposure time | |
Johnson | High dynamic range imaging—A review | |
Goossens et al. | Reconstruction of high dynamic range images with poisson noise modeling and integrated denoising |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 13748078 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 13748078 Country of ref document: EP Kind code of ref document: A1 |