WO2014206503A1 - Automatic noise modeling for ghost-free image reconstruction - Google Patents


Info

Publication number
WO2014206503A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
combined image
images
individual images
pixel
Prior art date
Application number
PCT/EP2013/066942
Other languages
French (fr)
Inventor
Christian Theobalt
Miguel GRANADOS
Kwang In Kim
Original Assignee
MAX-PLANCK-Gesellschaft zur Förderung der Wissenschaften e.V.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by MAX-PLANCK-Gesellschaft zur Förderung der Wissenschaften e.V. filed Critical MAX-PLANCK-Gesellschaft zur Förderung der Wissenschaften e.V.
Publication of WO2014206503A1 publication Critical patent/WO2014206503A1/en

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10141Special mode during image acquisition
    • G06T2207/10144Varying exposure
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20172Image enhancement details
    • G06T2207/20208High dynamic range [HDR] image processing

Definitions

  • the present invention relates to a computer-implemented method and a device for constructing combined images from a set of individual images that are ghost-free, for example the construction of high dynamic range images or panoramic images of a dynamic scene.
  • the acquisition of high dynamic range (HDR) or panoramic images of dynamic scenes without introducing ghosting is difficult.
  • the inter-frame capture time between input images can be long enough to cause significant object displacement between images of cluttered or dynamic scenes, e.g. in cities or at popular tourist destinations, or in scenes with fast motion.
  • when pixel colors from different images are averaged to construct an HDR image, ghosting artifacts are introduced.
  • strategies for avoiding such artifacts include aligning the scene before color averaging, performing joint alignment and reconstruction using one reference image from the LDR set or detecting regions with moving objects and excluding their images from the average.
  • optical flow methods can correct short displacements caused by camera shake and moving objects, they typically fail to estimate large displacements, and have difficulties with disocclusions occurring in highly cluttered and highly dynamic scenes.
  • Joint alignment and reconstruction methods define a reference image to which all other images are patch-wise aligned. Ill-exposed regions in the reference are filled using an adaption of the bi-directional similarity function between the remaining input images and the HDR result.
  • a single reference image might not correspond to the desired output, and a better result could be composited using parts from different images. For example, people in any chosen reference image may be occluded in other input images. In such cases, the dynamic ranges of reference image objects cannot be completed.
  • Most HDR construction methods try to detect image regions that could produce ghosting artifacts and exclude them from the average. In general, these methods assume that the images are already aligned, and rely on an ability to test if the colors observed for the same pixel in different images are consistent.
  • Consistency is tested with criteria such as pair-wise irradiance difference, irradiance difference to a background model, distance to the intensity mapping function, variance of the irradiance estimates, average ratio between images, probability of the distance to a background model, correlation with a reference image, difference of the entropy on local image patches and difference between gradient orientations.
  • each of these consistency tests requires setting fixed thresholds that are unlikely to generalize well to the noise properties of different cameras and exposure settings. All of these strategies fail under challenging conditions that occur in reality. There is no single best method and the selection of an adequate approach depends on the user's goal. Similar problems occur when constructing panoramic images from a set of images.
  • a method for constructing a combined image from a set of individual images comprising the steps of determining an irradiance of a pixel of the combined image, based on the set of individual images; and outputting the combined image.
  • the step of determining the irradiance of a pixel of the combined image may comprise determining, for the pixel, a subset of individual images from the set; and estimating the irradiance of the pixel, based on the selected subset.
  • the subset may be determined based on a statistical model of color values in the set of individual images.
  • the invention considers the noise distributions of the color values measured by the camera.
  • Noise distributions depend on the camera and exposure settings, and can be modeled using Gaussian distributions. Distribution variance is proportional to the light intensity and is inversely proportional to the squared exposure time, and depends on camera parameters such as the gain factor and the readout noise parameters. Given that the noise depends on the scene irradiance and the camera parameters, no fixed threshold can be reliably set to detect image differences across camera models and scenes.
  • the noise distribution may be predicted from the input images and used to normalize the color consistency tests. This automatic noise modeling approach improves the discriminative power of ghosting detection.
  • the statistical model may be based on a gain factor and/or read-out noise parameters of an image acquisition device, e.g. a camera.
  • the gain factor may be estimated based on regions of essentially constant illumination in the input images.
  • the subset may be determined based on a measure of spatial coherence.
  • the subset may be determined based on a variance of the subset.
  • the combined image may be a high dynamic range image.
  • the combined image may also be a panoramic stitching of the individual images.
  • Outputting the combined image may comprise at least one step of storing the combined image in a computer-readable medium, communicating the combined image over an electronic communications network, including radio, and/or rendering the combined image on a display.
  • the method may be implemented in special purpose hardware, e.g. for inclusion in a camera, or on a general purpose computer.
  • a final image may be chosen such that each pixel has high signal-to-noise (SNR) ratio and is spatially compatible with its neighbors in other images.
  • This optimization directly produces results with lower noise than existing methods, and is especially useful for images acquired in low light, e.g., night shots.
  • the invention comprises a simple method for estimating the camera gain factor from arbitrary images, enabling an automatic prediction of a camera noise range.
  • the method is characterized in that the gain factor is estimated based on regions of essentially constant illumination in the input images.
  • the HDR imaging method according to the invention fully automatically takes advantage of a camera noise model for performing reliable ghost-free reconstruction across different cameras and scenes. It obtains the irradiance of every pixel with lower noise and fewer artifacts than existing state-of-the-art approaches, even for very challenging scenes including crowded places with small and large object displacements and low-light shots. All these scenes are computed with no parameter tuning.
  • the invention further comprises a device for constructing a combined image from a set of individual images, comprising an image acquisition module for acquiring the set of individual images, such as a scanner, a camera, a computer-readable memory, a network interface or the like; an image processing module for determining an irradiance of a pixel of the combined image, based on the set of individual images; and an output module for outputting the combined image, such as a display and/or a network and/or a storage interface.
  • Fig. 1 shows a schematic flow diagram of a method for constructing an HDR image according to an embodiment of the invention;
  • Fig. 2 is a one-dimensional illustration of HDR reconstruction based on consistent subsets of individual images;
  • Fig. 3 shows the effects of varying parameters λ and β in equation 7;
  • Fig. 4 illustrates a process of gain calibration according to an embodiment of the invention
  • Fig. 5 shows confidences of camera gain estimation
  • Fig. 6 illustrates the sensitivity of a deghosting method according to an embodiment of the invention to gain calibration;
  • Fig. 7 shows examples of hand-held capture with both small displacements (trees, people shifting their weight) and large displacements with fast motion (acrobat, cars); on the bottom: hand-held capture via in-camera bracketing.
  • the dynamic car motions are reconstructed ghost free.
  • the third image was selected as reference for the method of Sen et al.
  • the inventive method automatically selects well-exposed sources for every region.
  • Fig. 10 shows a comparison with the method of Sen et al. on the Square at night sequence (top). The second exposure was selected as reference for Sen et al.'s method. Due to noise, their method finds few similar patches in other exposures. This implies that the dynamic range cannot be effectively extended using other input images (middle). The inventive method selects consistent sources with as low a variance as possible, preventing the appearance of noise in the result (bottom).
  • Fig. 11 shows a comparison with the method of Zimmer et al.
  • Figure 1 shows a method for constructing an HDR image from a set of individual images according to an embodiment of the invention.
  • the input is a set of images taken with a static or hand held camera at different exposure times, where pixel values in the images are the raw output of the camera, i.e., before any of the camera's internal processing.
  • in step 110, if captured hand-held, the images are robustly registered using a global homography computed with RANSAC from sparse SURF keypoint matches.
  • the method estimates an irradiance image where each pixel is constructed as a weighted average of colors of the corresponding pixels across the input images in steps 120, 130, 140 and 150.
  • ghosting artifacts would be generated by averaging a set of pixels which includes an inconsistent subset.
  • the method identifies a consistent subset of images per pixel location and reconstructs the final irradiance value as an average of consistent pixel colors. This avoids having to select a reference image, which might hinder the capabilities for dynamic range extension, or having to build a background model, which requires that the background be more likely to be observed at every image location; this is not necessarily true for cluttered scenes.
  • shot noise is introduced by the process of light emission, which follows a Poisson distribution where the variance is equal to the mean.
  • Readout noise comprises several other signal-independent sources affecting the acquisition process of digital cameras; it is modeled well by a Gaussian distribution with zero mean.
  • the number of photon-electrons collected by the camera at every pixel is linearly proportional to the incident irradiance. This derives from the properties of the photo-electric effect on silicon-based sensors for visible wavelengths.
  • the raw camera output is also linearly proportional to the number of collected photon-electrons. This relation is known as the camera response function f.
  • the slope of this function corresponds to the camera's gain factor g. This factor is proportional to the ISO setting, e.g., the gain at ISO400 is four times the gain at ISO100.
  • since the response function f is linear for raw output, it is possible to recover the number of photon-electrons collected by the camera and to approximate the probability distribution of each pixel measurement.
  • the inverse of the response function, i.e. the amount of collected photon-electrons, is estimated by

  e_i(p) = f^(-1)(v_i(p)) = (v_i(p) - b_i(p)) / g
  • the dark frame b_i is an image acquired with the same exposure time as v_i, but without incoming light (e.g., with the lens cap on).
  • the product t_i · x(p) between the image's exposure time t_i and the incident irradiance x(p) is known as the exposure, which is proportional to the number of photon-electrons collected by the camera.
  • Dark frames measure the camera output induced by thermal energy only (not by light). In the present embodiment, it is assumed that the values in the dark frame are negligible or, equivalently, that dark frame subtraction is performed in-camera, which is common in modern digital cameras. Thus, the dark frame b_i(p) is replaced with the black level L_0 of the camera.
  • the exposure t_i · x(p) follows a Poisson distribution, and the uncertainty in its measurement corresponds to the shot noise. This distribution is approximated using a Gaussian to model the variance of the irradiance estimate x(p).
  • the variance of x(p) in image i can be derived as

  σ_i²(p) = ( g · (v_i(p) − L_0) + σ_R² ) / (g · t_i)²   (2)

  where σ_R² is the variance of the readout noise, which is also modeled using a Gaussian.
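The irradiance estimate and its predicted noise can be sketched in a few lines (a minimal numpy sketch consistent with the linear camera model above; function and variable names are illustrative, not from the patent):

```python
import numpy as np

def irradiance_with_variance(v, g, t, L0, sigma_r2):
    """Estimate per-pixel irradiance and its predicted noise variance.

    v        : raw camera output (DN), assuming in-camera dark-frame subtraction
    g        : camera gain factor
    t        : exposure time
    L0       : black level of the camera (DN)
    sigma_r2 : variance of the (Gaussian) readout noise
    """
    v = np.asarray(v, dtype=float)
    # Inverse response: photo-electrons e = (v - L0) / g, irradiance x = e / t
    # (the exposure t*x follows a Poisson distribution).
    x = (v - L0) / (g * t)
    # Shot noise (Poisson: variance equals mean) plus readout noise,
    # propagated through the linear response.
    var = (g * (v - L0) + sigma_r2) / (g * t) ** 2
    return x, var
```

For example, a raw value of 100 DN at gain 1, unit exposure time and zero black level yields an irradiance estimate of 100 with variance 100, reflecting the Poisson property that shot-noise variance equals the mean.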
  • the parameters g, L_0, and t_i need to be estimated.
  • the exposure time t i can be read directly from the digital image file.
  • the black level L_0 and the readout variance σ_R² are calibrated using the method described in [Janesick 2001; Granados et al. 2010].
  • This method estimates L_0 and σ_R² as the mean and variance, respectively, of the pixel values of a black frame, i.e., an image taken with no incident light and no integration time (practically, a very short exposure time). In principle, this data could be obtained for every camera model from the manufacturer.
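The black-frame calibration step can be sketched as follows (assuming the black frame is available as a 2-D numpy array; the function name is illustrative):

```python
import numpy as np

def calibrate_black_frame(black_frame):
    """Estimate black level L0 and readout variance from a black frame,
    i.e. an image taken with no incident light and (practically) zero
    integration time. Only readout noise remains in such a frame."""
    black_frame = np.asarray(black_frame, dtype=float)
    L0 = black_frame.mean()       # black level: mean pixel value
    sigma_r2 = black_frame.var()  # readout variance: pixel variance
    return L0, sigma_r2
```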
  • Figure 2 shows a one-dimensional illustration of HDR reconstruction.
  • An HDR image can be reconstructed by averaging the irradiance estimates derived from the color of corresponding pixel locations in the input images. Ghosting artifacts appear whenever sets of inconsistent colors are included in the average.
  • the problem of HDR deghosting can be defined as selecting consistent subsets of colors for every pixel.
  • two pixels at corresponding locations in different images are consistent if the corresponding color difference follows the predicted color difference distribution, and a group of pixels is self-consistent if all the pixels are pair-wise consistent.
  • the probability that a pair of observations is consistent may be estimated from the difference d_ij(p) = x_i(p) − x_j(p): since x_i(p) and x_j(p) are Gaussian, the difference d_ij(p) is also Gaussian, which for consistent pairs has zero mean and variance σ_ij² = σ_i² + σ_j², where σ_i² and σ_j² are obtained from equation 2. Given the variance σ_ij², the probability that observations at pixel p on images i, j are consistent may be estimated by comparing the corresponding irradiance differences with the expected noise distribution of the images on every color channel:

  Pr(p | {i, j}) = Pr( |N| ≥ |d_ij(p)| / σ_ij )
  • N is the standard Gaussian random variable with mean zero and variance one.
  • the estimate Pr(p | {i, j}) can be noisy (e.g., when the image is taken under low light or when the camera has a high readout noise).
  • the difference image d_ij(p) is smoothed using bilateral filtering. This step may be referred to as noise-adaptive difference filtering (DF).
  • This filtering introduces dependencies between the distributions of neighboring pixels. However, this dependency occurs mostly between pixels that already have similar distributions. Given this similarity, the net effect of the filtering is an attenuation of the tails of the difference distribution. This allows obtaining higher detection sensitivity for the same specificity level.
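The bilateral smoothing of the difference image can be sketched as follows (a brute-force pure-numpy version; a practical implementation would likely use an optimized library routine, and in the noise-adaptive variant the range scale `sigma_r` would be tied to the predicted noise standard deviation of the difference):

```python
import numpy as np

def bilateral_filter(d, radius=2, sigma_s=1.5, sigma_r=1.0):
    """Brute-force bilateral filtering of a 2-D difference image d.

    sigma_s : spatial Gaussian scale (pixels)
    sigma_r : range Gaussian scale; for noise-adaptive filtering this
              would be derived from the predicted noise of the difference.
    """
    d = np.asarray(d, dtype=float)
    H, W = d.shape
    out = np.zeros_like(d)
    pad = np.pad(d, radius, mode="edge")
    yy, xx = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    w_s = np.exp(-(yy ** 2 + xx ** 2) / (2 * sigma_s ** 2))  # spatial weight
    for y in range(H):
        for x in range(W):
            win = pad[y:y + 2 * radius + 1, x:x + 2 * radius + 1]
            # range weight: penalize values far from the center pixel,
            # which preserves genuine (non-noise) differences.
            w_r = np.exp(-(win - d[y, x]) ** 2 / (2 * sigma_r ** 2))
            w = w_s * w_r
            out[y, x] = (w * win).sum() / w.sum()
    return out
```

Because the range weight suppresses contributions across large value jumps, noise within flat regions is averaged away while true difference edges survive.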
  • the variance of the difference function d_ij(p) also varies for every pixel and image pair.
  • Let V = {v_i}, i ∈ T, be the set of images in the exposure sequence.
  • the probability that a given subset L_l ∈ 2^V is consistent at a pixel p is defined as the minimum of the pair-wise consistency:

  Pr(p | L_l) = min { Pr(p | {i, j}) : (i, j) ∈ L_l × L_l }.   (5)
  • For singleton subsets, the consistency probability is defined by well-exposedness:

  Pr(p | {i}) = 1 − max { max_k Pr_ue(v_i^k(p)), max_k Pr_oe(v_i^k(p)) }   (6)

  with k ∈ {R, G, B}.
  • Pr_ue and Pr_oe correspond to the under- and over-exposure probability, respectively, of an observation according to the distribution of the (Gaussian) readout noise, when centered at the black level and saturation level, respectively.
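The pair-wise test and the subset rules of equations 5 and 6 can be sketched together as follows (illustrative names; `p_under` and `p_over` stand in for per-image under- and over-exposure probabilities Pr_ue and Pr_oe, here assumed precomputed and already maximized over color channels):

```python
import math
from itertools import combinations

def pairwise_consistency(x_i, x_j, var_i, var_j):
    """Probability that two irradiance estimates are consistent: the
    difference d = x_i - x_j is Gaussian with zero mean and variance
    var_i + var_j for consistent pairs, so we return
    Pr(|N(0,1)| >= |d| / sigma_ij)."""
    sigma_ij = math.sqrt(var_i + var_j)
    z = abs(x_i - x_j) / sigma_ij
    return math.erfc(z / math.sqrt(2.0))

def subset_consistency(subset, x, var, p_under, p_over):
    """Consistency probability of a subset of images at one pixel.
    Singletons are judged by well-exposedness only (eq. 6); larger
    subsets take the minimum pair-wise consistency (eq. 5)."""
    if len(subset) == 1:
        i = subset[0]
        return 1.0 - max(p_under[i], p_over[i])
    return min(pairwise_consistency(x[i], x[j], var[i], var[j])
               for i, j in combinations(subset, 2))
```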
  • the choice of a particular subset to be averaged is ill-posed. This choice may be regularized by requiring that the selected subsets be spatially color-consistent.
  • the pixel-wise consistency test and the spatial consistency test cast the HDR deghosting problem as a Markov random field (MRF)-type global energy minimization.
  • a consistent subset of irradiance estimates is selected for every pixel to reconstruct the final pixel value.
  • arbitrarily selecting any one consistent subset may introduce unnatural color discontinuities in the final image (see yellow arrows in figure 6).
  • This problem is resolved by introducing a spatial continuity measure as a regularizer and finding a solution by minimizing a global energy that takes into account the consistency at every pixel location as well as its spatial coherence.
  • This labeling is obtained by minimizing the energy functional

  E(L) = Σ_p ( C(L_p) + λ · V(L_p) ) + β · Σ_{(p,q) ∈ N} P(L_p, L_q)   (7)

  comprising terms for the consistency potential C, the variance potential V and the prior potential P, where L_p corresponds to the index of the subset selected at pixel p, and N denotes the set of pairs of neighboring pixels.
  • the role of the consistency potential is to penalize image sets that do not have a high consistency probability, whereas the variance potential ensures that the final reconstruction has low noise by penalizing groups with larger variance. Additionally, the prior potential encourages the final reconstruction to agree with its spatial neighbors at every pixel.
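The shape of the global energy in equation 7 can be sketched as follows (a simplified evaluation on a 2-D label grid; the consistency and variance costs are supplied by the caller, and a plain Potts penalty on 4-neighbor label changes stands in for the patent's prior potential):

```python
import numpy as np

def mrf_energy(labels, consistency_cost, variance_cost, lam=0.1, beta=20.0):
    """Evaluate E(L) = sum_p [C(L_p) + lam * V(L_p)]
                      + beta * sum_{(p,q) in N} [L_p != L_q].

    labels           : 2-D integer array, subset index chosen per pixel
    consistency_cost : 2-D array, C at each pixel for its chosen label
    variance_cost    : 2-D array, V at each pixel for its chosen label
    """
    data = consistency_cost + lam * variance_cost
    # Potts-style prior: count 4-neighbor pairs with differing labels.
    prior = (np.count_nonzero(labels[1:, :] != labels[:-1, :]) +
             np.count_nonzero(labels[:, 1:] != labels[:, :-1]))
    return data.sum() + beta * prior
```

A real implementation would minimize this energy with a standard MRF solver (e.g. graph cuts with alpha-expansion) rather than evaluate it exhaustively.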
  • a confidence value is set to determine whether a set of images is consistent or not. This encodes an important design choice: we want to select any consistent group, not the most consistent one. This design gives more freedom to the optimization algorithm to construct the final composite.
  • the variance potential prevents the generation of trivial solutions.
  • Well-exposed observations from a single image are defined as consistent. Under this definition, selecting a single well-exposed image for reconstructing the whole image would create a labeling with minimum energy. This selection is undesired since the information contained in other consistent images is left out of the average, thus degrading the SNR of the resulting irradiance estimates. Instead, whenever two distinct sets are consistent, the set that produces lower-variance estimates regardless of the set size is preferable.
  • the variance potential V(L_l) encodes this preference by assigning higher costs to groups that provide higher-variance estimates, based on the relative variance of each estimate.
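Under independent Gaussian noise, the variance of the inverse-variance weighted average over a subset can be sketched as follows (an illustrative helper consistent with the preference described above; the patent's exact relative-variance formula is not reproduced here):

```python
def subset_estimate_variance(variances):
    """Variance of the inverse-variance weighted average of a set of
    irradiance estimates: adding any consistent estimate can only
    decrease it, so larger consistent subsets yield lower noise."""
    return 1.0 / sum(1.0 / s for s in variances)
```

For instance, averaging two estimates of equal variance halves the variance, which is exactly why excluding consistent images degrades the SNR of the result.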
  • Parameter selection: There are three hyper-parameters to be tuned in equation 7: the weight λ for the variance potential, the confidence value α of the consistency tests, and the weight β of the prior potential.
  • the parameter λ is set to 0.1 to ensure that the variance potential in equation 7 produces costs an order of magnitude lower than the consistency potential. This design instructs the algorithm to prefer consistent subsets, but when presented with several consistent options, it will prefer the one with the least noise.
  • the other two parameters were determined based on a performance evaluation using the challenging busy square sequence (figure 8).
  • the confidence value α was set to 0.98, which provides a good trade-off between sensitivity and specificity of ghost detection when compared to a manual annotation of the scene (see Sec. 3 for details). In preliminary experiments, variations of α did not affect the results significantly.
  • Parameter β is set to 20, which is the lowest value that did not introduce visual discontinuities on the test sequence (see figure 6). Once determined, the parameters λ, α, and β were fixed for all experiments.
  • Figure 3 shows the effects of varying parameters λ and β in equation 7.
  • the right-hand side colors correspond to the estimated labeling, which is proportional to the noise of the selected subset (blue: higher SNR, red: lower SNR).
  • Parameter β is set to 20 and λ is set to 0.1 (outlined in red), since these produce a good trade-off between low noise and spatial consistency.
  • the algorithm mostly selects a single image as source except for ill-exposed regions (white arrows), as only such regions are considered inconsistent. This behavior holds regardless of the weight β given to the prior potential.
  • the remaining subsets of larger SNR are preferred provided they are consistent, resulting in labelings that adapt more to the scene.
  • visual discontinuities marked by yellow arrows
  • the camera gain g can be calibrated by a method according to the invention that works directly from an input image set using regions of constant illumination in the input images. More specifically, an input image, e.g. the best exposure of the input set, is divided into super pixels (VEKSLER, O., BOYKOV, Y., AND MEHRANI, P. 2010. Superpixels and supervoxels in an energy optimization framework. In Proc. ECCV, vol. 6315, 211-224) and then the mean and variance of their color values are estimated. From the resulting mean-variance scatter plot (figure 4, top), the minimum variance is selected for each digital value, and RANSAC (FISCHLER, M. A., AND BOLLES, R. C. 1981. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24, 6, 381-395) is used to fit a line that passes through (L_0, σ_R²), i.e., through the expected variance at the black level.
  • FIG. 4 illustrates this process.
  • the top row shows the mean and variance color value of each super pixel (yellow and red dots).
  • Red dots at the top and the bottom correspond to low-variance super pixels that are used for calibration.
  • Yellow dots represent the remaining super pixels.
  • Green lines show the predicted noise by image-based calibration, blue dashed lines show the prediction by flat-field calibration.
  • the super pixels with minimum variance are selected as proxies for images exposed with a constant illumination at every pixel, such that every pixel color can be assumed to be a sample of the same random variable (shown in red). This selection is justified as only shot noise and readout noise contribute to the variance of image regions with constant illumination and, therefore, these noise sources determine the lower bound of the color variance.
  • Using super pixels to estimate the lower bound of image variance was proposed in [Liu et al. 2008] for image denoising, but has not been applied for HDR re- construction.
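The image-based gain estimation can be sketched as follows (a deliberately simplified version: fixed grid patches stand in for superpixels, and a least-squares line constrained through (L0, sigma_r2) stands in for the RANSAC fit; names are illustrative):

```python
import numpy as np

def estimate_gain(image, L0, sigma_r2, patch=8, n_bins=32):
    """Estimate the camera gain g from a single raw image using locally
    flat regions: in constant-illumination patches only shot and readout
    noise remain, so var = g * (mean - L0) + sigma_r2."""
    img = np.asarray(image, dtype=float)
    H, W = img.shape
    means, varis = [], []
    for y in range(0, H - patch + 1, patch):
        for x in range(0, W - patch + 1, patch):
            block = img[y:y + patch, x:x + patch]
            means.append(block.mean())
            varis.append(block.var())
    means, varis = np.array(means), np.array(varis)
    # Keep, per intensity bin, the patch with minimum variance: the lower
    # envelope of the mean-variance scatter plot (flat-region proxies).
    bins = np.digitize(means, np.linspace(means.min(), means.max(), n_bins))
    sel = [np.flatnonzero(bins == b)[np.argmin(varis[bins == b])]
           for b in np.unique(bins)]
    # Slope of the line constrained through (L0, sigma_r2).
    m, v = means[sel] - L0, varis[sel] - sigma_r2
    return float((m * v).sum() / (m * m).sum())
```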
  • the deghosting method according to the invention is robust to calibration errors, so even in cases where the gain is overestimated (b), the final images are still free of ghosting artifacts.
  • Figure 5 shows box plots of the 1st, 25th, 50th, 75th and 99th percentiles of the distribution of gain factors obtained from a flat-field calibration (JANESICK, J. 2001. Scientific charge-coupled devices. SPIE Press; GRANADOS, M., AJDIN, B., WAND, M., THEOBALT, C., SEIDEL, H.-P., AND LENSCH, H. P. A. 2010. In Proc. CVPR, 215-222).
  • the gray line denotes the true gain of the camera.
  • the expected gain for both methods is very close, but the variance of image-based calibration is higher. Despite this, the gain estimate can still be used to reconstruct ghost-free HDR images (see figure 6).
  • the red curve illustrates the dependency between the gain factor and the image variance prediction.
  • the image-based calibration is sufficiently accurate. Importantly, since a wide range of scenes contain locally flat regions, this gain calibration approach allows applying the deghosting algorithm directly without requiring users to capture flat field images. However, its accuracy is content dependent, and figure 4b shows an example image from which the camera gain could not be correctly estimated.
  • the flat regions of the image cover a limited color band, which misleads the slope estimation (figure 4b, top). That said, ghosting artifacts typically only appear when the variance within super pixels (and thus the gain) is underestimated (e.g. below the true gain, see figure 6), which is a highly unlikely scenario in practice. In general, when the camera gain is over-estimated, the predicted noise for the input images is under-estimated.
  • FIG. 6 shows the sensitivity of the inventive deghosting method to gain calibration accuracy.
  • ḡ and σ_g denote the mean and standard deviation of the flat-field estimates.
  • the method is robust to slight under-estimation (b) and large over-estimation (d) of the camera gain: When it is under-estimated (which occurs seldom, see figure 5), ghosting artifacts can appear (a, magenta arrow). Conversely, when the gain is over- estimated, it leads to low SNR (d), but it does not introduce ghosting artifacts.
  • Table 1: Summary of test sequences. HH: hand-held, SC: scene clutter, SD: small object displacements, LD: large object displacements, LL: low light. Gain factor for the ISO 100 setting.
  • the gain factor was estimated independently for every sequence using image-based calibration as described above. Although the gain needs to be estimated only once for any given camera model, it was calibrated on each sequence in order to validate the robustness of the inventive method.
  • Per scene three or five images were captured in RAW mode at steps of one or two stops, respectively.
  • the input color image is constructed from the green, red, and blue observations found in each 2 × 2 block of pixels in the un-demosaiced raw image. One of the four observations in each block is not used. If captured hand-held, the images are registered using a global homography computed with RANSAC from sparse SURF keypoint matches. After HDR reconstruction, the images were white balanced and tone mapped.
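The half-resolution color construction from the raw mosaic can be sketched as follows (assuming an RGGB Bayer layout, which is camera dependent; one of the two green samples per block is discarded, as described above):

```python
import numpy as np

def bayer_blocks_to_rgb(raw):
    """Half-resolution RGB image from an un-demosaiced raw frame.
    Assumes an RGGB 2x2 pattern: R at (0,0), G at (0,1) and (1,0),
    B at (1,1). The second green sample per block is not used."""
    raw = np.asarray(raw, dtype=float)
    r = raw[0::2, 0::2]
    g = raw[0::2, 1::2]   # keep one of the two green samples
    b = raw[1::2, 1::2]
    return np.stack([r, g, b], axis=-1)
```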
  • Figure 7 shows examples of hand-held capture with both small displacements (trees, people shifting their weight) and large displacements with fast motion (acrobat, cars). The small-displacement examples in figure 7 show that the inventive method produces convincing results.
  • the flower shop and busy square (figure 8) sequences show how strong scene clutter can cause severe ghosting artifacts in an HDR reconstruction which includes every image into the irradiance average.
  • the square at night (figure 10) sequence shows that the inventive method is robust to high image noise.
  • the cafe terrace sequence (figure 9) contains relatively small object displacements for which previous reference-image-based methods, such as that of Sen et al., are designed.
  • the method of Sen et al. finds patch-wise correspondences between the reference and the remaining input images.
  • since the reference image is of low dynamic range, regions that are ill-exposed or contain high noise might not be matched correctly to other exposures. This is demonstrated in figure 9, where the dynamic range of over-exposed regions could not be enhanced (indicated by arrows).
  • figure 10 shows that strong noise in the reference may restrict correspondence finding in other images for range enhancement, leading to a noisy HDR image.
  • the inventive method is designed to select sets of images that are both consistent and have low noise, resulting in HDR images with comparatively less noise.
  • the inventive method could also generate noisy image regions (see figure 8, right) if this guarantees consistency, as this is weighted more than achieving low noise (see equation 7).
  • Zimmer et al. establish correspondences using optical flow, which will fail on objects that undergo large displacements or disocclusions. This failure case is shown on the person in figure 11, where ghosting artifacts are introduced after two instances of a person undergoing local motion cannot be properly aligned. In contrast, the inventive method selects a single self-consistent image, thus preventing the introduction of ghosting artifacts.
  • the invention achieved higher sensitivity than previous methods (46.7-58.3% vs. 43.6% for Grosch).
  • with the noise-adaptive difference filtering (DF), the specificity was comparable to that of other methods, including those based on invariants.
  • the method achieves the best sensitivity, which is crucial for removing ghosts, without compromising the specificity, which is crucial for producing low-noise HDR images.

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Processing (AREA)

Abstract

In order to obtain images from a combination of individual images, e.g. in high dynamic range imaging or panoramic stitching, the invention proposes a computer-implemented method, comprising the steps of determining an irradiance of a pixel of the combined image, based on the set of individual images; and outputting the combined image. Preferably, the irradiance is determined based on a subset of the set of individual images, selected according to a statistical model.

Description

Automatic Noise Modeling for Ghost-free image Reconstruction
The present invention relates to a computer-implemented method and a device for constructing combined images from a set of individual images that are ghost-free, for example the construction of high dynamic range images or panoramic images of a dynamic scene.
Introduction
The acquisition of high dynamic range (HDR) or panoramic images of dynamic scenes without introducing ghosting is difficult. Even when using modern cameras with automatic exposure bracketing, the inter-frame capture time between input images can be long enough to cause significant object displacement between images of cluttered or dynamic scenes, e.g. in cities or at popular tourist destinations, or in scenes with fast motion. When pixel colors from different images are averaged to construct an HDR image, ghosting artifacts are introduced.
In the prior art, strategies for avoiding such artifacts include aligning the scene before color averaging, performing joint alignment and reconstruction using one reference image from the LDR set or detecting regions with moving objects and excluding their images from the average.
Although optical flow methods can correct short displacements caused by camera shake and moving objects, they typically fail to estimate large displacements, and have difficulties with disocclusions occurring in highly cluttered and highly dynamic scenes.
Joint alignment and reconstruction methods define a reference image to which all other images are patch-wise aligned. Ill-exposed regions in the reference are filled using an adaptation of the bi-directional similarity function between the remaining input images and the HDR result. However, a single reference image might not correspond to the desired output, and a better result could be composited using parts from different images. For example, people in any chosen reference image may be occluded in other input images. In such cases, the dynamic ranges of reference image objects cannot be completed.
Most HDR construction methods try to detect image regions that could produce ghosting artifacts and exclude them from the average. In general, these methods assume that the images are already aligned, and rely on an ability to test whether the colors observed for the same pixel in different images are consistent. Consistency is tested with criteria such as pair-wise irradiance difference, irradiance difference to a background model, distance to the intensity mapping function, variance of the irradiance estimates, average ratio between images, probability of the distance to a background model, correlation with a reference image, difference of the entropy on local image patches, and difference between gradient orientations. However, each of these consistency tests requires setting fixed thresholds that are unlikely to generalize well to the noise properties of different cameras and exposure settings. All of these strategies fail under challenging conditions that occur in reality. There is no single best method, and the selection of an adequate approach depends on the user's goal. Similar problems occur when constructing panoramic images from a set of images.
It is therefore an object of the present invention to provide an improved method and a device for constructing a combined image, e.g. an HDR or a panorama image, from a set of individual images in a wide variety of situations, including dynamic scenes with strong clutter and dynamics, with a reduced likelihood of ghosting artifacts.
These objects are achieved by a method for constructing a combined image from a set of individual images, comprising the steps of determining an irradiance of a pixel of the combined image, based on the set of individual images; and outputting the combined image. The step of determining the irradiance of a pixel of the combined image may comprise determining, for the pixel, a subset of individual images from the set; and estimating the irradiance of the pixel, based on the selected subset. The subset may be determined based on a statistical model of color values in the set of individual images.
Colors are observed at the same pixel location across different exposures in an LDR set. To test whether two colors correspond to the same irradiance, and thus to the same object, the invention considers the noise distributions of the color values measured by the camera. Noise distributions depend on the camera and exposure settings, and can be modeled using Gaussian distributions. The distribution variance is proportional to the light intensity and inversely proportional to the squared exposure time, and depends on camera parameters such as the gain factor and the readout noise parameters. Given that the noise depends on the scene irradiance and the camera parameters, no fixed threshold can be reliably set to detect image differences across camera models and scenes. According to the invention, the noise distribution may be predicted from the input images and used to normalize the color consistency tests. This automatic noise modeling approach improves the discriminative power of ghosting detection.
The statistical model may be based on a gain factor and/or read-out noise parameters of an image acquisition device, e.g. a camera. The gain factor may be estimated based on regions of essentially constant illumination in the input images. The subset may be determined based on a measure of spatial coherence. The subset may be determined based on a variance of the subset.
The combined image may be a high dynamic range image. The combined image may also be a panoramic stitching of the individual images. Outputting the combined image may comprise at least one step of storing the combined image in a computer-readable medium, communicating the combined image over an electronic communications network, including radio, and/or rendering the combined image on a display. The method may be implemented in special purpose hardware, e.g. for inclusion in a camera, or on a general purpose computer.
In general, there can be multiple ghost-free images that are consistent with a set of input images. According to a further aspect of the invention, a final image may be chosen such that each pixel has a high signal-to-noise ratio (SNR) and is spatially compatible with its neighbors in other images.
This optimization directly produces results with lower noise than existing methods, and is especially useful for images acquired in low light, e.g., night shots.
In addition, the invention comprises a simple method for estimating the camera gain factor from arbitrary images, enabling an automatic prediction of a camera noise range. The method is characterized in that the gain factor is estimated based on regions of essentially constant illumination in the input images. In summary, the HDR imaging method according to the invention fully automatically takes advantage of a camera noise model for performing reliable ghost-free reconstruction across different cameras and scenes. It obtains the irradiance of every pixel with lower noise and fewer artifacts than existing state-of-the-art approaches, even for very challenging scenes including crowded places with small and large object displacements and low-light shots. All these scenes are computed with no parameter tuning.
The invention further comprises a device for constructing a combined image from a set of individual images, comprising an image acquisition module for acquiring the set of individual images, such as a scanner, a camera, a computer-readable memory, a network interface or the like; an image processing module for determining an irradiance of a pixel of the combined image, based on the set of individual images; and an output module for outputting the combined image, such as a display and / or a network and / or a storage interface.
These and other aspects of the present invention will now be explained in connection with an embodiment of the invention and using the annexed figure, in which
Fig. 1 shows a schematic flow diagram of a method for constructing an HDR image according to an embodiment of the invention;
Fig. 2 is a one-dimensional illustration of HDR reconstruction based on consistent subsets of individual images;
Fig. 3 shows the effects of varying parameters β and λ in equation 7;
Fig. 4 illustrates a process of gain calibration according to an embodiment of the invention;
Fig. 5 shows confidences of camera gain estimation;
Fig. 6 illustrates the sensitivity of a deghosting method according to an embodiment of the invention to gain calibration;
Fig. 7 shows examples of hand-held capture with both small displacements (trees, people shifting their weight) and large displacements with fast motion (acrobat, cars);
Fig. 8 shows, on top, hand-held capture via in-camera bracketing, where the dynamic car motions are reconstructed ghost-free, and, on the bottom, the cluttered busy square sequence, where naive averaging produces severe artifacts (left-hand side) and the inventors' result is ghost-free (right-hand side);
Fig. 9 shows a comparison to Sen et al. on the Cafe terrace sequence (top). The third image was selected as reference for the method of Sen et al. Here, their method encounters difficulties extending the dynamic range of ill-exposed regions, which results in a washed-out appearance (indicated by arrows). In contrast, the inventive method automatically selects well-exposed sources for every region;
Fig. 10 shows a comparison with the method of Sen et al. on the Square at night sequence (top). The second exposure was selected as reference for Sen et al.'s method. Due to noise, their method finds few similar patches in other exposures. This implies that the dynamic range cannot be effectively extended using other input images (middle). The inventive method selects consistent sources with as low variance as possible, preventing the appearance of noise in the result (bottom);
Fig. 11 shows a comparison with the method of Zimmer et al. on the busy square sequence: (a) reference image, (b) optical-flow alignment of an additional input image to the reference, (c) result after HDR reconstruction using (a) and (b), and (d) the inventive result;
Fig. 12 shows a comparison of the inventive consistency detector with other state-of-the-art ghosting-detection methods. Here, the differences between a pair of images of the busy square (Fig. 8, right) are shown in red on top of their average color; and
Fig. 13 shows semantic inconsistencies and interactive correction: the inventive algorithm may produce semantic inconsistencies (a). These can appear when the color difference falls below the noise level (top), when all objects in a given image region are partially ill-exposed (middle), or when objects are partially occluded (bottom). These inconsistencies can be corrected interactively by editing the labels (b). The results after editing are shown in (c).
Detailed description of an embodiment
In the following, a detailed embodiment of a method for a ghost-free combination of images is explained in relation to methods for constructing an HDR image from a set of individual images.
Figure 1 shows a method for constructing an HDR image from a set of individual images according to an embodiment of the invention. The input is a set of images taken with a static or hand-held camera at different exposure times, where pixel values in the images are the raw output of the camera, i.e., before any of the camera's internal processing.
In step 110, if captured hand-held, the images are robustly registered using a global homography computed with RANSAC from sparse SURF keypoint matches.
With an aligned image set, the method estimates an irradiance image where each pixel is constructed as a weighted average of colors of the corresponding pixels across the input images in steps 120, 130, 140 and 150. Ghosting artifacts would be generated by averaging a set of pixels which includes an inconsistent subset. Instead, the method identifies a consistent subset of images per pixel location and reconstructs the final irradiance value as an average of consistent pixel colors. This avoids having to select a reference image, which might hinder the capabilities for dynamic range extension, or having to build a background model, which requires that the background be more likely to be observed at every image location, which is not necessarily true for cluttered scenes.
Image noise estimation
Even when assuming a static scene and constant camera parameters, input image noise varies by exposure time. The two main temporal noise sources are known as shot noise and readout noise. Shot noise is introduced by the process of light emission, which follows a Poisson distribution where the variance is equal to the mean. Readout noise comprises several other signal-independent sources affecting the acquisition process of digital cameras; it is modeled well by a Gaussian distribution with zero mean.
In CCD/CMOS sensors, the number of photon-electrons collected by the camera at every pixel is linearly proportional to the incident irradiance. This derives from the properties of the photo-electric effect on silicon-based sensors for visible wavelengths. The raw camera output is also linearly proportional to the number of collected photon-electrons. This relation is known as the camera response function f. The slope of this function corresponds to the camera's gain factor g. This factor is proportional to the ISO setting, e.g., the gain at ISO400 is four times the gain at ISO100.
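The linear sensor model described above can be checked numerically. The following sketch (all parameter values are illustrative, not calibration data from the patent) simulates raw pixel values as gain times Poisson-distributed photon-electrons plus Gaussian readout noise, and compares the empirical variance against the noise prediction used later in equation 2:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative camera parameters (hypothetical, not from the patent's calibration)
g, L0, sigma_R = 0.25, 32.0, 1.6   # gain, black level, readout noise std. dev.
t, x = 0.02, 4.0e5                 # exposure time [s], incident irradiance [e-/s]

n = 100_000
electrons = rng.poisson(t * x, size=n)               # shot noise (Poisson)
v = g * electrons + L0 + rng.normal(0, sigma_R, n)   # raw values with readout noise

# The empirical variance should match g * (mean - L0) + sigma_R^2, i.e. the
# numerator of equation 2 before converting to irradiance units
predicted_var = g * (v.mean() - L0) + sigma_R**2
print(abs(v.var() / predicted_var - 1.0) < 0.05)  # True: prediction within 5%
```

With 100,000 samples, the sample variance is within a fraction of a percent of the prediction, so the 5% check passes comfortably.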
Since the response function f is linear for raw output, it is possible to recover the number of photon-electrons collected by the camera to approximate the probability distribution of each pixel measurement. For a non-saturated raw camera output v_i(p) on image i and pixel p, the irradiance is estimated from the inverse of the response function, i.e. from the amount of collected photon-electrons, as
x̂_i(p) = (v_i(p) − b_i(p)) / (g · t_i),    (1)
where the dark frame b_i is an image acquired with the same exposure time as v_i but without incoming light (e.g., with the lens cap on). The product t_i · x(p) between the image's exposure time t_i and the incident irradiance x(p) is known as the exposure, which is proportional to the number of photon-electrons collected by the camera. Dark frames measure the camera output induced by thermal energy only (not by light). In the present embodiment, it is assumed that the values in the dark frame are negligible or, equivalently, that dark frame subtraction is performed in-camera, which is common in modern digital cameras. Thus, the dark frame b_i(p) is replaced with the black level L_0 of the camera.
The exposure t_i · x(p) follows a Poisson distribution, and the uncertainty in its measurement corresponds to the shot noise. This distribution is approximated using a Gaussian to model the variance of the irradiance estimate x̂_i(p). From equation 1, the variance of x̂_i(p) in image i can be derived as
σ_i²(p) = (g · (v_i(p) − L_0) + σ_R²) / (g · t_i)²,    (2)
where σ_R² is the variance of the readout noise, which is also modeled using a Gaussian. To evaluate equation 2, the parameters g, L_0, and t_i need to be estimated. The exposure time t_i can be read directly from the digital image file.
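Equations 1 and 2 can be sketched in a few lines; the function name and the parameter values below are illustrative (chosen to give round numbers), not the patent's calibration data:

```python
import numpy as np

def irradiance_and_variance(v, t, g, L0, sigma_R2):
    """Per-pixel irradiance estimate (eq. 1, with the dark frame replaced
    by the black level L0) and its predicted variance (eq. 2)."""
    v = np.asarray(v, dtype=float)
    x_hat = (v - L0) / (g * t)                      # equation 1
    var = (g * (v - L0) + sigma_R2) / (g * t) ** 2  # equation 2
    return x_hat, var

# Hypothetical camera parameters chosen so g * t = 1
x_hat, var = irradiance_and_variance([2032.0], t=0.5, g=2.0, L0=32.0, sigma_R2=2.56)
print(x_hat[0])           # 2000.0
print(round(var[0], 2))   # 4002.56
```

Note that the variance grows with the observed signal (shot noise) and shrinks with the squared exposure time, as stated above.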
The black level L_0 and the readout variance σ_R² are calibrated using the method described in [Janesick 2001; Granados et al. 2010]. This method estimates L_0 and σ_R² as the mean and variance, respectively, of the pixel values of a black frame, i.e., an image taken with no incident light and no integration time (practically, a very short exposure time). In principle, this data could be obtained for every camera model from the manufacturer.
Figure 2 shows a one-dimensional illustration of HDR reconstruction. An HDR image can be reconstructed by averaging the irradiance estimates derived from the color of corresponding pixel locations in the input images. Ghosting artifacts appear whenever sets of inconsistent colors are included in the average. The problem of HDR deghosting can be defined as selecting consistent subsets of colors for every pixel.
In the present embodiment, two pixels at corresponding locations in different images are consistent if the corresponding color difference follows the predicted color difference distribution, and a group of pixels is self-consistent if all the pixels are pair-wise consistent. Given two irradiance observations x̂_i^k(p), x̂_j^k(p) at pixel p and color channel k, which are derived from the pixel colors v_i(p), v_j(p) on images i, j, respectively, using the inverse of the camera response function (equation 1), detecting ghosting artifacts requires testing whether these irradiance observations are consistent, i.e. whether they correspond to measurements of the same incident light. Existing algorithms solve this problem by relying on pre-determined thresholds, which are difficult to set. This requirement can be avoided by exploiting the noise model according to the present embodiment of the invention. The distribution of the difference d_ij^k(p) = x̂_i^k(p) − x̂_j^k(p) may be estimated: since x̂_i^k(p) and x̂_j^k(p) are Gaussian, their difference is also Gaussian, which for consistent pairs has zero mean and variance
σ_d,ij^k(p)² = σ_i^k(p)² + σ_j^k(p)²,    (3)
where σ_i^k(p)² and σ_j^k(p)² are obtained from equation 2. Given the variance σ_d,ij^k(p)², the probability that observations at pixel p on images i, j are consistent may be estimated by comparing the corresponding irradiance differences with the expected noise distribution of the images on every color channel:
Pr(p | {i, j}) = min_k Pr(|N| ≥ |d_ij^k(p)| / σ_d,ij^k(p)),    (4)
where N is the standard Gaussian random variable with mean zero and variance one. In practice, the estimate Pr(p | {i, j}) can be noisy (e.g., when the image is taken under low light or when the camera has a high readout noise). For this reason, prior to estimating the probabilities, the difference image d_ij(p) is smoothed using bilateral filtering. This step may be referred to as noise-adaptive difference filtering (DF). A distance kernel of large bandwidth is used, together with a range kernel of variable bandwidth σ_r = 2 σ_d,ij(p) that is proportional to the predicted image noise. This filtering introduces dependencies between the distributions of neighboring pixels. However, this dependency occurs mostly between pixels that already have similar distributions. Given this similarity, the net effect of the filtering is an attenuation of the tails of the difference distribution. This allows obtaining higher detection sensitivity for the same specificity level.
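For a single color channel, the pair-wise test of equations 3 and 4 can be sketched as follows; the Gaussian tail probability Pr(|N| ≥ z) equals erfc(z/√2), and the bilateral pre-filtering step is omitted for brevity (function and variable names are illustrative):

```python
import math

def pairwise_consistency(x_i, var_i, x_j, var_j):
    """Probability that two irradiance estimates measure the same light
    (equations 3-4, single color channel, no bilateral pre-filtering)."""
    sigma_d = math.sqrt(var_i + var_j)   # std. dev. of the difference (eq. 3)
    z = abs(x_i - x_j) / sigma_d
    return math.erfc(z / math.sqrt(2))   # Pr(|N(0, 1)| >= z)

# A difference well inside the predicted noise band -> plausible consistency
print(round(pairwise_consistency(100.0, 4.0, 102.0, 4.0), 2))  # 0.48
# A difference of more than ten sigma -> consistency probability near zero
print(pairwise_consistency(100.0, 4.0, 130.0, 4.0) < 1e-6)     # True
```

This is exactly why no fixed threshold is needed: the same absolute color difference is judged differently depending on the predicted noise of the two exposures.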
Since the noise variance σ_i²(p) is different at every pixel and image in the sequence, the variance of the difference function d_ij(p) also varies for every pixel and image pair. This observation is integral to the inventive technique: as other reconstruction and deghosting methods do not automatically model noise, they are not likely to generalize well to the noise properties of different cameras and exposure settings.
Consistency test for sets of images
Let V = {v_i}, i ∈ T, be the set of images in the exposure sequence. Based on the pair-wise consistency measure (equation 4), the probability that a given subset L_i ∈ 2^V is consistent at a pixel p is defined as the minimum of the pair-wise consistencies:
Pr(p | L_i) = min { Pr(p | {i, j}) : i, j ∈ L_i }.    (5)
For the case of a singleton L_i (i.e., |L_i| = 1), the corresponding consistency probability is given as the probability that the corresponding observation is well exposed:
Pr(p | {i}) = 1 − max { min_k Pr_ue(v_i^k(p)), max_k Pr_oe(v_i^k(p)) },    (6)
with k ∈ {R, G, B}. Pr_ue and Pr_oe correspond to the under- and over-exposure probability, respectively, of an observation according to the distribution of the (Gaussian) readout noise, when centered at the black level and saturation level, respectively. In this definition, all color channels need to be under-exposed to consider an observation v_i(p) inconsistent, whereas if any color channel is over-exposed, v_i(p) is considered inconsistent.
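Equations 5 and 6 reduce to a few lines once the pair-wise test (equation 4) and the well-exposedness test (equation 6) are available; in this sketch both are passed in as stand-in callables with illustrative names and made-up probabilities:

```python
from itertools import combinations

def subset_consistency(images, pair_prob, single_prob):
    """Consistency probability of an image subset at one pixel.

    pair_prob(i, j) stands in for equation 4 and single_prob(i) for the
    well-exposedness test of equation 6 (both are illustrative hooks)."""
    if len(images) == 1:
        return single_prob(images[0])  # singleton case, equation 6
    return min(pair_prob(i, j) for i, j in combinations(images, 2))  # equation 5

# Toy example: image 2 disagrees with images 0 and 1 (e.g., a moving object)
probs = {(0, 1): 0.9, (0, 2): 0.05, (1, 2): 0.1}
pair = lambda i, j: probs[tuple(sorted((i, j)))]
print(subset_consistency([0, 1], pair, lambda i: 1.0))     # 0.9
print(subset_consistency([0, 1, 2], pair, lambda i: 1.0))  # 0.05
```

Taking the minimum over all pairs means a single disagreeing image is enough to make the whole subset inconsistent, which is the desired behavior for excluding moving objects.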
Compositing of consistent sets
Since more than one subset of images can be consistent for a given pixel location, the choice of a particular subset to be averaged is ill-posed. This choice may be regularized by requiring that the selected subsets be spatially color-consistent. Together, the pixel-wise consistency test and the spatial consistency test cast the HDR deghosting problem as a Markov random field (MRF)-type global energy minimization.
To obtain a ghost-free HDR image, a consistent subset of irradiance estimates is selected for every pixel to reconstruct the final pixel value. However, given the presence of moving objects, there can be more than one consistent subset. Arbitrarily selecting any one subset may introduce unnatural color discontinuities in the final image (see yellow arrows in figure 6). This problem is resolved by introducing a spatial continuity measure as a regularizer; a solution is found by minimizing a global energy that takes into account the consistency at every pixel location as well as spatial coherence. The final result is represented as a labeling F := F(p) that assigns to each pixel p the index of an element in 2^V. This labeling is obtained by minimizing the energy functional:
E(F) = Σ_{p∈Ω} [ C(p, L_F(p)) + λ · V(L_F(p)) ] + β · Σ_{(p,q)∈N} P(p, q, L_pq),    (7)
comprising terms for the consistency potential C, the variance potential V and the prior potential P, where L_pq corresponds to the index of the subset L_F(p) ∪ L_F(q), N denotes the 4-neighborhood system on the set of pixel locations Ω, α denotes the confidence value (see below), and β and λ are weighting hyper-parameters.
In equation 7, the role of the consistency potential is to penalize image sets that do not have a high consistency probability, whereas the variance potential ensures that the final reconstruction has low noise by penalizing groups with larger variance. Additionally, the prior potential encourages the final reconstruction to agree with its spatial neighbors at every pixel.
In the consistency and prior potentials, instead of penalizing the consistency probability directly, a confidence value is set to determine whether a set of images L_F is consistent or not. This encodes an important design choice: we want to select any consistent group, not the most consistent one. This design gives more freedom to the optimization algorithm to construct the final composite.
The variance potential prevents the generation of trivial solutions. Well-exposed observations from a single image are defined as consistent. Under this definition, selecting a single well-exposed image for reconstructing the whole image would create a labeling with minimum energy. This selection is undesired since the information contained in other consistent images is left out of the average, thus degrading the SNR of the resulting irradiance estimates. Instead, whenever two distinct sets are consistent, the set that produces lower-variance estimates, regardless of the set size, is preferable. The variance potential V(L_i) encodes this preference by assigning higher costs to groups that provide higher-variance estimates, based on the relative variance of each estimate, where the variance of each group corresponds to the variance of the weighted average of the irradiance estimates it contains.
Parameter selection
There are three hyper-parameters to be tuned in equation 7: the weight λ for the variance potential, the confidence value α of the consistency tests, and the weight β of the prior potential. The parameter λ is set to 0.1 to ensure that the variance potential in equation 7 produces order-of-magnitude lower costs than the consistency potential. This design instructs the algorithm to prefer consistent subsets; when presented with several consistent options, it will prefer the one with the least noise. The other two parameters were determined based on a performance evaluation using the challenging busy square sequence (figure 8). The confidence value α was set to 0.98, which provides a good trade-off between sensitivity and specificity of ghost detection when compared to a manual annotation of the scene (see Sec. 3 for details). In preliminary experiments, variations of α did not affect the results significantly.
Parameter β is set to 20, which is the lowest value that did not introduce visual discontinuities on the test sequence (see figure 6). Once determined, the parameters α, β, λ were fixed for all experiments.
Figure 3 shows the effects of varying parameters β and λ in equation 7. The right-hand side colors correspond to the estimated labeling, which is proportional to the noise of the selected subset (blue: higher SNR, red: lower SNR). Parameter β is set to 20 and λ is set to 0.1 (outlined in red), since these produce a good trade-off between low noise and spatial consistency. These parameters were kept fixed during all experiments.
When noisy subsets are not penalized (λ = 0; top row), the algorithm mostly selects a single image as source except for ill-exposed regions (white arrows), as only such regions are considered inconsistent. This behavior holds regardless of the weight β given to the prior potential. If noisy subsets are penalized mildly, i.e., less than inconsistent subsets (λ = 0.1; middle row), the remaining subsets of larger SNR (shaded in blue and green colors) are preferred provided they are consistent, resulting in labelings that adapt more to the scene. In this configuration, as the weight β of the prior potential increases, visual discontinuities (marked by yellow arrows) are eliminated from the deghosted image (e.g., at β = 10, 20). When noisy subsets are penalized as much as inconsistent ones (λ = 1; bottom row), it becomes affordable to include objects that are partially ill-exposed (pointed at by purple arrows) if they appear in the longest (least noisy) image. These results support the inventors' choice of λ.
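The structure of the energy in equation 7 can be sketched as follows. This is a simplified illustration, not the patent's implementation: the consistency potential is thresholded using the confidence value (a subset counts as consistent when its probability exceeds the significance level 1 − α), and the prior is reduced to a Potts-style disagreement penalty rather than a consistency test on the union of neighboring subsets:

```python
def energy(labels, consist_prob, variance, neighbors, alpha=0.98, lam=0.1, beta=20.0):
    """Evaluate a simplified version of the energy in equation 7.

    labels[p] picks one candidate subset per pixel; consist_prob[p][l] is that
    subset's consistency probability (eq. 5) and variance[l] its relative
    variance. All structures here are illustrative stand-ins."""
    data = sum((0.0 if consist_prob[p][l] >= 1.0 - alpha else 1.0) + lam * variance[l]
               for p, l in enumerate(labels))
    prior = sum(beta * (labels[p] != labels[q]) for p, q in neighbors)
    return data + prior

# Three pixels, two candidate subsets, a chain neighborhood (all values made up)
consist_prob = [[1.0, 0.5], [0.9, 0.01], [0.0, 0.8]]
variance = [0.2, 0.5]
neighbors = [(0, 1), (1, 2)]
print(round(energy([0, 0, 1], consist_prob, variance, neighbors), 2))  # 20.09
print(round(energy([1, 1, 1], consist_prob, variance, neighbors), 2))  # 1.15
```

The toy example shows the trade-off discussed above: the spatially smooth labeling [1, 1, 1] wins despite containing one inconsistent pixel, because the label change in [0, 0, 1] pays the full prior penalty β.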
Optimization and final reconstruction
To obtain a minimum cost labeling F*, the expansion-move algorithm (BOYKOV, Y., VEKSLER, O., AND ZABIH, R. 2001. Fast approximate energy minimization via graph cuts. IEEE TPAMI 23, 11, 1222-1239) is applied. With the resulting labeling, the final irradiance map is estimated as a weighted average:
x̂(p) = Σ_{i∈L_F*(p)} w_i(p) · x̂_i(p) / Σ_{i∈L_F*(p)} w_i(p),    (8)
where the weighting function w_i(p) = ŵ_i(p) / σ_i²(p), and ŵ_i(p) corresponds to the probability that v_i(p) is well exposed. Weighting by the inverse of the predicted variance leads to a result close to the maximum likelihood solution, and the weighting is constrained to apply identical weights to every color channel in a given pixel.
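A minimal sketch of the weighted average in equation 8 for a single pixel, assuming the consistent subset has already been selected; the well-exposedness probabilities are divided by the predicted variances to approximate the maximum likelihood combination (names are illustrative):

```python
import numpy as np

def merge_pixel(values, variances, well_exposed):
    """Weighted average of the selected consistent irradiance estimates (eq. 8).

    values: irradiance estimates of the consistent subset (eq. 1)
    variances: their predicted noise variances (eq. 2)
    well_exposed: their well-exposedness probabilities."""
    w = np.asarray(well_exposed) / np.asarray(variances)
    return float(np.sum(w * np.asarray(values)) / np.sum(w))

# The lower-variance estimate dominates the result
print(merge_pixel([100.0, 104.0], [1.0, 4.0], [1.0, 1.0]))  # 100.8
```

Inverse-variance weighting is what makes the longer (less noisy) exposures contribute more, which is why the variance potential above pushes the optimizer toward larger consistent subsets.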
If not provided by the manufacturer, the camera gain g can be calibrated by a method according to the invention that works directly on an input image set using regions of constant illumination in the input images. More specifically, an input image, e.g. the best exposure of the input set, is divided into super pixels (VEKSLER, O., BOYKOV, Y., AND MEHRANI, P. 2010. Superpixels and supervoxels in an energy optimization framework. In Proc. ECCV, vol. 6315, 211-224), and then the mean and variance of their color values are estimated. From the resulting mean-variance scatter plot (figure 4, top), the minimum variance is selected for each digital value, and RANSAC (FISCHLER, M. A., AND BOLLES, R. C. 1981. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24, 6, 381-395) is used to fit a line that passes through (L_0, σ_R²), i.e., through the expected variance at the black level.
Figure 4 illustrates this process. The top row shows the mean and variance color value of each super pixel (yellow and red dots). Red dots at the top and the bottom correspond to low-variance super pixels that are used for calibration. Yellow dots represent the remaining super pixels. Green lines show the noise predicted by image-based calibration, blue dashed lines show the prediction by flat-field calibration. The super pixels with minimum variance are selected as proxies for images exposed with a constant illumination at every pixel, such that every pixel color can be assumed to be a sample of the same random variable (shown in red). This selection is justified as only shot noise and readout noise contribute to the variance of image regions with constant illumination and, therefore, these noise sources determine the lower bound of the color variance. Using super pixels to estimate the lower bound of image variance was proposed in [Liu et al. 2008] for image denoising, but has not been applied for HDR reconstruction.
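The slope fit underlying the gain calibration can be sketched as follows. For brevity, the superpixel segmentation and the RANSAC line fit are replaced by a median of per-region slopes through (L_0, σ_R²); all names and parameter values are illustrative:

```python
import numpy as np

def estimate_gain(means, variances, L0=32.0, sigma_R2=2.56):
    """Estimate the camera gain from low-variance region statistics (sketch).

    Fits variance = g * (mean - L0) + sigma_R2, i.e. a line constrained to
    pass through (L0, sigma_R2). The patent uses superpixels and a RANSAC
    line fit; a median of per-region slopes is a simple stand-in here."""
    m = np.asarray(means, dtype=float)
    v = np.asarray(variances, dtype=float)
    return float(np.median((v - sigma_R2) / (m - L0)))

# Synthetic flat regions generated with a true gain of 0.25 (made-up values)
means = np.array([200.0, 600.0, 1400.0, 3000.0])
variances = 0.25 * (means - 32.0) + 2.56
print(round(estimate_gain(means, variances), 6))  # 0.25
```

A robust estimator matters here because the selected regions are only proxies for truly flat patches; outliers with texture-induced variance would otherwise bias the slope upward.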
The deghosting method according to the invention is robust to calibration errors, so even in cases where the gain is overestimated (b), the final images are still free of ghosting artifacts.
Figure 5 shows box plots of the 1st, 25th, 50th, 75th and 99th percentiles of the distribution of gain factors obtained from a flat-field calibration (JANESICK, J. 2001. Scientific charge-coupled devices. SPIE Press; GRANADOS, M., AJDIN, B., WAND, M., THEOBALT, C., SEIDEL, H.-P., AND LENSCH, H. P. A. 2010. Optimal HDR reconstruction with linear digital cameras. In Proc. CVPR, 215-222), comprising 36 samples of flat field images, and the distribution of factors obtained from image-based calibration (a sample of seven images, each from a different scene; two shown in figure 4). The gray line denotes the true gain of the camera. The expected gain for both methods is very close, but the variance of image-based calibration is higher. Despite this, the gain estimate can still be used to reconstruct ghost-free HDR images (see figure 6). The red curve illustrates the dependency between the gain factor and the image variance prediction.
The image-based calibration is sufficiently accurate. Importantly, since a wide range of scenes contain locally flat regions, this gain calibration approach allows applying the deghosting algorithm directly, without requiring users to capture flat field images. However, its accuracy is content dependent, and figure 4b shows an example image from which the camera gain could not be correctly estimated. The image's flat regions cover a limited color band, which misleads the slope estimation (figure 4b, top). That said, ghosting artifacts typically only appear when the variance within super pixels (and thus the gain) is underestimated (e.g. below the true gain, see figure 6), which is a highly unlikely scenario in practice. In general, when the camera gain is over-estimated, the predicted noise for the input images is under-estimated. This makes ghost detection stricter, thus reducing the SNR of the final HDR image because smaller consistent subsets will be found. As such, no ghosting artifacts are introduced by this error (see figure 6).
Figure 6 shows the sensitivity of the inventive deghosting method to gain calibration accuracy. Here, ḡ and σ_g denote the mean and standard deviation of the flat-field gain estimates. The method is robust to slight under-estimation (b) and large over-estimation (d) of the camera gain: when the gain is under-estimated (which occurs seldom, see figure 5), ghosting artifacts can appear (a, magenta arrow). Conversely, when the gain is over-estimated, it leads to low SNR (d), but it does not introduce ghosting artifacts.
The following table 1 shows the results of an experimental evaluation:

Sequence                  | HH | SC | SD | LD | LL | Camera     | Est. gain factor
Acrobat (Fig. 1)          | x  |    | x  | x  |    | Canon 550D | 0.6597
Street traffic (Fig. 8)   | x  |    |    | x  |    | Canon 550D | 0.3753
Flower shop (Fig. 1)      |    | x  | x  |    |    | Canon S5   | 0.2390
Busy square (Fig. 8)      |    | x  | x  | x  |    | Canon S5   | 0.2417
Cafe terrace (Fig. 9)     |    |    | x  |    |    | Canon S5   | 0.2250
Square at night (Fig. 10) |    | x  | x  | x  | x  | Canon S5   | 0.4125

Table 1: Summary of test sequences. HH: hand-held, SC: scene clutter, SD: small object displacements, LD: large object displacements, LL: low light. Gain factor for the ISO 100 setting.
Several exposure sequences were obtained using a compact digital camera (Canon Powershot S5 IS, 10-bit ADC) and a digital SLR (Canon EOS 550D, 14-bit ADC). The cameras' black levels (L_0 = 32 and L_0 = 2048, respectively) and readout variances (σ_R² = 2.655 and σ_R² = 61.01, respectively) were estimated from a black frame. The gain factor was estimated independently for every sequence using image-based calibration as described above. Although the gain needs to be estimated only once for any given camera model, it was calibrated on each sequence in order to validate the robustness of the inventive method. For reference, the gain factors obtained from flat-field calibration were g = 0.2394 and g = 0.4795, respectively.
Per scene, three or five images were captured in RAW mode at steps of one or two stops, respectively. The input color image is constructed from the green, red, and blue observations found in each 2 x 2 block of pixels of the un-demosaiced raw image. One of the four observations in each block is not used. If captured hand-held, the images are registered using a global homography computed with RANSAC from sparse SURF keypoint matches. After HDR reconstruction, the images were white balanced and tone mapped.
Figure 7 shows examples of hand-held capture with both small displacements (trees, people shifting their weight) and large displacements with fast motion (acrobat, cars). Focusing on small-displacement quality, figure 7 shows that the inventive method produces convincing results. The flower shop and busy square (figure 8) sequences show how strong scene clutter can cause severe ghosting artifacts in an HDR reconstruction which includes every image in the irradiance average. In addition, the square at night (figure 10) sequence shows that the inventive method is robust to high image noise. The cafe terrace sequence (figure 9) contains relatively small object displacements, for which previous reference-image-based methods are designed (SEN, P., KALANTARI, N. K., YAESOUBI, M., DARABI, S., GOLDMAN, D., AND SHECHTMAN, E. 2012. Robust patch-based HDR reconstruction of dynamic scenes. ACM TOG 31, 6). Even under small displacements, which are well handled by reference-image-based methods, the inventive method produces results with fewer washed-out regions and lower noise.
The method of Sen et al. finds patch-wise correspondences between the reference and the remaining input images. As the reference image is of low dynamic range, regions that are ill-exposed or contain high noise might not be matched correctly to other exposures. This is demonstrated in figure 9, where the dynamic range of over-exposed regions could not be enhanced (indicated by arrows). Additionally, figure 10 shows that strong noise in the reference may restrict correspondence finding in other images for range enhancement, leading to a noisy HDR image. In contrast, the inventive method is designed to select sets of images that are both consistent and have low noise, resulting in HDR images with comparatively less noise. In general, the inventive method may also generate noisy image regions (see figure 8, right) if this guarantees consistency, since consistency is weighted more strongly than low noise (see equation 7). Zimmer et al. establish correspondences using optical flow, which fails on objects that undergo large displacements or disocclusions. This failure case is shown on the person in figure 11, where ghosting artifacts appear because two instances of a person undergoing local motion cannot be properly aligned. In contrast, the inventive method selects a single self-consistent image, thus preventing the introduction of ghosting artifacts.
Comparison with detect-and-exclude methods The inventive method was compared against the top four performing methods reported by Sidibé et al. (SIDIBE, D., PUECH, W., AND STRAUSS, O. 2009. Ghost detection and removal in high dynamic range images. In Proc. EUSIPCO), according to their sensitivity score: Grosch (GROSCH, T. 2006. Fast and robust high dynamic range image generation with camera and object movement. In Proc. VMV), Heo et al. (HEO, Y. S., LEE, K. M., LEE, S. U., MOON, Y., AND CHA, J. 2010. Ghost-free high dynamic range imaging. In Proc. ACCV, vol. 4, 486-500), and Pece and Kautz (PECE, F., AND KAUTZ, J. 2010. Bitmap movement detection: HDR for dynamic scenes. In Proc. CVMP, 1-8). The inventors used their own implementation of these methods with the specified parameters, whenever available. All detect-and-exclude methods, including the inventive method, work in two stages: detect inconsistent regions, and reconstruct the HDR image using consistent parts only. Since the inconsistency detection is often noisy, they apply different regularization techniques before the reconstruction stage (e.g., Gaussian smoothing, morphological operations, or MRF priors; the inventive method applies the latter). Therefore, to exclude the effect of different regularization strategies (i.e., of different image priors), only the detection stage of every method is compared (see figure 12). For the comparison, the first two input images of the busy square sequence were used. As ground truth, a manual segmentation of their differences was constructed (figure 12a). Table 2 summarizes the sensitivity and specificity achieved by each method in classifying pixels as consistent or inconsistent with respect to the ground truth:
Detection strategy                         Sensitivity   Specificity
Proposed method (-DF), α = 95.0%           0.583         0.750
Proposed method (-DF), α = 98.0%           0.542         0.881
Proposed method (-DF), α = 99.9%           0.480         0.979
Proposed method (+DF), α = 95.0%           0.536         0.899
Proposed method (+DF), α = 98.0%           0.513         0.947
Proposed method (+DF), α = 99.9%           0.467         0.987
Absolute difference [Grosch 2006]          0.436         0.926
IMF probability [Heo et al. 2010]          0.200         0.963
Monotone ordering [Sidibé et al. 2009]     0.246         0.994
Median threshold [Pece and Kautz 2010]     0.158         0.999
Table 2
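The sensitivity and specificity figures in Table 2 follow from the standard binary-classification counts over the pixel masks; a minimal sketch:

```python
import numpy as np

def sensitivity_specificity(pred, truth):
    """Score a binary inconsistency mask against a ground-truth mask.

    pred, truth: boolean arrays where True marks an inconsistent pixel.
    Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    """
    tp = np.logical_and(pred, truth).sum()
    tn = np.logical_and(~pred, ~truth).sum()
    fn = np.logical_and(~pred, truth).sum()
    fp = np.logical_and(pred, ~truth).sum()
    return tp / (tp + fn), tn / (tn + fp)
```

High sensitivity means moving objects are reliably flagged (fewer ghosts); high specificity means static pixels are rarely excluded (more images per average, hence lower noise).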
To facilitate a fair comparison, results are presented with and without applying the difference filtering (DF) step of the inventive method. Among previous methods, the Grosch method, which thresholds the absolute irradiance difference between the images (figure 12g), achieved the best sensitivity (43.6%). The methods of Sidibé et al. (figure 12f) and Pece and Kautz (figure 12h) achieve the highest specificity (99.4% and 99.9%) but the lowest sensitivity (24.6% and 15.8%). This occurs because both methods are based on invariants that are satisfied whenever two pixels correspond to the same light intensity, but these invariants are not always violated by moving objects.
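The absolute-difference baseline attributed to Grosch can be sketched as follows; the relative normalization and the default threshold are illustrative assumptions, not the published parameters.

```python
import numpy as np

def detect_inconsistent(irr_a, irr_b, threshold=0.1):
    """Grosch-style detection sketch: flag a pixel as inconsistent when
    the absolute difference of its two irradiance estimates is large
    relative to their magnitude. The relative form and the threshold
    value are illustrative choices, not the published parameters.
    """
    diff = np.abs(irr_a - irr_b)
    scale = np.maximum(np.maximum(irr_a, irr_b), 1e-8)
    return diff / scale > threshold
```

Such a fixed threshold ignores the signal-dependent noise level, which is precisely what the noise-adaptive difference filtering of the inventive method is meant to account for.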
The method was tested with confidence values α ∈ {0.95, 0.98, 0.999}, and with and without applying noise-adaptive difference filtering (DF). In all cases, the invention achieved higher sensitivity than previous methods (46.7-58.3% vs. 43.6% for Grosch). With the adaptive DF, the specificity was comparable to that of other methods, including those based on invariants. The best trade-off was obtained at α = 0.98, with sensitivity and specificity of 51% and 95%, respectively (figure 12c). The method achieves the best sensitivity, which is crucial for removing ghosts, without compromising the specificity, which is crucial for producing low-noise HDR images.

Claims

1. Computer-implemented method for constructing a combined image from a set of individual images, comprising the steps of: - determining an irradiance of a pixel of the combined image, based on the set of individual images; and
- outputting the combined image.
2. The method of claim 1, wherein the step of determining the irradiance of a pixel of the combined image comprises: - determining, for the pixel, a subset of individual images from the set; and - estimating the irradiance of the pixel, based on the selected subset.
3. The method of claim 2, wherein the subset is determined based on a probability distribution of color values in the set of individual images.
4. The method according to claim 3, wherein the probability distribution is based on a gain factor and/or read-out noise parameters of an image acquisition device, e.g. a camera.
5. The method according to claim 4, wherein the gain factor is estimated based on regions of essentially constant illumination in the input images.
6. The method according to claims 2 or 3, wherein the subset is determined based on a measure of spatial coherence.
7. The method according to claims 2, 3 or 6, wherein the subset is determined based on a variance of the subset.
8. The method of claim 1, wherein outputting comprises at least one step of storing the combined image in a computer-readable medium, communicating the combined image over an electronic communications network, including radio, and / or rendering the combined image on a display.
9. The method of claim 1, wherein the combined image is a high dynamic range image.
10. The method of claim 1, wherein the combined image is a panoramic stitching of the individual images.
11. Method for estimating a camera gain factor from a set of images, characterized in that the gain factor is estimated based on regions of essentially constant illumination in the input images.
12. Device for constructing a combined image from a set of individual images, comprising: - an image acquisition module for acquiring the set of individual images, such as a scanner, a camera, a computer-readable memory, a network interface or the like;
an image processing module for determining an irradiance of a pixel of the combined image, based on the set of individual images; and
- an output module for outputting the combined image, such as a display and
/ or a network and / or a storage interface.
13. Method for constructing a combined image from a set of individual images, comprising the steps of: automatically determining parameters of an image sensor used for acquiring the individual images;
determining an irradiance of a pixel of the combined image, based on the set of individual images and the parameters; and
- outputting the combined image.
14. The method of claim 13, wherein the parameters comprise a gain factor of the image sensor.
15. The method of claim 14, wherein the combined image is a high dynamic range image.
PCT/EP2013/066942 2013-06-27 2013-08-13 Automatic noise modeling for ghost-free image reconstruction WO2014206503A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201361840002P 2013-06-27 2013-06-27
US61/840,002 2013-06-27
EP13174136.5 2013-06-27
EP13174136 2013-06-27

Publications (1)

Publication Number Publication Date
WO2014206503A1 true WO2014206503A1 (en) 2014-12-31

Family

ID=48740899

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2013/066942 WO2014206503A1 (en) 2013-06-27 2013-08-13 Automatic noise modeling for ghost-free image reconstruction

Country Status (1)

Country Link
WO (1) WO2014206503A1 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9955085B2 (en) 2016-09-22 2018-04-24 Apple Inc. Adaptive bracketing techniques
US10019848B2 (en) 2015-07-31 2018-07-10 Adobe Systems Incorporated Edge preserving color smoothing of 3D models
US10319083B2 (en) 2016-07-15 2019-06-11 Samsung Electronics Co., Ltd. Image artifact detection and correction in scenes obtained from multiple visual images
CN112085803A (en) * 2020-07-27 2020-12-15 北京空间机电研究所 Multi-lens multi-detector splicing type camera color consistency processing method
CN116051449A (en) * 2022-08-11 2023-05-02 荣耀终端有限公司 Image noise estimation method and device

Non-Patent Citations (14)

* Cited by examiner, † Cited by third party
Title
ABHILASH SRIKANTHA ET AL: "Ghost detection and removal for high dynamic range images: Recent advances", SIGNAL PROCESSING: IMAGE COMMUNICATION, vol. 27, no. 6, 1 July 2012 (2012-07-01), pages 650 - 662, XP055036838, ISSN: 0923-5965, DOI: 10.1016/j.image.2012.02.001 *
BOYKOV, Y.; VEKSLER, O.; ZABIH, R: "Fast approximate energy minimization via graph cuts", IEEE TPAMI, vol. 23, no. 11, 2001, pages 1222 - 1239
FISCHLER, M. A.; BOLLES, R. C: "Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography", COMMUN. ACM, vol. 24, no. 6, 1981, pages 381 - 395
GRANADOS, M.; AJDIN, B.; WAND, M.; THEOBALT, C.; SEIDEL, H.-P.; LENSCH, H. P. A: "Optimal HDR reconstruction with linear digital cameras", PROC. CVPR, 2010, pages 215 - 222
GROSCH, T.: "Fast and robust high dynamic range image generation with camera and object movement", PROC. VMV, 2006
HEO, Y. S.; LEE, K. M.; LEE, S. U.; MOON, Y.; CHA, J: "Ghost-free high dynamic range imaging", PROC. ACCV, vol. 4, 2010, pages 486 - 500
JANESICK, J.: "Scientific charge-coupled devices", 2001, SPIE PRESS
MIGUEL GRANADOS ET AL: "Automatic noise modeling for ghost-free HDR reconstruction", ACM TRANSACTIONS ON GRAPHICS (TOG), ACM, US, vol. 32, no. 6, 1 November 2013 (2013-11-01), pages 1 - 10, XP058033905, ISSN: 0730-0301, DOI: 10.1145/2508363.2508410 *
MIGUEL GRANADOS ET AL: "Background Estimation from Non-Time Sequence Images", PROCEEDINGS OF GRAPHICS INTERFACE 2008, 30 May 2008 (2008-05-30), Toronto, Ont., Canada, pages 33 - 40, XP055101699, ISBN: 978-1-56-881423-0 *
MIGUEL GRANADOS ET AL: "Optimal HDR reconstruction with linear digital cameras", 2010 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 13-18 JUNE 2010, SAN FRANCISCO, CA, USA, IEEE, PISCATAWAY, NJ, USA, 13 June 2010 (2010-06-13), pages 215 - 222, XP031726033, ISBN: 978-1-4244-6984-0 *
PECE, F.; KAUTZ, J: "Bitmap movement detection: HDR for dynamic scenes", PROC. CVMP, vol. 1-8, 2010
SEN, P.; KALANTARI, N. K.; YAESOUBI, M.; DARABI, S.; GOLDMAN, D.; SHECHTMAN, E: "Robust patch based hdr reconstruction of dynamic scenes", ACM TOG, vol. 31, 2012, pages 6
SIDIBE, D.; PUECH, W.; STRAUSS, O: "Ghost detection and removal in high dynamic range images", PROC.EUSIPCO, 2009
VEKSLER, O.; BOYKOV, Y.; MEHRANI, P: "Superpixels and supervoxels in an energy optimization framework", PROC. ECCV, vol. 6315, 2010, pages 211 - 224



Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 13748078; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 13748078; Country of ref document: EP; Kind code of ref document: A1)