WO2014206503A1

WO2014206503A1 - Automatic noise modeling for ghost-free image reconstruction

Info

Publication number: WO2014206503A1
Application number: PCT/EP2013/066942
Authority: WO
Inventors: Christian Theobalt; Miguel GRANADOS; Kwang In Kim
Original assignee: MAX-PLANCK-Gesellschaft zur Förderung der Wissenschaften e.V.
Priority date: 2013-06-27
Filing date: 2013-08-13
Publication date: 2014-12-31

Abstract

In order to obtain images from a combination of individual images, e.g. in high dynamic range imaging or panoramic stitching, the invention proposes a computer- implemented method, comprising the steps of determining an irradiance of a pixel of the combined image, based on the set of individual images; and outputting the combined image. Preferably, the irradiance is determined based on a subset of the set of individual images, selected according to a statistical model.

Description

Automatic Noise Modeling for Ghost-free image Reconstruction

The present invention relates to a computer-implemented method and a device for constructing combined images from a set of individual images that are ghost-free, for example the construction of high dynamic range images or panoramic images of a dynamic scene.

Introduction

The acquisition of high dynamic range (HDR) or panoramic images of dynamic scenes without introducing ghosting is difficult. Even when using modern cameras with au- tomatic exposure bracketing, the inter-frame capture time between input images can be long enough to cause significant object displacement between images of cluttered of dynamic scenes, e.g. in cities or at popular tourist destinations, or in scenes with fast motion. When pixel colors from different images are averaged to construct an HDR image, ghosting artifacts are introduced.

In the prior art, strategies for avoiding such artifacts include aligning the scene before color averaging, performing joint alignment and reconstruction using one reference image from the LDR set or detecting regions with moving objects and excluding their images from the average.

Although optical flow methods can correct short displacements caused by camera shake and moving objects, they typically fail to estimate large displacements, and have difficulties with disocclusions occurring in highly cluttered and highly dynamic scenes.

Joint alignment and reconstruction methods define a reference image to which all other images are patch-wise aligned. Ill-exposed regions in the reference are filled using an adaption of the bi-directional similarity function between the remaining input images and the HDR result. However, a single reference image might not correspond to the desired output, and a better result could be composited using parts from different images. For example, people in any chosen reference image may be occluded in other input images. In such cases, the dynamic ranges of reference image objects cannot be completed. Most HDR construction methods try to detect image regions that could produce ghosting artifacts and exclude them from the average. In general, these methods assume that the images are already aligned, and rely on an ability to test if the colors observed for the same pixel in different images are consistent. Consistency is tested with criteria such as pair-wise irradiance difference, irradiance difference to a background model, distance to the intensity mapping function, variance of the irradiance estimates, average ratio between images, probability of the distance to a background model, correlation with a reference image, difference of the entropy on local image patches and difference between gradient orientations. However, each of these consistency tests requires setting fixed thresholds that are unlikely to generalize well to the noise proper- ties of different cameras and exposure settings. All of these strategies fail under challenging conditions that occur in reality. There is no single best method and the selection of an adequate approach depends on the user's goal. Similar problems occur when constructing panoramic images from a set of images. It is therefore an object of the present invention to provide an improved method and a device for constructing a combined image, e.g. an HDR or a panorama image from a set of individual images in a wide variety of situations, including dynamic scenes with strong clutter and dynamics, with a reduced likelihood of shorting artifacts. These objects are achieved by a method for constructing a combined image from a set of individual images, comprising the steps of determining an irradiance of a pixel of the combined image, based on the set of individual images; and outputting the combined image. The step of determining the irradiance of a pixel of the combined image may comprises determining, for the pixel, a subset of individual images from the set; and estimating the irradiance of the pixel, based on the selected subset. The subset may be determined based on a statistical model of color values in the set of individual images.

Colors are observed at the same pixel location across different exposures in an LDR set. To test whether two colors correspond to the same irradiance and so correspond to the same object, the invention considers their noise distributions of the color values measured by the camera. Noise distributions depend on the camera and exposure settings, and can be modeled using Gaussian distributions. Distribution variance is proportional to the light intensity and is inversely proportional to the squared exposure time, and depends on camera parameters such as the gain factor and the readout noise parameters. Given that the noise depends on the scene irradiance and the camera parameters, no fixed threshold can be reliably set to detect image differences across camera models and scenes. According to the invention, the noise distribution may be predicted from the input images and used to normalize the color consistency tests. This automatic noise modeling approach improves the discriminative power of ghosting detection.

The statistical model may based on a gain factor and/or read-out noise parameters of an image acquisition device, e.g. a camera. The gain factor may estimated based on regions of essentially constant illumination in the input images. The subset may be determined based on a measure of spatial coherence. The subset may be determined based on a variance of the subset.

The combined image may be a high dynamic range image. The combined image may also be a panoramic stitching of the individual images. Outputting the combined may comprise at least one step of storing the combined image in a computer-readable medium, communicating the combined image over an electronic communications network, including radio, and / or rendering the combined image on a display. The method may be implemented in special purpose hardware, e.g. for inclusion in a camera, or on a general purpose computer.

In general, there can be multiple ghost-free images that are consistent with a set of input images. According to a further aspect of the invention, a final image may be chosen such that each pixel has high signal-to-noise (SNR) ratio and is spatially compatible with its neighbors in other images.

This optimization directly produces results with lower noise than existing methods, and is especially useful for images acquired in low light, e.g., night shots.

In addition, the invention comprises a simple method for estimating the camera gain factor from arbitrary images, enabling an automatic prediction of a camera noise range. The method is characterized in that the gain factor is estimated based on regions of essentially constant illumination in the input images. In summary, the HDR imaging method according to the invention fully automatically takes advantage of a camera noise model for performing reliable ghost-free reconstruction across different cameras and scenes. It obtains the irradiance of every pixel with lower noise and fewer artifacts than existing state-of-the-art approaches, even for very challenging scenes including crowded places with small and large object displacements and low-light shots. All these scenes are computed with no parameter tuning.

The invention further comprises a device for constructing a combined image from a set of individual images, comprising an image acquisition module for acquiring the set of individual images, such as a scanner, a camera, a computer-readable memory, a network interface or the like; an image processing module for determining an irradiance of a pixel of the combined image, based on the set of individual images; and an output module for outputting the combined image, such as a display and / or a network and / or a storage interface.

These and other aspects of the present invention will now be explained in connection with an embodiment of the invention and using the annexed figure, in which

Fig. 1 shows a schematic flow diagram of a method for according / or constructing an HDR image to an embodiment of the invention;

Fig. 2 is a one-dimensional illustration of HDR reconstruction based on consistent subsets of individual images; Fig. 3 shows the effects of varying parameters β and λ in equation 7;

Fig. 4 illustrates a process of gain calibration according to an embodiment of the invention; Fig. 5 shows confidences of camera gain estimation;

Fig. 6 illustrates the sensitivity of a deghosting method according to an embodiment of the invention to gain calibration; shows examples of hand held capture with both small displacements (trees, people shifting their weight) and large displacements with fast motion (acrobat, cars); shows on the bottom: hand-held capture via in-camera bracketing. The dynamic car motions are reconstructed ghost free. Bottom: cluttered busy square sequence, where naive averaging produces severe artifacts (left- hand side) and the inventor's result is ghost free (right-hand side). shows a comparison to Sen et al. on the Cafe terrace sequence (top). The third image was selected as reference for the method of Sen et al. Here, their method encounters difficulties extending the dynamic range of ill- exposed regions, which results in a washed-out appearance (indicated by arrows). In contrast, the inventive method automatically selects well- exposed sources for every region. shows a comparison with the method of Sen et al. on the Square at night sequence (top). The second exposure was selected as reference for Sen et al.'s method. Due to noise, their method finds few similar patches in other exposures. This implies that the dynamic range cannot be effectively extended using other input images (middle). The inventive method selects consistent sources with as low variance as possible, preventing the appearance of noise in the result (bottom). shows a comparison with the method of Zimmer et al. on the busy square sequence: (a) Reference image, (b) optical-flow alignment of an additional input image to the reference, (c) result after HDR reconstruction using (a) and (b), and (d) our result. shows a comparison of the inventive consistency detector with other state- of-the-art ghosting-detection methods. Here, the differences between a pair of images of the busy square (Fig. 8, right) are shown in red on top of their average color shows semantic inconsistencies and interactive correction: The inventive algorithm may produce semantic inconsistencies (a). These can appear when the color difference falls below the noise level (top), when all objects in a given image region are partially ill exposed (middle), or when objects are partially occluded (bottom). These inconsistencies can be corrected interactively by editing the labels (b). The results after editing are shown in (c). Detailed description of an embodiment

In the following, a detailed embodiment of a method for a ghost-free combination of images is explained in relation to methods for constructing an HDR image from a set of individual images.

Figure 1 shows a method for constructing an HDR image from a set of individual images according to an embodiment of the invention. The input is a set of images taken with a static or hand held camera at different exposure times, where pixel values in the images are the raw output of the camera, i.e., before any of the camera's internal processing.

In step 110, if captured hand-held, the images are robustly registered using a global homography computed with RANSAC from sparse SURF keypoint matches.

With an aligned image set, the method estimates an irradiance image where each pixel is constructed as a weighted average of colors of the corresponding pixels across the input images in steps 120, 130, 140 and 150. Ghosting artifacts would be generated by averaging a set of pixels which includes an inconsistent subset. Instead, the method identifies a consistent subset of images per pixel location and reconstructs the final irradiance value as an average of consistent pixel colors. This avoids having to select a reference image, which might hinder the capabilities for dynamic range extension, or having to build a background model, which requires that the background be more like- ly to be observed at every image location— this is not necessarily true for cluttered scenes.

Image noise estimation

Even when assuming a static scene and constant camera parameters, input image noise varies by exposure time. The two main temporal noise sources are known as shot noise and readout noise. Shot noise is introduced by the process of light emission, which follows a Poisson distribution where the variance is equal to the mean. Readout noise comprises several other signal-independent sources affecting the acquisition process of digital cameras; it is modeled well by a Gaussian distribution with zero mean.

In CCD/CMOS sensors, the number of photon-electrons collected by the camera at every pixel is linearly proportional to the incident irradiance. This derives from the properties of the photo-electric effect on silicon-based sensors for visible wavelengths. The raw camera output is also linearly proportional to the number of collected photon- electrons. This relation is known as the camera response function f. The slope of this function corresponds to the camera's gain factor g. This factor is proportional to the ISO setting, e.g., the gain at ISO400 is four times the gain at ISO100.

Since the response function f is linear for raw output, it is possible to recover the number of photon-electrons collected by the camera to approximate the probability distribution of each pixel measurement. For a non-saturated raw camera output v_t(p) on image i and pixel p , the inverse of the response function, i.e. the amount of collected photon-electrons, is estimated by

(1)

where the dark frame b_i is an image acquired with same exposure time as v_; but without incoming light (e.g., with the lens cap on). The product ίμ(ρ) between the image' s exposure time t_i and the incident irradiance x(p) is known as the exposure, which is proportional to the number of photon-electrons collected by the camera. Dark frames measure the camera output induced by thermal energy only (not by light). In the present embodiment, it is assumed that the values in the dark frame are negligible or, equivalently, that dark frame subtraction is performed in-camera, which is common in modern digital cameras. Thus, the dark frame b^p) is replaced with the black level L_Q of the camera.

The exposure ίμ(ρ) follows a Poisson distribution, and the uncertainty in its measurement corresponds to the shot noise. This distribution is approximated using a Gaussian to model the variance of the irradiance estimate x(p) . From equation 1, the variance of x(p) in image i can be derived as

where σ is the variance of the readout noise, which is also modeled using a Gaussian. To evaluate equation 2, the parameters g , L₀ , and t_i need to be estimated the. The exposure time t_i can be read directly from the digital image file.

The black level Lo, and the readout variance

are calibrated using the method described in [Janesick 2001 ; Granados et al. 2010] . This method estimates Lo and σ as the mean and variance, respectively, of the pixel values of a black frame, i.e., an image taken with no incident light and no integration time, practically, a very short exposure time. In principle, this data could be obtained for every camera model from the manu- facturer.

Figure 2 shows a one-dimensional illustration of HDR reconstruction. An HDR image can be reconstructed by averaging the irradiance estimates derived from the color of corresponding pixel locations in the input images. Ghosting artifacts appear whenever sets of inconsistent colors are included in the average. The problem of HDR deghosting can be defined as selecting consistent subsets of colors for every pixel.

In the present embodiment, two pixels at corresponding locations in different images are consistent if the corresponding color difference follows the predicted color differ- ence distribution, and a group of pixels is self-consistent if all the pixels are pair-wise consistent. Given two irradiance observations

(p) , x_j ^k (p) at pixel p and color channel k , which are derived from the pixel colors v (p) , (p) on images ¹ , , respectively, using the inverse of the camera response function (equation 1), detecting ghosting artifacts requires testing whether these irradiance observations are consistent, i.e. if they correspond to measurements of the same incident light. Existing algorithms solve this problem by relying on pre-determined thresholds, which are difficult to set. This requirement can be avoided by exploiting the noise model according to the present embodiment of the invention. The probability d (p) - x) (p) may be estimated: Since

Gaussian, which for consistent pairs has zero mean and variance.

where σ _k and σ _k are obtained from equation 2. Given the variance σ _k , the probability that observations at pixel p on images i, j are consistent may be estimated by comparing the corresponding irradiance differences with the expected noise distribution of the images on every color channel:

where N is the standard Gaussian random variable with mean zero and variance one. In practice, the estimate Pr(/? I {i, j}) , can be noisy (e.g., when the image is taken under low-light or when the camera has a high readout noise). For this reason, prior to estimating the probabilities, the difference image d^ (p) is smoothed using bilateral filter- ing. This step may be referred to as noise-adaptive difference filtering (DF). A distance kernel of large bandwidth is used, and a range kernel with variable bandwidth a_r = 2a_{dt (p)} that is proportional to the predicted image noise. This filtering introduces dependencies between the distributions of neighboring pixels. However, this dependency occurs mostly between pixels that have already similar distributions. Given this similarity, the net effect of the filtering is an attenuation of the tails of the difference distribution. This allows obtaining higher detection sensitivity for the same specificity level.

Since the noise variance

is different at every pixel and image in the sequence, the variance of the difference function ^_{t (p)} also varies for every pixel and image pair.

This observation is integral to the inventive technique: As other reconstruction and deghosting methods do not automatically model noise, they are not likely to generalize well to the noise properties of different cameras and exposure settings. Consistency test for sets of images

Let V = {v_; }, i e T be the set of images in the exposure sequence. Based on the pair- wise consistency measure (equation 4), the probability that a given subset L_; e 2^V is consistent at a pixel p is defined as the minimum of the pair- wise consistency:

Pr(p \ L_t ) = min{Pr(p I {i, j}e L_l x L_l )}. (5)

For the case of a singleton L_l (i.e., |L = 1 ) the corresponding consistency probability is given as the probability that the corresponding observations is well exposed:

Pr(p I {/}) = 1 - maximin Pr(v {p )), max Pr(v {p ))} (6) k e k oe with I G {R, G, B}. Pr_ue and Pr_oe correspond to the under- and over-exposure probability, respectively, of an observation according to the distribution of the (Gaussian) readout noise, when centered at the black level and saturation level, respectively.

In this definition, all color channels need to be under-exposed to consider an observation v_{(p) inconsistent, and if any color channel is over-exposed v_{(p) is considered inconsistent.

Compositing of consistent sets

Since more than one subset of images can be consistent for a given pixel location, the choice of a particular subset to be averaged is ill-posed. This choice may be regularized by requiring that the selected subsets be spatially color-consistent. Together, the pixel-wise consistency test and the spatial consistency test cast the HDR deghosting problem as a Markov random field (MRF)-type global energy minimization.

To obtain a ghost-free HDR image, a consistent subset of irradiance estimates is selected for every pixel to reconstruct the final pixel value. However, given the presence of moving objects, there can be more than one consistent subset. Arbitrarily any one subset may introduce unnatural color discontinuity in the final image (see yellow arrows in figure 6). This problem is resolved by introducing a spatial continuity measure as a regularizer, and finds a solution by minimizing a global energy that takes into account the consistence at every pixel location as well as their spatial coherence. The fi- nal result is represented as a labeling F := F(p) that assigns to each pixel p the index of an element in 2^V . This labeling is obtained by minimizing the energy functional:

comprising terms for the consistency potential, the variance potential and the prior potential, where L_pq corresponds to the index of the subset L_F u L_Fq e L, N denotes the

4-neighborhood system on the set of pixel locations Ω, denotes the confidence value (see below) , and β and γ are weighting hyper-parameters.

In equation 7, the role of the consistency potential is to penalize image sets that do not have a high consistency probability, whereas the variance potential ensures that the final reconstruction has low noise by penalizing groups with larger variance. Additionally, the prior potential encourages the final reconstruction to agree with its spatial neighbors at every pixel.

In the consistency and prior potentials, instead of penalizing the consistency probability directly, a confidence value is set to determine whether a set of images L_F is consistent or not. This encodes an important design choice: W want to select any consistent group, not the most consistent one. This design gives more freedom to the optimization algorithm to construct the final composite.

The variance potential prevents the generation of trivial solutions. Well-exposed observations from a single image are defined as consistent. Under this definition, selecting a single well-exposed image for reconstructing the whole image would create a labeling with minimum energy. This selection is undesired since the information contained in other consistent images is left out of the average, thus degrading the SNR of the resulting irradiance estimates. Instead, whenever two distinct sets are consistent, the set that produces lower-variance estimates regardless of the set size is preferable. The variance potential V(L_{ ) encodes preference by assigning higher costs to groups that provide higher-variance estimates. The relative variance of each estimate is:

where the variance of each grou

Parameter selection There are three hyper-parameters to be tuned in equation 7: The weight λ for the variance potential, the confidence value a of the consistency tests, and the weight β of the prior potential. The parameter λ is set to 0: 1 to ensure that the variance potential in equation 7 produces order-of-magnitude lower costs than the consistency potential. This design instructs the algorithm to prefer consistent subsets, but when presented with several consistent options, it will prefer the one with the least noise. The other two parameters were determined based on a performance evaluation using the challenging busy square sequence (figure 8). The confidence value a was set to 0.98, which provides a good trade-off between sensitivity and specificity of ghost detection when compared to a manual annotation of the scene (see Sec. 3 for details). In preliminary experiments, variations of a did not affect the results significant- ly.

Parameter β is set to 20, which is the lowest value that did not introduce visual discontinuities on the test sequence (see figure 6). Once determined, the parameters α,β, γ ; were fixed for all experiments.

Figure 3 shows the effects of varying parameters β and λ in equation 7. The right- hand side colors correspond to the estimated labeling, which is proportional to the noise of the selected subset (blue: higher SNR, red: higher SNR). Parameter β is set to 2, is set to 0.1 (outlined in red), since these produce a good trade-off between low noise and spatial consistency. These parameters were kept fixed during all experiments.

When noisy subsets are not penalized ( λ = 0; top row), the algorithm mostly selects a single image as source except for ill-exposed regions (white arrows), as only such re- gions are considered inconsistent. This behavior holds regardless of the weight β given to the prior potential. If noisy subsets are penalized mildly, i.e., less than inconsistent subsets (λ = 0: 1; middle row), the remaining subsets of larger SNR (shaded in blue and green colors) are preferred providing they are consistent, resulting in labelings that adapt more to the scene. In this configuration, as β of the prior potential increases, visual discontinuities (marked by yellow arrows) are eliminated from the deghosted image (e.g., in β = 10; 20). When noisy subsets are penalized as much as inconsistent ones ( λ 1; bottom row), it becomes affordable to include objects that are partially ill-exposed (pointed by purple arrows) if they appear on the longest (less noisy) image. These results support the inventor's choice of λ .

Optimization and final reconstruction To obtain a minimum cost labeling F*, the expansion-move algorithm (BOYKOV, Y., VEKSLER, O., AND ZABIH, R. 2001. Fast approximate energy minimization via graph cuts. IEEE TP AMI 23, 11, 1222- 1239) is applied. With the resulting labeling, the final irradiance map is estimated as a weighted average:

where w_t{p) corresponds to the probability that ν_{(ρ) is well exposed. The weighting function W . = , , <J² leads to a result close to the maximum likelihood so- lution, and it is constrained to apply identical weights to every color channel in a given pixel.

If not provided by the manufacturer, the camera gain g can be calibrated by a method according to the invention that works directly from an input image set using regions of constant illumination in the input images. More specifically, an input image, e.g. the best exposure of the input set, is divided into super pixels (VEKSLER, O., BOYKOV, Y., AND MEHRANI, P. 2010. Superpixels and supervoxels in an energy optimization framework. In Proc. ECCV, vol. 6315, 211-224) and then the mean and variance of their color values are estimated. From the resulting mean-variance scatter plot (figure 3-top), the minimum variance is selected for each digital value, and RANSAC (FISCHLER, M. A., AND BOLLES, R. C. 1981. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM 24, 6, 381-395) is used to fit a line that passes through (L₀,

), i.e., through the expected variance at the black level.

Figure 4 illustrates this process. The top row shows the mean and variance color value of each super pixel (yellow and red dots). Red dots at the top and the bottom correspond to low-variance super pixels that are used for calibration. Yellow dots represent the remaining super pixels. Green lines show the predicted noise by image-based calibration, blue dashed lines show the prediction by flat-field calibration. The super pixels with minimum variance are selected as proxies for images exposed with a constant illumination at every pixel, such that every pixel color can be assumed to be a sample of the same random variable (shown in red). This selection is justified as only shot noise and readout noise contribute to the variance of image regions with constant illumination and, therefore, these noise sources determine the lower bound of the color variance. Using super pixels to estimate the lower bound of image variance was proposed in [Liu et al. 2008] for image denoising, but has not been applied for HDR re- construction.

The deghosting method according to the invention is robust to calibration errors, so even in cases where the gain is overestimated (b), the final images are still free of ghosting artifacts.

Figure 5 shows box plots of the 1st, 25th, 50th, 75th and 99th percentiles of the distribution of gain factors obtained from a flat- field calibration (JANESICK, J. 2001. Scientific charge-coupled devices. SPIE Press, GRANADOS, M., AJDIN, B., WAND, M., THEOBALT, C, SEIDEL, H.-P., AND LENSCH, H. P. A. 2010. Optimal HDR reconstruction with linear digital cameras. In Proc. CVPR, 215-222), comprising 36 samples of flat field images, and the distribution of factors obtained from image-based calibration (a sample of seven images, each from a different scene; two shown in figure 4). The gray line denotes the true gain of the camera. The expected gain for both methods is very close, but the variance of image -based calibration is higher. Despite this, the gain estimate can still be used to reconstruct ghostfree HDR images (see figure 6). The red curve illustrates the dependency between the gain factor and the image variance prediction.

The image-based calibration is sufficiently accurate. Importantly, since a wide range of scenes contain locally flat regions, this gain calibration approach allows applying the deghosting algorithm directly without requiring users to capture flat field images. However, its accuracy is content dependent, and figure 4b shows an example image from which the camera gain could not be correctly estimated. The image flat regions cover a limited color band, which misleads the slope estimation (figure 4b-top). That said, ghosting artifacts typically only appear when the variance within super pixels (and thus the gain) is underestimated (e.g. below the true gain, see figure 6), which is a highly unlikely scenario in practice. In general, when the camera gain is over-estimated, the predicted noise for the input images is under-estimated. This makes ghost detection stricter, thus reducing the SNR of the final HDR image because smaller consistent subsets will be found. As such, no ghosting artifacts are introduced by this error (see figure 6). Figure 6 shows the sensitivity of the inventive deghosting method to gain calibration accuracy. Here, g, <- g denote the mean and standard deviation of the flat-field estimates. The method is robust to slight under-estimation (b) and large over-estimation (d) of the camera gain: When it is under-estimated (which occurs seldom, see figure 5), ghosting artifacts can appear (a, magenta arrow). Conversely, when the gain is over- estimated, it leads to low SNR (d), but it does not introduce ghosting artifacts.

The following table 1 shows the results of an experimental evaluation:

Sequence HH SC SD LD LL Camera Est. gain factor

Acrobat (Fig. 1 ) x X X Cation 550D 0.6597

Street traffic (Fig. 8) x X Canon 550D 0.3753

Flower shop (Fig. 1 ) X X Can on S5 0.2390

Busy square (Fig. 8) X X X Canon 55 0,2417

Cafe terrace (Fig. 9) X Canon S5 0.2250

Square at night (Fig. 10) X X X X Canon S5 0.4125

Table 1: Summary of test sequences. HH; Hand-held, SC: scene clutter, SD: small object displacements, LD: large object displacement, LL: low light. Gain factor for ISO J 00 setting.

Several exposure sequences were obtained using a compact digital camera (Canon Powershot S5IS, lObit ADC) and a digital SLR (Canon EOS 550D, 14bit ADC). The camera's black level ( L^ = 32 and L_Q = 2048, respectively) and readout variance ( a_R ²

= 2:655 and

= 61:01, respectively) were estimated from a black frame. The gain factor was estimated independently for every sequence using image-based calibration as described above. Although the gain needs to be estimated only once for any given camera model, it was calibrated on each sequence in order to validate the robustness of the inventive method. For reference, the gain factors obtained from flat-field calibration were g = 0:2394 and g = 0:4795, respectively.

Per scene, three or five images were captured in RAW mode at steps of one or two stops, respectively. The input color image is constructed from the green, red, and blue observation found on each 2 x 2 block of pixels in the un-demosaiced raw image. One of the four observations in each block is not used. If captured hand-held the images are registered using a global homography computed with RANSAC from sparse SURF keypoint matches. After HDR reconstruction, the images were white balanced and tone mapped.

Figure 7 shows examples of hand held capture with both small displacements (trees, people shifting their weight) and large displacements with fast motion (acrobat, cars). Focusing on small displacement quality in figure 7, of it show that the inventive method produces convincing results. The flower shop and busy square (figure 8) sequences show how strong scene clutter can cause severe ghosting artifacts in an HDR reconstruction which includes every image into the irradiance average. In addition, the square at night (figure 10) sequence shows that the inventive method is robust to high image noise. The cafe' terrace sequence (figure 9) contain relatively small object displacements for which previous reference-image-based methods are designed (SEN, P., KALANTARI, N. K., YAESOUBI, M., DARABI, S., GOLDMAN, D., AND SHECHTMAN, E. 2012. Robust patch based hdr reconstruction of dynamic scenes. ACM TOG 31, 6). Even under small displacements, which are well handled by reference-image-based methods, the inventive method produces results with less washed out regions and lower noise.

The method of Sen et al. finds patch-wise correspondences between the reference and the remaining input images. As the reference image is of low dynamic range, regions that are ill-exposed or contain high noise might not be matched correctly to other exposures. This is demonstrated in figure 9, where the dynamic range of over-exposed regions could not be enhanced (indicated by arrows). Additionally, figure 10 shows that strong noise in the reference may restrict correspondence finding in other images for range enhancement, leading to a noisy HDR image. In contrast, the inventive method is designed to select sets of images that are both consistent and have low noise, resulting in HDR images with comparatively less noise. In general, the inventive method could also generate noisy image regions (see figure 8, right) if this guarantees consistency, as this is weighted more than achieving low noise (see equation 7). Zim- mer et al. establish correspondences using optical flow, which will fail on objects that undergo large displacements or disocclusions. This failure case is shown on the person in figure 11, where ghosting artifacts are introduced after two instances of a person undergoing local motion cannot be properly aligned. In contrast, the inventive method selects a single self-consistent image, thus preventing the introduction of ghosting artifacts.

Comparison with detect-and-exclude methods The inventive method was compared against the top four performing methods reported by Sidibe et al. (SIDIBE, D., PUECH, W., AND STRAUSS, O. 2009. Ghost detection and removal in high dynamic range images. In Proc.EUSIPCO), according to their sensitivity score: Grosch (GROSCH, T. 2006. Fast and robust high dynamic range image generation with camera and object movement. In Proc. VMV.)], Heo et al. (HEO, Y. S., LEE, K. M., LEE, S. U., MOON, Y., AND CHA, J. 2010. Ghost-free high dynamic range imaging. In Proc. ACCV, vol. 4, 486-500), and Pece and Kautz (PECE, F., AND KAUTZ, J. 2010. Bitmap movement detection: HDR for dynamic scenes. In Proc. CVMP, 1-8). The inventors used their own implementation of these methods using the specified parameters, whenever available. All detect-and exclude methods, including ours, work in two stages: Detect inconsistent regions, and reconstruct the HDR image using consistent parts only. Since the inconsistency detection is often noisy, they apply different regularization techniques before the reconstruction stage (e.g., Gaussian smoothing, morphological operations, or MRF priors; the inventive method applies the latter). Therefore, to exclude the effect of different regularization strategies (i.e., of different image priors), only the detection stage of every method is compared (see figure 12). For the comparison, the first two input images of the busy square sequence were used. As ground truth, a manual segmentation of their differences was constructed (figure 12a). Table 2 summarizes the sensitivity and specificity achieved by each method in classifying pixels as consistent or inconsistent with respect to the ground truth:

Detection strategy Sensiti ity Specificity

Proposed method i-DF), = 95.0% 0.583 0.750

Proposed method (-DF), a = 98.0% 0.542 0.881

Proposed method (-DF), a = 99.9%» 0.480 0.979

Proposed method (+DF), = 95,0%) 0.536 0.899

Proposed method (+DF), a = 98.0%, §.513 0.947

Proposed method (+DF), a = 99.9%, 0.467 0.987

Absolute difference [Grosch 20061 0.436 0.926

IMF probability [Heo et al. 2010] 0.200 0.963

Metratome ordering [Sidibe et al. 200 J 0.246 0.994

Median threshold [Pece and Kautz 2010] 0. 158 0.999

Table 2

To facilitate a fair comparison, results are presented with and without applying the difference filtering (DF) step of the inventive method. Among previous methods, the Grosch method achieved the best sensitivity (43.6%), which thresholds the absolute irradiance difference between the images (figure 12g). The methods of Sidibe' et al. (figure 12f) and Pece and Kautz (figure 12h) achieve the highest specificity (99.4% and 99.9%) but the lowest sensitivity (24.6% and 15.8%). This occurs as both methods are based on invariants that are satisfied whenever two pixels correspond to the same light intensity, but this is not always violated by moving objects.

The method was tested with confidence values = {0 : 95, 0 : 98, 0 : 999}, and with and without applying noise-adaptive difference filtering (DF). In all cases, the inven- tion had higher results than previous methods (46.7-58.3% vs. 43.6% for Grosch). With the adaptive DF, the specificity was comparable to that of other methods, including those methods based on invariants. The best trade-off was obtained at = 0.98 with sensitivity and specificity of 51% and 95%, respectively (figure 12c). The method achieves the best sensitivity, which is crucial for removing ghosts, without compromis- ing the specificity, which is crucial for producing low-noise HDR images.

Claims

Claims 1. Computer-implemented method for constructing a combined image from a set of individual images, comprising the steps of: determining an irradiance of a pixel of the combined image, based on the set of individual images; and

- outputting the combined image.

2. The method of claim 1, wherein the step of determining the irradiance of a pixel of the combined image comprises - determining, for the pixel, a subset of individual images from the set; and estimating the irradiance of the pixel, based on the selected subset.

3. The method of claim 2, wherein the subset is determined based on a probability distribution of color values in the set of individual images.

4. The method according to claim 3, wherein the probability distribution is based on a gain factor and/or read-out noise parameters of an image acquisition device, e.g. a camera.

5. The method according to claim 4, wherein the gain factor is estimated based on regions of essentially constant illumination in the input images.

6. The method according to claims 2 or 3, wherein the subset is determined based on a measure of spatial coherence.

7. The method according to claims 2, 3 or 6, wherein the subset is determined based on a variance of the subset.

8. The method of claim 1, wherein outputting comprises at least one step of stor- ing the combined image in a computer-readable medium, communicating the combined image over an electronic communications network, including radio, and / or rendering the combined image on a display.

9. The method of claim 1, wherein the combined image is a high dynamic range image.

10. The method of claim 1, wherein the combined image is a panoramic stitching of the individual images.

11. Method for estimating a camera gain factor from a set images, characterized in that the gain factor is estimated based on regions of essentially constant illumi- nation in the input images.

12. Device for constructing a combined image from a set of individual images, comprising: - an image acquisition module for acquiring the set of individual images, such as a scanner, a camera, a computer-readable memory, a network interface or the like;

an image processing module for determining an irradiance of a pixel of the combined image, based on the set of individual images; and

- an output module for outputting the combined image, such as a display and

/ or a network and / or a storage interface.

13. Method for constructing a combined image from a set of individual images, comprising the steps of: automatically determining parameters of an image sensor used for acquiring the individual images;

determining an irradiance of a pixel of the combined image, based on the set of individual images and the parameters; and

- outputting the combined image.

14. The method of claiml3, wherein the parameters comprise a gain factor of the image sensor.

15. The method of claim 14, wherein the combined image is a high dynamic range image.