WO2018136373A1

WO2018136373A1 - Image fusion and hdr imaging

Info

Publication number: WO2018136373A1
Application number: PCT/US2018/013752
Authority: WO
Inventors: Jing Liao; Lu Yuan
Original assignee: Microsoft Technology Licensing, Llc
Priority date: 2017-01-20
Filing date: 2018-01-16
Publication date: 2018-07-26
Also published as: CN108335279A; CN108335279B

Abstract

In implementations of the subject matter described herein, a solution for image fusion in high dynamic range imaging is provided. In this solution, differences between corresponding pixels in each of a plurality of raw images and in a same reference image (also referred to as pixel differences) are determined. Based on a distribution of a portion or all of the pixel differences, pixel thresholds for respective raw images are determined and then used for comparison with the pixel differences to identify noise pixel of the raw images to be excluded from image fusion. Pixels in the raw images that are not excluded can be fused to obtain a fused image. Through the solution, a proper and dedicated pixel threshold can be determined for each of the raw images and is used to exclude noise pixel(s) in that raw image, resulting in an image of high quality by fusing remaining pixels.

Description

IMAGE FUSION AND HDR IMAGING

BACKGROUND

[0001] Compared with the luminance range that is visible to human eyes in a realistic circumstance, the luminance range captured by a sensor available in a digital imaging device (for example, a camera) is usually much smaller. As a conventional digital imaging device takes an image of a scene at a single exposure, the image only contains a limited range of luminance contrast. Depending on whether a high or low exposure is used, many details of too bright or dark regions in the scene would be lost. To maintain more details of the scene, High Dynamic Range (HDR) imaging is becoming a more and more popular imaging technology in a digital imaging device. The image obtained from the HDR imaging is also referred to as a HDR image, which can provide a high range of luminance from the dark region to a completely illuminated region in the scene.

[0002] To produce a HDR image, the digital imaging device will capture a plurality of raw images in the same scene in a relatively short period of time and obtain a fused image by fusing these raw images. In the fused image, favorable pixels in different regions of the raw image are preserved and unfavorable pixels are discarded, thereby presenting a richly detailed scene graph. The fused image can be used as a HDR image directly in some cases. In some other conditions, it is also possible to continue processing the fused image, for example, by applying tone mapping to the fused image to adjust the exposure of the image in order to produce a HDR image of higher quality.

SUMMARY

[0003] In accordance with implementations of the subject matter described herein, a solution for image fusion in HDR imaging is provided. In this solution, differences between corresponding pixels in each of a plurality of raw images and in a same reference image (also referred to as pixel differences) are determined. Based on a distribution of a portion or all of the pixel differences, pixel thresholds for respective raw images are determined and then used for comparison with the pixel differences to identify noise pixel of the raw images to be excluded from image fusion. Pixels in the raw images that are not excluded can be fused to obtain a fused image. Through the solution of the subject matter described herein, a proper and dedicated pixel threshold can be determined for each of the raw images to be processed and is used to exclude noise pixel(s) in that raw image, resulting in an image of high quality obtained from the fusion of the remaining pixels. . [0004] This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

BRIEF DESCRIPTION OF THE DRAWINGS

[0005] Fig. 1 illustrates a block diagram of a computing environment in which implementations of the subject matter described herein can be implemented;

[0006] Fig. 2 illustrates a block diagram of a high dynamic range imaging system in accordance with some implementations of the subject matter described herein;

[0007] Fig. 3 illustrates a block diagram of the image fusion stage of the system of Fig. 2 in accordance with some implementations of the subject matter described herein;

[0008] Fig. 4 illustrates a schematic diagram of example multi-image alignment in accordance with some implementations of the subject matter described herein;

[0009] Fig. 5 illustrates a schematic diagram of example image fusion in accordance with some implementations of the subject matter described herein;

[0010] Fig. 6 illustrates a block diagram of the tone mapping stage of the system of Fig. 2 in accordance with some implementations of the subject matter described herein;

[0011] Fig. 7 illustrates a schematic diagram of example exposure fusion in accordance with some implementations of the subject matter described herein;

[0012] Fig. 8 illustrates a flowchart of an image fusion process in accordance with some implementations of the subject matter described herein; and

[0013] Fig. 9 illustrates a flowchart of a tone mapping process in accordance with some of the implementations of the subject matter described herein.

[0014] Throughout the drawings, the same or similar reference symbols refer to the same or similar elements.

DETAILED DESCRIPTION

[0015] The subject matter described herein will now be discussed with reference to several example implementations. It would be appreciated these implementations are discussed only for the purpose of enabling those skilled persons in the art to better understand and thus implement the subject matter described herein, rather than suggesting any limitations on the scope of the subject matter.

[0016] As used herein, the term "includes" and its variants are to be read as open terms that mean "includes, but is not limited to." The term "based on" is to be read as "based at least in part on." The term "one implementation" and "an implementation" are to be read as "at least one implementation." The term "another implementation" is to be read as "at least one other implementation." The terms "first," "second," and the like may refer to different or same objects. Other definitions, explicit and implicit, may be included below.

Overview of HDR Imaging and Image Fusion

[0017] In various imaging technologies, especially the HDR imaging technology, image fusion is an important process of image processing. Image fusion relates to fusing a plurality of raw images of a scene into an image. To obtain a fused image of higher quality, it is expected to fuse as many favorable pixels as possible in the plurality of raw images but discard unfavorable pixels. During the process of filtering the unfavorable pixels, the plurality of raw images are compared with a reference image to determine corresponding pixel differences. If a pixel difference is larger than a certain pixel threshold, the corresponding pixel in a raw image is excluded from image fusion. The pixels in a raw image that differ greatly from the reference image are usually noise with respect to the reference image, such as outlier pixels caused by a camera movement or moving objects or image noise caused by other factors. Therefore, pixels to be excluded can also be referred to as noise pixels.

[0018] The identification and exclusion of the noise pixels will impact the quality of the fused image. The pixel threshold determines which pixel of each raw image may be considered as noise pixels. As such, the selection of the pixel threshold impacts the quality of image fusion to a large extent. In some conventional image fusion methods, the pixel threshold is set to a certain fixed value based on experience. However, due to differences of software and hardware performances and the use manner of the capturing devices (for example, the cameras) for scene capturing, noise deviation ranges of the captured raw images are also different. Thus, a fixed pixel threshold value cannot always present good effect for the fusion of raw images captured by different cameras in different utilization scenes. In some other image fusion methods, the pixel threshold is set to a fixed value depending on the camera in use. In other words, a proper pixel threshold is set by considering the performance parameters of a specific camera and possible ways of using it. However, such pixel threshold is only applicable to the fusion of images captured by that specific camera, which shows significant limitations.

[0019] In the use cases of HDR imaging, image fusion also affects the quality of the HDR image expected to be obtained. In some cases, the result of image fusion is directly considered as a HDR image. For example, if a plurality of raw images are captured at different exposures (with exposures ranging from high to low), a HDR image with a higher luminance range can be generated by fusing those images. If a plurality of raw images are captured at the same and normal exposure, the fused image thereof can also present richer details than the raw images and may thus be considered as a HDR image. In other cases, a plurality of raw images can be captured at the same exposure (for example, an exposure lower than a normal exposure). After the under-exposed images are fused, it is also possible to continue to perform tone mapping to adjust the exposure of the fused image, thereby obtaining a HDR image. In these cases, if the quality of the fused image is poor, for example, if the noise pixels therein are not properly filtered out or some favorable pixels are excluded by mistake, it would be unbeneficial for the generation of the HDR image.

[0020] Some potential problems in the image fusion of HDR imaging are discussed above. According to implementations of the subject matter described herein, there is provided a HDR imaging scheme to address one or more of the above defects. In accordance with the HDR imaging scheme proposed herein, instead of setting a fixed pixel threshold, a specific pixel threshold is determined dynamically for each of a plurality of raw images. The pixel threshold can be determined based on a distribution of pixel differences between each of the raw images and a same reference image and used to filter a noise pixel(s) in that raw image. A noise pixel can be identified as a pixel of a raw image that has a pixel difference with a corresponding pixel of the reference image exceeded the pixel threshold. Since a specific pixel threshold is estimated adaptively for each raw image, image fusion of high quality can be performed for raw images captured by different cameras more flexibly.

[0021] In other implementations of the subject matter described herein, there is also provided a scheme for adjusting an exposure of the fused image. Such exposure adjustment is mainly for raw images captured at a lower exposure than a normal one. The reason to capture the raw images at a low exposure is that under-exposed raw images are more favorable for pixel alignment, noise cancellation, and/or prevention of unrecoverable overexposure of images. As mentioned above, if the raw images are captured at a low exposure, after image fusion, it is possible to continue to perform tone mapping to adjust the exposure of the obtained fused image, thereby generating a HDR image with a good luminance range. According to some implementations of the subject matter described herein, the exposure of the fused image can be adjusted with reference to a reference image having an expected exposure.

[0022] Some example implementations of the subject matter described herein can be described specifically with reference to the drawings. Example Environment

[0023] Basic principles and various example implementations of the subject matter described herein will now be described with reference to the drawings. Fig. 1 illustrates a block diagram of a computing environment 100 in which implementations of the subject matter described herein can be implemented. It would be appreciated that the computing environment 100 described in Fig. 1 is merely for illustration but not limiting the function and scope of the implementations of the subj ect matter described herein in any manner. As shown in Fig. 1, the computing environment 100 includes a computing device 100 in a form of a general -purpose computer device. The components of the computing device 100 include, but are not limited to, one or more processors or processing units 110, a memory 120, a storage device 130, one or more communication units 140, one or more input devices 150, and one or more output devices 160.

[0024] In some implementations, the computing device 100 may be implemented as various user terminals or service terminals. The service terminals may be servers, large- scale computer devices and other devices provided by various service providers. The user terminals, for example, are any type of mobile terminals, fixed terminals, or portable terminals, including mobile phones, multimedia computers, multimedia tablets, Internet nodes, communicators, desktop computers, laptop computers, notebook computers, netbook computers, tablet computers, personal communication system (PCS) devices, personal navigation devices, personal digital assistants (PDAs), audio/video players, digital camera/camcorders, positioning devices, television receivers, radio broadcast receivers, electronic book devices, game devices, or any combination thereof, including the accessories and peripherals of these devices or any combination thereof. It is also contemplated that the computing device 100 can support any type of interface to the user (such as "wearable" circuitry and the like.).

[0025] A processing unit 110 may be a physical or virtual processor and perform various processes based on programs stored in the memory 120. In a multi-processor system, a plurality of processing units execute computer-executable instructions in parallel to improve parallel processing capacity of the computing device 100. The processing unit 110 can also be referred to as a Central Processing Unit (CPU), a microprocessor, a controller, or a microcontroller.

[0026] The computing device 100 usually includes various computer storage media. Such media can be any available media accessible by the computing device 100, including but not limited to volatile and non-volatile media, and removable and non-removable media. The memory 120 can be a volatile memory (for example, a register, cache, Random Access Memory (RAM)), non-volatile memory (for example, a Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), flash memory), or any combination thereof. The memory 120 includes one or more program modules 122 configured to perform functions of various implementations described herein. The modules 122 can be accessed and run by the processing unit 110 to achieve the respective functions. The storage device 130 can be any removable or non-removable medium and may include machine-readable media which can be used for storing information and/or data and accessed in the computing device 100.

[0027] The communication unit 140 communicates with a further computing device via communication media. Additionally, functions of components in the computing device 100 can be implemented by a single computing cluster or multiple computing machines connected communicatively for communication. Therefore, the computing device 100 can be operated in a networking environment using a logical link with one or more other servers, personal computers (PCs) or another general network node. As required, the computing device 100 can also communicate via the communication unit 140 with one or more external devices (not shown) such as a storage device, display device and the like, one or more devices that enable users to interact with the computing device 100, or any devices that enable the computing device 100 to communicate with one or more other computing devices (for example, a network card, modem, and the like). Such communication may be achieved via an input/output (I/O) interface (not shown).

[0028] The input device 150 may include one or more input devices, such as a mouse, keyboard, touch screen, tracking ball, voice-input device, and the like. Particularly, the input device 150 includes a camera 152 which is configured to capture one or more images automatically or according to user instructions. The output device 160 can be one or more output devices, such as a display, loudspeaker, printer, and the like. The images captured by the camera 152 can be outputted directly by the output device 160 or transmitted to other devices via the communication device 140.

[0029] In some implementations, images captured by the camera 152 can be further processed in the computing device 100. For example, in the implementation of HDR imaging, the camera 152 can capture a plurality of raw images (for example, 102-1, 102- 2, ... , 102-N and so on, collectively referred to as raw image 102) of the same scene within a short period of time and use these images as inputs of the module 122. The sizes of the plurality of raw images 102 are the same or similar. The camera 152 can capture the plurality of raw images 102 in a burst mode. The number of the raw images depends on the default configuration or user configuration of the camera 152. In the example shown in Fig. 1, the number is N=3. However, it would be appreciated that the camera 152 can capture more or less (for example, 2) raw images. The module 122 performs HDR imaging function for raw image 102 to obtain HDR image 104. In some implementations, the module 122 provides HDR image 104 to the output unit 160 for output.

[0030] Fig. 2 illustrates an example of the module 122 for HDR imaging according to some implementations of the subject matter described herein. The module 122 may include an image fusion stage 210 to obtain a plurality of raw images 102 from the camera 152 and perform image fusion on these raw images to generate a fused image 212. In some implementations, the module 122 may also include a tone mapping stage 220 to perform tone mapping on the fused image 212 to adjust its exposure. The tone mapping stage 220 outputs a tone-mapped HDR image 104. The cases where tone mapping is required may be when the camera 152 captures the raw images 102 at a low exposure. For example, after selecting a predetermined exposure (which may also be referred to as a normal exposure) for a particular scene automatically or by the user, the camera 152 captures images with an exposure lower than the predetermined exposure (for example, a low exposure value of 1.0, 1.5, or 2.0). Since the exposure of the raw images 102 is low, exposure adjustment of the fused image 212 may be needed. In some other implementations, the fused image 212 may be the final HDR image 104. In this case, the tone mapping stage 220 can be omitted.

[0031] It would be appreciated that the images 102, 104, and 212 in Figs. 1 and 2 are given only for the purpose of illustration. The images captured by the camera 152 can be different depending on the particular scene. In some implementations, the raw images 102 may not be captured by the camera 152 but obtained from other sources via the input device 150 or the communication device 140. In these implementations, the computing device 100 may not include the camera 152. In the implementations of the subject matter described herein, a "raw image" refers to a scene image before fusion, which can be an image obtained directly from a camera or after some imaging processing. There is no limitation on the format of the raw images 102, which may be any compressed or non- compressed image format, including but not limited to a RAW format, JPEG format, TIFF format, BMP format, and the like. The example implementations of the image fusion stage 210 and the tone mapping stage 220 in the module 122 will be discussed in detail in the following. Image Fusion

[0032] Fig. 3 illustrates a block diagram of an example implementation of the image fusion stage 210 shown in Fig. 2. The main purpose of image fusion is to select favorable pixels and discard unfavorable noise pixels from a plurality of raw images with relatively more noise, which can help reduce noise and avoid "false image" caused by camera movement or object movements in the scene during image capturing, thereby producing a clear fused image. To achieve the purpose of image fusion, the image fusion stage 210 includes a noise pixel identification module 320 to identify noise pixels to be discarded during the image fusion. The image fusion stage 210 may also include a fusion module 330 to fuse the raw images 102 from which the noise pixels have been excluded. In some implementations, to better implement the image fusion, the image fusion stage 210 may further include a multi-image alignment module 310 to align the plurality of raw images 102 before discarding the noise pixels. A detailed description of functions achieved in the respective modules of the image fusion stage 210 will be provided below.

Multi-image Alignment

[0033] The multi-image alignment module 310 can align each of the raw images 102 to a same reference image. Image alignment can reduce the impact of camera movement or object movement during the capturing of the plurality of raw images 102 on the image fusion. Such impact is more apparent in the case where the raw images 102 are captured at different exposures. In some implementations, the multi -image alignment module 310 can select a raw image from the plurality of raw images 102 as a reference image.

[0034] The reference image can be selected randomly. To reduce the impact of movement caused by the press or touch on the camera shutter by the user initially or the appearance or disappearance of objects in the scene, the image(s) captured early or late among the plurality of raw images 102 may be avoided to be select as the reference image. In an implementation, the second captured image of the plurality of raw images 102 can be selected as the reference image (such as the raw image 102-2).

[0035] In some other implementations, another image than the plurality of raw images 102 can be selected as a reference image. For example, another image of the same scene that is collected separately can be used as the reference image. The size of the reference image can be the same as the raw images 102 or can be obtained by downsizing or upsampling an original reference image with a size larger or smaller than that of the raw images. For example, the original reference image and the raw images 102 are captured with different sizes. The original reference image is then scaled down to the same size as the raw images to generate the reference image.

[0036] Various image aligning methods, which are currently known or to be developed in the future, can be employed to achieve the alignment of the plurality of raw images 102 to the same reference image. One of the alignment methods based on a homography matrix is introduced below. Fig. 4 illustrates a schematic diagram of aligning raw images to a reference image. In Fig. 4, each of the raw image 102 is represented as Fi, where /^'={ 1, N} and N^2, and the reference image 410 is represented as F_r. For each raw image F_t (other than the one used as the reference image), camera movement from the raw image F_t to reference image F_r is first estimated, which can be obtained by determining a whole homography matrix H^ based on the raw image j and reference image F_r . Then the raw image F_t is warped according to the homography matrix H^ . For example, a mapping from the reference image F_r to the raw image j may be calculated by multiplying the homography matrix H^ by the coordinates of respective pixels of the reference image F_r.

[0037] Alternatively, to save the calculation overhead, it is also possible to divide the reference image F_r into a plurality of blocks (for example, blocks of 8 x 8 pixels) 402. The raw image F_r may be similarly divided into a plurality of blocks 404 with the same size. Then, a central pixel p of each block 402 of the reference image F_r is multiplied with the homography matrix Hi to determine a corresponding pixel H_t Xp for this pixel p mapped in the raw image F_t. Thus, a translation vector 412 for the block 402 is calculated as Δρ = Hi x p— p. The translation vector 412 can be used to wrap the block 404 in the raw image Fj. Similar mapping and wrapping is implemented for each of the blocks in the raw image F_t and the reference image F_r, to achieve alignment from the raw image F_t to the reference image F_r.

Identification of Noise Pixel

[0038] The noise pixel identification module 320 determines noise pixels in the plurality of raw images 102 (which probably have been aligned). To identify the noise pixels, a pixel threshold for each of raw images 102 is determined. The determination of the pixel thresholds is dependent on a reference image. The reference image used by the noise identification module 320 has the same size as the raw images. Moreover, the reference image can be the same as that used in the image alignment or may be selected or generated in a similar way (for example, selected as one of the plurality of raw images 102 or obtained by scaling an original reference image with a different size). [0039] The noise pixel identification module 320 may determine the pixel thresholds depending on the specific fusion methods employed in the fusion module 330. Generally, the fusion module 330 performs image fusion in the original size of the raw images 102. Therefore, the determination of the pixel thresholds can be performed on the original resolution of the raw images.

[0040] According to the implementations of the subj ect matter described herein, for each of the raw images 102, the noise pixel identification module 320 determines pixel differences between corresponding pixels of the raw image 102 and the reference image. In the context of the subject matter described herein, "corresponding pixels" of two images refers to two pixels having the same coordinate in the two-dimensional space x-y of the two images. As the raw image 102 and the reference image have the same size, each pixel in the raw image 102 corresponds to one pixel in the reference image. Therefore, the pixel differences between all corresponding pixels of each raw image 102 of the reference image can be determined. These pixel differences can form a difference map.

[0041] In some implementations, a pixel difference between two pixels can be calculated as a difference between values of the pixels. The values of the pixels can be determined by a color space of the images. Examples of the color space include, but are not limited to, RGB, LAB, HSL, HSV, and the like. If a pixel includes a set of values, then the pixel difference between two pixels can be calculated as a distance between the two sets of values, such as the Euclidean distance. It is supposed that the value(s) of a pixel p in a raw image 102 is represented as F^p) and the value(s) of a corresponding pixel p in the reference image F_r is represented as F_r(p), then a pixel difference between the two pixels can be represented as |Fi(p) — F_r(p) | , where the operational symbol | | calculates a difference between two parameters. The value(s) of a pixel p in a difference map D formed by the pixel differences between the corresponding pixels p of the raw image F_t and reference image F_r is Z)(p) = |Fi(p) — F_r(p) | .

[0042] In some implementations, the noise pixel identification module 320 can determine a pixel threshold for a raw image 102 based on a distribution of at least a portion of the pixel differences between the raw image 102 and the reference image. A distribution represents a statistical variation of values of a plurality of pixel differences. In an implementation, statistics of the values of the pixel differences, such as statistics of the possible different values from the minimum pixel difference to the maximum pixel difference may be obtained. The noise pixel identification module 320 sets the pixel threshold as a value that is larger than a predetermined percentage (for example, 80%, 90% or 95%) of the pixel differences. In other words, the setting of the pixel threshold enables at least a part (for example, 20%, 10%, or 5%) of the corresponding raw image 102 to be identified as noise pixels.

[0043] In some other implementations, since there are some pixels that are not expected to be considered in each raw image 102 or reference image, the pixel difference(s) calculated from on these pixels belong to an outlier pixel difference(s) and thus is not suitable for the determining of the pixel threshold. In these implementations, the noise pixel identification module 320 can further select an outlier pixel difference(s) from the determined pixel differences and determine the pixel threshold based on the distribution of remaining pixel differences other than the outlier pixel difference(s). For example, statistics of the values of the remaining pixel differences can be obtained, and the pixel threshold is set as a value that is larger than a predetermined percentage (for example, 80%, 90%, or 95%) of the remaining pixel differences.

[0044] Since over-exposed pixels cannot give provide more details of the captured object, pixel differences related to one or more over-exposed pixels in the raw image 102 or reference image can be excluded from the image fusion. During the process of excluding the outlier pixel difference(s) caused by overexposure, the noise pixel identification module 320 can determine whether a given pixel difference is an outlier pixel difference based on luminance of a respective pixel of the raw image or reference image . If the luminance at a certain pixel is too high (for example, exceeding a predetermined luminance threshold), a pixel difference determined from this pixel is an outlier pixel difference. The luminance at a pixel can be determined based on the values of this pixel, for example, the values in a specific color space.

[0045] Alternatively, or in addition, the noise pixel identification module 320 can also determine an outlier pixel difference(s) based on the values of the pixel differences. If the value of a certain pixel difference is too high, it shows that the difference between values of the corresponding pixels of the raw image 102 and reference image is too large (for example, exceeding a predetermined difference threshold). This means that there may be a flashing object in either the raw image 102 or the reference image or there is a sensing error of the camera sensor at the position of the pixel. Therefore, the pixel difference calculated from these pixels can be discarded.

[0046] In some other implementations, since it is difficult to align edges of an object captured in the image during the image fusion, it is expected that pixels representing object edges are regarded as noise pixels and excluded from the image fusion. Therefore, the noise pixel identification module 320 may identify pixel differences corresponding to the pixels of the object edges in the raw image 102 as outlier pixel differences. As pixels in the region of the object edges generally differ significantly from adjacent pixels in a certain direction, it is possible to determine a variation between a certain pixel of the raw image 102 and adjacent pixels and identify the pixel difference calculated from this pixel as an outlier pixel difference when the variation is large. The variation can also be calculated as a gradient of the raw image 102 at the pixel towards a certain direction in the two- dimensional space (x-y space) of the raw image. If the gradient exceeds a predetermined variation threshold, the corresponding pixel difference is determined as an outlier pixel difference. Other parameters can also be used to represent a variation from a value of one pixel to values of its one or more adjacent pixels.

[0047] Some examples of selecting outlier pixel differences from pixel differences between the raw image 102 and the reference image are provided above. In some implementations, an outlier pixel difference can also be selected from all the outlier pixels by calculating a mask M for the difference map D between the raw image 102 F_t and reference image F_r. According to the above example, the mask (p) at a respective pixel p can be determined as: r 0, if luma( i(p)) > σ _ovg or luma(F_r(p)) > σ _ovg

where luma( ) represents luminance of a respective pixel of an image, for example, luma(Fj(p)) represents luminance of the raw image j at a pixel p;

F_r(p) | represents a pixel difference between the raw image F_t and reference image F_r at corresponding pixels p; grad x( ) and grad_y{ ) represent gradients at a respective pixel P of an image in the x direction and y direction; and min ( ) represents taking the minimum value between grad x( ) and grad_y{ ). a_ove , a_out and o_edge represent a predetermined luminance threshold, a predetermined difference threshold, and a predetermined variation threshold, respectively. These thresholds can be set to particular values based on experience, for example, σ_ονβ=220, a_out=15, and cr_ed5e=24 (supposing that the maximum value of a pixel is 256 and the maximum value of luminance is 240). Of course, it is only a specific example and these thresholds can be set to other values as needed.

[0048] It can be seen from Equation (1) that (p)=0 means that the luminance at a respective pixel p of the raw image F_t or reference image F_r is too large, the pixel difference is too large, or the variation between the pixel p and the adjacent pixels in the raw image Fj is too large. Then, the corresponding pixel difference D(p) is considered as an outlier pixel difference. If (p)=l, the corresponding pixel difference D(p) can be taken into account in determining the pixel threshold. It would be appreciated that in some implementations, it is also possible to select only one or two of the three conditions to determine the noise pixels. Furthermore, other conditions can also be set to determine if a given pixel in the raw image is a noise pixel.

[0049] The noise pixel identification module 320 can determine a pixel threshold for each of the raw images 102 based on the above process. The pixel threshold can be used to filter noise pixels from each of the raw images 102. Specifically, the noise pixel identification module 320 compares each of the pixel differences with the pixel threshold. If the pixel difference exceeds the pixel threshold, the corresponding pixel in the raw image 102 is identified as a noise pixel. If the pixel difference does not exceed the pixel threshold, the corresponding pixel can be used for the image fusion.

Direct Average Fusion

[0050] The fusion module 330 can perform image fusion based on remaining pixels other than the noise pixels in the plurality of raw images. The fusion can be implemented across a plurality of images through multiple fusion methods. A simple image fusion method is to, for corresponding pixel coordinates of the plurality of raw images, average the remaining pixels other than the noise pixels across the plurality of raw images. The value F_d(p) of the fused image 212 (represented as F_d) at a pixel p can be determined as follows:

where N represents the number of raw images 102 and o_t represents the pixel threshold for the raw image F_t . It can be seen from Equation (2) that if a pixel difference — i - p) I between the raw image F_t and reference image F_r at corresponding pixels p is smaller than the pixel threshold a the value of the pixel in the raw image Fj can be used for averaging with other raw images. If pixel differences for two of the three raw images 102 to the reference image at corresponding pixels p are smaller than the pixel threshold a_t, the values of the pixels p in the two raw images are averaged to obtain a value of the fused image 212 at a pixel p.

Pyramid Fusion in Original Size

[0051] Some outlier values of different sizes in the raw image 102 may be difficult to be removed in the pixel-wise average fusion described above. Additionally, the average fusion could possibly result in an unsmooth transition between pixels or between blocks of the fused image 212. To improve the quality of the fused image 212 (such as to remove the outlier values and/or to obtain smoothness), in some implementations, the fusion module 330 can employ other technologies that can achieve smooth image fusion, for example, pyramid fusion such as Gaussian pyramid fusion or Laplacian pyramid fusion. The plurality of raw images 102 can be fused through Gaussian pyramid fusion and Laplacian pyramid fusion technologies that are currently known or to be varied in the future.

[0052] A brief introduction of the process of Gaussian pyramid fusion and Laplacian pyramid fusion is provided below. In the process of Gaussian pyramid fusion, for each of the raw images 102, a set of intermediate raw images with different sizes can be generated by continuous filtering and downsampling. These intermediate raw images form a pyramid structure, with each layer corresponding to an intermediate raw image of a size. In some implementations, the sizes of the intermediate raw images in every two layers decrease at a 2x rate.

[0053] During the fusion, an intermediate fused image can be determined from direct average fusion of intermediate raw images with the same size in the pyramid structures of the plurality of raw images 102 . The process of generating the intermediate fused image is similar to the direct average fusion process of the plurality of raw images 102 described above. The intermediate fused images (still in a pyramid structure) from the plurality of layers of the pyramid structures are then used to reconstruct the fused image. The process of Laplacian pyramid fusion is similar to that of Gaussian pyramid fusion, and the difference only lies in the generation of a Laplacian pyramid for each raw image 102 and the reconstruction of the fused image. The number of layers of a Gaussian pyramid and Laplacian pyramid can both be predefined, such as 2, 3, 4, 5, or more.

[0054] During the process of pyramid fusion, since each raw image will be converted to intermediate raw images of different sizes and the average fusion is respectively performed at different sizes, in some implementations, the noise pixel identification module 320 can determine a respective pixel threshold for an intermediate raw image at each layer of each pyramid structure and identify a noise pixel(s) in the intermediate raw image based on the pixel threshold. Supposed that a c-layer pyramid structure is generated for a given raw image F_t, intermediate raw images for respective layers

[0055] A respective intermediate pixel threshold

ach intermediate raw image F- . The calculation of the intermediate pixel threshold

can be similar to the process of determining a pixel threshold σ_¾ in the original size of the raw images 102, and thus is omitted here. It is to be noted that to determine the intermediate pixel threshold

it is possible to utilize a similar process to that of the raw images to generate a similar pyramid structure for the reference image so as to calculate an intermediate pixel threshold

based on one of the intermediate reference images having a same size as the intermediate raw image.

[0056] As described above, the intermediate fused image for each layer can be obtained by the average fusion of the intermediate raw images at the respective layers of the pyramids. It is similar to the direct fusion process of the raw images 102 as discussed above, which can be represented as follows:

where F^(p) represents a pixel p of the fused image F^ at the /-th layer of a pyramid; F (p) represents a pixel p oi an intermediate raw image F at the /-th layer of the pyramid for the raw image F^; F_r' (p) represents a pixel p oi the intermediate raw image F_r ^J at the /-th layer of the pyramid for the reference image F_r; and

represents a pixel threshold for the intermediate raw image

at the /-th layer. According to Equation (3), when the pixel difference between the intermediate raw image

and the intermediate reference image F_r ^J is smaller than the pixel threshold σ- , the pixel p of the intermediate raw image F is used for fusion. The intermediate fused images for respective layers of the pyramid structures of the plurality of raw images 102 are used to generate the fused image 212.

Hybrid Fusion

[0057] It has been described above the construction of a pyramid structure on the basis of the original size of the raw images 102 to perform the fusion. In some other implementations, to reduce computational costs and improve processing speed, the fusion module 330 may not start the pyramid fusion on the basis of the original size of the raw images 102, but only implement the simple average fusion described above in the original size of the raw images 102. Then, the raw images 102 and the reference image are downsized to a predetermined size and the pyramid fusion is performed on the basis of this predetermined size. The finally fused image 212 is determined on the basis of the result obtained from these two types of fusion. This hybrid fusion can not only enable the fused image 212 to be smooth but also achieve fast processing, which is suitable to be implemented by terminals with limited processing capability, such as a smartphone, a camera, or the like.

[0058] Fig. 5 is a schematic diagram of the hybrid fusion where a first image fusion layer 501 performs the average fusion in the original size of the raw images 102 and a second image fusion layer 502 performs the pyramid fusion in the downsized size. In the first image fusion layer 501, the noise pixel identification module 320 in the image fusion stage 210 determines respective pixel thresholds of the plurality of raw images (102-1, . . . , 102-N) according to the direct average fusion process discussed above, and after identifying the noise pixels, the fusion module 330 averages the remaining pixels across these raw images to generate a first intermediate fused image 518.

[0059] In the second image fusion layer 502, each raw image 102-1, 102-2, 102-N (represented as Fi) is scaled down to generate a corresponding thumbnail image 520, 522, 524 (represented as j I). In some implementations, each of the raw images 102 can be scaled down to 1/2, 1/4 or 1/16 of its original size. Then, a plurality of thumbnail raw images can be implemented through the pyramid fusion, such as the Gaussian pyramid fusion or Laplacian pyramid fusion. Fig. 5 illustrates an example of Laplacian pyramid fusion.

[0060] In this example, a three-layered Laplacian pyramid structure 504 is constructed for each thumbnail raw image 520, 522, 524, each pyramid structure including a set of intermediate thumbnail images with different sizes. For example, for the thumbnail raw image 520, the following images can be generated: an intermediate thumbnail image 530 with the same size as the image 520, an intermediate thumbnail image 540 with a size half as that of the image 520, and an intermediate thumbnail image 550 with a size 1/4 as that of image 520. For the raw images 522 and 524, intermediate thumbnail images with the same three sizes can also be generated, including the intermediate thumbnail images 532 to 552 and intermediate thumbnail images 534 to 554. In other implementations, a pyramid structure with more or less layers can also be constructed for each thumbnail raw image.

[0061] Similar to the pyramid fusion process discussed above with regard to the raw images 102, in the second image fusion layer 502, the noise pixel identification module 320 can be used to determine a corresponding intermediate pixel threshold for each of the intermediate thumbnail images at different layers of the pyramid structure so as to identify the noise pixel(s) therefrom. During the fusion, the fusion module 330 can be used to generate a fusion result 538, 548, or 558 for each layer of the pyramid based on the above Equation (3). These fusion results can be used to reconstruct a second intermediate fused image 528 for the second fusion layer 502. The second intermediate fused image 528 has the same size as the thumbnail raw images 520, 522 and 524.

[0062] The fusion module 330 determines the fused image 212 for the plurality of raw images 102 based on the first and second intermediate fused images 518 and 528. In some implementations, as the first and second intermediate fused images 518 and 528 have different sizes, a quasi-Gaussian or quasi-Laplacian pyramid fusion method can be employed to achieve the fusion of these two images with the different sizes. Specifically, images with different sizes can be generated from the first intermediate fused image 518 in the original size and form a pyramid structure. Then, the image in the pyramid structure with the same size as the second intermediate fused image 528 is replaced with the second intermediate fused image 528. For example, if the size of the second intermediate fused image 528 is 1/16 of the size of the raw image 102 and the resolutions of the layers in the pyramid structure are decreased at a 2x rate, the second intermediate fused image 528 can be used to replace the image at the third layer from the bottom up in the pyramid structure. After the replacement, the fused image 212 is generated from the pyramid fusion by the conventional reconstruction method.

[0063] In some implementations, due to the pyramid fusion in the hybrid fusion (that is, in the second fusion layer 502), intermediate pixel thresholds for the raw images 102 at different sizes can be obtained. These intermediate pixel thresholds can be further used to guide the identification of noise pixels at the raw images 102. In other words, for a raw image F it is possible to determine a noise pixel not only based on its pixel threshold but also based on an intermediate pixel threshold for a certain intermediate thumbnail image generated from the thumbnail raw image F_t i that is corresponding to the raw image F_t. In some implementations, if a pixel difference between corresponding pixels of the raw image F_t and the reference image exceeds the pixel threshold then it is determined whether to identify the corresponding pixel in the raw image F_t as a noise pixel further based on the intermediate pixel threshold for the intermediate thumbnail image. [0064] Specifically, supposing that an intermediate thumbnail image is represented as F I, which represents the intermediate thumbnail image at the /-th layer in the pyramid structure generated from the thumbnail raw image F; I and its intermediate pixel threshold can be represented as σ- I. For a given pixel in the given raw image F^ its corresponding pixel in the intermediate thumbnail image F- i can be first determined. For example, if the size of the intermediate thumbnail image F- i is 1/4 of that of the raw image F for a given pixel pi in the given raw image F the coordinate values of its corresponding pixel i in the intermediate thumbnail F- i is 1/4 of the coordinate values of the pixel pi (in a coordinate representation in a two-dimensional x-y space of an image).

[0065] After determining the corresponding pixel pi in the intermediate thumbnail image F I, a pixel difference between the pixel pi and its corresponding pixel pi in the intermediate reference image F_r ^J I with the same size can be calculated. If the pixel difference is smaller than the corresponding intermediate pixel threshold

I, the pixel pi in the raw image F_j is not a noise pixel. If the pixel difference related to the pixel pi exceeds the intermediate pixel threshold

i or the pixel difference related to the pixel i exceeds the pixel threshold o then the pixel pi in the raw image F_j is determined as a noise pixel. In some implementations, the intermediate pixel threshold

i of the intermediate thumbnail image F I at any layer of the pyramid structure for the thumbnail image F can be selected to guide the determination of the noise pixel of the raw image F^.

[0066] An example of identifying whether a pixel pi in the raw image F; is a noise pixel based on the two thresholds can be represented as follows:

where when w^l, it means that the pixel pi in the raw image F_t is not a noise pixel; when Wi=0, it indicates that the pixel pi in the raw image F; is a noise pixel.

[0067] The implementations of various mage fusions in the image fusion stage 210 have been discussed above. As mentioned, in some cases, the fused image 212 output by the image fusion stage 210 can be considered as a HDR image. In some other cases, if the raw images 102 are captured at a low exposure for better alignment and de-noising of the images, the fused image 212 can be further processed (such as by performing tone mapping) to obtain a HDR image with a greater range of luminance. Tone Mapping

[0068] Fig. 6 is a detailed block diagram illustrating the tone mapping stage 220 of Fig. 2. The main target of tone mapping is to adjust or correct the exposure of the fused image 212 output by the image fusion stage 210. As shown in Fig. 6, the tone mapping stage 220 includes an exposure adjustment module 610 to adjust the exposure of the fused image 212 based on a reference image 602 with a predefined exposure, to obtain an adjusted image 612. In addition, the tone mapping stage 220 further includes an exposure fusion module 620 to generate a FIDR image 104 based on the adjusted image 612.

[0069] The reference image 602 used in the tone mapping process may be different from the reference image used in the image fusion process. In some implementations, the reference image 602 may be a preview image of the scene that is identical to the scene of the raw images 102, which is captured by the camera 152 before the plurality of raw images 102 are captured. The exposure of the preview image can be an exposure adjusted automatically by the camera 152 based on lights and focus areas of the scene or an exposure set by user. This exposure is higher than that used in capturing the raw images 102 and thus can present a better global exposure condition of the scene. As the exposure of the preview image is confirmed by the user, by adjusting the fused image 212 based on the exposure of the preview image, the global exposure of the generated HDR image 104 can satisfy the user.

[0070] The preview image 602 can be captured and stored by the camera 152 automatically but has a size smaller than that of the raw images 102 captured normally by the camera, which size is thus also smaller than that of the fused image 212. To perform the exposure adjustment, the exposure adjustment module 610 may first change the size of the fused image 212 as identical to that of the preview image 602. Alternatively, the reference image 602 may be an image of the same scene as the raw images 102 (for example, an image obtained before or after the raw image 102 is captured) that is captured with a predefined exposure (for example, an exposure adjusted automatically) by the camera 152. In this case, the reference image 602 has the same size as that of the raw images 102 (and thus as the fused image 212), and there is no need to perform downsizing of the fused image 212. Of course, other images with a global or partial exposure that can be used to guide the scene of the raw images 102 can also be used as the reference image 602 and the fused image 212 can be scaled into the same size as that of the reference image 602 as needed.

[0071] To enable using the reference image 602 to change the exposure of the fused image 212 or the scaled fused image 212 in a right way, in some implementations, the fused image 212 can be aligned to the reference image 602. The multi-image aligning method described above in the process of image fusion can be employed to align the two images. In some other implementations, compared with the image alignment during the image fusion, not high accuracy is required for the alignment of the reference image 602 with the fused image 212 in the tone mapping. Some simple image alignment methods can also be utilized to align the fused image 212 to the reference image 602.

[0072] After the reference image 602 and the fused image 212 are aligned, the exposure adjustment module 610 may adjust the exposure of the fused image 212 as similar to that of the reference image 602, which can be achieved by, for example, a histogram equalization method. Specifically, the exposure adjustment module 610 may adjust the values of some pixels in the fused image 212 based on the reference image 602. In some implementations, since the reference image 602 and the fused image 212 present the scene at different moments, the exposure adjustment module 610 may need to process inconsistent pixels in the two images. The exposure adjustment module 610 may determine pixel differences between corresponding pixels of the fused image 212 and the reference image 602 (such as the Euclidean distance between values of the pixels) and compare the pixel differences with a predetermined difference threshold. If a pixel difference is lower than the predetermined difference threshold, then the pixel of the reference image 602 is used to replace the corresponding pixel in the fused image 212. If a pixel difference is larger than the predetermined difference threshold, the pixel of the fused image 212 remains. This process can be represented as follows:

R M = ί ^Λ°^(Ρ)' ^{if |} '^{d 1 (P) " Λο (ρ) Ι <} °^out (5) 1^W \F'_d i (p), if \F'_d i (p) - R₀(p) \≥ a_out

where R₀(p) represents a pixel p of the reference image 602; F'_d i (p) represents a pixel p of the fused image 212 that has been downsized and aligned with the reference image 602; R^p represents the adjusted image after replacement of the pixels; and °out represents a predetermined difference threshold. The predetermined difference threshold a_out may be configured as any value, for example, 10, 15, 20 (supposing that the highest value of a pixel is 256) based on experience, so as to exclude inconsistent outlier pixels caused by camera movement or object movement.

[0073] In some implementations, since some pixels are replaced by pixels of the reference image 602, there may be some over-exposed pixels in the image R^p) after filtering the outlier values. In this case, the exposure adjustment module 610 may adjust the luminance of some pixels in the image R^p) based on the underexposed fused image 212. Specifically, the exposure adjustment module 610 may adjust pixels (such as over-exposed pixels) with high luminance (for example, higher than a predetermined luminance threshold) in the image R^p), for example, by performing smooth processing on these pixels. Over-exposed pixels can be further adjusted. For example, the value of a given pixel in the fused image 212 may be weighted with the value of a given pixel in the image R^p to obtain a new pixel value, which can be expressed as follows:

R₂(p = (1 - a) x RM + a x F_d i (p) (6) where F_d i (p) represents a pixel p of the fused image 212 that has been scaled down (but not aligned with the reference image 602, that is, not affected by the reference image 602) and represents a weight ranging from 0 to 1.

[0074] In an implementation, the weight a for linear weighting can be any predetermined value ranging from 0 to 1. In some other implementations, to obtain smoother transition for the over-exposed pixels, a can be determined by a smooth step function to limit a to only smooth the over-exposed pixels with greater luminance in the image R^p). The smooth step function of a can be represented as follows:

where Ziima(i?₁(p)) represents the luminance of the image R^p at a pixel p, a and b can be determined as larger luminance values and b is larger than a, for example, a =200 and b =220 (supposing that the highest value of luminance is 240). Of course, a and b may also be set as other luminance values. The smoothstepQ function in Equation (7) represents that when R^p is smaller than a, a = 0; when i?₁(p) is larger than b, a = 1; when R^p) is a specific value between a and b, a may be a value between 0 and 1 and this value can be determined by the specific value of Ri(p)- The more R^p is approximating to b, the closer a is to l . This setting of a makes it possible to only smooth the over-exposed pixels.

[0075] In some other implementations, the exposure adjustment module 610 may alternatively or additionally perform further exposure correction on the image ^( ) or thi?₂(p) with the adjusted exposure. For example, details of a dark region or light region in the image may be further enhanced with various automatic exposure correction techniques that are currently known or to be developed in the future. The exposure adjustment module 610 outputs the adjusted image 612. As the exposure adjustment module 610 performs adjustment for the fused image 212 in a pixel-wise manner, the adjusted image 612 can present a good global exposure. However, since the smoothness between some pixels or blocks may not be good enough, further optimization may be performed in the exposure fusion module 620 to obtain an image of higher quality.

[0076] The exposure fusion module 620 may process the fused image 212 based on the adjusted image 612. In some implementations, the exposure fusion module 620 may determine a luminance weight map for respective pixels in the fused image 212 by comparing the luminance of the adjusted image 612 with that of the fused image 212. In the case that the adjusted image 612 has a different size than the fused image 212 (for example, the size of the adjusted image 612 is smaller than that of the fused image 212), the adjusted image 612 can be scaled to be consistent with the fused image 212. For each pixel of the fused image 212, a luminance weight can be determined by comparing the luminance of the scaled adjusted image (represented as 612') and the fused image 212 at the corresponding pixels, which can be represented as:

W p) = luma(R₃ T (p))/luma(F_d(p)) (⁸) where F_d(p) represents a pixel p of the original fused image 212 (the fused image received from the image fusion stage 210); R₃ T (p) represents a pixel p of the adjusted image after scaling (for example, scaling up) to the same size as the fused image F_d(p); luma( ) represents the luminance at the pixel R₃ T (p) or F_d(p); and V (p) represents the value of the luminance weight map W at the pixel p.

[0077] The exposure fusion module 620 may generate a HDR image 104 by fusing the luminance weight map W and the fused image F_d 212. In some implementations, the pixel p of the HDR image 104 is determined by multiplying W(p) with the value F_d(p) of the corresponding pixel p in the fused image 212. As such simple fusion may lead to some error such as a gap in the image, in some other implementations, the exposure fusion module 620 may achieve the fusion of the luminance weight map W and the fused image F_d by means of pyramid fusion such that the luminance weights can be applied to the fused image at different sizes.

[0078] Fig. 7 illustrates an implementation of such pyramid fusion. As shown, a set of intermediate fused images 720, 730 and 740 with different sizes are generated from the fused image F_d 212 and form a pyramid structure (such as a Laplacian or Gaussian pyramid). A set of intermediate luminance weight maps 722, 732 and 742 with the same sizes as the intermediate fused images 720, 730 and 740 are generated from the luminance weight map W 712. To preserve the luminance weights in the luminance weight map, the luminance weight map W 712 can be constructed as a Gaussian pyramid instead of a Laplacian pyramid.

[0079] The exposure fusion module 620 may multiply intermediate fused images with intermediate luminance weight maps with the same sizes in the two pyramids, for example, may calculate a product of the corresponding pixel values, to generate respective intermediate fused images 724, 734 and 744. The fusion of a Laplacian pyramid and a Gaussian pyramid generates a Laplacian pyramid. In other words, the intermediate fused images 724, 734 and 744 form another Laplacian pyramid. Therefore, Laplacian pyramid reconstruction can be applied to the intermediate fused images 724, 734 and 744 to generate the HDR image 104.

[0080] The tone mapping based on a reference image with a predetermined exposure performed in the tone mapping stage 220 has been described above. In some other implementations, other approaches can also be used to adjust the exposure of the fused image 212 to optimize the underexposure of the fused image 212. For example, a global exposure of the fused image 212 can be increased by a predetermined amount. Alternatively, or in addition, proper exposures can be analyzed for different scenes or objects by means of machine learning and the like, such that different exposure adjustments may be applied to different regions of the fused image 212 (the dark region, the light region, and the like). The scope of the subject matter described herein is not limited in this regard, as long as the exposure of the underexposed fused image 212 can be enhanced to a proper level. In other implementations, it is also possible that exposure adjustment is applied to the fused image 212; instead, other suitable processing can be used to obtain the corresponding HDR image.

Example Processes

[0081] Fig. 8 illustrates a flowchart of an image fusion process 800 in accordance with some implementations of the subject matter described herein. The process 800 may be implemented with the computer device 100, for example, as the module 122 in the memory 120 of the computer device 100.

[0082] At 810, the computer device 100 obtains a plurality of raw images and a first reference image of a scene, the plurality of raw images and the first reference image having a same size. The plurality of raw images may be captured by the camera 152 of the computer device 100 on the scene or obtained by the input unit 150 or communication unit 140 from other sources. In some implementations, the exposures of the raw images may be the same and may be lower than a predetermined exposure of the camera selected by the user. In some implementations, one of the plurality of raw images may be selected as the first reference image.

[0083] At 802, the computer device 100 fuses the plurality of raw images based on the first reference image to obtain a fused image. The fusion includes, for each of the plurality of raw images: determining pixel differences between corresponding pixels of the raw image and the first reference image, determining a pixel threshold for the raw image based on a distribution of at least a portion of the pixel differences, and identifying a noise pixel of the raw image to be excluded from the fusing by comparing the pixel differences with the pixel threshold.

[0084] In some implementations, determining pixel threshold may include selecting an outlier pixel difference from the pixel differences based on at least one of the following: luminance of the raw image at a respective pixel, luminance of the first reference image at a respective pixel, values of the pixel differences, and a variation between a respective pixel and adjacent pixels of the raw image; and determining the pixel threshold based on the distribution of remaining pixel differences other than the outlier pixel difference.

[0085] In some implementations, the fusion of the raw images may include generating a first intermediate fused image by averaging remaining pixels other than the noise pixel across the plurality of raw images; downsizing the plurality of raw images to generate a plurality of thumbnail raw images; downsizing the first reference image to generate a thumbnail reference image; fusing the plurality of thumbnail raw images based on the thumbnail reference image to generate a second intermediate fused image; and generating the fused image based on the first intermediate fused image and the second intermediate fused image.

[0086] In some implementations, fusing the plurality of thumbnail raw images may include generating a set of intermediate reference images with different sizes from the thumbnail reference images; and for each of the plurality of thumbnail raw images: generating a set of intermediate thumbnail images with different sizes from the thumbnail raw image, and fusing the intermediate thumbnail images based on the intermediate reference images for generating of the second intermediate fused image. [0087] In some implementations, fusing the intermediate thumbnail images may include for each of the intermediate thumbnail images: determining intermediate pixel differences between corresponding pixels of the intermediate thumbnail image and the intermediate reference image in a same size, determining an intermediate pixel threshold for the intermediate thumbnail image based on a distribution of at least a portion of the intermediate pixel differences, and identifying a noise pixel of the intermediate thumbnail image to be excluded from the fusing of the intermediate thumbnail images by comparing the intermediate pixel differences and the intermediate pixel threshold.

[0088] In some implementations, identifying the noise pixel of the raw image may include: for a first pixel in the raw image, determining a second pixel corresponding to the first pixel from a given intermediate thumbnail image generated from the thumbnail raw image corresponding to the raw image; determining a pixel difference between the second pixel and a corresponding pixel of a given intermediate reference image of the intermediate reference images, the given intermediate reference image having a same size as the given intermediate thumbnail image; and in response to the pixel difference between the second pixel and the corresponding pixel exceeding the intermediate pixel threshold, identifying the first pixel of the raw images as a noise pixel.

[0089] Fig. 9 illustrates a flowchart of a tone mapping process 900 in accordance with some implementations of the subject matter described herein. The process 900 can be implemented by the computer device 100, for example, implemented as the module 122 in the memory 120 of the computer device 100. At 910, the computer device 100 obtains a second reference image with a predetermined exposure. The predetermined exposure of the second reference image may be higher than the same exposure of the raw images, and the second reference image may be different from the first reference image. At 920, the computer device 100 adjusts an exposure of the fused image based on the second reference image to obtain an adjusted image. At 930, the computer device 100 generates a HDR image based on the adjusted image.

[0090] In some implementations, adjusting the exposure of the fused image may include at least one of the following: in response to a pixel difference between a pixel of the fused image and a corresponding pixel of the second reference image being lower than a predetermined difference threshold, replacing the pixel of the fused image with the pixel of the second reference image; and adjusting a value for a pixel of the fused image that has luminance higher than a predetermined luminance threshold.

[0091] In some implementations, generating the HDR image may include determining a luminance weight map for pixels of the fused image by comparing luminance of the adjusted image with luminance of the fused image; and fusing the luminance weight map and the fused image to generate the HDR image.

[0092] In some implementations, obtaining the second reference image may include obtaining a preview image of the scene as the second reference image, the preview image being collected before the plurality of raw images are captured.

Example Implementations

[0093] Several example implementations of the subject matter described herein are illustrated in the following.

[0094] In an aspect, a computer-implemented method is provided in the subject matter described herein, which comprising: obtaining a plurality of raw images and a first reference image of a scene, the plurality of raw images and the first reference image having a same size; and fusing the plurality of raw images based on the first reference image to obtain a fused image, the fusing comprising: for each of the plurality of raw images, determining pixel differences between corresponding pixels of the raw image and the first reference image, determining a pixel threshold for the raw image based on a distribution of at least a portion of the pixel differences, and identifying a noise pixel of the raw image to be excluded from the fusing by comparing the pixel differences with the pixel threshold.

[0095] In some implementations, determining the pixel threshold comprises: selecting an outlier pixel difference from the pixel differences based on at least one of the following: luminance of the raw image at a respective pixel, luminance of the first reference image at a respective pixel, values of the pixel differences, and a variation between a respective pixel and adjacent pixels of the raw image; and determining the pixel threshold based on the distribution of remaining pixel differences other than the outlier pixel difference.

[0096] In some implementations, the fusing further comprises: generating a first intermediate fused image by averaging remaining pixels other than the noise pixel across the plurality of raw images; downsizing the plurality of raw images to generate a plurality of thumbnail raw images; downsizing the first reference image to generate a thumbnail reference image; fusing the plurality of thumbnail raw images based on the thumbnail reference image to generate a second intermediate fused image; and generating the fused image based on the first intermediate fused image and the second intermediate fused image.

[0097] In some implementations, fusing the plurality of thumbnail raw images comprises: generating a set of intermediate reference images with different sizes from the thumbnail reference images; and for each of the plurality of thumbnail raw images: generating a set of intermediate thumbnail images with different sizes from the thumbnail raw image, and fusing the intermediate thumbnail images based on the intermediate reference images for generating of the second intermediate fused image.

[0098] In some implementations, fusing the intermediate thumbnail images comprises: for each of the intermediate thumbnail images: determining intermediate pixel differences between corresponding pixels of the intermediate thumbnail image and the intermediate reference image in a same size, determining an intermediate pixel threshold for the intermediate thumbnail image based on a distribution of at least a portion of the intermediate pixel differences, and identifying a noise pixel of the intermediate thumbnail image to be excluded from the fusing of the intermediate thumbnail images by comparing the intermediate pixel differences and the intermediate pixel threshold.

[0099] In some implementations, identifying the noise pixel of the raw image further comprises: for a first pixel in the raw image, determining a second pixel corresponding to the first pixel from a given intermediate thumbnail image generated from the thumbnail raw image corresponding to the raw image; determining a pixel difference between the second pixel and a corresponding pixel of a given intermediate reference image of the intermediate reference images, the given intermediate reference image having a same size as the given intermediate thumbnail image; and in response to the pixel difference between the second pixel and the corresponding pixel exceeding the intermediate pixel threshold, identifying the first pixel of the raw images as a noise pixel.

[00100] In some implementations, the method further comprises: obtaining a second reference image with a predetermined exposure; adjusting an exposure of the fused image based on the second reference image to obtain an adjusted image; and generating a high dynamic range mage based on the adjusted image.

[00101] In some implementations, adjusting the exposure of the fused image comprises at least one of the following: in response to a pixel difference between a pixel of the fused image and a corresponding pixel of the second reference image being lower than a predetermined difference threshold, replacing the pixel of the fused image with the pixel of the second reference image; and adjusting a value for a pixel of the fused image that has luminance higher than a predetermined luminance threshold.

[00102] In some implementations, generating the high dynamic range image comprises: determining a luminance weight map for pixels of the fused image by comparing luminance of the adjusted image with luminance of the fused image; and fusing the luminance weight map and the fused image to generate the high dynamic range image. [00103] In some implementations, obtaining the second reference image comprises: obtaining a preview image of the scene as the second reference image, the preview image being collected before the plurality of raw images are captured.

[00104] In some implementations, obtaining the plurality of raw images comprises: obtaining the plurality of raw images with a same exposure, the same exposure being lower than the predetermined exposure of the second reference image.

[00105] In some implementations, obtaining the first reference image comprises: selecting one of the plurality of raw images as the first reference image.

[00106] In an aspect, a device is provided in the subject matter described herein, which comprises: a processing unit; and a memory coupled to the processing unit and comprising instructions stored thereon, the instructions, when executed by the processing unit, causing the device to perform acts including: obtaining a plurality of raw images and a first reference image of a scene, the plurality of raw images and the first reference image having a same size; and fusing the plurality of raw images based on the first reference image to obtain a fused image, the fusing comprising: for each of the plurality of raw images, determining pixel differences between corresponding pixels of the raw image and the first reference image, determining a pixel threshold for the raw image based on a distribution of at least a portion of the pixel differences, and identifying a noise pixel of the raw image to be excluded from the fusing by comparing the pixel differences with the pixel threshold.

[00107] In some implementations, determining the pixel threshold comprises: selecting an outlier pixel difference from the pixel differences based on at least one of the following: luminance of the raw image at a respective pixel, luminance of the first reference image at a respective pixel, values of the pixel differences, and a variation between a respective pixel and adjacent pixels of the raw image; and determining the pixel threshold based on the distribution of remaining pixel differences other than the outlier pixel difference.

[00108] In some implementations, the fusing further comprises: generating a first intermediate fused image by averaging remaining pixels other than the noise pixel across the plurality of raw images; downsizing the plurality of raw images to generate a plurality of thumbnail raw images; downsizing the first reference image to generate a thumbnail reference image; fusing the plurality of thumbnail raw images based on the thumbnail reference image to generate a second intermediate fused image; and generating the fused image based on the first intermediate fused image and the second intermediate fused image.

[00109] In some implementations, fusing the plurality of thumbnail raw images comprises: generating a set of intermediate reference images with different sizes from the thumbnail reference images; and for each of the plurality of thumbnail raw images: generating a set of intermediate thumbnail images with different sizes from the thumbnail raw image, and fusing the intermediate thumbnail images based on the intermediate reference images for generating of the second intermediate fused image.

[00110] In some implementations, fusing the intermediate thumbnail images comprises: for each of the intermediate thumbnail images: determining intermediate pixel differences between corresponding pixels of the intermediate thumbnail image and the intermediate reference image in a same size, determining an intermediate pixel threshold for the intermediate thumbnail image based on a distribution of at least a portion of the intermediate pixel differences, and identifying a noise pixel of the intermediate thumbnail image to be excluded from the fusing of the intermediate thumbnail images by comparing the intermediate pixel differences and the intermediate pixel threshold.

[00111] In some implementations, identifying the noise pixel of the raw image further comprises: for a first pixel in the raw image, determining a second pixel corresponding to the first pixel from a given intermediate thumbnail image generated from the thumbnail raw image corresponding to the raw image; determining a pixel difference between the second pixel and a corresponding pixel of a given intermediate reference image of the intermediate reference images, the given intermediate reference image having a same size as the given intermediate thumbnail image; and in response to the pixel difference between the second pixel and the corresponding pixel exceeding the intermediate pixel threshold, identifying the first pixel of the raw images as a noise pixel.

[00112] In some implementations, the acts further includes: obtaining a second reference image with a predetermined exposure; adjusting an exposure of the fused image based on the second reference image to obtain an adjusted image; and generating a high dynamic range mage based on the adjusted image.

[00113] In some implementations, adjusting the exposure of the fused image comprises at least one of the following: in response to a pixel difference between a pixel of the fused image and a corresponding pixel of the second reference image being lower than a predetermined difference threshold, replacing the pixel of the fused image with the pixel of the second reference image; and adjusting a value for a pixel of the fused image that has luminance higher than a predetermined luminance threshold.

[00114] In some implementations, generating the high dynamic range image comprises: determining a luminance weight map for pixels of the fused image by comparing luminance of the adjusted image with luminance of the fused image; and fusing the luminance weight map and the fused image to generate the high dynamic range image.

[00115] In some implementations, obtaining the second reference image comprises: obtaining a preview image of the scene as the second reference image, the preview image being collected before the plurality of raw images are captured.

[00116] In some implementations, obtaining the plurality of raw images comprises: obtaining the plurality of raw images with a same exposure, the same exposure being lower than the predetermined exposure of the second reference image.

[00117] In some implementations, obtaining the first reference image comprises: selecting one of the plurality of raw images as the first reference image.

[00118] In an aspect, the subject matter described herein provides a computer program product being tangibly stored on a non-transitory computer storage medium and comprising machine-executable instructions, the machine-executable instructions, when executed on a device, causing the device to obtain a plurality of raw images and a first reference image of a scene, the plurality of raw images and the first reference image having a same size; and fuse the plurality of raw images based on the first reference image to obtain a fused image, the fusing comprising: for each of the plurality of raw images, determining pixel differences between corresponding pixels of the raw image and the first reference image, determining a pixel threshold for the raw image based on a distribution of at least a portion of the pixel differences, and identifying a noise pixel of the raw image to be excluded from the fusing by comparing the pixel differences with the pixel threshold:

[00119] The functionally described herein can be performed, at least in part, by one or more hardware logic components. For example, and without limitation, illustrative types of hardware logic components that can be used include Field-Programmable Gate Arrays (FPGAs), Application-specific Integrated Circuits (ASICs), Application-specific Standard Products (ASSPs), System-on-a-chip systems (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.

[00120] Program code for carrying out methods of the subject matter described herein may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on a machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.

[00121] In the context of this disclosure, a machine readable medium may be any tangible medium that may contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine readable medium may be a machine readable signal medium or a machine readable storage medium. A machine readable medium may include but not limited to an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of the machine readable storage medium would include an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

[00122] Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are contained in the above discussions, these should not be construed as limitations on the scope of the subject matter described herein, but rather as descriptions of features that may be specific to particular implementations. Certain features that are described in the context of separate implementations may also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation may also be implemented in multiple implementations separately or in any suitable subcombination.

[00123] Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter specified in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims

1. A computer-implemented method, comprising:

obtaining a plurality of raw images and a first reference image of a scene, the plurality of raw images and the first reference image having a same size; and

fusing the plurality of raw images based on the first reference image to obtain a fused image, the fusing comprising: for each of the plurality of raw images,

determining pixel differences between corresponding pixels of the raw image and the first reference image,

determining a pixel threshold for the raw image based on a distribution of at least a portion of the pixel differences, and

identifying a noise pixel of the raw image to be excluded from the fusing by comparing the pixel differences with the pixel threshold.

2. The method according to claim 1, wherein determining the pixel threshold comprises:

selecting an outlier pixel difference from the pixel differences based on at least one of the following:

luminance of the raw image at a respective pixel,

luminance of the first reference image at a respective pixel,

values of the pixel differences, and

a variation between a respective pixel and adjacent pixels of the raw image; and determining the pixel threshold based on the distribution of remaining pixel differences other than the outlier pixel difference.

3. The method according to claim 1, wherein the fusing further comprises: generating a first intermediate fused image by averaging remaining pixels other than the noise pixel across the plurality of raw images;

downsizing the plurality of raw images to generate a plurality of thumbnail raw images;

downsizing the first reference image to generate a thumbnail reference image; fusing the plurality of thumbnail raw images based on the thumbnail reference image to generate a second intermediate fused image; and

generating the fused image based on the first intermediate fused image and the second intermediate fused image.

4. The method according to claim 3, wherein fusing the plurality of thumbnail raw images comprises: generating a set of intermediate reference images with different sizes from the thumbnail reference images; and

for each of the plurality of thumbnail raw images:

generating a set of intermediate thumbnail images with different sizes from the thumbnail raw image, and

fusing the intermediate thumbnail images based on the intermediate reference images for generating of the second intermediate fused image.

5. The method according to claim 4, wherein fusing the intermediate thumbnail images comprises:

for each of the intermediate thumbnail images:

determining intermediate pixel differences between corresponding pixels of the intermediate thumbnail image and the intermediate reference image in a same size,

determining an intermediate pixel threshold for the intermediate thumbnail image based on a distribution of at least a portion of the intermediate pixel differences, and identifying a noise pixel of the intermediate thumbnail image to be excluded from the fusing of the intermediate thumbnail images by comparing the intermediate pixel differences and the intermediate pixel threshold.

6. The method according to claim 5, wherein identifying the noise pixel of the raw image comprises:

for a first pixel in the raw image, determining a second pixel corresponding to the first pixel from a given intermediate thumbnail image generated from the thumbnail raw image corresponding to the raw image;

determining a pixel difference between the second pixel and a corresponding pixel of a given intermediate reference image of the intermediate reference images, the given intermediate reference image having a same size as the given intermediate thumbnail image; and

in response to the pixel difference between the second pixel and the corresponding pixel exceeding the intermediate pixel threshold, identifying the first pixel of the raw images as a noise pixel.

7. The method according to claim 1, further comprising:

obtaining a second reference image with a predetermined exposure;

adjusting an exposure of the fused image based on the second reference image to obtain an adjusted image; and

generating a high dynamic range (HDR) image based on the adjusted image.

8. The method according to claim 7, wherein adjusting the exposure of the fused image comprises at least one of the following:

in response to a pixel difference between a pixel of the fused image and a corresponding pixel of the second reference image being lower than a predetermined difference threshold, replacing the pixel of the fused image with the pixel of the second reference image; and

adjusting a value for a pixel of the fused image that has luminance higher than a predetermined luminance threshold.

9. The method according to claim 7, wherein generating the HDR image comprises:

determining a luminance weight map for pixels of the fused image by comparing luminance of the adjusted image with luminance of the fused image; and

fusing the luminance weight map and the fused image to generate the HDR image.

10. The method according to claim 7, wherein obtaining the second reference image comprises:

obtaining a preview image of the scene as the second reference image, the preview image being collected before the plurality of raw images are captured.

11. A device, comprising:

a processing unit; and

a memory coupled to the processing unit and comprising instructions stored thereon, the instructions, when executed by the processing unit, causing the device to perform acts including:

12. The device according to claim 11, wherein determining the pixel threshold comprises: selecting an outlier pixel difference from the pixel differences based on at least one of the following:

luminance of the raw image at a respective pixel,

a luminance of the first reference image at a respective pixel,

values of the pixel differences, and

13. The device according to claim 11, wherein the fusing further comprises: generating a first intermediate fused image by averaging remaining pixels other than the noise pixel across the plurality of raw images;

14. The device according to claim 13, wherein fusing the plurality of thumbnail raw images comprises:

generating a set of intermediate reference images with different sizes from the thumbnail reference image; and

for each of the plurality of thumbnail raw images:

15. The device according to claim 11, wherein the acts further include:

obtaining a second reference image with a predetermined exposure;

generating a high dynamic range (HDR) image based on the adjusted image.