WO2011158225A1 - System and method for enhancing images - Google Patents

System and method for enhancing images

Info

Publication number
WO2011158225A1
Authority
WO
WIPO (PCT)
Prior art keywords
low resolution
image
frames
frame
resolution
Prior art date
Application number
PCT/IL2011/000290
Other languages
French (fr)
Inventor
Yossi Deutsch
Original Assignee
Mirtemis Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mirtemis Ltd. filed Critical Mirtemis Ltd.
Publication of WO2011158225A1 publication Critical patent/WO2011158225A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4053Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging

Definitions

  • Xk is the kth approximation of the final super resolution image
  • go to step 1 using the set of diff images (instead of the original LR set)
  • each input LR frame can be a color or a black and white frame.
  • each plane is processed separately (optionally in parallel):
  • each LR frame is up-scaled, translated and accumulated into a sumHRframes variable; the actual set of low resolution frames can then be discarded
  • N frames of simulated LR frames are then generated by down-sampling, shifting and then up-sampling according to the image registration stage;
  • the difference image is then interpolated by using a Gaussian filter (or any other suitable filter).
  • the size of the sequence of the LR frames needed to create the high resolution image was determined prior to execution.
  • the size of the high resolution frame was estimated (at first iteration it can be created using standard bi-cubic scaling or any other known single frame scaling method) and N frames of simulated low resolution frames were generated by down-sampling, geometric translation (e.g. shifting using -1*Di(x,y) for the i'th frame), adding noise and blurring. This step can also utilize any other anticipated degradation processes, such as artifacts caused by imperfections in the optical parts (e.g. lens, filters, etc.).
  • the set of simulated low resolution frames was then subtracted from the original low resolution frames.
  • Each resulting subtracted frame represents the difference between an actual and a simulated low resolution frame. This step enabled prediction of an optimum solution.
  • the high resolution grid formed a sparse matrix with accumulated information of all the mapped difference frames.
  • the high resolution grid was then interpolated using a Gaussian filter (with large support and appropriate sigma).
  • the interpolated high resolution grid was then added to a final super resolution image and the resulting image was then used as input for the next iteration of the above described process starting with re-estimation of the high resolution image.
  • steps (ii)-(iv) were altered as follows.
  • the simulated low resolution frames were mapped into an accumulated high resolution grid via upscaling and translating.
  • the current estimation of the high resolution frame (at first iteration it can be created using standard bi-cubic scaling or any other known single frame scaling method) was then identified and N frames of simulated low resolution frames were generated via down-sampling, geometric translation (e.g. shifting it using -1*Di(x,y) for the i'th frame), adding noise and blurring.
  • Each simulated low resolution image was then mapped to an accumulated high resolution simulation grid.
  • the simulation grid was then subtracted from the high resolution grid and the result was interpolated using a Gaussian filter (with larger support and higher sigma in order to accommodate operating in high resolution space).
  • the interpolated image was then added to the final high resolution image which in turn was used as an input for the next iteration.
  • Figure 3 illustrates a 6X digital zoom of the image of Figure 2 generated using bicubic interpolation.
  • Figure 4 illustrates a 6X digital zoom of the image of Figure 2 generated using the present approach.
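The accumulated-grid iteration outlined in the bullets above can be sketched roughly as follows. This is an illustrative numpy sketch, not the patent's implementation: it assumes whole-pixel shifts, nearest-neighbour upscaling, and a small separable Gaussian filter, and all function names are assumptions:

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    """1-D Gaussian, normalized to sum to 1."""
    ax = np.arange(size) - size // 2
    g = np.exp(-(ax ** 2) / (2 * sigma ** 2))
    return g / g.sum()

def gaussian_filter2d(img, size=5, sigma=1.0):
    """Separable 2-D Gaussian filtering (rows, then columns)."""
    k = gaussian_kernel(size, sigma)
    tmp = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, tmp)

def map_to_hr_grid(lr_frames, shifts, factor, hr_shape):
    """Accumulate (upscale + translate) each LR frame into a single HR
    grid, so the LR sequence itself need not be kept in memory."""
    grid = np.zeros(hr_shape)
    for lr, shift in zip(lr_frames, shifts):
        up = np.kron(lr, np.ones((factor, factor)))    # naive upscale
        grid += np.roll(up, shift=shift, axis=(0, 1))  # translate
    return grid

def refine(hr_est, captured_grid, simulated_grid, size=7, sigma=2.0):
    """One iteration: Gaussian-interpolate the difference of the two
    accumulated grids and add it back to the HR estimate."""
    return hr_est + gaussian_filter2d(captured_grid - simulated_grid, size, sigma)
```

When the simulated grid matches the grid accumulated from the captured frames, the correction term vanishes and the estimate is left unchanged, which is the iteration's stopping condition.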

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Processing (AREA)

Abstract

A method of enhancing a low resolution image is provided. The method is effected by obtaining a plurality of low resolution frames corresponding to the low resolution image, upscaling a frame of the plurality of low resolution frames, generating a plurality of simulated low resolution frames, identifying differences between the plurality of low resolution frames and the plurality of simulated low resolution frames and mapping the differences to a high resolution grid and generating an enhanced version of the low resolution image.

Description

SYSTEM AND METHOD FOR ENHANCING IMAGES
FIELD AND BACKGROUND OF THE INVENTION
The present invention relates to a system and method for enhancing still and video images.
The quality of video and still images is often determined via their pixel count, which is mostly a function of the camera sensor (typically a CCD or CMOS sensor chip) used for capturing the image(s). The camera sensor converts captured light into discrete signals which ultimately form pixels on the resulting displayed image. The total number of pixels in the image determines its "pixel count". Consumer digital cameras in use today utilize CMOS sensors capable of capturing 5-14 megapixels (MP) of visual data.
Super-resolution is a term for a mathematical approach for increasing the resolution of an image beyond that of the original pixel count. Super-resolution approaches typically utilize information from several images to create one upsized image by extracting details from one or more frames to reconstruct other frames. Super-resolution is different from image upsizing approaches which synthesize artificial details in order to increase the pixel count of an image.
Several super-resolution approaches are known in the art; see, for example, Altunbasak et al. 2002 - Super-resolution still and video reconstruction from MPEG-coded video. IEEE Trans. Circuits Syst. Video Technol. 12:217-226; Elad and Feuer. 1999 - Super-resolution reconstruction of image sequences; Farsiu et al. 2003 - Robust shift and add approach to super-resolution. SPIE Conf. on Appl. Digital Signal and Image Process., pp. 121-130; Hardie et al. 1997 - Joint MAP registration and high-resolution image estimation using a sequence of undersampled images. IEEE Trans. Image Process. 6:1621-1633.
Most of these approaches utilize frame-to-frame movement of the capturing device and the objects within the captured scene to "fill-in" missing details and generate a supersized image or video sequence.
Although presently available super-resolution approaches are capable of enhancing still and video images, they are processor and memory intensive and thus cannot be effectively used for real-time video super-resolution. In addition, such approaches are typically insensitive to environmentally-produced artifacts, often magnifying such artifacts in resulting super-resolution images produced thereby.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, suitable methods and materials are described below. In case of conflict, the patent specification, including definitions, will control. In addition, the materials, methods, and examples are illustrative only and not intended to be limiting.
Implementation of the method and system of the present invention involves performing or completing selected tasks or steps manually, automatically, or a combination thereof. Moreover, according to actual instrumentation and equipment of preferred embodiments of the method and system of the present invention, several selected steps could be implemented by hardware, by software on any operating system, by firmware, or a combination thereof. For example, as hardware, selected steps of the invention could be implemented as a chip or a circuit. As software, selected steps of the invention could be implemented as a plurality of software instructions being executed by a computer using any suitable operating system. In any case, selected steps of the method and system of the invention could be described as being performed by a data processor, such as a computing platform for executing a plurality of instructions.
BRIEF DESCRIPTION OF THE DRAWINGS
The invention is herein described, by way of example only, with reference to the accompanying drawings. With specific reference now to the drawings in detail, it is stressed that the particulars shown are by way of example and for purposes of illustrative discussion of the preferred embodiments of the present invention only, and are presented in the cause of providing what is believed to be the most useful and readily understood description of the principles and conceptual aspects of the invention. In this regard, no attempt is made to show structural details of the invention in more detail than is necessary for a fundamental understanding of the invention, the description taken with the drawings making apparent to those skilled in the art how the several forms of the invention may be embodied in practice.
In the drawings:
FIG. 1 is a flow chart illustrating the basic steps of the present method.
FIG. 2 is an image captured with a 640X480 CMOS camera.
FIG. 3 illustrates an 8X digital zoom of the image of Figure 2 generated using bilinear interpolation.
FIG. 4 illustrates a 6X digital zoom of the image of Figure 2 generated using the present approach.
DESCRIPTION OF THE PREFERRED EMBODIMENTS
The present invention is of a system and method which can be used to enhance images. Specifically, the present invention can be used to increase the pixel count of images and thus it can be used to enhance low resolution images.
The principles and operation of the present invention may be better understood with reference to the drawings and accompanying descriptions.
Before explaining at least one embodiment of the invention in detail, it is to be understood that the invention is not limited in its application to the details set forth in the following description or exemplified by the Examples. The invention is capable of other embodiments or of being practiced or carried out in various ways. Also, it is to be understood that the phraseology and terminology employed herein is for the purpose of description and should not be regarded as limiting.
Images captured by digital imaging sensors are definition limited by the resolution of the image sensor. Super resolution methods attempt to overcome the diffraction limits of the imaging sensor by reconstructing a high resolution image from a set of under-sampled, low resolution images. This enables an imaging system to output images with a resolution that is higher than the imaging sensor limits.
Prior art super resolution approaches typically utilize an image registration step, in which the geometric transforms between the captured frames are analyzed and a shift/rotation/warp is calculated with sub-pixel accuracy, followed by a super-resolution (SR) stage, in which the registered low resolution images are fused into a single, high resolution image. Most multi-frame super-resolution methods require the algorithm to keep a history of captured frames, and therefore memory requirements can become too demanding, especially for low power devices such as mobile phones and digital still cameras.
To overcome these limitations, several companies, such as NEC™, have implemented a single frame super resolution approach (see, for example, published US Patent applications 20090274385 and 20080267525) which overcomes the high memory requirements of prior art approaches.
Although the NEC approach substantially reduces memory requirements it still suffers from several limitations including not actually breaking the diffraction limits of the imaging sensor but, instead, trying to achieve a subjectively 'nicer' looking image.
While reducing the present invention to practice, the present inventors have devised a super-resolution approach which can be used to enhance low resolution images while circumventing the limitations of prior art approaches. As is further described herein, the present approach can be used to rapidly enhance low resolution still and video images, thus enabling real time super-resolved zoom in digital cameras and cell phones and high definition streaming of low definition video.
Thus, according to one aspect of the present invention, there is provided a method of enhancing a low resolution image. As used herein, the term "enhancing" when used with respect to a low resolution image refers to increasing the pixel count of the image by a factor of at least 1.5, preferably 4X, 6X, 8X or more (e.g. 10X) or improving image quality via reduction of noise or anti-aliasing. As used herein, the phrase low resolution image refers to an image characterized by low pixel count (typically standard definition or less), and/or to an image which can be enhanced by noise reduction or anti-aliasing. Enhancement can also refer to increasing the pixel count or quality of a high definition image, for example, by converting a high resolution image into a super high resolution image. Thus, the present approach can enhance any image captured by a digital imaging sensor at any resolution.
The method of the present invention is effected by first obtaining a plurality of low resolution frames corresponding to the low resolution image. The low resolution (LR) frames can be obtained as a time sequence (microseconds to seconds) of frames, i.e. by sequentially capturing a plurality (e.g. 5, 10, 15 or more) frames of the same image (scene) manually or preferably automatically. Although each frame can represent the image from the same angle and cover the same X-Y coordinates, a slight shift (e.g. of at least 0.1% of the pixel physical size) in angle and/or X-Y coordinates resulting from, for example, intentional or accidental camera movement is preferred since it substantially increases the amount of information covered by the sequence of frames.
Each captured frame of the LR frame sequence is registered to a reference frame (e.g. previous frame) and the displacement information is stored.
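Registration of each captured frame to a reference frame can be performed with any displacement-estimation method; the patent only requires that the displacement information be stored. As a minimal sketch, whole-pixel displacements can be recovered by phase correlation (the FFT-based approach and the function name are illustrative assumptions, not the patent's own registration step):

```python
import numpy as np

def register_translation(ref, frame):
    """Estimate the whole-pixel (dy, dx) displacement of `frame`
    relative to `ref` via phase correlation (an illustrative choice)."""
    F_ref = np.fft.fft2(ref)
    F_frame = np.fft.fft2(frame)
    cross_power = np.conj(F_ref) * F_frame
    cross_power /= np.abs(cross_power) + 1e-12   # normalize magnitudes
    corr = np.fft.ifft2(cross_power).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # shifts past the half-way point wrap around to negative values
    if dy > ref.shape[0] // 2:
        dy -= ref.shape[0]
    if dx > ref.shape[1] // 2:
        dx -= ref.shape[1]
    return int(dy), int(dx)
```

In practice a sub-pixel refinement step would follow, since the method benefits from displacements as small as a fraction of a pixel.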
Once the sequence of LR frames is captured, a single frame (preferably the first or last in the sequence) is upscaled to create a first high resolution estimation by bicubic, bilinear or Lanczos interpolation, or any other known upscaling method. This step is performed as soon as the frame is captured; thus, in instances where the first frame of the LR sequence is used, this frame is processed immediately following capture.
The upscaled frame is then utilized to generate a plurality of simulated low resolution frames by sub-sampling according to the SR factor (HR_size/LR_size) and shifting according to the displacement information retrieved from the LR sequence.
Once a sequence of simulated low resolution frames is generated, it is compared to the original LR frames (captured by the device) by means of an L1 norm distance function, and the difference between the captured and simulated frames is mapped to a high resolution grid to generate an enhanced version of the low resolution image. The enhanced version of the low resolution image is then interpolated, using, for example, a Gaussian filter, to obtain the final enhanced image.
Comparison between the simulated LR frames and the original LR frames (captured by the device) is preferably effected by subtracting each pixel value of the simulated LR frames from its respective original LR frame of the captured sequence.
The result of the previous stage is an "error" image (showing the difference between the LR and simulated LR) and it is added to the estimated HR frame (described above).
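The upscale, simulate, and subtract steps described above can be sketched as follows. This is a simplified illustration under assumed conditions (integer SR factor, whole-pixel displacements, and nearest-neighbour upscaling standing in for the bicubic/bilinear/Lanczos interpolation the text suggests); all function names are illustrative:

```python
import numpy as np

def upscale_nearest(lr, factor):
    """First HR estimate by nearest-neighbour replication (a stand-in
    for the bicubic/bilinear/Lanczos interpolation named in the text)."""
    return np.kron(lr, np.ones((factor, factor)))

def simulate_lr(hr_est, factor, shift):
    """Simulate one captured LR frame: translate the HR estimate by the
    registered displacement, then sub-sample by the SR factor."""
    shifted = np.roll(hr_est, shift=shift, axis=(0, 1))
    return shifted[::factor, ::factor]

def error_images(lr_frames, hr_est, factor, shifts):
    """Per-frame 'error' images: captured LR minus simulated LR."""
    return [lr - simulate_lr(hr_est, factor, s)
            for lr, s in zip(lr_frames, shifts)]
```

When the HR estimate is consistent with the captured sequence, the error images approach zero; otherwise their contents are mapped to the high resolution grid to correct the estimate.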
Once the enhanced image is obtained it is preferably used to generate simulated LR frames, which are mapped to a new grid which is then compared to the grid generated from the original (captured) LR frames (first grid above). These steps are repeated multiple times in order to arrive at a final enhanced image (e.g. a high definition image in the case of an SD starting image). The present approach differs from prior art super resolution approaches. Whereas prior art approaches maintain (store) all the images of a captured sequence and compare captured and simulated low-resolution images, the present approach maps low resolution images into a high resolution grid, maps simulated low resolution images to another grid and identifies the differences between these grids. This obviates the need to maintain (store) captured LR images in memory and thus reduces the memory requirement.
It will be appreciated that the present approach reduces the memory requirements regardless of the number of LR frames used in the captured sequence. Such a reduction in memory requirements is possible because the present approach does not need to store captured frames in memory, needing only 2 X HR_width X HR_height memory space instead of the (HR_width X HR_height) + sequence_length X LR_width X LR_height stored by prior art approaches.
Thus, in the case of 30 captured LR frames of 640X480 pixels each, 4X scaling requires:
mem_size = (30 X 640 X 480) + 4 X (640 X 480) = 10444800 pixels stored by prior art approaches, while with the present approach, it only requires:
mem_size = 2 X 4 X(640 X 480) = 2457600 pixels stored, i.e. less than a quarter of the memory usage.
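The arithmetic above can be checked directly (pixel counts only; as the figures imply, "4X" here denotes a 4-fold increase in pixel count):

```python
# Memory footprint comparison for 30 captured 640x480 LR frames at 4X scaling.
n_frames, lr_w, lr_h, sr_factor = 30, 640, 480, 4

# Prior art: the whole LR sequence plus one HR-sized frame.
prior_art = n_frames * lr_w * lr_h + sr_factor * lr_w * lr_h

# Present approach: two HR-sized grids only.
present = 2 * sr_factor * lr_w * lr_h

print(prior_art)            # 10444800
print(present)              # 2457600
print(present / prior_art)  # 0.235..., i.e. less than a quarter
```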
Figure 1 is a flow chart illustrating the basic steps of the present methodology. A more detailed description of the present methodology is provided in Examples 1 and 2 of the Examples section which follows.
The present methodology can be used in enhancing digital zooming, of, for example, a digital camera or a digital microscope; in providing high definition stills; or for enhancing images by anti-aliasing or reduction of noise. It can be used to enhance any image of any type and content including real and animated images (e.g. CGI or 3D game images).
Example 2 of the Examples section which follows exemplifies use of the present method in enhancing digital zooming of a standard definition still image.
The present methodology can be implemented using a software application executed by a processing unit of a device, such as a personal computer, a server, a work station and the like; a handheld device such as a mobile phone or a PDA, a tablet, a digital stills or video camera and the like; or via a dedicated unit (chip) configured for such purposes and utilized by any of the aforementioned devices.
Thus, according to another aspect of the present invention there is provided a system for enhancing a low resolution image.
The system includes a processing unit capable of executing the methodology described herein. Exemplary systems include desktop and mobile computers, digital cameras, gaming devices (e.g. Sony PSP™), mobile phones, TV displays, and the like. As is mentioned hereinabove, the processing unit can be a dedicated processing unit, in which case an ASIC or an FPGA is used to implement the method, or it can be any processing unit capable of running dedicated software designed for executing the above described methodology.
In any case, the system is designed to process and enhance images captured by a camera of the system or for enhancing images (still and video) stored on or streamed to the system.
As used herein the term "about" refers to ± 10 %.
Additional objects, advantages, and novel features of the present invention will become apparent to one ordinarily skilled in the art upon examination of the following examples, which are not intended to be limiting. Additionally, each of the various embodiments and aspects of the present invention as delineated hereinabove and as claimed in the claims section below finds experimental support in the following examples.

EXAMPLES
Reference is now made to the following examples, which together with the above descriptions, illustrate the invention in a non limiting fashion.
EXAMPLE 1
Super-Resolution algorithm
The following describes an algorithm for multi-frame super resolution suitable for use in low memory digital still applications. Prior art super-resolution approaches assume that N low resolution images represent different snapshots of the same scene. The real image that needs to be estimated is a single high resolution image X of size Wsr x Hsr. Thus, each low resolution (LR) image I1, I2, ..., IN is treated as a down-sampled, degraded version of the real-life, actual image, with a size Wlr x Hlr, where Wlr < Wsr and Hlr < Hsr.
Thus, prior art approaches create a super-resolution image by solving the following equation:
I1 = D1 · B1 · W1 · X + e1
I2 = D2 · B2 · W2 · X + e2
...
IN = DN · BN · WN · X + eN
where D represents the down sampling matrix, B is the blur matrix and W is the geometric warping matrix.
This can be written in simple form as: I = A·X + e
where Ak = Dk · Bk · Wk for k = 1, 2, ..., N.
The operator A represents all image degradation factors such as sub-sampling, blurring, warping, etc., and e represents the Gaussian noise.
Solving the above for X is an inverse problem. Solving this problem can be done with numerical methods in an iterative approach.
By assuming that R approximates the inverse of matrix A, one can use the iteration:
Xk+1 = Xk + R · (I - A·Xk)
where A is the linear degradation operator, I is the set of low resolution images and Xk is the kth approximation of the final super resolution image.
Because I is a set of low resolution images, prior art approaches require that all images be kept in memory in order to enable iterative processing of these images.
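The update rule above can be made concrete with a toy example (an illustration of the general iteration, not the patent's implementation): here A down-samples a 1-D "image" by pairwise averaging, and R is a crude approximate inverse that up-samples by replication.

```python
import numpy as np

# Toy illustration of the iterative update Xk+1 = Xk + R(I - A*Xk).
def A(x):
    """Degradation operator: 2x down-sampling by pairwise averaging."""
    return x.reshape(-1, 2).mean(axis=1)

def R(i):
    """Approximate inverse of A: 2x up-sampling by replication."""
    return np.repeat(i, 2)

i_lr = np.array([1.0, 4.0, 2.0, 3.0])  # observed low resolution data
x = np.ones(8)                          # deliberately crude initial estimate

for _ in range(3):                      # iterative refinement
    x = x + R(i_lr - A(x))

# After refinement, degrading the estimate reproduces the observation.
print(np.allclose(A(x), i_lr))  # True
```

Because R·A is here a projection, the residual I - A·Xk vanishes after the first pass; with realistic blur and warping operators, more iterations are needed.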
In order to apply this prior art approach to video processing, the following general algorithm is utilized:
(i) create an empty, temporary, high resolution image;
(ii) translate each pixel in every LR image into the HR image (translation and up-scale - R);
(iii) create a set of simulated LR images by down-sampling and shifting an approximation to final HR (operator A on X);
(iv) compare each simulated LR with the original LR and create a difference image (I - A·X);
(v) interpolate the result using a Gaussian filter (any other filter is possible) with sigma equal to half the resolution factor and a kernel size big enough to follow the following guideline: min(kernel_values)/max(kernel_values) <= 0.005;
(vi) add the diff image to a finalHR image (Xk+1 = Xk + ...);
(vii) return to step (i), using the set of difference images instead of the original LR set.
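The kernel sizing guideline of step (v) can be turned into a small calculation (a sketch; the function name is not from the patent): grow the Gaussian kernel radius until the edge-to-center value ratio, which is min/max for a centered Gaussian, drops to 0.005 or below.

```python
import math

# Find the smallest kernel radius satisfying
# min(kernel_values)/max(kernel_values) <= ratio, with sigma equal to
# half the resolution factor as prescribed in step (v).
def kernel_radius(resolution_factor, ratio=0.005):
    sigma = resolution_factor / 2.0
    r = 1
    # For a centered Gaussian, the edge value over the center value is
    # exp(-r^2 / (2 sigma^2)).
    while math.exp(-(r * r) / (2.0 * sigma * sigma)) > ratio:
        r += 1
    return r  # the kernel size is then 2*r + 1 taps

# For a 6x zoom (sigma = 3) the guideline yields a radius of 10,
# i.e. a 21-tap kernel.
print(kernel_radius(6))  # 10
```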
The problem with such an implementation is that it requires the entire set of LR images for processing and thus requires large data storage capabilities.
In order to overcome the memory problem inherent to prior art approaches, the present inventors have devised a novel approach which forgoes storage of the entire LR image set and applies the following calculation:
Xn+1 = Xn + R · ( Σk(Rk · Ik) - Σk(Rk · Ak · Xn) )
where R is the approximation of the inverse of A in the accumulated high resolution space.
The following steps are utilized in order to implement the present approach. Each input LR frame can be a color or a black and white frame; in cases where the input LR frame is in color, each plane is processed separately (optionally in parallel):
(i) each LR frame is up-scaled, translated and accumulated into a sumHRframes variable - as a result, the actual set of low resolution images need not be stored;
(ii) an estimated version of finalHR is then created (by simply resizing the last frame using traditional methods such as bicubic interpolation);
(iii) N simulated LR frames are then generated by down-sampling, shifting and then up-sampling according to the image registration stage;
(iv) all simulated images are then accumulated into a sum_of_simLR;
(v) the difference between the sumHRframes and sum_of_simLR is then identified;
(vi) the difference image is then interpolated by using a Gaussian filter (or any other suitable filter); and
(vii) the result is added to a final HR image. These steps are then repeated for each LR frame. Since comparisons between the simulated and actual LR are performed in the high resolution space, one must compensate for this by increasing the size and sigma of the Gaussian filter to cover more area of the sparser matrix.
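The steps above can be sketched structurally as follows (a simplified interpretation under toy operators, not the patent's code: "up-scale and translate" is replaced by replication and "down-sample and shift" by pairwise averaging, with the hypothetical names upscale/degrade). The key property is that only accumulated high resolution buffers are kept, so each LR frame can be discarded once accumulated.

```python
import numpy as np

def upscale(i):   # stand-in for "up-scale and translate" into HR space
    return np.repeat(i, 2)

def degrade(x):   # stand-in for "down-sample and shift"
    return x.reshape(-1, 2).mean(axis=1)

frames = [np.array([1.0, 4.0, 2.0, 3.0]) for _ in range(5)]  # N LR frames

# (i) accumulate every up-scaled LR frame into sumHRframes; the LR
#     frames themselves no longer need to be stored after this point.
sum_hr_frames = sum(upscale(f) for f in frames)

# (ii) initial estimate of finalHR (deliberately crude here).
final_hr = np.ones(8)

for _ in range(3):
    # (iii)-(iv) accumulate simulated LR frames, mapped back into HR space.
    sum_of_sim_lr = sum(upscale(degrade(final_hr)) for _ in frames)
    # (v)-(vii) difference taken in HR space (optionally filtered), added back.
    diff = sum_hr_frames - sum_of_sim_lr
    final_hr = final_hr + diff / len(frames)

# Degrading the result reproduces the observed LR frames.
print(np.allclose(degrade(final_hr), frames[0]))  # True
```

Note that memory use stays at two HR-sized buffers (sum_hr_frames and sum_of_sim_lr) plus the estimate, independent of N.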
EXAMPLE 2
Super-resolution of a still image
The present approach was utilized to enhance an image captured with a 640X480 CMOS camera (Figure 2).
The size of the sequence of the LR frames needed to create the high resolution image was determined prior to execution.
(i) Each LR frame was registered to the previous captured LR frame and displacement parameters between the two frames were generated. If the images are translated, the result is Di(x) and Di(y) - representing the displacement on the horizontal and vertical planes, respectively, of the i'th frame. A set of displacements, D, was then generated for all input frames.
(ii) The size of the high resolution frame was estimated (at the first iteration it can be created using standard bi-cubic scaling or any other known single frame scaling method) and N simulated low resolution frames were generated by down-sampling, geometric translation (e.g. shifting using -1*Di(x,y) for the i'th frame), adding noise and blurring (this step can also utilize any other anticipated degradation processes, such as artifacts caused by imperfections in the optical parts, e.g. lens, filters, etc.).
The set of simulated low resolution frames was then subtracted from the original low resolution frames. Each resulting difference frame represents the difference between an actual and a simulated low resolution frame. This step enabled prediction of an optimum solution.
(iii) Each difference frame was then mapped to a high resolution grid using upscaling and translation using Di(x,y) for the i'th frame.
(iv) The high resolution grid formed a sparse matrix with accumulated information of all the mapped difference frames. The high resolution grid was then interpolated using a Gaussian filter (with large support and appropriate sigma). The interpolated high resolution grid was then added to a final super resolution image and the resulting image was then used as input for the next iteration of the above described process starting with re-estimation of the high resolution image.
(v) Following completion of the iterative process, the final image was sharpened by interpolating it with a Gaussian filter (with parameters similar to those used in iv) and multiplying it by a sharpening factor. This resulted in a blurred image that was subtracted from the final image: Fsr = Fsr - a(H*Fsr) where * represents convolution and H represents a Gaussian kernel and Fsr is the final super resolved image.
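The registration of step (i) is left open by the description above; phase correlation is one common choice (an assumption here, not the patent's stated method) for recovering the displacement parameters Di(x) and Di(y) when inter-frame motion is a pure translation:

```python
import numpy as np

# Estimate the integer translation between two frames by phase
# correlation: the normalized cross-power spectrum of a shifted pair
# has an inverse FFT that peaks at the displacement.
def phase_correlation(prev, cur):
    F1, F2 = np.fft.fft2(prev), np.fft.fft2(cur)
    cross = np.conj(F1) * F2
    cross /= np.abs(cross) + 1e-12          # keep only phase information
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Map the wrapped peak position to signed displacements.
    if dy > prev.shape[0] // 2:
        dy -= prev.shape[0]
    if dx > prev.shape[1] // 2:
        dx -= prev.shape[1]
    return int(dx), int(dy)                 # Di(x), Di(y)

rng = np.random.default_rng(0)
prev = rng.random((32, 32))
cur = np.roll(prev, shift=(2, -3), axis=(0, 1))  # shifted by dy=2, dx=-3
print(phase_correlation(prev, cur))  # (-3, 2)
```

Sub-pixel displacements, which matter for super-resolution, would require interpolating around the correlation peak; the sketch above recovers integer shifts only.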
In order to further reduce memory requirements, steps (ii)-(iv) were altered as follows. The simulated low resolution frames were mapped into an accumulated high resolution grid via upscaling and translating. The current estimation of the high resolution frame (at first iteration it can be created using standard bi-cubic scaling or any other known single frame scaling method) was then identified and N frames of simulated low resolution frames were generated via down-sampling, geometric translation (e.g. shifting it using -l*Di(x,y) for the i'th frame), adding noise and blurring. Each simulated low resolution image was then mapped to an accumulated high resolution simulation grid.
The simulation grid was then subtracted from the high resolution grid and the result was interpolated using a Gaussian filter (with higher support and higher sigma in order to accommodate working in high resolution space). The interpolated image was then added to the final high resolution image which in turn was used as an input for the next iteration.
The above described alteration to the present process reduced memory use to twice the size of the final, super resolved image regardless of the size of the input sequence (the memory requirements are the same whether N=15 frames or N=100 frames).
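The sharpening formula of step (v), Fsr = Fsr - a(H*Fsr), can be sketched on a 1-D signal (an illustration only; the patent applies a 2-D Gaussian kernel, and the parameter values below are assumptions):

```python
import numpy as np

def gaussian_kernel(radius, sigma):
    """Normalized 1-D Gaussian kernel H."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x * x) / (2.0 * sigma * sigma))
    return k / k.sum()

def sharpen(f, a=0.5, radius=3, sigma=1.5):
    """Fsr = Fsr - a*(H * Fsr), where * is convolution."""
    h = gaussian_kernel(radius, sigma)
    blurred = np.convolve(f, h, mode="same")
    return f - a * blurred

f = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0])
g = sharpen(f)

# Hallmark of this kind of sharpening: overshoot above (1-a)*max on the
# high side of the edge and undershoot below zero on the low side.
print(g.max() > 0.5, g.min() < 0)  # True True
```

Note that the subtraction also scales the flat regions by roughly (1-a); in practice the result is typically renormalized or the factor a kept small.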
Figure 3 illustrates a 6X digital zoom of the image of Figure 2 generated using bicubic interpolation. Figure 4 illustrates a 6X digital zoom of the image of Figure 2 generated using the present approach.
These results clearly show that the present approach results in enhanced resolution/definition, a reduction in moire patterns and other aliasing effects, and an improved signal to noise ratio (SNR).

It is appreciated that certain features of the invention, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable subcombination.
Although the invention has been described in conjunction with specific embodiments thereof, it is evident that many alternatives, modifications and variations will be apparent to those skilled in the art. Accordingly, it is intended to embrace all such alternatives, modifications and variations that fall within the spirit and broad scope of the appended claims. All publications, patents and patent applications mentioned in this specification are herein incorporated in their entirety by reference into the specification, to the same extent as if each individual publication, patent or patent application was specifically and individually indicated to be incorporated herein by reference. In addition, citation or identification of any reference in this application shall not be construed as an admission that such reference is available as prior art to the present invention.

Claims

WHAT IS CLAIMED IS:
1. A method of enhancing a low resolution image comprising:
(a) obtaining a plurality of low resolution frames corresponding to the low resolution image;
(b) upscaling a frame of said plurality of low resolution frames to generate an upscaled frame;
(c) using said upscaled frame to generate a plurality of simulated low resolution frames;
(d) identifying differences between said plurality of low resolution frames and said plurality of simulated low resolution frames; and
(e) mapping said differences to a high resolution grid and generating an enhanced version of said low resolution image.
2. The method of claim 1, wherein (d) is effected by:
(i) mapping said plurality of low resolution frames to a first grid;
(ii) mapping said plurality of simulated low resolution frames to a second grid; and
(iii) identifying said differences between said first grid and said second grid.
3. The method of claim 1, further comprising interpolating said differences prior to said generating said enhanced version of said low resolution image.
4. The method of claim 3, further comprising using said enhanced version of said low resolution image to repeat (c)-(e).
5. The method of claim 3, wherein said interpolating is effected using a Gaussian filter.
6. The method of claim 1, wherein said plurality of low resolution frames constitute a time sequence of said low resolution frames.
7. The method of claim 6, wherein said frame of said plurality of low resolution frames is a first or last frame of said time sequence.
8. A system for enhancing a low resolution image comprising a computing unit executing the method of claim 1.
9. The system of claim 8, wherein the system is a desktop or laptop computer, a tablet, a handheld device, a digital camera or a dedicated display.
PCT/IL2011/000290 2010-06-17 2011-04-05 System and method for enhancing images WO2011158225A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US35557510P 2010-06-17 2010-06-17
US61/355,575 2010-06-17

Publications (1)

Publication Number Publication Date
WO2011158225A1 true WO2011158225A1 (en) 2011-12-22



