A Method and System for Super Resolution
FIELD OF THE INVENTION
The present invention relates to super resolution enhancement of sampled data.
BACKGROUND OF THE INVENTION
Super-resolution refers to enhancing the resolution of images of scenery or objects acquired by an image capture device, such as a scanner or a CCD video camera, beyond the resolution inherent in the capture device, and beyond other limitations of the capture device. Super-resolution is used primarily to enhance the quality of captured images, and to increase the information content therein.
Many super-resolution techniques used with imaging sensors are based on micro-scanning operations. When using micro-scanning operations, a scene or object being captured is sampled multiple times, each time with a sub-pixel shift applied.
Prior art techniques for super-resolution do not achieve exact super-resolution, but rather perform over-sampling. Exact super-resolution is limited by the fill factor; i.e., the ratio between the area of the region sensitive to radiation and the pitch. The pitch is the area between centers of adjacent detectors in a sensor panel; equivalently, the pitch is the area between adjacent pixels. For example, if the sensitivity to radiation only extends over half the distance separating detectors in each dimension, then the fill factor is 0.5 * 0.5 = 25%. With a 25% fill factor, exact super-resolution using prior art methods can be obtained up to a factor of two in resolution in each dimension, since the regions of sensitivity to radiation are only half the size of the pitch in each dimension, and can thus be shifted by half-pixels in each dimension without overlapping. Additional super-resolution, beyond a factor of two, has been achieved by prior art methods at the expense of decreased contrast.
For a fill factor of 80%, which is typical for image sensing devices, the limitation on exact super-resolution is 1/√0.8 ≈ 1.12. The experimental barrier is approximately at a factor of 1.5, with some decrease in contrast.
Prior art methods for obtaining super resolution from captured images are based on estimating signal distortion. These methods perform optimal estimation of signal distortion from the captured images, using Bayesian techniques and using criteria such as Maximum Entropy. These methods are advantageous when image acquisition is made over a large distance (such as satellite data acquired from outer space), since atmospheric transmission and turbulence have a major impact on limiting the resolution obtained.
One such method is described in P. Cheeseman, B. Kanefsky, R. Kraft, J. Stutz, R. Hanson, "Super-Resolved Surface Reconstruction from Multiple Images," in Maximum Entropy and Bayesian Methods, G. R. Heidbreder (ed.), Kluwer, the Netherlands, 1996, pgs. 293-308. This method is based on inverse graphics theory, and is used for ground modeling from outer space observations. An initial ground model is formed by letting each pixel "vote" on what the corresponding ground position should be, based upon the extent that the corresponding ground position contributes to that pixel. The initial ground model is then used to project what an image should be (i.e., predict each pixel value). The differences between the predicted pixel values and the observed pixel values are used to update the ground model until it cannot be further improved. This procedure produces an increase in both spatial resolution and gray-scale resolution.
Another such method is described in A. Zomet and S. Peleg, "Applying Super-Resolution to Panoramic Mosaics," IEEE Workshop on Applications of Computer Vision, Princeton, Oct 1998. This method attains super resolution using an iterative method for mosaicing. Given a video sequence scanning a static scene, a panoramic image can be constructed whose field of view is the entire scene. Each region in the panorama is covered by many overlapping image segments, and this can be exploited to enhance its resolution.
Another such method is described in Pearson, T. J. and Readhead, A. C. S., "Image Formation by Self-Calibration in Radio Astronomy," Ann. Rev. Astron. Astrophys. 22, 1984, pgs. 97-130. This method uses non-linear techniques, and is based on optimal spectrum estimation.
SUMMARY OF THE INVENTION
The present specification concerns data obtained by sampling a continuous signal, and describes methods and systems for enhancing the resolution of the data. For example, a barcode reader samples a barcode when scanning it, and the present invention can be used to enhance the resolution of the sampled data, thereby providing a better reconstruction of the barcode. For another example, a CCD camera samples an object or scene being viewed, to produce a digital image, and the present invention can be used to enhance the resolution of the digital image, thereby providing a better quality image.
The present invention enhances sampled data by effectively decreasing the sampling period. When used with digital images, the present invention provides sub-pixel accuracy. An original image quantized into pixel area elements can be enhanced using the present invention to a finer granularity quantization with sub-pixel area elements. For example, the enhanced image can have one-eighth pixel granularity.
The present invention overcomes the inability of prior art methods to achieve exact resolution improvement by more than a factor of approximately 1.5. Using the present invention, any desired resolution improvement may be obtained by attaching an appropriately designed mask to a sensor plane of a capture device. There is provided in accordance with a preferred embodiment of the present invention a method for enhancing the resolution of an image sensing device, including the steps of attaching a mask to a panel of detectors in an image sensing device, generating multiple fields of view, the multiple fields of view being related to one another by sub-pixel shifts, acquiring multiple images with the image sensing device from the multiple fields of view, and combining the multiple images into an enhanced image of higher pixel resolution than the pixel resolutions of the multiple images.
There is further provided in accordance with a preferred embodiment of the present invention a system for enhancing the resolution of an image sensing device, including an image sensing device comprising a panel of detectors, a mask attached to said panel of detectors, a motion generator generating multiple fields of view, the multiple fields of view being related to one another by sub-pixel shifts, image acquisition circuitry housed within said image sensing device acquiring multiple images from the multiple fields of view, and a combiner combining the multiple images into an enhanced image of higher pixel resolution than the pixel resolutions of the multiple images.
There is still further provided in accordance with a preferred embodiment of the present invention a method for enhancing the resolution of an image sensing device, including the steps of creating replicas of fields of view using an optical element attached to an image sensing device, acquiring multiple images with the sensing device from the replicas of fields of view, and combining the multiple images into an enhanced image of higher pixel resolution than the pixel resolutions of the multiple images.
There is additionally provided in accordance with a preferred embodiment of the present invention a system for enhancing the resolution of an image sensing device, including an image sensing device, an optical element attached to the image sensing device, the optical element being such as to create replicas of fields of view, image acquisition circuitry housed within the image sensing device acquiring multiple images from the replicas of fields of view, and a combiner combining the multiple images into an enhanced image of higher pixel resolution than the pixel resolutions of the multiple images.
BRIEF DESCRIPTION OF THE DRAWINGS
The present invention will be more fully understood and appreciated from the following detailed description, taken in conjunction with the drawings in which:
Figure 1 is an illustration of a one-dimensional cross-section of a prior art image sensing device, such as a CCD camera or a scanner;
Figure 2 is a prior art illustration of a pixel sensitivity (apodization) function, for modeling averaged light intensity arriving at a detector;
Figures 3A - 3C are illustrations of the results of computer simulations of a preferred embodiment of the present invention, as applied to CCD cameras;
Figures 4A - 4D are further illustrations of the results of computer simulations of a preferred embodiment of the present invention, as applied to CCD cameras;
Figures 5A - 5C are illustrations of the results of computer simulations of a preferred embodiment of the present invention, as applied to barcode readers;
Figure 6 is an illustration of three samplings of an object by a sensing device, separated by sub-pixel shifts, in accordance with a preferred embodiment of the present invention;
Figure 7 is a simplified illustration of a mask attached to a sensor panel of array detectors in an image sensing device, in accordance with a preferred embodiment of the present invention;
Figure 8 illustrates the shape of a mask in accordance with a preferred embodiment of the present invention;
Figures 9A - 9J illustrate the use of a mask in obtaining super-resolution, in accordance with a preferred embodiment of the present invention;
Figure 10 illustrates an enhanced image obtained by applying super-resolution to the low resolution CCD captured image illustrated in Figure 3B, without the use of a mask; and
Figure 11 is a simplified illustration of an alternating ("checkerboard") mask attached to a sensor panel of array detectors in an image sensing device, in accordance with a preferred embodiment of the present invention.
DETAILED DESCRIPTION OF A PREFERRED EMBODIMENT
The present invention concerns digital images captured by digital image acquisition devices, such as digital cameras and scanners, and provides a method and system for obtaining super-resolution imagery. Super-resolution refers to the ability to obtain a resolution of a digital image that is greater than the native resolution of the image acquisition device. The present invention uses multiple low resolution captures of a picture to obtain a high resolution digital image. For example, if 64 images of an object are captured by a CCD camera having a spatial resolution of 32 x 32 pixels, the present invention can be used to integrate the captured data and provide a digital image of the object at a spatial resolution of 256 x 256 pixels. The present invention makes efficient use of captured image data, in that there is a direct one-to-one relationship between the number of images captured and the ratio of resolution enhancement. That is, if K images are captured, then the present invention enhances the resolution of the capture device by a factor of K.
Additionally, the present invention applies to acquisition devices regardless of their dynamic color range. In the example described above, the CCD camera can have a color depth as low as 1 bit per pixel, or of 8 bits per pixel and higher, and the present invention remains applicable.
As used in the present specification in reference to digital images, the terms "low resolution" and "high resolution" are intended to be relative terms, used to indicate an advantage of the present invention in providing digital images of an input object at a higher resolution than those produced by an image acquisition device.
Reference is now made to Figure 1, which illustrates a one-dimensional cross-section of a prior art image sensing device, such as a CCD camera or a scanner. An image sensing device 100 consists of an array of detectors 110 in a sensing panel (not shown), each of which measures a light intensity of a scene 120 at a specific pixel location. Light rays 130 emanating from scene 120 are diffracted by a lens 140, and arrive at detectors 110.
The array of detectors 110 in the sensing panel is two-dimensional, and the one-dimensional array of detectors illustrated in Figure 1 is but a single cross-section of the two-dimensional array. Similarly, Figure 1 illustrates a single cross-section of scene 120. The direction of the one-dimensional array of detectors is denoted by a variable x, and the light intensity of the scene is correspondingly denoted by u(x). The variable x is a continuous variable, and scene 120 corresponds to a continuous signal. The array of detectors 110 samples scene 120, and produces a discrete digital image, comprising pixel locations and pixel color intensities. The spacing between detectors corresponds to the pixel size for the sampled digital image, and is referred to as the "pitch."
Reference is now made to Figure 2, which illustrates a one-dimensional cross-section of a prior art pixel sensitivity (apodization) function, for modeling averaged light intensity arriving at a detector. The measured light intensity at each detector 110 (Figure 1) is not a measure of the light intensity at a single point location of scene 120. Instead, it is an average intensity, averaged by a pixel sensitivity function g(x), or apodization. The pixel sensitivity function is illustrated as a one-dimensional function g(x) in Figure 2, but g(x) is only a single cross-section of a two-dimensional pixel sensitivity function g(x, y).
The support of the two-dimensional pixel sensitivity function corresponds to the region that is sensitive to radiation. This region typically extends across an area that is contained within the area between neighboring pixel centers. The ratio between the area of the two-dimensional support of the pixel sensitivity function and the area between pixel centers containing it is referred to as the "fill factor."
It is the averaging with g(x) that gives rise to complications in obtaining exact super-resolution in prior art methods.
Reference is now made to Figures 3A - 3C, which illustrate the results of computer simulations of a preferred embodiment of the present invention, as applied to CCD cameras. Figure 3A shows an input picture with spatial resolution of 256 x 256 pixels. Figure 3B shows an image of the input picture, captured by a CCD camera having a spatial resolution of 32 x 32 pixels, and 8 bits per pixel color depth. Figure 3C shows the reconstructed image obtained by the present invention by using 64 pictures like the one illustrated in Figure 3B. It can be seen that substantially all of the spatial details contained within the input picture are reconstructed in the image of Figure 3C.
Reference is now made to Figures 4A - 4D, which further illustrate the results of computer simulations of a preferred embodiment of the present invention, as applied to CCD cameras. Figure 4A shows a low resolution image captured from the input picture illustrated in Figure 3A, using a CCD camera having a spatial resolution of 32 x 32 pixels and a color depth of 1 bit per pixel. It can be seen that the image shown in Figure 4A is a black and white image. Figure 4B shows the reconstructed image obtained by the present invention by using 64 pictures like the one illustrated in Figure 4A. It can be seen
that although Figure 4B is not identical to Figure 3A, it nevertheless is a much better approximation than is Figure 4A, both in terms of spatial resolution and in terms of dynamic color range.
Similarly, Figure 4C shows a low resolution image captured from the input picture illustrated in Figure 3A, using a CCD camera having a spatial resolution of 32 x 32 pixels and a color depth of 2 bits per pixel. It can be seen that the image shown in Figure 4C is comprised of four colors. Figure 4D shows the reconstructed image obtained by the present invention, by using 64 pictures like the one illustrated in Figure 4C. It can be seen that Figure 4D is almost identical to Figure 3A.
When the present invention was applied to a CCD camera having a spatial resolution of 32 x 32 pixels and a color depth of 4 bits per pixel (not shown), the reconstructed image (not shown) was indistinguishable from the original.
Reference is now made to Figures 5A - 5C, which illustrate the results of computer simulations of a preferred embodiment of the present invention, as applied to barcode readers. Shown in Figure 5A is an original barcode. Shown in Figure 5B is the image captured by a barcode reader. As a result of the limited resolving capability of the barcode reader, the captured image shown in Figure 5B is substantially different from the barcode shown in Figure 5A. Shown in Figure 5C is a reconstructed image of the barcode using the present invention, based on eight vertical replications. A comparison between Figure 5A and Figure 5C indicates that the present invention successfully overcomes the limitations of conventional barcode readers in resolving scanned barcodes.
The present specification describes the process for acquiring multiple low resolution images from a single picture, and the manner in which the multiple low resolution images are combined to generate a high resolution image.
Conventions and Notation
In the ensuing description it is assumed that a sensing device samples an object being acquired in lines, such as horizontal lines, using N pixels per line. The present invention is preferably applied separately in two dimensions, such as in a horizontal dimension and a vertical dimension, to increase the pixel resolutions in each of the dimensions. Since the applications of the present invention in each of the dimensions are similar, for the sake of clarity and definiteness the present specification describes a preferred embodiment of the present invention in a single dimension, such as a horizontal dimension. In particular, rather than use two coordinates (x, y) in the discussion below, a single x coordinate is used, and rather than work with double integrals and double sums, single integrals and single sums are used. Similarly, two-dimensional doubly periodic masking of a two-dimensional array of detectors in a sensing panel is described hereinbelow using a one-dimensional periodic masking function m(x).
When the sensing device samples a line, the detected energy of the n-th pixel is given by

$$u[n] = \int u(x + n\,\Delta x)\, g(x)\, dx, \qquad n = 0, 1, \ldots, N-1, \tag{1}$$

where u(x) denotes the captured object along a specific line, Δx denotes the pixel width and g(x) denotes the pixel sensitivity (i.e., apodization).
Multiple Image Acquisition
In a preferred embodiment of the present invention, the sensing device samples a line K times, each time with a shift in the captured object relative to the capture device by an amount of Δx/K. Letting u_k[n] denote the detected energy of the n-th pixel in the k-th sampling, Equation 1 generalizes to

$$u_k[n] = \int u\!\left(x + n\,\Delta x + k\,\frac{\Delta x}{K}\right) g(x)\, dx, \qquad n = 0, 1, \ldots, N-1;\; k = 0, 1, \ldots, K-1. \tag{2}$$

Together, the various samples u_k[n] provide KN samples given by

$$y[n] = \int u\!\left(x + n\,\frac{\Delta x}{K}\right) g(x)\, dx, \qquad n = 0, 1, \ldots, KN-1. \tag{3}$$
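The acquisition model of Equations 1 - 3 can be illustrated numerically. The following sketch (in Python, with an assumed Gaussian apodization and a synthetic object; all names and parameter values are illustrative, not part of the invention) computes the K shifted acquisitions of Equation 2 and interleaves them into the sequence y[n] of Equation 3.

```python
import numpy as np

# Numerical sketch of Equations 1-3: K sub-pixel-shifted acquisitions of
# a continuous object u(x), interleaved into the sequence y[n].
# The object, the Gaussian apodization g(x), and all parameter values
# are illustrative assumptions, not taken from the specification.

N, K = 32, 8          # pixels per line, resolution enhancement factor
dx = 1.0              # pixel pitch (Delta x)

def u(x):
    """Synthetic continuous object intensity along one line."""
    return 1.0 + np.sin(2 * np.pi * x / (4 * dx))

def g(x):
    """Assumed bell-shaped pixel sensitivity (apodization)."""
    return np.exp(-0.5 * (x / (0.3 * dx)) ** 2)

# Quadrature grid over the support of g, for the integral of Eq. 1
xg = np.linspace(-2 * dx, 2 * dx, 256)
w = xg[1] - xg[0]

# Eq. 2: u_k[n] = integral of u(x + n*dx + k*dx/K) g(x) dx
u_k = np.array([[np.sum(u(xg + n * dx + k * dx / K) * g(xg)) * w
                 for n in range(N)]
                for k in range(K)])

# Eq. 3: interleave into y[n], n = 0..KN-1, so that y[n*K + k] = u_k[n]
y = u_k.T.reshape(K * N)
```

The interleaving order, y[nK + k] = u_k[n], places the K sub-pixel-shifted samples of each pixel consecutively, matching the ordering described with reference to Figure 6 below.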
In a preferred embodiment of the present invention, the shifts in successive image acquisitions by Δx/K are implemented by using vibrations of the sensing device. For example, the sensing device can be placed on a vibrating platform. The parameters of the vibrations are estimated by vibration sensors or by use of appropriate algorithms. The vibration parameters are used to synchronize the sampling of the object so that the sampling occurs at times when the vibrated sensing device is located at the appropriate sub-pixel shifts.
In an alternate embodiment of the present invention, an assembly of one or more rotated mirrors can be used to shift successive fields of view of a sensing device by sub-pixel shifts. The rotated mirrors reflect the scene to the sensing device.
In yet another embodiment, the shifts in successive image acquisitions by Δx/K can be implemented using an optical element instead of the imaging lens of the sensing device. The optical element is attached to the aperture of the sensing device, and serves like a grating to create replicated views of an object. Typically, gratings result in replicas, since the Fourier transform of a grating is an impulse train of delta functions. This embodiment, using an optical element to create replicated views, is particularly well suited for objects that are sufficiently small that the replicas do not overlap.
The abovementioned optical element can be implemented, for example, by a multi-facet lens. It can also be designed by an algorithm such as the one described in Z. Zalevsky, D. Mendlovic and A. W. Lohmann, "Gerchberg-Saxton Algorithm Applied in the Fractional Fourier or the Fresnel Domain," Optics Letters 21, 1996, pages 842-844, the contents of which are hereby incorporated by reference.
It should be apparent to those skilled in the art that it is not necessary that the multiple acquired images be separated by identical sub-pixel shifts. The present invention applies to multiple acquired images separated by sub-pixel shifts of arbitrary sizes. Moreover, successive shifts between acquired images do not have to be sub-pixel. The shifts can be larger than one pixel, as long as the multiple acquired images are all at non-integral pixel shifts from one another. Specifically, the relative shifts of the multiple acquired images with reference to a fixed origin can be set to values r_0Δx, r_1Δx, ..., r_{K-1}Δx, as long as the differences r_i - r_j are non-integral for any distinct indices i and j. The case of equal sub-pixel shifts described above corresponds to r_k = k/K.
Reference is now made to Figure 6, which illustrates three samplings of an object by a sensing device, separated by sub-pixel shifts, in accordance with a preferred embodiment of the present invention. The object is indicated by a curve u(x), denoted by reference numeral 610. The curve u(x) denotes the color of the object along a line, as a function of position, x. Also indicated in Figure 6 is a curve g(x) denoting the sensitivity of a pixel, denoted by reference numeral 620. The curve g(x) is a characteristic of the sensing device, and indicates how color values are localized by averaging.
Each row in Figure 6 indicates a sampling. The first row samples the object u(x) at positions indicated by dashed vertical lines 630. The six pixel values are denoted by u_0[0], u_0[1], ..., u_0[5]. The second row samples the object u(x) at shifted positions indicated by dashed vertical lines 640. The six pixel values are denoted by u_1[0], u_1[1], ..., u_1[5]. The third row samples the object u(x) at shifted positions indicated by dashed vertical lines 650. The six pixel values are denoted by u_2[0], u_2[1], ..., u_2[5].
The samplings indicated by the second and third rows are often referred to as "sub-pixel" sampling, since these samplings are centered at fractional pixel locations. Specifically, in Figure 6, the samples in the second row are centered at the one-third pixel locations, and the samples in the third row are centered at the two-thirds pixel locations. The value of K for the geometry illustrated in Figure 6 is K = 3. The three sets of samples together form the eighteen samples y[0] = u_0[0], y[1] = u_1[0], y[2] = u_2[0], y[3] = u_0[1], ..., y[17] = u_2[5], which constitute the full set of samples at all of the one-third pixel locations. These eighteen samples y[n] are used to reconstruct the values u(0), u(Δx/3), u(2Δx/3), ..., u(17Δx/3).
The present invention uses the samples y[n] to approximately reconstruct the values of u(nΔx/K). Determining these reconstructed values achieves super-resolution with a resolution enhancement factor of K, since the sensing device has a resolution of N pixels per line, and the reconstructed signal has KN pixels per line.
In order to describe the reconstruction of the values u(nΔx/K) from the samples y[n], a classical result from signal processing is used, which relates the discrete-time Fourier transform of a sampled analog signal to the continuous-time Fourier transform of the analog signal.
A Bit of History: Frequency Representation of Sampling
In the ensuing description, s(t) is used to denote a continuous time signal, and S(jΩ) is used to denote its continuous-time Fourier transform, defined by:

$$S(j\Omega) = \int_{-\infty}^{\infty} s(t)\, e^{-j\Omega t}\, dt. \tag{4}$$

Periodic sampling of s(t) with a sampling period of T produces a discrete time signal s[n], defined by:

$$s[n] = s(nT). \tag{5}$$

The discrete-time Fourier transform of s[n] is denoted by S(e^{jω}), and is defined by:

$$S(e^{j\omega}) = \sum_{n=-\infty}^{\infty} s[n]\, e^{-j\omega n}. \tag{6}$$
An age-old result, referred to as the Poisson Summation Formula, and one of the most prominent formulas used in discrete signal processing, provides a relationship between the continuous-time Fourier transform S(jΩ) of a continuous time signal and the discrete-time Fourier transform S(e^{jω}) of the sampled signal. Specifically, the Poisson Summation Formula states that

$$\sum_{n=-\infty}^{\infty} s(nT) = \frac{1}{T} \sum_{k=-\infty}^{\infty} S\!\left(j\,\frac{2\pi k}{T}\right). \tag{7}$$

In fact, more generally the Poisson Summation Formula establishes that

$$\sum_{n=-\infty}^{\infty} s(t + nT) = \frac{1}{T} \sum_{k=-\infty}^{\infty} S\!\left(j\,\frac{2\pi k}{T}\right) e^{\,j 2\pi k t / T}, \tag{8}$$

of which Equation 7 above is a special case corresponding to t = 0.
A simple proof of Equation 8, as indicated on page 210 of Rudin, W., "Real and Complex Analysis: Second Edition," McGraw-Hill, 1974, is as follows. The function f(t) defined by

$$f(t) = \sum_{n=-\infty}^{\infty} s(t + nT) \tag{9}$$

is a periodic function with period T. As such, it has a standard Fourier series expansion of the form

$$f(t) = \sum_{k=-\infty}^{\infty} c_k\, e^{\,j 2\pi k t / T}, \tag{10}$$

where the Fourier coefficient c_k is given by

$$c_k = \frac{1}{T} \int_0^T f(t)\, e^{-j 2\pi k t / T}\, dt. \tag{11}$$

Upon substituting Equation 9 into Equation 11 it can be seen that

$$c_k = \frac{1}{T} \int_{-\infty}^{\infty} s(t)\, e^{-j 2\pi k t / T}\, dt = \frac{1}{T}\, S\!\left(j\,\frac{2\pi k}{T}\right), \tag{12}$$

from which Equation 8 follows when Equation 12 is substituted back into Equation 10.
If the function s(t)e^{-jωt/T} is used in Equation 8, instead of the function s(t), one obtains the result

$$\sum_{n=-\infty}^{\infty} s(t + nT)\, e^{-j\omega n} = \frac{1}{T} \sum_{k=-\infty}^{\infty} S\!\left(j\,\frac{\omega - 2\pi k}{T}\right) e^{\,j(\omega - 2\pi k)t/T}. \tag{13}$$

By setting t = 0 in Equation 13 one arrives at the familiar sampling equation

$$S(e^{j\omega}) = \frac{1}{T} \sum_{k=-\infty}^{\infty} S\!\left(j\,\frac{\omega - 2\pi k}{T}\right), \tag{14}$$

which appears as Equation 3.20 in A. V. Oppenheim and R. W. Schafer, "Discrete-Time Signal Processing," Prentice Hall, 1989. Equation 14 indicates that the discrete Fourier transform of a sampled analog signal is comprised of periodically repeated copies of the continuous Fourier transform of the analog signal. The copies are shifted by integer multiples of the sampling frequency and superimposed. Equation 14 is often expressed in terms of delta functions; namely, that the Fourier transform of an impulse train $\sum_{n=-\infty}^{\infty} \delta(t - nT)$ is itself an impulse train. It is noted that Equation 14 can also be derived directly from Equation 7 by using s(t)e^{-jωt/T} instead of s(t).
For values of ω that are small enough to avoid significant aliasing in Equation 14, say |ω/T| < Ω_s, one can approximate the sums on the right hand sides of Equations 13 and 14 by the term with k = 0. When this approximation is made, it in turn leads to the approximation

$$\sum_{n=-\infty}^{\infty} s(t + nT)\, e^{-j\omega n} \approx S(e^{j\omega})\, e^{\,j\omega t/T}, \qquad |\omega/T| < \Omega_s. \tag{15}$$

It is noted that as the sampling period, T, decreases, the approximation in Equation 15 becomes more accurate. The increase in accuracy is due to the fact that the aliasing, or overlap, between the copies on the right hand sides of Equations 13 and 14 decreases, since the shifts 2π/T between copies increase.
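The accuracy of the k = 0 approximation can be checked numerically. The following sketch (an illustrative verification only; T and ω are arbitrary test values) evaluates both sides of Equations 14 and 15 for a Gaussian signal, whose continuous-time Fourier transform is known in closed form.

```python
import numpy as np

# Numerical check of Equations 14-15 for s(t) = exp(-t^2/2), whose
# continuous-time Fourier transform is S(jW) = sqrt(2*pi)*exp(-W^2/2).
# Illustrative sketch; T and omega are arbitrary test values.

T, omega = 0.5, 0.3
n = np.arange(-200, 201)
s = np.exp(-(n * T) ** 2 / 2)                      # samples s(nT), Eq. 5

def S(W):
    return np.sqrt(2 * np.pi) * np.exp(-W ** 2 / 2)

lhs = np.sum(s * np.exp(-1j * omega * n))          # S(e^{j*omega}), Eq. 6
k = np.arange(-5, 6)
rhs = np.sum(S((omega - 2 * np.pi * k) / T)) / T   # Eq. 14, truncated sum
print(abs(lhs - rhs))                 # ~0: Eq. 14 holds
print(abs(lhs - S(omega / T) / T))    # small: k = 0 term dominates (Eq. 15)
```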
The present invention uses a version of Equation 15 to obtain super-resolution. Specifically, instead of sampling s[n] = s(nT) as in Equation 5, the sampling is done through an integral

$$y[n] = \int s(t + nT)\, g(t)\, dt, \tag{16}$$

where g(t) represents the sensitivity, or apodization, of a pixel. Multiplying both sides of Equation 15 by g(t) and integrating results in

$$Y(e^{j\omega}) \approx S(e^{j\omega})\, G\!\left(-j\,\frac{\omega}{T}\right), \qquad |\omega/T| < \Omega_s, \tag{17}$$

which is the form used in the present invention.
Referring back to Equation 3, it follows from Equation 17, upon replacing T by Δx/K, that

$$U(e^{j\omega}) \approx \frac{Y(e^{j\omega})}{G\!\left(-j\,\dfrac{\omega K}{\Delta x}\right)}, \qquad \left|\frac{\omega K}{\Delta x}\right| < \Omega_s, \tag{18}$$

where U(e^{jω}) is the discrete Fourier transform of the samples u(nΔx/K) sought to be reconstructed; namely,

$$U(e^{j\omega}) = \sum_{n=-\infty}^{\infty} u\!\left(n\,\frac{\Delta x}{K}\right) e^{-j\omega n}. \tag{19}$$

The Fourier transform Y(e^{jω}) is known from the samples y[n] in Equation 3, and the Fourier transform G(-jωK/Δx) is determined from knowledge of the sensitivity function g(x). It is noted that as K increases, the approximation in Equation 18 becomes more accurate.
Regarding the Fourier transform Y(e^{jω}), the values of y[n] set forth in the equations above are only defined for indices n = 0, 1, ..., KN-1. For indices n outside of this range, y[n] is preferably defined to be zero. With such a convention, the Fourier transform Y(e^{jω}) is given by the finite sum

$$Y(e^{j\omega}) = \sum_{n=0}^{KN-1} y[n]\, e^{-j\omega n}. \tag{20}$$
In a preferred embodiment, the present invention uses Equation 18 to determine the Fourier transform U(e^{jω}), and then reconstructs the values u(nΔx/K), n = 0, 1, ..., KN-1, from U(e^{jω}) by using an inverse discrete Fourier transform. Preferably a Fast Fourier Transform (FFT), as is familiar to those skilled in the art, is used to invert the spectral values U(e^{jω}). The reconstruction is approximate, since Equation 18 is itself only an approximation. Since it is necessary to reconstruct KN values of u(x), it is accordingly necessary to apply Equation 18 at KN frequencies that are spaced 2π/KN apart one from the next, since the Fourier transform U(e^{jω}) is periodic with period 2π. In a preferred embodiment of the present invention, Equation 18 is applied at the KN frequencies

$$\omega = \frac{2\pi k}{KN}, \qquad k = 0, 1, \ldots, KN-1. \tag{21}$$
When the Fourier transform U(e^{jω}) is inverted, it is possible that values of u(nΔx/K) are non-zero outside of the index range n = 0, 1, ..., KN-1. As it is only the values of u(nΔx/K) within the index range n = 0, 1, ..., KN-1 that are of interest, the values outside this index range can be ignored.
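The reconstruction of Equations 18 - 21 amounts to one forward FFT, a pointwise division by the transform of the sensitivity function, and one inverse FFT. The following sketch implements this pipeline under the assumption that G has no zeros at the frequencies of Equation 21; the function and variable names are illustrative, and g, dx, K are as in the acquisition sketch above.

```python
import numpy as np

# Sketch of Equations 18-21: recover u(n*dx/K), n = 0..KN-1, from the
# interleaved samples y[n].  g is the pixel sensitivity function, dx the
# pixel pitch, K the enhancement factor; all names are illustrative.

def reconstruct(y, g, dx, K):
    M = len(y)                                   # M = K*N
    omega = 2 * np.pi * np.arange(M) / M         # the KN frequencies, Eq. 21
    Y = np.fft.fft(y)                            # Eq. 20 at those frequencies

    # G(-j*omega*K/dx) by direct numerical integration of g(x);
    # map the upper half of [0, 2*pi) to negative frequencies.
    xg = np.linspace(-2 * dx, 2 * dx, 1024)
    w = xg[1] - xg[0]
    Wc = np.where(omega <= np.pi, omega, omega - 2 * np.pi) * K / dx
    G = np.array([np.sum(g(xg) * np.exp(1j * Wi * xg)) * w for Wi in Wc])

    U = Y / G            # Eq. 18; requires G to have no zeros here
    return np.real(np.fft.ifft(U))               # invert U(e^{j*omega})
```

Applied to the interleaved sequence y of the acquisition sketch above, reconstruct(y, g, dx, K) returns KN values approximating u at spacing Δx/K.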
An apparent difficulty arises in implementing Equation 18 if there are frequencies ω, as given in Equation 21, at which G(-jωK/Δx) = 0, since this would entail division by zero at such frequencies.
The function G(-jΩ) is typically a real-valued function, since the pixel sensitivity function g(x) is typically symmetric about x = 0. Since a pixel is typically more sensitive to light in its central area and less sensitive in its outer regions, the pixel sensitivity function g(x) is typically "bell-shaped," similar to Gaussian functions. In case g(x) happens to be a Gaussian function, its Fourier transform has no zeros, and no divisions by zero are encountered in implementing Equation 18. However, for other sensitivity functions, such a difficulty can arise.
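To make the Gaussian case concrete (σ below is an illustrative width parameter, not a value fixed by the invention), the Fourier transform of a Gaussian is itself a Gaussian, and hence strictly positive at every frequency:

$$g(x) = e^{-x^2/2\sigma^2} \quad\Longrightarrow\quad G(j\Omega) = \sigma\sqrt{2\pi}\; e^{-\sigma^2\Omega^2/2} > 0 \quad \text{for all } \Omega.$$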
The present invention avoids such a difficulty by using specially constructed masks, as described hereinbelow.
Masking
In order to avoid division by zero in Equation 18, the present invention preferably employs an optical device to remove zeros of G(-jωK/Δx) that occur at frequencies less than Ω_s. The frequency Ω_s is the maximal frequency in the reconstructed data.
The present invention preferably attaches a mask in the form of a fine transmission grating with a period of Δx to the sensing device panel. As mentioned hereinabove, the mask under discussion is really a two-dimensional doubly periodic mask that covers a two-dimensional array of detectors. For the sake of clarity and definiteness it is described as a one-dimensional periodic mask, with the understanding that the one-dimensional analysis presented herein applies to each of the two dimensions of the sensing device panel.
Reference is now made to Figure 7, which is a simplified illustration of a mask attached to a sensor panel of array detectors in an image sensing device, in accordance with a preferred embodiment of the present invention. A sensor panel 700 in an image sensing device contains a two-dimensional array of detectors 710. Detectors 710 are depicted as individual cells within sensor panel 700. The array of detectors is periodic, with periodic spacings Δx and Δy in the horizontal and vertical dimensions, respectively. The spacings Δx and Δy correspond to the pixel spacings in the sampled digital image obtained from the image sensing device. A masking pattern 720 in the form of a fine sub-pixel transmission grating is used to generate a doubly periodic mask, with respective periods Δx and Δy in the horizontal and vertical dimensions. In a preferred embodiment of the present invention, the doubly periodic mask is attached to sensor panel 700. Each detector 710 is masked by the same masking pattern 720.
Use of a mask decreases the effective size of each pixel so that the Fourier transform G(jΩ) of its sensitivity function has a wider band, and its zeros accordingly move to higher frequencies. The apodized pixel, determined by the transmission pattern of the grating, is preferably designed in an iterative manner, as described hereinbelow, so that the bandwidth of the Fourier transform G(jΩ) of its sensitivity function is wider than the maximal frequency to be reconstructed. Use of a transmission grating has the disadvantage of decreasing the energy sensed at each pixel, since it blocks out some of the illumination.
A mask is described mathematically by a spatial function m(x), preferably taking values 0 and 1. The effect of a mask is to modify the pixel sensitivity function from g(x) to p(x) = g(x)m(x). An objective of choosing a suitable mask is to obtain a function p(x) whose Fourier transform does not have zeroes at frequencies between 0 and Ω_s, so that when the Fourier transform of the modified pixel sensitivity function p(x) is used in Equation 18 instead of G(-jωK/Δx), there is no division by zero.
A rationale for using a mask to avoid zeroes of G is that typically an effect of a mask is to add frequencies that may not have been present beforehand.
To construct a mask as desired, the present invention preferably uses a grating in the form of a plurality of narrow slits:

$$m(x) = 1 - \sum_{i=1}^{M} \operatorname{rect}\!\left(\frac{x}{\delta x_i}\right) \otimes \delta(x - x_i), \tag{22}$$

where M is the total number of slits, x_1, x_2, ..., x_M are the locations of the slits and δx_i is the width of the i-th slit. The function rect() is given by

$$\operatorname{rect}(z) = \begin{cases} 1, & -1/2 \le z \le 1/2, \\ 0, & \text{otherwise}, \end{cases} \tag{23}$$

and ⊗ denotes convolution.
Reference is now made to Figure 8, which illustrates the shape of a mask in accordance with Equation 22. Such a mask consists of M slits 810, centered at positions x_1, x_2, ..., x_M, each having respective width δx_i.
Preferably, the value of M is a number between zero and K, and the values δx_i are each approximately equal to the sub-pixel size Δx/K. As can be seen in Figure 8, the values of the mask m(x) are 0 inside the slits and 1 outside the slits. The total width of the mask is Δx; namely, the width of a pixel. The mask is intended to affect the shape of g(x) at sub-pixel granularity. In practice, the granularity is discretized to Δx/K; namely, the sub-pixel shift between successive images acquired by the image sensing device. For example, referring back to Figure 3C, the mask used in reconstructing the image shown therein is [0 1 0 1 1 0 0 0], each of the eight entries in this array representing a sub-pixel size of one eighth of a pixel. Thus the function m(x) for the mask used in Figure 3C is given for 0 ≤ x < Δx by

$$m(x) = \begin{cases} 0, & 0 \le x < \Delta x/8, \\ 1, & \Delta x/8 \le x < 2\Delta x/8, \\ 0, & 2\Delta x/8 \le x < 3\Delta x/8, \\ 1, & 3\Delta x/8 \le x < 5\Delta x/8, \\ 0, & 5\Delta x/8 \le x < \Delta x. \end{cases} \tag{24}$$
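The slit description of Equation 22 and the binary-array description of Equation 24 describe the same mask in two ways. The following sketch (Python; the names and the discretization into K bins are illustrative assumptions) evaluates the Figure 3C pattern [0 1 0 1 1 0 0 0] as a function m(x) and recovers its slit centers x_i and widths δx_i.

```python
import numpy as np

# Sketch of Equations 22-24: the eight-bin mask of Figure 3C, evaluated
# as a periodic function m(x) and re-expressed as slits (center, width).
# The discretization into K bins of width dx/K is an assumption.

dx, K = 1.0, 8
pattern = np.array([0, 1, 0, 1, 1, 0, 0, 0])   # Eq. 24: 0 = blocked, 1 = open

def m(x):
    """Mask value at position x (periodic with period dx)."""
    bins = np.floor(np.mod(x, dx) / dx * K).astype(int)
    return pattern[bins]

# Recover the slit representation of Eq. 22: each run of zeros is one
# slit with center x_i and width delta_x_i.
slits, start = [], None
for i, v in enumerate(list(pattern) + [1]):     # sentinel closes a final run
    if v == 0 and start is None:
        start = i
    elif v == 1 and start is not None:
        slits.append(((start + i) / 2 * dx / K, (i - start) * dx / K))
        start = None
print(slits)   # three slits: centers 0.0625, 0.3125, 0.8125
```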
The sensitivity function of each masked pixel is given by

$$p(x) = g(x)\, m(x), \tag{25}$$

and it can be seen that the Fourier transform of p(x) is given by

$$P(j\Omega) = P(j\Omega;\, x_1, \ldots, x_M;\, \delta x_1, \ldots, \delta x_M) = G(j\Omega) - \frac{1}{2\pi}\, G(j\Omega) \otimes \sum_{i=1}^{M} \delta x_i\, \operatorname{sinc}\!\left(\frac{\Omega\,\delta x_i}{2}\right) e^{-j\Omega x_i}, \tag{26}$$

where sinc(z) denotes the function sinc(z) = sin(z)/z.
Preferably, to determine an optimal mask, a predetermined value of M is selected, and a search is made for position values x_1, x_2, ..., x_M and widths δx_1, δx_2, ..., δx_M for which |P(jΩ)| is bounded away from zero as much as possible. Specifically, define

$$\sigma(x_1, \ldots, x_M;\, \delta x_1, \ldots, \delta x_M) = \min_{0 \le \Omega \le \Omega_s} \left| P(j\Omega;\, x_1, \ldots, x_M;\, \delta x_1, \ldots, \delta x_M) \right|, \tag{27}$$

where Ω_s denotes the frequency up to which no significant aliasing occurs, as defined above. Then optimal values for x_1, ..., x_M and δx_1, ..., δx_M are preferably determined by

$$(x_1, \ldots, x_M;\, \delta x_1, \ldots, \delta x_M) = \operatorname{arg\,max}\; \sigma(x_1, \ldots, x_M;\, \delta x_1, \ldots, \delta x_M), \tag{28}$$

where arg-max denotes the arguments x_1, ..., x_M and δx_1, ..., δx_M which maximize the function σ.
The search range over which optimal values of x_1, ..., x_M and δx_1, ..., δx_M are sought is such that the total δx_1 + δx_2 + ... + δx_M of all the widths does not exceed the pixel width Δx. Additionally, each individual width δx_i must not be less than 2λ, where λ is the wavelength of the incoming light; otherwise, the light will not be affected by the mask. Moreover, construction of a mask and attaching it to the detectors may further limit the widths δx_i to exceed a minimum value of δx_min.
From energy considerations, it is clear that on the one hand, it is desirable that each width δx_i be as small as possible, since δx_i is proportional to the energy blocked by the mask. On the other hand, this must be traded off against the condition that |P(jΩ)| be bounded away from zero.

Specifically, if E_t denotes the total light energy for a typical viewing target, when viewed without a mask, then the energy detected when a mask is present is given by

$$E = E_t \left( 1 - \frac{\sum_{i=1}^{M} \delta x_i}{\Delta x} \right), \tag{29}$$

since the energy blocked by the slits of the mask is proportional to the total width of the slits combined.

Thus for a given minimum energy E_min, required to activate the detectors, the widths δx_i are constrained to satisfy

$$\sum_{i=1}^{M} \delta x_i \le \Delta x \left( 1 - \frac{E_{min}}{E_t} \right). \tag{30}$$

Condition (30), together with the condition that δx_i ≥ δx_min for i = 1, 2, ..., M, determines the search range for widths δx_1, δx_2, ..., δx_M over which a maximum of σ(x_1, ..., x_M; δx_1, ..., δx_M) is sought.
When values of x_1, ..., x_M and δx_1, ..., δx_M are determined according to Equation 28, Equation 22 is used to generate an optimal mask.
In an alternative embodiment of the present invention, rather than maximize σ(x_1, ..., x_M; δx_1, ..., δx_M), the values of x_1, ..., x_M and δx_1, ..., δx_M are determined by maximizing a weighted function w(x_1, ..., x_M; δx_1, ..., δx_M)·σ(x_1, ..., x_M; δx_1, ..., δx_M), where w is an appropriate weight function. For example, w can put higher weight on smaller values of the widths δx_i, so that preference is given to masks having slits with smaller widths.
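A brute-force search over the constrained range, per Equations 27 - 30, can be sketched as follows; the Gaussian g(x), the grid sizes, and all numerical values are illustrative assumptions, and a practical design would use a more refined optimizer.

```python
import numpy as np

# Sketch of the mask search of Equations 27-30: random search over slit
# centers and widths, maximizing sigma = min |P(jW)| on [0, W_s] subject
# to the total-width (energy, Eq. 30) and minimum-width constraints.

rng = np.random.default_rng(0)
dx, M = 1.0, 3                   # pixel width, number of slits
W_s = 40.0                       # maximal frequency to be reconstructed
dx_min = 0.02                    # max(2*lambda, manufacturing minimum)
budget = 0.5 * dx                # dx * (1 - E_min / E_t), Eq. 30
Wgrid = np.linspace(0.0, W_s, 400)

xg = np.linspace(0.0, dx, 512)
gx = np.exp(-0.5 * ((xg - dx / 2) / (0.25 * dx)) ** 2)   # assumed g(x)
step = xg[1] - xg[0]

def sigma(centers, widths):
    """Eq. 27: worst-case |P(jW)| over the band, with p(x) = g(x)m(x)."""
    mask = np.ones_like(xg)
    for c, s in zip(centers, widths):
        mask[(xg >= c - s / 2) & (xg < c + s / 2)] = 0.0
    p = gx * mask
    P = np.array([np.sum(p * np.exp(-1j * W * xg)) * step for W in Wgrid])
    return np.min(np.abs(P))

best, best_val = None, -1.0
for _ in range(2000):            # crude random stand-in for Eq. 28
    widths = rng.uniform(dx_min, budget / M, M)   # sum stays within budget
    centers = np.sort(rng.uniform(0.1 * dx, 0.9 * dx, M))
    val = sigma(centers, widths)
    if val > best_val:
        best, best_val = (centers, widths), val
print(best_val, best)
```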
Reference is now made to Figures 9A - 9J, which illustrate the use of a mask in obtaining super-resolution, in accordance with a preferred embodiment of the present invention. Figure 9A illustrates an input object, such as a barcode, represented as a function u(x). Figure 9B illustrates the Fourier transform of u(x). Figure 9C illustrates a pixel sensitivity function g(x). Figure 9D illustrates the Fourier transform of g(x).
Figure 9E illustrates conventional sampling of u(x) as a digital signal, without using super resolution enhancement. Figure 9F illustrates the Fourier transform of the digital signal of Figure 9E. Figure 9G illustrates sampling of u(x) as a digital signal with a resolution eight times higher than the sampling illustrated in Figure 9E. Figure 9H illustrates the Fourier transform of
the digital signal of Figure 9G. Figure 9I illustrates a mask used in accordance with a preferred embodiment of the present invention.
Figure 9J illustrates the Fourier transform of a masked pixel, using the mask of Figure 9I. By comparing Figure 9D with Figure 9J it can be seen that the spectrum of the masked pixel does not contain zeros within the region displayed. It can further be seen that an effect of the masking is to expand the spectrum of g(x).
Reference is now made to Figure 10, which illustrates an enhanced image obtained by applying super-resolution to the low resolution CCD captured image illustrated in Figure 3B, without the use of a mask. As can be seen in Figure 10, artifacts in the form of vertically oriented white stripes arise due to the presence of zeroes of G(-jωK/Δx). When a mask is applied in accordance with a preferred embodiment of the present invention, the zeroes of G(-jωK/Δx) are removed, and the result obtained is the image shown in Figure 3C. It may thus be appreciated that the use of a mask serves to remove the artifacts indicated in Figure 10.
Improved Approximation
As explained hereinabove, the approximation in Equation 18 is based on the assumption that ω is small enough to avoid aliasing in Equation 14. A more accurate approximation can be made by taking into consideration the aliasing between adjacent copies of S in the right-hand side of Equation 14. However, in order to obtain the increase in accuracy, it is necessary to increase the sampling rate.
It is appreciated by those skilled in the art that the assumption that aliasing between copies of S in Equation 14 is due to overlap between adjacent terms S(jω/T) and S(j(ω-2π)/T) is equivalent to the assumption that reduction of the sampling period from T to T/2 removes aliasing. Moreover, the information at frequency ω in the discrete Fourier transform of a signal sampled with sampling period T corresponds with the information at both frequencies ω/2 and ω/2 + π in the discrete Fourier transform of the signal sampled with sampling period T/2. A precise formulation of this relationship appears in the above referenced Oppenheim and Schafer as Equation 3.76, which describes a frequency-domain relationship between the input and the output of a sampling rate compressor ("downsampling").
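The downsampling relation invoked here can also be verified numerically; the sketch below (illustrative values, Gaussian test signal) compares the discrete-time transform at period T with the two half-period contributions at ω/2 and ω/2 + π.

```python
import numpy as np

# Numerical check of the downsampling relation: sampling at period T is
# a 2x downsampling of sampling at T/2, so
#   S_T(e^{jw}) = (1/2) [ S_{T/2}(e^{jw/2}) + S_{T/2}(e^{j(w/2 + pi)}) ].
# Gaussian test signal; T and w are illustrative values.

T, wfreq = 0.5, 0.7
n = np.arange(-400, 401)

def dtft(x, w):
    return np.sum(x * np.exp(-1j * w * n))

s_half = np.exp(-(n * T / 2) ** 2 / 2)     # samples at period T/2
s_full = np.exp(-(n * T) ** 2 / 2)         # samples at period T

lhs = dtft(s_full, wfreq)
rhs = 0.5 * (dtft(s_half, wfreq / 2) + dtft(s_half, wfreq / 2 + np.pi))
print(abs(lhs - rhs))                      # ~0
```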
These observations are a basis for an improved approximation used in accordance with a preferred embodiment of the present invention, as described hereinbelow. Specifically, replace the sampling period, T, by T/2 in Equation 16:

$$y[n] = \int s\!\left(t + n\,\frac{T}{2}\right) g(t)\, dt. \tag{31}$$

Under the assumption that aliasing in Equation 14 is due to overlap between adjacent terms S(jω/T) and S(j(ω-2π)/T), the approximation to S(e^{jω}) becomes

$$S(e^{j\omega}) \approx \frac{1}{T}\left[ S\!\left(j\,\frac{\omega}{T}\right) + S\!\left(j\,\frac{\omega - 2\pi}{T}\right) \right], \qquad 0 < \omega < \pi, \tag{32}$$

for positive values of ω, and

$$S(e^{j\omega}) \approx \frac{1}{T}\left[ S\!\left(j\,\frac{\omega}{T}\right) + S\!\left(j\,\frac{\omega + 2\pi}{T}\right) \right], \qquad -\pi < \omega < 0, \tag{33}$$

for negative values of ω.
Using similar analysis to that embodied in Equations 13 - 18, it can be shown that

$$U(e^{j\omega}) \approx \frac{1}{2}\left[ \frac{Y(e^{j\omega/2})}{G\!\left(-j\,\dfrac{\omega K}{\Delta x}\right)} + \frac{Y(e^{j(\omega/2 - \pi)})}{G\!\left(-j\,\dfrac{(\omega - 2\pi) K}{\Delta x}\right)} \right], \qquad 0 < \omega < \pi, \tag{34}$$

for positive values of ω, and

$$U(e^{j\omega}) \approx \frac{1}{2}\left[ \frac{Y(e^{j\omega/2})}{G\!\left(-j\,\dfrac{\omega K}{\Delta x}\right)} + \frac{Y(e^{j(\omega/2 + \pi)})}{G\!\left(-j\,\dfrac{(\omega + 2\pi) K}{\Delta x}\right)} \right], \qquad -\pi < \omega < 0, \tag{35}$$

for negative values of ω. Equations 34 and 35 determine U(e^{jω}) more accurately than Equation 18, since they incorporate aliasing between adjacent copies of S in the sampling representation for S(e^{jω}). However, they require sub-pixel shifts of Δx/2K between successively captured images. This corresponds to a factor of 2K increase in data acquisition, whereas the resolution is only enhanced by a factor of K.
An alternative approach to improving the approximation in Equation 18, by taking into consideration the aliasing between adjacent copies of S in the right-hand side of Equation 14, uses an alternating ("checkerboard") mask geometry. As mentioned hereinabove, the above discussion is based on use of a periodic mask, with a period of Δx. By using instead a mask with a period of 2Δx, one effectively generates two pixel sensitivity functions, g_1(x) and g_2(x), where g_1(x) applies to even-numbered detectors and g_2(x) applies to odd-numbered detectors. Specifically, g_1(x) corresponds to g(x)m(x) and g_2(x) corresponds to g(x)m(x+Δx), where g(x) is the pixel sensitivity function for the image sensing device and m(x) is the overall mask. Unlike the above case where the mask has a period of Δx, when the mask has a period of 2Δx there are effectively two masks operating (namely, the mask itself and the mask shifted by Δx), and consequently the even and odd-numbered pixels are processed differently.
Reference is now made to Figure 11, which is a simplified illustration of an alternating ("checkerboard") mask attached to a sensor panel of array detectors in an image sensing device, in accordance with a preferred embodiment of the present invention. A sensor panel 1100 in an image sensing device contains a two-dimensional array of detectors 1110. Detectors 1110 are depicted as individual cells within sensor panel 1100. The array of detectors is periodic, with periodic spacings Δx and Δy in the horizontal and vertical dimensions, respectively. The spacings Δx and Δy correspond to the pixel spacings in the sampled digital image obtained from the image sensing device.
In distinction to Figure 7, Figure 11 illustrates two different masking patterns 1120 and 1130 in the form of fine sub-pixel transmission gratings that are arranged in an alternating checkerboard pattern to generate a doubly periodic mask, with respective periods 2Δx and 2Δy in the horizontal and vertical dimensions. In a preferred embodiment of the present invention, the doubly periodic mask is attached to sensor panel 1100. The detectors 1110 are alternately masked by the two different masking patterns 1120 and 1130.
When using such a mask, Equation 2 (which is one-dimensional) separates into two equations:

$$u_k[n] = \int u\!\left(x + n\,\Delta x + k\,\frac{\Delta x}{K}\right) g_1(x)\, dx, \qquad \text{for even values of } n;\; k = 0, 1, \ldots, K-1, \tag{36}$$

and

$$u_k[n] = \int u\!\left(x + n\,\Delta x + k\,\frac{\Delta x}{K}\right) g_2(x)\, dx, \qquad \text{for odd values of } n;\; k = 0, 1, \ldots, K-1. \tag{37}$$

By capturing twice as many images as previously, namely, 2K images, each detector moves over a full period of the mask, namely, 2Δx. Consequently, two sequences of the form of Equation 3 are acquired, namely,

$$y_1[n] = \int u\!\left(x + n\,\frac{\Delta x}{K}\right) g_1(x)\, dx, \qquad n = 0, 1, \ldots, KN-1, \tag{38}$$

and

$$y_2[n] = \int u\!\left(x + n\,\frac{\Delta x}{K}\right) g_2(x)\, dx, \qquad n = 0, 1, \ldots, KN-1. \tag{39}$$
In turn, using the same analysis embodied in Equations 13 - 18, one arrives at the two conditions

$$Y_1(e^{j\omega}) \approx \frac{K}{\Delta x}\left[ S\!\left(j\,\frac{\omega K}{\Delta x}\right) G_1\!\left(-j\,\frac{\omega K}{\Delta x}\right) + S\!\left(j\,\frac{(\omega - 2\pi)K}{\Delta x}\right) G_1\!\left(-j\,\frac{(\omega - 2\pi)K}{\Delta x}\right) \right] \tag{40}$$

$$Y_2(e^{j\omega}) \approx \frac{K}{\Delta x}\left[ S\!\left(j\,\frac{\omega K}{\Delta x}\right) G_2\!\left(-j\,\frac{\omega K}{\Delta x}\right) + S\!\left(j\,\frac{(\omega - 2\pi)K}{\Delta x}\right) G_2\!\left(-j\,\frac{(\omega - 2\pi)K}{\Delta x}\right) \right] \tag{41}$$

for positive values of ω, 0 < ω < π, and

$$Y_1(e^{j\omega}) \approx \frac{K}{\Delta x}\left[ S\!\left(j\,\frac{\omega K}{\Delta x}\right) G_1\!\left(-j\,\frac{\omega K}{\Delta x}\right) + S\!\left(j\,\frac{(\omega + 2\pi)K}{\Delta x}\right) G_1\!\left(-j\,\frac{(\omega + 2\pi)K}{\Delta x}\right) \right] \tag{42}$$

$$Y_2(e^{j\omega}) \approx \frac{K}{\Delta x}\left[ S\!\left(j\,\frac{\omega K}{\Delta x}\right) G_2\!\left(-j\,\frac{\omega K}{\Delta x}\right) + S\!\left(j\,\frac{(\omega + 2\pi)K}{\Delta x}\right) G_2\!\left(-j\,\frac{(\omega + 2\pi)K}{\Delta x}\right) \right] \tag{43}$$

for negative values of ω, -π < ω < 0. These equations can be used to solve, at each frequency, for the individual copies S(jωK/Δx) and S(j(ω∓2π)K/Δx), and then to recover S(e^{jω}) from Equations 32 and 33. The final result obtained, expressed in terms of U(e^{jω}), is

$$U(e^{j\omega}) \approx \frac{Y_2(e^{j\omega})\,\delta G_1\!\left(-j\,\dfrac{\omega K}{\Delta x}\right) - Y_1(e^{j\omega})\,\delta G_2\!\left(-j\,\dfrac{\omega K}{\Delta x}\right)}{G_1\!\left(-j\,\dfrac{\omega K}{\Delta x}\right) G_2\!\left(-j\,\dfrac{(\omega - 2\pi)K}{\Delta x}\right) - G_2\!\left(-j\,\dfrac{\omega K}{\Delta x}\right) G_1\!\left(-j\,\dfrac{(\omega - 2\pi)K}{\Delta x}\right)} \tag{44}$$

for positive values of ω, 0 < ω < π, and

$$U(e^{j\omega}) \approx \frac{Y_2(e^{j\omega})\,\delta' G_1\!\left(-j\,\dfrac{\omega K}{\Delta x}\right) - Y_1(e^{j\omega})\,\delta' G_2\!\left(-j\,\dfrac{\omega K}{\Delta x}\right)}{G_1\!\left(-j\,\dfrac{\omega K}{\Delta x}\right) G_2\!\left(-j\,\dfrac{(\omega + 2\pi)K}{\Delta x}\right) - G_2\!\left(-j\,\dfrac{\omega K}{\Delta x}\right) G_1\!\left(-j\,\dfrac{(\omega + 2\pi)K}{\Delta x}\right)} \tag{45}$$

for negative values of ω, -π < ω < 0. In Equations 44 and 45, the terms δG(-jΩ) and δ'G(-jΩ) denote the differences

$$\delta G(-j\Omega) = G(-j\Omega) - G\!\left(-j\!\left(\Omega - \frac{2\pi K}{\Delta x}\right)\right), \qquad \delta' G(-j\Omega) = G(-j\Omega) - G\!\left(-j\!\left(\Omega + \frac{2\pi K}{\Delta x}\right)\right). \tag{46}$$
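At each frequency, Equations 40 - 43 form a 2 x 2 linear system in the two overlapping spectral copies, and Equations 44 and 45 are its closed-form solution. A per-frequency solver can be sketched as follows; the array names and calling convention are illustrative assumptions.

```python
import numpy as np

# Sketch of the checkerboard recovery of Equations 40-46 at one positive
# frequency w: solve the 2x2 system relating Y1, Y2 to the two spectral
# copies, then sum the copies (Eq. 32).  Inputs are assumed given:
#   Y1, Y2     : DTFTs of the two masked sample sequences at frequency w
#   G1_0, G2_0 : G1, G2 evaluated at -j*w*K/dx
#   G1_1, G2_1 : G1, G2 evaluated at -j*(w - 2*pi)*K/dx

def recover_U(Y1, Y2, G1_0, G1_1, G2_0, G2_1):
    det = G1_0 * G2_1 - G1_1 * G2_0     # must be bounded away from zero
    S0 = (Y1 * G2_1 - Y2 * G1_1) / det  # copy at w, scaled by K/dx
    S1 = (Y2 * G1_0 - Y1 * G2_0) / det  # copy at w - 2*pi, scaled by K/dx
    return S0 + S1                      # equals Eq. 44 in solved form
```

Substituting S0 + S1 = (Y2·δG1 - Y1·δG2)/det reproduces Equation 44 directly.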
In distinction to the approach using Equations 34 and 35, whereby the improvement in accuracy stems from cutting the sub-pixel sampling shift in half, from Δx/K to Δx/2K, the approach using Equations 44 and 45 obtains improvement in accuracy by acquiring 2K sampled images, each shifted by Δx/K. Both approaches require collection of 2K times as much sampled data as in a single image captured by a sensing device. They differ in the types of masks used, and in the relative shifts between the multiple images acquired.
When implemented with full two-dimensional processing, both approaches described hereinabove require 4K² times as much data as in a single image captured by a sensing device, in order to enhance image resolution by a factor of K in each pixel dimension.
Geometric Super Resolution vs. Diffractive Super Resolution
The multiple image acquisition described above and illustrated in Figure 6 is geometric in nature; i.e., it is based on viewing the same spatial information several times, such that in each view the object is shifted by a sub-pixel amount. It can be implemented by shifting positions of the sensing device ("temporal multiplexing") or, equivalently, by shifting fields of view ("spatial multiplexing") using an optical element that replicates views of the object with sub-pixel shifts between the replicas. The resolution of a system limited by geometry corresponds to the detection sampling system; i.e., the resolution of the system sensors.
In distinction, multiple image acquisition can also be diffractive in nature. The resolution of a system limited by diffraction corresponds to the finest detail that can pass through the system without being distorted, and is proportional to the size of the aperture in the optical system. Thus the resolution of the human visual system is limited by the extent to which the eyes are open. When an observer squints his eyes, the resolution of a scene being viewed decreases, since the size of the opening is decreased.
For example, resolution enhancement can be implemented by an assembly of two moving rotated gratings, a first grating attached to the object being captured, and a second grating attached to a sensor of the sensing device, and moving in a direction opposite to that of the first grating. Such an assembly can be used to enhance the resolution of a diffractive system, since it effectively increases the size of the aperture in the optical system. The movement of the grating attached to the object encodes its spatial information by a Doppler-like effect, and allows the information to be transmitted through a restricted aperture of an imaging lens. The decoding of the information is obtained from the second grating attached to the sensing device, which moves in a direction opposite to that of the first grating. Such an assembly of two rotated gratings is practical for microscopic applications.
However, in many imaging systems the object is distant from the observer, and attachment of a grating to the object is impractical. When applied to scanning systems such as barcode readers, the present invention preferably simulates attachment of a moving grating to the object by illuminating the object with an illumination having a spatial structure of a grating. The illumination is
shifted with time in order to simulate temporal motion of the grating. A specially designed diffractive optical element is attached to the illumination source so that a grating pattern appears in the object plane. The grating pattern is moved by phase modulating the light source, since linear phase modulations appear as shifts in a far field.
Additional Considerations
In reading the above description, persons skilled in the art will realize that there are many apparent variations that can be applied to the methods and systems described. For example, the integral in Equation 3 may be replaced by a discrete Gabor transform over u with window function g; namely,

$$y[n] = \sum_{m} u\!\left(m\,\frac{\Delta x}{K}\right) g\!\left((m - n)\,\frac{\Delta x}{K}\right). \tag{47}$$
Similarly, one may use instead a wavelet or Mellin transform, as is well known to persons skilled in the art.
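As a concrete discrete variant (an illustrative sketch only; u_fine and g_win are assumed arrays on the Δx/K grid), Equation 47 is a windowed correlation and can be computed in one call:

```python
import numpy as np

# Sketch of Equation 47: the integral of Eq. 3 replaced by a discrete
# windowed (Gabor-type) correlation of the finely sampled object with
# the sensitivity window.  u_fine and g_win are illustrative stand-ins.

u_fine = np.random.rand(256)                            # u(m*dx/K)
g_win = np.exp(-0.5 * (np.arange(-16, 17) / 5.0) ** 2)  # g on the same grid

# y[n] = sum_m u(m*dx/K) g((m - n)*dx/K), a correlation
y = np.correlate(u_fine, g_win, mode='same')
```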
Additionally, the shifts between the multiple images that are acquired by the sensing device are not required to be equal. Furthermore, the shifts between the multiple images are not required to be horizontally or vertically disposed, nor are they required to be disposed in a fixed direction. Wavelet transforms and Mellin transforms are particularly advantageous when dealing with unequal shifts between acquired images.
Additionally, the reconstruction algorithm used in the present invention, for generating a high resolution image from multiple low resolution images, need not be separable. That is, it need not be comprised of horizontal and vertical reconstructions. Rather, the reconstruction algorithm used in the present invention can be a genuine two-dimensional algorithm.
In the foregoing specification, the invention has been described with reference to specific exemplary embodiments thereof. It will, however, be evident that various modifications and changes may be made to the specific exemplary embodiments without departing from the broader spirit and scope of the invention as set forth in the appended claims. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.